ADR-081: World Agent role expansion and session boundary

Context

Under the in-band decision-support shift, the World Agent’s role expands: it now generates executable predicate code (and applies_when filters) for each rule, generates inline tests, orchestrates the internal eval framework over generated code, and detects patterns in approved operator overrides at decision time to propose new T2 rules. ADR-078 D2 then named the World Agent as the owner of any customer-facing agent surface that the Spectral dashboard exposes — previously the World Agent was operator-only.

This ADR consolidates the expanded role into a single record and pins the session boundary model — the part of the expansion that has structural implications beyond the enumeration of new responsibilities.

Two things this ADR is not: (1) it is not a deployment ADR — World Agent is one code-level agent definition that lives in the workers deployable per ADR-060, and that is unchanged here. (2) It is not the design of customer-facing affordances — see D5 below for the v0 stance on that gap.

Decision

D1 — One agent definition; runs in workers; no separate deployable

The World Agent is a single code-level agent definition (LangGraph + Deep Agents per the surviving ADR-007 substrate). It is hosted in the workers deployable per ADR-060 (framework-layer composition seam at the workers entrypoint). No separate deployable is added for World Agent; no per-world process / pod / image is provisioned. There is one shared agent implementation; per-world distinction is a session-level concern (D2), not a deployment concern. Runtime placement is ratified, with its full rationale and operating conditions, in ADR-109.

D2 — Session boundary: a single (org, domain) world instance

When an authenticated caller starts a session with the World Agent, the session is bounded to a single (org, domain) world instance for its entire lifetime. The world the session operates on is fixed at session start; the session cannot reach across worlds. To work with a different world, the caller starts a new session.

Operationalization on existing substrate (per ADR-007 carry-forward + ADR-043): each World Agent session is a Conversation entity in the conversation-persistence model; the conversation’s metadata carries the (org, domain) world coordinate; LangGraph checkpointer state is keyed by conversation ID and never reads/writes outside the bounded world. Tools the World Agent invokes within a session receive the world coordinate from the session context and enforce it (e.g., a rule-candidate lookup tool filters by (org, domain); a code-gen invocation targets the bounded world model’s context schema; an audit-record query is scoped to the bounded world’s (org, domain) records).
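A minimal sketch of how a tool might enforce the bounded world, assuming Python. All names here (WorldCoordinate, Session, RuleCandidate, make_rule_candidate_lookup, and the in-memory store) are hypothetical illustrations, not the platform's API. The tool is built by a factory that closes over the session's world coordinate, so the tool itself accepts no world argument and a caller cannot widen its scope:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class WorldCoordinate:
    org: str
    domain: str


@dataclass(frozen=True)
class Session:
    conversation_id: str
    world: WorldCoordinate  # fixed at session start; frozen, so it cannot be reassigned


@dataclass(frozen=True)
class RuleCandidate:
    rule_id: str
    world: WorldCoordinate


def make_rule_candidate_lookup(
    session: Session, store: list[RuleCandidate]
) -> Callable[[], list[RuleCandidate]]:
    """Build a lookup tool closed over the session's world coordinate."""
    bound_world = session.world

    def lookup() -> list[RuleCandidate]:
        # No world parameter: the filter always uses the session's bounded world.
        return [c for c in store if c.world == bound_world]

    return lookup


# Hypothetical data: two worlds sharing one store.
store = [
    RuleCandidate("r1", WorldCoordinate("acme", "billing")),
    RuleCandidate("r2", WorldCoordinate("globex", "billing")),
]
session = Session("conv-1", WorldCoordinate("acme", "billing"))
lookup = make_rule_candidate_lookup(session, store)
visible_ids = [c.rule_id for c in lookup()]  # only candidates in the acme/billing world
```

Working with the globex world would require a new Session and a new tool built from it, which mirrors the rule that a caller starts a new session to change worlds.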

D3 — Two caller modes: operator session and customer session

Two kinds of caller can start a World Agent session. The operator kind has an on-behalf-of subcase, which gives three caller modes in total:

  • Operator session. Spectral staff authenticated under the operator scope model (per ADR-006 sections 4–5, ADR-039). Operators can start sessions against any world by selecting the (org, domain) they intend to work with. Each session is still bounded to one world; multi-world operator workflows are multiple sessions, not one multi-world session.
  • Customer session. Customer authenticated under org/domain auth (per ADR-006 + ADR-033 + ADR-039). The customer’s session is bounded to their own world(s); auth scope makes other customers’ worlds unreachable.
  • Operator acting on behalf of customer. A subcase of operator session via the RFC 8693 act claim (per ADR-076 D4 + ADR-087): the operator’s signed JWT carries an act claim identifying the customer org being acted for. The session is bounded to that customer’s world; audit records capture both identities (acting principal + on-behalf-of customer).

The session-boundary rule (D2) applies uniformly across all three caller modes. The World Agent does not have a “multi-tenant” mode that crosses worlds within one session.
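The uniform rule across caller modes can be sketched as a single session-start check, assuming Python. The types and field names below (Caller, CallerMode, BoundSession, own_worlds, on_behalf_of_org) are hypothetical, and the sketch deliberately takes an already-authenticated caller rather than parsing a JWT, since the claim layout is specified in ADR-076 D4 and ADR-087, not here:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CallerMode(Enum):
    OPERATOR = "operator"
    CUSTOMER = "customer"
    OPERATOR_ON_BEHALF = "operator_on_behalf_of_customer"


@dataclass(frozen=True)
class Caller:
    principal: str  # authenticated identity (assumed already verified upstream)
    mode: CallerMode
    own_worlds: frozenset[tuple[str, str]] = frozenset()  # customer: (org, domain) pairs in scope
    on_behalf_of_org: Optional[str] = None  # operator acting for a customer org


@dataclass(frozen=True)
class BoundSession:
    org: str
    domain: str
    acting_principal: str
    on_behalf_of: Optional[str]  # recorded so audit can capture both identities


def start_session(caller: Caller, org: str, domain: str) -> BoundSession:
    """Bind a session to exactly one world, whatever the caller mode."""
    if caller.mode is CallerMode.CUSTOMER and (org, domain) not in caller.own_worlds:
        raise PermissionError("customer scope does not cover this world")
    if caller.mode is CallerMode.OPERATOR_ON_BEHALF and org != caller.on_behalf_of_org:
        raise PermissionError("session must be bounded to the acted-for customer's world")
    on_behalf = (
        caller.on_behalf_of_org if caller.mode is CallerMode.OPERATOR_ON_BEHALF else None
    )
    return BoundSession(org, domain, caller.principal, on_behalf)
```

Every mode returns the same BoundSession shape with one (org, domain) pair; the modes differ only in which worlds the caller is allowed to select.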

D4 — Consolidated expanded role

The World Agent’s responsibilities under the shift, consolidated from prior ADRs:

  • Rule authoring assistance (existing role) — co-design of natural-language rule assertions with the operator; remains a core surface.
  • Code generation (new) — produces the executable predicate function and optional applies_when filter for each rule. Subject to the predicate runtime safety contract (ADR-083 D2 — codegen-time AST analysis): AST-level static analysis at generation time rejects dangerous constructs.
  • Inline test generation (new) — produces tests co-located with each predicate, drawn from the rule’s natural-language intent.
  • Internal eval framework orchestration (new) — runs the internal eval framework over generated code across the multi-axis scoring dimensions (predicate correctness, test fidelity, determinism, runtime safety, trace integrity, readability) per ADR-074’s principle-migration tracking for ADR-020 / ADR-075’s framework reimagining.
  • Implementation-readiness gate participation — World Agent’s outputs (codegen, tests, eval pass) feed the five-check implementation-readiness gate; the gate runs alongside the conformity gate at rule enshrinement.
  • Override-pattern detection (new) — over operator-approved overrides at decision time, detects emerging patterns and proposes new T2 rule candidates. The new platform→worlds signal stream per ADR-074’s addendum for ADR-057; replaces the retired failure-cluster signal source.
  • World-state introspection — answers queries about the current world model state, rule corpus, audit lineage, version history (operator-facing surface today; customer-facing surface gap per D5).
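The codegen-time AST analysis named in the code generation bullet above could look roughly like the following. This is a sketch under assumptions: it supposes predicates are generated as Python source, and the deny-list (FORBIDDEN_CALLS) and the function name find_violations are illustrative choices, not the contract defined in ADR-083 D2. It uses only the standard-library ast module:

```python
import ast

# Hypothetical deny-list; the real contract lives in ADR-083 D2.
FORBIDDEN_CALLS = {"eval", "exec", "compile", "open", "__import__", "getattr", "setattr"}


def find_violations(source: str) -> list[str]:
    """Return a description of each dangerous construct found in the source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append(f"line {node.lineno}: import statement")
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FORBIDDEN_CALLS
        ):
            violations.append(f"line {node.lineno}: call to {node.func.id}()")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            violations.append(f"line {node.lineno}: dunder attribute {node.attr}")
    return violations


safe_predicate = "def predicate(ctx):\n    return ctx['amount'] > 100\n"
unsafe_predicate = "import os\ndef predicate(ctx):\n    return eval(ctx['expr'])\n"

safe_report = find_violations(safe_predicate)      # no findings expected
unsafe_report = find_violations(unsafe_predicate)  # flags the import and the eval call
```

A generator would reject any predicate whose report is non-empty before the code reaches tests or the publish gate. A deny-list like this is a floor, not a sandbox; it catches obvious constructs only.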

The generation responsibilities above (code generation, inline test generation, override-pattern proposal) execute via LangChain with_structured_output + provider-native tool-calling per ADR-102; DSPy is deferred post-alpha. They run as tools the operator-direct World Agent drives within the conversational authoring session per ADR-090. The “internal eval framework orchestration” bullet does not survive: eval-of-agents is separate infrastructure (ADR-090); the World Agent generates code + its own inline tests and runs the readiness-gate smoke check, not a separate eval system.

The “Inline test generation” and “Implementation-readiness gate participation” bullets are governed by the behavioral-completeness publish gate (ADR-107 + ADR-108), which replaces the earlier best-effort, five-check implementation-readiness gate (a count, not a deploy gate):

  • Tests are not co-generated from the predicate. A rule’s behavioral spec is extracted from its NL text independently of the predicate (ADR-108 D1), and the discriminating tests are deterministically materialized from that spec (ADR-108 D2) — co-blind generation (predicate + its own tests) cannot catch an incomplete rule, which is precisely the defect this overturns.
  • The per-rule behavioral suite is an enforced publish-gate artifact, not a readiness count: behavioral completeness hard-fails publication (ADR-108 D5), the suite ships in the content-addressed deploy bundle with a deploy-time backstop (ADR-108 D6), and the predicate’s declared inputs are bound declared == read (ADR-107 D2). No “best-effort, not a deploy gate” posture survives in the corpus.
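One possible reading of the declared == read binding is a set-equality check between the inputs a predicate declares and the inputs its body actually reads. The sketch below assumes Python 3.9+, assumes predicates read inputs as string-keyed subscripts on a parameter named ctx, and uses hypothetical function names; ADR-107 D2 defines the actual mechanism:

```python
import ast


def read_inputs(source: str, ctx_name: str = "ctx") -> set[str]:
    """Collect string keys read as ctx["key"] anywhere in the predicate source."""
    keys = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == ctx_name
            and isinstance(node.slice, ast.Constant)  # Python 3.9+ slice shape
            and isinstance(node.slice.value, str)
        ):
            keys.add(node.slice.value)
    return keys


def check_declared_equals_read(declared: set[str], source: str) -> None:
    """Raise if the declared inputs differ from the inputs actually read."""
    read = read_inputs(source)
    if declared != read:
        raise ValueError(
            f"undeclared reads: {sorted(read - declared)}; "
            f"unread declarations: {sorted(declared - read)}"
        )


predicate_source = (
    "def predicate(ctx):\n"
    "    return ctx['amount'] > 100 and ctx['region'] == 'EU'\n"
)
check_declared_equals_read({"amount", "region"}, predicate_source)  # passes silently
```

Equality in both directions matters under this reading: an undeclared read means the predicate depends on data the spec never mentioned, and an unread declaration suggests the predicate ignores part of the rule.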

The World Agent still generates the predicate + drives the spec extraction (its D4 role); what changed is that its output is now gated, not merely scored.

The World Agent’s memory storage and tier vocabulary remain governed by ADR-058. The agent-tool-invocation framework-layer composition pattern (closed-over factories, tool-error taxonomy) remains governed by ADR-060.

D5 — Customer-facing affordances are a v0 prototype gap (fast-follow, not deferred indefinitely)

The v0 console mockups were form-based: the Decisions, World model, Actions, and System Card tabs presented structured panels rather than chat affordances. Those customer surfaces have since been redesigned in docs/design/annotated-screens/, and they remain structured panels with no customer-facing chat. No specific customer-facing World Agent affordances are scoped in v0.

This is recognized as a prototype gap, not as deferred design space. The intent is fast-follow: customer-facing surfaces emerge as concrete affordances on the dashboard (e.g., “explain this decision,” “explore my world model,” “review proposed rule candidates against my world”) and ride the World Agent’s session-bounded surface per D2 + D3. The scope-filtering principle is established (customer scope = own-world only, per D3); the affordance set is what waits for design.

No ADR is required for each affordance as it lands; affordances ship as feature work against the World Agent’s existing session-bounded surface, with auth scope controlling what each affordance can do. An ADR is required only if a customer-facing affordance requires breaking the session boundary, the scope-filtering model, or the audit-chain capture of identity — none of which are anticipated.

Alternatives considered

Per-world World Agent instances (separate runtime per (org, domain) world). Rejected. One shared agent definition with session-level scoping is the standard multi-tenant agent pattern and matches the existing workers-resident composition. Per-world runtimes would multiply deployment surface, complicate scaling, and offer no isolation benefit beyond what session-boundary enforcement at the tool layer already provides.

Multi-tenant World Agent sessions (one session reaches across multiple worlds). Rejected. The session-boundary rule (D2) is a structural property: the session’s identity is the bounded world. Allowing a session to switch worlds would force tools to recompute scope per call, force the audit chain to attribute decisions to ambiguous world coordinates, and remove a clear safety property that authors can rely on.

Defer the session-boundary decision until customer-facing affordances are concretely designed. Rejected. The boundary is load-bearing for tool design, audit capture, and conversation persistence layout — every customer-facing affordance built later will rely on it. Settling it now keeps fast-follow affordance work unblocked.

Treat the v0 customer-facing affordance gap as deferred design space requiring its own ADR per affordance. Rejected. Affordances are feature-shaped, not architecture-shaped. The architectural commitments (session boundary, scope filtering, World Agent ownership) are settled here; affordances ride on top as feature work.

Author a separate ADR for the operator-acting-on-behalf-of-customer model (RFC 8693 act claim handling for World Agent). Rejected. The act-claim mechanism is established by ADR-076 D4 and ADR-087 at the platform-pillar level; D3 here references it without re-specifying. A separate ADR would duplicate that material.

Consequences

  • The World Agent’s expanded responsibilities (D4) are recorded canonically. Future references to “the World Agent generates code” or “the World Agent detects override patterns” point at this ADR rather than reconstructing the role piecemeal.
  • The session-boundary model (D2) is the load-bearing constraint that every tool, every persistence layout, and every audit-record capture must respect. Tool implementations enforce world-coordinate filtering structurally (not by handler discipline).
  • The three caller modes (D3 — operator, customer, operator-acting-on-behalf-of-customer) all use the same session-boundary rule; no caller-mode-specific bypasses exist.
  • The customer-facing affordance gap (D5) is on the fast-follow path. The flag-it-as-gap stance keeps the v0 record honest while reserving the design space for concrete affordance work, not abstract deferral.
  • ADR-058 (World Agent memory) continues to govern the memory-storage shape; ADR-060 (agent tool invocation) continues to govern the framework-layer composition; ADR-007 surviving substrate continues to govern LangGraph + Deep Agents patterns. This ADR slots between them on role + session-boundary specifics.
  • Linear scope: the Spectral Agent epic family that was retired per ADR-078 may include subtask work that could be re-homed onto the World Agent (operator-facing introspection tools, conversation-persistence wiring for World Agent sessions). The re-homing is Phase 4 build-plan territory; this ADR makes the destination explicit (World Agent) without scoping the work itself.
  • The override-pattern detection mechanism (D4) is the new platform→worlds signal source per ADR-074’s principle migration for the retired failure-cluster signal. Its specific shape (signal schema, aggregation window, threshold, candidate-proposal format) is downstream design; this ADR names the source agent and the session boundary it operates under.
  • The World Agent’s generated tests are an enforced artifact, not a readiness count. Per ADR-108, tests are materialized from an independent behavioral spec and gate publication + deploy; D4’s inline-test-generation + readiness-gate framing is the earlier, weaker form. The World Agent generates them; the platform enforces them.
  • Operator-direct authoring is the conversational cockpit. Per ADR-090, operators author by chatting with the single World Agent (ADR-101) and supplying web links + document attachments; the agent drives the authoring tools (ingest / distill / research / code-gen / propose). Built under Stream W: the tool-using LangGraph agent (SPEC-558), link/document ingestion (SPEC-559), the authoring orchestrator (SPEC-560), and the cockpit UI (SPEC-370 re-scoped). The session boundary (D2) + caller modes (D3) are unchanged; the per-node executor is LangChain per ADR-102.