
ADR-090: World Agent positioning and the world-model evolution workflow

Context

ADR-081 established the World Agent’s expanded role under the in-band decision-support shift: code generation, rule-candidate proposal, customer-mode chat surface. The companion Codex page evolution-loop.mdx inherits significant pre-pivot framing — continuous-flow signal accumulation, per-candidate atomic review/approval, supervisor-style aggregation queue — that no longer fits the post-shift product workflow.

The actual product workflow under the in-band shift has two distinct phases:

Phase 1 — Initial onboarding (episodic, batch):

1. Operator brings a new customer into a domain. Primary mechanism: gather source material (web resources, statute, regulatory guidance, scholarly publication, customer-supplied documentation) describing the problem space.
2. World Agent ingests source material, anchors provenance, and extracts proposed rules from the body. Web-research capability fills observed gaps in operator-provided sources (bounded, domain-relevant expansion).
3. Operator and World Agent iterate over the generated rule SET in a collaborative review session — the unit of review is the set, not the candidate.
4. Operator publishes a world-model version when the rule set is ready.

Phase 2 — Post-deployment iteration (continuous, feedback-driven):

5. Customer-side operators review historical decisions made by the deployed modules. Structured feedback (annotations, rule-level commentary, outcome reports, flag-for-review) flows back.
6. World Agent processes accumulated feedback into proposed rule changes (revisions, additions, retirements), which the operator reviews against the current world model and incorporates into a future publication.

The pre-pivot evolution loop was a continuous single-candidate flow driven by the supervisor (retired per ADR-074), a fundamentally different shape from the two-phase workflow above. This ADR captures the post-shift positioning so that subsequent Codex rewrites, Linear epics, and implementation work align to the actual workflow rather than carrying pre-pivot framing forward by accident.

The Phase 1 authoring surface is conversational: the operator creates a world, then works with the single World Agent (ADR-101) by chatting with it and supplying web links + document attachments, and the agent consumes, evaluates, and proposes the rules within that tool-using session. This is the primary alpha authoring flow. The two-phase shape holds, and D1 and D3 state it concretely: the distillation / web-research / code-gen / proposal capabilities are the tools the chat agent drives, not a fire-and-forget batch run with review bolted on afterward. The per-node executor is LangChain (with_structured_output + tool-calling) per ADR-102; DSPy is deferred.

This is a doctrine ADR — it sets workflow shape, role boundaries, and substrate expectations. Concrete implementation details (specific tool APIs, schema shapes, UI affordances) follow in downstream epics and refinement sessions.

Decision

D1 — Batch source-material ingestion as the primary authoring path

Initial onboarding for a new customer domain runs through source-material distillation, not blank-page operator authoring. The operator gathers source documentation that describes the domain’s problem space — web resources, statute, regulatory guidance, scholarly publication, internal customer documentation — and ingests it into the distillation pipeline.

The World Agent:

Stores each ingested document as a provenance source (durable record of the source-of-truth artifacts the rule set derives from)
Parses the corpus to extract proposed rules covering the observable domain shape
Anchors provenance per proposed rule back to specific source-material excerpts
Assigns the authoritative-source tier from the five-value taxonomy authoritative > curated > distilled > researched > observed (per ADR-100 D2 and the world-model.mdx authoritative-source dimension): typically distilled for direct source extraction, or authoritative when extracting from a recognized domain authority within the corpus
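The provenance anchoring and tier assignment described above might be modelled roughly as follows. This is a minimal sketch, not the actual schema: the class names, the recognized_authority flag, the numeric tier values, and the "every anchoring excerpt must come from a recognized authority" rule are all assumptions made for illustration. Only the five tier names and their ordering come from the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class SourceTier(IntEnum):
    """Five-value taxonomy; numeric values are assumed, chosen so that
    a higher-authority tier compares greater."""
    OBSERVED = 1
    RESEARCHED = 2
    DISTILLED = 3
    CURATED = 4
    AUTHORITATIVE = 5


@dataclass(frozen=True)
class ProvenanceSource:
    """Durable record of one ingested document (hypothetical shape)."""
    source_id: str
    title: str
    recognized_authority: bool = False  # assumed flag, not from the ADR


@dataclass(frozen=True)
class Excerpt:
    """A specific passage that a proposed rule is anchored to."""
    source_id: str
    text: str


@dataclass(frozen=True)
class ProposedRule:
    rule_text: str
    excerpts: tuple
    tier: SourceTier


def assign_tier(excerpts, sources):
    # Assumption: authoritative only when every anchoring excerpt comes
    # from a recognized authority; otherwise plain distillation.
    if excerpts and all(sources[e.source_id].recognized_authority for e in excerpts):
        return SourceTier.AUTHORITATIVE
    return SourceTier.DISTILLED


def propose_rule(rule_text, excerpts, sources):
    # Every proposed rule must carry at least one provenance anchor.
    if not excerpts:
        raise ValueError("a proposed rule needs at least one anchoring excerpt")
    return ProposedRule(rule_text, tuple(excerpts), assign_tier(excerpts, sources))
```

Using an ordered enum keeps tier comparisons (for example, sorting a review set by provenance tier) a one-line operation.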

This supersedes the prior framing in evolution-loop.mdx where “operator authoring” (blank-page rule drafting) was the primary path and distillation was a parallel signal source. Under D1, distillation IS authoring — the operator’s act of selecting and ingesting source material is the authoring intent; the World Agent operationalizes it.

The surface for this is a conversational, tool-using session: the operator creates a world, chats with the World Agent, and supplies source material as web links + document attachments in the conversation; the agent ingests, evaluates, fills gaps, generates predicates + tests, and proposes rules as tools it drives within the session. Source-material distillation is the authoring substance; the session makes it operator-conversational and incremental, not a fire-and-forget batch run with review bolted on after. Blank-page drafting remains available within the same session when source material is sparse. Ingestion-driven conversational authoring is the primary operator flow.

D2 — World Agent web-research capability for gap-filling

Operator-provided source material has reach limits. To produce a usable rule set for the domain, the World Agent supplements with bounded online research, shaped by observation of the ingested corpus.

The capability:

Gap analysis — World Agent identifies areas where the ingested corpus has weak or no coverage (concepts referenced but not defined, regulatory framings implied but not provided, scope edges the source material doesn’t address)
Query construction — Research queries derive from gap analysis, shaped by domain vocabulary established in operator-provided sources to stay relevant
Source evaluation — Web-sourced material is evaluated for authority + relevance + recency before being adopted as rule provenance
Provenance tier — Web-sourced rules carry provenance distinguishable from operator-provided distillation. They are assigned the researched tier, ranked below distilled and above observed because the source selection is machine-mediated (ADR-100 D3 resolved this as a new researched tier rather than an extended-distilled sub-classification).

Bounding constraints:

Research scope is tied to the domain being authored — not unbounded web traversal
Source quality criteria (authority, recency, relevance) filter what is adopted as provenance
Operator review at D3 surfaces all web-sourced rules with clear provenance markers; operator can accept, modify, or reject
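One way to picture the source-evaluation filter is a threshold check over the three named criteria. The sketch below is illustrative only: the 0-to-1 scores, the threshold defaults, and every name in it are hypothetical, and how authority and relevance would actually be scored is left to downstream implementation work.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WebSource:
    url: str
    authority: float   # assumed 0..1 score
    relevance: float   # assumed 0..1 score against the domain vocabulary
    published: date


@dataclass(frozen=True)
class QualityCriteria:
    # Threshold values are placeholders, not values from the ADR.
    min_authority: float = 0.6
    min_relevance: float = 0.5
    max_age_days: int = 5 * 365


def adopt_as_provenance(candidates, criteria, today):
    """Keep only sources that pass all three criteria and mark them with
    the researched tier so operator review can tell them apart."""
    adopted = []
    for source in candidates:
        age_days = (today - source.published).days
        if (source.authority >= criteria.min_authority
                and source.relevance >= criteria.min_relevance
                and 0 <= age_days <= criteria.max_age_days):
            adopted.append({"url": source.url, "tier": "researched"})
    return adopted
```

Rejected sources are simply dropped here; a fuller version would likely keep the rejection reason so the operator can see why a source was not adopted.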

This is a substantive expansion of the World Agent’s tool surface beyond what ADR-081 specifies. Web-search tool capability is added to the World Agent’s authoring-time toolset.

D3 — Collaborative rule-set review (not atomic per-candidate review)

Review is not a separate phase after a batch run — it is part of the same tool-using conversational session as ingestion + generation (D1). The operator and the World Agent move fluidly between supplying sources, generating/refining rules, and reviewing the set; the affordances below operate within that one session.

The unit of review is the rule set, not the individual candidate. Operator and World Agent iterate together over the generated set:

World Agent surfaces the full proposed rule set with: rule text, generated predicate code, inline test proposals, provenance citations, implementation-readiness gate evidence per rule
Operator works through the set in operator-judged order — by theme, by provenance tier, by perceived risk, etc. — not in a fixed queue
Within the review session: per-rule actions remain (accept, modify, reject, request-revision-with-rationale, request more research) as UI affordances; operator can also bulk-act over groups of related rules
Iteration is collaborative: operator can ask the World Agent to revise rules, explore gaps further, refine generated predicates; World Agent surfaces evidence, generates alternatives, and accepts operator guidance

Implication for SPEC-371’s per-candidate use case handlers (ApproveCandidate, RejectCandidate, RequestRevision): they remain valid as per-rule operations invoked from within a review session, but the operator does not navigate a “queue” one candidate at a time. The session is the operator’s review of the set; the use case handlers serve actions within the session.
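The relationship between the session and the per-rule handlers can be sketched as a session object that owns the rule set and exposes the per-rule actions as methods. Everything here is a hypothetical illustration: the method names merely echo the SPEC-371 handler names, and the ReviewSession class, its fields, and the bulk helper are assumptions rather than the real application layer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RuleStatus(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    REVISION_REQUESTED = "revision_requested"


@dataclass
class RuleDraft:
    rule_id: str
    theme: str
    status: RuleStatus = RuleStatus.PROPOSED
    rationale: Optional[str] = None


class ReviewSession:
    """The set is the unit of review; per-rule actions are affordances."""

    def __init__(self, rules):
        self._rules = {r.rule_id: r for r in rules}

    def approve(self, rule_id):
        self._rules[rule_id].status = RuleStatus.ACCEPTED

    def reject(self, rule_id):
        self._rules[rule_id].status = RuleStatus.REJECTED

    def request_revision(self, rule_id, rationale):
        rule = self._rules[rule_id]
        rule.status = RuleStatus.REVISION_REQUESTED
        rule.rationale = rationale

    def by_theme(self, theme):
        # Operator-judged ordering: no fixed queue, just selections.
        return [r.rule_id for r in self._rules.values() if r.theme == theme]

    def bulk(self, action, rule_ids, **kwargs):
        # Apply one per-rule action across a group of related rules.
        for rule_id in rule_ids:
            action(rule_id, **kwargs)

    def pending(self):
        return [r.rule_id for r in self._rules.values()
                if r.status is RuleStatus.PROPOSED]
```

For example, `session.bulk(session.approve, session.by_theme("eligibility"))` accepts a themed group in one step while leaving the rest of the set untouched.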

The two gates from the prior evolution-loop.mdx framing carry forward but reframe:

Implementation-readiness gate — the World Agent’s automated check on its own outputs (AST safety per ADR-083 D2, semantic-equivalence between natural-language form and generated predicate, inline-test pass). Failures surface to the review session as the operator works through the set.
Conformity gate — operator-mediated within the review session. The conformity check is the operator’s judgment about whether the rule belongs in this world model. Evidence (authoritative-source verification, contradiction with existing rules, T1 conflicts) is surfaced by the World Agent; the operator decides.
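As a rough illustration of the AST-safety portion of the implementation-readiness gate, the sketch below walks a generated predicate's syntax tree with Python's standard ast module and reports findings. The specific disallowed constructs are assumptions chosen for the example; the actual constraint set is whatever ADR-083 D2 defines, which this document does not restate.

```python
import ast

# Assumed deny-list for illustration; not the ADR-083 D2 rule set.
FORBIDDEN_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def ast_safety_findings(source: str) -> list:
    """Return human-readable findings; an empty list means the check passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            findings.append(f"line {node.lineno}: import not allowed")
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id in FORBIDDEN_CALLS):
            findings.append(f"line {node.lineno}: call to {node.func.id}() not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            findings.append(f"line {node.lineno}: dunder attribute access not allowed")
    return findings
```

Returning findings rather than raising fits the gate's stated behaviour: failures surface in the review session alongside the rule instead of aborting generation.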

D4 — Episodic version publication as operator-controlled cadence

A world-model version publishes when the operator decides the rule set is ready. Publication is the operator’s deliberate action — not auto-triggered by quantity of approved rules, time since last publication, or any internal signal.

Within a publication cycle:

Multiple rules may be added, modified, or retired across one or more collaborative review sessions
Individual rule actions (accept, reject, revise) within sessions do NOT broadcast separate events — aligned with the audit’s Q1 removal of worldmodel.observation.enshrined and worldmodel.version.published (the latter for a different reason — no state-change consumer per ADR-030)
The publication action commits the version atomically (per SPEC-371’s PublishWorldModelVersion use case handler): WorldModel version + EvaluationAuthorityRef + WorldModelCard + outbox row for worlds.world_model_card.published (the only event broadcast at version publication; consumed by platform’s System Card projection)
WorldModelCard captures the rule set + provenance + methodology disclosure per ADR-082 D2

Between publications, a working set of the world model holds the in-progress rule changes (additions, modifications, retirements). Operator can re-open a published version’s working set, iterate on it, then publish a new version. Working-set persistence is part of the worlds-side state.

The episodic version cadence makes the world model behave like a versioned authored artifact (similar to how a software library publishes versioned releases), not a continuous-stream evolving entity. Each version is the unit of authority per ADR-026; consumers (decision modules, customer audit chain, System Cards) pin to versions and don’t see in-progress changes.
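The atomic publication commit can be illustrated with a single database transaction that writes the version, its authority reference, the card, and the outbox row together. This is a sketch under stated assumptions: it uses an in-process SQLite database from the Python standard library, and the table layouts and the publish_version function are hypothetical stand-ins for SPEC-371's PublishWorldModelVersion handler. Only the event name worlds.world_model_card.published comes from the text.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical tables; the real persistence model is not specified here.
    conn.executescript("""
        CREATE TABLE world_model_version (
            world_id TEXT, version INTEGER, rules TEXT,
            PRIMARY KEY (world_id, version));
        CREATE TABLE evaluation_authority_ref (world_id TEXT, version INTEGER);
        CREATE TABLE world_model_card (world_id TEXT, version INTEGER, card TEXT);
        CREATE TABLE outbox (event_type TEXT, payload TEXT);
    """)


def publish_version(conn, world_id, version, working_set, methodology):
    """Commit the working set as one version; all rows land or none do."""
    rules = json.dumps(working_set, sort_keys=True)
    card = json.dumps({"rules": working_set, "methodology": methodology},
                      sort_keys=True)
    event = json.dumps({"world_id": world_id, "version": version})
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO world_model_version VALUES (?, ?, ?)",
                     (world_id, version, rules))
        conn.execute("INSERT INTO evaluation_authority_ref VALUES (?, ?)",
                     (world_id, version))
        conn.execute("INSERT INTO world_model_card VALUES (?, ?, ?)",
                     (world_id, version, card))
        conn.execute("INSERT INTO outbox VALUES (?, ?)",
                     ("worlds.world_model_card.published", event))
```

Because the outbox row is written in the same transaction as the version, a consumer never sees a published-card event for a version that failed to commit.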

D5 — Customer-operator feedback against historical decisions as the primary ongoing-improvement driver

After deployment, the world model evolves through structured customer-operator feedback on historical decisions. The customer-side operator reviews decisions the deployed modules have made, provides structured feedback, and that feedback flows back as input to the next world-model authoring cycle.

Feedback shapes (illustrative; exact set per implementation):

Decision annotation — operator marks a specific decision with a note (“this should have been Y instead of X because Z”)
Rule-level commentary — operator flags rule R as too restrictive or too permissive in case category C
Outcome report — operator records what actually happened after the decision (the agent did Y; the customer accepted Z; the regulator pushed back; etc.)
Flag-for-review — operator marks a decision for later review (lightweight signal; the existing override-pattern signal mechanism)
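The four feedback shapes could be represented as distinct record types so that downstream consumers receive structured input rather than free text. The field names below are assumptions for illustration (the text says the exact set is left to implementation); only the four shape names are taken from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DecisionAnnotation:
    decision_id: str
    note: str  # e.g. "this should have been Y instead of X because Z"


@dataclass(frozen=True)
class RuleCommentary:
    rule_id: str
    case_category: str
    direction: str  # assumed values: "too_restrictive" or "too_permissive"


@dataclass(frozen=True)
class OutcomeReport:
    decision_id: str
    outcome: str  # what actually happened after the decision


@dataclass(frozen=True)
class FlagForReview:
    decision_id: str


Feedback = Union[DecisionAnnotation, RuleCommentary, OutcomeReport, FlagForReview]


def group_by_shape(items):
    """Bucket accumulated feedback by its shape name."""
    grouped = defaultdict(list)
    for item in items:
        grouped[type(item).__name__].append(item)
    return dict(grouped)
```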

The World Agent processes accumulated feedback to propose rule changes for the next publication cycle:

Revisions to existing rules (tighten, loosen, refine, restate)
Additions (gaps that feedback reveals)
Retirements (rules that consistently don’t fit observed cases)

These proposals flow into a review session (per D3) as candidate changes to the working set. The Spectral-internal operator (or potentially the customer’s authority figure with rule-authoring rights) reviews them collaboratively with the World Agent and decides what to publish in the next version.

Spectral does not auto-publish rule changes from feedback. The operator remains the enshrinement authority per ADR-080 D5 (Spectral-as-trusted-operator). Feedback is input, not commitment.

This is the post-deployment iteration loop — distinct from D1’s initial onboarding, but consuming the same D3 collaborative-review and D4 episodic-publication machinery.

D6 — Platform substrate for D5: override-pattern signal aggregation extended for richer feedback shapes

The platform-side foundation for D5 builds on the existing override-pattern signal aggregation pattern documented in event-system.mdx:

OnDecisionRecordedSignalHandler consumes DecisionRecordedEvent when the decision was flagged or marked noteworthy
Routes into the override-pattern aggregation pipeline
Emits override_pattern_signal.aggregated when cluster crosses the operator-triage threshold

This substrate is currently scoped to flag-for-review aggregation. D5 requires extension:

Capture richer feedback shapes at the customer-dashboard level: annotations, rule-level commentary, outcome reports (in addition to flag-for-review)
Aggregate by feedback type as well as by decision pattern
Surface to World Agent as structured feedback inputs, not just opaque “cluster crossed threshold”

The platform substrate stays in spectral.platform; the World Agent consumer (in spectral.worlds.application.evolution_loop per the existing OnOverridePatternAggregatedHandler) extends to handle the richer feedback shapes.

Implementation may choose to:

Reuse override_pattern_signal.aggregated with an extended payload (additive per ADR-044 D11)
Emit additional event types per feedback shape

Either option is an internal architectural choice; the doctrinal commitment is that the platform substrate carries customer-operator feedback to the World Agent in a structured way.

Scope at alpha — single-deployment. The aggregation correlates one deployment’s own feedback. Aggregating across customers who share a market world (so a recurring pattern observed across many deployments drives one rule-change proposal) presupposes the N:1 (or composed/derived-worlds) relaxation of the domain↔world link reserved in ADR-098 D5 — under the 1:1 link in force today a world’s signals are exactly that one deployment’s. Cross-customer aggregation is therefore post-alpha and gated on the ADR-098 cardinality relaxation; the alpha loop is single-deployment. The (world_id, pattern) keying is forward-compatible: when the relaxation lands, the same substrate correlates many deployments into one cluster with no schema or handler change.
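The (world_id, pattern) keying and the per-type aggregation can be sketched as a small in-memory aggregator. This is a hypothetical illustration: the class, the threshold value, and the payload field names (including the additive feedback_by_type field) are assumptions, and only the event name override_pattern_signal.aggregated and the cluster key come from the text.

```python
from collections import Counter, defaultdict


class FeedbackAggregator:
    """Counts feedback per (world_id, pattern) cluster, split by type."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed operator-triage threshold
        self._clusters = defaultdict(Counter)
        self._emitted = set()

    def record(self, world_id, pattern, feedback_type):
        """Record one feedback item; return an event payload the first
        time the cluster crosses the threshold, otherwise None."""
        key = (world_id, pattern)
        self._clusters[key][feedback_type] += 1
        total = sum(self._clusters[key].values())
        if total >= self.threshold and key not in self._emitted:
            self._emitted.add(key)
            return {
                "event_type": "override_pattern_signal.aggregated",
                "world_id": world_id,
                "pattern": pattern,
                "count": total,
                # Additive field carrying the richer feedback shapes.
                "feedback_by_type": dict(self._clusters[key]),
            }
        return None
```

Nothing in the key refers to a deployment, which is the sense in which the keying is forward-compatible: under a relaxed cardinality, feedback from several deployments of one world would fall into the same cluster.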

Alternatives considered

Continue with the current Codex evolution-loop.mdx framing. Rejected. The framing carries pre-pivot supervisor-driven continuous-flow assumptions (per-candidate queue, parallel signal sources, automated aggregation) that don’t match the post-shift workflow. Allowing it to stand would force every downstream epic to either inherit the misfit or invent its own corrections in isolation.

Fold this ADR into ADR-081. Rejected. ADR-081 captures session-boundary + caller-mode decisions; this ADR captures workflow shape. They are separate concerns, and combining them would muddy what each ADR is responsible for. ADR-081 cross-references this ADR for the workflow-shape and web-research additions.

Split into two ADRs — onboarding workflow (D1–D4) + ongoing-improvement workflow (D5–D6). Rejected. The two phases share the D3 review machinery + D4 publication cadence; splitting would force each ADR to re-establish those decisions independently. The workflow phases are coherent enough to land together.

Defer the web-research capability (D2) to post-alpha. Rejected. Operator-provided source material has reach limits in real onboarding scenarios; producing usable rule sets at alpha depends on the World Agent being able to fill gaps. Treating web research as post-alpha would force alpha onboardings to either accept thin rule sets or fall back to extensive blank-page authoring, neither of which fits the product story.

Treat per-rule review (the prior ApproveCandidate / RejectCandidate / RequestRevision framing) as the canonical operator surface, with the collaborative session as a UX wrapper. Rejected. The session is the operator’s actual review unit; per-rule actions are affordances within it. Inverting the framing would preserve pre-pivot atomicity assumptions and constrain UI design downstream.

Define an explicit customer-side authority figure with rule-authoring rights (rather than leaving D5’s “Spectral-internal operator or potentially the customer’s authority figure” undefined). Rejected at this ADR’s scope. The trust posture per ADR-080 D5 keeps Spectral as the trusted operator at alpha; customer-side authority is a post-alpha consideration that follows its own ADR when the customer-trust expansion is concrete.

Consequences

Codex rewrites required. evolution-loop.mdx needs substantial restructuring to reflect the two-phase model (onboarding-batch + post-deployment-feedback) instead of the continuous-flow signal-sources framing. world-agent.mdx needs updates: web-research capability added to the role list; distillation-as-authoring acknowledged; collaborative-review framing in place of per-candidate iteration. customer-dashboard.mdx needs the richer feedback-shape surfaces (annotation, rule-level commentary, outcome report) documented as customer-side affordances.
Linear scope re-derives. SPEC-491 (code-gen) and SPEC-492 (restatement) survive with framing adjustments. SPEC-371’s ApproveCandidate / RejectCandidate / RequestRevision handlers flex from queue-driving operations into session-internal affordances. The audit’s new-epics cluster reshapes: drop the “Evolution Loop Core” framing in favor of more targeted epics — distillation-as-authoring pipeline (Phase 1 D1), web-research capability (D2), customer-operator feedback surfaces in Customer Dashboard + extended override-pattern aggregation (D5 + D6). The override-pattern + RuleHealthService epics that were Alpha Deferred in the audit’s plan stay deferred but their scopes clarify against D5 / D6.
ADR-081 cross-references this ADR for the workflow-shape + web-research-capability additions; the session-boundary + caller-mode decisions in ADR-081 stand unchanged.
Eval-of-agents stays separate. Per the project_eval_system_separate_from_agents memory, the eval system is parallel infrastructure, not a World Agent function. D5’s customer-operator feedback is not an eval mechanism; it is a structured input to the World Agent’s rule-revision proposals. The eval system remains its own substrate (Alpha Deferred per the audit’s plan).
Trust posture preserved. ADR-080 D5’s Spectral-as-trusted-operator framing stands; D5’s feedback-to-proposals path runs through the operator-controlled review session per D3 (no auto-publication).
Publication transaction unchanged. ADR-026 version-as-unit-of-authority + SPEC-371’s atomic publication transaction (WorldModel + EvaluationAuthorityRef + WorldModelCard + outbox row for worlds.world_model_card.published) carry forward verbatim. D4 ratifies the operator-controlled cadence; the publication mechanics are unaffected.
Web-search tool capability needs implementation scope. Specific web-search tool API and source-quality evaluation logic are downstream implementation concerns. The provenance-tier question D2 left open is closed: web-sourced rules take the new researched tier per ADR-100 D3. The doctrinal commitment in D2 is that the capability exists; the API surface is a refinement-time decision.
worldmodel.observation.enshrined removal aligned. The audit’s Q1 disposition removed this event for lack of a documented consumer; D4’s framing of in-session rule actions as not-separately-broadcast retroactively explains why no consumer was justified.
The authoring surface is a conversational, tool-using session. D1/D3’s surface is the operator chatting with the single World Agent (ADR-101) and supplying links/documents; the agent drives ingest → distill → research → code-gen → propose as tools. The per-node executor is LangChain per ADR-102 (DSPy deferred); the chat is a tool-using LangGraph agent built under Stream W (SPEC-556 family — SPEC-558/559/560). The two-phase workflow, the gates, and the publication cadence hold.

References

ADR-026 — WorldModel version as unit of authority
ADR-030 — Authority version as metadata only across the context boundary
ADR-044 D11 — additive payload extension for inter-context events
ADR-074 — scan pipeline retirement (supervisor + continuous-signal-flow framing retired)
ADR-080 D5 — Spectral-as-trusted-operator trust posture
ADR-081 — World Agent role + session boundary (extended by this ADR for workflow shape + web research)
ADR-082 D2 — WorldModelCard methodology disclosure
ADR-083 — Decision-module execution sandbox (AST safety constraints on generated code)
ADR-100 — rule-provenance model reconciliation; ADR-100 D3 closes this ADR’s D2 web-research tier choice (new researched tier), and ADR-100 D2 settles the five-value authoritative-source taxonomy that this ADR’s D1 assigns from
ADR-101 — one-agent topology; the single World Agent serves operators directly through the conversational authoring surface (D1/D3)
ADR-102 — LLM stack: LangGraph + LangChain with_structured_output + tool-calling is the per-node executor for the authoring tools (DSPy deferred)
project_eval_system_separate_from_agents — eval-of-agents is separate infrastructure, not a World Agent function
feedback_carries_forward_vs_resemblance — distinguishing carry-forward from retired-with-resemblance (applied to the evolution-loop framing reshape)
Codex — system-design/world-model-system/evolution-loop.mdx (substantial rewrite required)
Codex — system-design/world-model-system/world-agent.mdx (substantial update required)
Codex — system-design/platform/customer-dashboard.mdx (feedback-shape surfaces to document)
