Agents

Agent Architecture

Spectral ships one production agent: the World Agent (worlds context). It is internal-only at v0 and gains a customer-facing chat affordance post-release per ADR-081 D5. Conversation persistence is the substrate for that post-release affordance.

Test-agents in apps/test-agents are reference implementations rather than production agents — working agent code used to demonstrate and explore integration patterns. See Test Agents.

  • Audience. Operations team (internal) at v0; customers post-release (read-only chat affordance per ADR-081 D5).
  • Shape of interaction. Generative + reflective: generate predicate code from natural-language rules, propose inline tests, reflect on coverage and provenance.
  • Write authority. None — proposes rule candidates through the governed Evolution Loop; generates predicate code that the operator decides on at the enshrinement gate.
  • Scope of authority. The world model it resides in, scoped to (org_id, domain_id).
  • Where its code lives. spectral.worlds — one resident agent per world model.
  • Where its runtime lives. apps/workers — the World Agent runs in the workers entrypoint of the one Cloudflare app container (ADR-109 D3), giving workload isolation off the synchronous decision path, per ADR-060 and ADR-081 D1.
  • Memory keying. (org_id, domain_id) — one memory store per world model.

Cross-context information flow is shaped by the event substrate (Event System) and by apps/workers as the framework-layer composition seam.

For the doctrine on calls and notifications between contexts, see Contract Surfaces and ADR-065.


The World Agent is the internal resident of each world model. It owns the world-model authoring path: generates predicate code from natural-language rules, proposes inline tests, drafts authoritative-source provenance citations, runs the implementation-readiness gate on its own outputs, and aggregates override-pattern signals into rule-candidate proposals.

Each world model has exactly one resident World Agent. The architectural pattern is instantiated once per (org_id, domain_id) world model, with memory and exploration history scoped to that world. Full specification at World Model System / World Agent.

  • Predicate code generation. From an enshrined-pending rule’s natural-language form, generates the deterministic predicate code that will run inside the deployed action module. Constrained at generation time (AST-level static analysis rejects I/O, mutation, nondeterministic constructs) per ADR-083 D2.
  • applies_when generation. Proposes context-only conditional filters for rules that need conditional activation.
  • Inline test proposal. Co-located tests drawn from the rule’s natural-language intent.
  • Provenance citation drafting. Proposes authoritative-source citations for operator verification at the conformity gate.
  • Override-pattern detection. Aggregates customer-flagged decisions (review-request + noteworthy marks routed from the Customer Dashboard) by pattern across a world model’s accumulated feedback and surfaces emergent patterns as rule-candidate proposals.
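The generation-time constraint on predicate code can be illustrated with Python's standard `ast` module. This is a minimal sketch, not the ADR-083 D2 rule set: the deny-lists, the `check_predicate` name, and the choice of which node types count as I/O, mutation, or nondeterminism are all assumptions made for illustration.

```python
import ast

# Hypothetical deny-lists; the authoritative constraints live in ADR-083 D2.
FORBIDDEN_CALLS = {"open", "print", "input", "eval", "exec", "__import__"}
FORBIDDEN_NODES = (
    ast.Import, ast.ImportFrom,           # no module access (I/O, clocks, randomness)
    ast.Global, ast.Nonlocal,             # no shared-state mutation
    ast.Delete, ast.AugAssign,            # no in-place mutation
    ast.Await, ast.Yield, ast.YieldFrom,  # no suspension points
)


def check_predicate(source: str) -> list[str]:
    """Return a list of violations; an empty list means the source passed."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN_NODES):
            violations.append(f"line {node.lineno}: {type(node).__name__} is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}() is not allowed")
        elif isinstance(node, ast.Assign):
            # Writing through a subscript or attribute mutates an existing object.
            for target in node.targets:
                if isinstance(target, (ast.Attribute, ast.Subscript)):
                    violations.append(f"line {node.lineno}: mutation of an existing object")
    return violations
```

A pure comparison such as `return ctx['amount'] <= 100` passes, while source that imports a module, assigns into `ctx`, or calls `open()` is rejected before it reaches the gates.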

Per ADR-081 D3 the World Agent supports two caller modes that share the same agent definition and runtime, distinguished by auth scope filtering at the session boundary:

  • Operator mode — full authoring surface; operators run code generation, propose candidates, review override-pattern aggregations.
  • Customer mode (post-release) — read-only over the customer’s own world; chat surface keyed to (org_id, domain_id). Lands just after v0 ships as the first post-v0 dashboard expansion per ADR-081 D5.
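One way to picture "same agent definition, different auth scope" is a single tool registry filtered when the session is opened. The sketch below is illustrative only: `CallerMode`, `ToolSpec`, `Session`, `tools_for`, and the tool names are hypothetical and do not come from ADR-081.

```python
from dataclasses import dataclass
from enum import Enum


class CallerMode(Enum):
    OPERATOR = "operator"
    CUSTOMER = "customer"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    read_only: bool


@dataclass(frozen=True)
class Session:
    org_id: str
    domain_id: str
    mode: CallerMode


# Hypothetical tool names; the single registry is shared by both caller modes.
ALL_TOOLS = (
    ToolSpec("generate_predicate_code", read_only=False),
    ToolSpec("propose_rule_candidate", read_only=False),
    ToolSpec("explain_rule", read_only=True),
    ToolSpec("show_provenance", read_only=True),
)


def tools_for(session: Session) -> tuple[ToolSpec, ...]:
    """Filter the shared tool set at the session boundary by caller mode."""
    if session.mode is CallerMode.OPERATOR:
        return ALL_TOOLS
    return tuple(tool for tool in ALL_TOOLS if tool.read_only)
```

Because the filter runs once at the session boundary, the agent graph itself does not need a customer-specific branch.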

The World Agent has no write authority on the rule corpus. Rule enshrinement requires both gates (implementation-readiness + conformity) and an operator’s explicit sign-off at the reviewer surface. The World Agent proposes; the operator decides.

Memory follows the universal three-tier lifecycle (interaction / session / persistent) per ADR-058 D1, anchored to (org_id, domain_id). It holds reasoning, code-generation patterns, exploration history, and discovery observations, and never rule content. The reference-only invariant (ADR-058 D14) is enforced by the body_text trigram-similarity trigger on worlds.world_agent_memory (ADR-058 D8). Full architecture at World Agent — Memory architecture.
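The idea behind a trigram-similarity check can be sketched in plain Python. This approximates how PostgreSQL's pg_trgm scores similarity (lowercased words, padded, compared as sets of three-character substrings); it is not the trigger itself. The assumption that the trigger compares body_text against rule text, the `violates_reference_only` helper, and the 0.6 threshold are all illustrative.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: pad each lowercased word and take 3-char windows."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, in [0.0, 1.0]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def violates_reference_only(body_text: str, rule_texts: list[str],
                            threshold: float = 0.6) -> bool:
    """True if a memory entry looks like a copy of rule content.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return any(similarity(body_text, rule) >= threshold for rule in rule_texts)
```

Under this scheme a memory entry that refers to a rule by identifier scores low, while one that restates the rule text nearly verbatim scores high and would be rejected.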

  • Not a free-form world-model mutator. Enshrinement runs through both gates + human sign-off.
  • Not customer-facing at v0. The post-release customer chat (ADR-081 D5) is auth-scoped read-only over the customer’s own world.

  • Authority isolation. The world model’s authority rests on its rule corpus + evolution governance + two gates + operator sign-off, not on the agent’s conversational output. If the World Agent went down tomorrow, the World Model Card would still be authoritative for its published version, and every deployed action module would still produce binding decisions deterministically. The World Agent has no write authority on the rule corpus by construction, not by convention — that structural guarantee is load-bearing for the same reason the authoring/decision-host isolation between worlds and platform is load-bearing: the standard’s authority survives because the system that builds it cannot reach into the standard’s executable form, and the system that runs the executable form cannot reach into the authority chain.
  • Customer-safety. Customers receive binding decisions via /decide; operator reasoning (which can be exploratory, uncertain, and provenance-dependent) does not flow to the customer surface. The post-release customer-facing World Agent chat is auth-scoped read-only over the customer’s own world.

apps/test-agents hosts working agent code used to demonstrate and explore integration patterns. Test-agents are reference implementations, not a test harness — they are agent code engineers can run, modify, and learn from. Automated test composition is acceptable but secondary to the demonstration purpose. See Test Agents.


All agent runtimes live in the workers tier (apps/workers entrypoint) per ADR-060 D-runtime and ADR-081 D1. The World Agent runs in the workers entrypoint of the one Cloudflare app container per ADR-109 D1/D3, off the synchronous decision path. The API entrypoint is thin — auth + AgentTask dispatch via outbox + SSE streaming proxy. Strict isolation from the /decide hot path holds; the Supabase event substrate is locked.

Streaming pattern. Workers consumes AgentTask events, runs the LangGraph orchestrator, and streams output via a Supabase Realtime channel keyed by conversation_id. apps/api proxies the Realtime channel as SSE to the client. The path is workers → Realtime → API → SSE; the added latency is negligible compared with LLM token latency.
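The API-side half of the streaming pattern amounts to re-framing channel messages as Server-Sent Events. The sketch below uses an `asyncio.Queue` as a stand-in for the Realtime channel; the `token` and `done` event names, the `DONE` sentinel, and the message shape are assumptions, and no Supabase client API is shown.

```python
import asyncio
import json
from typing import AsyncIterator

DONE = object()  # hypothetical end-of-stream sentinel


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame (terminated by a blank line)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def proxy_as_sse(channel: asyncio.Queue) -> AsyncIterator[str]:
    """Relay messages from the channel stand-in to the client as SSE frames."""
    while True:
        message = await channel.get()
        if message is DONE:
            yield sse_frame("done", {})
            return
        yield sse_frame("token", message)


async def demo() -> list[str]:
    # The queue plays the role of a Realtime channel keyed by conversation_id.
    channel: asyncio.Queue = asyncio.Queue()
    for text in ("Hel", "lo"):
        channel.put_nowait({"conversation_id": "c-1", "text": text})
    channel.put_nowait(DONE)
    return [frame async for frame in proxy_as_sse(channel)]
```

Running `asyncio.run(demo())` yields two `token` frames followed by one `done` frame; in a real handler the generator would feed a streaming HTTP response with content type `text/event-stream`.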

AgentTask dispatch via outbox (per ADR-044 D12). platform.agent_tasks carries business state (status, HITL approval linkage, conversation_id, result back-reference, retention). Dispatch flows through core.outbox with event_type='agent.task.dispatched'. Workers listens on the channel, reads the payload, pulls the agent_tasks row, and executes the task. Approval interrupts use LangGraph interrupt() to suspend the run; the checkpointer persists state; an operator response (HTTP into apps/api) resumes the run via Command(resume=...).

Framework-layer composition seam. Workers IS the framework-layer composition seam where tool dependencies wire via DI per Contract Surfaces. Agent code never imports another context; the workers entrypoint imports both worlds and platform at startup and injects implementations into agent tool factories.
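The injection described above can be sketched with a structural interface and a tool factory. All names here (`RuleReader`, `make_read_rule_tool`, `InMemoryRuleReader`) are hypothetical; the sketch shows only the shape of the seam, in which agent code depends on an interface and the entrypoint supplies the implementation.

```python
from typing import Callable, Protocol


class RuleReader(Protocol):
    """Interface the agent tool depends on; it imports no other context."""

    def get_rule_text(self, org_id: str, domain_id: str, rule_id: str) -> str: ...


def make_read_rule_tool(reader: RuleReader, org_id: str,
                        domain_id: str) -> Callable[[str], str]:
    """Tool factory: binds an injected implementation and the world scope."""
    def read_rule(rule_id: str) -> str:
        return reader.get_rule_text(org_id, domain_id, rule_id)
    return read_rule


class InMemoryRuleReader:
    """Stand-in implementation, as the entrypoint (or a test) might supply."""

    def __init__(self, rules: dict[tuple[str, str, str], str]) -> None:
        self._rules = rules

    def get_rule_text(self, org_id: str, domain_id: str, rule_id: str) -> str:
        return self._rules[(org_id, domain_id, rule_id)]
```

At startup the entrypoint would construct the concrete reader and call the factory once per world, so the tool closure is already scoped to (org_id, domain_id) when the agent receives it.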

Error handling: LLM-mediated, LangGraph circuit breaker


ADR-060 D2 + D3 specify:

  • Tool errors flow back to the LLM as tool messages with the four-class ToolError taxonomy (ToolUserError, ToolPolicyError, ToolTransientError, ToolTerminalError) plus a human-readable description.
  • The LLM decides next action: retry as-is, retry with modified args, surface to operator, abandon.
  • LangGraph’s orchestrator-level recursion limit (default 25; configurable per agent) caps runaway loops as the circuit breaker.
  • No agent-layer retry budget. Tool implementations may include single-retry-on-transient-IO as an implementation detail; that is not contract.
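A minimal sketch of how the four error classes might be surfaced to the LLM as tool messages. The class names come from the list above; the common `ToolError` base class, the `run_tool` wrapper, and the message dictionary shape are assumptions, not the contract defined in ADR-060.

```python
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical common base for the four-class taxonomy."""


class ToolUserError(ToolError): ...
class ToolPolicyError(ToolError): ...
class ToolTransientError(ToolError): ...
class ToolTerminalError(ToolError): ...


def run_tool(tool: Callable[..., Any], **args: Any) -> dict[str, str]:
    """Invoke a tool and turn the outcome into a message for the LLM.

    There is deliberately no retry loop here: the LLM reads the error class
    and description and decides what to do next.
    """
    try:
        return {"role": "tool", "status": "ok", "content": str(tool(**args))}
    except ToolError as exc:
        return {
            "role": "tool",
            "status": "error",
            "error_class": type(exc).__name__,
            "content": str(exc),  # human-readable description
        }
```

In LangGraph the orchestrator-level cap is supplied through the run config under the `recursion_limit` key, which is what bounds the loop if the LLM keeps retrying.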

See agent tool invocation for the cross-cutting tool contract.

LangGraph manages conversation state through its built-in checkpointing system:

  • Checkpointer: LangGraphCheckpointer wraps AsyncPostgresSaver
  • Schema isolation: Checkpoint tables live in a dedicated langgraph PostgreSQL schema (framework-owned; AsyncPostgresSaver.setup() provisions; the Supabase CLI migration pipeline does not touch it). Per ADR-043 D7.
  • Thread mapping: LangGraph thread_id maps directly to conversation_id
  • Same-transaction participation per ADR-043 D9: AsyncPostgresSaver.aput runs on the request-scope connection from spectral_platform.db.request_scope per connection pooling — avoiding torn-write risk between business ops and checkpoint writes.
  • Lifecycle: Checkpointer initialized once at worker startup; the compiled graph is reused across requests.
  • Encryption posture: Checkpointer payloads are encrypted at rest via Supabase storage encryption; key rotation, key-management posture (Supabase Vault / pgsodium — no external KMS), and recovery drill procedures live in docs/runbooks/checkpointer-encryption.md.

The orchestrator is stateless per-request; all conversation state is managed by the checkpointer.