Agent Architecture

Spectral ships three distinct agents at 0.3.0. Each has a different audience, a different authority surface, and a different architectural home. Keeping their roles separate — and their state isolated — is what keeps the world-model authority credible, the optimization pipeline responsive, and the operator surface trustworthy.

The three agents at a glance

	Spectral Agent	World Agent	Operations Agent
Audience	Customers	Operations team (internal)	Operations team (internal)
Shape of interaction	Conversational: explains scan verdicts, advises on frameworks, troubleshoots failures, guides onboarding	Read-oriented: ask what it knows about the world model, where coverage is thin, what its signal stream has surfaced	Task-oriented: curate this rule, promote this candidate, trigger this distillation, drive a publication
Has write authority?	Yes — on the customer-facing tool surface (scan analysis, framework guidance), customer-approved at every step	No — proposes rule candidates through the governed Evolution Loop; never mutates the world model directly	Yes — on the Operations app tool surface (authoring, distillation, publication, observability)
Scope of authority	One workspace at a time; customer-keyed	The world model it resides in	Operations-app state (curation queues, publication pipeline, observability surfaces)
Where its code lives	`spectral.platform` — supervisor + 4 specialists (LangGraph)	`spectral.worlds` — one resident agent per world model	`apps/operations` — operator seat (TanStack Start UI + LangGraph runtime)
Where its runtime lives	`apps/workers` (per ADR-060)	`apps/workers`	`apps/workers`
Memory keying	Workspace-scoped (per-user conversation isolation lives on the conversation table, not on the memory tiers)	World-model-scoped (one memory store per world model)	Operator-scoped + per-task scope

The three agents do not share state, memory, or tools. Where they touch the same operator seat (World Agent and Operations Agent are both reachable from the Operations app), the operator is the integration point; there is no hidden routing layer between agents. The World Agent never speaks to a customer; the Spectral Agent never reads world-model authoring state; the Operations Agent never owns world-model authority. Cross-agent information flow is shaped by the event substrate (Event System) and by apps/workers as the framework-layer composition seam, not by direct agent-to-agent calls.

For the doctrine on calls and notifications between contexts, see Contract Surfaces and ADR-065.

Spectral Agent

The Spectral Agent is the customer’s conversational interface to the optimization platform. It analyzes scan results, troubleshoots failures, advises on evaluation frameworks, and guides new customers through onboarding.

For the decision rationale and alternatives, see ADR-007.

Supervisor + specialist model

The Spectral Agent is built on LangGraph with the Deep Agents pattern. A single supervisor receives all customer messages and delegates to specialist subagents based on intent:

Customer message
  → Supervisor (routes by intent)
    → Specialist subagent (focused tools + system prompt)
      → Tool calls (closed-over repository access)
    ← Specialist findings
  ← Supervisor synthesizes customer-facing response

The supervisor carries no tools itself. It delegates to at most two specialists per turn, then synthesizes their findings into a single response.

Specialist subagents

Specialist	When used	Tools
scan-analyst	Scan outcomes, scores, verdicts, changesets, failure patterns	`list_recent_scans`, `get_scan_detail`, `get_scan_scores`, `get_changeset_detail`
onboarding-guide	New customers, setup questions, trace ingestion	`get_onboarding_status`, `get_trace_ingestion_guide`
framework-advisor	Evaluation framework questions, rubric tuning, objective function	`get_framework_config`, `get_rubrics`, `get_world_model_context`, `get_system_card`, `get_evaluation_framework_provenance`
troubleshooter	Scan failures, configuration issues, data quality, recurring errors	`diagnose_scan_failure`, `check_workspace_health`, `get_recent_errors`

Each specialist has a focused system prompt. Specialists are conditionally registered: if no tools are provided at construction time, the specialist is omitted from the graph. This supports incremental rollout without code changes.
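The conditional-registration rule and the two-specialist cap can be illustrated with a minimal sketch. It uses plain dictionaries in place of the real LangGraph graph construction, which this page does not show; `build_specialists`, `pick_specialists`, and the stand-in tool body are hypothetical names introduced for illustration only.

```python
from collections.abc import Callable, Mapping

Tool = Callable[..., str]

MAX_SPECIALISTS_PER_TURN = 2  # the supervisor delegates to at most two per turn


def build_specialists(tools_by_name: Mapping[str, list[Tool]]) -> dict[str, list[Tool]]:
    """Keep only the specialists that were given at least one tool (hypothetical helper)."""
    return {name: list(tools) for name, tools in tools_by_name.items() if tools}


def pick_specialists(ranked_intents: list[str], registered: Mapping[str, list[Tool]]) -> list[str]:
    """Choose up to two registered specialists, preserving the ranked order."""
    chosen = [name for name in ranked_intents if name in registered]
    return chosen[:MAX_SPECIALISTS_PER_TURN]


def get_onboarding_status() -> str:
    """Stand-in tool body; the real tool closes over a repository."""
    return "onboarding: 2 of 5 steps complete"


registered = build_specialists({
    "onboarding-guide": [get_onboarding_status],
    "troubleshooter": [],  # no tools supplied, so this specialist is omitted
})
```

Under these assumptions, enabling a specialist is a matter of passing it tools at construction time; the routing code does not change.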

Task lifecycle

Every interaction follows the same lifecycle, whether initiated by the customer or by an event:

pending → processing → (complete | failed)

Conversation — channel-agnostic message container. Tracks initiated_by (customer, agent, system) and optional trigger_event_id for event-driven conversations.
AgentTask — message from API to worker. The API writes tasks to agent_tasks; the worker subscribes via Supabase Realtime and processes them through the LangGraph graph.
ConversationMessage — a single message with role (user, assistant, system, tool), content, channel_origin, and optional tool_calls.
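The lifecycle above is a small state machine. The sketch below shows one way to enforce its transitions; the `AgentTask` fields and the `advance` method are assumptions for illustration and do not reflect the actual agent_tasks schema.

```python
from dataclasses import dataclass
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETE = "complete"
    FAILED = "failed"


# Allowed moves in pending -> processing -> (complete | failed).
ALLOWED = {
    TaskStatus.PENDING: {TaskStatus.PROCESSING},
    TaskStatus.PROCESSING: {TaskStatus.COMPLETE, TaskStatus.FAILED},
    TaskStatus.COMPLETE: set(),
    TaskStatus.FAILED: set(),
}


@dataclass
class AgentTask:
    conversation_id: str  # hypothetical minimal field set
    status: TaskStatus = TaskStatus.PENDING

    def advance(self, new_status: TaskStatus) -> None:
        """Move to new_status, rejecting any transition outside the lifecycle."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new_status.value}")
        self.status = new_status
```

Terminal states have no outgoing transitions, so a completed or failed task cannot be re-processed by accident.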

Event-driven proactive conversations

The agent does not only respond to customer questions. Domain events trigger agent-initiated conversations:

Event	Trigger	Agent behavior
`verdict.issued`	Verdict engine finalizes a scan verdict	Proactive conversation rendering the verdict and next-step recommendations. Fires on every scan regardless of autonomy mode — verdict and `CompositeScore` are always stored and surfaced; in `observe_only` the agent does not propose applying a ChangeSet because none is created.
`approval.required`	Changeset-lifecycle handler raises a ChangeSet for review	Proactive conversation prompting the customer to review the proposed changeset, with explainability + agent performance card attached.
`supervisor.recommendation.issued`	Supervisor surfaces a directional recommendation	Creates or updates a proactive conversation carrying the recommendation narrative with `mode_classification` (ACTIVE / PLATEAU / FRONTIER / NO_DATA).

Event handlers — OnVerdictIssuedHandler, OnApprovalRequiredHandler, and OnSupervisorRecommendationHandler — create or update a system-initiated Conversation, add a context message summarizing the event, and queue an AgentTask for the graph to process. Analysis flows back to the customer through the notification system. The agent is a consumer of these events only; it does not emit them. (The event names refine the broad event categories sketched in ADR-007 under the contract-surface doctrine of ADR-065.)
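As a rough sketch of the handler flow just described (create or update a system-initiated Conversation, add a context message, queue a task), assuming in-memory stores and a hypothetical event payload with `event_id`, `scan_id`, and `verdict` keys. The one-conversation-per-scan keying is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    id: str
    initiated_by: str
    trigger_event_id: Optional[str] = None
    messages: list = field(default_factory=list)  # (role, content) pairs


@dataclass
class OnVerdictIssuedHandler:
    conversations: dict = field(default_factory=dict)  # stand-in for a repository
    task_queue: list = field(default_factory=list)     # conversation ids awaiting the graph

    def handle(self, event: dict) -> Conversation:
        # Create or update: one system-initiated conversation per scan (an assumption).
        conv_id = f"scan-{event['scan_id']}"
        conv = self.conversations.get(conv_id)
        if conv is None:
            conv = Conversation(id=conv_id, initiated_by="system",
                                trigger_event_id=event["event_id"])
            self.conversations[conv_id] = conv
        # Context message summarizing the event for the graph to pick up.
        conv.messages.append(
            ("system", f"Verdict issued for scan {event['scan_id']}: {event['verdict']}")
        )
        # Queue an AgentTask; here only the conversation id is recorded.
        self.task_queue.append(conv.id)
        return conv
```

The handler only consumes the event and queues work; it emits nothing itself, matching the consumer-only rule above.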

Human-in-the-loop approval

Tools are categorized by mutation impact:

Category	Approval required	Examples
`read`	No	List scans, get scores, check health
`suggest`	No	Recommend framework changes, explain verdicts
`mutate`	Yes	Apply changeset, update framework, modify config

Mutation tool calls trigger a LangGraph interrupt. The system creates an AgentApproval record with the proposed action and payload. The customer approves or rejects through the API or Slack interactive buttons, resuming the agent with Command(resume=decision).

AgentApproval is the per-tool-call interrupt mechanism for mid-conversation proposed mutations and is distinct from autonomy-mode changeset approval. Changeset approval (per Optimization Engine, autonomy governance) is an asynchronous gate on a packaged ChangeSet’s promotion; AgentApproval is a synchronous LangGraph interrupt() checkpoint inside a single conversation turn. The two mechanisms share UX intent (consent before action) but have separate code paths and lifecycles, so conflating them is a doctrinal error.

Customers can pre-approve specific action types via WorkspaceAgentAuthorization, bypassing the interrupt for that operation. The pre-authorization surface is a UX parallel to autonomy-mode auto-acceptance — both let trusted action types skip an explicit consent step — but the underlying mechanisms remain distinct.
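The gating decision can be sketched in plain Python, without LangGraph. The `pre_approved_actions` field and the `needs_interrupt` helper are hypothetical; the real WorkspaceAgentAuthorization shape is not given on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToolCategory(Enum):
    READ = "read"
    SUGGEST = "suggest"
    MUTATE = "mutate"


@dataclass
class WorkspaceAgentAuthorization:
    # Action types the customer has pre-approved (field name is an assumption).
    pre_approved_actions: set = field(default_factory=set)


def needs_interrupt(category: ToolCategory, action: str,
                    auth: WorkspaceAgentAuthorization) -> bool:
    """Return True when the tool call must pause for an AgentApproval decision."""
    if category is not ToolCategory.MUTATE:
        return False  # read and suggest tools never require approval
    return action not in auth.pre_approved_actions
```

When `needs_interrupt` returns True, the runtime would raise the LangGraph interrupt and wait for the customer's decision before resuming.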

Tool architecture

Tools use a closed-over dependency injection pattern. Factory functions accept repository protocols and return plain callables:

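from collections.abc import Callable

# The repository protocols and the _make_* helpers are defined elsewhere in the
# codebase; each helper returns a plain callable that closes over its repository.
def make_scan_analyst_tools(
    *,
    scan_repo: ScanRepository,
    eval_result_repo: EvalResultRepository,
    changeset_repo: ChangeSetRepo,
    failure_cluster_repo: FailureClusterRepository,
) -> list[Callable[..., str]]:
    # Keyword-only arguments keep the wiring explicit at the call site.
    return [
        _make_list_scans(scan_repo),
        _make_get_scan_detail(scan_repo, failure_cluster_repo),
        _make_get_scan_scores(eval_result_repo),
        _make_get_changeset_detail(changeset_repo),
    ]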

The tool factory lives in the application layer; LangGraph graph construction lives in infrastructure. Each tool is unit-testable by passing mock repositories to the factory — no LangGraph, database, or LLM required.
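To make the testability claim concrete, here is a self-contained sketch of one closed-over tool exercised against a fake repository. The `list_recent` method, its return shape, and the output format are assumptions; only the factory pattern itself comes from the text above.

```python
from collections.abc import Callable
from typing import Protocol


class ScanRepository(Protocol):
    """Hypothetical protocol; the real repository interface is not shown here."""
    def list_recent(self, workspace_id: str, limit: int) -> list: ...


def _make_list_scans(scan_repo: ScanRepository) -> Callable[..., str]:
    def list_recent_scans(workspace_id: str, limit: int = 5) -> str:
        # The tool closes over scan_repo; callers never see the repository.
        scans = scan_repo.list_recent(workspace_id, limit)
        return "\n".join(f"{s['id']}: {s['verdict']}" for s in scans) or "no scans found"
    return list_recent_scans


class FakeScanRepository:
    """In-memory stand-in used only for the unit test."""
    def list_recent(self, workspace_id: str, limit: int) -> list:
        rows = [{"id": "scan-1", "verdict": "pass"}, {"id": "scan-2", "verdict": "fail"}]
        return rows[:limit]


tool = _make_list_scans(FakeScanRepository())
```

Calling `tool("ws-1")` exercises the full tool body with no graph, database, or model in the loop.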

Memory

The Spectral Agent uses the universal three-tier memory schema (interaction / session / persistent) per ADR-058 D1, parameterized for the Spectral Agent’s scan-domain anchors (per-cycle interaction, per-run session, per-workspace persistent). ADR-058 supersedes the earlier ADR-018 Spectral-specific Cycle / Run / Workspace vocabulary; the three durability tiers are universal across all three Spectral agents and the per-agent parameterization lives in agent memory primitives.

The Spectral Agent does not reach into world-model memory, and the World Agent does not reach into Spectral memory. The only information flow between worlds and platform is the event-driven signal path per ADR-017.

Two distinct memory-write paths feed the Spectral Agent’s interaction-tier (T1) memory, both through the spectral_agent_memory gateway. The gateway is the planned repository surface owned by the Spectral Agent’s memory implementation epic per ADR-058 D15; the protocol-level discipline below is the contract regardless of when the gateway lands.

Scan-pipeline event handler writes. The OnScanCompletedHandler writes T1 entries summarizing scan outcomes (verdict, CompositeScore, ChangeSet shape) — these describe what the scan produced. The OnScanCompletedFeedbackHandler similarly routes feedback signals into T1 / T2 alongside its workspace-scoped FeedbackSignal records.
Agent runtime per-tool-call writes. The observed_tool decorator captures ToolCallMetadata at every tool invocation during conversation; the tool body’s reasoning and per-call workflow meta-state route through the gateway into T1 — these describe what the agent did mid-conversation.

Compounding (interaction → session → persistent) lives in the gateway and runs at cycle-end and run-end scan-event boundaries; the persistent tier is reasoning-shaped (workshop discipline per agent memory primitives), not a cache of canonical content.

Channels & notifications

Conversations are channel-agnostic. ConversationChannelBinding maps a conversation to a channel-specific reference (Slack thread timestamp, web session ID, etc.). Currently implemented adapters:

Adapter	Location	Delivery
In-app	`infrastructure/agent/channels/in_app_adapter.py`	Supabase Realtime
Slack	`infrastructure/agent/channels/slack_adapter.py`	Slack Web API + Events API; threads map to conversations
Email	`infrastructure/agent/channels/email_adapter.py`	Pluggable `EmailSender`, digest aggregation

A single conversation can span multiple channels: it can start in Slack and continue in the web dashboard. The NotificationService orchestrates multi-channel delivery: it persists the in-app baseline, resolves per-workspace preferences, dispatches to each configured adapter, and logs individual failures without blocking other channels.
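The failure-isolation behavior of multi-channel delivery might look like the sketch below. `ChannelAdapter`, `RecordingAdapter`, and `deliver` are hypothetical names, and preference resolution is reduced to a pre-resolved adapter list.

```python
import logging
from typing import Protocol

logger = logging.getLogger("notifications")


class ChannelAdapter(Protocol):
    name: str
    def send(self, conversation_id: str, body: str) -> None: ...


class RecordingAdapter:
    """Test double that records sends and can simulate an outage."""
    def __init__(self, name: str, fail: bool = False) -> None:
        self.name, self.fail, self.sent = name, fail, []

    def send(self, conversation_id: str, body: str) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        self.sent.append((conversation_id, body))


def deliver(conversation_id: str, body: str, in_app: ChannelAdapter,
            preferred: list) -> list:
    """Persist the in-app baseline first, then fan out; return failed channel names."""
    in_app.send(conversation_id, body)
    failed = []
    for adapter in preferred:
        try:
            adapter.send(conversation_id, body)
        except Exception:
            # Log and continue so one broken channel does not block the others.
            logger.exception("delivery failed on %s", adapter.name)
            failed.append(adapter.name)
    return failed
```

A Slack outage in this sketch leaves email and in-app delivery untouched, which is the property the service is described as guaranteeing.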

State management

LangGraph manages conversation state through its built-in checkpointing system:

Checkpointer: LangGraphCheckpointer wraps AsyncPostgresSaver
Schema isolation: Checkpoint tables live in a dedicated langgraph PostgreSQL schema (framework-owned; AsyncPostgresSaver.setup() provisions it; the Supabase CLI migration pipeline does not touch it). Per ADR-043 D7.
Thread mapping: LangGraph thread_id maps directly to conversation_id
Repository gateway: All AsyncPostgresSaver calls flow through the single spectral.platform.agent.CheckpointerGateway per ADR-043 D8.
Same-transaction participation (ADR-043 D9): AsyncPostgresSaver.aput runs on the request-scope connection from spectral_platform.db.request_scope per connection pooling — avoiding torn-write risk between business ops and checkpoint writes.
Lifecycle: Checkpointer initialized once at worker startup; the compiled graph is reused across requests.
Encryption posture: Checkpointer payloads are encrypted at rest via Supabase storage encryption; key rotation, KMS posture, and recovery drill procedures live in docs/runbooks/checkpointer-encryption.md.

The orchestrator is stateless per-request; all conversation state is managed by the checkpointer.
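The thread mapping can be shown with a short sketch. LangGraph reads the checkpoint thread from the `configurable.thread_id` key of the run config; the `run_config` helper name is hypothetical.

```python
def run_config(conversation_id: str) -> dict:
    """Build the per-run config; the conversation id doubles as the LangGraph thread id."""
    return {"configurable": {"thread_id": conversation_id}}


config = run_config("conv-123")
# A compiled graph would then be invoked as graph.ainvoke(inputs, config=config).
# A later call with the same thread_id resumes from the stored checkpoint, which
# is why the orchestrator itself can stay stateless between requests.
```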

Runtime placement (workers)

All three Spectral agent runtimes (Spectral / Ops / World) live in apps/workers per ADR-060 D-runtime. apps/api is thin — auth + AgentTask dispatch via outbox + SSE streaming proxy.

Streaming pattern. apps/workers consumes AgentTask events, runs the LangGraph orchestrator, and streams output over a Supabase Realtime channel keyed by conversation_id. apps/api proxies the Realtime channel to the client as SSE. Two hops (workers → Realtime → API → SSE); the added latency is negligible next to LLM token latency.

AgentTask dispatch via outbox (per ADR-044 D12). platform.agent_tasks carries business state (status, HITL approval linkage, conversation_id, result back-reference, retention). Dispatch flows through core.outbox with event_type='agent.task.dispatched'. apps/workers listens on the channel, reads the payload, pulls the agent_tasks row, and executes it. Approval interrupts use LangGraph interrupt() to suspend the run; the checkpointer persists state; an operator response (HTTP into apps/api) resumes via Command(resume=...).

Framework-layer composition seam. apps/workers is the framework-layer composition seam where tool dependencies are wired via DI per Contract Surfaces. Agent code never imports another context; the workers entrypoint imports both worlds and platform at startup and injects implementations into agent tool factories.

Error handling: LLM-mediated, LangGraph circuit breaker

ADR-060 D2 + D3 specify:

Tool errors flow back to the LLM as tool messages with the four-class ToolError taxonomy (ToolUserError, ToolPolicyError, ToolTransientError, ToolTerminalError) plus a human-readable description.
The LLM decides next action: retry as-is, retry with modified args, surface to operator, abandon.
LangGraph’s orchestrator-level recursion limit (default 25; configurable per agent) caps runaway loops as the circuit breaker.
No agent-layer retry budget. Tool implementations may include single-retry-on-transient-IO as an implementation detail; that is not contract.

See agent tool invocation for the cross-cutting tool contract.
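A minimal sketch of the LLM-mediated error path, assuming the four classes share a hypothetical `ToolError` base with a `kind` label and that tool messages are plain dictionaries. The real message schema is defined by the agent tool invocation contract, not here.

```python
class ToolError(Exception):
    """Base class; `kind` is the label surfaced to the LLM (an assumption)."""
    kind = "error"


class ToolUserError(ToolError):
    kind = "user"


class ToolPolicyError(ToolError):
    kind = "policy"


class ToolTransientError(ToolError):
    kind = "transient"


class ToolTerminalError(ToolError):
    kind = "terminal"


def run_tool(tool, **kwargs) -> dict:
    """Run a tool and always return a tool message; classified errors are not re-raised."""
    try:
        return {"role": "tool", "status": "ok", "content": tool(**kwargs)}
    except ToolError as exc:
        # No retry here: the LLM reads the class and description and picks the next action.
        return {"role": "tool", "status": "error",
                "error_class": exc.kind, "content": str(exc)}
```

The runaway-loop cap sits outside this function: in LangGraph it is the `recursion_limit` key of the run config, which defaults to 25.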

What the Spectral Agent is not

Not a world-model authority. It reads world-model context through the worlds context’s producer-owned contract surfaces (spectral.worlds.contracts.events.* for notification flow, spectral.worlds.contracts.protocols.* for synchronous calls per ADR-065 D2 + D3) but has no authorship surface over rules.
Not a silent actor. Every mutation either passes through the approval ladder or is explicitly pre-authorised per action type.
Not an Operations Agent. Customer-facing only.

World Agent

Internal resident of each world model. Not customer-facing at any tier; accessible to operators through the Operations app as a read-oriented exploration surface. Full specification lives in the World Model System / World Agent page; the section here covers its role relative to the other two agents.

Role

Each world model has exactly one resident World Agent. The World Agent:

Explores domain coverage against signal-stream observations
Proposes rule candidates through the governed Evolution Loop (never mutates directly)
Surfaces coverage gaps and provenance weakness to operators on demand
Maintains discovery continuity across world model versions through version-spanning memory

Tool surface

The World Agent’s tool surface is scoped to its own world. It has unrestricted read access to the world model’s rule corpus, to the three-source EvalSet corpus (per ADR-022), and to its own three-tier memory. It does not have write access to the rule corpus — rule promotion runs through the Evolution Loop’s governed pipeline, not through the agent.

Memory

Universal three-tier lifecycle (interaction / session / persistent) per ADR-058 D1, with World Agent anchors: agent_interaction_id (interaction), agent_session_id (session), and world_id (persistent — durable across world model versions per ADR-058 D2). Each row also carries a typology enum(episodic, semantic, procedural) discriminator per ADR-058 D3 (transient tiers are episodic; the persistent tier holds semantic + procedural about the world’s domain). Schema and behavioral details in agent memory primitives and world-agent.

Memory contains reasoning, exploration history, and discovery observations; it never contains rule content. Rules live in the world model itself; the World Agent has read access to them rather than a memorised copy. The reference-only invariant (ADR-058 D14) is enforced by the body_text trigram-similarity trigger on worlds.world_agent_memory (ADR-058 D8).
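To illustrate what a trigram-similarity check does, the sketch below computes a simplified similarity in the style of PostgreSQL's pg_trgm (shared trigrams divided by all distinct trigrams). It is an illustration only: the actual trigger runs in the database, and the threshold value and function names here are assumptions.

```python
def trigrams(text: str) -> set:
    """Word-level trigrams with two leading spaces and one trailing space of padding."""
    grams = set()
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    for word in cleaned.split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the total number of distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


SIMILARITY_THRESHOLD = 0.6  # hypothetical cut-off; the real trigger's value is not given here


def violates_reference_only(body_text: str, rule_texts: list) -> bool:
    """True when a memory body looks like a copy of any rule text."""
    return any(similarity(body_text, rule) >= SIMILARITY_THRESHOLD for rule in rule_texts)
```

A memory row that refers to a rule by identifier scores low against the rule's text, while a pasted copy of the rule scores near 1.0 and would be rejected.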

Customer-facing posture

Never. The customer-facing prohibition is absolute. Operators access the World Agent through the Operations app; customers do not see its outputs directly. When world-model reasoning needs to reach a customer, it goes through the WorldModelCard (formal artifact) or through the Spectral Agent (which reads world-model context via the worlds context’s producer-owned contract surfaces in spectral.worlds.contracts.* per ADR-065).

What the World Agent is not

Not a customer-facing surface
Not an accessor of Spectral memory at any tier
Not a decision-maker in enshrinement — human sign-off governs promotion
Not the Operations Agent — see the boundary table below

Operations Agent

Net-new in 0.3.0. Lives in apps/operations and is the operator’s task-oriented collaborator for world-model authoring, distillation, publication, and observability. Where the World Agent is read-oriented (understand the world), the Operations Agent is write-oriented (act on the world-authoring surface).

Role

World-model authoring. Drafting rule candidates from Authoritative sources, staging them into the Evolution Loop, reviewing promotion queues.
Distillation. Proposing condensations of redundant rules, flagging rules with drifted provenance chains.
Publication. Generating WorldModelCard drafts, reviewing release notes (per ADR-026), coordinating version publication.
Operational observability. Summarising scan-volume trends, convergence-signal health, customer adoption of world-model-grounded vs customer-directed stimuli.

Memory

Operations-app scoped. Tracks the operator’s in-flight tasks, pending reviews, recent signal digests. Independent from both Spectral and World Agent memory; does not mirror world-model rule content (that remains authoritative in the world model itself, accessed by reference).

Customer-facing posture

Never. Operations-only. Every output of the Operations Agent is internal. When operator work produces a customer-facing artifact (a published WorldModelCard, a release note), the artifact goes out through the platform’s normal publication path — not through the agent’s conversational surface.

What the Operations Agent is not

Not the World Agent. Task-oriented vs read-oriented; see the boundary table below.
Not customer-facing at any tier.
Not a free-form world-model mutator. Every mutation that touches the rule corpus passes through the Evolution Loop’s governed promotion gate.

Boundary & interaction patterns

Operations Agent vs World Agent

Both live at the operator’s seat inside the Operations app. Keeping their roles distinct is what makes the operator interaction coherent rather than disjointed.

	Ops Agent	World Agent
Purpose	Perform operational tasks on the Operations-app tool surface	Explore and reflect on the world model
Interaction shape	Task: curate this rule, promote this candidate, trigger this process	Read: what do you know, where are you uncertain, how confident are you
Write authority?	Yes — on the Operations-app surface	No — proposes through the Evolution Loop, never mutates directly
Scope	Operations-app state (curation queues, promotion actions, observability)	The world model it resides in
Memory	Operations-app operator-state	Three-tier world-model memory

The operator using both is the integration point. The Ops Agent does not delegate reasoning to the World Agent; the World Agent does not take tasks from the Ops Agent. Each is summoned by the operator for its own purpose, and the operator stitches the two conversations together when needed.

The exact routing / handoff / context-sharing pattern between operator, Ops Agent, and World Agent is specified at Operations App — Operations Agent / Interaction pattern. The hybrid default (separate surfaces + a read-oriented ask_world_agent tool on the Ops Agent) preserves operator clarity about do vs explore while letting Ops tasks pull in coverage context when needed.

Why two agents and not one with mode-switching

A single richer agent with mode-switching would conflate the do and explore modes the operator actually exercises separately. The cost of two agents (two memory systems, two tool surfaces, operator cognitive load to summon the right one) is real but bounded; the cost of one agent is that the act-vs-reflect boundary becomes a runtime mode the agent must self-regulate, and the world-model authority claim becomes harder to defend (an agent that can mutate the world model will be asked to skip the gate, even if a mode flag normally prevents it). The two-agent split makes the boundary structural rather than procedural — the World Agent has no write authority on the rule corpus by construction, not by convention. That structural guarantee is load-bearing for the same reason the customer-side platform / world-model-system isolation is load-bearing: the standard’s authority survives because the system that builds it cannot reach into the standard.
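The difference between a structural and a procedural boundary can be sketched with types. In this illustration the World Agent's tool factory receives only a read-capable view, so there is no write method for it to call. All class and function names are hypothetical.

```python
from typing import Protocol


class RuleCorpusReader(Protocol):
    def get_rule(self, rule_id: str) -> str: ...


class InMemoryRuleCorpus:
    """Full corpus with read and write methods; only the governed pipeline holds this."""
    def __init__(self) -> None:
        self._rules = {}

    def get_rule(self, rule_id: str) -> str:
        return self._rules[rule_id]

    def promote(self, rule_id: str, text: str) -> None:
        self._rules[rule_id] = text


class ReadOnlyCorpusView:
    """Exposes reads only; there is no promote method to reach for."""
    def __init__(self, corpus: InMemoryRuleCorpus) -> None:
        self._get = corpus.get_rule  # keep the bound read method, not the corpus

    def get_rule(self, rule_id: str) -> str:
        return self._get(rule_id)


def make_world_agent_tools(reader: RuleCorpusReader) -> list:
    def read_rule(rule_id: str) -> str:
        return reader.get_rule(rule_id)
    return [read_rule]
```

A mode flag would have to be checked on every call; here the write path is simply absent from what the agent is handed.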

Spectral Agent ↔ other agents

No shared state, no direct communication.

The Spectral Agent reads world-model context only through the worlds context’s producer-owned contract surfaces (spectral.worlds.contracts.events.* for notifications, spectral.worlds.contracts.protocols.* for synchronous calls per ADR-065 D2 + D3) and the locally-projected snapshots that materialize from them. Never through world-model internals, never through the World Agent’s memory.
Flow from platform → worlds is event-only, per ADR-017. Events carry observations (platform.failure_cluster.detected, scan.convergence.delta, memory.observation.promoted); they do not carry agent state.
The World Agent consumes signal events as one of its exploration inputs. The Spectral Agent is not in that loop.

What the topology guarantees

Authority isolation. The world model’s authority rests on its rule corpus + evolution governance, not on any agent’s conversational output. If all three agents went down tomorrow, the WorldModelCard would still be authoritative for its published version.
Customer-safety. Customers talk only to the Spectral Agent. Operator reasoning (which can be exploratory, uncertain, and provenance-dependent) never leaks directly to the customer surface.
Operator coherence. An operator always knows which agent they are talking to and why — Ops Agent for do this, World Agent for what do you think about this. The two never bleed.

Code map (Spectral Agent)

Component	Location	Layer
Domain entities	`spectral.platform.domain.agent.models`	Domain
Domain events	`spectral.platform.domain.shared.events`	Domain
Tool factories	`spectral.platform.application.agent.tools`	Application
Notification service	`spectral.platform.application.agent.notifications`	Application
Event handlers	`spectral.platform.application.agent.on_scan_completed` (handler class lives in application; runtime is `apps/workers` per ADR-060)	Application
Authorization service	`spectral.platform.application.agent.authorization`	Application
Email digest service	`spectral.platform.application.agent.email_digest`	Application
LangGraph orchestrator	`spectral.platform.infrastructure.agent.langgraph_orchestrator`	Infrastructure
Checkpointer adapter	`spectral.platform.infrastructure.agent.checkpointer`	Infrastructure
Channel adapters	`spectral.platform.infrastructure.agent.channels`	Infrastructure

World Agent implementation lives in spectral.worlds; Operations Agent runs in apps/workers per ADR-060 (Runtime placement), with the Operations app surface in apps/operations. Each context’s agent-level code is structured the same way (domain / application / infrastructure) but the modules are local to their context.

Next steps

World Model System / World Agent — the full World Agent specification (memory tiers, exploration patterns, operator chat framing)
Operations App — the Operations-app surface that hosts the Operations Agent
Optimization Engine — the scan pipeline the Spectral Agent analyzes
Access Control — roles and scopes that govern agent permissions
