Skip to content
GitHub
Agents

Agent Architecture

Spectral ships three distinct agents at 0.3.0. Each has a different audience, a different authority surface, and a different architectural home. Keeping their roles separate — and their state isolated — is what keeps the world-model authority credible, the optimization pipeline responsive, and the operator surface trustworthy.

Spectral AgentWorld AgentOperations Agent
AudienceCustomersOperations team (internal)Operations team (internal)
Shape of interactionConversational: explains scan verdicts, advises on frameworks, troubleshoots failures, guides onboardingRead-oriented: ask what it knows about the world model, where coverage is thin, what its signal stream has surfacedTask-oriented: curate this rule, promote this candidate, trigger this distillation, drive a publication
Has write authority?Yes — on the customer-facing tool surface (scan analysis, framework guidance), customer-approved at every stepNo — proposes rule candidates through the governed Evolution Loop; never mutates the world model directlyYes — on the Operations app tool surface (authoring, distillation, publication, observability)
Scope of authorityOne workspace at a time; customer-keyedThe world model it resides inOperations-app state (curation queues, publication pipeline, observability surfaces)
Where its code livesspectral.platform — supervisor + 4 specialists (LangGraph)spectral.worlds — one resident agent per world modelapps/operations — operator seat (TanStack Start UI + LangGraph runtime)
Where its runtime livesapps/workers (per ADR-060)apps/workersapps/workers
Memory keyingWorkspace-scoped (per-user conversation isolation lives on the conversation table, not on the memory tiers)World-model-scoped (one memory store per world model)Operator-scoped + per-task scope

The three agents do not share state, do not share memory, and do not share tools. Where they touch the same operator seat (World Agent and Operations Agent are both reachable from the Operations app), the operator is the integration point — there is no hidden routing layer between agents. The World Agent never speaks to a customer; the Spectral Agent never reads world- model authoring state; the Operations Agent never owns world-model authority. Cross-agent information flow is shaped by the event substrate (Event System) and by apps/workers as the framework-layer composition seam — not by direct agent-to-agent calls.

For the doctrine on calls and notifications between contexts, see Contract Surfaces and ADR-065.


The Spectral Agent is the customer’s conversational interface to the optimization platform. It analyzes scan results, troubleshoots failures, advises on evaluation frameworks, and guides new customers through onboarding.

For the decision rationale and alternatives, see ADR-007.

Built on LangGraph with the Deep Agents pattern. A single supervisor receives all customer messages and delegates to specialist subagents based on intent:

Customer message
→ Supervisor (routes by intent)
→ Specialist subagent (focused tools + system prompt)
→ Tool calls (closed-over repository access)
← Specialist findings
← Supervisor synthesizes customer-facing response

The supervisor carries no tools itself. It delegates to at most two specialists per turn, then synthesizes their findings into a single response.

SpecialistWhen usedTools
scan-analystScan outcomes, scores, verdicts, changesets, failure patternslist_recent_scans, get_scan_detail, get_scan_scores, get_changeset_detail
onboarding-guideNew customers, setup questions, trace ingestionget_onboarding_status, get_trace_ingestion_guide
framework-advisorEvaluation framework questions, rubric tuning, objective functionget_framework_config, get_rubrics, get_world_model_context, get_system_card, get_evaluation_framework_provenance
troubleshooterScan failures, configuration issues, data quality, recurring errorsdiagnose_scan_failure, check_workspace_health, get_recent_errors

Each specialist has a focused system prompt. Specialists are conditionally registered: if no tools are provided at construction time, the specialist is omitted from the graph. This supports incremental rollout without code changes.

Every interaction follows the same lifecycle, whether initiated by the customer or by an event:

pending → processing → (complete | failed)
  • Conversation — channel-agnostic message container. Tracks initiated_by (customer, agent, system) and optional trigger_event_id for event-driven conversations.
  • AgentTask — message from API to worker. The API writes tasks to agent_tasks; the worker subscribes via Supabase Realtime and processes them through the LangGraph graph.
  • ConversationMessage — a single message with role (user, assistant, system, tool), content, channel_origin, and optional tool_calls.

The agent does not only respond to customer questions. Domain events trigger agent-initiated conversations:

EventTriggerAgent behavior
verdict.issuedVerdict engine finalizes a scan verdictProactive conversation rendering the verdict and next-step recommendations. Fires on every scan regardless of autonomy mode — verdict and CompositeScore are always stored and surfaced; in observe_only the agent does not propose applying a ChangeSet because none is created.
approval.requiredChangeset-lifecycle handler raises a ChangeSet for reviewProactive conversation prompting the customer to review the proposed changeset, with explainability + agent performance card attached.
supervisor.recommendation.issuedSupervisor surfaces a directional recommendationCreates or updates a proactive conversation carrying the recommendation narrative with mode_classification (ACTIVE / PLATEAU / FRONTIER / NO_DATA).

Event handlers — OnVerdictIssuedHandler, OnApprovalRequiredHandler, and OnSupervisorRecommendationHandler — create or update a system-initiated Conversation, add a context message summarizing the event, and queue an AgentTask for the graph to process. Analysis flows back to the customer through the notification system. The agent is a consumer of these events only; it does not emit them. (The event names refine the broad event categories sketched in ADR-007 under the contract-surface doctrine of ADR-065.)

Tools are categorized by mutation impact:

CategoryApproval requiredExamples
readNoList scans, get scores, check health
suggestNoRecommend framework changes, explain verdicts
mutateYesApply changeset, update framework, modify config

Mutation tool calls trigger a LangGraph interrupt. The system creates an AgentApproval record with the proposed action and payload. The customer approves or rejects through the API or Slack interactive buttons, resuming the agent with Command(resume=decision).

AgentApproval is the per-tool-call interrupt mechanism for mid-conversation proposed mutations and is distinct from autonomy-mode changeset approval. Changeset approval (per optimization engine — autonomy governance) is an asynchronous gate on a packaged ChangeSet’s promotion; AgentApproval is a synchronous LangGraph-interrupt() checkpoint inside a single conversation turn. The two mechanisms share UX intent (consent before action) but have separate code paths and lifecycles — conflating them is a doctrinal error.

Customers can pre-approve specific action types via WorkspaceAgentAuthorization, bypassing the interrupt for that operation. The pre-authorization surface is a UX parallel to autonomy-mode auto-acceptance — both let trusted action types skip an explicit consent step — but the underlying mechanisms remain distinct.

Tools use a closed-over dependency injection pattern. Factory functions accept repository protocols and return plain callables:

def make_scan_analyst_tools(
*,
scan_repo: ScanRepository,
eval_result_repo: EvalResultRepository,
changeset_repo: ChangeSetRepo,
failure_cluster_repo: FailureClusterRepository,
) -> list[Callable[..., str]]:
return [
_make_list_scans(scan_repo),
_make_get_scan_detail(scan_repo, failure_cluster_repo),
_make_get_scan_scores(eval_result_repo),
_make_get_changeset_detail(changeset_repo),
]

The tool factory lives in the application layer; LangGraph graph construction lives in infrastructure. Each tool is unit-testable by passing mock repositories to the factory — no LangGraph, database, or LLM required.

The Spectral Agent uses the universal three-tier memory schema (interaction / session / persistent) per ADR-058 D1, parameterized for the Spectral Agent’s scan-domain anchors (per-cycle interaction, per-run session, per-workspace persistent). ADR-058 supersedes the earlier ADR-018 Spectral-specific Cycle / Run / Workspace vocabulary; the three durability tiers are universal across all three Spectral agents and the per-agent parameterization lives in agent memory primitives.

The Spectral Agent does not reach into world-model memory, and the World Agent does not reach into Spectral memory. The only information flow between worlds and platform is the event-driven signal path per ADR-017.

Two distinct memory-write paths feed the Spectral Agent’s interaction-tier (T1) memory, both through the spectral_agent_memory gateway. The gateway is the will-be repository surface owned by the Spectral Agent’s memory implementation epic per ADR-058 D15; the protocol-level discipline below is the contract regardless of when the gateway lands.

  1. Scan-pipeline event handler writes. The OnScanCompletedHandler writes T1 entries summarizing scan outcomes (verdict, CompositeScore, ChangeSet shape) — these describe what the scan produced. The OnScanCompletedFeedbackHandler similarly routes feedback signals into T1 / T2 alongside its workspace-scoped FeedbackSignal records.
  2. Agent runtime per-tool-call writes. The observed_tool decorator captures ToolCallMetadata at every tool invocation during conversation; the tool body’s reasoning and per-call workflow meta-state route through the gateway into T1 — these describe what the agent did mid-conversation.

Compounding (interaction → session → persistent) lives in the gateway and runs at cycle-end and run-end scan-event boundaries; the persistent tier is reasoning-shaped (workshop discipline per agent memory primitives), not a cache of canonical content.

Conversations are channel-agnostic. ConversationChannelBinding maps a conversation to a channel-specific reference (Slack thread timestamp, web session ID, etc.). Currently implemented adapters:

AdapterLocationDelivery
In-appinfrastructure/agent/channels/in_app_adapter.pySupabase Realtime
Slackinfrastructure/agent/channels/slack_adapter.pySlack Web API + Events API; threads map to conversations
Emailinfrastructure/agent/channels/email_adapter.pyPluggable EmailSender, digest aggregation

A single conversation can span multiple channels — start in Slack, continue in the web dashboard. The NotificationService orchestrates multi-channel delivery: persists in-app baseline, resolves per-workspace preferences, dispatches to each configured adapter, logs individual failures without blocking other channels.

LangGraph manages conversation state through its built-in checkpointing system:

  • Checkpointer: LangGraphCheckpointer wraps AsyncPostgresSaver
  • Schema isolation: Checkpoint tables live in a dedicated langgraph PostgreSQL schema (framework-owned; AsyncPostgresSaver.setup() provisions; the Supabase CLI migration pipeline does not touch it). Per ADR-043 D7.
  • Thread mapping: LangGraph thread_id maps directly to conversation_id
  • Repository gateway: All AsyncPostgresSaver calls flow through the single spectral.platform.agent.CheckpointerGateway per ADR-043 D8.
  • Same-transaction participation (ADR-043 D9): AsyncPostgresSaver.aput runs on the request-scope connection from spectral_platform.db.request_scope per connection pooling — avoiding torn-write risk between business ops and checkpoint writes.
  • Lifecycle: Checkpointer initialized once at worker startup; the compiled graph is reused across requests.
  • Encryption posture: Checkpointer payloads are encrypted at rest via Supabase storage encryption; key rotation, KMS posture, and recovery drill procedures live in docs/runbooks/checkpointer-encryption.md.

The orchestrator is stateless per-request; all conversation state is managed by the checkpointer.

All three Spectral agent runtimes (Spectral / Ops / World) live in apps/workers per ADR-060 D-runtime. apps/api is thin — auth + AgentTask dispatch via outbox + SSE streaming proxy.

Streaming pattern. Workers consumes AgentTask events, runs the LangGraph orchestrator, streams output via Supabase Realtime channel keyed by conversation_id. apps/api proxies the Realtime channel as SSE to the client. Two hops (workers → Realtime → API → SSE); latency is negligible vs LLM token latency.

AgentTask dispatch via outbox (per ADR-044 D12). platform.agent_tasks carries business state (status, HITL approval linkage, conversation_id, result back-reference, retention). Dispatch flows through core.outbox with event_type='agent.task.dispatched'. Workers listens on the channel, reads the payload, pulls the agent_tasks row, executes. Approval interrupts use LangGraph interrupt() to suspend the run; the checkpointer persists state; an operator response (HTTP into apps/api) resumes via Command(resume=...).

Framework-layer composition seam. Workers IS the framework-layer composition seam where tool dependencies wire via DI per Contract Surfaces. Agent code never imports another context; the workers entrypoint imports both worlds and platform at startup and injects implementations into agent tool factories.

Error handling: LLM-mediated, LangGraph circuit breaker

Section titled “Error handling: LLM-mediated, LangGraph circuit breaker”

ADR-060 D2 + D3 specifies:

  • Tool errors flow back to the LLM as tool messages with the four-class ToolError taxonomy (ToolUserError, ToolPolicyError, ToolTransientError, ToolTerminalError) plus a human-readable description.
  • The LLM decides next action: retry as-is, retry with modified args, surface to operator, abandon.
  • LangGraph’s orchestrator-level recursion limit (default 25; configurable per agent) caps runaway loops as the circuit breaker.
  • No agent-layer retry budget. Tool implementations may include single-retry-on-transient-IO as an implementation detail; that is not contract.

See agent tool invocation for the cross-cutting tool contract.

  • Not a world-model authority. It reads world-model context through worlds’s producer-owned contract surfaces (spectral.worlds.contracts.events.* for notification flow, spectral.worlds.contracts.protocols.* for synchronous calls per ADR-065 D2 + D3) but has no authorship surface over rules.
  • Not a silent actor. Every mutation either passes through the approval ladder or is explicitly pre-authorised per action type.
  • Not an Operations Agent. Customer-facing only.

Internal resident of each world model. Not customer-facing at any tier; accessible to operators through the Operations app as a read-oriented exploration surface. Full specification lives in the World Model System / World Agent page; the section here covers its role relative to the other two agents.

Each world model has exactly one resident World Agent. The World Agent:

  • Explores domain coverage against signal-stream observations
  • Proposes rule candidates through the governed Evolution Loop (never mutates directly)
  • Surfaces coverage gaps and provenance weakness to operators on demand
  • Maintains discovery continuity across world model versions through version-spanning memory

The World Agent’s tool surface is scoped to its own world. It has unrestricted read access to the world model’s rule corpus, to the three-source EvalSet corpus (per ADR-022), and to its own three-tier memory. It does not have write access to the rule corpus — rule promotion runs through the Evolution Loop’s governed pipeline, not through the agent.

Universal three-tier lifecycle (interaction / session / persistent) per ADR-058 D1, with World Agent anchors: agent_interaction_id (interaction), agent_session_id (session), and world_id (persistent — durable across world model versions per ADR-058 D2). Each row also carries a typology enum(episodic, semantic, procedural) discriminator per ADR-058 D3 (transient tiers are episodic; the persistent tier holds semantic + procedural about the world’s domain). Schema and behavioral details in agent memory primitives and world-agent.

Memory contains reasoning, exploration history, and discovery observations; it never contains rule content. Rules live in the world model itself; the World Agent has read access to them rather than a memorised copy. The reference-only invariant (ADR-058 D14) is enforced by the body_text trigram-similarity trigger on worlds.world_agent_memory (ADR-058 D8).

Never. The customer-facing prohibition is absolute. Operators access the World Agent through the Operations app; customers do not see its outputs directly. When world-model reasoning needs to reach a customer, it goes through the WorldModelCard (formal artifact) or through the Spectral Agent (which reads world-model context via worlds’s producer-owned contract surfaces in spectral.worlds.contracts.* per ADR-065).

  • Not a customer-facing surface
  • Not an accessor of Spectral memory at any tier
  • Not a decision-maker in enshrinement — human sign-off governs promotion
  • Not the Operations Agent — see the boundary table below

Net-new in 0.3.0. Lives in apps/operations and is the operator’s task-oriented collaborator for world-model authoring, distillation, publication, and observability. Where the World Agent is read-oriented (understand the world), the Operations Agent is write-oriented (act on the world-authoring surface).

  • World-model authoring. Drafting rule candidates from Authoritative sources, staging them into the Evolution Loop, reviewing promotion queues.
  • Distillation. Proposing condensations of redundant rules, flagging rules with drifted provenance chains.
  • Publication. Generating WorldModelCard drafts, reviewing release notes (per ADR-026), coordinating version publication.
  • Operational observability. Summarising scan-volume trends, convergence-signal health, customer adoption of world-model-grounded vs customer-directed stimuli.

Operations-app scoped. Tracks the operator’s in-flight tasks, pending reviews, recent signal digests. Independent from both Spectral and World Agent memory; does not mirror world-model rule content (that remains authoritative in the world model itself, accessed by reference).

Never. Operations-only. Every output of the Operations Agent is internal. When operator work produces a customer-facing artifact (a published WorldModelCard, a release note), the artifact goes out through the platform’s normal publication path — not through the agent’s conversational surface.

  • Not the World Agent. Task-oriented vs read-oriented; see the Boundary table below.
  • Not customer-facing at any tier.
  • Not a free-form world-model mutator. Every mutation that touches the rule corpus passes through the Evolution Loop’s governed promotion gate.

Both live at the operator’s seat inside the Operations app. Keeping their roles distinct is what makes the operator interaction coherent rather than schizophrenic.

Ops AgentWorld Agent
PurposePerform operational tasks on the Operations-app tool surfaceExplore and reflect on the world model
Interaction shapeTask: curate this rule, promote this candidate, trigger this processRead: what do you know, where are you uncertain, how confident are you
Write authority?Yes — on the Operations-app surfaceNo — proposes through the Evolution Loop, never mutates directly
ScopeOperations-app state (curation queues, promotion actions, observability)The world model it resides in
MemoryOperations-app operator-stateThree-tier world-model memory

The operator using both is the integration point. The Ops Agent does not delegate reasoning to the World Agent; the World Agent does not take tasks from the Ops Agent. Each is summoned by the operator for its own purpose, and the operator stitches the two conversations together when needed.

The exact routing / handoff / context-sharing pattern between operator, Ops Agent, and World Agent is specified at Operations App — Operations Agent / Interaction pattern. The hybrid default (separate surfaces + a read-oriented ask_world_agent tool on the Ops Agent) preserves operator clarity about do vs explore while letting Ops tasks pull in coverage context when needed.

Why two agents and not one with mode-switching

Section titled “Why two agents and not one with mode-switching”

A single richer agent with mode-switching would conflate the do and explore modes the operator actually exercises separately. The cost of two agents (two memory systems, two tool surfaces, operator cognitive load to summon the right one) is real but bounded; the cost of one agent is that the act-vs-reflect boundary becomes a runtime mode the agent must self-regulate, and the world-model authority claim becomes harder to defend (an agent that can mutate the world model will be asked to skip the gate, even if a mode flag normally prevents it). The two-agent split makes the boundary structural rather than procedural — the World Agent has no write authority on the rule corpus by construction, not by convention. That structural guarantee is load-bearing for the same reason the customer-side platform / world-model-system isolation is load-bearing: the standard’s authority survives because the system that builds it cannot reach into the standard.

No shared state, no direct communication.

  • Spectral Agent reads world-model context only through worlds’s producer-owned contract surfaces (spectral.worlds.contracts.events.* for notifications, spectral.worlds.contracts.protocols.* for synchronous calls per ADR-065 D2 + D3) — and the locally-projected snapshots that materialize from them. Never through world-model internals, never through the World Agent’s memory.
  • Flow from platform → worlds is event-only, per ADR-017. Events carry observations (platform.failure_cluster.detected, scan.convergence.delta, memory.observation.promoted); they do not carry agent state.
  • The World Agent consumes signal events as one of its exploration inputs. The Spectral Agent is not in that loop.
  • Authority isolation. The world model’s authority rests on its rule corpus + evolution governance, not on any agent’s conversational output. If all three agents went down tomorrow, the WorldModelCard would still be authoritative for its published version.
  • Customer-safety. Customers talk only to the Spectral Agent. Operator reasoning (which can be exploratory, uncertain, and provenance-dependent) never leaks directly to the customer surface.
  • Operator coherence. An operator always knows which agent they are talking to and why — Ops Agent for do this, World Agent for what do you think about this. The two never bleed.

ComponentLocationLayer
Domain entitiesspectral.platform.domain.agent.modelsDomain
Domain eventsspectral.platform.domain.shared.eventsDomain
Tool factoriesspectral.platform.application.agent.toolsApplication
Notification servicespectral.platform.application.agent.notificationsApplication
Event handlersspectral.platform.application.agent.on_scan_completed (handler class lives in application; runtime is apps/workers per ADR-060)Application
Authorization servicespectral.platform.application.agent.authorizationApplication
Email digest servicespectral.platform.application.agent.email_digestApplication
LangGraph orchestratorspectral.platform.infrastructure.agent.langgraph_orchestratorInfrastructure
Checkpointer adapterspectral.platform.infrastructure.agent.checkpointerInfrastructure
Channel adaptersspectral.platform.infrastructure.agent.channelsInfrastructure

World Agent implementation lives in spectral.worlds; Operations Agent runs in apps/workers per ADR-060 (Runtime placement), with the Operations app surface in apps/operations. Each context’s agent-level code is structured the same way (domain / application / infrastructure) but the modules are local to their context.