ADR-060: Agent tool invocation, framework-layer composition, and LLM-mediated error handling
Status: Accepted (2026-04-25)
Context
Spectral runs three LangGraph-driven agents — Spectral Agent (customer-facing scan analysis), World Agent (domain exploration), Operations Agent (operator workflow). Each has its own tool registry: Spectral Agent’s tools live in system-design/agents/agent-architecture.mdx, World Agent’s in world-agent.mdx, Ops Agent’s in operations-agent.mdx (per S9 ground truth in SPEC-266). Per-agent registries already existed; the cross-cutting contract did not. The three runtimes had drifted in small ways — error shapes inconsistent across agents, observability metadata inconsistent, approval payloads ad hoc, and (the load-bearing question) no settled answer to where each agent runtime would actually live or how inter-context tool dependencies would compose.
Three architectural questions were entangled:
- Where do the three agent runtimes live? Spectral Agent had been provisioned in workers (per TA-5 D12 AgentTask + TA-14 checkpointer + TA-19 D1). World Agent and Ops Agent were unspecified. A single answer was needed before tool patterns could be designed.
- How does a tool in one context reach data or behavior in another context? TA-12 D13 had pencilled in framework-layer composition for ask_world_agent; TA-7 D3 + TA-8 D3 had pencilled in SQL grants between contexts for outcome reads + T3 body fetches. Two different mechanisms, no settled default.
- How does an agent recover from tool errors? Default LangGraph and pydantic-ai behavior surfaces tool errors back to the LLM as tool messages; the LLM decides next action. An earlier draft proposed a hand-rolled retry middleware at the agent layer with explicit per-tool budgets. The two paths fight each other.
This ADR resolves the cross-cutting contract for agent tool invocation, including the three architectural questions above.
Decision
The decision is structured as three load-bearing architectural ratifications followed by ten per-mechanism decisions. The ratifications are stated separately because their reach extends beyond TA-15 itself — they bind subsequent dispositions and supersede prior ones.
Architectural ratification — Agent runtime placement: all three agents run in workers
apps/workers hosts the LangGraph orchestrators for all three agents. apps/api becomes thin: authentication, AgentTask dispatch via outbox (per TA-5 D12), and SSE streaming proxy. Workers consumes AgentTask events, loads checkpointer state (per TA-14), runs the orchestrator, executes tool calls, writes memory, and streams output via Supabase Realtime channel keyed by conversation_id; apps/api proxies the Realtime channel as SSE to the client. Approval interrupts use LangGraph’s interrupt() to suspend the run; the checkpointer persists state; an operator response (HTTP into apps/api) resumes via Command(resume=...).
Architectural ratification — Inter-context composition: notifications via events; calls via DI; no SQL grants between contexts
The architectural axis for inter-context mechanism choice is flow shape, not sync vs async function semantics:
- Notification flow (one-way push; producer doesn’t await a result) → typed event payloads in <producer>.contracts.events.* published onto the TA-5 substrate (per ADR-065 D2).
- Call flow (caller dispatches a request and needs the result) → callee-owned OHS Protocol in <callee>.contracts.protocols.* (per ADR-065 D3); impl in callee context’s application layer; bridge tool lives in apps/* per ADR-065 D5 (composes the Protocol into the caller agent’s tool list via DI).
Both flow shapes are implemented with async def Python functions in workers; transport choice is orthogonal to function-definition semantics. No SQL grants between contexts at any layer. This is the agent-tool-invocation projection of ADR-063, where the canonical statement lives.
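The two flow shapes can be contrasted in a minimal, dependency-free sketch. Everything named here (OutcomeRecorded, InMemoryBus, OutcomeReplica, CandidateExplainer, explain_tool) is hypothetical and only illustrates the shape; the in-memory bus is a stand-in for the TA-5 substrate, not its API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Protocol
from uuid import UUID, uuid4


@dataclass(frozen=True)
class OutcomeRecorded:
    """Hypothetical typed event payload owned by the producer context."""
    candidate_id: UUID
    outcome: str


class InMemoryBus:
    """Hypothetical stand-in for the TA-5 event substrate."""

    def __init__(self) -> None:
        self._handlers: list[Callable[[OutcomeRecorded], Awaitable[None]]] = []

    def subscribe(self, handler: Callable[[OutcomeRecorded], Awaitable[None]]) -> None:
        self._handlers.append(handler)

    async def publish(self, event: OutcomeRecorded) -> None:
        # Notification flow: the producer pushes and gets no result back.
        for handler in self._handlers:
            await handler(event)


class OutcomeReplica:
    """Consumer-side local replica, kept current by events instead of SQL grants."""

    def __init__(self) -> None:
        self.outcomes: dict[UUID, str] = {}

    async def on_outcome_recorded(self, event: OutcomeRecorded) -> None:
        self.outcomes[event.candidate_id] = event.outcome


class CandidateExplainer(Protocol):
    """Hypothetical callee-owned Protocol for a call flow."""

    async def explain(self, candidate_id: UUID) -> str: ...


async def explain_tool(explainer: CandidateExplainer, candidate_id: UUID) -> str:
    # Call flow: the caller awaits the callee's result in-process.
    return await explainer.explain(candidate_id)
```

Both halves are plain async def functions; what differs is whether the caller waits for a result, which is the axis the ratification uses.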
Architectural ratification — TA-27 collapses to ratification
Founder-lens challenge: scaffolding exceptions during architectural planning indicates the default isn’t right. TA-7 D3 (worlds_outcomes_reader grant) and TA-8 D3 (worlds_t3_reader grant) are both removed. Inter-context outcome reads + T3 body reads are notification-shaped — the Reader Protocol path was retired by ADR-064 D3 (broadened 2026-04-30) in favor of event-driven local replicas at each consumer context. TA-27 (SPEC-331) lands as ratification of the inter-context composition ratification above rather than fresh disposition. See ADR-063 for the canonical statement.
D1 — Tool envelope is a pattern, not a heavy value object
Tools remain plain async callables produced by closed-over-DI factories (the existing Spectral Agent pattern; extended to all three agents). Cross-cutting metadata is captured at call time via a lightweight ToolCallMetadata pydantic VO emitted by an observed_tool decorator. There is no ToolCallEnvelope wrapper around every tool body.
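A minimal sketch of the closed-over-DI factory shape, assuming a hypothetical ScanRepository dependency and a hypothetical count_findings tool (neither name comes from the agent registries). The point is only that the factory captures the dependency and returns a plain async callable with no wrapper object around the tool body.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class ScanRepository(Protocol):
    """Hypothetical dependency; not part of the ADR."""

    async def count_findings(self, scan_id: str) -> int: ...


def make_count_findings_tool(repo: ScanRepository) -> Callable[[str], Awaitable[str]]:
    """Factory that closes over `repo`; the returned tool is a plain async callable."""

    async def count_findings(scan_id: str) -> str:
        n = await repo.count_findings(scan_id)
        return f"scan {scan_id} has {n} findings"

    return count_findings


class FakeScanRepository:
    """In-memory fake used only to exercise the sketch."""

    async def count_findings(self, scan_id: str) -> int:
        return 3
```

Because the dependency is injected at factory time, a test can swap in a fake without patching imports.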
D2 — Error taxonomy: four classes in spectral.core.tools.errors
- ToolUserError — invalid input from user/operator (bad args, missing context); user-visible
- ToolPolicyError — policy/scope/approval denied; user-visible with framing
- ToolTransientError — infrastructure transient (DB blip, brief LLM provider rate-limit)
- ToolTerminalError — non-recoverable (invariant violation)
The taxonomy’s role is to shape what the LLM sees via the tool message, not to drive a hand-rolled retry dispatcher.
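As an illustration of how the taxonomy could shape the tool message, the sketch below defines the four classes under a ToolError base and renders an error into a small dictionary. The class names come from this ADR; the to_tool_message helper and the dictionary keys are assumptions made for the example, not the landed surface in spectral.core.tools.errors.

```python
class ToolError(Exception):
    """Base class for the four-class taxonomy."""


class ToolUserError(ToolError):
    """Invalid input from user/operator."""


class ToolPolicyError(ToolError):
    """Policy, scope, or approval denied."""


class ToolTransientError(ToolError):
    """Infrastructure transient, e.g. a DB blip."""


class ToolTerminalError(ToolError):
    """Non-recoverable, e.g. an invariant violation."""


def to_tool_message(err: ToolError) -> dict:
    """Hypothetical helper: render an error as the content the LLM would see."""
    return {
        "status": "error",
        "error_class": type(err).__name__,
        "description": str(err),
    }
```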
D3 — Error propagation is LLM-mediated; LangGraph recursion-limit is the circuit breaker
Tool errors flow back to the LLM as tool messages with error class plus human-readable description per D2. The LLM decides next action: retry as-is, retry with modified args, surface to operator, abandon. LangGraph orchestrator-level recursion limit (default 25; configurable per agent) caps runaway loops. There is no agent-layer retry budget — the LLM is the retry decision-maker. Tool implementations may include single-retry-on-transient-IO as an implementation detail (e.g., a DB connection blip); that is not part of the contract.
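The control flow can be simulated without LangGraph. In this sketch a scripted function stands in for the LLM, a step cap stands in for the recursion limit, and StepLimitExceeded, run_agent, scripted_llm, and flaky_lookup are all hypothetical names. It is not how LangGraph implements the limit; it only shows that the loop carries no retry budget of its own.

```python
import asyncio


class ToolTransientError(Exception):
    """Minimal stand-in for the D2 class of the same name."""


class StepLimitExceeded(RuntimeError):
    """Hypothetical; plays the role of the orchestrator's recursion-limit error."""


async def run_agent(decide, tool, max_steps: int = 25) -> str:
    """Alternate between a decision function (the 'LLM') and a tool until done."""
    last_message = None
    for _ in range(max_steps):
        action = decide(last_message)
        if action["type"] == "final":
            return action["text"]
        try:
            last_message = {"ok": True, "content": await tool(**action["args"])}
        except ToolTransientError as err:
            # No retry budget here: the error is just another message for the LLM.
            last_message = {"ok": False, "content": f"[{type(err).__name__}] {err}"}
    raise StepLimitExceeded(f"no final answer after {max_steps} steps")


calls = {"n": 0}


async def flaky_lookup(key: str) -> str:
    """Fails on the first call only, like a brief connection blip."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ToolTransientError("database connection reset")
    return f"value for {key}"


def scripted_llm(last_message):
    """Scripted stand-in for the LLM: call the tool until it succeeds, then answer."""
    if last_message is None or not last_message["ok"]:
        return {"type": "call", "args": {"key": "k1"}}
    return {"type": "final", "text": last_message["content"]}
```

Running run_agent(scripted_llm, flaky_lookup) retries once because the scripted decision function chose to, not because the loop counted attempts; a tool that always fails ends at the step cap instead.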
D4 — Approval via LangGraph interrupt(); standardized ToolApprovalRequest payload
ToolApprovalRequest (pydantic VO in spectral.core.tools.approval) carries:
- tool_name: str
- agent_name: str
- args_summary: str (sanitized; PII-stripped; safe to display to the operator)
- effect_description: str (human-readable description of what will change)
- correlation_id: UUID
Operator response: approve / deny / request-revision. On approve: Command(resume=ApprovalGranted(...)). On deny: tool aborts with ToolPolicyError(reason=APPROVAL_DENIED). On revision: the agent revises the proposed action and re-emits the approval request. All paths audit-logged per TA-16.
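A sketch of the payload and the three response paths. The ADR specifies a pydantic VO; a frozen dataclass is used here only to keep the example dependency-free. OperatorDecision and next_step are hypothetical, and the real suspend/resume goes through LangGraph's interrupt() and Command(resume=...), which this sketch does not model.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID, uuid4


class ToolPolicyError(Exception):
    """Minimal stand-in for the D2 class; `reason` mirrors the ADR's usage."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass(frozen=True)
class ToolApprovalRequest:
    tool_name: str
    agent_name: str
    args_summary: str          # sanitized, PII-stripped
    effect_description: str    # what will change, in plain language
    correlation_id: UUID


class OperatorDecision(Enum):
    """Hypothetical encoding of the three operator responses."""
    APPROVE = "approve"
    DENY = "deny"
    REQUEST_REVISION = "request-revision"


def next_step(request: ToolApprovalRequest, decision: OperatorDecision) -> str:
    """Map an operator decision to what the run does next."""
    if decision is OperatorDecision.APPROVE:
        return f"execute {request.tool_name}"
    if decision is OperatorDecision.REQUEST_REVISION:
        return f"revise {request.tool_name}"
    # Deny: the tool aborts with a policy error, per D4.
    raise ToolPolicyError(reason="APPROVAL_DENIED")
```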
D5 — Per-tool classification ground truth stays in S9 / agent-architecture pages
This ADR specifies the mechanism. SPEC-266 (S9 Ops Agent tool registry), agent-architecture.mdx (Spectral Agent), and world-agent.mdx (World Agent) remain the per-agent tool ground truth. This ADR does not restate the lists.
D6 — ask_world_agent composes via in-process DI through the workers entrypoint
WorldAgentRunner Protocol lives in spectral.worlds.contracts.protocols.world_agent per ADR-065 D3 (callee-owned OHS Protocol; the original spectral.core.tools.protocols placement is superseded). Two methods:
- ask(question: str, *, world_id: UUID) -> str — stateless mode (no session, no memory; per S10)
- chat(message: str, *, session_id: UUID, world_id: UUID) -> str — stateful mode
Impl lives in spectral.worlds.application. Per ADR-065 D5, bridge tools (e.g. an Ops-Agent ask_world_agent callable) live in apps/* framework deliverables, never in caller-context code; the bridge imports WorldAgentRunner (framework-to-context, allowed under validator rule 7) and is composed into the Ops Agent tool list via DI at workers startup. Tool body: await runner.ask(question, world_id=...). OTel trace context flows in-process. No correlation_id, no events, no suspend/resume.
WorldAgentRunner is the reference example; the same shape (callee-owned Protocol in <callee>.contracts.protocols.*; impl in callee context; bridge tool in apps/*) applies to any future inter-context call-shaped tool. Notification-shaped reads (e.g., rule-candidate outcomes, T3 memory body) follow event-driven local replicas instead per ADR-064 D3.
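The reference shape might look like the sketch below. The Protocol's two method signatures are taken from this ADR; make_ask_world_agent_tool and FakeWorldAgentRunner are hypothetical names, and the factory is an assumption about how a bridge tool in apps/* could close over the runner.

```python
import asyncio
from typing import Awaitable, Callable, Protocol
from uuid import UUID, uuid4


class WorldAgentRunner(Protocol):
    """Callee-owned Protocol; signatures as stated in D6."""

    async def ask(self, question: str, *, world_id: UUID) -> str: ...

    async def chat(self, message: str, *, session_id: UUID, world_id: UUID) -> str: ...


def make_ask_world_agent_tool(
    runner: WorldAgentRunner,
) -> Callable[[str, UUID], Awaitable[str]]:
    """Hypothetical bridge-tool factory, composed at workers startup."""

    async def ask_world_agent(question: str, world_id: UUID) -> str:
        # In-process call: no correlation_id, no events, no suspend/resume.
        return await runner.ask(question, world_id=world_id)

    return ask_world_agent


class FakeWorldAgentRunner:
    """Test double that satisfies the Protocol structurally."""

    async def ask(self, question: str, *, world_id: UUID) -> str:
        return f"[{world_id}] answer to: {question}"

    async def chat(self, message: str, *, session_id: UUID, world_id: UUID) -> str:
        return f"[{world_id}/{session_id}] reply to: {message}"
```

The caller only ever sees the Protocol, so the Ops Agent side needs no import from the worlds implementation.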
D7 — DLQ inspection tools added to the Ops Agent surface
Resolves the TA-6 D6 deferral:
- list_dlq_events(handler_name?, age_range?, limit) — read
- get_dlq_event_detail(event_id) — read; returns event payload + sanitized failure_history
- replay_dlq_event(event_id, reason: str) — mutate with call-time approval; calls core.outbox_replay() per TA-6 D5; reason field captured in audit log
Backed by OutboxReader / OutboxReplayer protocols injected at the workers entrypoint.
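A rough sketch of the mutate-with-call-time-approval shape for replay_dlq_event. The OutboxReplayer method name, the request_approval hook, and the list-based audit log are all assumptions for illustration; in the real design the approval step is a LangGraph interrupt and the audit trail goes through TA-16.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class ToolPolicyError(Exception):
    """Minimal stand-in for the D2 class of the same name."""


class OutboxReplayer(Protocol):
    async def replay(self, event_id: str) -> None: ...  # hypothetical method name


def make_replay_dlq_event_tool(
    replayer: OutboxReplayer,
    request_approval: Callable[[str], Awaitable[bool]],  # hypothetical approval hook
    audit_log: list,                                     # stand-in for the audit sink
) -> Callable[[str, str], Awaitable[str]]:
    async def replay_dlq_event(event_id: str, reason: str) -> str:
        approved = await request_approval(f"Replay DLQ event {event_id}")
        # The operator-supplied reason is recorded whether or not approval is granted.
        audit_log.append({"event_id": event_id, "reason": reason, "approved": approved})
        if not approved:
            raise ToolPolicyError("APPROVAL_DENIED")
        await replayer.replay(event_id)
        return f"event {event_id} queued for replay"

    return replay_dlq_event


class FakeReplayer:
    """Records replayed ids so the sketch can be exercised."""

    def __init__(self) -> None:
        self.replayed: list[str] = []

    async def replay(self, event_id: str) -> None:
        self.replayed.append(event_id)
```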
D8 — Cluster triage tools added to the Ops Agent surface; symmetric with D7
Resolves the TA-9 D5 deferral:
- list_failure_clusters(severity?, status?) — read
- get_cluster_detail(cluster_id) — read; returns snapshot + linked failures
- triage_cluster(cluster_id, status: dismissed|escalate|wait, notes) — mutate with call-time approval; updates platform.rule_candidates_pending operator-managed columns per TA-9 D3
Symmetric with D7 under S9’s mutate-with-call-time-approval pattern. Cluster triage is an operational status change (not a governance gate like approve_candidate); both fit the same shape.
D9 — Workshop discipline at the tool → memory boundary is doctrine plus a repository wrapper
Per the workshop framing crystallized in TA-13. Tool outputs containing canonical content (rule body via get_candidate_detail, scan trace via cluster detail, customer PII anywhere) are not round-tripped into memory rows verbatim. The agent uses content in-context for reasoning; the memory-write path stores meta-knowledge (“operator asked about candidate X” / “agent inspected cluster Y”), not content. The repository gateway (TA-13 D11 / TA-12 D11) enforces typology-driven classification; the trigram trigger (TA-12 D8 / TA-13 D4) backstops doctrine drift. There is no separate sanitization decorator on tools — discipline lives in the memory-write path, not the tool surface.
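To make the content versus meta-knowledge distinction concrete, here is a deliberately small sketch. ToolObservation and to_memory_note are hypothetical; the actual enforcement lives in the repository gateway and trigram trigger referenced above, which this example does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolObservation:
    """Hypothetical record of one tool call as seen by the agent."""
    agent_name: str
    tool_name: str
    subject_id: str
    output: str  # canonical content; used in-context only, never written to memory


def to_memory_note(obs: ToolObservation) -> str:
    """Build the meta-knowledge note; the tool output is intentionally not read."""
    return f"{obs.agent_name} called {obs.tool_name} for {obs.subject_id}"
```

The note records that an inspection happened and what it was about, while the rule body or scan trace stays out of the memory row.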
D10 — Helpers landing in spectral.core.tools
- ToolCallMetadata (metadata.py) — pydantic VO: tool_name, agent_name, latency_ms, ok bool, error_class (nullable), trace_id, started_at, ended_at
- ToolError base + four subclasses (errors.py) — D2 taxonomy
- ToolApprovalRequest (approval.py) — D4 payload
- observed_tool decorator — wraps an async tool callable; emits ToolCallMetadata per call to structlog and OTel; integrates with TA-10 LLM cost tracking when the tool body invokes an LLM. Decorator implementation lands with the first consumer (SPEC-242 Spectral Agent integration) per TA-12 / TA-14 precedent — this ADR fixes the contract surface (metadata + approval + error types).
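Since the decorator implementation is deferred, the following is only a sketch of one way it could satisfy the contract. The ToolCallMetadata field names come from this ADR; the dataclass (in place of the pydantic VO), the emit callback (in place of structlog and OTel), and the decorator's parameters are assumptions.

```python
import asyncio
import functools
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCallMetadata:
    tool_name: str
    agent_name: str
    latency_ms: float
    ok: bool
    error_class: Optional[str]
    trace_id: str
    started_at: datetime
    ended_at: datetime


def observed_tool(agent_name: str, emit: Callable[[ToolCallMetadata], None], trace_id: str = "unset"):
    """Wrap an async tool so every call emits one ToolCallMetadata record."""

    def decorate(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            started_at = datetime.now(timezone.utc)
            t0 = time.perf_counter()
            error_class = None
            try:
                return await fn(*args, **kwargs)
            except BaseException as err:
                # Record the class name, then re-raise so the caller still sees the error.
                error_class = type(err).__name__
                raise
            finally:
                emit(ToolCallMetadata(
                    tool_name=fn.__name__,
                    agent_name=agent_name,
                    latency_ms=(time.perf_counter() - t0) * 1000.0,
                    ok=error_class is None,
                    error_class=error_class,
                    trace_id=trace_id,
                    started_at=started_at,
                    ended_at=datetime.now(timezone.utc),
                ))

        return wrapper

    return decorate
```

The decorator observes and re-raises; it never swallows or retries, which keeps it consistent with D3's rule that the LLM decides what to do about an error.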
Alternatives considered
Inter-context tool calls via events (request-reply pattern). Considered for ask_world_agent and similar call-flow tools. Rejected: request-reply over events introduces correlation IDs, timeouts, suspend-resume orchestration, and lost-response handling for what is structurally an in-process function call. DI at the framework-layer seam uses primitives that already exist (closed-over factories), preserves normal stack traces, and incurs no substrate-handler overhead. ADR-063 captures the broader inter-context framing.
HTTP through apps/api for ask_world_agent. Rejected after the agent-runtime-placement ratification: workers IS the framework-layer composition seam for workers-resident tool calls; an HTTP roundtrip would be a needless network hop.
Hand-rolled retry middleware at agent layer with explicit per-tool budget. Rejected. Bypasses LLM judgment, fights LangGraph default behavior, and adds machinery that would not earn its keep. The LLM is already the decision-maker for “what to do about a tool error” — wrapping its judgment in a budget table re-implements what the LLM does naturally.
Asymmetric D7 / D8 (replay-as-tool; triage-as-UI-only). Rejected. Cluster triage is not a governance gate; it is symmetric with DLQ replay under S9’s mutate-with-call-time-approval pattern. Splitting the surface across tool and UI layers would force operators to context-switch between agent chat and an ops dashboard for closely related actions.
Per-tool typed event pair for inter-context tool calls (alternative to a generic ToolInvocationRequested/Answered shape if events-default had survived). Moot under the inter-context composition ratification above (notification flow → events; call flow → DI).
Sanitization decorator on tool functions (D9 alternative). Rejected. Discipline belongs at the memory-write path (where the typology decision is made), not at the tool surface (which legitimately surfaces canonical content for in-context reasoning).
Inter-context SQL grants kept as exception list (TA-7 D3 + TA-8 D3 retained). Rejected after the founder-lens challenge that produced the TA-27 ratification above. ADR-063 captures the canonical reframing.
Consequences
- Single inter-context composition mechanism for tool calls (DI at framework-layer seam) — minimal substrate footprint, standard async/await semantics, normal stack traces, easy testing.
- TA-27 (SPEC-331) collapses to ratification. Inter-context SQL grants don’t ship at any layer. Captured canonically in ADR-063.
- TA-7 D3 + TA-8 D3 grants removed. Reimplementations happen in the consumer epics (SPEC-310 outcome read; SPEC-311 T3 body fetch via DI-injected reader).
- TA-5 D5 partially superseded — notification-flow portion holds; inter-context SQL default does not. ADR-044 (TA-5) carries the supersession status line.
- TA-19 D2 inheritance from TA-5 D5 superseded. ADR-048 (TA-19) reflects that the workers tier composes inter-context dependencies via DI; no shared DB role assumes an inter-context grant.
- spectral.core.tools package landed (commit 5eabc3c): errors.py, metadata.py, approval.py, protocols.py, plus 20 contract tests pinning the surface (test count: 114 → 134).
- apps/api becomes thin. Auth + AgentTask dispatch + SSE streaming proxy. Operationally simpler at the cost of a streaming roundtrip via Supabase Realtime (workers → Realtime → API → SSE). Latency penalty is negligible vs LLM token latency.
- Workers entrypoint is load-bearing. Inter-context composition for all agent-resident call flows lives there. The composition module is more substrate to maintain than a monolithic process, but the context seal is enforced structurally — agent context code never imports another context.
- LLM prompts must handle error tool-messages gracefully — implementation discipline carried into the consumer epics.
- observed_tool decorator implementation deferred to first consumer (SPEC-242) per TA-12 / TA-14 precedent. The contract surface is settled now; the decorator wires TA-16 substrate (structlog + OTel) and TA-10 cost tracking when first integrated.
- Approval audit trail. Every ToolApprovalRequest and operator response is logged through TA-16; approval timing and reason fields support post-hoc review.
References
- ADR-007 — LangGraph agent architecture; closed-over-DI tool factory pattern
- ADR-065 — spectral.core admission discipline (the new tool surface ships under core)
- ADR-031 — single-library + app-as-framework-layer-leaves; framework-layer composition
- ADR-043 — TA-14 LangGraph checkpointer (approval interrupts depend on checkpointer behavior)
- ADR-044 — TA-5 event substrate (carries the D5 supersession from this ADR)
- ADR-048 — TA-19 deployment topology; workers tier
- ADR-058 — TA-12 World Agent memory + agent-memory-primitives
- ADR-059 — TA-13 Ops Agent memory
- ADR-063 — canonical inter-context access pattern
- TA-15 disposition — SPEC-318 comment 66b07620
- TA-15 verification — SPEC-318 comment c6868aa3
- src/spectral/core/tools/ — landed contract surface
- tests/core/test_contract_tools.py — 20 contract tests
- Codex system-design/agents/agent-tool-invocation.mdx — declarative pattern documentation
- Codex system-design/agents/agent-architecture.mdx — runtime placement + streaming pattern updates