ADR-060: Agent tool invocation, framework-layer composition, and LLM-mediated error handling
Context
Spectral’s agent runtimes are LangGraph-driven. When this contract was settled the agent tool registries had drifted — error shapes inconsistent, observability metadata inconsistent, approval payloads ad hoc — and there was no cross-cutting tool-invocation contract, nor a settled answer to where an agent runtime lives or how inter-context tool dependencies compose. The surviving topology is a single agent — the World Agent (ADR-101) — but the cross-cutting contract below governs it and is the reference shape for any future agent.
Three architectural questions were entangled:
- Where does the agent runtime live? The agent runtime runs in the workers entrypoint (per ADR-109 D3), off the synchronous /api/decide path. A single answer was needed before tool patterns could be designed.
- How does a tool in one context reach data or behavior in another context? Two mechanisms had been pencilled in (framework-layer composition for ask_world_agent vs SQL grants between contexts for outcome reads + memory body fetches), with no settled default.
- How does an agent recover from tool errors? Default LangGraph behavior surfaces tool errors back to the LLM as tool messages; the LLM decides next action. An earlier draft proposed a hand-rolled retry middleware at the agent layer with explicit per-tool budgets. The two paths fight each other.
This ADR resolves the cross-cutting contract for agent tool invocation, including the three architectural questions above.
Decision
The decision is structured as three load-bearing architectural ratifications followed by ten per-mechanism decisions. The ratifications are stated separately because their reach extends beyond TA-15 itself — they bind subsequent dispositions and supersede prior ones.
Architectural ratification — Agent runtime placement: the agent runs in the workers entrypoint
The workers entrypoint hosts the LangGraph orchestrator (ADR-109 D3). The API entrypoint stays thin: authentication, AgentTask dispatch via the outbox (ADR-044), and SSE streaming proxy. Workers consumes AgentTask events, loads checkpointer state (ADR-043), runs the orchestrator, executes tool calls, writes memory, and streams output via a Supabase Realtime channel keyed by conversation_id; the API proxies the Realtime channel as SSE to the client. Approval interrupts use LangGraph’s interrupt() to suspend the run; the checkpointer persists state; an operator response (HTTP into the API) resumes via Command(resume=...).
Architectural ratification — Inter-context composition: notifications via events; calls via DI; no SQL grants between contexts
The architectural axis for inter-context mechanism choice is flow shape, not sync vs async function semantics:
- Notification flow (one-way push; producer doesn’t await a result) → typed event payloads in <producer>.contracts.events.* published onto the TA-5 substrate (per ADR-065 D2).
- Call flow (caller dispatches a request and needs the result) → callee-owned OHS Protocol in <callee>.contracts.protocols.* (per ADR-065 D3); impl in callee context’s application layer; bridge tool lives in apps/* per ADR-065 D5 (composes the Protocol into the caller agent’s tool list via DI).
Both flow shapes are implemented with async def Python functions in workers; transport choice is orthogonal to function-definition semantics. No SQL grants between contexts at any layer. This is the agent-tool-invocation projection of ADR-063, where the canonical statement lives.
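The notification-flow half of this ratification can be sketched as follows. This is a minimal illustration using only the standard library, not the shipped substrate: OutcomeRecorded and LocalOutcomeReplica are hypothetical names, and a frozen dataclass stands in for whatever typed payload the producer context actually publishes.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class OutcomeRecorded:
    """Hypothetical typed event payload (would live in <producer>.contracts.events.*)."""

    candidate_id: UUID
    outcome: str


class LocalOutcomeReplica:
    """Hypothetical consumer-side replica, updated only by incoming events."""

    def __init__(self) -> None:
        self._outcomes: dict[UUID, str] = {}

    async def handle(self, event: OutcomeRecorded) -> None:
        # One-way push: nothing is returned to the producer.
        self._outcomes[event.candidate_id] = event.outcome

    def get(self, candidate_id: UUID) -> Optional[str]:
        # Reads stay local: no call into the producer context and no SQL grant.
        return self._outcomes.get(candidate_id)


async def main() -> tuple[Optional[str], Optional[str]]:
    replica = LocalOutcomeReplica()
    known, unknown = uuid4(), uuid4()
    await replica.handle(OutcomeRecorded(candidate_id=known, outcome="accepted"))
    return replica.get(known), replica.get(unknown)


seen, missing = asyncio.run(main())
```

The handler is an async def function like any call-flow function; what makes it notification-shaped is that the producer never awaits its result.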
Architectural ratification — TA-27 collapses to ratification
Founder-lens challenge: scaffolding exceptions during architectural planning indicates the default isn’t right. TA-7 D3 (worlds_outcomes_reader grant) and TA-8 D3 (worlds_t3_reader grant) are both removed. Inter-context outcome reads + T3 body reads are notification-shaped — the Reader Protocol path was retired by ADR-064 D3 (broadened 2026-04-30) in favor of event-driven local replicas at each consumer context. TA-27 (SPEC-331) lands as ratification of the inter-context composition ratification above rather than fresh disposition. See ADR-063 for the canonical statement.
D1 — Tool envelope is a pattern, not a heavy value object
Tools remain plain async callables produced by closed-over-DI factories. Cross-cutting metadata is captured at call time via a lightweight ToolCallMetadata pydantic VO emitted by an observed_tool decorator. There is no ToolCallEnvelope wrapper around every tool body.
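A minimal sketch of the closed-over-DI factory shape, using only the standard library. WorldRepository, InMemoryWorldRepository, and make_get_world_name are hypothetical names chosen for illustration; only the pattern (a factory that closes over its dependencies and returns a plain async callable) comes from this ADR.

```python
import asyncio
from typing import Awaitable, Callable, Protocol
from uuid import UUID, uuid4


class WorldRepository(Protocol):
    """Hypothetical dependency; stands in for whatever a real tool needs."""

    async def get_name(self, world_id: UUID) -> str: ...


class InMemoryWorldRepository:
    """Hypothetical in-memory implementation used only for this sketch."""

    def __init__(self, names: dict[UUID, str]) -> None:
        self._names = names

    async def get_name(self, world_id: UUID) -> str:
        return self._names[world_id]


def make_get_world_name(repo: WorldRepository) -> Callable[[UUID], Awaitable[str]]:
    """Factory: closes over `repo` and returns a plain async callable."""

    async def get_world_name(world_id: UUID) -> str:
        # The tool body sees only its arguments and the closed-over dependency.
        return await repo.get_name(world_id)

    return get_world_name


world_id = uuid4()
tool = make_get_world_name(InMemoryWorldRepository({world_id: "demo-world"}))
result = asyncio.run(tool(world_id))
```

Because the returned object is an ordinary coroutine function, a test can swap the repository for a fake without any envelope or wrapper type in between.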
D2 — Error taxonomy: four classes in spectral.core.tools.errors
- ToolUserError: invalid input from user/operator (bad args, missing context); user-visible
- ToolPolicyError: policy/scope/approval denied; user-visible with framing
- ToolTransientError: infrastructure transient (DB blip, brief LLM provider rate-limit)
- ToolTerminalError: non-recoverable (invariant violation)
The taxonomy’s role is to shape what the LLM sees via the tool message, not to drive a hand-rolled retry dispatcher.
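One way the four classes could be laid out is sketched below. The class names come from D2; the user_visible attribute, the description constructor argument, and the render_tool_message helper are assumptions of this sketch rather than the landed errors.py surface. The ADR marks only the first two classes as user-visible, so the other two default to False here.

```python
class ToolError(Exception):
    """Base for the four D2 classes."""

    # Assumed attribute for this sketch: may the description be shown to a user?
    user_visible: bool = False

    def __init__(self, description: str) -> None:
        super().__init__(description)
        self.description = description


class ToolUserError(ToolError):
    """Invalid input from user/operator (bad args, missing context)."""

    user_visible = True


class ToolPolicyError(ToolError):
    """Policy/scope/approval denied."""

    user_visible = True


class ToolTransientError(ToolError):
    """Infrastructure transient (DB blip, brief provider rate-limit)."""


class ToolTerminalError(ToolError):
    """Non-recoverable (invariant violation)."""


def render_tool_message(error: ToolError) -> str:
    """Hypothetical helper: error class plus human-readable description."""
    return f"{type(error).__name__}: {error.description}"


message = render_tool_message(ToolTransientError("database connection reset"))
```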
D3 — Error propagation is LLM-mediated; LangGraph recursion-limit is the circuit breaker
Tool errors flow back to the LLM as tool messages with error class plus human-readable description per D2. The LLM decides next action: retry as-is, retry with modified args, surface to operator, abandon. LangGraph orchestrator-level recursion limit (default 25; configurable per agent) caps runaway loops. There is no agent-layer retry budget — the LLM is the retry decision-maker. Tool implementations may include single-retry-on-transient-IO as an implementation detail (e.g., a DB connection blip); that is not contract.
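The propagation rule can be illustrated with a toy loop that does not use LangGraph at all. Everything here is an assumption made for illustration: call_tool_for_llm, make_flaky_tool, and run_agent are hypothetical, the "LLM" is reduced to a fixed policy of retrying while the last message is an error, and the sentinel appended at the cap is a simplification of how a real orchestrator reports hitting its limit.

```python
import asyncio
from typing import Awaitable, Callable


class ToolError(Exception):
    """Stand-in for the D2 base class."""


class ToolTransientError(ToolError):
    """Stand-in for the D2 transient class."""


async def call_tool_for_llm(tool: Callable[[], Awaitable[str]]) -> str:
    """Run a tool; on a ToolError, return the text the LLM would see instead of raising."""
    try:
        return await tool()
    except ToolError as exc:
        return f"ERROR {type(exc).__name__}: {exc}"


def make_flaky_tool(failures: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical tool that fails `failures` times, then succeeds."""
    remaining = failures

    async def flaky() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise ToolTransientError("database connection reset")
        return "42 rows"

    return flaky


async def run_agent(
    tool: Callable[[], Awaitable[str]], recursion_limit: int = 25
) -> list[str]:
    """Toy loop: retry while the message is an error; the step cap is the only breaker."""
    transcript: list[str] = []
    for _ in range(recursion_limit):
        message = await call_tool_for_llm(tool)
        transcript.append(message)
        if not message.startswith("ERROR"):
            return transcript
    transcript.append("STOPPED: recursion limit reached")
    return transcript


recovered = asyncio.run(run_agent(make_flaky_tool(failures=2)))
capped = asyncio.run(run_agent(make_flaky_tool(failures=10), recursion_limit=3))
```

Note that no per-tool retry budget appears anywhere: the only bound on the loop is the orchestrator-level step cap.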
D4 — Approval via LangGraph interrupt(); standardized ToolApprovalRequest payload
ToolApprovalRequest (pydantic VO in spectral.core.tools.approval) carries:
- tool_name: str
- agent_name: str
- args_summary: str (sanitized; PII-stripped; safe to display to the operator)
- effect_description: str (human-readable description of what will change)
- correlation_id: UUID
Operator response: approve / deny / request-revision. On approve: Command(resume=ApprovalGranted(...)). On deny: tool aborts with ToolPolicyError(reason=APPROVAL_DENIED). On revision: the agent revises the proposed action and re-emits the approval request. All paths audit-logged per TA-16.
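The payload and the three operator outcomes can be sketched without LangGraph as below. The field names follow D4, but a frozen dataclass stands in for the pydantic VO, and OperatorDecision, resolve, and the string return values are hypothetical; in the real runtime the approve path resumes the suspended run rather than returning a string.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ToolApprovalRequest:
    """Field names follow D4; a frozen dataclass stands in for the pydantic VO."""

    tool_name: str
    agent_name: str
    args_summary: str
    effect_description: str
    correlation_id: UUID


class OperatorDecision(Enum):
    """Hypothetical enum for the three operator responses."""

    APPROVE = "approve"
    DENY = "deny"
    REQUEST_REVISION = "request-revision"


class ToolPolicyError(Exception):
    """Stand-in for the D2 class; carries the denial reason."""

    def __init__(self, *, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def resolve(request: ToolApprovalRequest, decision: OperatorDecision) -> str:
    """Map an operator decision to the next step named in D4."""
    if decision is OperatorDecision.APPROVE:
        return f"resume:{request.correlation_id}"
    if decision is OperatorDecision.DENY:
        raise ToolPolicyError(reason="APPROVAL_DENIED")
    # Revision: the agent revises the action and re-emits the approval request.
    return "revise-and-re-emit"


request = ToolApprovalRequest(
    tool_name="example_mutating_tool",  # hypothetical tool name
    agent_name="world_agent",
    args_summary="target=<redacted>",
    effect_description="Updates one record",
    correlation_id=uuid4(),
)
approved = resolve(request, OperatorDecision.APPROVE)
```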
D5 — Per-tool classification ground truth stays in the agent’s Codex pages
This ADR specifies the mechanism. The World Agent’s Codex pages remain the tool-registry ground truth (the per-tool classification — read vs mutate, approval requirements). This ADR does not restate the lists.
D6 — Inter-context call-shaped tools compose via in-process DI through the workers entrypoint
WorldAgentRunner Protocol lives in spectral.worlds.contracts.protocols.world_agent per ADR-065 D3 (callee-owned OHS Protocol). Two methods:
- Stateless mode (no session, no memory; per S10): ask(question: str, *, world_id: UUID) -> str
- Stateful mode: chat(message: str, *, session_id: UUID, world_id: UUID) -> str
Impl lives in spectral.worlds.application. Per ADR-065 D5, bridge tools live in apps/* framework deliverables, never in caller-context code; a bridge imports WorldAgentRunner (framework-to-context, allowed under the validator’s app-context-surface rule) and is composed into a caller agent’s tool list via DI at workers startup. Tool body: await runner.ask(question, world_id=...). OTel trace context flows in-process. No correlation_id, no events, no suspend/resume.
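The Protocol-plus-bridge shape can be sketched as follows. The two method signatures are taken from D6; EchoWorldAgentRunner and make_ask_world_agent are hypothetical, and binding world_id in the factory (rather than passing it per call) is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Protocol
from uuid import UUID, uuid4


class WorldAgentRunner(Protocol):
    """Callee-owned Protocol; method signatures follow D6."""

    async def ask(self, question: str, *, world_id: UUID) -> str: ...

    async def chat(self, message: str, *, session_id: UUID, world_id: UUID) -> str: ...


class EchoWorldAgentRunner:
    """Hypothetical stand-in for the impl in spectral.worlds.application."""

    async def ask(self, question: str, *, world_id: UUID) -> str:
        return f"[{world_id}] answer to: {question}"

    async def chat(self, message: str, *, session_id: UUID, world_id: UUID) -> str:
        return f"[{world_id}/{session_id}] reply to: {message}"


def make_ask_world_agent(
    runner: WorldAgentRunner, world_id: UUID
) -> Callable[[str], Awaitable[str]]:
    """Bridge-tool factory (hypothetical name); would live in apps/*, not in a context."""

    async def ask_world_agent(question: str) -> str:
        # Plain in-process await: no correlation_id, no events, no suspend/resume.
        return await runner.ask(question, world_id=world_id)

    return ask_world_agent


world_id = uuid4()
ask_world_agent = make_ask_world_agent(EchoWorldAgentRunner(), world_id)
answer = asyncio.run(ask_world_agent("Which rules are active?"))
```

The caller agent only ever sees ask_world_agent in its tool list; it never imports the callee context, which is what keeps the context seal structural.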
The World Agent is the only agent (per ADR-101), serving operators directly, so no caller agent issues ask_world_agent today. The bridge-tool mechanism and the WorldAgentRunner Protocol stand as the reference shape for any future inter-context call-shaped tool — callee-owned Protocol in <callee>.contracts.protocols.*, impl in callee context, bridge tool in apps/* — with no alpha consumer. Notification-shaped reads (e.g., rule-candidate outcomes, T3 memory body) follow event-driven local replicas instead per ADR-064 D3.
D7 — DLQ inspection and replay is an operator capability, not an agent tool
DLQ inspection and replay is an operator capability — operator endpoints / Supabase Studio (per ADR-054 D6) — not an agent tool surface. This resolves the TA-6 D6 deferral. The capability covers:
- list DLQ events (filter by handler_name, age_range; limit): read
- DLQ event detail (event_id): read; returns event payload + sanitized failure_history
- replay a DLQ event (event_id, reason: str): mutate; calls core.outbox_replay() per TA-6 D5; reason captured in audit log
The core.outbox_replay() substrate and the OutboxReader / OutboxReplayer protocols stand; they back the operator capability rather than an agent tool surface.
D8 — Cluster triage is not an alpha capability
Cluster triage does not exist at alpha. Its signal source — the failure-cluster stream — retired with the scan pipeline (ADR-074), and there is no separate operator-agent surface to host it (the World Agent is the only agent per ADR-101, serving operators directly). This closes the TA-9 D5 deferral with no surviving mechanism: the list_failure_clusters / get_cluster_detail / triage_cluster operations and the platform.rule_candidates_pending operator-triage columns they fronted are gone with the pipeline.
D9 — Workshop discipline at the tool → memory boundary is doctrine plus a repository wrapper
Per the workshop framing crystallized in TA-13, tool outputs containing canonical content (rule body via get_candidate_detail, scan trace via cluster detail, customer PII anywhere) are not round-tripped into memory rows verbatim. The agent uses content in-context for reasoning; the memory-write path stores meta-knowledge (“operator asked about candidate X” / “agent inspected cluster Y”), not content. The repository gateway (TA-13 D11 / TA-12 D11) enforces typology-driven classification; the trigram trigger (TA-12 D8 / TA-13 D4) backstops doctrine drift. There is no separate sanitization decorator on tools: discipline lives in the memory-write path, not the tool surface.
D10 — Helpers landing in spectral.core.tools
- ToolCallMetadata (metadata.py): pydantic VO with tool_name, agent_name, latency_ms, ok (bool), error_class (nullable), trace_id, started_at, ended_at
- ToolError base + four subclasses (errors.py): D2 taxonomy
- ToolApprovalRequest (approval.py): D4 payload
- observed_tool decorator: wraps an async tool callable; emits ToolCallMetadata per call to structlog and OTel; integrates with LLM cost tracking when the tool body invokes an LLM. Decorator implementation lands with the first consumer; this ADR fixes the contract surface (metadata + approval + error types).
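Since the decorator implementation is deferred, the following is only a sketch of what an observed_tool wrapper could look like under stated assumptions. The ToolCallMetadata field names follow D10, but a frozen dataclass stands in for the pydantic VO, latency_ms is assumed to be a float, the EMITTED list is a hypothetical sink standing in for structlog and OTel, and passing agent_name and trace_id as decorator arguments is an assumption rather than the settled signature.

```python
import asyncio
import functools
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable, Optional


@dataclass(frozen=True)
class ToolCallMetadata:
    """Field names follow D10; a frozen dataclass stands in for the pydantic VO."""

    tool_name: str
    agent_name: str
    latency_ms: float
    ok: bool
    error_class: Optional[str]
    trace_id: str
    started_at: datetime
    ended_at: datetime


EMITTED: list[ToolCallMetadata] = []  # hypothetical sink; stands in for structlog + OTel


def observed_tool(*, agent_name: str, trace_id: str):
    """Decorator factory: records one ToolCallMetadata per call, success or failure."""

    def decorate(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            started_at = datetime.now(timezone.utc)
            t0 = time.perf_counter()
            error_class: Optional[str] = None
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                error_class = type(exc).__name__
                raise  # the error still propagates; the decorator only observes
            finally:
                EMITTED.append(ToolCallMetadata(
                    tool_name=fn.__name__,
                    agent_name=agent_name,
                    latency_ms=(time.perf_counter() - t0) * 1000.0,
                    ok=error_class is None,
                    error_class=error_class,
                    trace_id=trace_id,
                    started_at=started_at,
                    ended_at=datetime.now(timezone.utc),
                ))

        return wrapper

    return decorate


@observed_tool(agent_name="world_agent", trace_id="trace-demo")
async def lookup(key: str) -> str:
    """Hypothetical tool body used to exercise the decorator."""
    if not key:
        raise ValueError("empty key")
    return key.upper()


ok_result = asyncio.run(lookup("abc"))
try:
    asyncio.run(lookup(""))
except ValueError:
    pass
```

Emitting in the finally branch means a failed call is recorded with ok=False and its error class, which is the information D3 relies on when the error is later surfaced to the LLM.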
Alternatives considered
Inter-context tool calls via events (request-reply pattern). Considered for ask_world_agent and similar call-flow tools. Rejected: request-reply over events introduces correlation IDs, timeouts, suspend-resume orchestration, and lost-response handling for what is structurally an in-process function call. DI at the framework-layer seam uses primitives that already exist (closed-over factories), preserves normal stack traces, and incurs no substrate-handler overhead. ADR-063 captures the broader inter-context framing.
HTTP through apps/api for ask_world_agent. Rejected after the agent-runtime-placement ratification: workers IS the framework-layer composition seam for workers-resident tool calls; an HTTP roundtrip would be a needless network hop.
Hand-rolled retry middleware at agent layer with explicit per-tool budget. Rejected. Bypasses LLM judgment, fights LangGraph default behavior, and adds machinery that would not earn its keep. The LLM is already the decision-maker for “what to do about a tool error” — wrapping its judgment in a budget table re-implements what the LLM does naturally.
DLQ replay and cluster triage as agent tools. An earlier framing weighed exposing both as agent-tool surfaces with call-time approval. Moot under the current topology: DLQ replay is an operator capability (D7) and cluster triage retired with the scan pipeline (D8) — neither is an agent tool.
Per-tool typed event pair for inter-context tool calls (alternative to a generic ToolInvocationRequested/Answered shape if events-default had survived). Moot under the inter-context composition ratification above (notification flow → events; call flow → DI).
Sanitization decorator on tool functions (D9 alternative). Rejected. Discipline belongs at the memory-write path (where the typology decision is made), not at the tool surface (which legitimately surfaces canonical content for in-context reasoning).
Inter-context SQL grants kept as exception list (TA-7 D3 + TA-8 D3 retained). Rejected after the founder-lens challenge that produced the TA-27 ratification above. ADR-063 captures the canonical reframing.
Consequences
- Single inter-context composition mechanism for tool calls (DI at framework-layer seam) — minimal substrate footprint, standard async/await semantics, normal stack traces, easy testing.
- TA-27 (SPEC-331) collapses to ratification. Inter-context SQL grants don’t ship at any layer. Captured canonically in ADR-063.
- TA-7 D3 + TA-8 D3 grants removed. Reimplementations happen in the consumer epics (SPEC-310 outcome read; SPEC-311 T3 body fetch via DI-injected reader).
- Inter-context SQL grants don’t ship — the notification-flow portion of the event substrate holds; the inter-context-SQL default does not (canonically ADR-063).
- The workers tier composes inter-context dependencies via DI (ADR-109); no shared DB role assumes an inter-context grant.
- spectral.core.tools package landed (commit 5eabc3c): errors.py, metadata.py, approval.py, protocols.py, plus 20 contract tests pinning the surface (test count: 114 → 134).
- apps/api becomes thin. Auth + AgentTask dispatch + SSE streaming proxy. Operationally simpler at the cost of a streaming roundtrip via Supabase Realtime (workers → Realtime → API → SSE). Latency penalty is negligible vs LLM token latency.
- Workers entrypoint is load-bearing. Inter-context composition for all agent-resident call flows lives there. The composition module is more substrate to maintain than a monolithic process, but the context seal is enforced structurally: agent context code never imports another context.
- LLM prompts must handle error tool-messages gracefully — implementation discipline carried into the consumer epics.
- observed_tool decorator implementation deferred to first consumer (SPEC-242) per TA-12 / TA-14 precedent. The contract surface is settled now; the decorator wires TA-16 substrate (structlog + OTel) and TA-10 cost tracking when first integrated.
- Approval audit trail. Every ToolApprovalRequest and operator response is logged through TA-16; approval timing and reason fields support post-hoc review.
References
- ADR-007: LangGraph agent architecture; closed-over-DI tool factory pattern
- ADR-065: spectral.core admission discipline (the new tool surface ships under core)
- ADR-031: single-library + app-as-framework-layer-leaves; framework-layer composition
- ADR-043: TA-14 LangGraph checkpointer (approval interrupts depend on checkpointer behavior)
- ADR-044: event substrate
- ADR-109: TA-19 deployment topology; workers tier
- ADR-058: TA-12 World Agent memory + agent-memory-primitives
- ADR-063: canonical inter-context access pattern
- TA-15 disposition: SPEC-318 comment 66b07620
- TA-15 verification: SPEC-318 comment c6868aa3
- src/spectral/core/tools/: landed contract surface
- tests/core/test_contract_tools.py: 20 contract tests
- Codex system-design/agents/agent-tool-invocation.mdx: declarative pattern documentation
- Codex system-design/agents/agent-architecture.mdx: runtime placement + streaming pattern updates