ADR-102: LLM execution stack — LangGraph orchestration + LangChain structured-step; DSPy deferred post-alpha

Context

The agent-stack audit (2026-06-04; planning/0.3.0/agent-stack-audit/findings.md) established that the DSPy integration never worked on the production path:

  • The control-plane↔DSPy shim (ControlPlaneLM) was built inverted: it overrode the DSPy LM call and delegated down to a core.llm.LLMProvider.complete that has no sanctioned concrete implementation (the SDK lint forbids the only way to build one), hardcoding Usage(0,0,0) and discarding token counts.
  • configure_world_agent_dspy was never called in a booted process.
  • Every DSPy Module was a bare dspy.Predict.
  • The SPEC-4 “code adapter” was a no-op dspy.ChatAdapter().

DSPy’s genuine differentiator, its Optimizers (BootstrapFewShot / MIPRO), was always deferred post-alpha. So at alpha scope DSPy delivered no realized capability that the LangGraph ecosystem does not already provide, while carrying framework-maturity and supply-chain surface.

Co-design (recorded in planning/0.3.0/agent-stack-audit/replan.md) chose to drop DSPy for alpha and adopt the LangGraph-native structured-step seam. The change is bounded: LangGraph orchestration (ADR-007) is unchanged; the node-level executor swaps.

Decision

D1 — LangGraph orchestrates; LangChain is the per-node structured executor

LangGraph (ADR-007, kept) remains the orchestration substrate — graph, state, checkpointer, tools. The per-node LLM execution is LangChain: model.with_structured_output(PydanticSchema) for structured generation (codegen, test-method generation, restatement, distillation extraction, proposal synthesis), and provider-native tool-calling (bind_tools / langgraph.prebuilt.create_react_agent / ToolNode) for tool-using loops (the conversational authoring agent, web research).
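As a rough illustration of D1, the sketch below shows one LangGraph-style node whose LLM call is a structured step. Only with_structured_output and invoke are LangChain method names; PredicateDraft, CodegenState, codegen_node, and the stub classes are hypothetical, and a TypedDict schema stands in for the Pydantic schema so the example runs without third-party packages.

```python
from typing import Any, TypedDict


class PredicateDraft(TypedDict):
    """Hypothetical output schema for a codegen step."""
    name: str
    source: str


class CodegenState(TypedDict, total=False):
    """Hypothetical graph state; a node returns a partial update to it."""
    requirement: str
    draft: PredicateDraft


def codegen_node(state: CodegenState, model: Any) -> CodegenState:
    """Node body: one structured LLM call, returned as a partial state update.

    `model` is assumed to be a LangChain chat model that supports
    with_structured_output; it is injected so the node can be tested offline.
    """
    step = model.with_structured_output(PredicateDraft)
    draft = step.invoke(f"Write a predicate for: {state['requirement']}")
    return {"draft": draft}


class _StubStep:
    """Offline stand-in for the runnable returned by with_structured_output."""
    def invoke(self, prompt: str) -> PredicateDraft:
        return {
            "name": "is_positive",
            "source": "def is_positive(x):\n    return x > 0\n",
        }


class _StubModel:
    """Offline stand-in that mimics only the one model method used above."""
    def with_structured_output(self, schema: Any) -> _StubStep:
        return _StubStep()


update = codegen_node({"requirement": "x must be positive"}, _StubModel())
print(update["draft"]["name"])
```

In a real graph the model would presumably be bound ahead of time (for example with functools.partial) so that the node takes only the state; that wiring is left out here.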

D2 — DSPy is deferred post-alpha; LangChain is the alpha LLM call layer

DSPy and its LiteLLM backend leave the alpha dependency path. The DSPy-adoption work is deferred post-alpha. The node-level call boundary is abstracted (D5) so re-introducing DSPy per node later — for its Optimizer lever, the genuine post-alpha reason — is incremental, per-node rework, not a substrate migration.

D3 — Provider access via the LangChain provider packages

LLM access goes through the langchain-<provider> integration packages, which wrap the raw provider SDKs and expose with_structured_output / bind_tools on BaseChatModel and usage_metadata on the returned AIMessage. The wired provider set is langchain-anthropic, langchain-xai, langchain-openai, and langchain-google-genai, selectable per profile by the provider/model id, plus an openai-compatible provider that points the OpenAI client at a profile-supplied base_url (a self-hosted or alt-vendor OpenAI-API-shaped endpoint). ChatXAI subclasses the OpenAI-compatible base, so the xai / openai / openai-compatible branches share the injectable-httpx-client seam (in-flight credential refresh + the cassette record/replay transport); anthropic and google-genai use their own SDK clients.

The OpenAI branch additionally expresses the OpenAI subscription (Codex) request-shape: for an org-BYO ChatGPT-subscription OAuth credential it builds ChatOpenAI against the Codex backend with the Responses API + the Codex instructions/headers envelope (experimental, non-datacenter-only). This goes entirely through langchain-openai, so the no-raw-SDK invariant holds for that path too.

Direct imports of litellm and of the raw provider SDKs (anthropic, openai, the Gemini SDKs google.genai / google.generativeai) from Spectral source remain forbidden; tools/quality/check_llm_sdk_allowlist.py enforces “LLM access goes through the LangChain provider packages” and the control_plane_lm.py allowlist entry is removed. The lint continues to assert no library absent from the graph.
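The import rule above can be approximated with a small AST walk. This is a minimal sketch, not the actual check_llm_sdk_allowlist.py: the FORBIDDEN tuple is copied from the paragraph above, while the function names and the sample source are hypothetical.

```python
import ast

# Forbidden modules, as listed in the paragraph above.
FORBIDDEN = ("litellm", "anthropic", "openai", "google.genai", "google.generativeai")


def _is_forbidden(module: str) -> bool:
    """True if `module` is a forbidden package or a submodule of one."""
    return any(module == f or module.startswith(f + ".") for f in FORBIDDEN)


def forbidden_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, module) pairs for direct imports of forbidden SDKs."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_forbidden(node.module):
                names = [node.module]
            else:
                # Catches `from google import genai` as google.genai.
                names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        hits.extend((node.lineno, name) for name in names if _is_forbidden(name))
    return sorted(hits)


SAMPLE = """\
from langchain_openai import ChatOpenAI
import litellm
from google import genai
from openai import OpenAI
"""

print(forbidden_imports(SAMPLE))
# [(2, 'litellm'), (3, 'google.genai'), (4, 'openai')]
```

The provider packages (langchain_openai and the like) pass because matching is on whole module paths rather than on substrings.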

Third-party LLM-stack dependencies (the langchain-<provider> packages, and litellm where transitively present) are pinned at n-1 (one release behind the latest), with a cooling period before a new release is adopted. This is matched to the supply-chain-compromise threat class: a poisoned release of an otherwise-sound dependency, which surfaces over time. litellm is explicitly pinned in the lockfile rather than floated transitively.

D4 — The control-plane decisions stand; the integration seam re-seats onto the LangChain call boundary

ADR-035 D2–D10 are unaffected as decisions: no out-of-process gateway, the canonical purpose taxonomy, the hierarchical profile model, per-context rate limits and quotas, cost management + budget enforcement, graceful degradation, profile versioning, the core.llm contracts, and fallback-chain defaults. What moves is the integration point — from the (defunct) DSPy shim to the LangChain call boundary:

  • Usage capture: the usage_metadata on the returned AIMessage (via with_structured_output(..., include_raw=True)) carries real token counts, written to core.llm_usage. This structurally fixes the Usage(0,0,0) defect.
  • Enforcement: a pre-call enforcing Runnable checks budget / rate-limit (fail-closed) before the model invocation; a post-call BaseCallbackHandler emits usage + latency. Two-axis degradation (ADR-035 D7) and full OTel GenAI spans may follow; usage capture + budget enforcement are in scope for “a working call.”
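The two bullets above can be sketched together. This is a minimal illustration under stated assumptions, not the control-plane implementation: plain callables stand in for the enforcing Runnable and the BaseCallbackHandler, and Usage, CallRefused, guarded_structured_call, Summary, and the stub classes are hypothetical names. The include_raw=True result keys ("raw", "parsed", "parsing_error") and the usage_metadata token keys follow LangChain's documented shapes.

```python
from dataclasses import dataclass
from typing import Any, Callable, TypedDict


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage record; the real core.llm type may differ."""
    input_tokens: int
    output_tokens: int
    total_tokens: int


class CallRefused(RuntimeError):
    """Raised by the pre-call gate, before any model invocation."""


class Summary(TypedDict):
    """Hypothetical output schema."""
    summary: str


def guarded_structured_call(
    model: Any,
    schema: Any,
    prompt: str,
    check_budget: Callable[[], bool],
    record_usage: Callable[[Usage], None],
) -> Any:
    # Fail-closed: a denial and a crashing check both refuse the call.
    try:
        allowed = check_budget()
    except Exception as exc:
        raise CallRefused("budget check errored; refusing the call") from exc
    if not allowed:
        raise CallRefused("budget or rate limit exhausted")

    # include_raw=True yields a dict with "raw", "parsed", "parsing_error".
    result = model.with_structured_output(schema, include_raw=True).invoke(prompt)

    # usage_metadata on the raw AIMessage carries the provider-reported counts.
    meta = result["raw"].usage_metadata
    if meta is None:
        # Sketch-level choice: surface missing usage instead of recording zeros.
        raise RuntimeError("provider returned no usage_metadata")
    record_usage(Usage(meta["input_tokens"], meta["output_tokens"], meta["total_tokens"]))

    if result["parsing_error"] is not None:
        raise result["parsing_error"]
    return result["parsed"]


class _StubMessage:
    """Offline stand-in for an AIMessage."""
    usage_metadata = {"input_tokens": 120, "output_tokens": 30, "total_tokens": 150}


class _StubModel:
    """Offline stand-in that mimics only the LangChain calls used above."""
    def __init__(self) -> None:
        self.invocations = 0

    def with_structured_output(self, schema: Any, *, include_raw: bool = False) -> "_StubModel":
        return self

    def invoke(self, prompt: str) -> dict:
        self.invocations += 1
        return {"raw": _StubMessage(), "parsed": {"summary": "ok"}, "parsing_error": None}


ledger: list[Usage] = []
stub = _StubModel()
parsed = guarded_structured_call(stub, Summary, "Summarize.", lambda: True, ledger.append)

refused = False
try:
    guarded_structured_call(stub, Summary, "Summarize.", lambda: False, ledger.append)
except CallRefused:
    refused = True  # the stub model was not invoked a second time
```

Raising on a missing usage_metadata, rather than defaulting to zeros, is a choice made for this sketch so that the Usage(0,0,0) defect cannot recur silently.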

LLM observability is Pydantic Logfire, chosen because it is OTel-native (ADR-036); it is independent of the SDK layer.

D5 — Retire the string→string LLMProvider.complete call boundary

The pydantic-ai-era core.llm.LLMProvider.complete(prompt, *, profile) -> .text seam, together with the ControlPlaneLM / configure_world_agent_dspy shim built on it, is retired. It is the wrong shape (it flattens DSPy/LangChain structured I/O to a string and discards usage), and it had only test-only consumers. The seam becomes a typed structured-step (a LangChain runnable behind a node-level capability); the durable contract is the LangGraph node’s typed input/output. DSPy’s framework-maturity exposure is contained at the node-execution seam: the swap unit is the per-node call sites plus the executor-integration seam, with LangChain as the executor inside the node. Replacing a per-node executor later is therefore bounded per-node rework, not a substrate migration. LangGraph orchestration and the node contracts are untouched.
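One way to picture the typed seam is a small protocol that the node depends on, with the executor swapped behind it. Every name below (StructuredStep, RestateRequest, Restatement, both executors, restate_node) is hypothetical; the only assumption about LangChain is that the wrapped runnable has an invoke method and returns a dict.

```python
from dataclasses import dataclass
from typing import Any, Protocol, TypeVar

In = TypeVar("In", contravariant=True)
Out = TypeVar("Out", covariant=True)


class StructuredStep(Protocol[In, Out]):
    """Hypothetical node-level capability: typed input in, typed output out."""
    def run(self, request: In) -> Out: ...


@dataclass(frozen=True)
class RestateRequest:
    text: str


@dataclass(frozen=True)
class Restatement:
    restated: str


class LangChainRestate:
    """Executor backed by a LangChain runnable (assumed to return a dict)."""
    def __init__(self, runnable: Any) -> None:
        self._runnable = runnable

    def run(self, request: RestateRequest) -> Restatement:
        out = self._runnable.invoke(f"Restate: {request.text}")
        return Restatement(restated=out["restated"])


class EchoRestate:
    """Stand-in for some later executor; same contract, different internals."""
    def run(self, request: RestateRequest) -> Restatement:
        return Restatement(restated=request.text.strip().capitalize())


def restate_node(state: dict, step: StructuredStep[RestateRequest, Restatement]) -> dict:
    """The node sees only the typed contract, never the executor behind it."""
    result = step.run(RestateRequest(text=state["text"]))
    return {"restated": result.restated}


class _StubRunnable:
    """Offline stand-in for a LangChain structured-output runnable."""
    def invoke(self, prompt: str) -> dict:
        return {"restated": "The value must be positive."}


via_langchain = restate_node({"text": "value > 0"}, LangChainRestate(_StubRunnable()))
via_other = restate_node({"text": "  value must be positive "}, EchoRestate())
print(via_langchain, via_other)
```

Swapping LangChainRestate for another executor leaves restate_node unchanged, which is the sense in which later rework stays per-node.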

Alternatives considered

Keep DSPy and build the missing concrete provider leaf. Rejected — the integration never produced a real call, no realized capability would be preserved (bare Predict + no-op ChatAdapter), and the leaf would re-open exactly the SDK-discipline question the lint exists to close. DSPy’s only edge (Optimizers) is post-alpha.

Keep the LLMProvider.complete string seam and put LangChain beneath it. Rejected — the string→string contract discards structured output (the whole point of the structured-step) and token usage (the whole point of the control plane). The seam should be typed.

Fork DSPy off LiteLLM / route DSPy through a non-LiteLLM client. Moot — DSPy is deferred; the question does not arise at alpha.

Consequences

  • ADR-035 D2–D10 stand; the integration seam is re-seated onto the LangChain call boundary.
  • check_llm_sdk_allowlist.py is rewritten (SPEC-562) — provider-package framing; raw-SDK + direct-litellm imports still forbidden; the control_plane_lm.py allowlist entry removed.
  • DSPy adoption (SPEC-31 / SPEC-4 / SPEC-490) is Alpha Deferred; dspy + litellm leave the alpha workers dependency set when SPEC-557 lands.
  • One empirical risk to validate (SPEC-557): multi-line predicate source as a with_structured_output field against the real provider — mitigable via method="function_calling" vs "json_schema". This is SPEC-498’s ~30–50-real-predicate bar.
  • Codex llm-platform is rewritten (LangGraph + LangChain structured-step; DSPy as a deferred post-alpha option; the control plane wraps the LangChain call boundary); the stale PydanticAIProvider example in architecture is corrected (SPEC-562 Codex sweep).

References

  • ADR-035 — in-process LLM control plane (D2–D10 stand; integration seam re-seated)
  • ADR-007 — LangGraph orchestration (unchanged)
  • ADR-036 — LLM observability on Pydantic Logfire (OTel-native)
  • ADR-101 — one-agent topology
  • tools/quality/check_llm_sdk_allowlist.py — SDK-allowlist lint (rewritten under SPEC-562)
  • planning/0.3.0/agent-stack-audit/replan.md — agent-stack re-plan + decision record