ADR-035: LLM stack — pydantic-ai SDK abstraction; in-process control plane; canonical purpose taxonomy
Status: Accepted (2026-04-20) Supersedes: ADR-003 on the LiteLLM stack-portion choice (the multi-provider abstraction principle stands; the SDK choice changes)
Context
ADR-003 chose LiteLLM as the multi-provider LLM SDK; ADR-008 superseded that for the scan pipeline by adopting pydantic-ai. ADR-007 chose LangGraph for agent orchestration (with DeepAgents). TA-10’s scope (per SPEC-257 absorbing M7) covered the cross-cutting LLM control-plane: hierarchical profiles (global → workspace → customer-fixed), purpose taxonomy, override classification, per-context and per-purpose rate limits and quotas, cost management with budget enforcement, graceful degradation, profile versioning with rollback and A/B reservation.
A key open question surfaced during the spike: introduce an out-of-process HTTP LLM gateway (LiteLLM Proxy self-hosted, Portkey, Cloudflare AI Gateway, Vercel AI Gateway, OpenRouter) as the routing/quota/cost/failover layer, or build that control plane in-process on top of pydantic-ai? Adversarial research and a primary-source landscape survey informed the disposition; the LiteLLM March 2026 supply-chain compromise tipped the balance further.
Vocabulary used throughout:
- Provider API — Anthropic / OpenAI / Google endpoints.
- SDK abstraction — in-process Python library normalizing provider differences (pydantic-ai; LangChain provider packages).
- HTTP LLM gateway — out-of-process HTTP service between the app and providers (LiteLLM Proxy, Portkey, etc.).
- Control plane — routing / profile resolution / quota / budget / degradation. Spectral builds this in-process.
Decision
D1 — SDK abstractions: pydantic-ai standard; langchain-<provider> only where LangGraph forces it
- pydantic-ai is the default in-process SDK abstraction for all direct LLM calls (scan pipeline, world rule distillation, any non-LangGraph path). ADR-008 stands.
- `langchain-anthropic`, `langchain-openai`, `langchain-google` are permitted solely as the LangChain chat-model adapters passed to LangGraph’s `init_chat_model` for agent orchestration per ADR-007. No `ChatLiteLLM`; no `langchain-community`.
- LiteLLM is removed from the dep graph entirely. Motivated by (a) the March 2026 supply-chain compromise, (b) ADR-008’s already-identified pain points, (c) pydantic-ai providing sufficient provider coverage.
- A CI allowlist (`tools/quality/check_llm_sdk_allowlist.py`, pre-push tier) asserts `litellm` is absent from `uv tree` and forbids direct imports of raw provider SDKs (`anthropic`, `openai`, `google.generativeai`) from Spectral code — everything flows through pydantic-ai or LangGraph/`init_chat_model`.
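The import-side of such a lint can be sketched as a small AST walk. This is illustrative only — `forbidden_imports` and the helper below are hypothetical names, and the real lint additionally checks `uv tree` output, which is omitted here:

```python
import ast
from pathlib import Path

# Module prefixes the ADR forbids importing directly from Spectral code.
FORBIDDEN = ("litellm", "anthropic", "openai", "google.generativeai")

def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden modules imported by a Python source string."""
    hits: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Match the exact module or any of its submodules.
            if any(name == f or name.startswith(f + ".") for f in FORBIDDEN):
                hits.add(name)
    return hits

def check_tree(root: Path) -> list[tuple[Path, set[str]]]:
    """Collect (file, violations) pairs for every Python file under root."""
    return [
        (path, bad)
        for path in root.rglob("*.py")
        if (bad := forbidden_imports(path.read_text()))
    ]
```

A pre-push hook would fail the push when `check_tree` returns a non-empty list.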
D2 — No out-of-process HTTP LLM gateway
The control plane is built in-process in Python on top of pydantic-ai. Call path: Spectral application code → in-process control plane → pydantic-ai → provider API.
Rejection grounds (Spectral-specific):
- Source-of-truth fragmentation versus `spectral.core` admission discipline. Routing rules, override classification, degradation policy, and profile change governance are product semantics governed by `src/spectral/core/` (ADR-065 and the architecture validator). A gateway’s YAML / virtual-key / config DSL moves the source of truth outside the repo’s governance.
- Tenant attribution at the 5-tuple `(account_id, workspace_id, bc, purpose, scan_id?)` does not fit gateway virtual-key hierarchies without key explosion or post-hoc reconciliation lag.
- Purpose-level degradation is not provider failover. “Scoring fails 5× → upgrade remaining scoring calls to reasoning for this scan” is evaluation-consistency policy tied to scan state. Gateways implement provider-level failover; they do not trip on structured-output validation failures or carry scan-scoped routing state.
- Critical-path failure domain. An HTTP gateway is a new outage surface. In-process means LLM availability == provider API availability.
- The LiteLLM supply-chain event (March 2026) carries reputational weight even for self-hosted deployments; LiteLLM Proxy would keep it in the image.
Exit is symmetric: if the in-process control plane ever hits a wall, pydantic-ai’s `OpenAIProvider(base_url=...)` makes introducing a gateway below pydantic-ai a localized infrastructure change.
D3 — Canonical purpose taxonomy (contract shared between contexts)
`spectral.core.llm.purposes.PurposeKey` enum:
- `scoring` — evaluation scoring (high volume, cost-sensitive)
- `detection` — anti-deception, parse validation (high volume, very cost-sensitive)
- `reasoning` — diagnosis, optimization, calibration rewrites, rule distillation (quality-critical)
- `agent_turn` — Spectral Agent / Ops Agent conversational turn
- `agent_tool` — agent tool-invocation call (may differ from turn)
- `world_agent` — World Agent exploration / hypothesis (reasoning-tier; worlds-specific)
- `customer_replay` — re-executing customer agents during the observe phase
- `embedding` — embedding generation (full policy in TA-11; key reserved here)
This is shared between contexts because events, cost rollups, and observability aggregate on this key; both contexts must agree.
D4 — Hierarchical profile model with credential-source reserve
`spectral.core.llm.profiles`:
- Resolution precedence: customer-fixed > workspace-override > global-default, per-purpose granularity.
- Override classification per purpose: `locked` (platform-only; default for `detection`, `customer_replay`); `operator_allowed` (workspace can override within platform-approved IDs; default for `scoring`, `reasoning`, `agent_*`); `open` (any supported model; enterprise contract tier, not alpha).
- Credential-source reservation (not implemented in alpha): `ModelProfile.credential_source: Literal["spectral_managed", "workspace_byo"] = "spectral_managed"` plus `credential_ref: str | None = None`. `workspace_byo` reserves the path for customer-provided subscriptions or keys without committing to the implementation in TA-10. A later spike (post-TA-17) defines binding mechanics.
- Resolver is a pure function in each context’s application layer: `resolve_profile(workspace_id, purpose) → ResolvedProfile`. `ResolvedProfile` carries the model ID plus the enforcement envelope (rate-limit, budget, fallback chain, defaults).
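The precedence walk can be sketched as a pure function. This simplified signature takes the three tiers as plain dicts for illustration; the real resolver presumably loads them by `workspace_id` and returns the full enforcement envelope rather than the two-field record used here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResolvedProfile:
    model_id: str
    source: str  # which tier won: "customer_fixed" | "workspace" | "global"

def resolve_profile(
    purpose: str,
    global_defaults: dict[str, str],
    workspace_overrides: dict[str, str],
    customer_fixed: dict[str, str],
) -> ResolvedProfile:
    """Pure precedence walk: customer-fixed > workspace-override > global-default."""
    if purpose in customer_fixed:
        return ResolvedProfile(customer_fixed[purpose], "customer_fixed")
    if purpose in workspace_overrides:
        return ResolvedProfile(workspace_overrides[purpose], "workspace")
    return ResolvedProfile(global_defaults[purpose], "global")
```

Keeping the resolver pure makes profile-version rollback (D8) a data change, not a code change.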
D5 — Per-context / per-purpose rate limit and quota
Token-bucket per `(workspace_id, purpose)`; per-workspace, per-purpose daily spend cap. Alpha implementation: in-process (single API + single worker). A `RateLimiter` protocol abstracts the backend — Redis or `pg_advisory_locks` introduced when horizontal scale forces it. Enforcement via a `TenantScopedLLMProvider` wrapper around pydantic-ai (analogous to `TenantScopedQuery` from ADR-033); per-context repos cannot bypass it.
D6 — Cost management, budget enforcement, and admin-surface contracts
- `genai-prices` is the primary cost source; a fallback registry in `spectral.core.llm.pricing` covers models `genai-prices` does not yet know.
- Per-call emits (a) an OTel span with GenAI semconv plus Spectral attributes (`spectral.purpose`, `spectral.workspace_id`, `spectral.account_id`, `spectral.bc`, `spectral.scan_id?`, `spectral.agent_turn_id?`) consumed per ADR-036, AND (b) a row to `core.llm_usage` for in-app budget enforcement.
- Rolling spend is read from `core.llm_usage`; the per-call check fails closed with `Result[BudgetExceeded]` when the `(workspace, purpose)` daily cap is exceeded. Hard caps only in alpha; soft alerts (warn at 80%) deferred.
- Admin-surface contracts pinned now (UIs built later):
  - `core.llm_usage` queryable by workspace admins scoped to their workspace — RLS policy on `workspace_id`; no schema-qualified access between contexts.
  - `core.llm_usage` queryable platform-wide by ops role — `service_role` bypass of RLS per the discipline in ADR-033.
- Rollup: `core.llm_usage_daily` materialized view (or nightly-computed table) with `(account_id, workspace_id, bc, purpose, model, date, input_tokens, output_tokens, cost_estimate, call_count)` for cheap workspace-admin queries without scanning the hot table.
- Workspace-admin UI in `apps/dashboard` and ops control-plane UI in `apps/operations` consume these views via FastAPI per ADR-034.
D7 — Graceful degradation across two orthogonal axes
- Provider-level (transport): HTTP 5xx/429 → exponential backoff retry (max 3) on the same provider; then next provider in the purpose’s fallback chain. Per-tenant retry budget prevents one bad workspace from exhausting provider rate limits for the population.
- Purpose-level (quality): consecutive validation/refusal failures on a purpose within a window trigger an upgrade rather than lateral fallback: `DegradationPolicy(trigger=N_failures_in_window, action=upgrade_to_purpose)`. The per-eval-dimension override (ADR-003) carries forward as scan-scoped policy.
Circuit transitions emit domain events (`llm.circuit.opened`, `llm.circuit.closed`) consumed by ADR-036 observability and by the Spectral Agent.
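The purpose-level trigger can be sketched with a sliding window of failure timestamps. Everything beyond `DegradationPolicy`’s conceptual fields is a hypothetical name, and domain-event emission is reduced to a comment:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class DegradationPolicy:
    trigger_failures: int   # N validation/refusal failures ...
    window_s: float         # ... within this many seconds
    upgrade_to: str         # purpose whose profile takes over

@dataclass
class PurposeCircuit:
    """Scan-scoped: counts quality failures and upgrades the purpose."""
    policy: DegradationPolicy
    upgraded: bool = False
    _failures: deque[float] = field(default_factory=deque)

    def record_failure(self, now: float) -> None:
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        cutoff = now - self.policy.window_s
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        if len(self._failures) >= self.policy.trigger_failures:
            self.upgraded = True  # would also emit llm.circuit.opened

    def effective_purpose(self, purpose: str) -> str:
        return self.policy.upgrade_to if self.upgraded else purpose
```

Because the circuit is scan-scoped state, it lives beside the scan, which is precisely what an external gateway cannot carry.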
D8 — Profile versioning with rollback (A/B reserved)
`core.llm_profiles` persisted with `(id, version, active_at, deactivated_at, created_by, audit_log)`. Activation is append-only; rollback = re-activate the prior version. `variant: str | None` is reserved on the schema; no resolver branch in alpha. Governance: `locked`-override modifications require a platform-role audit entry; `operator_allowed` changes are workspace-admin-auditable.
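Append-only activation and rollback can be sketched in memory; `ProfileHistory` below is a hypothetical stand-in for the `core.llm_profiles` table semantics, with audit columns omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activation:
    profile_id: str
    version: int

class ProfileHistory:
    """Append-only activation log; rollback re-activates a prior version."""

    def __init__(self) -> None:
        self._log: list[Activation] = []

    def activate(self, profile_id: str, version: int) -> None:
        # History is never mutated; a new activation row supersedes the head.
        self._log.append(Activation(profile_id, version))

    def current(self, profile_id: str) -> int:
        return [a.version for a in self._log if a.profile_id == profile_id][-1]

    def rollback(self, profile_id: str) -> int:
        versions = [a.version for a in self._log if a.profile_id == profile_id]
        prior = versions[-2]  # the activation before the current head
        self.activate(profile_id, prior)
        return prior
```

Rollback appends rather than deletes, so the audit trail records that a rollback happened and when.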
D9 — LLMProvider protocol lives in spectral.core
`spectral.core.llm.protocols.LLMProvider`. The protocol is a narrow, focused contract over types (`ResolvedProfile`, `LLMResponse`, `ToolResponse`) that already belong in core. Admission follows ADR-065 discipline (kernel admission validated by `tools/quality/validate_architecture.py` rules 1–7 plus ruff `D100`/`D104`).
D10 — Alpha fallback chain defaults
Same-provider-family defaults to preserve prompt-caching affinity:
- `reasoning`, `agent_turn`, `agent_tool`, `world_agent`: Claude Opus 4.7 → Claude Sonnet 4.6 → Claude Haiku 4.5
- `scoring`, `detection`: Gemini 2.5 Flash → Gemini 3.1 Flash-Lite
- `customer_replay`: matches the customer agent’s model (Spectral-managed default; BYO when D4 opens)
- `embedding`: deferred to TA-11 / ADR-038
Cross-provider fallback (e.g., Claude → GPT-5.4) is configurable per profile, not default, until cost/reliability signals argue otherwise.
Alternatives considered
LiteLLM Proxy self-hosted (the pro-gateway lead candidate). Rejected: control-plane fragmentation, tenant-attribution mismatch, the LiteLLM supply-chain event carries forward, operational overhead does not shrink the scope of Spectral semantics that must still live somewhere.
Portkey OSS gateway. Rejected: same core reasons plus config-DSL lock-in (virtual keys, configs, guardrails not OpenAI-compat-portable).
Cloudflare AI Gateway / Vercel AI Gateway / OpenRouter. Rejected: external-service latency tax; vendor-hosted control-plane lock-in; none express purpose-level quality degradation.
Keep LiteLLM SDK alongside pydantic-ai. Rejected post-supply-chain event.
Rebuild Spectral Agent on pydantic-graph to unify SDKs. Rejected as TA-10 scope — that is an ADR-007 revisit.
LLMProvider protocol per context (duplicated). Rejected — pedantry over discipline.
Purpose-level degradation via gateway circuit breakers. Rejected: gateways trip on transport errors, not structured-output validation failures.
Inverted resolution precedence (global > workspace > customer-fixed). Rejected: customer-fixed must be highest-priority override; locked classification handles platform-locked purposes.
Consequences
- ADR-003 partially superseded on the SDK stack portion. ADR-008 stands with cross-reference.
- `init_chat_model` wired to LangChain-native provider packages (`langchain-anthropic`, `langchain-openai`, `langchain-google`); never `ChatLiteLLM`. No ADR-007 text change.
- `src/spectral/core/llm/` subpackage landed at commit `4c637af` (back-fill during SPEC-319): `purposes.py`, `profiles.py`, `budgets.py`, `degradation.py`, `usage.py`, `pricing.py`, `protocols.py`, `__init__.py`, plus `tests/core/test_contract_llm.py`. Every addition was approved under the contract-requirement-test discipline in force at the time; that discipline is now superseded by ADR-065’s admission discipline, and the per-PR lint that previously enforced it (`tools/quality/check_core_contract_tests.py`) was retired in Phase 5 / M2.
- `core.llm_usage`, `core.llm_profiles`, `core.llm_profile_changes`, `core.llm_usage_daily` are first residents of the `core` schema per ADR-032 D2.
- `tools/quality/check_llm_sdk_allowlist.py` is load-bearing — any regression (e.g., re-introducing `ChatLiteLLM`) is caught at pre-push.
- TA-11 / ADR-038 inherits the `embedding` purpose-key reservation.
- TA-15 / ADR-060 inherits the degradation + budget contracts for agent tool invocation.
- TA-16 / ADR-036 inherits the OTel GenAI-semconv contract with Spectral attributes.
- TA-17 / ADR-037 owns credential storage; BYO subscription mechanics unblock a later spike that defines OAuth flows with providers supporting federated subscriptions.
- TA-24 / ADR-061 inherits the `LLMProvider` protocol as the mock boundary; `FakeLLMProvider` implements it.
- Alpha rate limiting is in-process — the `RateLimiter` protocol keeps the swap localized when horizontal scale arrives.
References
- ADR-003 — retired; original LiteLLM decision distilled into the addendum below
- ADR-007 — LangGraph + `init_chat_model`
- ADR-008 — pydantic-ai for the scan pipeline (this ADR generalizes it)
- ADR-065 — `spectral.core` admission discipline
- ADR-031 — single-library structure
- ADR-033 — `TenantScopedLLMProvider` pattern (analogous to `TenantScopedQuery`)
- ADR-036 — OTel substrate; LLM observability streams
- ADR-038 — TA-11 (embedding purpose-key consumer)
- ADR-060 — TA-15 agent tool invocation
- ADR-061 — TA-24 LLM testing strategy (`FakeLLMProvider`)
- TA-10 disposition — SPEC-313 comment `0a50c35f`
- TA-10 verification — SPEC-313 comment `53719313`
- `tools/quality/check_llm_sdk_allowlist.py` — SDK allowlist lint (commit `1ed000c`)
- `src/spectral/core/llm/` — landed contract surface (commit `4c637af`)
Addendum: ADR-003 — LiteLLM for Multi-Provider LLM Abstraction
ADR-003 (Accepted 2026-03-21; retired by this ADR) selected LiteLLM as the multi-provider LLM SDK abstraction for the v0.2 scanning pipeline. The premise was that calls fanned out across reasoning / scoring / detection / customer tiers to multiple providers (Anthropic, Google, OpenAI), and a unified SDK was preferred over per-provider SDK integrations with separate retry, cost, and error handling.
Why a future reader should know about ADR-003:
- The multi-provider abstraction principle it established is preserved here — Spectral still routes across providers and tiers — but the realization is different: pydantic-ai is the in-process SDK abstraction, and the in-process control plane (D1–D6 here) carries the routing, retry, cost, and budget concerns that ADR-003 had delegated to LiteLLM.
- LiteLLM is removed from the dependency graph entirely. The `tools/quality/check_llm_sdk_allowlist.py` lint catches any regression that re-introduces it.
- The HTTP-LLM-gateway alternative ADR-003 had implicitly framed (a separate proxy service) is rejected here in favor of an in-process control plane (D1).
Git history at the commit retiring ADR-003 preserves the original text.