Decisions

ADR-035: LLM stack — pydantic-ai SDK abstraction; in-process control plane; canonical purpose taxonomy

Status: Accepted (2026-04-20)
Supersedes: ADR-003 on the LiteLLM stack-portion choice (the multi-provider abstraction principle stands; the SDK choice changes)

Context

ADR-003 chose LiteLLM as the multi-provider LLM SDK; ADR-008 superseded that for the scan pipeline by adopting pydantic-ai. ADR-007 chose LangGraph for agent orchestration (with DeepAgents). TA-10’s scope (per SPEC-257 absorbing M7) covered the cross-cutting LLM control-plane: hierarchical profiles (global → workspace → customer-fixed), purpose taxonomy, override classification, per-context and per-purpose rate limits and quotas, cost management with budget enforcement, graceful degradation, profile versioning with rollback and A/B reservation.

A key open question surfaced during the spike: introduce an out-of-process HTTP LLM gateway (LiteLLM Proxy self-hosted, Portkey, Cloudflare AI Gateway, Vercel AI Gateway, OpenRouter) as the routing/quota/cost/failover layer, or build that control plane in-process on top of pydantic-ai? Adversarial research and a primary-source landscape survey informed the disposition; the LiteLLM March 2026 supply-chain compromise tipped the balance further.

Vocabulary used throughout:

Provider API — Anthropic / OpenAI / Google endpoints.
SDK abstraction — in-process Python library normalizing provider differences (pydantic-ai; LangChain provider packages).
HTTP LLM gateway — out-of-process HTTP service between the app and providers (LiteLLM Proxy, Portkey, etc.).
Control plane — routing / profile resolution / quota / budget / degradation. Spectral builds this in-process.

Decision

D1 — SDK abstractions: pydantic-ai standard; `langchain-<provider>` only where LangGraph forces it

pydantic-ai is the default in-process SDK abstraction for all direct LLM calls (scan pipeline, world rule distillation, any non-LangGraph path). ADR-008 stands.
langchain-anthropic, langchain-openai, langchain-google are permitted solely as the chat-model adapters that LangChain’s init_chat_model resolves for LangGraph agent orchestration per ADR-007. No ChatLiteLLM; no langchain-community.
LiteLLM is removed from the dep graph entirely. Motivated by (a) the March 2026 supply-chain compromise, (b) ADR-008’s already-identified pain points, (c) pydantic-ai providing sufficient provider coverage.
A CI allowlist (tools/quality/check_llm_sdk_allowlist.py, pre-push tier) asserts that litellm is absent from the uv tree output and forbids direct imports of raw provider SDKs (anthropic, openai, google.generativeai) from Spectral code; everything flows through pydantic-ai or LangGraph/init_chat_model.
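The import half of the allowlist check in D1 could be sketched roughly as follows. This is an illustration only, not the contents of tools/quality/check_llm_sdk_allowlist.py: the names FORBIDDEN_ROOTS, forbidden_root, and forbidden_imports are hypothetical, and the uv tree assertion is not covered here.

```python
from __future__ import annotations

import ast

# Assumed deny-list taken from the names in D1; the real lint may differ.
FORBIDDEN_ROOTS = ("litellm", "anthropic", "openai", "google.generativeai")


def forbidden_root(module: str) -> str | None:
    """Return the forbidden root that `module` equals or lives under, if any."""
    for root in FORBIDDEN_ROOTS:
        if module == root or module.startswith(root + "."):
            return root
    return None


def forbidden_imports(source: str) -> list[str]:
    """List the forbidden roots imported anywhere in a Python source string."""
    hits: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Include "module.name" so `from google import generativeai` is caught.
            candidates = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for candidate in candidates:
            root = forbidden_root(candidate)
            if root is not None:
                hits.add(root)
    return sorted(hits)
```

Matching on a dotted prefix rather than a plain string prefix keeps permitted packages such as langchain_openai from tripping the check.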

D2 — No out-of-process HTTP LLM gateway

The control plane is built in-process in Python on top of pydantic-ai. Call path: Spectral application code → in-process control plane → pydantic-ai → provider API.

Rejection grounds (Spectral-specific):

Source-of-truth fragmentation versus spectral.core admission discipline. Routing rules, override classification, degradation policy, and profile change governance are product semantics governed by src/spectral/core/ (ADR-065 and the architecture validator). A gateway’s YAML / virtual-key / config DSL moves the source of truth outside the repo’s governance.
Tenant attribution at the 5-tuple granularity (account_id, workspace_id, bc, purpose, scan_id?) does not fit gateway virtual-key hierarchies without key-explosion or post-hoc reconciliation lag.
Purpose-level degradation is not provider failover. “Scoring fails 5× → upgrade remaining scoring calls to reasoning for this scan” is evaluation-consistency policy tied to scan state. Gateways implement provider-level failover; they do not trip on structured-output validation failures or carry scan-scoped routing state.
Critical-path failure domain. An HTTP gateway is a new outage surface. In-process means LLM availability == provider API availability.
The LiteLLM supply-chain event (March 2026) is a reputational liability even for self-hosted deployments; LiteLLM Proxy would keep the compromised dependency in the image.

Exit is symmetric: if the in-process control plane ever hits a wall, pydantic-ai’s OpenAIProvider(base_url=...) makes introducing a gateway below pydantic-ai a localized infrastructure change.

D3 — Canonical purpose taxonomy (contract shared between contexts)

spectral.core.llm.purposes.PurposeKey enum:

scoring — evaluation scoring (high volume, cost-sensitive)
detection — anti-deception, parse validation (high volume, very cost-sensitive)
reasoning — diagnosis, optimization, calibration rewrites, rule distillation (quality-critical)
agent_turn — Spectral Agent / Ops Agent conversational turn
agent_tool — agent tool-invocation call (may differ from turn)
world_agent — World Agent exploration / hypothesis (reasoning-tier; worlds-specific)
customer_replay — re-executing customer agents during the observe phase
embedding — embedding generation (full policy in TA-11; key reserved here)

This is shared between contexts because events, cost rollups, and observability aggregate on this key; both contexts must agree.
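A minimal sketch of how the D3 taxonomy might look as an enum. The string values come from the list above; the member names and the str mixin are assumptions, and the landed purposes.py may differ.

```python
from enum import Enum


class PurposeKey(str, Enum):
    """Canonical purpose keys; values mirror the D3 list, member names are assumed."""

    SCORING = "scoring"
    DETECTION = "detection"
    REASONING = "reasoning"
    AGENT_TURN = "agent_turn"
    AGENT_TOOL = "agent_tool"
    WORLD_AGENT = "world_agent"
    CUSTOMER_REPLAY = "customer_replay"
    EMBEDDING = "embedding"  # key reserved here; full policy lives in TA-11
```

Because the members are also strings, a raw key read from an event or a usage row can be validated with PurposeKey("scoring"), and an unknown key raises ValueError instead of silently creating a new aggregation bucket.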

D4 — Hierarchical profile model with credential-source reserve

spectral.core.llm.profiles:

Resolution precedence: customer-fixed > workspace-override > global-default, per-purpose granularity.
Override classification per purpose: locked (platform-only; default for detection, customer_replay); operator_allowed (workspace can override within platform-approved IDs; default for scoring, reasoning, agent_*); open (any supported model; enterprise contract tier, not alpha).
Credential-source reservation (not implemented in alpha): ModelProfile.credential_source: Literal["spectral_managed", "workspace_byo"] = "spectral_managed" plus credential_ref: str | None = None. workspace_byo reserves the path for customer-provided subscriptions or keys without committing to the implementation in TA-10. A later spike (post-TA-17) defines binding mechanics.
Resolver is a pure function in each context’s application layer: resolve_profile(workspace_id, purpose) → ResolvedProfile. ResolvedProfile carries model ID plus enforcement envelope (rate-limit, budget, fallback chain, defaults).
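The precedence and override classification in D4 can be illustrated with a small pure function. Everything below is a sketch under stated assumptions: PurposePolicy and resolve_model are hypothetical names, a customer-fixed binding is assumed to win regardless of classification, and a workspace override that is not permitted is assumed to fall back silently to the global default rather than raise.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Optional

Override = Literal["locked", "operator_allowed", "open"]


@dataclass(frozen=True)
class ModelProfile:
    model_id: str
    # Reserved fields from D4; only "spectral_managed" is exercised in alpha.
    credential_source: Literal["spectral_managed", "workspace_byo"] = "spectral_managed"
    credential_ref: Optional[str] = None


@dataclass(frozen=True)
class PurposePolicy:
    """Hypothetical per-purpose override classification."""

    override: Override
    approved_model_ids: frozenset[str] = frozenset()


def resolve_model(
    policy: PurposePolicy,
    global_default: ModelProfile,
    workspace_override: Optional[ModelProfile] = None,
    customer_fixed: Optional[ModelProfile] = None,
) -> ModelProfile:
    """Apply customer-fixed > workspace-override > global-default for one purpose."""
    if customer_fixed is not None:
        return customer_fixed
    if workspace_override is not None and policy.override != "locked":
        if policy.override == "open" or workspace_override.model_id in policy.approved_model_ids:
            return workspace_override
    return global_default
```

Passing the three candidate profiles in explicitly keeps the function free of I/O, which matches the pure-function requirement; loading them by (workspace_id, purpose) would sit in the calling layer.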

D5 — Per-context / per-purpose rate limit and quota

Token-bucket per (workspace_id, purpose); per-workspace per-purpose daily spend cap. Alpha implementation: in-process (single API + single worker). A RateLimiter protocol abstracts the backend — Redis or pg_advisory_locks introduced when horizontal scale forces it. Enforcement via a TenantScopedLLMProvider wrapper around pydantic-ai (analogous to TenantScopedQuery from ADR-033); per-context repos cannot bypass.
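A rough sketch of the in-process token bucket behind a RateLimiter protocol, keyed by (workspace_id, purpose) as in D5. The method name try_acquire, the class name InProcessTokenBucket, and the injectable clock are assumptions made for illustration; the sketch is not thread-safe, which is only acceptable under the single-worker alpha assumption.

```python
from __future__ import annotations

import time
from typing import Callable, Protocol


class RateLimiter(Protocol):
    """Backend-agnostic contract; a Redis-backed limiter could satisfy it later."""

    def try_acquire(self, workspace_id: str, purpose: str, tokens: float = 1.0) -> bool: ...


class InProcessTokenBucket:
    """One bucket per (workspace_id, purpose), refilled lazily on each call."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        # key -> (current token level, timestamp of last update)
        self._buckets: dict[tuple[str, str], tuple[float, float]] = {}

    def try_acquire(self, workspace_id: str, purpose: str, tokens: float = 1.0) -> bool:
        key = (workspace_id, purpose)
        now = self._clock()
        level, last = self._buckets.get(key, (self._capacity, now))
        # Refill for the elapsed time, never above capacity.
        level = min(self._capacity, level + (now - last) * self._refill)
        allowed = level >= tokens
        if allowed:
            level -= tokens
        self._buckets[key] = (level, now)
        return allowed
```

Injecting the clock makes the refill behaviour testable without sleeping, and callers that depend only on RateLimiter would not need to change when the backend is swapped.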

D6 — Cost management, budget enforcement, and admin-surface contracts

genai-prices is the primary cost source; a fallback registry in spectral.core.llm.pricing covers models genai-prices does not yet know.
Each call emits (a) an OTel span with GenAI-semconv plus Spectral attributes (spectral.purpose, spectral.workspace_id, spectral.account_id, spectral.bc, spectral.scan_id?, spectral.agent_turn_id?) consumed per ADR-036, and (b) a row to core.llm_usage for in-app budget enforcement.
Rolling spend is read from core.llm_usage; the per-call check fails closed with Result[BudgetExceeded] when the (workspace, purpose) daily cap is exceeded. Alpha enforces hard caps only; soft alerts (warn-at-80%) are deferred.
Admin-surface contracts pinned now (UIs built later):
- core.llm_usage queryable by workspace admins scoped to their workspace — RLS policy on workspace_id; no schema-qualified access between contexts.
- core.llm_usage queryable platform-wide by ops-role — service_role bypass of RLS per the discipline in ADR-033.
- Rollup: core.llm_usage_daily materialized view (or nightly-computed table) with (account_id, workspace_id, bc, purpose, model, date, input_tokens, output_tokens, cost_estimate, call_count) for cheap workspace-admin queries without scanning the hot table.
- Workspace-admin UI in apps/dashboard and ops control-plane UI in apps/operations consume these views via FastAPI per ADR-034.
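The hard-cap check in D6 might look roughly like the sketch below. UsageRow and check_budget are hypothetical names, the rows stand in for a query against core.llm_usage, and Optional[BudgetExceeded] stands in for the Result type the ADR refers to. Blocking once spend reaches the cap (an inclusive boundary) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class UsageRow:
    """Illustrative subset of a core.llm_usage row."""

    workspace_id: str
    purpose: str
    day: date
    cost_estimate: Decimal


@dataclass(frozen=True)
class BudgetExceeded:
    workspace_id: str
    purpose: str
    spent: Decimal
    cap: Decimal


def check_budget(
    rows: Iterable[UsageRow],
    workspace_id: str,
    purpose: str,
    today: date,
    daily_cap: Decimal,
) -> Optional[BudgetExceeded]:
    """Return BudgetExceeded when today's spend for (workspace, purpose) hits the cap."""
    spent = sum(
        (
            row.cost_estimate
            for row in rows
            if row.workspace_id == workspace_id and row.purpose == purpose and row.day == today
        ),
        Decimal("0"),
    )
    if spent >= daily_cap:
        return BudgetExceeded(workspace_id, purpose, spent, daily_cap)
    return None
```

Decimal is used instead of float so that summed cost estimates compare exactly against the cap.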

D7 — Graceful degradation across two orthogonal axes

Provider-level (transport): HTTP 5xx/429 → exponential backoff retry (max 3) on the same provider; then next provider in the purpose’s fallback chain. Per-tenant retry budget prevents one bad workspace from exhausting provider rate limits for the population.
Purpose-level (quality): consecutive validation/refusal failures on a purpose within a window trigger upgrade rather than lateral fallback. DegradationPolicy(trigger=N_failures_in_window, action=upgrade_to_purpose). Per-eval-dimension override (ADR-003) carries forward as scan-scoped policy.
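The purpose-level axis can be illustrated with a scan-scoped tracker. This is one possible reading, not the landed degradation.py: the window is assumed to be a count of the most recent call outcomes (a time-based window or strictly consecutive failures would be alternative readings), the upgrade is assumed to be sticky for the rest of the scan, and ScanDegradationState is a hypothetical name.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradationPolicy:
    failures: int    # N failures needed to trigger the upgrade
    window: int      # number of most recent outcomes considered (assumed count-based)
    upgrade_to: str  # purpose key that takes over once triggered


class ScanDegradationState:
    """Tracks validation/refusal outcomes for one purpose within one scan."""

    def __init__(self, purpose: str, policy: DegradationPolicy) -> None:
        self.purpose = purpose
        self.policy = policy
        self.upgraded = False
        self._outcomes: deque[bool] = deque(maxlen=policy.window)

    def record(self, ok: bool) -> None:
        """Record one call outcome; flip to upgraded once the trigger is met."""
        self._outcomes.append(ok)
        if self._outcomes.count(False) >= self.policy.failures:
            self.upgraded = True  # sticky: later successes do not undo it

    def effective_purpose(self) -> str:
        """Purpose key to route the next call under."""
        return self.policy.upgrade_to if self.upgraded else self.purpose
```

Transport errors would not be recorded here; they belong to the provider-level axis, which is why the two mechanisms are kept separate.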

Circuit transitions emit domain events (llm.circuit.opened, llm.circuit.closed) consumed by ADR-036 observability and by the Spectral Agent.

D8 — Profile versioning with rollback (A/B reserved)

core.llm_profiles persisted with (id, version, active_at, deactivated_at, created_by, audit_log). Activation append-only; rollback = re-activate prior version. variant: str | None reserved on schema; no resolver branch in alpha. Governance: locked-override modifications require a platform-role audit entry; operator_allowed changes are workspace-admin-auditable.
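Append-only activation with rollback as re-activation can be sketched in memory as follows. ProfileHistory and Activation are hypothetical names standing in for rows in core.llm_profiles; timestamps, the audit log, and the reserved variant column are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Activation:
    version: int
    activated_by: str


class ProfileHistory:
    """In-memory stand-in for a versioned profile with an append-only activation log."""

    def __init__(self) -> None:
        self._models: dict[int, str] = {}
        self.activations: list[Activation] = []

    def add_version(self, model_id: str) -> int:
        version = len(self._models) + 1
        self._models[version] = model_id
        return version

    def activate(self, version: int, by: str) -> None:
        if version not in self._models:
            raise KeyError(f"unknown profile version {version}")
        self.activations.append(Activation(version, by))  # never edited in place

    def active_model(self) -> str:
        return self._models[self.activations[-1].version]

    def rollback(self, by: str) -> int:
        """Re-activate the most recent version that differs from the current one."""
        current = self.activations[-1].version
        for past in reversed(self.activations[:-1]):
            if past.version != current:
                self.activate(past.version, by)
                return past.version
        raise LookupError("no prior version to roll back to")
```

Because a rollback appends a new activation instead of deleting the latest one, the log keeps a complete record of who activated which version and in what order.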

D9 — `LLMProvider` protocol lives in `spectral.core`

spectral.core.llm.protocols.LLMProvider. The protocol is a narrow, focused contract over types (ResolvedProfile, LLMResponse, ToolResponse) that already belong in core. It is admitted under ADR-065’s admission discipline (kernel admission validated by tools/quality/validate_architecture.py rules 1–7 + ruff D100 / D104).
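To show why a single protocol in core is enough to serve as a mock boundary, here is a hedged sketch. The method name complete, its synchronous signature, and the fields on ResolvedProfile and LLMResponse are all assumptions; the landed protocols.py is likely richer and may be async.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class ResolvedProfile:
    model_id: str  # illustrative subset; the real type also carries the enforcement envelope
    purpose: str


@dataclass(frozen=True)
class LLMResponse:
    text: str
    input_tokens: int
    output_tokens: int


@runtime_checkable
class LLMProvider(Protocol):
    """Hypothetical shape of the core contract."""

    def complete(self, profile: ResolvedProfile, prompt: str) -> LLMResponse: ...


class FakeLLMProvider:
    """Test double that returns a canned response and records every call."""

    def __init__(self, canned_text: str) -> None:
        self._canned_text = canned_text
        self.calls: list[tuple[ResolvedProfile, str]] = []

    def complete(self, profile: ResolvedProfile, prompt: str) -> LLMResponse:
        self.calls.append((profile, prompt))
        return LLMResponse(text=self._canned_text, input_tokens=len(prompt.split()), output_tokens=1)
```

The fake satisfies the protocol structurally, without inheriting from it, so each context can be tested against the same contract without importing a provider SDK.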

D10 — Alpha fallback chain defaults

Same-provider-family defaults to preserve prompt-caching affinity:

reasoning, agent_turn, agent_tool, world_agent: Claude Opus 4.7 → Claude Sonnet 4.6 → Claude Haiku 4.5
scoring, detection: Gemini 2.5 Flash → Gemini 3.1 Flash-Lite
customer_replay: matches the customer agent’s model (Spectral-managed default; BYO when D4 opens)
embedding: deferred to TA-11 / ADR-038

Cross-provider fallback (e.g., Claude → GPT-5.4) is configurable per profile, not default, until cost/reliability signals argue otherwise.
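The alpha chains, combined with the provider-level retry rule from D7, could be exercised with a sketch like the one below. The model strings are placeholder labels derived from the names above, not real provider model IDs. TransientProviderError and call_with_fallback are hypothetical names, "max 3" is read as one initial attempt plus up to three retries, and the per-tenant retry budget is left out.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Sequence


class TransientProviderError(Exception):
    """Stand-in for an HTTP 5xx or 429 from a provider."""


# Placeholder labels for two of the D10 chains.
FALLBACK_CHAINS: dict[str, tuple[str, ...]] = {
    "reasoning": ("claude-opus-4.7", "claude-sonnet-4.6", "claude-haiku-4.5"),
    "scoring": ("gemini-2.5-flash", "gemini-3.1-flash-lite"),
}


def call_with_fallback(
    chain: Sequence[str],
    call: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying transient errors with exponential backoff."""
    last_error: Optional[Exception] = None
    for model in chain:
        for attempt in range(max_retries + 1):
            try:
                return call(model)
            except TransientProviderError as exc:
                last_error = exc
                if attempt < max_retries:
                    sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s with the defaults
    raise RuntimeError("fallback chain exhausted") from last_error
```

Only transient transport errors are caught; validation and refusal failures are expected to propagate to the purpose-level policy instead of advancing the chain.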

Alternatives considered

LiteLLM Proxy self-hosted (the pro-gateway lead candidate). Rejected: control-plane fragmentation, tenant-attribution mismatch, and the LiteLLM supply-chain event carrying forward; the added operational overhead also does not shrink the scope of Spectral semantics that must still live somewhere.

Portkey OSS gateway. Rejected: same core reasons plus config-DSL lock-in (virtual keys, configs, guardrails not OpenAI-compat-portable).

Cloudflare AI Gateway / Vercel AI Gateway / OpenRouter. Rejected: external-service latency tax; vendor-hosted control-plane lock-in; none express purpose-level quality degradation.

Keep LiteLLM SDK alongside pydantic-ai. Rejected post-supply-chain event.

Rebuild Spectral Agent on pydantic-graph to unify SDKs. Rejected as out of scope for TA-10; that would be an ADR-007 revisit.

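LLMProvider protocol per context (duplicated). Rejected: duplicating the contract in each context would be pedantry over discipline, since the types it spans already belong in core (D9).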

Purpose-level degradation via gateway circuit breakers. Rejected: gateways trip on transport errors, not structured-output validation failures.

Inverted resolution precedence (global > workspace > customer-fixed). Rejected: customer-fixed must be highest-priority override; locked classification handles platform-locked purposes.

Consequences

ADR-003 partially superseded on the SDK stack portion. ADR-008 stands with cross-reference.
init_chat_model wired to LangChain-native provider packages (langchain-anthropic, langchain-openai, langchain-google); never ChatLiteLLM. No ADR-007 text change.
src/spectral/core/llm/ subpackage landed at commit 4c637af (back-fill during SPEC-319): purposes.py, profiles.py, budgets.py, degradation.py, usage.py, pricing.py, protocols.py, __init__.py, plus tests/core/test_contract_llm.py. Every addition was approved under the contract-requirement-test discipline in force at the time; that discipline is now superseded by ADR-065’s admission discipline and the per-PR lint that previously enforced it (tools/quality/check_core_contract_tests.py) was retired in Phase 5 / M2.
core.llm_usage, core.llm_profiles, core.llm_profile_changes, core.llm_usage_daily are first residents of the core schema per ADR-032 D2.
tools/quality/check_llm_sdk_allowlist.py is load-bearing — any regression (e.g., re-introducing ChatLiteLLM) is caught at pre-push.
TA-11 / ADR-038 inherits the embedding purpose-key reservation.
TA-15 / ADR-060 inherits the degradation + budget contracts for agent tool invocation.
TA-16 / ADR-036 inherits the OTel GenAI-semconv contract with Spectral attributes.
TA-17 / ADR-037 owns credential storage; BYO subscription mechanics unblock a later spike that defines OAuth flows with providers supporting federated subscriptions.
TA-24 / ADR-061 inherits the LLMProvider protocol as the mock boundary; FakeLLMProvider implements it.
Alpha rate limiting is in-process; the RateLimiter protocol keeps the backend swap localized when horizontal scale arrives.

References

ADR-003 — retired; original LiteLLM decision distilled into the addendum below
ADR-007 — LangGraph + init_chat_model
ADR-008 — pydantic-ai for the scan pipeline (this ADR generalizes it)
ADR-065 — spectral.core admission discipline
ADR-031 — single-library structure
ADR-033 — TenantScopedLLMProvider pattern (analogous to TenantScopedQuery)
ADR-036 — OTel substrate; LLM observability streams
ADR-038 — TA-11 (embedding purpose-key consumer)
ADR-060 — TA-15 agent tool invocation
ADR-061 — TA-24 LLM testing strategy (FakeLLMProvider)
TA-10 disposition — SPEC-313 comment 0a50c35f
TA-10 verification — SPEC-313 comment 53719313
tools/quality/check_llm_sdk_allowlist.py — SDK allowlist lint (commit 1ed000c)
src/spectral/core/llm/ — landed contract surface (commit 4c637af)

Addendum: ADR-003 — LiteLLM for Multi-Provider LLM Abstraction

ADR-003 (Accepted 2026-03-21; retired by this ADR) selected LiteLLM as the multi-provider LLM SDK abstraction for the v0.2 scanning pipeline. The premise was that calls fanned out across reasoning / scoring / detection / customer tiers to multiple providers (Anthropic, Google, OpenAI), and a unified SDK was preferred over per-provider SDK integrations with separate retry, cost, and error handling.

Why a future reader should know about ADR-003:

The multi-provider abstraction principle it established is preserved here (Spectral still routes across providers and tiers), but the realization is different: pydantic-ai is the in-process SDK abstraction, and the in-process control plane (D2 and D4–D7 here) carries the routing, retry, cost, and budget concerns that ADR-003 had delegated to LiteLLM.
LiteLLM is removed from the dependency graph entirely. The tools/quality/check_llm_sdk_allowlist.py lint catches any regression that re-introduces it.
The HTTP-LLM-gateway alternative ADR-003 had implicitly framed (a separate proxy service) is rejected here in favor of an in-process control plane (D2).

Git history at the commit retiring ADR-003 preserves the original text.
