ADR-035: LLM stack — in-process control plane; canonical purpose taxonomy
Context
ADR-007 chose LangGraph for agent orchestration. TA-10’s scope (per SPEC-257 absorbing M7) covered the cross-cutting LLM control plane: hierarchical profiles (global → customer-org → customer-domain), purpose taxonomy, override classification, per-context and per-purpose rate limits and quotas, cost management with budget enforcement, graceful degradation, and profile versioning with rollback and A/B reservation. (The per-node SDK layer is the LangChain provider packages, per ADR-102; this ADR settles the control plane that wraps it.)
A key open question surfaced during the spike: introduce an out-of-process HTTP LLM gateway (LiteLLM Proxy self-hosted, Portkey, Cloudflare AI Gateway, Vercel AI Gateway, OpenRouter) as the routing/quota/cost/failover layer, or build that control plane in-process on top of the SDK layer? Adversarial research and a primary-source landscape survey informed the disposition; the LiteLLM March 2026 supply-chain compromise tipped the balance further.
Vocabulary used throughout:
- Provider API — Anthropic / OpenAI / Google endpoints.
- SDK abstraction — in-process Python library normalizing provider differences (the LangChain provider packages, per ADR-102).
- HTTP LLM gateway — out-of-process HTTP service between the app and providers (LiteLLM Proxy, Portkey, etc.).
- Control plane — routing / profile resolution / quota / budget / degradation. Spectral builds this in-process.
Decision
D1 — SDK abstractions
The LLM SDK layer is the LangChain provider packages (ADR-102); pydantic-ai is removed. LangGraph orchestration (per ADR-007) stands. The in-process control-plane decisions (D2–D10) are independent of this SDK-layer choice — their integration seam re-seats onto the LangChain call boundary (ADR-102 D4).
D2 — No out-of-process HTTP LLM gateway
The control plane is built in-process in Python, wrapping the LangChain call boundary (ADR-102 D4). Call path: Spectral application code → in-process control plane → LangChain provider package → provider API.
Rejection grounds (Spectral-specific):
- Source-of-truth fragmentation versus spectral.core admission discipline. Routing rules, override classification, degradation policy, and profile change governance are product semantics governed by src/spectral/core/ (ADR-065 and the architecture validator). A gateway’s YAML / virtual-key / config DSL moves the source of truth outside the repo’s governance.
- Tenant attribution at the (org_id, domain_id, bc, purpose) tuple (naming aligned per ADR-086) does not fit gateway virtual-key hierarchies without key-explosion or post-hoc reconciliation lag.
- Purpose-level degradation is not provider failover. “Scoring fails 5× → upgrade remaining scoring calls to reasoning for this evaluation run” is evaluation-consistency policy tied to run state. Gateways implement provider-level failover; they do not trip on structured-output validation failures or carry run-scoped routing state.
- Critical-path failure domain. An HTTP gateway is a new outage surface. In-process means LLM availability == provider API availability.
- The LiteLLM supply-chain event (March 2026) is a reputational risk even for self-hosted deployments; adopting LiteLLM Proxy would keep LiteLLM in the image.
Exit is symmetric: if the in-process control plane ever hits a wall, the LangChain provider packages’ base_url override makes introducing a gateway below the SDK layer a localized infrastructure change.
D3 — Canonical purpose taxonomy (contract shared between contexts)
spectral.core.llm.purposes.PurposeKey enum:
- scoring: evaluation scoring (high volume, cost-sensitive)
- detection: anti-deception, parse validation (high volume, very cost-sensitive)
- reasoning: diagnosis, optimization, calibration rewrites, rule distillation (quality-critical)
- agent_turn: World Agent conversational turn
- agent_tool: agent tool-invocation call (may differ from turn)
- world_agent: World Agent exploration / hypothesis (reasoning-tier; worlds-specific)
- customer_replay: re-executing customer agents during the observe phase
- embedding: embedding generation (full policy in TA-11; key reserved here)
This is shared between contexts because events, cost rollups, and observability aggregate on this key; both contexts must agree.
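As a minimal sketch of the taxonomy above, the enum could look like the following. The string-valued base class, the member names, and the parse_purpose helper are assumptions for illustration; only the eight key strings come from this ADR.

```python
from enum import Enum


class PurposeKey(str, Enum):
    """Canonical purpose keys; the string values are what events and rollups would carry."""

    SCORING = "scoring"
    DETECTION = "detection"
    REASONING = "reasoning"
    AGENT_TURN = "agent_turn"
    AGENT_TOOL = "agent_tool"
    WORLD_AGENT = "world_agent"
    CUSTOMER_REPLAY = "customer_replay"
    EMBEDDING = "embedding"


def parse_purpose(raw: str) -> PurposeKey:
    """Hypothetical boundary helper: reject keys outside the shared set."""
    try:
        return PurposeKey(raw)
    except ValueError:
        raise ValueError(f"unknown purpose key: {raw!r}") from None
```

Rejecting unknown keys at the boundary is one way to keep both contexts aggregating on the same set of values.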
D4 — Hierarchical profile model with DB-backed credential source
spectral.core.llm.profiles + core.llm.infrastructure.resolver:
- Resolution precedence: customer-domain > customer-org > global, per-purpose granularity — the most-specific customer scope wins, falling back through the org to the global default. The scopes are the domain/org entity pair of ADR-098. The global tier (operator-managed) and the customer-org BYO tier are implemented; the domain scope is schema-ready and named, resolved when the per-domain tier ships.
- Override classification per purpose: locked (platform-only; default for detection, customer_replay); operator_allowed (org/domain can override within platform-approved IDs; default for scoring, reasoning, agent_*); open (any supported model; enterprise contract tier, not alpha).
- DB-backed credential source (implemented for the global + org tiers): ModelProfile.credential_source: Literal["spectral_managed", "org_byo", "domain_byo"] = "spectral_managed" plus credential_ref: str | None.
  - Configuration lives in the DB control plane, not in the environment: core.llm_profiles holds the selection (provider, model, override-class, credential_ref) and credential_ref addresses a Supabase Vault credential (D6 of ADR-037). No LLM selection config is read from the environment; the deployment-environment identity SPECTRAL_ENVIRONMENT gates only the local-dev subscription-OAuth exception at the global tier (api_key-only in staging/production).
  - org_byo is implemented (the customer-org BYO tier): an org admin sets a provider/model plus either an api_key or a customer-supplied refreshable OAuth bundle, isolated per membership (org-scoped RLS over caller_org_ids() + the Vault SECURITY DEFINER functions).
  - The org OAuth credential is a supplied secret that Spectral stores and refreshes in place (the redirect-free refresh_token grant). Spectral does not run an authorization flow, so no registered web OAuth client is required; the customer obtains the bundle via a PKCE login helper. An org BYO subscription is the customer’s own, so it is not gated to local-dev (unlike the global managed-subscription exception).
  - The xAI and Anthropic subscription bundles are standard bearer credentials. The OpenAI ChatGPT subscription is experimental: its token is scoped to OpenAI’s Codex backend, so calls must conform to the Codex request shape (the chatgpt.com/backend-api/codex Responses endpoint, the Codex instructions envelope, and the ChatGPT-Account-ID / originator headers, with the account id carried on the bundle). That conformance is encapsulated in the OpenAI provider branch (the api-key OpenAI path is untouched). OpenAI Cloudflare-blocks datacenter origins, so the OpenAI subscription path is non-datacenter-only and not usable from a hosted deployment; the supported production path for OpenAI is an org-BYO api_key.
  - domain_byo stays reserved for the per-domain tier.
- Resolver is a pure function: resolve_active_profile(domain_id, org_id, purpose) → ResolvedProfile | None (precedence above), backed by a per-request DbBackedLlmResolver whose in-process cache is invalidated on profile write (a config change takes effect without redeploy). ResolvedProfile carries model ID plus enforcement envelope (rate-limit, budget, fallback chain, defaults). Both deployment surfaces resolve the same active profile through the shared worlds.infrastructure.authoring_llm bundle, so the operator-set config governs the World Agent wherever it runs: the operator process, the chat consumer (per turn), and the evolution proposer (per event).
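The resolution precedence in D4 can be sketched as a pure lookup over scopes, most specific first. This is an illustration under stated assumptions, not the repository's resolver: the explicit table argument, the in-memory dict standing in for active core.llm_profiles rows, and the scope field on ResolvedProfile are all hypothetical, and the enforcement envelope is omitted.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Scope = Literal["domain", "org", "global"]


@dataclass(frozen=True)
class ResolvedProfile:
    model_id: str
    scope: Scope  # hypothetical field: which tier supplied the profile


# Hypothetical stand-in for active rows in core.llm_profiles, keyed by
# (scope, scope_id, purpose); the global tier uses scope_id None.
ProfileTable = dict[tuple[Scope, Optional[str], str], str]


def resolve_active_profile(
    table: ProfileTable, domain_id: str, org_id: str, purpose: str
) -> Optional[ResolvedProfile]:
    """Most-specific scope wins: customer-domain, then customer-org, then global."""
    candidates: list[tuple[Scope, Optional[str]]] = [
        ("domain", domain_id),
        ("org", org_id),
        ("global", None),
    ]
    for scope, scope_id in candidates:
        model_id = table.get((scope, scope_id, purpose))
        if model_id is not None:
            return ResolvedProfile(model_id=model_id, scope=scope)
    return None  # no tier defines this purpose
```

With a table holding a global and an org-level entry for scoring, a call for that org returns the org entry, while a call for any other org falls through to the global default.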
D5 — Per-context / per-purpose rate limit and quota
Token-bucket per (org_id, domain_id, purpose); per-org/domain per-purpose daily spend cap. Alpha implementation: in-process (single API + single worker). The RateLimiter contract (Protocol) stays in the pure core.llm zone; its backend implementation and the TenantScopedLLMProvider enforcement wrapper around the LangChain call boundary (analogous to TenantScopedQuery from ADR-033) are infrastructure, living in the core infrastructure zone at core.llm.infrastructure (ADR-099 — may import infra SDKs but no bounded context or domain types, per the core-infra-zone-no-context rule). A Redis or pg_advisory_locks backend is introduced when horizontal scale forces it; per-context repos cannot bypass.
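A minimal sketch of the alpha in-process backend behind the RateLimiter Protocol, assuming a single process with no concurrent callers. The method name try_acquire, the capacity and refill parameters, and the injectable clock are assumptions; the ADR fixes only the (org_id, domain_id, purpose) key and the Protocol boundary.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Protocol

BucketKey = tuple[str, str, str]  # (org_id, domain_id, purpose)


class RateLimiter(Protocol):
    """Contract only; a Redis or advisory-lock backend could satisfy it later."""

    def try_acquire(self, key: BucketKey, tokens: float = 1.0) -> bool: ...


@dataclass
class InProcessTokenBucket:
    """Single-process token bucket; not thread-safe as written."""

    capacity: float
    refill_per_second: float
    clock: Callable[[], float] = time.monotonic
    _state: dict = field(default_factory=dict)  # key -> (level, last_seen)

    def try_acquire(self, key: BucketKey, tokens: float = 1.0) -> bool:
        now = self.clock()
        level, last = self._state.get(key, (self.capacity, now))
        # Refill for the elapsed time, never above capacity.
        level = min(self.capacity, level + (now - last) * self.refill_per_second)
        if level < tokens:
            self._state[key] = (level, now)
            return False
        self._state[key] = (level - tokens, now)
        return True
```

Because callers depend only on the Protocol, swapping this class for a shared backend would not change the enforcement wrapper.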
D6 — Cost management, budget enforcement, and admin-surface contracts
- genai-prices is the primary cost source; a fallback registry in spectral.core.llm.pricing covers models genai-prices does not yet know.
- Per-call emits (a) an OTel span with GenAI-semconv plus Spectral attributes (spectral.purpose, spectral.org_id, spectral.domain_id, spectral.bc, spectral.scan_id?, spectral.agent_turn_id?) consumed per ADR-036, AND (b) a row to core.llm_usage for in-app budget enforcement.
- Rolling spend is read from core.llm_usage; on-call check fails closed with Result[BudgetExceeded] when the (org, domain, purpose) daily cap is exceeded. Hard caps only in alpha; soft alerts (warn-at-80%) deferred.
LLM budget/usage state lives in the core schema. core.llm_usage is the authoritative per-call usage log (cost attribution, audit, admin/ops queries); no budget/usage state lives in the platform schema. Per-call hard-cap enforcement uses a dedicated atomic counter core.llm_budget_state in the core schema — a single INSERT … ON CONFLICT DO UPDATE … WHERE-guarded check-and-increment (race-free, O(1)) — rather than summing the append-only core.llm_usage log on the hot path or reading the intra-period-stale core.llm_usage_daily rollup. The counter is an enforcement-optimized derivation of the log, rebuildable from it at period reset. (See ADR-099.)
- Admin-surface contracts pinned now (UIs built later):
  - core.llm_usage queryable by org/domain admins scoped to their domains: RLS policy on domain_id + org_id; no schema-qualified access between contexts.
  - core.llm_usage queryable platform-wide by ops for cost attribution + audit: a cross-tenant observability read. core.llm_usage is customer-tenanted (RLS on org_id / domain_id), so per ADR-033 D4 this is not a standing service_role exemption over customer data: the platform-wide ops access mechanism is reconciled there (via a platform-owned usage rollup, which has no customer RLS by construction, or per-tenancy assumed-identity reads) and remains an open reconciliation item.
  - Rollup: core.llm_usage_daily materialized view (or nightly-computed table) with (org_id, domain_id, bc, purpose, model, date, input_tokens, output_tokens, cost_estimate, call_count) for cheap org/domain-admin queries without scanning the hot table.
  - Org/domain-admin UI in apps/dashboard and ops control-plane UI in apps/operations consume these views via FastAPI per ADR-034.
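The guarded check-and-increment that D6 describes for core.llm_budget_state can be illustrated with a single upsert statement. This sketch uses SQLite through Python's sqlite3 module as a stand-in for the Postgres table; the column names, the integer micro-unit cost, the period string, and the try_charge helper are all assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE llm_budget_state (
    org_id TEXT NOT NULL,
    domain_id TEXT NOT NULL,
    purpose TEXT NOT NULL,
    period TEXT NOT NULL,
    spent_micros INTEGER NOT NULL,
    PRIMARY KEY (org_id, domain_id, purpose, period)
)
"""

# One statement performs the check and the increment; the WHERE guard turns an
# over-cap update into a no-op instead of a write.
CHARGE = """
INSERT INTO llm_budget_state (org_id, domain_id, purpose, period, spent_micros)
VALUES (:org, :domain, :purpose, :period, :cost)
ON CONFLICT (org_id, domain_id, purpose, period) DO UPDATE
SET spent_micros = llm_budget_state.spent_micros + excluded.spent_micros
WHERE llm_budget_state.spent_micros + excluded.spent_micros <= :cap
"""


def try_charge(
    conn: sqlite3.Connection,
    org: str,
    domain: str,
    purpose: str,
    period: str,
    cost: int,
    cap: int,
) -> bool:
    """Return True if the charge was recorded, False if it would exceed the cap."""
    if cost > cap:
        # The WHERE guard covers only the update path, so a first insert that is
        # already over the cap is rejected here.
        return False
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.execute(
            CHARGE,
            {"org": org, "domain": domain, "purpose": purpose,
             "period": period, "cost": cost, "cap": cap},
        )
    # A guarded no-op changes zero rows, so the change counter does not move.
    return conn.total_changes > before
```

A caller would treat a False result as the fail-closed BudgetExceeded case and skip the provider call.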
D7 — Graceful degradation across two orthogonal axes
- Provider-level (transport): HTTP 5xx/429 → exponential backoff retry (max 3) on the same provider; then next provider in the purpose’s fallback chain. Per-tenant retry budget prevents one bad org/domain from exhausting provider rate limits for the population.
- Purpose-level (quality): consecutive validation/refusal failures on a purpose within a window trigger upgrade rather than lateral fallback: DegradationPolicy(trigger=N_failures_in_window, action=upgrade_to_purpose).
Circuit transitions emit domain events (llm.circuit.opened, llm.circuit.closed) consumed by ADR-036 observability.
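The purpose-level axis can be sketched as a run-scoped tracker that counts consecutive failures inside a window and, once tripped, routes the remaining calls to the upgrade purpose. The field names, the RunDegradationState class, and the caller-supplied timestamps are assumptions; the ADR names only DegradationPolicy with a trigger and an upgrade action, and event emission is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DegradationPolicy:
    trigger_failures: int    # N consecutive failures that trip the upgrade
    window_seconds: float    # failures older than this no longer count
    upgrade_to_purpose: str  # purpose whose profile takes over


@dataclass
class RunDegradationState:
    """Run-scoped tracker: once tripped, the remaining calls stay upgraded."""

    policy: DegradationPolicy
    purpose: str
    _failures: deque = field(default_factory=deque)
    _tripped: bool = False

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self._failures.clear()  # a success breaks the consecutive streak
            return
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.policy.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.policy.trigger_failures:
            self._tripped = True

    def effective_purpose(self) -> str:
        return self.policy.upgrade_to_purpose if self._tripped else self.purpose
```

Keeping the tripped flag on the run rather than on the provider is what separates this from transport-level failover: the state ends with the evaluation run.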
D8 — Profile versioning with rollback (A/B reserved)
core.llm_profiles is persisted with (id, version, activated_at, deactivated_at, created_by, audit_log). Activation is append-only; rollback = re-activate a prior version (a profile’s credential_ref stays resolvable across a rollback — the credential read is by id, independent of the active flag). The operator dashboard (apps/operations) drives set / version-history / rollback over the /operator/llm-config routes. variant: str | None reserved on schema for A/B; no resolver branch in alpha. Governance: locked-override modifications require a platform-role audit entry; operator_allowed changes are domain-admin-auditable.
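One way to read "activation is append-only; rollback = re-activate a prior version" is an immutable version table plus an activation log whose last entry names the active version. The sketch below is in-memory and hypothetical throughout (class names, method names, and the float timestamps); it shows the shape of the idea, not the core.llm_profiles schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activation:
    version: int
    activated_at: float
    created_by: str


class ProfileHistory:
    """Versions are never edited; activations are only ever appended."""

    def __init__(self) -> None:
        self.versions: dict[int, str] = {}        # version -> model id
        self.activations: list[Activation] = []   # append-only audit trail

    def publish(self, model_id: str, now: float, actor: str) -> int:
        version = len(self.versions) + 1
        self.versions[version] = model_id
        self.activations.append(Activation(version, now, actor))
        return version

    def rollback(self, version: int, now: float, actor: str) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown profile version: {version}")
        # Rollback is just another activation of an existing version.
        self.activations.append(Activation(version, now, actor))

    def active_model(self) -> Optional[str]:
        if not self.activations:
            return None
        return self.versions[self.activations[-1].version]
```

Because a rollback appends rather than rewrites, the version history stays a complete record of who activated what and when.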
D9 — LLMProvider protocol lives in spectral.core
The LLMProvider contract (spectral.core.llm.protocols.LLMProvider) is a narrow, focused Protocol over types (ResolvedProfile, LLMResponse, ToolResponse) that already belong in core, and stays in the pure core.llm zone. Its concrete implementations are infrastructure: TracedProvider, the ModelProfileRegistry / router, and the budget/rate-limit enforcement concretes live in the core infrastructure zone at core.llm.infrastructure (ADR-099 — the infra zone may import infra SDKs but no bounded context and no domain types, per the core-infra-zone-no-context rule). LLMContext — the cross-context routing/budget vocabulary — is its own enum at core.llm (shared routing/budget vocabulary, not a platform domain type, and not folded into ContentClass). Kernel admission is validated by tools/quality/validate_architecture.py per the ADR-097 rule registry + ruff D100 / D104.
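A narrow Protocol of this kind also gives tests a natural mock boundary. The sketch below assumes a single synchronous complete method and minimal ResolvedProfile and LLMResponse shapes; the real signatures are not given in this ADR, so every field and method name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class ResolvedProfile:
    model_id: str  # the real type also carries the enforcement envelope


@dataclass(frozen=True)
class LLMResponse:
    text: str
    model_id: str


@runtime_checkable
class LLMProvider(Protocol):
    """Hypothetical one-method contract; implementations live in infrastructure."""

    def complete(self, profile: ResolvedProfile, prompt: str) -> LLMResponse: ...


@dataclass
class FakeLLMProvider:
    """Test double: returns a canned answer and records what it was asked."""

    canned: str = "ok"
    calls: list = field(default_factory=list)

    def complete(self, profile: ResolvedProfile, prompt: str) -> LLMResponse:
        self.calls.append((profile.model_id, prompt))
        return LLMResponse(text=self.canned, model_id=profile.model_id)
```

Structural typing means the fake needs no inheritance from the Protocol; it only has to match the method shape.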
D10 — Alpha fallback chain defaults
Same-provider-family defaults to preserve prompt-caching affinity:
- reasoning, agent_turn, agent_tool, world_agent: Claude Opus 4.7 → Claude Sonnet 4.6 → Claude Haiku 4.5
- scoring, detection: Gemini 2.5 Flash → Gemini 3.1 Flash-Lite
- customer_replay: matches the customer agent’s model (Spectral-managed default; BYO when D4 opens)
- embedding: deferred to TA-11 / ADR-038
Cross-provider fallback (e.g., Claude → GPT-5.4) is configurable per profile, not default, until cost/reliability signals argue otherwise.
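The transport-level behaviour from D7, applied to a fallback chain like the defaults above, can be sketched as retry-with-backoff per entry and then a step down the chain. The function name, the TransportError stand-in for HTTP 5xx/429, the base delay, and the reading of "max 3" as three retries after the first attempt are assumptions; the per-tenant retry budget is omitted.

```python
import time
from typing import Callable, Optional, Sequence


class TransportError(Exception):
    """Stand-in for an HTTP 5xx/429 response from a provider."""


def call_with_fallback(
    chain: Sequence[str],
    call: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry each entry with exponential backoff, then move down the chain."""
    last_error: Optional[Exception] = None
    for model_id in chain:
        for attempt in range(max_retries + 1):
            try:
                return call(model_id)
            except TransportError as exc:
                last_error = exc
                if attempt < max_retries:
                    sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s by default
    raise RuntimeError("fallback chain exhausted") from last_error
```

Only transport errors are caught here; a validation or refusal failure would propagate to the purpose-level policy instead of triggering a lateral move.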
Alternatives considered
LiteLLM Proxy self-hosted (the pro-gateway lead candidate). Rejected: control-plane fragmentation; tenant-attribution mismatch; the LiteLLM supply-chain event persists as reputational risk; and the operational overhead does not shrink the scope of Spectral semantics that must still live somewhere.
Portkey OSS gateway. Rejected: same core reasons plus config-DSL lock-in (virtual keys, configs, guardrails not OpenAI-compat-portable).
Cloudflare AI Gateway / Vercel AI Gateway / OpenRouter. Rejected: external-service latency tax; vendor-hosted control-plane lock-in; none express purpose-level quality degradation.
Carry a second LLM SDK alongside the sanctioned one. Rejected — a second SDK that nothing uses is dead weight and a second supply-chain surface (the LiteLLM March 2026 event sharpened this).
Rebuild the agent on pydantic-graph to unify SDKs. Rejected as TA-10 scope — that is an ADR-007 revisit.
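LLMProvider protocol per context (duplicated). Rejected: pedantry over discipline; the contract is a narrow Protocol over types that already belong in core (D9), so per-context copies would only duplicate it.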
Purpose-level degradation via gateway circuit breakers. Rejected: gateways trip on transport errors, not structured-output validation failures.
Inverted resolution precedence (global > org > domain). Rejected: the most-specific customer scope must be the highest-priority override (D4); locked classification handles platform-locked purposes.
Consequences
- The LLM SDK layer is the LangChain provider packages (ADR-102); the in-process control plane (this ADR) wraps it.
- init_chat_model wired to LangChain-native provider packages (langchain-anthropic, langchain-openai, langchain-google); never ChatLiteLLM. No ADR-007 text change.
- src/spectral/core/llm/ subpackage: purposes.py, profiles.py, budgets.py, degradation.py, usage.py, pricing.py, protocols.py, __init__.py, plus tests/core/test_contract_llm.py. Additions to this contract surface are governed by ADR-065’s admission discipline.
- core.llm_usage, core.llm_profiles, core.llm_profile_changes, core.llm_usage_daily are first residents of the core schema per ADR-032 D2.
- tools/quality/check_llm_sdk_allowlist.py is load-bearing: any regression (e.g., re-introducing ChatLiteLLM) is caught at pre-push.
- TA-11 / ADR-038 inherits the embedding purpose-key reservation.
- TA-15 / ADR-060 inherits the degradation + budget contracts for agent tool invocation.
- TA-16 / ADR-036 inherits the OTel GenAI-semconv contract with Spectral attributes.
- ADR-037 owns credential storage; its Supabase Vault store backs the credential_ref for the global tier and the customer-org BYO tier (api_key + refreshable oauth bundle), with the per-domain BYO tier resolving through the same store when it ships.
- TA-24 / ADR-061 inherits the LLMProvider protocol as the mock boundary; FakeLLMProvider implements it.
- Alpha rate-limit in-process: the RateLimiter protocol keeps swap localized when horizontal scale arrives.
References
- ADR-007: LangGraph + init_chat_model
- ADR-065: spectral.core admission discipline
- ADR-031: single-library structure
- ADR-033: TenantScopedLLMProvider pattern (analogous to TenantScopedQuery)
- ADR-036: OTel substrate; LLM observability streams
- ADR-038: TA-11 (embedding purpose-key consumer)
- ADR-060: TA-15 agent tool invocation
- ADR-061: TA-24 LLM testing strategy (FakeLLMProvider)
- ADR-099: core infrastructure zone; the LLM control-plane concretes (TracedProvider, ModelProfileRegistry / router, budget/rate-limit enforcement) live at core.llm.infrastructure; LLMContext lives at core.llm; LLM budget/usage state lives in the core schema, enforced via the dedicated core.llm_budget_state counter (D6)
- TA-10 disposition: SPEC-313 comment 0a50c35f
- TA-10 verification: SPEC-313 comment 53719313
- tools/quality/check_llm_sdk_allowlist.py: SDK allowlist lint (commit 1ed000c)
- src/spectral/core/llm/: the LLM contract surface