LLM Platform
Spectral’s LLM stack is in-process Python, built on pydantic-ai as the canonical SDK abstraction for direct LLM calls (scan pipeline, rule distillation, any non-LangGraph path). The control plane — routing, profile resolution, rate limits, budgets, graceful degradation — runs in-process; there is no out-of-process HTTP LLM gateway. Decision lineage in ADR-008 (pydantic-ai adoption) and ADR-035 (current stack composition).
Substrate
langchain-anthropic, langchain-openai, and langchain-google are permitted only as
LangChain chat-model adapters passed to LangGraph’s init_chat_model for agent
orchestration (per ADR-007). LiteLLM is not
in the dependency graph — pydantic-ai covers the provider surface Spectral uses, and the
March 2026 supply-chain compromise plus the pain points already named in
ADR-008 settled the decision against it.
Purpose taxonomy
spectral.core.llm.purposes.PurposeKey is the contract every LLM call carries across worlds and platform:
- scoring — evaluation scoring (high volume, cost-sensitive)
- detection — anti-deception, parse validation
- reasoning — diagnosis, optimization, calibration rewrites, rule distillation
- agent_turn — conversational turn (Spectral / Ops Agent)
- agent_tool — tool-invocation call within an agent turn
- world_agent — World Agent exploration/hypothesis
- customer_replay — re-executing customer agents during observe
- embedding — embedding generation (full policy in embeddings)
Events, cost rollups, and observability all aggregate on this key.
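The taxonomy above can be sketched as a string-valued enum so events and cost rollups aggregate on plain strings. The member names come from the list; the enum shape and module layout are illustrative assumptions, not the actual spectral.core.llm.purposes source.

```python
from enum import Enum


class PurposeKey(str, Enum):
    """Purpose key carried by every LLM call (shape assumed; names from the taxonomy)."""

    SCORING = "scoring"
    DETECTION = "detection"
    REASONING = "reasoning"
    AGENT_TURN = "agent_turn"
    AGENT_TOOL = "agent_tool"
    WORLD_AGENT = "world_agent"
    CUSTOMER_REPLAY = "customer_replay"
    EMBEDDING = "embedding"


# str-valued members compare equal to raw strings, so event payloads and
# cost-rollup group-bys can key on the plain value.
assert PurposeKey.SCORING == "scoring"
```

Because the enum subclasses `str`, serialized events round-trip through `PurposeKey("scoring")` without a separate parsing step.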
Cost posture
The purpose taxonomy is also the cost-control lever. A scan exercises every purpose: scoring
runs on the high-volume cost-optimized tier (cheap models, lots of calls); reasoning and
world_agent run on the highest-capability tier (fewer calls, more expensive); detection and
agent_* sit between. The intentional asymmetry — capability where it pays off, throughput
where it doesn’t — is what keeps per-scan unit economics workable as workspace scale grows.
Concrete unit-cost ranges sharpen as Spectral observes real workspace usage; the structure that
makes those ranges defensible (purpose-level routing + workspace-override governance) is in
place from today.
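The asymmetry described above can be made concrete as a purpose-to-tier map. The tier names below are illustrative assumptions, not Spectral's actual routing configuration; only the purpose keys and the high/low placement come from the text.

```python
# Hypothetical purpose -> cost tier mapping illustrating the intentional
# asymmetry: capability where it pays off, throughput where it doesn't.
# Tier names are invented for this sketch.
PURPOSE_TIER: dict[str, str] = {
    "scoring": "high_volume",        # cheap models, lots of calls
    "detection": "balanced",
    "reasoning": "high_capability",  # fewer calls, more expensive
    "world_agent": "high_capability",
    "agent_turn": "balanced",
    "agent_tool": "balanced",
}
```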
Hierarchical profile resolution
Resolution precedence: customer-fixed > workspace-override > global-default, per-purpose granularity.
Override classification per purpose:
- locked — platform-only (default for detection, customer_replay)
- operator_allowed — workspace can override within platform-approved IDs (default for scoring, reasoning, agent_*)
- open — any supported model (enterprise contract tier)
ResolvedProfile carries the chosen model ID plus enforcement envelope (rate-limit, budget, fallback chain, defaults). Resolver is a pure function in each context’s application layer: resolve_profile(workspace_id, purpose) → ResolvedProfile.
ModelProfile.credential_source ∈ {"spectral_managed", "workspace_byo"} plus credential_ref: str | None. Today, all workspace credentials default to spectral_managed (platform-provisioned LLM credentials); workspace_byo reserves the path for customer-supplied subscriptions or API keys, with the AEAD-encrypted storage and binding mechanics to be populated in a future release when customer subscription federation ships.
Rate limit and budget
Token-bucket per (workspace_id, purpose); per-workspace per-purpose daily spend cap. A TenantScopedLLMProvider wrapper around pydantic-ai applies the rate-limit and budget envelope; repositories in worlds or platform cannot bypass.
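A minimal token-bucket keyed per (workspace_id, purpose) could look like the sketch below. The capacity and refill numbers are placeholders; the real TenantScopedLLMProvider presumably reads them from the ResolvedProfile envelope.

```python
import time


class TokenBucket:
    """Minimal token bucket; refills continuously at refill_per_sec."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# One bucket per (workspace_id, purpose), as the text prescribes.
buckets: dict[tuple[str, str], TokenBucket] = {}


def acquire(workspace_id: str, purpose: str) -> bool:
    bucket = buckets.setdefault(
        (workspace_id, purpose), TokenBucket(capacity=5, refill_per_sec=1)
    )
    return bucket.try_acquire()
```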
Cost source: genai-prices, with a fallback registry in spectral.core.llm.pricing for models the package does not yet know.
Per call emits:
- An OTel span with GenAI-semconv plus Spectral attributes (spectral.purpose, spectral.workspace_id, spectral.account_id, spectral.context, spectral.scan_id?, spectral.agent_turn_id?).
- A row to core.llm_usage for in-app budget enforcement.
Budget enforcement reads rolling spend from core.llm_usage; the per-call check fails closed with Result[BudgetExceeded] when the daily cap is exceeded.
Admin queries against core.llm_usage are app-layer-gated to workspace admins; RLS scopes the rows to the calling workspace, and admin-only access is enforced via scope checks on the route. Ops staff query platform-wide via the Supabase service_role connection (which is exempt from RLS policies). A core.llm_usage_daily rollup serves cheap “spend this month” queries.
Per-context quota isolation
Rate-limit budgets are independent per context — a spectral.worlds purpose burning its budget
must not starve a spectral.platform purpose, and vice versa. TenantScopedLLMProvider keys
its budget ledger on (workspace_id, context, purpose), not just (workspace_id, purpose). An
isolation test exercises this directly: the test exhausts one context’s budget for a purpose,
then verifies the same purpose still resolves successfully under the other context’s budget.
The test fails if either context’s quota leaks into the other’s accounting.
Graceful degradation (two axes)
- Provider-level (transport). HTTP 5xx/429 → exponential backoff retry (max 3) on the same provider; then next provider in the purpose’s fallback chain. Per-tenant retry budget prevents one bad workspace from exhausting provider rate limits for the population.
- Purpose-level (quality). Consecutive validation/refusal failures on a purpose within a window trigger upgrade rather than lateral fallback:
DegradationPolicy(trigger=N_failures_in_window, action=upgrade_to_purpose). Per-eval-dimension override carries forward as scan-scoped policy.
Circuit transitions emit domain events (llm.circuit.opened, llm.circuit.closed) consumed by the observability stack and the Spectral Agent.
Profile versioning and rollback
core.llm_profiles persists (id, version, active_at, deactivated_at, created_by, audit_log). Activation is append-only; rollback re-activates a prior version. The schema reserves a variant: str | None column for A/B routing without binding resolver branching logic today.
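Append-only activation with rollback-as-reactivation can be sketched in-memory. The row shape and helper names are assumptions; the real store is the core.llm_profiles table.

```python
import itertools

# Append-only activation ledger: rollback appends a new activation row
# pointing at a prior version's content — history is never mutated.
_version_counter = itertools.count(1)
ACTIVATIONS: list[dict] = []


def activate(profile_id: str, model_id: str, created_by: str) -> int:
    version = next(_version_counter)
    ACTIVATIONS.append(
        {"id": profile_id, "version": version, "model_id": model_id, "created_by": created_by}
    )
    return version


def rollback(profile_id: str, to_version: int, created_by: str) -> int:
    """Re-activate a prior version by appending, preserving the audit trail."""
    prior = next(
        r for r in ACTIVATIONS if r["id"] == profile_id and r["version"] == to_version
    )
    return activate(profile_id, prior["model_id"], created_by)


def active_version(profile_id: str) -> dict:
    """The most recently appended activation wins."""
    return [r for r in ACTIVATIONS if r["id"] == profile_id][-1]
```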
Governance: locked-override modifications require a platform-role audit entry; operator_allowed changes are workspace-admin-auditable.
Alpha fallback chain defaults
Same-provider-family to preserve prompt-caching affinity:
- reasoning, agent_turn, agent_tool, world_agent: Claude Opus 4.7 → Claude Sonnet 4.6 → Claude Haiku 4.5
- scoring, detection: Gemini 2.5 Flash → Gemini 3.1 Flash-Lite
- customer_replay: matches the customer agent’s model
- embedding: per embeddings D11 ladder
Cross-provider fallback is configurable per profile; not a default.
SDK allowlist
tools/quality/check_llm_sdk_allowlist.py (pre-push tier) asserts litellm absent from uv tree and forbids direct imports of raw provider SDKs (anthropic, openai, google.generativeai) from Spectral code. All calls flow through pydantic-ai or LangGraph/init_chat_model with langchain-anthropic / langchain-openai / langchain-google adapters.
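The import-forbidding half of such a check could be as simple as a regex scan over source text. This is a sketch, not the contents of the real tools/quality/check_llm_sdk_allowlist.py.

```python
import re

# Match top-level "import X" / "from X import ..." for the forbidden SDKs.
FORBIDDEN_IMPORTS = re.compile(
    r"^\s*(?:import|from)\s+(anthropic|openai|google\.generativeai|litellm)\b",
    re.MULTILINE,
)


def violations(source: str) -> list[str]:
    """Return the forbidden SDK names imported directly in a source string."""
    return FORBIDDEN_IMPORTS.findall(source)
```

A real pre-push check would walk the repository's Python files and fail the hook when `violations` is non-empty for any of them.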
Testing
FakeLLMProvider (in spectral.core.llm.testing) implements the LLMProvider protocol and returns canned responses keyed by purpose plus content-class. Integration tests use VCR-style cassettes; nightly drift detection compares live-provider output against cassettes. See testing for the full posture.
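A fake keyed by (purpose, content_class) could look like the sketch below. The `complete` method name and its signature are assumptions; the real LLMProvider protocol lives in spectral.core.llm.testing.

```python
class FakeLLMProvider:
    """Returns canned responses keyed by (purpose, content_class); records calls."""

    def __init__(self, canned: dict[tuple[str, str], str]):
        self._canned = canned
        self.calls: list[tuple[str, str]] = []

    def complete(self, purpose: str, content_class: str, prompt: str) -> str:
        self.calls.append((purpose, content_class))
        return self._canned[(purpose, content_class)]
```

Recording calls alongside canned responses lets tests assert both the output and that the caller routed through the expected purpose.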
See also
- ADR-008 — pydantic-ai as the canonical SDK abstraction
- ADR-035 — current LLM stack composition
- Embeddings — embedding purpose consumer
- Observability stack — OTel substrate; content-class routing
- Agent tool invocation — agent_tool purpose consumer