LLM Platform

Spectral’s LLM stack is in-process Python. LangChain is the LLM-execution layer — with_structured_output for structured generation and provider-native tool-calling (bind_tools / create_react_agent) are the per-node LLM execution inside the World Agent’s LangGraph harness. The control plane — routing, profile resolution, rate limits, budgets, graceful degradation — runs in-process and wraps the LangChain call boundary; there is no out-of-process HTTP LLM gateway. The control plane is context-agnostic kernel substrate: its contracts live in core.llm and its concrete implementations (the traced provider, the model-profile router, the budget / rate-limit enforcement) live in the core infra zone core.llm.infrastructure per ADR-099; the budget / usage state is core-schema (core.llm_usage). Decision lineage in ADR-102 (LangGraph + LangChain; DSPy deferred post-alpha) and ADR-035 (the in-process control plane).

LangGraph is the orchestration substrate (per ADR-007); LangChain with_structured_output + provider-native tool-calling is the LLM-execution layer inside each node. LLM access flows through the LangChain provider integration packages (langchain-anthropic, langchain-openai, langchain-google-genai), which wrap the provider SDKs. Spectral code does not call LiteLLM or raw provider SDKs directly, per ADR-102 D3. DSPy, with its Optimizer lever, is deferred post-alpha; the node-level seam is abstracted so it can be reintroduced per-node later.

spectral.core.llm.purposes.PurposeKey is the contract every LLM call carries across worlds and platform:

  • code_generation — World Agent generates predicate code from natural-language rules (highest-capability tier)
  • applies_when_generation — World Agent generates the optional context-only filter alongside a predicate
  • distillation — operator-driven distillation runs against source materials
  • reasoning — diagnosis, coverage reflection, restatement drafting
  • agent_turn — conversational turn (World Agent chat surface)
  • agent_tool — tool-invocation call within an agent turn
  • world_agent — World Agent exploration/hypothesis
  • embedding — embedding generation (full policy in embeddings)

Events, cost rollups, and observability all aggregate on this key.

The purpose taxonomy is also the cost-control lever. Authoring-time work — code_generation, applies_when_generation, distillation, reasoning, world_agent — runs on the highest-capability tier (fewer calls, more expensive); agent_* sits between authoring and high-volume use cases; embedding runs on the high-volume cost-optimized tier. The intentional asymmetry — capability where it pays off, throughput where it doesn’t — is what keeps the authoring-time cost envelope workable as the action registry scales. Concrete unit-cost ranges sharpen as Spectral observes real authoring volume; the structure that makes those ranges defensible (purpose-level routing + domain-override governance) is in place from today.
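
A minimal sketch of the purpose taxonomy and its tier asymmetry, assuming a string-valued enum. The member values mirror the keys listed above; the tier names and the `tier_for` helper are hypothetical and are not taken from spectral.core.llm.purposes.

```python
from enum import Enum


class PurposeKey(str, Enum):
    """Illustrative stand-in for spectral.core.llm.purposes.PurposeKey."""

    CODE_GENERATION = "code_generation"
    APPLIES_WHEN_GENERATION = "applies_when_generation"
    DISTILLATION = "distillation"
    REASONING = "reasoning"
    AGENT_TURN = "agent_turn"
    AGENT_TOOL = "agent_tool"
    WORLD_AGENT = "world_agent"
    EMBEDDING = "embedding"


# Authoring-time purposes, which the text places on the highest-capability tier.
AUTHORING_PURPOSES = frozenset(
    {
        PurposeKey.CODE_GENERATION,
        PurposeKey.APPLIES_WHEN_GENERATION,
        PurposeKey.DISTILLATION,
        PurposeKey.REASONING,
        PurposeKey.WORLD_AGENT,
    }
)


def tier_for(purpose: PurposeKey) -> str:
    """Map a purpose to a cost tier. The tier names are assumptions."""
    if purpose in AUTHORING_PURPOSES:
        return "highest_capability"
    if purpose is PurposeKey.EMBEDDING:
        return "cost_optimized"
    return "intermediate"  # agent_turn and agent_tool
```

Because every call carries one of these keys, a rollup can group by `purpose.value` without any per-caller mapping.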


Resolution precedence: customer-domain > customer-org > global, per-purpose granularity — the most-specific customer scope wins, falling back through the org to the global default.

Override classification per purpose:

  • locked — platform-only (default for code_generation, applies_when_generation — the authoring path)
  • operator_allowed — domain can override within platform-approved IDs (default for reasoning, agent_*, distillation)
  • open — any supported model (enterprise contract tier)
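
The three override classes can be read as a small decision function. The sketch below is an assumption about how a domain-level override request might be checked; `OverrideClass`, `domain_override_allowed`, and the two ID sets are hypothetical names.

```python
from enum import Enum
from typing import AbstractSet


class OverrideClass(str, Enum):
    LOCKED = "locked"
    OPERATOR_ALLOWED = "operator_allowed"
    OPEN = "open"


def domain_override_allowed(
    classification: OverrideClass,
    model_id: str,
    approved_ids: AbstractSet[str],
    supported_ids: AbstractSet[str],
) -> bool:
    """Decide whether a domain may override a purpose's model with model_id."""
    if classification is OverrideClass.LOCKED:
        # Platform-only: a domain can never override a locked purpose.
        return False
    if classification is OverrideClass.OPERATOR_ALLOWED:
        # The domain may pick only from the platform-approved IDs.
        return model_id in approved_ids
    # OPEN: any model the platform supports at all.
    return model_id in supported_ids
```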

ResolvedProfile carries the chosen model ID plus enforcement envelope (rate-limit, budget, fallback chain, defaults). Resolution is a pure precedence function — resolve_active_profile(domain_id, org_id, purpose) → ResolvedProfile | None — backed by a per-request resolver whose in-process cache is invalidated on profile write, so an operator config change takes effect without a redeploy.
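
A minimal sketch of the precedence function, assuming profiles sit in an in-memory mapping keyed by (scope, scope_id, purpose). The `store` parameter, the `scope` field, and the model IDs are illustrative assumptions; the real ResolvedProfile also carries the enforcement envelope, which is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ResolvedProfile:
    model_id: str
    scope: str  # "domain", "org", or "global" (field added for illustration)


# Hypothetical store: (scope, scope_id, purpose) -> model ID.
ProfileStore = Dict[Tuple[str, Optional[str], str], str]


def resolve_active_profile(
    store: ProfileStore,
    domain_id: Optional[str],
    org_id: Optional[str],
    purpose: str,
) -> Optional[ResolvedProfile]:
    """Return the most specific active profile for a purpose, or None."""
    # Most specific scope first: customer-domain > customer-org > global.
    candidates = (("domain", domain_id), ("org", org_id), ("global", None))
    for scope, scope_id in candidates:
        if scope != "global" and scope_id is None:
            continue  # the caller has no ID at this scope
        model_id = store.get((scope, scope_id, purpose))
        if model_id is not None:
            return ResolvedProfile(model_id=model_id, scope=scope)
    return None
```

Keeping the function pure means the cache only has to be dropped on profile write; the lookup itself has no state to invalidate.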

Configuration lives in the DB control plane, not the environment

The active provider, model, and credential are not read from the environment — they live in the DB control plane (core.llm_profiles for the selection, core.llm_credentials + Supabase Vault for the secret). The operator sets the global default through the operations cockpit; an org admin sets their organization’s own model through the customer dashboard. ModelProfile.credential_source distinguishes the platform-managed global credential (spectral_managed) from the customer-org bring-your-own credential (org_byo); credential_ref addresses the Vault secret, keeping raw keys out of core.llm_profiles. The global and customer-org tiers are live; the per-domain scope is named and resolves through the same Vault store when that tier ships.

The org BYO credential is one of two kinds: a static api_key, or a customer-supplied refreshable OAuth bundle. The bundle is a supplied secret — the customer obtains it themselves via a PKCE login helper and pastes it in; the platform stores it in Vault and renews it with the redirect-free refresh_token grant, rotating the Vault secret in place so the credential_ref stays stable across refreshes. The platform never runs an authorization flow, so no registered web OAuth client is needed. Org-scoped config is org-admin gated and isolated per membership (org-scoped RLS over the caller’s active orgs). Whether an OAuth (subscription) credential may be configured or resolved at all — at either the global or the org scope — is governed by the SPECTRAL_ENABLE_OAUTH_LLM_CONFIG feature flag (see below).

The xAI and Anthropic subscription bundles are standard bearer credentials. The OpenAI ChatGPT subscription is experimental: its token is scoped to OpenAI’s Codex backend rather than the public API, so the OpenAI provider branch conforms to the Codex request-shape (the chatgpt.com/backend-api/codex Responses endpoint, the Codex instructions envelope, and the ChatGPT-account-id / originator headers, with the account id carried on the bundle). That conformance is encapsulated in the OpenAI branch — the api-key OpenAI path is unaffected. Because OpenAI Cloudflare-blocks datacenter origins, the OpenAI subscription path works only from a non-datacenter origin and not from a hosted deployment; the supported production path for OpenAI is an org-BYO API key.

Resolution is uniform wherever the World Agent runs: the operator authoring-chat surface, the chat consumer (re-resolved per turn), and the evolution proposer (re-resolved per event) all resolve the same active profile through one shared bundle — the operator-set config governs the agent across both deployment surfaces, not just the operator process. Credential-free cassette replay short-circuits before any credential is resolved, so the replay test path needs no stored secret.

OAuth (subscription) credentials are gated by the SPECTRAL_ENABLE_OAUTH_LLM_CONFIG feature flag, which resolves to enabled when the flag is set or the deployment is local dev. It is the single source of truth on both sides: the operator and customer config routes reject an OAuth write when it is off (non-local), and credential resolution refuses to build a chat model from an OAuth credential — so a stale OAuth profile can neither be set nor serve traffic. Local dev is always exempt, so the flat-rate Grok subscription works out of the box; other environments default to API-key-only and opt into OAuth explicitly. The flag is off by default because the subscription-OAuth shapes are experimental and may be blocked at datacenter origins (confirmed for the OpenAI Codex path; the config surface carries a per-provider advisory). The operator and customer config responses expose the flag state and per-provider warnings so the UI shows the OAuth option only when it is usable.
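
The gate described above reduces to one predicate shared by the write routes and credential resolution. This is a sketch under assumptions: the set of truthy flag values, the `is_local_dev` input, and both function names are hypothetical.

```python
from typing import Mapping

FLAG = "SPECTRAL_ENABLE_OAUTH_LLM_CONFIG"
_TRUTHY = {"1", "true", "yes", "on"}  # assumed parsing of the flag value


def oauth_config_enabled(env: Mapping[str, str], is_local_dev: bool) -> bool:
    """Enabled when the flag is set or the deployment is local dev."""
    flag_set = env.get(FLAG, "").strip().lower() in _TRUTHY
    return flag_set or is_local_dev


def check_credential_kind(kind: str, env: Mapping[str, str], is_local_dev: bool) -> None:
    """Reject OAuth credentials when the gate is closed; allow everything else."""
    if kind == "oauth" and not oauth_config_enabled(env, is_local_dev):
        raise PermissionError("OAuth LLM credentials are disabled in this deployment")
```

Calling the same check on both the config write path and the resolution path is what keeps a stale OAuth profile from serving traffic after the flag is turned off.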


Rate limiting uses a token bucket per (domain_id, purpose); spend is capped daily per domain and per purpose. A control-plane wrapper at the LangChain call boundary (a pre-call enforcing step plus a usage callback that records the real usage_metadata token counts) applies the rate-limit and budget envelope; repositories in worlds or platform cannot bypass it.
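
A minimal token-bucket sketch keyed on (domain_id, purpose), assuming an injectable monotonic clock so the behavior is testable. The class names, the capacity, and the refill rate are illustrative, not the control plane's actual values.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Classic token bucket: refills continuously up to a fixed capacity."""

    def __init__(self, capacity: float, refill_per_second: float, clock: Callable[[], float]):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity
        self._updated_at = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._updated_at
        # Credit tokens for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._updated_at = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per (domain_id, purpose), created lazily."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, domain_id: str, purpose: str) -> bool:
        key = (domain_id, purpose)
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(self._capacity, self._refill, self._clock)
        return self._buckets[key].try_acquire()
```

In the wrapper described above, `allow` would run in the pre-call step, before the LangChain model is invoked.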

Cost source: genai-prices, with a fallback registry in spectral.core.llm.pricing for models the package does not yet know.

Each call emits:

  • An OTel span with GenAI-semconv plus Spectral attributes (spectral.purpose, spectral.domain_id, spectral.org_id, spectral.context, spectral.agent_turn_id?).
  • A row to core.llm_usage for in-app budget enforcement.

Budget enforcement reads rolling spend from core.llm_usage; the per-call check fails closed with Result[BudgetExceeded] when the daily cap is exceeded.
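
A sketch of the fail-closed check, assuming a simple two-variant result type. The shape of `Ok`, the fields on `BudgetExceeded`, the `guarded_call` helper, and the choice to treat spend equal to the cap as exceeded are all assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Union


@dataclass(frozen=True)
class Ok:
    value: object


@dataclass(frozen=True)
class BudgetExceeded:
    spent: Decimal
    cap: Decimal


Result = Union[Ok, BudgetExceeded]


def guarded_call(rolling_spend: Decimal, daily_cap: Decimal,
                 call: Callable[[], object]) -> Result:
    """Run call() only when rolling spend is under the daily cap."""
    if rolling_spend >= daily_cap:
        # Fail closed: return the error value and never reach the provider.
        return BudgetExceeded(spent=rolling_spend, cap=daily_cap)
    return Ok(call())
```

Returning a value instead of raising keeps the over-budget case on the caller's normal control path, so it can degrade gracefully.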

Admin queries against core.llm_usage are app-layer-gated to domain admins; RLS scopes the rows to the calling domain, and admin-only access is enforced via scope checks on the route. Ops staff query platform-wide via the Supabase service_role connection (which is exempt from RLS policies). A core.llm_usage_daily rollup serves cheap “spend this month” queries.

Rate-limit budgets are independent per context — a spectral.worlds purpose burning its budget must not starve a spectral.platform purpose, and vice versa. The control-plane wrapper keys its budget ledger on (domain_id, context, purpose), not just (domain_id, purpose). An isolation test exercises this directly: the test exhausts one context’s budget for a purpose, then verifies the same purpose still resolves successfully under the other context’s budget. The test fails if either context’s quota leaks into the other’s accounting.
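
The isolation property can be sketched with a ledger keyed on (domain_id, context, purpose) and a test in the shape described above. `BudgetLedger` and its methods are hypothetical; the real ledger is backed by core.llm_usage, not an in-memory dict.

```python
from collections import defaultdict
from decimal import Decimal
from typing import DefaultDict, Tuple

Key = Tuple[str, str, str]  # (domain_id, context, purpose)


class BudgetLedger:
    def __init__(self, daily_cap: Decimal):
        self.daily_cap = daily_cap
        self._spend: DefaultDict[Key, Decimal] = defaultdict(Decimal)

    def record(self, domain_id: str, context: str, purpose: str, cost: Decimal) -> None:
        self._spend[(domain_id, context, purpose)] += cost

    def has_headroom(self, domain_id: str, context: str, purpose: str) -> bool:
        return self._spend[(domain_id, context, purpose)] < self.daily_cap


def test_contexts_do_not_share_budget() -> None:
    ledger = BudgetLedger(daily_cap=Decimal("1.00"))
    # Exhaust the worlds-context budget for one purpose.
    ledger.record("d1", "spectral.worlds", "reasoning", Decimal("1.00"))
    assert not ledger.has_headroom("d1", "spectral.worlds", "reasoning")
    # The same purpose under the platform context is unaffected.
    assert ledger.has_headroom("d1", "spectral.platform", "reasoning")
```

Had the key been only (domain_id, purpose), the second assertion would fail, which is the leak the isolation test is meant to catch.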


  • Provider-level (transport). HTTP 5xx/429 → exponential backoff retry (max 3) on the same provider; then next provider in the purpose’s fallback chain. Per-tenant retry budget prevents one bad domain from exhausting provider rate limits for the population.
  • Purpose-level (quality). Consecutive validation/refusal failures on a purpose within a window trigger upgrade rather than lateral fallback: DegradationPolicy(trigger=N_failures_in_window, action=upgrade_to_purpose). Per-rule-authoring-context override carries forward as version-scoped policy.
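
The provider-level behavior can be sketched as retry-then-advance over the fallback chain. This is an illustration only: `TransientProviderError` stands in for an HTTP 5xx/429 failure, "max 3" is read here as three retries after the first attempt, and the base delay is an assumption. The per-tenant retry budget is not modeled.

```python
import time
from typing import Callable, Optional, Sequence


class TransientProviderError(Exception):
    """Hypothetical marker for a retryable failure (HTTP 5xx or 429)."""


def call_with_fallback(
    chain: Sequence[str],
    call: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in the chain, retrying transient errors with backoff."""
    if not chain:
        raise ValueError("fallback chain must not be empty")
    last_error: Optional[Exception] = None
    for model_id in chain:
        for attempt in range(max_retries + 1):
            try:
                return call(model_id)
            except TransientProviderError as exc:
                last_error = exc
                if attempt < max_retries:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    sleep(base_delay * 2 ** attempt)
        # Retries exhausted on this entry; move to the next one in the chain.
    assert last_error is not None
    raise last_error
```

Injecting `sleep` keeps the backoff schedule observable in tests without real waiting.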

Circuit transitions emit domain events (llm.circuit.opened, llm.circuit.closed) consumed by the observability stack.


core.llm_profiles persists (id, version, activated_at, deactivated_at, created_by, audit_log). Activation is append-only; rollback re-activates a prior version (a profile’s credential_ref stays resolvable across a rollback — the credential is read by id, independent of the active flag, so an older version’s secret is never orphaned). The operations cockpit drives set / version-history / rollback. The schema reserves a variant: str | None column for A/B routing without binding the resolver branch today.
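
One way to picture append-only activation with rollback is immutable version records plus an activation log whose last entry is the active version. The sketch below is an assumption about the mechanics, not the actual schema; `ProfileHistory` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProfileVersion:
    version: int
    model_id: str
    credential_ref: str


class ProfileHistory:
    def __init__(self) -> None:
        self._versions: Dict[int, ProfileVersion] = {}
        self._activations: List[int] = []  # append-only; last entry is active

    def publish(self, model_id: str, credential_ref: str) -> ProfileVersion:
        """Create a new version and activate it."""
        record = ProfileVersion(len(self._versions) + 1, model_id, credential_ref)
        self._versions[record.version] = record
        self._activations.append(record.version)
        return record

    def rollback(self, version: int) -> ProfileVersion:
        """Re-activate a prior version by appending to the activation log."""
        record = self._versions[version]  # KeyError for an unknown version
        self._activations.append(version)
        return record

    def active(self) -> Optional[ProfileVersion]:
        if not self._activations:
            return None
        return self._versions[self._activations[-1]]

    def credential_ref_for(self, version: int) -> str:
        """Read the credential reference by version, independent of active state."""
        return self._versions[version].credential_ref
```

Because nothing is deleted or rewritten, the activation log doubles as the version history that the cockpit displays.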

Governance: locked-override modifications require a platform-role audit entry; operator_allowed changes are domain-admin-auditable.


Default fallback chains stay within the same provider family to preserve prompt-caching affinity:

  • code_generation, applies_when_generation, reasoning, agent_turn, agent_tool, world_agent, distillation: Claude Opus 4.7 → Claude Sonnet 4.6 → Claude Haiku 4.5
  • embedding: per embeddings D11 ladder

Cross-provider fallback is configurable per profile; it is not a default.


tools/quality/check_llm_sdk_allowlist.py (pre-push tier) forbids direct imports of LiteLLM and of the raw provider SDKs (anthropic, openai, google.generativeai) from Spectral code, because LLM access flows through the LangChain provider packages. The check covers direct imports only: it does not assert that any library is absent from the dependency graph, so LiteLLM may remain as DSPy’s transitive dependency until DSPy is fully removed, per ADR-102 D3.
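
A check of this kind can be sketched with the standard-library `ast` module. This is not the contents of check_llm_sdk_allowlist.py; the function name and the exact matching rules are assumptions, and the sketch inspects one source string, not a whole tree of files.

```python
import ast
from typing import List

# Import roots that Spectral code must not import directly.
FORBIDDEN_ROOTS = ("litellm", "anthropic", "openai", "google.generativeai")


def _is_forbidden(name: str) -> bool:
    # Match the root itself or any submodule, but not look-alikes
    # such as langchain_openai.
    return any(name == root or name.startswith(root + ".") for root in FORBIDDEN_ROOTS)


def forbidden_imports(source: str) -> List[str]:
    """Return the sorted forbidden module names imported by source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also test module.name so "from google import generativeai" is caught.
            candidates = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in candidates:
            if _is_forbidden(name):
                found.add(name)
                break  # one report per import statement is enough
    return sorted(found)
```

Parsing the source instead of grepping it avoids false positives from comments and strings, and it only sees imports written in the file itself, never transitive dependencies.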


A fake LLM double (in tests.core.llm) stands in for the model call boundary and returns canned responses (single, sequence, or callable form). Integration tests use VCR-style cassettes; operator-run live-provider recording refreshes only the affected cassettes when prompts, fixtures, or provider/model choices intentionally change. See testing for the full posture.
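
A minimal sketch of a double that accepts the three canned-response forms mentioned above. The `FakeLLM` name and its `invoke` method are assumptions chosen for illustration; the actual double in tests.core.llm may expose a different surface.

```python
from typing import Callable, List, Sequence, Union

Canned = Union[str, Sequence[str], Callable[[str], str]]


class FakeLLM:
    """Test double that returns canned responses and records prompts."""

    def __init__(self, responses: Canned):
        self.calls: List[str] = []
        if callable(responses):
            # Callable form: compute the response from the prompt.
            self._next = responses
        elif isinstance(responses, (list, tuple)):
            # Sequence form: return one response per call, in order.
            remaining = iter(responses)

            def from_sequence(prompt: str) -> str:
                try:
                    return next(remaining)
                except StopIteration:
                    raise AssertionError("FakeLLM ran out of canned responses") from None

            self._next = from_sequence
        else:
            # Single form: return the same response on every call.
            self._next = lambda prompt: responses

    def invoke(self, prompt: str) -> str:
        self.calls.append(prompt)
        return self._next(prompt)
```

Recording `calls` lets a test assert on what was sent to the model as well as on how the code handled the reply.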