ADR-037: Secrets management — provisioning-script architecture and target-swap discipline

Context

Spectral has four secret classes: platform operational secrets (provider API keys, observability vendor keys per ADR-036, Supabase keys, DB connection strings), customer BYO credentials (ADR-035 D4; the org tier is implemented, the per-domain tier reserved), CI secrets (ADR-062), and local dev secrets.

Alpha posture: a solo builder growing to 2–3 engineers, pre-SOC2. The architecture that came out strongest from disposition is a deployer-operated provisioning script acting as the deploy-time orchestrator. It abstracts “where secrets get pushed” behind a target-function interface, which decouples the provisioning flow from the hosting choice. Every managed secrets SaaS (Doppler, Infisical, Vault Cloud) requires its own long-lived service token guarding the other credentials. Given the supply-chain record of credential-bearing SaaS (and Spectral’s removal of LiteLLM post-compromise, ADR-035 D1), no runtime secrets SaaS is adopted.

Decision

D1 — Runtime read source

Runtime secrets are sourced from 1Password via committed op:// references in infra/environments.toml, resolved by provision.sh and published to their targets (Cloudflare Worker/Container secrets via wrangler, Supabase, GitHub). The provisioning mechanism is specified in ADR-110.
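As an illustration of what "committed op:// references" implies for a resolver, the sketch below separates secret references from plain variables in one environment table and splits a reference into its parts. The `op://<vault>/<item>/[<section>/]<field>` shape is 1Password's secret-reference syntax; the function names, the sample table, and its vault and item names are hypothetical, and the real resolution logic lives in provision.sh per ADR-110.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class OpRef(NamedTuple):
    vault: str
    item: str
    section: Optional[str]
    field: str


def parse_op_ref(value: str) -> Optional[OpRef]:
    """Return the parts of an op:// reference, or None for a plain value."""
    prefix = "op://"
    if not value.startswith(prefix):
        return None
    parts = value[len(prefix):].split("/")
    if any(part == "" for part in parts) or len(parts) not in (3, 4):
        raise ValueError(f"malformed secret reference: {value!r}")
    if len(parts) == 3:
        vault, item, field = parts
        return OpRef(vault, item, None, field)
    vault, item, section, field = parts
    return OpRef(vault, item, section, field)


def split_environment(table: Dict[str, str]) -> Tuple[Dict[str, OpRef], Dict[str, str]]:
    """Partition one environment table into (secret refs, plain variables)."""
    secret_refs: Dict[str, OpRef] = {}
    variables: Dict[str, str] = {}
    for key, value in table.items():
        ref = parse_op_ref(value)
        if ref is None:
            variables[key] = value
        else:
            secret_refs[key] = ref
    return secret_refs, variables


# Hypothetical environment table, as it might look once the TOML file is parsed.
staging = {
    "LOG_LEVEL": "info",
    "SENTRY_DSN": "op://spectral-staging/sentry/dsn",
}
```

Only the references are committed; the values they point at stay in 1Password until a resolve step looks them up.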

D2 — Runtime identity

Provisioning/deploy credentials are wrangler + vendor-CLI auth (ADR-110); there is no PaaS runtime-identity key. ADR-062 D5 captures the long-lived-key mitigation (rotation cadence, scoped GitHub Environments, leakage scanning) for the credentials that remain.

D3 — Local dev laptops are independent of the provisioning script

Local configuration uses a committed .env.example and a gitignored .env.local, loaded through direnv and Pydantic Settings. Dev-tier values come from developer-minted free-tier keys or a team shared vault. Dev laptops do not need access to the production secrets backend.
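The layering this relies on can be sketched without the library: values from a dotenv file fill in settings, and anything already set in the process environment wins, which is the precedence Pydantic Settings applies to dotenv files. The code below is a stdlib-only stand-in, not the project's settings class; the function names and the sample key are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_dev_settings(
    dotenv_text: str, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Start from the dotenv file, then let the process environment override."""
    environ = os.environ if environ is None else environ
    settings = parse_dotenv(dotenv_text)
    for key in settings:
        if key in environ:
            settings[key] = environ[key]
    return settings


# Hypothetical .env.local content with a developer-minted key.
example_dotenv = '# dev tier\nEXAMPLE_PROVIDER_API_KEY=dev-key\nLOG_LEVEL="debug"\n'
```

Nothing in this path touches 1Password or provision.sh, which is the independence D3 asks for.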

D4 — `tools/provision/provision.sh` is the canonical config-provisioning orchestrator

Language: bash (POSIX; macOS / Linux / WSL; macOS system bash 3.2-safe).
One operation, several forms: resolve an environment from infra/environments.toml (op:// refs via 1Password), then do one of the following: run a command with it (--env <name> -- <cmd>); preview a publish (--env <name> --dry-run); publish it to the env’s sink (--env <name>); or rotate a secret (--rotate <KEY>: produce a new value via a registered generator or interactive prompt, store it in 1Password, and republish the affected env(s); SPEC-720). Generation is explicit-only: a plain --env publish never regenerates.
Target environment: --env <name> selects a table in infra/environments.toml; reserved meta-tables (currently [rotation], which declares per-key generator + scope) are not environments.
Secret source: secret references resolve from 1Password via infra/environments.toml (op:// references) per ADR-110 — no plaintext cache.
Direction B (ADR-110): the script resolves + publishes config (to a local .env or a GitHub Environment); GitHub Actions deploys and reconciles vendor resources (Cloudflare/wrangler, Supabase) — the script never pushes to a vendor directly.
Documented in tools/provision/README.md.
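The four invocation forms above can be modeled as a small dispatch table. The sketch below is a Python model of the command-line surface only, not the bash implementation. How the real script treats combinations such as --rotate together with --env, or --dry-run together with a command, is not stated here, so the precedence shown is an assumption.

```python
import argparse
from typing import List, NamedTuple, Optional

RESERVED_TABLES = {"rotation"}  # meta-tables named in D4; not environments


class Invocation(NamedTuple):
    mode: str                 # "run", "dry-run", "publish", or "rotate"
    env: Optional[str]
    key: Optional[str]
    command: List[str]


def environment_names(config: dict) -> List[str]:
    """List selectable environments, skipping reserved meta-tables."""
    return sorted(name for name in config if name not in RESERVED_TABLES)


def parse_invocation(argv: List[str]) -> Invocation:
    # Everything after a bare "--" is the command to run with the environment.
    command: List[str] = []
    if "--" in argv:
        cut = argv.index("--")
        argv, command = argv[:cut], argv[cut + 1:]

    parser = argparse.ArgumentParser(prog="provision.sh")
    parser.add_argument("--env")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--rotate", metavar="KEY")
    args = parser.parse_args(argv)

    # Assumed precedence: rotate, then run, then dry-run, then plain publish.
    if args.rotate:
        return Invocation("rotate", args.env, args.rotate, [])
    if args.env is None:
        parser.error("--env <name> or --rotate <KEY> is required")
    if command:
        return Invocation("run", args.env, None, command)
    if args.dry_run:
        return Invocation("dry-run", args.env, None, [])
    return Invocation("publish", args.env, None, [])
```

Note that only the "rotate" mode ever produces a new value; "publish" and "dry-run" resolve what is already stored.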

D5 — Operator key-source discipline

1Password is the documented default operator-workstation secrets store (ADR-110 D2); the privacy intent is satisfied by recommending a default for any operator rather than treating individual tooling as private.

D6 — LLM credentials in Supabase Vault, across all tiers

Supabase Vault is the credential store for all LLM-config tiers — not just customer BYO. The global tier (SPEC-717) and the customer-org BYO tier (SPEC-718) are implemented: core.llm_credentials indexes scope-keyed credentials whose secret lives in Vault, read once through a SECURITY DEFINER store and never echoed back; a profile’s credential_ref (ADR-035 D4) addresses the credential, keeping raw keys out of core.llm_profiles. Credential kinds are api_key (a static bearer) and oauth (a refreshable subscription token bundle); the oauth bundle is rotated in place by a SECURITY DEFINER update_llm_credential_secret so a token refresh keeps the credential_ref stable (no churned credential row). Org-scoped reads are confined per membership (RLS over caller_org_ids()), while the secret stays reachable only through the platform_role-granted store functions. The per-domain BYO scope is named/reserved in the same store and resolves when that tier ships — same Vault store, scope-keyed, with no separate per-domain table and no application-layer envelope-encryption layer.
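To make the "rotated in place" property concrete, here is an in-memory model of an index row that points at a vault secret. It is a sketch under stated assumptions: the real store is SQL (core.llm_credentials plus SECURITY DEFINER functions over Supabase Vault), and every class, method, and field name below other than credential_ref is hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CredentialRow:
    credential_ref: str        # stable identifier a profile points at
    scope: str                 # "global", "org", or the reserved "domain"
    scope_id: Optional[str]    # e.g. an org id; None for the global tier
    kind: str                  # "api_key" or "oauth"
    vault_secret_id: str       # pointer into the vault; never the secret itself


class CredentialStore:
    """Toy model: an index of rows plus a separate vault of secret values."""

    def __init__(self) -> None:
        self._rows: Dict[str, CredentialRow] = {}
        self._vault: Dict[str, str] = {}

    def create(self, scope: str, scope_id: Optional[str], kind: str, secret: str) -> str:
        if kind not in ("api_key", "oauth"):
            raise ValueError(f"unknown credential kind: {kind!r}")
        ref, secret_id = str(uuid.uuid4()), str(uuid.uuid4())
        self._vault[secret_id] = secret
        self._rows[ref] = CredentialRow(ref, scope, scope_id, kind, secret_id)
        return ref

    def update_secret(self, credential_ref: str, new_secret: str) -> None:
        """Rotate in place: same row, same ref, new vault content."""
        row = self._rows[credential_ref]
        self._vault[row.vault_secret_id] = new_secret

    def describe(self, credential_ref: str) -> CredentialRow:
        """Metadata only; the secret value is not part of the row."""
        return self._rows[credential_ref]

    def read_secret(self, credential_ref: str) -> str:
        return self._vault[self._rows[credential_ref].vault_secret_id]
```

Because a token refresh only swaps the vault content, any profile holding the credential_ref keeps working without an update.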

D7 — Rotation runbook is the provisioning script plus a one-page ops doc

docs/runbooks/secrets-management.md documents standard rotation + first-time generation (provision.sh --rotate <KEY>: produce a value from the key’s registered generator in tools/provision/generators/, else an interactive prompt; store it in 1Password; republish per the key’s [rotation] shared/per-env scope), emergency full-surface rotation, vendor caveats, and rolling-restart triggers per target. Generation is the first rotation; a normal --env publish never regenerates. CI-specific rotation extends through docs/runbooks/ci-secrets.md (ADR-062).
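The rotation flow described above (generate or prompt, store, republish by scope) can be sketched as follows. This is a model under assumptions, not the script: the generator name "hex32", the scope spellings "shared" and "per-env", the sample keys, and the callback signatures are all hypothetical, and storage is simplified to one value per key.

```python
import secrets
from typing import Callable, Dict, List, Optional

# Hypothetical generator registry, standing in for tools/provision/generators/.
GENERATORS: Dict[str, Callable[[], str]] = {
    "hex32": lambda: secrets.token_hex(32),  # 32 random bytes, hex-encoded
}


def affected_environments(key: str, selected_env: Optional[str], config: dict) -> List[str]:
    """Shared keys republish everywhere they are used; per-env keys only in one env."""
    users = sorted(
        name for name, table in config.items()
        if name != "rotation" and key in table
    )
    if config["rotation"][key]["scope"] == "shared":
        return users
    return [selected_env] if selected_env in users else []


def rotate(
    key: str,
    selected_env: Optional[str],
    config: dict,
    store: Callable[[str, str], None],
    publish: Callable[[str], None],
    prompt: Callable[[str], str],
) -> List[str]:
    rule = config["rotation"][key]
    generator = GENERATORS.get(rule.get("generator", ""))
    # Registered generator if there is one, else ask the operator.
    value = generator() if generator is not None else prompt(key)
    store(key, value)  # the secrets store is updated before anything is republished
    targets = affected_environments(key, selected_env, config)
    for env in targets:
        publish(env)
    return targets


# Hypothetical parsed config: one meta-table plus two environments.
sample_config = {
    "rotation": {
        "SESSION_SECRET": {"generator": "hex32", "scope": "shared"},
        "DB_PASSWORD": {"scope": "per-env"},
    },
    "staging": {"SESSION_SECRET": "op://v/session/value", "DB_PASSWORD": "op://v/db-staging/password"},
    "production": {"SESSION_SECRET": "op://v/session/value", "DB_PASSWORD": "op://v/db-prod/password"},
}
```

A first-time generation runs through the same path as any later rotation; the only difference is that no previous value exists in the store.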

D8 — Composite audit trail

Script runs are logged via git history (commits touching tools/provision/), 1Password item history, platform-native audit logs (Cloudflare and GitHub org audit), and Supabase PGAudit. There is no unified UI at alpha; export wiring is post-alpha SOC2-readiness work.

D9 — The provisioning script is the alpha IaC

There is no declarative IaC layer and no Terraform. provision.sh resolves an environment from infra/environments.toml + 1Password and publishes it (to a local .env or a GitHub Environment); GitHub Actions deploys, reconciling vendor resources from committed declarative config (wrangler + CLIs). The provisioning + env-config model is ADR-110.

D10 — TA-16 observability vendor key binding

Logfire / Sentry / Grafana Cloud tokens are declared in infra/environments.toml, provisioned via provision.sh, pushed to the runtime backend, and read at the FastAPI composition root via Pydantic Settings. This closes the ADR-036 → ADR-037 dependency.

D11 — Provider-swap design discipline

The provider-swap seam is the publish sink + the deploy workflow’s per-vendor reconcile (ADR-110 Direction B), not a per-target push function in the script: provision.sh resolves an environment and publishes it to a sink (a local .env or a GitHub Environment), and the GitHub Actions deploy reconciles the vendor resources from committed declarative config. A future hosting/secrets-backend swap changes the sink + the deploy workflow’s reconcile steps; the resolution flow, the secret/variable classification, and the --rotate generation/propagation semantics are unchanged. (ADR-110 carries this seam to the Cloudflare/wrangler + Supabase + GitHub target set.)
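One way to picture the seam is a sink interface that the resolution flow calls without knowing what sits behind it. The sketch below assumes a Python model of that flow; PublishSink, LocalEnvFileSink, RecordingSink, resolve, and provision are hypothetical names, and RecordingSink only records what a GitHub Environment sink would receive rather than calling any API.

```python
import os
from typing import Callable, Dict, Protocol


class PublishSink(Protocol):
    def publish(self, env_name: str, values: Dict[str, str]) -> None: ...


class LocalEnvFileSink:
    """Writes resolved values as KEY=VALUE lines to a local .env file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def publish(self, env_name: str, values: Dict[str, str]) -> None:
        body = "".join(f"{key}={value}\n" for key, value in sorted(values.items()))
        # Create the file owner-readable only, since it holds resolved secrets.
        fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(body)


class RecordingSink:
    """Stand-in for a remote sink: records what would have been published."""

    def __init__(self) -> None:
        self.published: Dict[str, Dict[str, str]] = {}

    def publish(self, env_name: str, values: Dict[str, str]) -> None:
        self.published[env_name] = dict(values)


def resolve(table: Dict[str, str], lookup: Callable[[str], str]) -> Dict[str, str]:
    """Replace op:// references with looked-up values; pass plain values through."""
    return {
        key: lookup(value) if value.startswith("op://") else value
        for key, value in table.items()
    }


def provision(env_name: str, table: Dict[str, str],
              lookup: Callable[[str], str], sink: PublishSink) -> None:
    # The resolution flow is identical for every sink; only the last call differs.
    sink.publish(env_name, resolve(table, lookup))
```

Swapping the hosting or secrets backend in this model means supplying a different sink (and changing the deploy workflow's reconcile steps), while resolve and provision stay untouched.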

D12 — Key management is the Supabase substrate; no external KMS

There is no external cloud KMS (no GCP KMS, no AWS KMS) anywhere in the architecture. Key management lives in the Supabase substrate: Supabase Vault (pgsodium-backed) is the credential store for every LLM-config tier (D6), so the earlier plan to migrate customer BYOK off Vault to envelope encryption rooted in an external KMS is dropped — the same scope-keyed Vault store covers the global tier + the org BYO tier (implemented) and the per-domain BYO tier (named/reserved) without a second key-management dependency.

Should a forward trigger ever require at-rest column/blob envelope encryption — e.g. the LangGraph checkpointer conversation state (ADR-043 D10; docs/runbooks/checkpointer-encryption.md) — its master-key root of trust is the same Supabase substrate (Vault / pgsodium) through the KeyManagementProvider provider-swap seam (D11), not an external KMS. The decision is locked; the infrastructure work is deferred until such a trigger fires.

Alternatives considered

Managed secrets SaaS as runtime source of truth (Doppler / Infisical / Vault Cloud / 1Password op run at runtime). Rejected — each introduces a credential-bearing SaaS subprocessor requiring a long-lived service token guarding the other secrets; cost and supply-chain exposure inconsistent with ADR-035 D1. (1Password is used as the operator-workstation secrets source per ADR-110 D2, not as a runtime secrets service.)

HashiCorp Vault self-hosted. Rejected: the operational attention tax is incompatible with a 1–3 engineer alpha.

Per-domain external secrets manager for customer BYO creds. Rejected: it fails on cost and IAM blast radius.

.env.provision.* encrypted at rest with age/sops. Rejected for alpha — the decryption key becomes its own chicken-and-egg; superseded outright by the ADR-110 1Password env-config model (no plaintext cache).

Consequences

The provider-swap seam — the publish sink + the deploy workflow’s per-vendor reconcile (D11) — is load-bearing; changes to it, or to the --rotate generation/propagation semantics (D4/D7), require ADR-level review.
Unblocks ADR-036 (observability vendor keys have a home) and ADR-062 (CI secrets inherit the GitHub-Environment-scoped pattern).
ADR-035 D4’s DB-backed credential source is implemented for the global tier + the org BYO tier via the Vault store (D6); the per-domain BYO tier resolves through the same store when it ships — no GCP KMS / envelope-encryption migration (D12).
No runtime secrets SaaS and no Terraform/Pulumi for secret provisioning (D9); provisioning is provision.sh resolve+publish + GitHub Actions deploy per ADR-110.

References

ADR-110 — provisioning + the env-config model
ADR-035 — supply-chain discipline motivating no credential-bearing SaaS
ADR-036 — vendor key binding (D10)
ADR-062 — CI secrets handling (extends D7)
ADR-065 — spectral.core admission discipline
