ADR-037: Secrets management — provisioning-script architecture and target-swap discipline
Context
Spectral has four secret classes: platform operational secrets (provider API keys, observability vendor keys per ADR-036, Supabase keys, DB connection strings), customer BYO credentials (ADR-035 D4; the org tier is implemented, the per-domain tier reserved), CI secrets (ADR-062), and local dev secrets.
Alpha posture: solo-builder → 2–3 engineers, pre-SOC2. The strongest architecture identified at disposition is a deployer-operated provisioning script acting as the deploy-time orchestrator. “Where secrets get published” is abstracted behind a swappable seam (D11), which decouples the provisioning flow from the hosting choice. Every managed secrets SaaS (Doppler, Infisical, Vault Cloud) requires its own long-lived service token guarding the other credentials. Given the supply-chain record of credential-bearing SaaS (and Spectral’s removal of LiteLLM post-compromise, ADR-035 D1), no runtime secrets SaaS is adopted.
Decision
D1 — Runtime read source
Runtime secrets are sourced from 1Password via committed op:// references in infra/environments.toml. provision.sh resolves them and publishes them to a sink (a local .env or a GitHub Environment). The GitHub Actions deploy then reconciles them into their runtime targets (Cloudflare Worker/Container secrets via wrangler, Supabase). The provisioning mechanism is ADR-110.
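The following is a minimal Python sketch of how a resolver might recognise and split the op:// values committed in infra/environments.toml. It assumes 1Password's documented secret-reference shape, op://<vault>/<item>/[<section>/]<field>. SecretRef, parse_op_ref and is_secret_ref are hypothetical helpers and are not part of provision.sh, which is bash. Any vault or item names used with them are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecretRef:
    """One parsed 1Password secret reference (hypothetical helper type)."""
    vault: str
    item: str
    section: Optional[str]  # optional middle segment
    field: str


def is_secret_ref(value: object) -> bool:
    """True for config values that must be resolved through 1Password."""
    return isinstance(value, str) and value.startswith("op://")


def parse_op_ref(ref: str) -> SecretRef:
    """Split op://<vault>/<item>/[<section>/]<field> into its segments.

    Query parameters (e.g. ?attribute=...) and names containing "/" are
    deliberately out of scope for this sketch.
    """
    if not is_secret_ref(ref):
        raise ValueError(f"not an op:// reference: {ref!r}")
    parts = ref[len("op://"):].split("/")
    if "" in parts:
        raise ValueError(f"empty path segment in {ref!r}")
    if len(parts) == 3:
        vault, item, field = parts
        return SecretRef(vault, item, None, field)
    if len(parts) == 4:
        vault, item, section, field = parts
        return SecretRef(vault, item, section, field)
    raise ValueError(f"expected 3 or 4 segments in {ref!r}, got {len(parts)}")
```

Only references are committed, so the file can be reviewed like any other config. The values stay in 1Password until resolution time.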
D2 — Runtime identity
Provisioning/deploy credentials are wrangler + vendor-CLI auth (ADR-110); there is no PaaS runtime-identity key. ADR-062 D5 captures the long-lived-key mitigation (rotation cadence, scoped GitHub Environments, leakage scanning) for the credentials that remain.
D3 — Local dev laptops are independent of the provisioning script
.env.example/.env.local (gitignored) + direnv + Pydantic Settings. Dev-tier values come from developer-minted free-tier keys or a team shared vault. Dev laptops do not need access to the production secrets backend.
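D3 relies on Pydantic Settings. The stdlib stand-in below only illustrates the shape of that flow, under stated assumptions:

- Gitignored .env.local values are loaded.
- Variables already exported into the shell (for example by direnv) take precedence. This mirrors Pydantic Settings' default of letting real environment variables win over dotenv values.
- Missing settings fail fast at startup.

parse_dotenv, DevSettings, load_local and the two variable names are hypothetical. This is not Pydantic's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):  # tolerate shell-style exports
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed .env line: {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # strip one pair of matching quotes
        values[key.strip()] = value
    return values


@dataclass(frozen=True)
class DevSettings:
    database_url: str  # hypothetical setting names
    llm_api_key: str


def load_local(dotenv_text: str,
               environ: Optional[Mapping[str, str]] = None) -> DevSettings:
    """Layer .env.local values under the process environment, then validate."""
    env = {**parse_dotenv(dotenv_text), **(os.environ if environ is None else environ)}
    missing = [k for k in ("DATABASE_URL", "LLM_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing settings {missing}; copy .env.example to .env.local")
    return DevSettings(database_url=env["DATABASE_URL"], llm_api_key=env["LLM_API_KEY"])
```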
D4 — tools/provision/provision.sh is the canonical config-provisioning orchestrator
- Language: bash (POSIX; macOS / Linux / WSL; macOS system bash 3.2-safe).
- One operation, several forms: resolve an environment from infra/environments.toml (op:// refs via 1Password), then do one of the following:
  - run a command with it (--env <name> -- <cmd>);
  - preview a publish (--env <name> --dry-run);
  - publish it to the env’s sink (--env <name>);
  - rotate a secret (--rotate <KEY>: produce a new value via a registered generator or interactive prompt, store it in 1Password, and republish the affected env(s); SPEC-720).

  Generation is explicit-only: a plain --env publish never regenerates.
- Target environment: --env <name> selects a table in infra/environments.toml. Reserved meta-tables are not environments; currently this is [rotation], which declares each key's generator and scope.
- Secret source: secret references resolve from 1Password via infra/environments.toml (op:// references) per ADR-110. There is no plaintext cache.
- Direction B (ADR-110): the script resolves and publishes config (to a local .env or a GitHub Environment). GitHub Actions deploys and reconciles vendor resources (Cloudflare/wrangler, Supabase). The script never pushes to a vendor directly.
- Documented in tools/provision/README.md.
D5 — Operator key-source discipline
1Password is the documented default secrets store for operator workstations (ADR-110 D2). The privacy intent is met by documenting one recommended default for every operator, rather than by treating each operator's individual tooling choice as private.
D6 — LLM credentials in Supabase Vault, across all tiers
Supabase Vault is the credential store for all LLM-config tiers — not just customer BYO. The global tier (SPEC-717) and the customer-org BYO tier (SPEC-718) are implemented: core.llm_credentials indexes scope-keyed credentials whose secret lives in Vault, read once through a SECURITY DEFINER store and never echoed back; a profile’s credential_ref (ADR-035 D4) addresses the credential, keeping raw keys out of core.llm_profiles. Credential kinds are api_key (a static bearer) and oauth (a refreshable subscription token bundle); the oauth bundle is rotated in place by a SECURITY DEFINER update_llm_credential_secret so a token refresh keeps the credential_ref stable (no churned credential row). Org-scoped reads are confined per membership (RLS over caller_org_ids()), while the secret stays reachable only through the platform_role-granted store functions. The per-domain BYO scope is named/reserved in the same store and resolves when that tier ships — same Vault store, scope-keyed, with no separate per-domain table and no application-layer envelope-encryption layer.
D7 — Rotation runbook is the provisioning script plus a one-page ops doc
docs/runbooks/secrets-management.md documents standard rotation + first-time generation (provision.sh --rotate <KEY>: produce a value from the key’s registered generator in tools/provision/generators/, else an interactive prompt; store it in 1Password; republish per the key’s [rotation] shared/per-env scope), emergency full-surface rotation, vendor caveats, and rolling-restart triggers per target. Generation is the first rotation; a normal --env publish never regenerates. CI-specific rotation extends through docs/runbooks/ci-secrets.md (ADR-062).
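The --rotate control flow described above is sketched here in Python for illustration. The following are all hypothetical assumptions:

- the generator names in GENERATORS;
- the store and publish callables, which stand in for writing the 1Password item and republishing an env;
- the default scope of "shared" when a key has no [rotation] entry;
- the use of getpass for the interactive prompt.

```python
import getpass
import secrets
from typing import Callable, Iterable, Mapping, Optional

# Hypothetical registry standing in for tools/provision/generators/.
GENERATORS: dict[str, Callable[[], str]] = {
    "hex32": lambda: secrets.token_hex(32),        # 64 hex characters
    "urlsafe32": lambda: secrets.token_urlsafe(32),
}


def rotate(key: str,
           rotation: Mapping[str, Mapping[str, str]],
           envs: Iterable[str],
           store: Callable[[str, Optional[str], str], None],
           publish: Callable[[str], None],
           prompt: Callable[[str], str] = getpass.getpass) -> None:
    """--rotate <KEY>: new value(s) -> 1Password -> republish affected envs."""
    spec = rotation.get(key, {})
    gen_name = spec.get("generator")
    if gen_name is not None and gen_name not in GENERATORS:
        raise KeyError(f"no generator registered as {gen_name!r}")
    generator = GENERATORS[gen_name] if gen_name else None

    def new_value() -> str:
        # Registered generator if there is one, else an interactive prompt.
        return generator() if generator else prompt(f"New value for {key}: ")

    scope = spec.get("scope", "shared")  # assumed default
    envs = list(envs)
    if scope == "shared":
        store(key, None, new_value())      # one value for every env
        for env in envs:
            publish(env)
    elif scope == "per-env":
        for env in envs:                   # an independent value per env
            store(key, env, new_value())
            publish(env)
    else:
        raise ValueError(f"unknown rotation scope {scope!r} for {key}")
```

The two scopes behave differently. Under per-env scope each environment gets an independent value, so rotating one does not disturb the others. Under shared scope one value fans out to every env that uses the key.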
D8 — Composite audit trail
Provisioning activity is reconstructed from a composite trail: git history (commits touching tools/provision/), 1Password item history, platform-native audit logs (Cloudflare + GitHub org audit), and Supabase PGAudit. There is no unified UI at alpha; export wiring is post-alpha SOC2-readiness work.
D9 — The provisioning script is the alpha IaC
There is no declarative IaC layer and no Terraform. provision.sh resolves an environment from infra/environments.toml + 1Password and publishes it (to a local .env or a GitHub Environment); GitHub Actions deploys, reconciling vendor resources from committed declarative config (wrangler + CLIs). The provisioning + env-config model is ADR-110.
D10 — TA-16 observability vendor key binding
Logfire / Sentry / Grafana Cloud tokens are declared in infra/environments.toml and resolved and published by provision.sh. The deploy workflow delivers them to the runtime backend (D4, Direction B), and the FastAPI composition root reads them via Pydantic Settings. This closes the ADR-036 → ADR-037 dependency.
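This is a stdlib-only sketch of the composition-root side. It assumes hypothetical variable names (LOGFIRE_TOKEN, SENTRY_DSN, GRAFANA_CLOUD_TOKEN) and a Secret class that stands in for pydantic's SecretStr. Each vendor token is optional and is masked in repr/str, so it cannot leak through logging of the settings object. The set of configured vendors decides which exporters get wired.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


class Secret:
    """Stand-in for pydantic's SecretStr: the value never shows in repr/str."""
    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"

    __str__ = __repr__


@dataclass(frozen=True)
class ObservabilitySettings:
    logfire_token: Optional[Secret] = None
    sentry_dsn: Optional[Secret] = None
    grafana_cloud_token: Optional[Secret] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "ObservabilitySettings":
        def opt(name: str) -> Optional[Secret]:
            value = env.get(name)
            return Secret(value) if value else None  # empty string counts as unset
        return cls(logfire_token=opt("LOGFIRE_TOKEN"),
                   sentry_dsn=opt("SENTRY_DSN"),
                   grafana_cloud_token=opt("GRAFANA_CLOUD_TOKEN"))

    def configured_vendors(self) -> list[str]:
        """Which exporters the composition root would wire up."""
        pairs = [("logfire", self.logfire_token), ("sentry", self.sentry_dsn),
                 ("grafana_cloud", self.grafana_cloud_token)]
        return [name for name, secret in pairs if secret is not None]
```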
D11 — Provider-swap design discipline
The provider-swap seam is the publish sink + the deploy workflow’s per-vendor reconcile (ADR-110 Direction B), not a per-target push function in the script: provision.sh resolves an environment and publishes it to a sink (a local .env or a GitHub Environment), and the GitHub Actions deploy reconciles the vendor resources from committed declarative config. A future hosting/secrets-backend swap changes the sink + the deploy workflow’s reconcile steps; the resolution flow, the secret/variable classification, and the --rotate generation/propagation semantics are unchanged. (ADR-110 carries this seam to the Cloudflare/wrangler + Supabase + GitHub target set.)
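The seam can be sketched as a small Protocol: the resolve/classify flow calls publish_env, and a provider swap replaces only the sink object. LocalDotenvSink, RecordingGitHubEnvironmentSink and publish_env are hypothetical names. The GitHub sink only records what it would set, splitting keys into environment secrets and environment variables according to the secret/variable classification.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Protocol


class PublishSink(Protocol):
    """The swap seam: where a resolved environment is published."""
    def publish(self, env_name: str, values: Mapping[str, str],
                secret_keys: frozenset[str]) -> None: ...


class LocalDotenvSink:
    """Writes the resolved env to a local .env file readable only by its owner."""

    def __init__(self, path: Path) -> None:
        self.path = path

    @staticmethod
    def render(values: Mapping[str, str]) -> str:
        for key, value in values.items():
            if "\n" in value:
                raise ValueError(f"{key}: multi-line values are not supported here")
        return "".join(f"{k}={v}\n" for k, v in sorted(values.items()))

    def publish(self, env_name: str, values: Mapping[str, str],
                secret_keys: frozenset[str]) -> None:
        # A .env file has no secret/variable split, so secret_keys is unused.
        fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as f:
            f.write(self.render(values))
        os.chmod(self.path, 0o600)  # also tighten a pre-existing file


class RecordingGitHubEnvironmentSink:
    """Hypothetical GitHub Environment sink that only records what it would set.

    A real one might shell out to `gh secret set NAME --env ENV` and
    `gh variable set NAME --env ENV`; that is omitted here.
    """

    def __init__(self) -> None:
        self.secrets: dict[tuple[str, str], str] = {}
        self.variables: dict[tuple[str, str], str] = {}

    def publish(self, env_name: str, values: Mapping[str, str],
                secret_keys: frozenset[str]) -> None:
        for key, value in values.items():
            target = self.secrets if key in secret_keys else self.variables
            target[(env_name, key)] = value


def publish_env(env_name: str, values: Mapping[str, str],
                secret_keys: Iterable[str], sink: PublishSink) -> None:
    """Unchanged across a provider swap: only the sink argument differs."""
    sink.publish(env_name, values, frozenset(secret_keys))
```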
D12 — Key management is the Supabase substrate; no external KMS
There is no external cloud KMS (no GCP KMS, no AWS KMS) anywhere in the architecture. Key management lives in the Supabase substrate: Supabase Vault (pgsodium-backed) is the credential store for every LLM-config tier (D6), so the earlier plan to migrate customer BYOK off Vault to envelope encryption rooted in an external KMS is dropped — the same scope-keyed Vault store covers the global tier + the org BYO tier (implemented) and the per-domain BYO tier (named/reserved) without a second key-management dependency.
Should a forward trigger ever require at-rest column/blob envelope encryption — e.g. the LangGraph checkpointer conversation state (ADR-043 D10; docs/runbooks/checkpointer-encryption.md) — its master-key root of trust is the same Supabase substrate (Vault / pgsodium) through the KeyManagementProvider provider-swap seam (D11), not an external KMS. The decision is locked; the infrastructure work is deferred until such a trigger fires.
Alternatives considered
Managed secrets SaaS as runtime source of truth (Doppler / Infisical / Vault Cloud / 1Password op run at runtime). Rejected — each introduces a credential-bearing SaaS subprocessor requiring a long-lived service token guarding the other secrets; cost and supply-chain exposure inconsistent with ADR-035 D1. (1Password is used as the operator-workstation secrets source per ADR-110 D2, not as a runtime secrets service.)
HashiCorp Vault self-hosted. Operational attention tax incompatible with a 1–3 engineer alpha.
Per-domain external secrets manager for customer BYO creds. Fails on cost and IAM blast radius.
.env.provision.* encrypted at rest with age/sops. Rejected for alpha: the decryption key becomes its own chicken-and-egg problem, since it is itself a secret that must be distributed. This option is superseded outright by the ADR-110 1Password env-config model (no plaintext cache).
Consequences
- The provider-swap seam (the publish sink + the deploy workflow’s per-vendor reconcile, D11) is load-bearing. Changes to it, or to the --rotate generation/propagation semantics (D4/D7), require ADR-level review.
- Unblocks ADR-036 (observability vendor keys have a home) and ADR-062 (CI secrets inherit the GitHub-Environment-scoped pattern).
- ADR-035 D4’s DB-backed credential source is implemented for the global tier + the org BYO tier via the Vault store (D6); the per-domain BYO tier resolves through the same store when it ships — no GCP KMS / envelope-encryption migration (D12).
- No runtime secrets SaaS and no Terraform/Pulumi for secret provisioning (D9). Provisioning is provision.sh resolve + publish plus the GitHub Actions deploy, per ADR-110.