
Checkpointer encryption runbook

Operational procedures for activating envelope encryption on the LangGraph checkpointer when a forward trigger fires.

System reference: Codex system-design/agent-architecture.mdx · ADR-043 D10.


Trigger conditions

Activation triggers are owned by ADR-043 D10; see the ADR for the authoritative list. Until a trigger fires, the checkpointer relies on disk-level encryption, role-scoped DB access, audit logs, and the retention cascade.


Implementation shape

The activation builds an EncryptedSerializer that wraps AsyncPostgresSaver’s SerializerProtocol. A per-domain data encryption key (DEK) is generated and wrapped via the Supabase key-management substrate (Vault / pgsodium), and unwrapped DEKs are cached in process memory with a TTL. The KeyManagementProvider protocol provides the provider-swap seam, per ADR-037 D11.
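The wrapping described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Serializer` protocol is a local stand-in modelled on a typed dumps/loads pair (check it against the installed LangGraph version), and `Cipher`, `ENC_PREFIX`, `JsonSerializer`, and `XorTestCipher` are hypothetical names introduced here. The XOR class is a test double so the sketch runs on the standard library alone; it is not encryption, and a real deployment would plug in an AEAD such as AES-GCM keyed by the domain DEK.

```python
import json
from typing import Any, Protocol


class Serializer(Protocol):
    """Local stand-in for the checkpointer's typed serializer interface (assumed shape)."""

    def dumps_typed(self, obj: Any) -> tuple[str, bytes]: ...
    def loads_typed(self, data: tuple[str, bytes]) -> Any: ...


class Cipher(Protocol):
    """Hypothetical seam for the cipher that encrypts payloads under a DEK."""

    def encrypt(self, plaintext: bytes) -> bytes: ...
    def decrypt(self, ciphertext: bytes) -> bytes: ...


ENC_PREFIX = "enc+"  # hypothetical type-tag marker for encrypted payloads


class EncryptedSerializer:
    """Encrypts the inner serializer's bytes and marks the type tag."""

    def __init__(self, inner: Serializer, cipher: Cipher) -> None:
        self._inner = inner
        self._cipher = cipher

    def dumps_typed(self, obj: Any) -> tuple[str, bytes]:
        type_tag, payload = self._inner.dumps_typed(obj)
        return ENC_PREFIX + type_tag, self._cipher.encrypt(payload)

    def loads_typed(self, data: tuple[str, bytes]) -> Any:
        type_tag, payload = data
        if not type_tag.startswith(ENC_PREFIX):
            # Row written before activation: hand it to the inner serializer as-is.
            return self._inner.loads_typed(data)
        inner_tag = type_tag[len(ENC_PREFIX):]
        return self._inner.loads_typed((inner_tag, self._cipher.decrypt(payload)))


class JsonSerializer:
    """Toy inner serializer used only for this demo."""

    def dumps_typed(self, obj: Any) -> tuple[str, bytes]:
        return "json", json.dumps(obj).encode()

    def loads_typed(self, data: tuple[str, bytes]) -> Any:
        return json.loads(data[1].decode())


class XorTestCipher:
    """Test double only. NOT encryption; replace with a real AEAD."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(data))

    def encrypt(self, plaintext: bytes) -> bytes:
        return self._xor(plaintext)

    def decrypt(self, ciphertext: bytes) -> bytes:
        return self._xor(ciphertext)
```

Tagging the type string is one way to let encrypted and pre-activation rows coexist while the backfill runs; whether the real serializer takes that approach is an assumption of this sketch.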

Estimated effort: ~2 engineer-weeks, plus master-key provisioning and rotation runbook authoring.

Components

  1. KeyManagementProvider protocol in spectral.core.crypto.protocols (or domain-appropriate location).
  2. SupabaseVaultKeyProvider impl in spectral_workers infrastructure — the master-key root of trust is the Supabase substrate (Vault / pgsodium), not an external cloud KMS (ADR-037 D12). The provider-swap seam (component 1) keeps an alternate root reachable without a rewrite if a future requirement demands it.
  3. EncryptedSerializer wrapping the checkpointer’s serializer (the LangGraph default; provider-swap seam preserved).
  4. Per-domain DEK lifecycle:
    • Generate DEK on domain creation; wrap with the Vault master key; store wrapped DEK in platform.domain_keys (or analogous).
    • Cache unwrapped DEK in process memory with TTL (default 1 h).
    • Rotate the Vault master key on the quarterly cadence; re-wrap DEKs without re-encrypting payloads.
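As an illustration of components 1 and 4, the sketch below pairs a provider protocol with a TTL cache for unwrapped DEKs. The source names KeyManagementProvider but not its methods, so `wrap_dek`, `unwrap_dek`, `DekCache`, and the `load_wrapped` callback are all hypothetical. The clock is injectable so that expiry can be tested without sleeping.

```python
import time
from typing import Callable, Protocol


class KeyManagementProvider(Protocol):
    """Hypothetical method set; the runbook does not specify the real one."""

    def wrap_dek(self, domain_id: str, dek: bytes) -> bytes: ...
    def unwrap_dek(self, domain_id: str, wrapped: bytes) -> bytes: ...


class DekCache:
    """Keeps unwrapped DEKs in process memory and expires them after a TTL."""

    def __init__(
        self,
        provider: KeyManagementProvider,
        load_wrapped: Callable[[str], bytes],  # e.g. a read from platform.domain_keys
        ttl_seconds: float = 3600.0,  # the 1 h default from the component list
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._provider = provider
        self._load_wrapped = load_wrapped
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}

    def get(self, domain_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(domain_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no call to the provider
        dek = self._provider.unwrap_dek(domain_id, self._load_wrapped(domain_id))
        self._entries[domain_id] = (now + self._ttl, dek)
        return dek

    def invalidate(self, domain_id: str) -> None:
        """Drop a cached DEK, for example after a re-wrap."""
        self._entries.pop(domain_id, None)
```

A shorter TTL narrows the window in which a revoked key remains usable in memory, at the cost of more unwrap calls against the provider.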

Migration

When activated:

  1. Land platform.domain_keys migration.
  2. Provision the Vault master key per environment (a pgsodium key id held in Supabase Vault).
  3. Deploy a backfill job that generates per-domain DEKs for existing domains.
  4. Deploy the workers update with EncryptedSerializer enabled via feature flag.
  5. Re-encrypt existing checkpointer rows (one-time backfill; runs in workers).
  6. Remove the feature flag once backfill completes.
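Step 5 is easier to operate if the backfill can be interrupted and resumed. The sketch below shows one idempotent shape for it, assuming encrypted rows carry a recognisable marker on their type tag. The `Row` fields, the `enc+` marker, and the `encrypt` callback are hypothetical; the real checkpointer schema and cipher are not specified here, and in practice `encrypt` would be bound to the DEK of the row's domain.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

ENC_PREFIX = "enc+"  # hypothetical marker on the type tag of encrypted rows


@dataclass(frozen=True)
class Row:
    """Hypothetical projection of a checkpointer row."""

    row_id: str
    type_tag: str
    payload: bytes


def reencrypt_rows(
    rows: Iterable[Row], encrypt: Callable[[bytes], bytes]
) -> Iterator[Row]:
    """Yield updated rows for plaintext entries; skip rows already encrypted."""
    for row in rows:
        if row.type_tag.startswith(ENC_PREFIX):
            continue  # already migrated, so a re-run leaves it untouched
        yield Row(row.row_id, ENC_PREFIX + row.type_tag, encrypt(row.payload))
```

Because a second pass over migrated rows yields nothing, an empty pass is a reasonable signal that the feature flag in step 6 can be removed.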

Verification

After activation:

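-- Confirm checkpointer rows are encrypted: payloads should be base64 ciphertext,
-- not the cleartext serializer output. Type and size alone cannot show this,
-- so also inspect a short prefix of each payload.
SELECT pg_typeof(state),
       octet_length(state),
       left(state::text, 32) AS payload_prefix
FROM langgraph.checkpoints
LIMIT 10;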

Then run a roundtrip test to confirm that the workers can decrypt and resume an arbitrary thread.


Rotation

Quarterly cadence (mirrors the ADR-062 D5 secrets rotation).

  1. Rotate the Vault master key version.
  2. Re-wrap all domain DEKs against the new key version (no payload re-encryption needed).
  3. Verify a sample of threads decrypts successfully.

Old master-key versions are retained in Vault for audit and emergency decrypt.
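Step 2 touches only the wrapped DEKs, which is why no payload re-encryption is needed: each DEK's bytes stay the same and only its wrapping changes. A minimal sketch, assuming each stored record carries the master-key version it was wrapped under; `WrappedDek`, `rewrap_all`, and the `wrap` / `unwrap` callbacks are hypothetical stand-ins for the Vault-backed provider.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class WrappedDek:
    """Hypothetical shape of a platform.domain_keys record."""

    domain_id: str
    key_version: int
    wrapped: bytes


def rewrap_all(
    records: Iterable[WrappedDek],
    unwrap: Callable[[int, bytes], bytes],  # (key_version, wrapped) -> dek
    wrap: Callable[[int, bytes], bytes],    # (key_version, dek) -> wrapped
    new_version: int,
) -> list[WrappedDek]:
    """Re-wrap every DEK under new_version; the DEK bytes themselves never change."""
    out: list[WrappedDek] = []
    for rec in records:
        if rec.key_version == new_version:
            out.append(rec)  # already current, which keeps the job re-runnable
            continue
        dek = unwrap(rec.key_version, rec.wrapped)
        out.append(replace(rec, key_version=new_version, wrapped=wrap(new_version, dek)))
    return out
```

Unwrapping under the old version is only possible while that version is still held in Vault, which is one practical reason to retain old master-key versions.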


Disaster scenarios

  • DEK unwrap fails (Vault key unavailable): workers fail closed; /health returns 503 (auth check fails on Spectral Agent paths). Resolve Vault/pgsodium key access; verify with a sample roundtrip.
  • Domain DEK lost: the domain’s checkpointer history becomes unrecoverable. Mitigation: the wrapped DEK lives in platform.domain_keys, covered by the standard Supabase backup + PITR cadence (ADR-040).
  • Vault master key lost: no domain DEK can be unwrapped, so the full checkpointer history is unrecoverable. The master key shares the Supabase failure domain with the data (the tradeoff of a Supabase-native root of trust vs an external KMS), so the mitigation is the same DR posture as the rest of the data: Supabase managed backups + PITR, with the Vault key included in the project’s backup scope. Escalate via the DR runbook on a confirmed loss.
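The fail-closed behaviour in the first scenario can be pictured as a health probe that attempts an unwrap and reports 503 on any failure. This is a simplified sketch: `health_status` and `probe_unwrap` are hypothetical, and the real /health handler and its auth check may be wired differently.

```python
from typing import Callable


def health_status(probe_unwrap: Callable[[], bytes]) -> tuple[int, str]:
    """Return (status_code, detail); any unwrap failure maps to 503."""
    try:
        probe_unwrap()
    except Exception as exc:
        # Report only the exception type so key-service details do not leak.
        return 503, f"dek unwrap failed: {type(exc).__name__}"
    return 200, "ok"
```

Failing closed means the workers stop serving rather than fall back to writing plaintext checkpoints while the key is unavailable.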

See also

  • ADR-043 — Spectral Agent conversation persistence (D10 forward trigger)
  • ADR-037 — D12 Supabase-substrate key management (no external KMS)
  • docs/runbooks/secrets-management.md — quarterly rotation cadence
  • docs/runbooks/disaster-recovery.md — DR scenarios