Checkpointer encryption runbook
Operational procedures for activating envelope encryption on the LangGraph checkpointer when a forward trigger fires.
System reference: Codex system-design/agent-architecture.mdx · ADR-043 D10.
Trigger conditions
Activation triggers are owned by ADR-043 D10; see the ADR for the authoritative list. Until a trigger fires, the checkpointer relies on disk-level encryption, role-scoped DB access, audit logs, and the retention cascade.
Implementation shape
Activation builds an EncryptedSerializer that wraps the SerializerProtocol implementation used by AsyncPostgresSaver. Each workspace gets its own data encryption key (DEK), generated and wrapped via KMS. Unwrapped DEKs are cached with a TTL. The KeyManagementProvider protocol is the provider-swap seam, per ADR-037 D11.
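The wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the `dumps_typed`/`loads_typed` method pair is assumed to mirror the serializer interface the checkpointer uses, `JsonSerializer` is a hypothetical stand-in for the default serializer, the `+enc` type-tag suffix is an invented convention, and the keystream cipher is a toy placeholder with no authentication. A real activation would use an AEAD such as AES-GCM from a vetted library.

```python
import hashlib
import hmac
import json
import secrets
from typing import Any, Protocol


class TypedSerializer(Protocol):
    """Assumed shape of the wrapped serializer: typed dumps/loads pairs."""

    def dumps_typed(self, obj: Any) -> tuple[str, bytes]: ...
    def loads_typed(self, data: tuple[str, bytes]) -> Any: ...


class JsonSerializer:
    """Hypothetical stand-in for the checkpointer's default serializer."""

    def dumps_typed(self, obj: Any) -> tuple[str, bytes]:
        return "json", json.dumps(obj).encode()

    def loads_typed(self, data: tuple[str, bytes]) -> Any:
        return json.loads(data[1])


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (HMAC-SHA256 in counter mode). NOT production crypto.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


class EncryptedSerializer:
    """Encrypts the inner serializer's bytes and marks the type tag with '+enc'."""

    SUFFIX = "+enc"  # hypothetical marker for encrypted rows

    def __init__(self, inner: TypedSerializer, dek: bytes) -> None:
        self._inner = inner
        self._dek = dek

    def dumps_typed(self, obj: Any) -> tuple[str, bytes]:
        type_tag, plaintext = self._inner.dumps_typed(obj)
        nonce = secrets.token_bytes(16)  # fresh nonce per payload
        stream = _keystream(self._dek, nonce, len(plaintext))
        ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
        return type_tag + self.SUFFIX, nonce + ciphertext

    def loads_typed(self, data: tuple[str, bytes]) -> Any:
        type_tag, payload = data
        if not type_tag.endswith(self.SUFFIX):
            # Assumption: rows written before activation stay readable as cleartext.
            return self._inner.loads_typed(data)
        nonce, ciphertext = payload[:16], payload[16:]
        stream = _keystream(self._dek, nonce, len(ciphertext))
        plaintext = bytes(a ^ b for a, b in zip(ciphertext, stream))
        inner_tag = type_tag[: -len(self.SUFFIX)]
        return self._inner.loads_typed((inner_tag, plaintext))
```

Tagging encrypted payloads lets the same serializer read both old cleartext rows and new encrypted rows, which matters while a backfill is still in progress.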
Estimated effort: ~2 engineer-weeks plus KMS IAM setup plus rotation runbook authoring.
Components
- KeyManagementProvider protocol in spectral.core.crypto.protocols (or domain-appropriate location).
- GcpKmsProvider impl in spectral_workers infrastructure (per ADR-037 D12 KMS reservation; even if compute lives on Render, KMS is the master-key root of trust).
- EncryptedSerializer wrapping the checkpointer’s serializer (the LangGraph default; provider-swap seam preserved).
- Per-workspace DEK lifecycle:
  - Generate DEK on workspace creation; wrap with KMS key; store wrapped DEK in platform.workspace_keys (or analogous).
  - Cache unwrapped DEK in process memory with TTL (default 1 h).
  - Rotate KMS key per quarterly cadence; re-wrap DEKs without re-encrypting payloads.
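As a rough sketch of how the provider seam and the TTL cache could fit together, assuming a two-method `wrap`/`unwrap` interface (the method names, `FakeKmsProvider`, and `DekCache` are illustrative and not taken from the ADRs):

```python
import time
from typing import Callable, Protocol


class KeyManagementProvider(Protocol):
    """Provider-swap seam: any KMS backend exposing wrap/unwrap fits."""

    def wrap(self, dek: bytes) -> bytes: ...
    def unwrap(self, wrapped: bytes) -> bytes: ...


class FakeKmsProvider:
    """In-memory stand-in for a real provider such as GcpKmsProvider."""

    PREFIX = b"wrapped:"

    def __init__(self) -> None:
        self.unwrap_calls = 0

    def wrap(self, dek: bytes) -> bytes:
        return self.PREFIX + dek  # placeholder, not real encryption

    def unwrap(self, wrapped: bytes) -> bytes:
        self.unwrap_calls += 1
        return wrapped[len(self.PREFIX):]


class DekCache:
    """Caches unwrapped DEKs per workspace for ttl_seconds (default 1 h)."""

    def __init__(
        self,
        provider: KeyManagementProvider,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._provider = provider
        self._ttl = ttl_seconds
        self._clock = clock
        # workspace_id -> (expiry time, unwrapped DEK)
        self._entries: dict[str, tuple[float, bytes]] = {}

    def get(self, workspace_id: str, wrapped_dek: bytes) -> bytes:
        now = self._clock()
        entry = self._entries.get(workspace_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no KMS call
        dek = self._provider.unwrap(wrapped_dek)
        self._entries[workspace_id] = (now + self._ttl, dek)
        return dek
```

Injecting the clock keeps expiry testable without sleeping; the TTL bounds how long an unwrapped DEK stays in process memory between KMS calls.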
Migration
When activated:
- Land platform.workspace_keys migration.
- Provision KMS keys per environment (spectral-staging-kms, spectral-production-kms).
- Deploy a backfill job that generates per-workspace DEKs for existing workspaces.
- Deploy the workers update with EncryptedSerializer enabled via feature flag.
- Re-encrypt existing checkpointer rows (one-time backfill; runs in workers).
- Remove the feature flag once backfill completes.
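The DEK backfill step could look something like the sketch below. It is a simplified illustration under stated assumptions: a plain dict stands in for the platform.workspace_keys table, `wrap` is any callable that wraps a DEK via the KMS, and the 32-byte key size is a guess rather than a documented requirement. The function name is hypothetical.

```python
import secrets
from typing import Callable, Iterable


def backfill_workspace_deks(
    workspace_ids: Iterable[str],
    wrapped_keys: dict[str, bytes],
    wrap: Callable[[bytes], bytes],
) -> list[str]:
    """Create a wrapped DEK for each workspace that lacks one.

    Returns the ids that received a new key. Safe to re-run: workspaces that
    already have a key are skipped, so a retry never overwrites a DEK that
    may already protect checkpointer rows.
    """
    created: list[str] = []
    for workspace_id in workspace_ids:
        if workspace_id in wrapped_keys:
            continue  # never overwrite an existing key
        dek = secrets.token_bytes(32)  # assumed 256-bit DEK
        wrapped_keys[workspace_id] = wrap(dek)  # only the wrapped form is stored
        created.append(workspace_id)
    return created
```

Making the job idempotent means it can be retried after a partial failure without special handling.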
Verification
After activation:
    -- Confirm checkpointer rows are encrypted (payloads should be base64 ciphertext, not the cleartext serializer output)
    SELECT pg_typeof(state), octet_length(state)
    FROM langgraph.checkpoints
    LIMIT 10;

A roundtrip test confirms the workers can decrypt and resume an arbitrary thread.
Rotation
Quarterly cadence (mirrors the ADR-062 D5 secrets rotation).
- Rotate the KMS key version.
- Re-wrap all workspace DEKs against the new key version (no payload re-encryption needed).
- Verify a sample of threads decrypts successfully.
Old KMS key versions are retained per the KMS retention policy for audit and emergency decrypt.
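The re-wrap step can be illustrated with a small sketch. `VersionedFakeKms`, `WrappedDek`, and `rewrap_all` are hypothetical names, and the XOR "wrapping" is only a placeholder so that blobs differ per key version. The point it shows is that the DEK bytes never change, which is why payloads need no re-encryption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrappedDek:
    key_version: int  # KMS key version that wrapped this DEK
    blob: bytes


class VersionedFakeKms:
    """In-memory stand-in for a KMS key with multiple versions."""

    def __init__(self) -> None:
        self.primary_version = 1

    def rotate(self) -> None:
        self.primary_version += 1

    def wrap(self, dek: bytes) -> WrappedDek:
        # Placeholder "encryption": XOR with the version number.
        version = self.primary_version
        return WrappedDek(version, bytes(b ^ version for b in dek))

    def unwrap(self, wrapped: WrappedDek) -> bytes:
        # Old versions stay usable for unwrap, as retained key versions would.
        version = wrapped.key_version
        return bytes(b ^ version for b in wrapped.blob)


def rewrap_all(kms: VersionedFakeKms, wrapped_keys: dict[str, WrappedDek]) -> int:
    """Re-wrap every DEK not on the primary version; return how many changed."""
    changed = 0
    for workspace_id, wrapped in list(wrapped_keys.items()):
        if wrapped.key_version == kms.primary_version:
            continue  # already current
        dek = kms.unwrap(wrapped)  # unwrap under the old version
        wrapped_keys[workspace_id] = kms.wrap(dek)  # same DEK, new version
        changed += 1
    return changed
```

Recording the key version next to each wrapped DEK makes the job resumable and makes it easy to confirm that nothing still depends on an old version.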
Disaster scenarios
- DEK unwrap fails (KMS outage): workers fail closed; /health returns 503 (auth check fails on Spectral Agent paths). Wait for KMS recovery; verify with sample roundtrip.
- Workspace DEK lost: the workspace’s checkpointer history becomes unrecoverable. Mitigation: KMS replication; multi-region key-version retention.
- KMS key destroyed: no workspace DEK can be unwrapped; full checkpointer history unrecoverable. Mitigation: DR runbook escalation; restore from pg_dump (which contains the wrapped DEKs but not the destroyed KMS material).
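The fail-closed behavior in the KMS outage scenario might be sketched as below. `KmsUnavailableError`, `health_status`, and the probe callable are hypothetical; the actual /health handler and its error types are not described in this runbook.

```python
from typing import Callable


class KmsUnavailableError(Exception):
    """Hypothetical error raised when the KMS cannot unwrap a DEK."""


def health_status(unwrap_probe: Callable[[], bytes]) -> tuple[int, str]:
    """Return (HTTP status, reason); an unwrap failure maps to 503.

    Failing closed means the worker reports unhealthy instead of falling
    back to reading or writing cleartext when keys are unavailable.
    """
    try:
        unwrap_probe()
    except KmsUnavailableError as exc:
        return 503, f"kms unavailable: {exc}"
    return 200, "ok"
```

Once the KMS recovers, the same probe starts returning 200, after which a sample roundtrip confirms that threads decrypt again.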
See also
- ADR-043: Spectral Agent conversation persistence (D10 forward trigger)
- ADR-037: D12 GCP KMS reservation
- docs/runbooks/secrets-management.md: quarterly rotation cadence
- docs/runbooks/disaster-recovery.md: DR scenarios