Rollback runbook
How to recover a bad production deploy. Cutover is generation-stamped and the production branch is fast-forward-only (ADR-109 D5), so rollback is forward-fix — there is no blue/green CNAME flip and no legacy-generation drain.
The model
- A container deploy ships a new generation (one container). The new container claims its generation’s outbox rows; the prior generation’s rows age out unclaimed — there is no reaper to run.
- A Pages deploy ships a new static/Function deployment for one frontend/docs project; it does not affect the API/workers generation.
- Schema migrations are forward-only (ADR-032 D4) + expand/contract, so the prior generation’s code keeps working against the new schema.
- The production branch is fast-forward-only (the ruleset blocks force-push), so you cannot move it backward; you roll back by deploying a new commit that reverts the change.
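To make the fast-forward constraint concrete: a branch update is a fast-forward only when the current tip is an ancestor of the new tip (the condition that `git merge-base --is-ancestor` tests). The sketch below models that check on a toy commit graph. The commit names and the `parents` mapping are invented for illustration and do not describe the real repository.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def is_fast_forward(parents, old_tip, new_tip):
    """A move from old_tip to new_tip is a fast-forward iff old_tip is an ancestor of new_tip."""
    return is_ancestor(parents, old_tip, new_tip)


# Toy history: A <- B <- C (bad deploy) <- R (a new commit that reverts C).
history = {"A": [], "B": ["A"], "C": ["B"], "R": ["C"]}

allowed = is_fast_forward(history, "C", "R")   # moving production forward to the revert
blocked = is_fast_forward(history, "C", "B")   # moving production back to B
```

Here `allowed` is true and `blocked` is false: the revert commit sits on top of the bad commit, so it is the only direction the branch can move.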
By failure class
Behavior regression (code, schema fine)
The new generation serves wrong responses or otherwise misbehaves, but the schema is not at fault.
- git revert the offending commit(s) on main.
- Fast-forward production to the revert and push (redeploy per deployment.md).
- The reverted (prior-generation) code runs safely against the already-migrated schema by the expand/contract guarantee.
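The steps above could be sketched as an ordered list of git commands. This is a minimal illustration, not the procedure from deployment.md: it assumes the remote is named `origin`, that you may push to main and production directly (many repositories require a pull request instead), and that everything on main is acceptable to ship. The helper name `revert_plan` is hypothetical.

```python
def revert_plan(shas, remote="origin"):
    """Return the git commands, in order, for a forward-fix rollback.

    Nothing is executed here; the caller decides whether to run or just print them.
    """
    if not shas:
        raise ValueError("need at least one commit to revert")
    return [
        ["git", "switch", "main"],
        ["git", "pull", "--ff-only", remote, "main"],
        ["git", "revert", "--no-edit", *shas],        # creates new commits; history is not rewritten
        ["git", "push", remote, "main"],
        ["git", "switch", "production"],
        ["git", "pull", "--ff-only", remote, "production"],
        ["git", "merge", "--ff-only", "main"],        # fails rather than creating a merge commit
        ["git", "push", remote, "production"],
    ]


if __name__ == "__main__":
    for argv in revert_plan(["<bad-sha>"]):
        print(" ".join(argv))
```

When several commits must be reverted, passing them newest first is the usual way to avoid conflicts. `--ff-only` is used on every update so that the plan stops instead of producing a history the production ruleset would reject.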
Pages frontend/docs regression
The bad deploy affects only app., ops., or docs..
- Revert the offending frontend/docs commit on main.
- Fast-forward production to the revert and push so deploy-pages.yml redeploys the affected Pages project.
- If the bad deploy is already live and the revert is not ready, use Cloudflare Pages’ deployment rollback for the affected project, then still land the revert/fix in git.
Migration-caused issue
The migration that applied is itself the problem.
- Do not roll back the schema — migrations are forward-only.
- If the prior code can run against the new schema (expand/contract held), revert the code as above.
- Otherwise fix forward: a follow-up migration that restores the compatibility property (e.g. make a NOT NULL column nullable again until backfill completes), shipped with the corrected code.
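As an illustration of the fix-forward example above, the follow-up migration for a premature NOT NULL constraint could be a single statement. The sketch assumes PostgreSQL syntax (`ALTER COLUMN ... DROP NOT NULL`); the helper `relax_not_null` and the table and column names are hypothetical.

```python
import re

# Plain unquoted identifiers only, so the generated SQL cannot be hijacked by odd input.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def relax_not_null(table, column):
    """Build a forward migration statement that makes a NOT NULL column nullable again."""
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"ALTER TABLE {table} ALTER COLUMN {column} DROP NOT NULL;"


statement = relax_not_null("orders", "shipped_at")
```

The constraint can be restored in a later migration once the backfill has completed and no running generation writes NULL into the column.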
Catastrophic / DR
The database is corrupted or a region is down. Declare DR per disaster-recovery.md (PITR or managed-backup restore).
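The four classes above can be read as a decision order. The sketch below is one possible encoding: checking DR first, then migration, then Pages-only is an assumption made here, not something the runbook prescribes, and the `Incident` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    db_corrupt_or_region_down: bool = False
    migration_at_fault: bool = False
    prior_code_runs_on_new_schema: bool = True   # did expand/contract hold?
    pages_only: bool = False                     # only a frontend/docs project is affected


def rollback_action(incident):
    """Map an incident to the runbook action, most severe class first."""
    if incident.db_corrupt_or_region_down:
        return "declare DR per disaster-recovery.md"
    if incident.migration_at_fault:
        if incident.prior_code_runs_on_new_schema:
            return "revert code; leave schema as migrated"
        return "fix forward: follow-up migration plus corrected code"
    if incident.pages_only:
        return "revert on main, fast-forward production; Pages rollback as stopgap"
    return "revert on main, fast-forward production"
```

For example, `rollback_action(Incident(migration_at_fault=True, prior_code_runs_on_new_schema=False))` selects the fix-forward path, and no branch ever returns a schema rollback.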
The V1-against-V2-schema corner
The classic “prior code breaks against the new schema” failure is structurally prevented by expand/contract discipline (ADR-109 D5): a schema change must leave the prior generation working. Breaking DDL — ADD COLUMN … NOT NULL without a default, DROP COLUMN/DROP TABLE, ALTER COLUMN … TYPE, ADD … UNIQUE on an existing table — is rejected at PR time by the migration-compat lint (ADR-053 D5) unless the migration carries an explicit -- compat: breaking (reason: …) override with reviewer signoff. The rollback path stays simple because the corner cannot be entered without explicit opt-in.
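To show the kind of check described above, here is a rough regex-based approximation. It is not the actual migration-compat lint from ADR-053 D5, whose implementation is not shown in this runbook: the patterns, the `lint_migration` name, and the statement splitting are all assumptions, and a real lint would more likely parse the SQL. Reviewer signoff cannot be verified by code like this.

```python
import re

# Regex approximations of the breaking DDL classes listed above (assumed, not the real rules).
BREAKING = {
    "ADD COLUMN NOT NULL without default": re.compile(
        r"\bADD\s+COLUMN\b(?![^;]*\bDEFAULT\b)[^;]*\bNOT\s+NULL\b", re.I),
    "DROP COLUMN": re.compile(r"\bDROP\s+COLUMN\b", re.I),
    "DROP TABLE": re.compile(r"\bDROP\s+TABLE\b", re.I),
    "ALTER COLUMN TYPE": re.compile(r"\bALTER\s+COLUMN\b[^;]*\bTYPE\b", re.I),
    "ADD UNIQUE": re.compile(r"\bADD\b[^;]*\bUNIQUE\b", re.I),
}
# The explicit opt-in comment; a non-empty reason is required.
OVERRIDE = re.compile(r"--\s*compat:\s*breaking\s*\(reason:\s*\S[^)]*\)", re.I)
LINE_COMMENT = re.compile(r"--[^\n]*")


def lint_migration(sql):
    """Return the names of breaking DDL classes found, or [] if clean or overridden."""
    if OVERRIDE.search(sql):
        return []
    body = LINE_COMMENT.sub("", sql)          # ignore DDL keywords inside comments
    findings = []
    for statement in body.split(";"):         # naive split; breaks on ';' inside string literals
        for name, pattern in BREAKING.items():
            if pattern.search(statement):
                findings.append(name)
    return findings
```

A migration containing `ALTER TABLE users ADD COLUMN email text NOT NULL;` is flagged, while the same statement with a `DEFAULT` clause, or with the override comment present, passes. The sketch will miss variants such as `ADD` without the `COLUMN` keyword.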
Communication
For any production rollback: note it (deploy SHA, generation, failure class) in the ops channel; mark / annotate the GitHub Release if one was created; after resolution, capture any deploy-doctrine gap.
Related
- ADR-109: generation cutover + expand/contract.
- ADR-053: CD pipeline (migration-compat lint, D5).
- ADR-032: forward-only migrations.
- deployment.md: the deploy procedure · disaster-recovery.md: DR.