CD Pipeline Overview
The CD pipeline is built from composite-action workflows plus thin per-target jobs, with concurrency mutexes ordering staging-then-production deploys and a 12-step cutover protocol that closes 13 known operator failure modes. Decision lineage lives in ADR-053; operational runbooks are at `docs/runbooks/deployment.md` and `docs/runbooks/rollback.md`.
Concurrency model
- Staging: `concurrency: { group: deploy-staging-${{ github.ref }}, cancel-in-progress: true }`. Latest commit wins.
- Production: `concurrency: { group: deploy-prod, cancel-in-progress: false }`. Queued; never canceled.
- Production gates on a same-SHA staging-success marker — concurrency does not order across separate groups; the marker provides the ordering primitive.
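The two concurrency blocks can be sketched as workflow fragments (the marker-check script and step names below are illustrative, not the repo's actual ones; the two YAML documents stand in for two separate workflow files):

```yaml
# Staging workflow — latest commit wins; an in-flight run for the same ref is canceled.
concurrency:
  group: deploy-staging-${{ github.ref }}
  cancel-in-progress: true
---
# Production workflow — runs queue behind the mutex and are never canceled.
concurrency:
  group: deploy-prod
  cancel-in-progress: false
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Hypothetical gate: proceed only if this exact SHA already passed
      # staging (the same-SHA staging-success marker described above).
      - name: Verify staging-success marker
        run: ./scripts/check-staging-marker.sh "${{ github.sha }}"
```

The gate step, not the concurrency group, is what orders staging before production: the two groups are independent mutexes, so cross-group ordering has to come from the marker check.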
Migration-compat lint
`tools/quality/check_migration_compat.py` rejects the following in `supabase/migrations/*.sql`:
- `DROP COLUMN`
- `DROP TABLE`
- `ALTER COLUMN ... TYPE` to an incompatible target
- `ADD COLUMN ... NOT NULL` without `DEFAULT`
- `ADD ... UNIQUE` constraint on a populated column
Override: a `-- compat: breaking (reason: <reason>)` marker on the file. The override forces explicit human review; an unannotated breaking change blocks the PR. The V1-against-V2-schema corner is prevented at PR time, not deploy time.
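The lint's shape can be sketched as a regex pass over each migration file (a minimal sketch under assumptions: the real checker may parse SQL properly, and these exact patterns are illustrative — only the rejected constructs and the override marker come from the rules above):

```python
import re

# Illustrative patterns for the breaking changes listed above.
BREAKING = [
    (r"\bDROP\s+COLUMN\b", "DROP COLUMN"),
    (r"\bDROP\s+TABLE\b", "DROP TABLE"),
    (r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b", "ALTER COLUMN ... TYPE"),
    # NOT NULL is only breaking when no DEFAULT accompanies it.
    (r"\bADD\s+COLUMN\s+(?!.*\bDEFAULT\b).*\bNOT\s+NULL\b",
     "ADD COLUMN ... NOT NULL without DEFAULT"),
    (r"\bADD\s+(CONSTRAINT\s+\w+\s+)?UNIQUE\b", "ADD ... UNIQUE"),
]

OVERRIDE = re.compile(r"--\s*compat:\s*breaking\s*\(reason:.+\)")

def check_migration(sql: str) -> list[str]:
    """Return violation labels; the override marker suppresses them all."""
    if OVERRIDE.search(sql):
        return []  # annotated breaking change: routed to human review instead
    return [label for pat, label in BREAKING
            if re.search(pat, sql, re.IGNORECASE)]
```

A non-empty return blocks the PR; an empty return with the marker present still surfaces in review because the marker itself is visible in the diff.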
Production cutover (12 steps)
For a tag push (`prod-N` or `v*.*.*`):
1. Acquire the `concurrency: deploy-prod` mutex.
2. Verify the same-SHA staging-success marker; abort if absent.
3. Pre-merge dry-run; abort on failure.
4. Apply schema via the management API `branches/staging/merge` → main; assert the `schema_migrations` row-count delta matches expected.
5. Allocate a generation — `INSERT INTO core.deployments RETURNING generation`. Capture `<N>`.
6. Per-service deploy of green (authenticated API, not deploy hook): for each affected target, deploy with `SPECTRAL_GENERATION=<N>` set per-service. Workers + api land in the same generation.
7. Poll the service API until `status='live'`. Hard timeout: 25 min.
8. Poll `/version` until `{commit_sha, schema_version, generation}` all match expected.
9. Workers heartbeat verification — poll `core.workers` until workers at the new generation report `state='running'`.
10. 30 s sanity check — assert no blue service has started a new deploy in the last 30 s. Auto-redeploy on env-group rotation would otherwise silently roll the blue side mid-flip.
11. CNAME flip — public CNAMEs point at green origins. TTL must be pre-lowered to 60 s at least 2 hours before this step.
12. Hold blue warm for 24 h, then sync blue to green.
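Steps 7–9 share one polling pattern: retry a check until it passes or a hard timeout expires, then abort without flipping. A minimal sketch (function and parameter names are this sketch's, not the pipeline's; the injectable clock/sleep are for testability):

```python
import time

def poll_until(check, timeout_s: float, interval_s: float = 5.0,
               clock=time.monotonic, sleep=time.sleep) -> bool:
    """Run check() until it returns True or the hard timeout expires.

    In the cutover this covers: service status == 'live' (25 min timeout),
    /version matching {commit_sha, schema_version, generation}, and
    core.workers heartbeats at the new generation.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check():
            return True
        sleep(interval_s)
    return False  # caller aborts the cutover: do not flip the CNAME
```

Usage would look like `ok = poll_until(lambda: get_status() == "live", timeout_s=25 * 60)`, where `get_status` is whatever client wraps the service API.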
Failures in steps 1–5 are invisible to users; abort cleanly. Failures in steps 6–9 leave green broken; do not flip; investigate. A step 11 failure takes rollback path 2 (CNAME back to blue).
Rollback decision tree
1. Cutover incomplete (CNAME not flipped): abort; blue is still serving; investigate green; rebuild as needed. No legacy-drain.
2. Post-cutover behavior-only issue: flip the CNAME back to blue. The 60 s TTL bounds the exposure window. Blue stays warm during the hold. Outbox at gen-N drains naturally.
3. Post-cutover deploy-generation-specific data issue: redeploy the prior code at new generation N+2, tagged `vX.Y.Z-rollback`. Stranded gen-(N+1) outbox rows are drained via `drain-legacy-generation.yml`.
4. Migration-caused issue: migrations are forward-only + expand/contract + compat-linted, so old code works against the new schema by design. Rollback is code-level only, via path 3.
5. Past image retention OR upstream-yanked dep: declare DR per `docs/runbooks/disaster-recovery.md`.
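The decision tree reduces to two inputs: whether the CNAME has flipped and what class of issue is in hand. A small dispatcher makes the branching explicit (labels and return strings are this sketch's, not a real module in the repo):

```python
def rollback_path(cname_flipped: bool, issue: str) -> str:
    """Map an incident to a rollback path from the decision tree.

    issue is one of: 'behavior', 'generation-data', 'migration',
    'image-retention-or-yanked-dep' (illustrative labels).
    """
    if issue == "image-retention-or-yanked-dep":
        return "declare DR (docs/runbooks/disaster-recovery.md)"
    if not cname_flipped:
        # Path 1: blue never stopped serving, so there is nothing to drain.
        return "abort; blue still serving; no legacy-drain"
    if issue == "behavior":
        return "flip CNAME back to blue"
    if issue in ("generation-data", "migration"):
        # Migrations are forward-only, so a migration-caused issue rolls
        # back at the code level only — the same path as a
        # generation-specific data issue.
        return "redeploy prior code at a new generation; drain stranded outbox rows"
    raise ValueError(f"unknown issue: {issue}")
```

Note that paths 3 and 4 converge deliberately: because old code runs against the new schema by design, there is never a schema rollback to perform.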
See also
- ADR-053 — decision lineage
- Deployment topology — service inventory + generation stamping
- ADR-049 — container strategy + image conventions
- Hosting
- Secrets management — environment-group placement
- Runbooks: `docs/runbooks/deployment.md`, `docs/runbooks/rollback.md`, `docs/runbooks/legacy-drain.md`, `docs/runbooks/ci-secrets.md`