ADR-053: CD pipeline — fast-forward `production`-branch deploy via a reusable GitHub Actions workflow
Context
The hosting topology is one Cloudflare app container running both the API and workers entrypoints, against cloud Supabase, behind the Cloudflare edge (ADR-109); provisioning resolves config from infra/environments.toml + 1Password into a GitHub Environment (ADR-110). This ADR settles how a deploy is triggered, sequenced, gated, and recovered on top of that topology.
The shape is deliberately small for a pre-launch, single-operator, single-environment system. There is no blue-green colour flip, no staged soak/promote, no separate drain service — those were sized for a multi-service Render topology that no longer exists. A deploy is a fast-forward push of a branch; the engine that runs it is one reusable workflow so a second environment becomes a thin caller, not a rewrite.
Decision
D1 — Trigger: a fast-forward push of production deploys; main does not
main is the integration branch: pushing main runs CI but does not deploy. A deploy is an explicit fast-forward push of the production branch, which fires .github/workflows/deploy-production.yml. The production-ff-only repository ruleset (provisioned by tools/provision/github_resources.sh) enforces fast-forward-only and no-delete on that branch, and the production GitHub Environment's protection rule gates the run. To advance a deploy, the operator runs git merge --ff-only main on production and then git push. Because pushing main never deploys, remote main is always safe to advance; deploying is a separate, deliberate act.
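The fast-forward-only guarantee amounts to an ancestry check: the current production tip must be an ancestor of the commit being pushed. A minimal sketch of that property over a toy commit graph follows. The commit names and the `is_fast_forward` helper are hypothetical, and this illustrates the rule rather than how GitHub evaluates the ruleset.

```python
from typing import Dict, List


def is_fast_forward(parents: Dict[str, List[str]], old_tip: str, new_tip: str) -> bool:
    """Return True if old_tip is reachable from new_tip via parent links."""
    stack = [new_tip]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == old_tip:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


# Hypothetical history: a <- b <- c on main, with x branching off a.
history = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
```

With production at `b`, advancing to `c` is a fast-forward; with production at `x`, it is not, and the push would be rejected.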
The fast-forward-branch model is kept even though there is only one environment today: it builds the habit, and main → staging lands later as a sibling caller (D2) with no change to the trigger discipline.
D2 — Reusable engine + thin per-environment callers
.github/workflows/deploy.yml is the deploy engine: a workflow_call reusable workflow parameterized by an environment input. deploy-production.yml is a thin caller that invokes it with environment: production. A staging deploy (deploy-staging.yml, main-triggered) is added as a second thin caller when a Supabase stage project exists — the engine is already environment-parameterized, so that is a few lines, not a fork. At alpha there is one environment, production.
D3 — Config comes from the GitHub Environment, reconciled from committed source
The deploy reads the matching GitHub Environment (vars + secrets), which tools/provision/provision.sh populates from infra/environments.toml + 1Password. Committed config is the source of truth; the Environment is its reconciled projection; nothing is passed to the workflow by hand. On a config change (a new key, a rotated secret) the operator re-runs provision.sh --env production to republish before deploying.
D4 — Least-privilege secret contract; secrets scoped per step
Callers enumerate the secret set explicitly (secrets: block, not secrets: inherit), so the contract is visible and minimal. Each secret is scoped to the individual step that needs it — credentials are never in scope during pnpm install or any other third-party-package-lifecycle step. Non-secret identifiers (Cloudflare account id, the /health URL) live in job-level env with stable fallbacks so a freshly-created Environment still deploys.
D5 — Deploy step order: migrations → container → edge → smoke
The deploy job runs, in order, against the resolved Environment:
- Apply Supabase migrations (tools/deploy/apply_supabase.sh): schema first, so it is ready before the new container serves.
- Deploy the app container via wrangler (tools/deploy/deploy_container.sh): sets the container runtime secrets + non-secret vars on the Worker (the SPEC-503 secret hop: SUPABASE_DB_URL / SUPABASE_SECRET_KEY as Worker secrets, the rest as --var), which infra/cloudflare/edge/src/index.ts forwards into the container at start.
- Reconcile the Cloudflare edge (tools/deploy/reconcile_edge.sh): R2 bucket + the api. custom domain, which attaches to the Worker the deploy just created/updated.
- Smoke test (tools/deploy/smoke.sh): see D6.
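The ordering above can be sketched as a sequence that stops at the first failing step, so a failed migration never reaches the container deploy. The script paths come from D5; the `run_deploy` function and its injectable `run_step` runner are assumptions for illustration, not the workflow's actual implementation (which expresses the order as job steps).

```python
from typing import Callable, List, Sequence

# Script paths are taken from D5; everything else here is illustrative.
DEPLOY_STEPS: Sequence[str] = (
    "tools/deploy/apply_supabase.sh",
    "tools/deploy/deploy_container.sh",
    "tools/deploy/reconcile_edge.sh",
    "tools/deploy/smoke.sh",
)


def run_deploy(
    run_step: Callable[[str], int],
    steps: Sequence[str] = DEPLOY_STEPS,
) -> List[str]:
    """Run steps in order and stop at the first non-zero exit code.

    Returns the steps that completed successfully.
    """
    completed: List[str] = []
    for step in steps:
        if run_step(step) != 0:
            break
        completed.append(step)
    return completed
```

In real use `run_step` could wrap `subprocess.run([step]).returncode`; injecting it keeps the ordering logic testable without touching any infrastructure.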
D6 — A green /health through the edge is the deploy go/no-go
smoke.sh polls https://api.runspectral.com/health until it returns 200, with a generous retry budget (a cold first attach of the api. custom domain returns 403 while Cloudflare issues the cert and propagates the Worker route — observed to exceed 5 min; the budget carries ~12 min). A 200 through the edge is the live-deploy proof; never reaching 200 fails the deploy. /health also surfaces subsystem state (outbox wired, per-domain LLM credential) for operator inspection.
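The polling behaviour can be sketched as a loop with a time budget. This is a hedged illustration, not a port of smoke.sh: the ~12 min budget comes from the text above, while the 10-second interval, the `wait_for_healthy` name, and the injectable `fetch_status`, `sleep`, and `clock` parameters are assumptions.

```python
import time
from typing import Callable


def wait_for_healthy(
    fetch_status: Callable[[], int],
    budget_seconds: float = 12 * 60,   # ~12 min budget, per D6
    interval_seconds: float = 10.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until fetch_status() returns 200 or the budget is exhausted.

    Any non-200 status (including the 403 seen during a cold domain
    attach) is treated as "not ready yet" and retried.
    """
    deadline = clock() + budget_seconds
    while True:
        if fetch_status() == 200:
            return True
        # Give up if another full interval would overrun the budget.
        if clock() + interval_seconds > deadline:
            return False
        sleep(interval_seconds)
```

A caller would pass a `fetch_status` that issues a GET against the /health URL and returns the HTTP status code; a `False` result maps to a failed deploy.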
D7 — One deploy at a time per environment, never interrupted
concurrency: { group: deploy-${{ inputs.environment }}, cancel-in-progress: false }. Deploys queue per environment; a live deploy is never cancelled mid-flight.
D8 — Generation stamped per deploy, atomic with the image
SPECTRAL_GENERATION is set as a container --var from the Environment at deploy time, so it is atomic with the deployed image — a container crash-restart sees the same generation. Generation is the outbox-claim key (ADR-109 D4): workers claim core.outbox rows WHERE generation = $MY_GENERATION, so a superseded generation simply stops claiming new work once it is no longer deployed.
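The generation-scoped claim can be illustrated with an in-memory SQLite table standing in for core.outbox. The column names (`id`, `generation`, `claimed`) and the `claim_rows` helper are hypothetical; the real table lives in Postgres, and a production claim would also need to be atomic under concurrent workers, which this sketch does not attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real core.outbox columns are not given here.
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY, generation TEXT, claimed INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO outbox (generation) VALUES (?)",
    [("gen-1",), ("gen-1",), ("gen-2",)],
)


def claim_rows(conn: sqlite3.Connection, my_generation: str) -> list:
    """Claim unclaimed rows stamped with this worker's generation."""
    rows = conn.execute(
        "SELECT id FROM outbox WHERE generation = ? AND claimed = 0 ORDER BY id",
        (my_generation,),
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE outbox SET claimed = 1 WHERE id = ?", [(i,) for i in ids]
    )
    return ids
```

A worker started with generation `gen-2` only ever sees the `gen-2` row; rows stamped `gen-1` are left untouched, which is why a superseded generation needs no separate drain step to stop picking up work.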
D9 — Migration-compat lint at PR time
tools/quality/check_migration_compat.py (wired into ci.yml on PR + push-main, and tools/dev/precheck.sh) rejects unannotated breaking changes in supabase/migrations/*.sql: DROP COLUMN, DROP TABLE, incompatible ALTER COLUMN ... TYPE, ADD COLUMN ... NOT NULL without DEFAULT, ADD ... UNIQUE on a populated column. An explicit -- compat: breaking (reason: <reason>) marker overrides and forces human review. Migrations are forward-only (ADR-032) + expand/contract (ADR-109), so old code runs against new schema by design; the lint catches the V1-against-V2-schema corner at PR time rather than deploy time.
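A rough sketch of the kind of check this lint performs, under stated simplifications: it flags every ALTER COLUMN ... TYPE (not only incompatible ones), it skips the UNIQUE-on-populated-column rule because that needs knowledge of the data, and it treats the override marker as applying to the whole file. The `find_breaking` function and its patterns are illustrative and are not taken from check_migration_compat.py.

```python
import re
from typing import List

# Marker format from D9: -- compat: breaking (reason: <reason>)
OVERRIDE = re.compile(
    r"--\s*compat:\s*breaking\s*\(reason:\s*[^)]+\)", re.IGNORECASE
)

BREAKING_PATTERNS = [
    ("DROP COLUMN", re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE)),
    ("DROP TABLE", re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE)),
    (
        "ALTER COLUMN TYPE",
        re.compile(
            r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b", re.IGNORECASE
        ),
    ),
]
ADD_NOT_NULL = re.compile(r"\bADD\s+COLUMN\b[^;]*\bNOT\s+NULL\b", re.IGNORECASE)
HAS_DEFAULT = re.compile(r"\bDEFAULT\b", re.IGNORECASE)


def find_breaking(sql: str) -> List[str]:
    """Return labels of breaking changes found in one migration's SQL."""
    if OVERRIDE.search(sql):
        return []  # explicitly annotated: deferred to human review
    findings = [name for name, pat in BREAKING_PATTERNS if pat.search(sql)]
    # Check NOT NULL additions statement by statement.
    for statement in sql.split(";"):
        if ADD_NOT_NULL.search(statement) and not HAS_DEFAULT.search(statement):
            findings.append("ADD COLUMN NOT NULL without DEFAULT")
    return findings
```

A non-empty result would fail the CI job; adding the marker comment with a reason turns the same migration into a reviewed exception.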
D10 — Deploy-manifest coverage lint
tools/quality/check_deploy_manifest_coverage.py (ci.yml) asserts every directory under apps/ and src/spectral/ is either mapped in .github/deploy-manifest.yml or declared non_deployed: — catching a new component that silently ships in (or fails to ship in) the container. With one collapsed container the manifest’s role is this coverage assertion, not per-target rollout routing.
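The coverage assertion reduces to a set difference: every component directory must appear either in the mapped set or in the non-deployed set. A minimal sketch, assuming the directory list and both sets have already been loaded; the `uncovered_components` name and the example directory names in the usage note are hypothetical.

```python
from typing import Iterable, List, Set


def uncovered_components(
    component_dirs: Iterable[str],
    mapped: Set[str],
    non_deployed: Set[str],
) -> List[str]:
    """Return directories that are neither mapped nor declared non-deployed."""
    return sorted(
        d for d in component_dirs if d not in mapped and d not in non_deployed
    )
```

For example, with directories `apps/api`, `apps/docs`, and `src/spectral/outbox`, a manifest that maps `apps/api` and declares `apps/docs` as non-deployed leaves `src/spectral/outbox` uncovered, and the lint would fail until it is classified one way or the other.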
D11 — Rollback is forward-fix
There is no colour swap-back and no drain service. A bad deploy is rolled forward: fix on main, fast-forward production, redeploy. The container comes up at a new generation; the prior generation’s stranded core.outbox rows stop being claimed once it is no longer deployed (D8). Migration-caused issues are recovered the same way — forward-only + expand/contract means the prior code already runs against the new schema, so a code-level forward deploy suffices. A loss past Cloudflare/Supabase retention or an upstream-yanked dependency escalates to DR per ADR-040 (Supabase-native managed backups + PITR).
D12 — Release-page-only changelog; product-version tags curated
A product-version tag vX.Y.Z (cut by the founder at feature-bundle moments, not on every deploy) fires release.yml: git-cliff renders the notes for the range since the previous vX.Y.Z into the GitHub Release body, and generate-sbom.yml attaches the SBOM on the same trigger. The notes do not commit back to the repo — there is no CHANGELOG.md; the GitHub Releases page is the canonical changelog. This drops the commit-back workflow-loop hazard, the write-token requirement, and bot-author commits, keeping the release workflow’s top-level permission at contents: read.
Alternatives considered
Tag-triggered production deploy (prod-N/vX.Y.Z). Rejected in favour of the fast-forward-branch trigger: the branch is the single ordered source of what is deployed, the FF ruleset gives the safety guarantee declaratively, and product-version tags stay free to mark releases independent of deploy cadence.
Blue-green / colour-CNAME cutover with a soak-and-promote window. Rejected at alpha: it was sized for a multi-service topology and a traffic profile that does not exist. The single-container deploy with a /health gate and forward-fix rollback is the right size; the colour machinery is a forward-trigger when traffic volume makes a measured promote worthwhile.
A standing legacy-generation drain service / reaper. Rejected: under the generation-claim model (D8) a superseded generation stops claiming work on its own; in-flight work finishes within the container SIGTERM→SIGKILL window. No drain service exists or is added.
secrets: inherit for the reusable workflow. Rejected per D4 — the explicit contract is the least-privilege posture and keeps the secret set auditable.
Reusable-workflow workflow_call vs duplicated per-environment workflows. The reusable engine (D2) is chosen precisely so a second environment is a thin caller; duplicating the job per environment was rejected as drift-prone.
pgroll for migration management. Deferred; the D9 lint is the alpha-tier safety net. pgroll is a substrate replacement — forward-trigger if migrations get consistently complex.
Consequences
- Deploying is a deliberate, ordered act: a fast-forward of production, gated by a ruleset and an Environment protection rule; main stays always-safe.
- A second environment is cheap: deploy-staging.yml is a thin caller of the same engine when a Supabase stage project lands.
- Config never passes by hand: the Environment is the reconciled projection of committed source; provision.sh is the only writer.
- Secrets are leak-resistant structurally: explicit contract, per-step scoping, no credentials in package-lifecycle steps.
- The deploy proves itself: a green /health through the edge is the go/no-go; a broken deploy fails loudly instead of returning a premature success.
- Rollback stays simple: forward-fix, no swap-back state machine, leaning on forward-only + expand/contract migrations and the generation-claim model.
- No CHANGELOG.md in the repo: readers use the GitHub Releases page.
- The deploy-manifest is a coverage-lint artifact, not a rollout router: its per-deployable targeting is vestigial under the collapsed container and is reconciled to the one-container model as separate infra work.
References
- ADR-109: Cloudflare-collapsed hosting topology; generation + outbox-claim model
- ADR-110: provisioning; the GitHub Environment as the reconciled config projection
- ADR-032: forward-only migrations
- ADR-040: DR posture (Supabase-native) for the D11 escalation path
- ADR-051: ci.yml uses ty check
- ADR-062: CI secrets handling, fork-PR safety, Environment scoping
- .github/workflows/deploy.yml: the reusable deploy engine
- .github/workflows/deploy-production.yml: the production caller
- tools/deploy/: apply_supabase.sh, deploy_container.sh, reconcile_edge.sh, smoke.sh
- tools/quality/check_migration_compat.py: D9 lint
- tools/quality/check_deploy_manifest_coverage.py: D10 lint
- docs/runbooks/deployment.md: the operator deploy + rollback runbook