
ADR-053: CD pipeline — fast-forward `production`-branch deploy via a reusable GitHub Actions workflow

Context

The hosting topology is one Cloudflare app container running both the API and workers entrypoints, against cloud Supabase, behind the Cloudflare edge (ADR-109); provisioning resolves config from infra/environments.toml + 1Password into a GitHub Environment (ADR-110). This ADR settles how a deploy is triggered, sequenced, gated, and recovered on top of that topology.

The shape is deliberately small for a pre-launch, single-operator, single-environment system. There is no blue-green colour flip, no staged soak/promote, no separate drain service — those were sized for a multi-service Render topology that no longer exists. A deploy is a fast-forward push of a branch; the engine that runs it is one reusable workflow so a second environment becomes a thin caller, not a rewrite.

Decision

D1 — Trigger: a fast-forward push of production deploys; main does not

main is integration: pushing main runs CI but does not deploy. A deploy is an explicit fast-forward push of the production branch, which fires .github/workflows/deploy-production.yml. The production-ff-only repository ruleset (provisioned by tools/provision/github_resources.sh) enforces fast-forward-only + no-delete on that branch, and the production GitHub Environment’s protection rule gates the run. To advance production, the operator runs git merge --ff-only main on production, then git push. Remote main is therefore always safe to be at, and deploying is a separate, deliberate act.
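The ruleset's fast-forward-only guarantee reduces to one check. A push from the current production tip to a new tip is a fast-forward exactly when the old tip is an ancestor of the new one, so no deployed commit is discarded.

Below is a minimal Python sketch of that check over an in-memory commit graph. The graph representation and the function names are hypothetical. Against a real repository, git merge-base --is-ancestor production main answers the same question.

```python
from collections import deque

# Hypothetical in-memory commit graph: commit id -> list of parent ids.
Graph = dict[str, list[str]]


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents.

    A commit counts as its own ancestor, matching git's convention."""
    seen: set[str] = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(graph.get(commit, []))
    return False


def is_fast_forward(graph: Graph, old_tip: str, new_tip: str) -> bool:
    """Moving a branch from old_tip to new_tip is a fast-forward iff old_tip
    is an ancestor of new_tip, i.e. no commit on the branch is dropped."""
    return is_ancestor(graph, old_tip, new_tip)
```

For example, take graph = {"c": ["b"], "b": ["a"], "a": [], "x": ["a"]}. Advancing production from b to c is a fast-forward. Moving it from b to the sibling commit x is not, and the ruleset would reject it.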

The fast-forward-branch model is kept even though there is only one environment today: it builds the habit, and main → staging lands later as a sibling caller (D2) with no change to the trigger discipline.

D2 — Reusable engine + thin per-environment callers

.github/workflows/deploy.yml is the deploy engine: a workflow_call reusable workflow parameterized by an environment input. deploy-production.yml is a thin caller that invokes it with environment: production. A staging deploy (deploy-staging.yml, main-triggered) is added as a second thin caller when a Supabase stage project exists — the engine is already environment-parameterized, so that is a few lines, not a fork. At alpha there is one environment, production.

D3 — Config comes from the GitHub Environment, reconciled from committed source

The deploy reads the matching GitHub Environment (vars + secrets), which tools/provision/provision.sh populates from infra/environments.toml + 1Password. Committed config is the source of truth; the Environment is its reconciled projection; nothing is passed to the workflow by hand. On a config change (a new key, a rotated secret) the operator re-runs provision.sh --env production to republish before deploying.

D4 — Least-privilege secret contract; secrets scoped per step

Callers enumerate the secret set explicitly (a secrets: block, not secrets: inherit), so the contract is visible and minimal. Each secret is scoped to the individual step that needs it. Credentials are never in scope during pnpm install or any other step that runs third-party package lifecycle scripts. Non-secret identifiers (Cloudflare account id, the /health URL) live in job-level env with stable fallbacks, so a freshly created Environment still deploys.

D5 — Deploy step order: migrations → container → edge → smoke

The deploy job runs, in order, against the resolved Environment:

  1. Apply Supabase migrations (tools/deploy/apply_supabase.sh) — schema first, so it is ready before the new container serves.
  2. Deploy the app container via wrangler (tools/deploy/deploy_container.sh) — sets the container runtime secrets + non-secret vars on the Worker (the SPEC-503 secret hop: SUPABASE_DB_URL/SUPABASE_SECRET_KEY as Worker secrets, the rest as --var), which infra/cloudflare/edge/src/index.ts forwards into the container at start.
  3. Reconcile the Cloudflare edge (tools/deploy/reconcile_edge.sh) — R2 bucket + the api. custom domain, which attaches to the Worker the deploy just created/updated.
  4. Smoke test (tools/deploy/smoke.sh) — D6.
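The ordering and its fail-fast behaviour can be sketched outside Actions. In deploy.yml these are workflow steps, so this Python driver only illustrates the semantics: each script runs in order, and the first non-zero exit stops the deploy. Two details are assumptions made for the sketch: the scripts are invoked through bash, and the runner parameter is injectable.

```python
import subprocess
from typing import Callable, Sequence

# Order matters: schema before container, container before edge, smoke last.
DEPLOY_STEPS: tuple[str, ...] = (
    "tools/deploy/apply_supabase.sh",
    "tools/deploy/deploy_container.sh",
    "tools/deploy/reconcile_edge.sh",
    "tools/deploy/smoke.sh",
)


def _run_script(path: str) -> int:
    # Assumption: the scripts are bash scripts runnable from the repo root.
    return subprocess.run(["bash", path], check=False).returncode


def run_deploy(
    steps: Sequence[str] = DEPLOY_STEPS,
    runner: Callable[[str], int] = _run_script,
) -> tuple[bool, list[str]]:
    """Run steps in order, stopping at the first non-zero exit code.

    Returns (succeeded, steps_attempted)."""
    attempted: list[str] = []
    for step in steps:
        attempted.append(step)
        if runner(step) != 0:
            return False, attempted
    return True, attempted
```

Stopping at the first failure is what makes the order meaningful. If apply_supabase.sh fails, no new container is deployed. If the container deploy fails, the edge is not reconciled against a Worker that was never updated.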

D6 — A green /health through the edge is the deploy go/no-go

smoke.sh polls https://api.runspectral.com/health until it returns 200, with a generous retry budget. On a cold first attach of the api. custom domain, the endpoint returns 403 while Cloudflare issues the certificate and propagates the Worker route. This has been observed to take more than 5 minutes, so the budget allows about 12. A 200 through the edge is the live-deploy proof; never reaching 200 fails the deploy. /health also surfaces subsystem state (outbox wired, per-domain LLM credential) for operator inspection.

D7 — One deploy at a time per environment, never interrupted

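concurrency: { group: deploy-${{ inputs.environment }}, cancel-in-progress: false }. Deploys to the same environment never overlap, and a live deploy is never cancelled mid-flight. A run triggered while another is in progress waits as pending. GitHub keeps at most one pending run per concurrency group, so a newer pending deploy replaces an older one. Under the fast-forward model that is harmless, because the newer production tip contains every commit of the older one.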

D8 — Generation stamped per deploy, atomic with the image

SPECTRAL_GENERATION is set as a container --var from the Environment at deploy time, so it is atomic with the deployed image — a container crash-restart sees the same generation. Generation is the outbox-claim key (ADR-109 D4): workers claim core.outbox rows WHERE generation = $MY_GENERATION, so a superseded generation simply stops claiming new work once it is no longer deployed.
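The claim rule can be shown with an in-memory SQLite table standing in for core.outbox. The schema, column names, and function are hypothetical. The sketch is also single-process. A Postgres implementation with concurrent workers would typically lock candidate rows (for example with SELECT ... FOR UPDATE SKIP LOCKED) so that two workers never claim the same row.

```python
import sqlite3

# Hypothetical stand-in for core.outbox (SQLite has no schemas, so no "core.").
SCHEMA = """
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    generation TEXT NOT NULL,
    claimed_by TEXT
)
"""


def claim_batch(conn: sqlite3.Connection, my_generation: str, worker: str,
                limit: int = 10) -> list[int]:
    """Claim up to `limit` unclaimed rows stamped with this worker's generation.

    Rows of any other generation are never selected, so once a generation is
    no longer deployed, nothing claims its remaining rows."""
    with conn:  # commits on success, rolls back if anything raises
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM outbox WHERE generation = ? AND claimed_by IS NULL "
            "ORDER BY id LIMIT ?",
            (my_generation, limit),
        )]
        conn.executemany(
            "UPDATE outbox SET claimed_by = ? WHERE id = ?",
            [(worker, row_id) for row_id in ids],
        )
    return ids
```

With two g1 rows and one g2 row, a worker at generation g2 claims only the g2 row. The g1 rows stay unclaimed, which is the stranding behaviour the forward-fix rollback relies on.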

D9 — Migration-compat lint at PR time

tools/quality/check_migration_compat.py (wired into ci.yml on PR + push-main, and tools/dev/precheck.sh) rejects unannotated breaking changes in supabase/migrations/*.sql: DROP COLUMN, DROP TABLE, incompatible ALTER COLUMN ... TYPE, ADD COLUMN ... NOT NULL without DEFAULT, ADD ... UNIQUE on a populated column. An explicit -- compat: breaking (reason: <reason>) marker overrides and forces human review. Migrations are forward-only (ADR-032) + expand/contract (ADR-109), so old code runs against new schema by design; the lint catches the V1-against-V2-schema corner at PR time rather than deploy time.
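Below is a regex-level sketch of this lint. It covers only DROP COLUMN, DROP TABLE, and ADD COLUMN ... NOT NULL without DEFAULT. The incompatible TYPE change and UNIQUE cases need knowledge of existing column types and data, which a regex does not have.

This is a hypothetical illustration, not check_migration_compat.py itself. Its splitting on ';' and ',' is deliberately naive; a type such as numeric(10,2) would confuse it.

```python
import re

# Illustrative subset of the checks described in D9.
DROPS = [
    ("DROP COLUMN", re.compile(r"\bDROP\s+COLUMN\b", re.I)),
    ("DROP TABLE", re.compile(r"\bDROP\s+TABLE\b", re.I)),
]
ADD_COLUMN_CLAUSE = re.compile(r"\bADD\s+COLUMN\b[^,;]*", re.I)  # up to , or ;
NOT_NULL = re.compile(r"\bNOT\s+NULL\b", re.I)
DEFAULT = re.compile(r"\bDEFAULT\b", re.I)
OVERRIDE = re.compile(r"--\s*compat:\s*breaking\s*\(reason:\s*[^)]+\)", re.I)


def compat_violations(sql: str) -> list[str]:
    """Breaking patterns found in statements that lack the override marker.

    Naive: splits statements on ';' and treats ',' as a clause boundary."""
    found: list[str] = []
    for statement in sql.split(";"):
        if OVERRIDE.search(statement):
            continue  # explicitly annotated: left for human review
        found += [label for label, rx in DROPS if rx.search(statement)]
        for clause in ADD_COLUMN_CLAUSE.findall(statement):
            if NOT_NULL.search(clause) and not DEFAULT.search(clause):
                found.append("ADD COLUMN ... NOT NULL without DEFAULT")
    return found
```

A statement preceded by -- compat: breaking (reason: column unused since v3) passes the lint. The marker then becomes the thing a reviewer checks.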

D10 — Deploy-manifest coverage lint

tools/quality/check_deploy_manifest_coverage.py (ci.yml) asserts every directory under apps/ and src/spectral/ is either mapped in .github/deploy-manifest.yml or declared non_deployed: — catching a new component that silently ships in (or fails to ship in) the container. With one collapsed container the manifest’s role is this coverage assertion, not per-target rollout routing.
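The coverage assertion is a set difference. The sketch below makes two assumptions:

- "Every directory" means the immediate subdirectories of apps/ and src/spectral/, skipping dunder and dot directories such as __pycache__.
- The manifest has already been loaded into a set of mapped paths and a set of non_deployed paths. The manifest's real schema is not shown here.

```python
from pathlib import Path

# Roots whose immediate subdirectories are treated as components (assumption).
ROOTS = ("apps", "src/spectral")


def component_dirs(repo: Path, roots: tuple[str, ...] = ROOTS) -> set[str]:
    """Repo-relative POSIX paths of each root's immediate subdirectories,
    skipping dunder/dot directories such as __pycache__ or .venv."""
    found: set[str] = set()
    for root in roots:
        base = repo / root
        if not base.is_dir():
            continue
        for child in base.iterdir():
            if child.is_dir() and not child.name.startswith(("__", ".")):
                found.add(child.relative_to(repo).as_posix())
    return found


def uncovered(dirs: set[str], mapped: set[str], non_deployed: set[str]) -> list[str]:
    """Components neither mapped to a deploy target nor declared non_deployed."""
    return sorted(dirs - mapped - non_deployed)
```

In a lint, a non-empty result would fail the check. The output names each directory that needs either a manifest entry or a non_deployed: declaration.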

D11 — Rollback is forward-fix

There is no colour swap-back and no drain service. A bad deploy is rolled forward: fix on main, fast-forward production, redeploy. The container comes up at a new generation; the prior generation’s stranded core.outbox rows stop being claimed once it is no longer deployed (D8). Migration-caused issues are recovered the same way — forward-only + expand/contract means the prior code already runs against the new schema, so a code-level forward deploy suffices. A loss past Cloudflare/Supabase retention or an upstream-yanked dependency escalates to DR per ADR-040 (Supabase-native managed backups + PITR).

D12 — Release-page-only changelog; product-version tags curated

A product-version tag vX.Y.Z (cut by the founder at feature-bundle moments, not on every deploy) fires release.yml: git-cliff renders the notes for the range since the previous vX.Y.Z into the GitHub Release body, and generate-sbom.yml attaches the SBOM on the same trigger. The notes do not commit back to the repo — there is no CHANGELOG.md; the GitHub Releases page is the canonical changelog. This drops the commit-back workflow-loop hazard, the write-token requirement, and bot-author commits, keeping the release workflow’s top-level permission at contents: read.

Alternatives considered

Tag-triggered production deploy (prod-N/vX.Y.Z). Rejected in favour of the fast-forward-branch trigger: the branch is the single ordered source of what is deployed, the FF ruleset gives the safety guarantee declaratively, and product-version tags stay free to mark releases independent of deploy cadence.

Blue-green / colour-CNAME cutover with a soak-and-promote window. Rejected at alpha: it was sized for a multi-service topology and a traffic profile that does not exist. The single-container deploy with a /health gate and forward-fix rollback is the right size; the colour machinery is a forward-trigger when traffic volume makes a measured promote worthwhile.

A standing legacy-generation drain service / reaper. Rejected: under the generation-claim model (D8) a superseded generation stops claiming work on its own; in-flight work finishes within the container SIGTERM→SIGKILL window. No drain service exists or is added.

secrets: inherit for the reusable workflow. Rejected per D4 — the explicit contract is the least-privilege posture and keeps the secret set auditable.

Reusable-workflow workflow_call vs duplicated per-environment workflows. The reusable engine (D2) is chosen precisely so a second environment is a thin caller; duplicating the job per environment was rejected as drift-prone.

pgroll for migration management. Deferred; the D9 lint is the alpha-tier safety net. pgroll is a substrate replacement — forward-trigger if migrations get consistently complex.

Consequences

  • Deploying is a deliberate, ordered act — a fast-forward of production, gated by a ruleset and an Environment protection rule; main stays always-safe.
  • A second environment is cheap: deploy-staging.yml is a thin caller of the same engine when a Supabase stage project lands.
  • Config never passes by hand — the Environment is the reconciled projection of committed source; provision.sh is the only writer.
  • Secrets are leak-resistant structurally — explicit contract, per-step scoping, no credentials in package-lifecycle steps.
  • The deploy proves itself — a green /health through the edge is the go/no-go; a broken deploy fails loudly instead of returning a premature success.
  • Rollback stays simple — forward-fix, no swap-back state machine, leaning on forward-only + expand/contract migrations and the generation-claim model.
  • No CHANGELOG.md in the repo — readers use the GitHub Releases page.
  • The deploy-manifest is a coverage-lint artifact, not a rollout router — its per-deployable targeting is vestigial under the collapsed container and is reconciled to the one-container model as separate infra work.

References

  • ADR-109 — Cloudflare-collapsed hosting topology; generation + outbox-claim model
  • ADR-110 — provisioning; the GitHub Environment as the reconciled config projection
  • ADR-032 — forward-only migrations
  • ADR-040 — DR posture (Supabase-native) for the D11 escalation path
  • ADR-051: ci.yml uses ty check
  • ADR-062 — CI secrets handling, fork-PR safety, Environment scoping
  • .github/workflows/deploy.yml — the reusable deploy engine
  • .github/workflows/deploy-production.yml — the production caller
  • tools/deploy/apply_supabase.sh, deploy_container.sh, reconcile_edge.sh, smoke.sh
  • tools/quality/check_migration_compat.py — D9 lint
  • tools/quality/check_deploy_manifest_coverage.py — D10 lint
  • docs/runbooks/deployment.md — the operator deploy + rollback runbook