ADR-049: Container build strategy — repo Dockerfiles, debian-slim-trixie base, one deployed app container
Context
The runtime artifacts are container images built from repo Dockerfiles in infra/docker/, deployed to Cloudflare Containers. The deployed shape is settled by ADR-109 (one collapsed app container running both the API and workers entrypoints); this ADR settles the build strategy beneath it — base image, multi-stage conventions, local-dev parity, build-vs-runtime secrets, lifecycle, SBOM. Prior art: a platform-native Dockerfile build pattern (digest-pinned base + lockfile-frozen deps) validated in production.
Decision
D1 — Build strategy: repo Dockerfiles, deployed to Cloudflare Containers via wrangler
Images build from repo Dockerfiles and deploy to Cloudflare Containers via wrangler (ADR-109 / ADR-110). A digest-pinned base image plus lockfile-frozen dependencies limit rebuild drift: each rebuild follows the same recipe, though it does not produce the same bits. An SBOM is generated as a build artifact (D7).
A GHCR build-and-push upgrade path stays documented behind named revisit triggers: first enterprise SLSA/signing ask; an incident where a bad rebuild blocks deploy; a second compute target; native-extension build-host drift.
Image inventory. The deployed runtime at alpha is one app container — infra/docker/app.Dockerfile launched by app_supervisor.py, running the API and workers entrypoints as sibling processes (ADR-109 D1). The per-service api.Dockerfile / workers.Dockerfile (and the frontend dashboard.Dockerfile / operations.Dockerfile) are retained as the separately-launchable seam — the option to split a tier (or stand the human portals on their own host) onto its own image later without a code change. There is no self-run backup image: disaster recovery is Supabase-native managed backups + PITR (ADR-040), so the former backup-nightly cron image is retired.
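The sibling-process shape described above can be sketched as follows. This is a minimal illustration, not the contents of app_supervisor.py: the function names, the grace period, and the policy of stopping every process when any one exits are assumptions.

```python
import subprocess
import time


def stop_all(children, grace_seconds):
    """Ask every running child to stop, then force-kill any that linger."""
    for child in children:
        if child.poll() is None:
            child.terminate()  # SIGTERM on POSIX
    deadline = time.monotonic() + grace_seconds
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            child.kill()
            child.wait()


def supervise(commands, poll_interval=0.1, grace_seconds=10.0):
    """Run each argv list as a sibling process.

    Returns the exit code of the first child that exits; the remaining
    children are stopped before returning (assumed policy).
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for child in children:
                code = child.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Also runs if the supervisor itself is interrupted.
        stop_all(children, grace_seconds)
```

A caller would pass two argv lists, one for the API entrypoint and one for the workers entrypoint, and exit with the returned code so the container stops as a unit when either process dies. A SIGTERM handler in the supervisor that raises SystemExit would reach the same cleanup through the finally block.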
D2 — Base image: uniform debian-slim-trixie family
- Python services (api, workers): python:3.14-slim-trixie (GIL build)
- Node services (dashboard, operations): node:24-slim-trixie
Alpine rejected (musl breaks Python C extensions). Distroless rejected (no shell/apt for build and ops ergonomics at alpha). Chainguard rejected (paid tier + new substrate + marginal value at alpha). Trixie over bookworm — a fresh-start alpha begins on current stable, not oldstable.
Python 3.14 GIL build, not the free-threaded variant (free-threaded wheel compat still maturing). Revisit trigger: ecosystem maturity on specific deps.
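One way to make the GIL-build choice checkable at startup is to inspect the interpreter's build configuration. The sketch below is an assumption about how such a guard could look; the function names are hypothetical and nothing in this ADR says the services run this check.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True when CPython was compiled with the GIL disabled."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def require_gil_build() -> None:
    """Fail fast if the image was built on the free-threaded variant."""
    if is_free_threaded_build():
        raise RuntimeError(
            f"expected a GIL build of CPython, got free-threaded: {sys.version}"
        )
```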
D3 — Multi-stage build conventions
- Per-service Dockerfile at infra/docker/<service>.Dockerfile; the collapsed runtime is app.Dockerfile
- Two-stage (builder + runtime) for Python/Node
- Build context = repo root; .dockerignore trims context
- Non-root user spectral (UID/GID 1000); workdir /app; /etc/spectral/ for secret files
- No tini; uvicorn + the worker runtime handle SIGTERM correctly
- HEALTHCHECK lives in the Cloudflare container deploy config (ADR-109 / ADR-110), not in the Dockerfile
- Python env hygiene: PYTHONDONTWRITEBYTECODE=1, PYTHONUNBUFFERED=1
- BuildKit cache mounts for uv + pnpm stores (perf only; no secrets)
- Layer order: lockfile → install → source (cache-friendly)
- Base-image digest pinning per Dockerfile; Dependabot-docker manages refresh (D7)
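The digest-pinning convention lends itself to a mechanical check. The sketch below is a hypothetical lint, not a tool named by this ADR (hadolint and similar are deferred in D7). It reports FROM lines whose image is neither pinned by sha256 digest, nor a reference to an earlier build stage, nor scratch.

```python
import re

# FROM [--platform=...] <image> [AS <stage>]
FROM_LINE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_bases(dockerfile_text: str) -> list[str]:
    """Return the base images in FROM lines that lack a digest pin."""
    stages: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        # "FROM builder" refers to an earlier stage, not a registry image.
        is_stage_ref = image.lower() in stages
        if not is_stage_ref and image != "scratch" and not DIGEST.search(image):
            unpinned.append(image)
        if alias:
            stages.add(alias.lower())
    return unpinned
```

Run against each file in infra/docker/, an empty result means every external base is pinned; a non-empty result names the images to fix.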
D4 — Local compose for production-like dev flow
- Location: infra/local/compose.yml; env at infra/local/.env (gitignored; .env.example committed)
- Wrapped via pnpm compose:* scripts in the repo-root package.json
- Postgres comes from supabase start on the host (not duplicated in compose); services reach it at host.docker.internal:54322
- Profiles scope subsets: default = api+workers; frontend adds dashboard + operations
- No code volume mounts: compose tests the built image, not hot-reload. pnpm dev owns fast iteration (ADR-046 D13); compose is the production-like secondary
- Ports match pnpm dev defaults, so the two flows are mutually exclusive
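Because Postgres runs on the host rather than inside compose, a service can fail confusingly when supabase start has not been run. A preflight along these lines would surface that early; it is an illustrative sketch, and the function name and timeout are assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

From inside a compose service the call would be port_is_open("host.docker.internal", 54322), using the address given in the list above.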
D5 — Version-string surface; generation-stamping lives in the CD pipeline
Build-time infra/docker/build-version.sh emits version.json (sha, short_sha, describe, built_at, uv_lock_sha, pnpm_lock_sha); each Dockerfile COPYs it into /app/version.json. The running version is surfaced on /health (which reports the package version + per-feature wiring) and read by the operator routes; there is no separate /version endpoint.
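The shape of version.json can be illustrated with a short sketch. The real generator is the shell script named above; this Python version only mirrors the listed fields. The 7-character short_sha, the use of SHA-256 over the lockfile bytes, and the ISO 8601 timestamp format are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_version(sha, describe, uv_lock: bytes, pnpm_lock: bytes, built_at=None) -> dict:
    """Assemble the version.json payload from already-collected inputs."""
    built_at = built_at or datetime.now(timezone.utc)
    return {
        "sha": sha,
        "short_sha": sha[:7],  # assumption: 7-character abbreviation
        "describe": describe,  # e.g. the output of `git describe`
        "built_at": built_at.isoformat(),  # assumption: ISO 8601, UTC
        # Assumption: lockfile fingerprints are SHA-256 hex digests.
        "uv_lock_sha": hashlib.sha256(uv_lock).hexdigest(),
        "pnpm_lock_sha": hashlib.sha256(pnpm_lock).hexdigest(),
    }


def render_version_json(payload: dict) -> str:
    """Serialize the payload as it would be written to /app/version.json."""
    return json.dumps(payload, indent=2) + "\n"
```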
Deploy triggering, the deployment-generation cutover, SPECTRAL_GENERATION placement, and the release-only changelog are settled by ADR-053 (the CD pipeline) and ADR-109 D5 (generation-based cutover). SPECTRAL_GENERATION is set per-service as a container var at deploy, never via a shared config backend, so updating shared config never re-stamps a running instance; core.deployments allocates a generation atomically via INSERT ... RETURNING generation (ADR-109 D5).
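On the container side, a deploy-time var means the process reads its generation from the environment rather than from the database. A minimal sketch, assuming the value is an integer and that a missing value should stop startup (neither is stated in this ADR):

```python
import os


def read_generation(env=None) -> int:
    """Read the deployment generation stamped onto this container at deploy."""
    env = os.environ if env is None else env
    raw = env.get("SPECTRAL_GENERATION")
    if raw is None:
        # Assumed policy: an unstamped container refuses to start.
        raise RuntimeError("SPECTRAL_GENERATION is not set")
    try:
        return int(raw)
    except ValueError:
        raise RuntimeError(
            f"SPECTRAL_GENERATION must be an integer, got {raw!r}"
        ) from None
```

Reading the value once at startup and never consulting shared config for it matches the intent above: updating shared config cannot re-stamp a running instance.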
D6 — Build-time vs runtime secrets
Principle: build is public, runtime is secret.
- No secret ARGs (visible in docker history), no secret COPYs, no authenticated RUNs
- BuildKit cache mounts for perf only
- Runtime secrets flow: tools/provision/provision.sh reconciles from 1Password (ADR-110) into the GitHub Environment, and the deploy sets them on the Worker, which forwards them into the container (the secret hop), as env vars or, for blobs, Secret Files under /etc/spectral/
- Per-service scoped credentials; no shared superuser keys
- Non-root user owns /app and /etc/spectral/
- Rotation = update the 1Password-backed config (ADR-110) → re-publish the Environment → redeploy → the container restarts and reloads
The provisioning scripts are the sole system-documented interface for secret values. Upstream sources (where the operator reads values from) are out of system scope (ADR-037 D5).
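Inside the container, the two delivery forms (env var, or a file under /etc/spectral/) can be read through one helper. The sketch below is hypothetical: the function name, the env-before-file precedence, and the convention that a secret file is named after the secret are all assumptions.

```python
import os
from pathlib import Path


def read_secret(name: str, env=None, secrets_dir="/etc/spectral") -> str:
    """Return a runtime secret from the environment or a secret file."""
    env = os.environ if env is None else env
    if name in env:
        return env[name]
    # Assumption: blob secrets are stored as <secrets_dir>/<name>.
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text(encoding="utf-8")
    raise KeyError(f"secret {name!r} not found in env or {secrets_dir}")
```

Because values are read at process start, the rotation path in the list above needs no in-process reload logic: the redeploy restarts the container, which picks up the new values.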
D7 — Image lifecycle, rollback retention, SBOM
Cloudflare owns container-version retention (ADR-109 / ADR-110). Per-plan defaults; not tuned at dogfood. core.deployments retention is indefinite.
Rollback is forward-fix (ADR-053 D11): fix on main, fast-forward production, redeploy at a new generation; the prior generation’s outbox rows stop being claimed once it is no longer deployed. A loss past Cloudflare/Supabase retention or an upstream-yanked dependency escalates to DR per ADR-040. No custom rollback tooling.
SBOM: CycloneDX JSON via syft. Generated by a parallel GitHub Actions workflow on product-version tag push (v*), uploaded to the GitHub Release assets. Branch-push deploys do not trigger it; an SBOM per push would drown the signal.
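A consumer of the release asset can sanity-check the document before trusting it. The sketch below relies only on the top-level bomFormat, specVersion, and components fields of CycloneDX JSON; the function name and the choice of summary fields are assumptions, not part of the SBOM workflow.

```python
import json


def summarize_sbom(sbom_json: str) -> dict:
    """Validate the format marker and summarize the component list."""
    bom = json.loads(sbom_json)
    if bom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    components = bom.get("components", [])
    return {
        "spec_version": bom.get("specVersion"),
        "component_count": len(components),
        # Components without a version are harder to match against advisories.
        "missing_version": sorted(
            c["name"] for c in components if not c.get("version")
        ),
    }
```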
Base-image refresh: Dependabot-docker, monthly cadence. Grouped PRs per base family (python, node, debian, astral-tooling). On-CVE bumps surface out-of-cadence through the same channel. Dependency updates also cover github-actions.
Image signing / SLSA attestation / hadolint / trivy / multi-arch — all deferred. Revisit triggers under D1.
Alternatives considered
GHCR build-and-push from start. Rejected (substrate overhead at alpha; revisit-triggered).
Alpine musl. Rejected (Python C-extension pain outweighs the size win).
Distroless everywhere. Rejected (no shell/apt for build + ops ergonomics at alpha).
Chainguard. Rejected (paid tier + new substrate + marginal value).
Two separate deployed containers (API and workers split now). Rejected for alpha per ADR-109 — a single combined image is simpler to build, deploy, and keep warm; the separately-launchable Dockerfiles preserve the split option without a rewrite.
A self-run backup container (pg_dump + object-store upload on a cron). Rejected — DR is Supabase-native managed backups + PITR (ADR-040); a custom backup image is operational tax with no offsetting benefit on a regenerable alpha.
DB-read generation at container boot. Rejected (boot-time races; chicken-and-egg during a rolling deploy). Generation is set as a deploy-time var instead (D5).
Shared base-images.env constants file. Rejected (Dependabot reads the Dockerfile FROM, not env files).
Repo-root compose.yml. Rejected (infra/local/ matches infra/docker/).
Consequences
- Minimum substrate surface at alpha: one Dockerfile set, one CI, one compute vendor (Cloudflare), no separate registry, no backup container.
- One deployed app container with the per-service Dockerfiles retained as the re-split / human-portal seam.
- Local dev parity via compose using the same Dockerfiles.
- Dependabot automates base-image refresh — no human-in-the-loop memory requirement.
- No bit-identical staging→prod promotion — mitigated by lockfile + digest pin.
- No image signing / SLSA at alpha — revisit-triggered; SBOM covers most of the value.
- Deploy/tag/generation mechanics are not duplicated here — they live in ADR-053 + ADR-109.
References
- ADR-109: the deployed topology (one app container); generation-based cutover
- ADR-110: provisioning via wrangler + CLIs + 1Password; the secret hop
- ADR-053: CD pipeline; deploy trigger, generation stamping, release-only changelog
- ADR-040: DR is Supabase-native managed backups + PITR (no backup container)
- ADR-046: turbo + pnpm + uvicorn convention; pnpm dev fast-iteration seam
- ADR-037: secret-value sourcing is out of system scope (D5)
- ADR-065: spectral.core admission discipline (no surface added here)
- infra/docker/: the Dockerfile set (app.Dockerfile + app_supervisor.py; per-service images)
- infra/local/compose.yml: local compose
- infra/docker/build-version.sh: version.json build surface
- .github/dependabot.yml: base-image refresh
- .github/workflows/generate-sbom.yml: SBOM workflow