ADR-049: Container build strategy — repo Dockerfiles, debian-slim-trixie base, one deployed app container

Context

The runtime artifacts are container images built from repo Dockerfiles in infra/docker/, deployed to Cloudflare Containers. The deployed shape is settled by ADR-109 (one collapsed app container running both the API and workers entrypoints); this ADR settles the build strategy beneath it — base image, multi-stage conventions, local-dev parity, build-vs-runtime secrets, lifecycle, SBOM. Prior art: a platform-native Dockerfile build pattern (digest-pinned base + lockfile-frozen deps) validated in production.

Decision

D1 — Build strategy: repo Dockerfiles, deployed to Cloudflare Containers via wrangler

Images build from repo Dockerfiles and deploy to Cloudflare Containers via wrangler (ADR-109 / ADR-110). A digest-pinned base plus lockfile-frozen dependencies mitigate drift risk: every rebuild follows the same recipe, though it does not produce the same bits. An SBOM is generated as a build artifact (D7).

A GHCR build-and-push upgrade path stays documented behind named revisit triggers: first enterprise SLSA/signing ask; an incident where a bad rebuild blocks deploy; a second compute target; native-extension build-host drift.

Image inventory. The deployed runtime at alpha is one app container (infra/docker/app.Dockerfile, launched by app_supervisor.py) running the API and workers entrypoints as sibling processes (ADR-109 D1). The per-service api.Dockerfile / workers.Dockerfile (and the frontend dashboard.Dockerfile / operations.Dockerfile) are retained as the separately-launchable seam: the option to split a tier (or stand the human portals on their own host) onto its own image later without a code change. There is no self-run backup image: disaster recovery is Supabase-native managed backups + PITR (ADR-040), so the former backup-nightly cron image is retired.
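The sibling-process arrangement can be pictured with a minimal supervisor sketch. This is not the contents of app_supervisor.py, which the ADR does not show. The function name `supervise`, the grace period, and the policy that one sibling exiting stops the other are all assumptions made for illustration.

```python
import signal
import subprocess
import threading
import time


def supervise(commands, poll_interval=0.1, grace_seconds=10.0):
    """Run commands as sibling processes; return the exit code of the first to exit."""
    procs = [subprocess.Popen(cmd) for cmd in commands]

    def forward_sigterm(_signum=None, _frame=None):
        # Pass the stop request on to every child that is still running.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()

    # signal.signal may only be called from the main thread.
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGTERM, forward_sigterm)

    while True:
        for proc in procs:
            code = proc.poll()
            if code is None:
                continue
            # Assumed policy: one sibling exiting stops the whole container.
            forward_sigterm()
            for other in procs:
                try:
                    other.wait(timeout=grace_seconds)
                except subprocess.TimeoutExpired:
                    other.kill()
                    other.wait()
            return code
        time.sleep(poll_interval)
```

A caller would pass one argument list per entrypoint, for example a uvicorn command for the API and a worker command, and exit with the returned code so the platform sees the failure.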

D2 — Base image: uniform debian-slim-trixie family

  • Python services (api, workers): python:3.14-slim-trixie (GIL build)
  • Node services (dashboard, operations): node:24-slim-trixie

Alpine rejected (musl complicates Python C extensions: prebuilt manylinux wheels do not install on musl, so affected dependencies fall back to source builds). Distroless rejected (no shell/apt for build and ops ergonomics at alpha). Chainguard rejected (paid tier + new substrate + marginal value at alpha). Trixie over bookworm: a fresh-start alpha begins on current stable, not oldstable.

Python 3.14 GIL build, not the free-threaded variant (free-threaded wheel compat still maturing). Revisit trigger: ecosystem maturity on specific deps.

D3 — Multi-stage build conventions

  • Per-service Dockerfile at infra/docker/<service>.Dockerfile; the collapsed runtime is app.Dockerfile
  • Two-stage (builder + runtime) for Python/Node
  • Build context = repo root; .dockerignore trims context
  • Non-root user spectral (UID/GID 1000); workdir /app; /etc/spectral/ for secret files
  • No tini; uvicorn + the worker runtime handle SIGTERM correctly
  • HEALTHCHECK lives in the Cloudflare container deploy config (ADR-109 / ADR-110), not in the Dockerfile
  • Python env hygiene: PYTHONDONTWRITEBYTECODE=1, PYTHONUNBUFFERED=1
  • BuildKit cache mounts for uv + pnpm stores (perf only; no secrets)
  • Layer order: lockfile → install → source (cache-friendly)
  • Base-image digest pinning per Dockerfile — Dependabot-docker manages refresh (D7)
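To make the digest-pinning convention concrete, here is a small sketch of a check that flags any FROM line whose base image lacks an `@sha256:` digest. The ADR does not mandate such a check (linting is deferred under D7); the function name `unpinned_bases` and the regular expressions are illustrative assumptions.

```python
import re

# Matches "FROM [--platform=...] <image> [AS <stage>]".
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<stage>\S+))?",
    re.IGNORECASE,
)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_bases(dockerfile_text):
    """Return base images referenced by FROM that carry no sha256 digest."""
    stages = set()   # names of earlier build stages, which need no digest
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image = match.group("image")
        is_stage_ref = image in stages or image == "scratch"
        if not is_stage_ref and DIGEST_RE.search(image) is None:
            unpinned.append(image)
        if match.group("stage"):
            stages.add(match.group("stage"))
    return unpinned
```

In a two-stage file, the runtime stage's `FROM builder` reference is skipped because `builder` is an earlier stage, not a registry image.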

D4 — Local compose for production-like dev flow

  • Location infra/local/compose.yml; env at infra/local/.env (gitignored; .env.example committed)
  • Wrapped via pnpm compose:* scripts in the repo-root package.json
  • Postgres comes from supabase start on the host (not duplicated in compose); services reach it at host.docker.internal:54322
  • Profiles scope subsets: default = api+workers; frontend adds dashboard + operations
  • No code volume mounts — compose tests the built image, not hot-reload. pnpm dev owns fast iteration (ADR-046 D13); compose is the production-like secondary
  • Ports match pnpm dev defaults — the two flows are mutually exclusive

D5 — Version-string surface; generation-stamping lives in the CD pipeline

At build time, infra/docker/build-version.sh emits version.json (sha, short_sha, describe, built_at, uv_lock_sha, pnpm_lock_sha); each Dockerfile COPYs it into /app/version.json. The running version is surfaced on /health (which reports the package version + per-feature wiring) and read by the operator routes; there is no separate /version endpoint.
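As a rough illustration of the runtime side, the sketch below reads /app/version.json and returns the six fields listed above. The function name `load_version` and the choice to return None for missing fields, rather than fail, are assumptions; the ADR does not describe how the service actually loads the file.

```python
import json
from pathlib import Path

# Field names as listed in D5.
VERSION_FIELDS = (
    "sha", "short_sha", "describe", "built_at", "uv_lock_sha", "pnpm_lock_sha",
)


def load_version(path="/app/version.json"):
    """Return the build version fields, with None for anything missing."""
    try:
        raw = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Assumed behavior: a missing or malformed file yields empty fields.
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    return {field: raw.get(field) for field in VERSION_FIELDS}
```

A health handler could merge this dictionary into its response body, which keeps version reporting on /health without a separate endpoint.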

Deploy triggering, the deployment-generation cutover, SPECTRAL_GENERATION placement, and the release-only changelog are settled by ADR-053 (the CD pipeline) and ADR-109 D5 (generation-based cutover). SPECTRAL_GENERATION is set per-service as a container var at deploy, never via a shared config backend, so updating shared config never re-stamps a running instance; core.deployments allocates a generation atomically via INSERT ... RETURNING generation (ADR-109 D5).
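One way to picture the deploy-time stamping is a boot-time read of the container var that fails fast when it is absent, instead of querying the database. This is a sketch only: the function name `read_generation`, the assumption that the generation is a non-negative integer, and the fail-fast behavior are not stated in the ADR.

```python
import os


def read_generation(environ=None):
    """Return the deployment generation stamped on this container at deploy."""
    env = os.environ if environ is None else environ
    raw = env.get("SPECTRAL_GENERATION")
    if raw is None or not raw.strip():
        raise RuntimeError("SPECTRAL_GENERATION is not set on this container")
    try:
        generation = int(raw)  # assumption: generations are integers
    except ValueError:
        raise RuntimeError(f"SPECTRAL_GENERATION is not an integer: {raw!r}") from None
    if generation < 0:
        raise RuntimeError(f"SPECTRAL_GENERATION must be non-negative: {generation}")
    return generation
```

Because the value arrives as a per-service var, a running instance keeps the generation it booted with even if shared configuration changes later.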

D6 — Build-time vs runtime secrets

Principle: build is public, runtime is secret.

  • No secret ARGs (visible in docker history), no secret COPYs, no authenticated RUNs
  • BuildKit cache mounts for perf only
  • Runtime secrets flow: tools/provision/provision.sh reconciles from 1Password (ADR-110) into the GitHub Environment, and the deploy sets them on the Worker, which forwards them into the container (the secret hop) — as env vars or, for blobs, Secret Files under /etc/spectral/
  • Per-service scoped credentials — no shared superuser keys
  • Non-root user owns /app and /etc/spectral/
  • Rotation = update the 1Password-backed config (ADR-110) → re-publish the Environment → redeploy → the container restarts and reloads

The provisioning scripts are the sole system-documented interface for secret values. Upstream sources (where the operator reads values from) are out of system scope (ADR-037 D5).
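The two delivery shapes for runtime secrets (env vars, or Secret Files under /etc/spectral/ for blobs) suggest a lookup along the following lines. This is a hedged sketch: the function name `read_secret`, the env-before-file precedence, and the lowercase file-naming rule are assumptions, not conventions taken from the ADR.

```python
import os
from pathlib import Path


def read_secret(name, environ=None, secret_dir="/etc/spectral"):
    """Look up a runtime secret from the environment, then from a secret file."""
    env = os.environ if environ is None else environ
    if name in env:
        return env[name]
    # Assumed naming rule: the file is the lowercased variable name.
    candidate = Path(secret_dir) / name.lower()
    if candidate.is_file():
        return candidate.read_text(encoding="utf-8").rstrip("\n")
    raise KeyError(f"secret {name!r} not found in environment or {secret_dir}")
```

Nothing here runs at build time, which is consistent with the principle that the build is public and only the runtime sees secret values.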

D7 — Image lifecycle, rollback retention, SBOM

Cloudflare owns container-version retention (ADR-109 / ADR-110). Per-plan defaults; not tuned at dogfood. core.deployments retention is indefinite.

Rollback is forward-fix (ADR-053 D11): fix on main, fast-forward production, redeploy at a new generation; the prior generation’s outbox rows stop being claimed once it is no longer deployed. A loss past Cloudflare/Supabase retention or an upstream-yanked dependency escalates to DR per ADR-040. No custom rollback tooling.

SBOM: CycloneDX JSON via syft. Generated by a parallel GitHub Actions workflow on product-version tag push (v*) and uploaded to the GitHub Release assets. Branch-push deploys do not trigger it; an SBOM per branch push would drown the signal.

Base-image refresh: Dependabot-docker, monthly cadence. Grouped PRs per base family (python, node, debian, astral-tooling). On-CVE bumps surface out-of-cadence through the same channel. Dependency updates also cover github-actions.

Image signing / SLSA attestation / hadolint / trivy / multi-arch — all deferred. Revisit triggers under D1.

Alternatives considered

GHCR build-and-push from start. Rejected (substrate overhead at alpha; revisit-triggered).

Alpine musl. Rejected (Python C-extension pain outweighs the size win).

Distroless everywhere. Rejected (no shell/apt for build + ops ergonomics at alpha).

Chainguard. Rejected (paid tier + new substrate + marginal value).

Two separate deployed containers (API and workers split now). Rejected for alpha per ADR-109 — a single combined image is simpler to build, deploy, and keep warm; the separately-launchable Dockerfiles preserve the split option without a rewrite.

A self-run backup container (pg_dump + object-store upload on a cron). Rejected — DR is Supabase-native managed backups + PITR (ADR-040); a custom backup image is operational tax with no offsetting benefit on a regenerable alpha.

DB-read generation at container boot. Rejected (boot-time races; chicken-and-egg during a rolling deploy). Generation is set as a deploy-time var instead (D5).

Shared base-images.env constants file. Rejected (Dependabot reads the Dockerfile FROM, not env files).

Repo-root compose.yml. Rejected (infra/local/ matches infra/docker/).

Consequences

  • Minimum substrate surface at alpha: one Dockerfile set, one CI, one compute vendor (Cloudflare), no separate registry, no backup container.
  • One deployed app container with the per-service Dockerfiles retained as the re-split / human-portal seam.
  • Local dev parity via compose using the same Dockerfiles.
  • Dependabot automates base-image refresh — no human-in-the-loop memory requirement.
  • No bit-identical staging→prod promotion — mitigated by lockfile + digest pin.
  • No image signing / SLSA at alpha — revisit-triggered; SBOM covers most of the value.
  • Deploy/tag/generation mechanics are not duplicated here — they live in ADR-053 + ADR-109.

References

  • ADR-109 — the deployed topology (one app container); generation-based cutover
  • ADR-110 — provisioning via wrangler + CLIs + 1Password; the secret hop
  • ADR-053 — CD pipeline; deploy trigger, generation stamping, release-only changelog
  • ADR-040 — DR is Supabase-native managed backups + PITR (no backup container)
  • ADR-046 — turbo + pnpm + uvicorn convention; pnpm dev fast-iteration seam
  • ADR-037 — secret-value sourcing is out of system scope (D5)
  • ADR-065 — spectral.core admission discipline (no surface added here)
  • infra/docker/ — the Dockerfile set (app.Dockerfile + app_supervisor.py; per-service images)
  • infra/local/compose.yml — local compose
  • infra/docker/build-version.sh — version.json build surface
  • .github/dependabot.yml — base-image refresh
  • .github/workflows/generate-sbom.yml — SBOM workflow