ADR-049: Container strategy — Render-native Dockerfile builds, debian-slim-trixie family, five-image inventory
Status: Accepted (2026-04-24)
Context
TA-20 produced runnable artifacts for five container images (api, workers, dashboard, operations, backup-nightly) plus two static Pages projects (docs-user, docs-codex) under the Render alpha PaaS substrate (per ADR-046 D1). The decision surface spans build strategy, base image, multi-stage conventions, local dev parity, tagging + promotion, secrets, lifecycle + SBOM. Prior art: a Daybreak / ZzzAPI platform-native build pattern validated in production.
The disposition resolved a load-bearing question early — Render-native Dockerfile builds versus GHCR build-and-push from start. Render-native won at alpha for substrate minimalism, with GHCR upgrade-path documented behind named revisit triggers. A second early call collapsed the image count from six (one per Render service) to five by reusing the workers image for the retention-run cron service.
Decision
D1 — Build strategy: Render-native Dockerfile, no external registry
Render builds from the repo Dockerfile on deploy trigger. Digest-pinned base + lockfile-frozen deps close drift risk (same recipe, not same bits). SBOM generated as a build artifact.
GHCR upgrade-path documented with revisit triggers: first enterprise SLSA/signing ask; an incident where a bad rebuild blocks deploy; a second compute target; native-extension build-host drift.
Image-count refinement. Six Render service definitions (per ADR-048 D1) map to five container images. The retention-run cron reuses the workers image (Pattern A — cron posts an event; workers handle via the substrate). The backup-nightly cron has its own image (Pattern B — keeps pg_dump + age + GCS creds off the workers attack surface).
D2 — Base image: uniform debian-slim-trixie family
- Python services (api, workers): `python:3.14-slim-trixie` (GIL build)
- Node services (dashboard, operations): `node:24-slim-trixie`
- backup-nightly: `debian:trixie-slim` (bash-based, no Python runtime)
Alpine rejected (musl breaks Python C extensions). Distroless rejected (cannot run backup-nightly; apt required for pg_dump). Chainguard rejected (paid tier + new substrate + marginal value at alpha). Trixie over bookworm — fresh-start alpha starts on current stable, not oldstable.
Python 3.14 GIL build, not the free-threaded variant (free-threaded wheel compat still maturing). Revisit trigger: ecosystem maturity on specific deps.
D3 — Multi-stage build conventions
- Per-service Dockerfile at `infra/docker/<service>.Dockerfile`
- Two-stage (builder + runtime) for Python/Node; single-stage for backup-nightly
- Build context = repo root; `.dockerignore` trims context
- Non-root user `spectral` (UID/GID 1000); workdir `/app`; `/etc/spectral/` for secret files
- No tini; uvicorn + worker runtime + backup-nightly handle SIGTERM correctly
- HEALTHCHECK lives in `render.yaml`, not in the Dockerfile
- Python env hygiene: `PYTHONDONTWRITEBYTECODE=1`, `PYTHONUNBUFFERED=1`
- BuildKit cache mounts for uv + pnpm stores (perf only; no secrets)
- Layer order: lockfile → install → source (cache-friendly)
- Base image digest pinning per Dockerfile — Dependabot-docker manages refresh (D7)
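Taken together, the D2/D3 conventions suggest a Python service Dockerfile along the following lines. This is a sketch, not the landed file: the digest placeholder, project layout, port, and the uv bootstrap line are all assumptions.

```dockerfile
# Hypothetical sketch of infra/docker/api.Dockerfile under the D2/D3
# conventions. Digest, paths, and the uv bootstrap are placeholders.
FROM python:3.14-slim-trixie@sha256:<pinned-digest> AS builder
# Assumed uv bootstrap from the astral-sh image (see the D7 astral-tooling group).
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
# Layer order: lockfile -> install -> source (cache-friendly).
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev
COPY src/ ./src/

FROM python:3.14-slim-trixie@sha256:<pinned-digest> AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
RUN groupadd -g 1000 spectral && useradd -m -u 1000 -g 1000 spectral
WORKDIR /app
COPY --from=builder --chown=spectral:spectral /app /app
USER spectral
# No HEALTHCHECK and no tini: health lives in render.yaml; uvicorn handles SIGTERM.
CMD ["/app/.venv/bin/uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```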
D4 — Local compose for production-like dev flow
- Location: `infra/local/compose.yml`; env at `infra/local/.env` (gitignored; `.env.example` committed)
- Wrapped via `pnpm compose:*` scripts in the repo-root `package.json`
- Postgres comes from `supabase start` on the host (not duplicated in compose); services reach it at `host.docker.internal:54322`
- Profiles scope subsets: default = api + workers; `frontend` adds dashboard + operations; `backup` adds backup-nightly + fake-gcs-server; `full` is everything
- No code volume mounts — compose tests the built image, not hot-reload. `pnpm dev` owns fast iteration per ADR-046 D13; compose is the production-like secondary
- Ports match `pnpm dev` defaults — the two flows are mutually exclusive
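The profile layout above could look like the following fragment of `infra/local/compose.yml`. This is a sketch, not the landed file: build paths and the `fake-gcs-server` image tag are assumptions.

```yaml
# Hypothetical fragment of infra/local/compose.yml illustrating D4 profiles.
services:
  api:            # no profile: part of the default subset
    build:
      context: ../..
      dockerfile: infra/docker/api.Dockerfile
    env_file: .env
    extra_hosts:
      - "host.docker.internal:host-gateway"  # reach host-side supabase Postgres
  workers:        # no profile: part of the default subset
    build:
      context: ../..
      dockerfile: infra/docker/workers.Dockerfile
    env_file: .env
  dashboard:
    profiles: ["frontend", "full"]
    build:
      context: ../..
      dockerfile: infra/docker/dashboard.Dockerfile
  backup-nightly:
    profiles: ["backup", "full"]
    build:
      context: ../..
      dockerfile: infra/docker/backup-nightly.Dockerfile
  fake-gcs-server:
    profiles: ["backup", "full"]
    image: fsouza/fake-gcs-server:latest
```

Services without a `profiles` key start on every `docker compose up`, which gives the default api + workers subset for free; `--profile frontend`, `--profile backup`, or `--profile full` widen the set.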
D5 — Image tagging, promotion, and generation-stamping protocol
Two tag lineages coexist:
| Purpose | Pattern | Frequency | Consumer |
|---|---|---|---|
| Deploy trigger | prod-N (monotonic integer) | every prod deploy | GH Actions → Render deploy hooks |
| Product milestone | vX.Y.Z | curated (rare) | git-cliff → CHANGELOG → GH Releases |
SemVer pre-release qualifiers (alpha.N) rejected — pre-release nomenclature is for distributed artifacts; we deploy.
Trigger model. main push → GH Actions calls Render deploy hooks for services affected per .github/deploy-manifest.yml. prod-N tag push → same mechanism against prod env, diffed since the last prod tag. render.yaml sets autoDeploy: false uniformly — GH Actions orchestrates both envs.
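The ADR names `.github/deploy-manifest.yml` but not its schema; one plausible shape for the per-service path diffing is sketched below. Every key and path pattern here is an assumption.

```yaml
# Hypothetical schema for .github/deploy-manifest.yml — the file path is from
# D5, but all keys, path globs, and secret names below are assumptions.
services:
  api:
    paths: ["services/api/**", "infra/docker/api.Dockerfile"]
    deploy_hook_secret: RENDER_DEPLOY_HOOK_API
  workers:
    paths: ["services/workers/**", "infra/docker/workers.Dockerfile"]
    deploy_hook_secret: RENDER_DEPLOY_HOOK_WORKERS
  retention-run:
    image: workers   # Pattern A: reuses the workers image (D1)
    paths: ["services/workers/**", "infra/docker/workers.Dockerfile"]
    deploy_hook_secret: RENDER_DEPLOY_HOOK_RETENTION_RUN
```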
Version-string surface (extends ADR-048 D7). Build-time infra/docker/build-version.sh emits version.json with sha, short_sha, describe, built_at, uv_lock_sha, pnpm_lock_sha. Each Dockerfile COPYs version.json into /app/version.json. /version reads short_sha + runtime SPECTRAL_GENERATION; /version/detail (auth-gated) reads the whole file + generation.
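The version-string emission can be sketched as a small POSIX script. The field names come from the ADR; the printf-based JSON emission, the function name, and the CI wiring shown in comments are assumptions.

```shell
#!/usr/bin/env sh
# Sketch of infra/docker/build-version.sh — field names from ADR-048 D7 /
# this D5; the emission approach is an assumption, not the landed script.
set -eu

emit_version_json() {
  # Args: sha short_sha describe uv_lock_sha pnpm_lock_sha
  printf '{"sha":"%s","short_sha":"%s","describe":"%s","built_at":"%s","uv_lock_sha":"%s","pnpm_lock_sha":"%s"}\n' \
    "$1" "$2" "$3" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$4" "$5"
}

# In CI the inputs would come from git + lockfile checksums, e.g.:
#   sha=$(git rev-parse HEAD); short_sha=$(git rev-parse --short HEAD)
#   describe=$(git describe --tags --always)
#   uv_lock_sha=$(sha256sum uv.lock | cut -d' ' -f1)
# Placeholder values here stand in for those:
emit_version_json "abc123def" "abc123d" "prod-41-3-gabc123d" "deadbeef" "cafebabe" > version.json
```

Each Dockerfile then COPYs the resulting `version.json` to `/app/version.json`, where `/version` and `/version/detail` read it.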
Race-free generation-stamp protocol. The 12-step cutover contract is codified in ADR-053 D9. Original D5 step 4 (Update Render env group SPECTRAL_GENERATION=N) was corrected by ADR-053 D7 to per-service env var placement to prevent env-group-update auto-redeploy of blue services with new generation. core.deployments rows get an atomic generation via INSERT ... RETURNING generation (ADR-048 D5).
Race C (Render pod crash-restart env-snapshot semantics during the rolling window) is materially mitigated for SPECTRAL_GENERATION once the variable is per-service (per the ADR-053 D7 correction); if that proves insufficient, the identified fallback is injecting the generation as an image build-arg at build time.
Race D (concurrent deploy workflows) prevented by the workflow-level concurrency mutex codified in ADR-053.
D6 — Build-time vs runtime secrets
Principle: build is public, runtime is secret.
- No secret `ARG`s (visible in `docker history`), no secret `COPY`s, no authenticated `RUN`s
- BuildKit cache mounts for perf only
- Runtime secrets flow: `tools/provision/setup.sh` prompts operator → Render Env Group → container env var or Secret File
- Secret Files preferred over env vars for blobs > 1 KB or anything naturally a file (GCS service account JSON at `/etc/spectral/gcs-sa.json`)
- Per-service scoped credentials — no shared superuser keys; backup role is read-only + `pg_dump` scope; GCS scope is write-only on the backup bucket
- Non-root user owns `/app` and `/etc/spectral/`
- Rotation = env-group update → Render auto-redeploys affected services → containers restart and reload
The provisioning shell scripts are the sole system-documented interface for secret values. Upstream sources (where the operator reads values from) are out of system scope (ADR-037 D5).
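A fail-fast check for a mounted Secret File might look like the sketch below. The Secret File path is from D6; the function name, env var, and entrypoint wiring are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch of a D6-style runtime-secret check a service entrypoint might run.
# Function name and GCS_SA_FILE variable are hypothetical; the default path
# is the Secret File location named in D6.
set -eu

require_secret_file() {
  # Fail fast if a mounted Secret File is absent or unreadable.
  if [ ! -r "$1" ]; then
    echo "missing secret file: $1" >&2
    return 1
  fi
}

GCS_SA_FILE="${GCS_SA_FILE:-/etc/spectral/gcs-sa.json}"
# require_secret_file "$GCS_SA_FILE"   # enabled in the real entrypoint
```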
D7 — Image lifecycle, rollback retention, SBOM
Render owns image retention. Per-plan defaults; not tuned at alpha. core.deployments retention indefinite (used by legacy-drain).
Rollback tree:
- Render UI rollback button — primary, in-retention cases.
- Retag older commit + rebuild — past retention, deps still resolvable.
- Declare DR per ADR-040 — both above failed.
No custom rollback tooling; alpha uses Render’s button 95% of the time. Long-tail rollback = D1 revisit trigger.
SBOM: CycloneDX JSON via syft. Generated by a parallel GH Actions workflow on product-version tag push (v*), uploaded to GH Release assets. Deploy tags (prod-N) do not trigger — would drown the signal.
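A minimal shape for that workflow, assuming the standard syft install script and the gh CLI, could be the following. Action pins and step names are placeholders, not the landed workflow.

```yaml
# Hypothetical sketch of .github/workflows/generate-sbom.yml — triggers on
# v* product tags only (prod-N deploy tags do not trigger, per D7).
name: generate-sbom
on:
  push:
    tags: ["v*"]
jobs:
  sbom:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # needed to upload release assets
    steps:
      - uses: actions/checkout@v4
      - name: Generate CycloneDX SBOM with syft
        run: |
          curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh \
            | sh -s -- -b /usr/local/bin
          syft dir:. -o cyclonedx-json > sbom.cdx.json
      - name: Attach SBOM to the GitHub Release
        run: gh release upload "$GITHUB_REF_NAME" sbom.cdx.json
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```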
Base-image refresh: Dependabot-docker, monthly cadence. Grouped PRs per base family (python, node, debian, astral-tooling). On-CVE bumps surface out-of-cadence through the same channel. Dependency updates also cover github-actions.
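The grouped monthly cadence maps onto a `.github/dependabot.yml` roughly like this sketch; the directory and group patterns are assumptions.

```yaml
# Hypothetical fragment of .github/dependabot.yml implementing the D7
# cadence. Group names follow the ADR; patterns are assumptions.
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/infra/docker"
    schedule:
      interval: "monthly"
    groups:
      python:
        patterns: ["python"]
      node:
        patterns: ["node"]
      debian:
        patterns: ["debian"]
      astral-tooling:
        patterns: ["ghcr.io/astral-sh/*"]
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
```

Security advisories are not bound to the schedule, which is how on-CVE bumps surface out-of-cadence through the same PR channel.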
Image signing / SLSA attestation / hadolint / trivy / multi-arch — all deferred. Revisit triggers under D1.
Alternatives considered
GHCR build-and-push from start. Rejected (substrate overhead at alpha; revisit-triggered).
Alpine musl. Rejected (Python C-extension pain outweighs the ~80 MB size win).
Distroless everywhere. Rejected (backup-nightly cannot run; apt required).
Chainguard. Rejected (paid tier + new substrate + marginal value).
Full Pattern A for backup cron (cron-posts-event + worker handles). Rejected — HANDLER_MAX=60s clash + concentrating backup tooling in workers image defeats the scoped-cred principle.
Backup as Python orchestrator around shell commands. Rejected — the pipeline is one line of bash; Python orchestration is consistency-with-stack bias, not honest fit.
SemVer pre-release deploy tags (v0.3.0-alpha.N). Rejected — we deploy, not distribute.
Single combined tag (vX.Y.Z.N). Rejected in favor of two lineages.
DB-read generation at container boot. Rejected (boot-time races; chicken-and-egg during rolling deploy).
Shared base-images.env constants file. Rejected (Dependabot reads Dockerfile FROM, not env files).
Repo-root compose.yml. Rejected (root already crowded; infra/local/ matches infra/docker/ + infra/render/).
Consequences
- Minimum substrate surface at alpha: one Dockerfile set, one CI, one PaaS, no registry.
- ZzzAPI-compatible mental model (platform-native build with digest + lockfile pinning).
- Attack surface reduced by the split backup-nightly image (no `pg_dump` or GCS creds on workers).
- Local dev parity via compose using the same Dockerfiles.
- Dependabot automates base-image refresh cadence — no human-in-the-loop memory requirement.
- All lifecycle concerns trace to D1 revisit triggers or “Render owns it” at alpha.
- No bit-identical staging→prod promotion — mitigated by lockfile + digest pin.
- No image signing / SLSA at alpha — revisit-triggered.
- No supply-chain attestation beyond SBOM — SBOM covers 95% of value at alpha.
- Long-tail rollback past Render retention + upstream-yanked dep is a real gap → D1 revisit trigger.
- Race C remains an open verification item (TA-21 R1-full checklist); fallback is image-build-arg generation injection.
References
- ADR-065 — `spectral.core` admission discipline (no surface added here)
- ADR-037 — D5 cofounder-discipline carry-forward; D11 provider-swap seam
- ADR-040 — backup-nightly script
- ADR-046 — Render alpha PaaS; turbo + pnpm + uvicorn convention
- ADR-048 — six Render services (mapped to five images here); generation stamping
- ADR-053 — CD pipeline orchestration; cutover sequence; `SPECTRAL_GENERATION` per-service correction
- ADR-061 — backup-nightly bats + fake-gcs follow-on (TA-23 carry-forward queue)
- TA-20 disposition — SPEC-323 comment `8bfbb8c9`
- TA-20 verification — SPEC-323 comment `fca01b28`
- TA-26 `SPECTRAL_GENERATION` placement correction — SPEC-323 comment `47627e6a`
- `infra/docker/` — landed Dockerfile set (commit `fad1f2a`)
- `infra/local/` — local compose
- `tools/ops/backup/backup-nightly.sh` — backup pipeline
- `.github/dependabot.yml` — base-image refresh
- `.github/workflows/generate-sbom.yml` — SBOM workflow
- Codex `system-design/container-strategy.mdx` — close-pass new page
- Codex `developer-guide/local-dev.mdx` — close-pass compose updates