
ADR-049: Container strategy — Render-native Dockerfile builds, debian-slim-trixie family, five-image inventory

Status: Accepted (2026-04-24)

Context

TA-20 produced runnable artifacts for five container images (api, workers, dashboard, operations, backup-nightly) plus two static Pages projects (docs-user, docs-codex) under the Render alpha PaaS substrate (per ADR-046 D1). The decision surface spans build strategy, base image, multi-stage conventions, local dev parity, tagging + promotion, secrets, lifecycle + SBOM. Prior art: a Daybreak / ZzzAPI platform-native build pattern validated in production.

The disposition resolved a load-bearing question early — Render-native Dockerfile builds versus GHCR build-and-push from start. Render-native won at alpha for substrate minimalism, with GHCR upgrade-path documented behind named revisit triggers. A second early call collapsed the image count from six (one per Render service) to five by reusing the workers image for the retention-run cron service.

Decision

D1 — Build strategy: Render-native Dockerfile, no external registry

Render builds from the repo Dockerfile on deploy trigger. Digest-pinned base + lockfile-frozen deps close drift risk (same recipe, not same bits). SBOM generated as a build artifact.

GHCR upgrade-path documented with revisit triggers: first enterprise SLSA/signing ask; an incident where a bad rebuild blocks deploy; a second compute target; native-extension build-host drift.

Image-count refinement. Six Render service definitions (per ADR-048 D1) map to five container images. The retention-run cron reuses the workers image (Pattern A — cron posts an event; workers handle via the substrate). The backup-nightly cron has its own image (Pattern B — keeps pg_dump + age + GCS creds off the workers attack surface).

D2 — Base image: uniform debian-slim-trixie family

  • Python services (api, workers): python:3.14-slim-trixie (GIL build)
  • Node services (dashboard, operations): node:24-slim-trixie
  • backup-nightly: debian:trixie-slim (bash-based, no Python runtime)

Alpine rejected (musl breaks Python C extensions). Distroless rejected (cannot run backup-nightly; apt required for pg_dump). Chainguard rejected (paid tier + new substrate + marginal value at alpha). Trixie over bookworm — fresh-start alpha starts on current stable, not oldstable.

Python 3.14 GIL build, not the free-threaded variant (free-threaded wheel compat still maturing). Revisit trigger: ecosystem maturity on specific deps.

D3 — Multi-stage build conventions

  • Per-service Dockerfile at infra/docker/<service>.Dockerfile
  • Two-stage (builder + runtime) for Python/Node; single-stage for backup-nightly
  • Build context = repo root; .dockerignore trims context
  • Non-root user spectral (UID/GID 1000); workdir /app; /etc/spectral/ for secret files
  • No tini; uvicorn + worker runtime + backup-nightly handle SIGTERM correctly
  • HEALTHCHECK lives in render.yaml, not in the Dockerfile
  • Python env hygiene: PYTHONDONTWRITEBYTECODE=1, PYTHONUNBUFFERED=1
  • BuildKit cache mounts for uv + pnpm stores (perf only; no secrets)
  • Layer order: lockfile → install → source (cache-friendly)
  • Base image digest pinning per Dockerfile — Dependabot-docker manages refresh (D7)
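Taken together, these conventions imply a Dockerfile shape like the following sketch for the api service. This is illustrative, not the landed file: the digest is a placeholder, and the uvicorn module path and port are assumptions.

```dockerfile
# Sketch only — D3 conventions applied to infra/docker/api.Dockerfile.
# Digest is a placeholder; real files pin the actual sha256 (Dependabot refreshes it, D7).
FROM python:3.14-slim-trixie@sha256:<pinned-digest> AS builder
# uv copied from its official image (consistent with the astral-tooling Dependabot group)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
# Layer order: lockfile -> install -> source (cache-friendly)
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev
COPY . .

FROM python:3.14-slim-trixie@sha256:<pinned-digest> AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PATH="/app/.venv/bin:$PATH"
RUN groupadd -g 1000 spectral && useradd -m -u 1000 -g 1000 spectral \
    && mkdir -p /etc/spectral && chown spectral:spectral /etc/spectral
WORKDIR /app
COPY --from=builder --chown=spectral:spectral /app /app
USER spectral
# No tini (uvicorn handles SIGTERM); no HEALTHCHECK (it lives in render.yaml)
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```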

D4 — Local compose for production-like dev flow

  • Location infra/local/compose.yml; env at infra/local/.env (gitignored; .env.example committed)
  • Wrapped via pnpm compose:* scripts in repo-root package.json
  • Postgres comes from supabase start on the host (not duplicated in compose); services reach it at host.docker.internal:54322
  • Profiles scope subsets: default = api+workers; frontend adds dashboard + operations; backup adds backup-nightly + fake-gcs-server; full is everything
  • No code volume mounts — compose tests built image, not hot-reload. pnpm dev owns fast iteration per ADR-046 D13; compose is the production-like secondary
  • Ports match pnpm dev defaults — the two flows are mutually exclusive
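Under those constraints, infra/local/compose.yml has roughly the following shape. Abridged sketch only — service details, ports, and the Postgres credentials are assumptions, not the landed file.

```yaml
# Sketch — default profile is api+workers (services with no profiles key)
services:
  api:
    build: { context: ../.., dockerfile: infra/docker/api.Dockerfile }
    env_file: .env
    environment:
      # supabase Postgres runs on the host, not in compose
      DATABASE_URL: postgresql://postgres:postgres@host.docker.internal:54322/postgres
    ports: ["8000:8000"]   # matches pnpm dev defaults; the two flows never run together
    # note: no code volume mounts — this flow tests the built image
  workers:
    build: { context: ../.., dockerfile: infra/docker/workers.Dockerfile }
    env_file: .env
  dashboard:
    profiles: ["frontend", "full"]
    build: { context: ../.., dockerfile: infra/docker/dashboard.Dockerfile }
  operations:
    profiles: ["frontend", "full"]
    build: { context: ../.., dockerfile: infra/docker/operations.Dockerfile }
  backup-nightly:
    profiles: ["backup", "full"]
    build: { context: ../.., dockerfile: infra/docker/backup-nightly.Dockerfile }
  fake-gcs-server:
    profiles: ["backup", "full"]
    image: fsouza/fake-gcs-server   # unpinned here only for brevity
```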

D5 — Image tagging, promotion, and generation-stamping protocol

Two tag lineages coexist:

| Purpose | Pattern | Frequency | Consumer |
| --- | --- | --- | --- |
| Deploy trigger | prod-N (monotonic integer) | every prod deploy | GH Actions → Render deploy hooks |
| Product milestone | vX.Y.Z | curated (rare) | git-cliff → CHANGELOG → GH Releases |

SemVer pre-release qualifiers (alpha.N) rejected — pre-release nomenclature is for distributed artifacts; we deploy.

Trigger model. main push → GH Actions calls Render deploy hooks for services affected per .github/deploy-manifest.yml. prod-N tag push → same mechanism against prod env, diffed since the last prod tag. render.yaml sets autoDeploy: false uniformly — GH Actions orchestrates both envs.
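The manifest that scopes which services a push affects could take a path-glob shape like the following. The schema is an assumption for illustration, not the actual file.

```yaml
# Assumed shape of .github/deploy-manifest.yml — path globs map changed files
# to the services whose Render deploy hooks GH Actions should call.
services:
  api:
    paths: ["apps/api/**", "infra/docker/api.Dockerfile", "uv.lock"]
  dashboard:
    paths: ["apps/dashboard/**", "infra/docker/dashboard.Dockerfile", "pnpm-lock.yaml"]
```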

Version-string surface (extends ADR-048 D7). Build-time infra/docker/build-version.sh emits version.json with sha, short_sha, describe, built_at, uv_lock_sha, pnpm_lock_sha. Each Dockerfile COPYs version.json into /app/version.json. /version reads short_sha + runtime SPECTRAL_GENERATION; /version/detail (auth-gated) reads the whole file + generation.
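A minimal sketch of what build-version.sh could do, assuming git and the lockfiles are available in the build context. The field names come from this section; the implementation details (fallbacks, hashing with sha256sum) are guesses, not the landed script.

```shell
#!/usr/bin/env bash
# Sketch of infra/docker/build-version.sh — emits version.json for /version endpoints.
set -euo pipefail

out="${1:-version.json}"

# Fall back to "unknown"/"absent" so the sketch also works outside a full checkout.
sha="$(git rev-parse HEAD 2>/dev/null || echo unknown)"
short_sha="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
describe="$(git describe --tags --always 2>/dev/null || echo unknown)"
built_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
uv_lock_sha="$([ -f uv.lock ] && sha256sum uv.lock | cut -d' ' -f1 || echo absent)"
pnpm_lock_sha="$([ -f pnpm-lock.yaml ] && sha256sum pnpm-lock.yaml | cut -d' ' -f1 || echo absent)"

cat > "$out" <<EOF
{
  "sha": "$sha",
  "short_sha": "$short_sha",
  "describe": "$describe",
  "built_at": "$built_at",
  "uv_lock_sha": "$uv_lock_sha",
  "pnpm_lock_sha": "$pnpm_lock_sha"
}
EOF
```

Each Dockerfile then COPYs the emitted file to /app/version.json, where the /version handlers read it.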

Race-free generation-stamp protocol. The 12-step cutover contract is codified in ADR-053 D9. Original D5 step 4 (Update Render env group SPECTRAL_GENERATION=N) was corrected by ADR-053 D7 to per-service env var placement to prevent env-group-update auto-redeploy of blue services with new generation. core.deployments rows get an atomic generation via INSERT ... RETURNING generation (ADR-048 D5).

Race C (Render pod crash-restart env-snapshot semantics during rolling window) materially mitigated for SPECTRAL_GENERATION once it is per-service (per the ADR-053 D7 correction); a fallback path of injecting generation as image build-arg at build time is identified for further mitigation if needed.

Race D (concurrent deploy workflows) prevented by the workflow-level concurrency mutex codified in ADR-053.

D6 — Build-time vs runtime secrets

Principle: build is public, runtime is secret.

  • No secret ARGs (visible in docker history), no secret COPYs, no authenticated RUNs
  • BuildKit cache mounts for perf only
  • Runtime secrets flow: tools/provision/setup.sh prompts operator → Render Env Group → container env var or Secret File
  • Secret Files preferred over env vars for blobs > 1 KB or anything naturally a file (GCS service account JSON at /etc/spectral/gcs-sa.json)
  • Per-service scoped credentials — no shared superuser keys; backup role is read-only + pg_dump scope; GCS scope is write-only on the backup bucket
  • Non-root user owns /app and /etc/spectral/
  • Rotation = env-group update → Render auto-redeploys affected services → containers restart and reload

The provisioning shell scripts are the sole system-documented interface for secret values. Upstream sources (where the operator reads values from) are out of system scope (ADR-037 D5).

D7 — Image lifecycle, rollback retention, SBOM

Render owns image retention. Per-plan defaults; not tuned at alpha. core.deployments retention indefinite (used by legacy-drain).

Rollback tree:

  1. Render UI rollback button — primary, in-retention cases.
  2. Retag older commit + rebuild — past retention, deps still resolvable.
  3. Declare DR per ADR-040 — both above failed.

No custom rollback tooling; alpha uses Render’s button 95% of the time. Long-tail rollback = D1 revisit trigger.

SBOM: CycloneDX JSON via syft. Generated by a parallel GH Actions workflow on product-version tag push (v*), uploaded to GH Release assets. Deploy tags (prod-N) do not trigger — would drown the signal.
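That workflow could be sketched as follows. Action versions and file names are assumptions, and it presumes the git-cliff flow has already created the GH Release for the tag.

```yaml
# Sketch of .github/workflows/generate-sbom.yml — v* tags only; prod-N never matches
name: generate-sbom
on:
  push:
    tags: ["v*"]
jobs:
  sbom:
    runs-on: ubuntu-latest
    permissions:
      contents: write        # needed to attach assets to the Release
    steps:
      - uses: actions/checkout@v4
      - name: Generate CycloneDX JSON with syft
        uses: anchore/sbom-action@v0
        with:
          format: cyclonedx-json
          output-file: sbom.cdx.json
      - name: Attach SBOM to the GH Release for this tag
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh release upload "$GITHUB_REF_NAME" sbom.cdx.json
```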

Base-image refresh: Dependabot-docker, monthly cadence. Grouped PRs per base family (python, node, debian, astral-tooling). On-CVE bumps surface out-of-cadence through the same channel. Dependency updates also cover github-actions.
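A dependabot.yml consistent with this cadence and grouping might read as below. Directories and group names are assumed for illustration; the landed file is .github/dependabot.yml.

```yaml
# Sketch — monthly docker refresh, grouped per base family; CVE bumps arrive
# out-of-cadence through the same PRs. github-actions covered alongside.
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/infra/docker"
    schedule:
      interval: "monthly"
    groups:
      python:
        patterns: ["python"]
      node:
        patterns: ["node"]
      debian:
        patterns: ["debian"]
      astral-tooling:
        patterns: ["ghcr.io/astral-sh/*"]
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
```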

Image signing / SLSA attestation / hadolint / trivy / multi-arch — all deferred. Revisit triggers under D1.

Alternatives considered

GHCR build-and-push from start. Rejected (substrate overhead at alpha; revisit-triggered).

Alpine musl. Rejected (Python C-extension pain outweighs the ~80 MB size win).

Distroless everywhere. Rejected (backup-nightly cannot run; apt required).

Chainguard. Rejected (paid tier + new substrate + marginal value).

Full Pattern A for backup cron (cron-posts-event + worker handles). Rejected — HANDLER_MAX=60s clash + concentrating backup tooling in workers image defeats the scoped-cred principle.

Backup as Python orchestrator around shell commands. Rejected — the pipeline is one line of bash; Python orchestration is consistency-with-stack bias, not honest fit.
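For illustration, the pipeline in question is essentially a single dump-encrypt-upload pipe. Bucket name, recipient variable, and object naming below are placeholders, not the landed script.

```shell
pg_dump "$DATABASE_URL" \
  | age -r "$AGE_RECIPIENT" \
  | gsutil cp - "gs://<backup-bucket>/nightly-$(date -u +%F).sql.age"
```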

SemVer pre-release deploy tags (v0.3.0-alpha.N). Rejected — we deploy, not distribute.

Single combined tag (vX.Y.Z.N). Rejected in favor of two lineages.

DB-read generation at container boot. Rejected (boot-time races; chicken-and-egg during rolling deploy).

Shared base-images.env constants file. Rejected (Dependabot reads Dockerfile FROM, not env files).

Repo-root compose.yml. Rejected (root already crowded; infra/local/ matches infra/docker/ + infra/render/).

Consequences

  • Minimum substrate surface at alpha: one Dockerfile set, one CI, one PaaS, no registry.
  • ZzzAPI-compatible mental model (platform-native build with digest + lockfile pinning).
  • Attack surface reduced by the split backup-nightly image (no pg_dump or GCS creds on workers).
  • Local dev parity via compose using the same Dockerfiles.
  • Dependabot automates base-image refresh cadence — no human-in-the-loop memory requirement.
  • All lifecycle concerns trace to D1 revisit triggers or “Render owns it” at alpha.
  • No bit-identical staging→prod promotion — mitigated by lockfile + digest pin.
  • No image signing / SLSA at alpha — revisit-triggered.
  • No supply-chain attestation beyond SBOM — SBOM covers 95% of value at alpha.
  • Long-tail rollback past Render retention + upstream-yanked dep is a real gap → D1 revisit trigger.
  • Race C remains an open verification item (TA-21 R1-full checklist); fallback is image-build-arg generation injection.

References

  • ADR-065 — spectral.core admission discipline (no surface added here)
  • ADR-037 — D5 cofounder-discipline carry-forward; D11 provider-swap seam
  • ADR-040 — backup-nightly script
  • ADR-046 — Render alpha PaaS; turbo + pnpm + uvicorn convention
  • ADR-048 — six Render services (mapped to five images here); generation stamping
  • ADR-053 — CD pipeline orchestration; cutover sequence; SPECTRAL_GENERATION per-service correction
  • ADR-061 — backup-nightly bats + fake-gcs follow-on (TA-23 carry-forward queue)
  • TA-20 disposition — SPEC-323 comment 8bfbb8c9
  • TA-20 verification — SPEC-323 comment fca01b28
  • TA-26 SPECTRAL_GENERATION placement correction — SPEC-323 comment 47627e6a
  • infra/docker/ — landed Dockerfile set (commit fad1f2a)
  • infra/local/ — local compose
  • tools/ops/backup/backup-nightly.sh — backup pipeline
  • .github/dependabot.yml — base-image refresh
  • .github/workflows/generate-sbom.yml — SBOM workflow
  • Codex system-design/container-strategy.mdx — close-pass new page
  • Codex developer-guide/local-dev.mdx — close-pass compose updates