
ADR-048: Deployment topology — six Render services + two Cloudflare Pages projects, deployment-generation routing

Status: Accepted (2026-04-22) — D2’s inheritance from ADR-044 D5 superseded by ADR-063; SQL grants between contexts do not ship by default; call flow between contexts goes through framework-layer composition; notification flow between contexts continues via events.

Context

This ADR codifies the deployment topology for Spectral 0.3.0 against the Render alpha PaaS substrate (ADR-046). Scope: service granularity, rollout discipline, deploy coordination, healthcheck + drain contracts, environment separation. Most topology was implied by ADR-046 D2/D5/D6/D7/D16; this ADR makes it explicit and closes the genuinely-new calls: migration hardening, cutover mechanism, worker version coexistence, /version contract, key-exchange auth.

Key architectural result: single-color rolling with deployment-generation stamping replaces both an over-engineered drain-monitor proposal and an under-engineered “handler discipline” proposal. Generation stamping gives a structural guarantee — gen-N events processed by gen-N code — that eliminates cognitive-load tax on handler engineers.

CD pipeline orchestration is split out as ADR-053 (TA-26). This ADR owns topology + deploy contracts; ADR-053 owns workflow machinery.

Decision

D1 — Six Render deployables + two Cloudflare Pages projects per environment

  • Render: api (web, FastAPI+uvicorn), dashboard (web, TanStack Start, app.runspectral.com), operations (web, TanStack Start, ops.runspectral.com), workers (background worker, LISTEN/NOTIFY consumer), retention-run (cron, ADR-042 DELETE sweep), backup-nightly (cron, ADR-040 pg_dump → age → GCS).
  • Cloudflare Pages: docs-user (public, docs.runspectral.com), docs-codex (Pages Function JWKS auth, codex.runspectral.com).
  • Split axes: runtime profile, audience, HTTP surface. Not by context — contexts are code boundaries, not deploy boundaries. Per ADR-049 D1, the six Render services map to five container images (the retention-run cron reuses the workers image).

D2 — Single workers service at alpha; outbox architecture collapsed

  • Single core.outbox table (collapses per-context outbox tables from the original ADR-044 D3).
  • Single core.event_handled keyed on (handler_name, idempotency_key) with handler_name scope-qualified (e.g., worlds.scan_completed_indexer).
  • source column (rename from source_bc; envelope field also renamed; SourceBC Literal alias dropped).
  • content_class column on both tables; future-proofs per-class retention divergence.
  • channel column with default 'outbox_default'; publisher sets explicitly when taxonomy expands; core.outbox_notify() trigger reads NEW.channel and issues pg_notify on that channel — the function stays frozen forever.
  • Inheritance from ADR-044 D5 superseded by ADR-063 — no SQL grants between contexts. Notification flow between contexts continues via events; call flow between contexts uses DI through the workers entrypoint composition seam (per ADR-060 / ADR-063). The “single workers” decision is reinforced by ADR-060 D-runtime — workers IS the framework-layer composition seam where dependencies between contexts wire up at startup.
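The dedup contract above can be sketched in miniature. This is an illustrative in-memory stand-in for core.event_handled, assuming only what D2 states: one entry per (handler_name, idempotency_key), with handler names scope-qualified. The class and method names are hypothetical; the real table is Postgres-backed.

```python
from dataclasses import dataclass, field

@dataclass
class EventHandledLedger:
    """In-memory sketch of core.event_handled: at most one entry per
    (handler_name, idempotency_key); handler names are scope-qualified."""
    seen: set = field(default_factory=set)

    def claim(self, handler_name: str, idempotency_key: str) -> bool:
        """Return True if this handler has not yet processed this event."""
        key = (handler_name, idempotency_key)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True

ledger = EventHandledLedger()
# First delivery processes; redelivery of the same event is a no-op.
assert ledger.claim("worlds.scan_completed_indexer", "evt-123") is True
assert ledger.claim("worlds.scan_completed_indexer", "evt-123") is False
# A differently scoped handler consumes the same event independently.
assert ledger.claim("ops.scan_completed_notifier", "evt-123") is True
```

Scope-qualifying handler_name is what lets a single core.event_handled table serve every context without collisions.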

D3 — Docs on Cloudflare Pages, not static-mount

  • docs.runspectral.com → Pages (docs-user), public.
  • codex.runspectral.com → Pages (docs-codex) with a Pages Function enforcing JWKS-local auth + OPERATIONS_SCOPES check (same pattern as Operations Start per ADR-046 D9).
  • Deploy via wrangler pages deploy --branch <env> invoked from bash (supply-chain protection per ADR-046 D14).
  • Supersedes part of ADR-046 D5 — Start services no longer static-mount docs. Astro builds output dist/ for Pages, not Start-service public-dirs.

D4 — Supabase branching + Management API + hardened expand/contract

  • Staging: a persistent preview branch of the production Supabase project (one project, two branches).
  • Orchestration: Management API from GH Actions (not the Supabase GitHub integration), enabling tag-based trunk dev.
  • Migration discipline hardening: AST-level compat lint rejecting DROP COLUMN / incompatible ALTER TYPE / DROP TABLE / NOT-NULL-without-default / UNIQUE-on-populated-column without an explicit -- compat: breaking marker (per tools/quality/check_migration_compat.py); schema-version gate in cutover workflow (green must report expected migration head); pre-merge dry-run on a throwaway branch with --with-data; maintenance-window pattern documented for truly breaking migrations.
  • True DB-layer blue/green is not achievable on managed Supabase (no promote/swap primitive; no inbound logical replication). Hardened expand/contract is the realistic path.

D5 — Deployment-generation stamping + single-color rolling

  • Monotonic generation stamped on every outbox row at publish time by the publisher from the SPECTRAL_GENERATION env var. Lives on the outbox row only; envelope stays substrate-agnostic.
  • Scalar SPECTRAL_GENERATION per service. Workers always tied to exactly one generation: WHERE generation = $MY_GENERATION.
  • Per-generation LISTEN channels (outbox_gen_<N>): a V2 worker is structurally incapable of receiving V1 NOTIFY.
  • core.deployments table + core.deployment_generation_seq for the monotonic counter. GH Actions writes a row via INSERT … RETURNING generation — atomic, single round trip. Per-env scope.
  • Reaper re-PENDs stuck IN_FLIGHT rows within its own generation (crash recovery); cross-generation orphan-sweep is dropped.
  • Legacy-drain GH workflow (drain-legacy-generation.yml): looks up the code reference for the target generation in core.deployments, deploys a temporary worker at that reference with SPECTRAL_GENERATION=<target> and SPECTRAL_DRAIN_AND_EXIT=true; the worker auto-exits after a cooling period; the workflow then deletes the service.
  • Handler-evolution policy (lightweight, not tooling): deploy handler changes freely by default; flag only for (a) external-contract changes, (b) product-committed time-precise activation semantics, (c) invariants that must be uniformly enforced (prefer DB-level enforcement).
  • Forward trigger for version-gated claims: first handler change that genuinely cannot process prior-generation events.
  • SPECTRAL_GENERATION placement. Lives on the service (per-service env var), set at deploy time as part of the per-service Render API call, NOT in the env group (ADR-053 D7 correction). Env-group changes do not bump generation; the protection is structural rather than configuration-dependent.
  • Cutover sequence is the explicit 12-step contract codified in ADR-053 D9.

D6 — Path-filtered rollout via .github/deploy-manifest.yml

  • The manifest declares path → service mapping; GH Actions diffs the current commit against the previous-deployed commit, maps changed paths to an affected-services set, deploys only those.
  • Force-full-redeploy paths: render.yaml variants, .github/deploy-manifest.yml, infra/**.
  • api + workers are a coupled-deploy pair (generation alignment). If either’s code changes, both redeploy.
  • Render autoDeploy: false on all services; Render buildFilter unused; orchestration is GH-Actions-native.
  • Coverage check: tools/quality/check_deploy_manifest_coverage.py.

D7 — /health + /version + /version/detail + core.workers

  • /health: public, binary — 200 with body ok or 503 with body degraded. No JSON, no check names. Probes: database + auth (vendor-agnostic). LLM/email/storage excluded.
  • /version: public, minimal — service, environment, generation, tag (nullable), color (prod only), reference (short 8-char), deployed_at.
  • /version/detail: auth-gated via dual-path key-exchange middleware. Includes full reference, schema (migration head), runtime/framework/os, deps_lock_hash, build_time, start_time, check statuses with latency.
  • core.workers heartbeat/diagnostic table for the worker equivalent (no HTTP surface).
  • Key-exchange middleware: extracts a key from Authorization: Bearer or X-API-Key, validates against an env-var-sourced registry, mints an internal JWT with the existing scope/issuer/key-format taxonomy; auth middleware is a no-op multi-issuer validator. Secret rotation = deploy side-effect; no rotation playbook.
  • The auth substrate (TA-18 area) extends to support dynamic keys sourced from env vars alongside DB-backed keys (extension noted in ADR-037 carry-forward).
  • Key format: sk_deploy_<32chars> prefix+random.
  • Contract test: Authorization header value never appears in log output.

D8 — Worker drain parameters

  • HANDLER_MAX = 60s (asyncio.wait_for bound on each handler).
  • maxShutdownDelaySeconds = 90s (HANDLER_MAX + 30s buffer; under Render’s 300s ceiling).
  • Reaper interval = 30s; claim TTL = 300s (5× HANDLER_MAX).
  • SPECTRAL_DRAIN_COOLING_SECONDS = 60s default for legacy-drain workers.

D9 — Single region, Virginia (us-east)

All six Render services (including both cron jobs) + Supabase primary in Virginia. Cloudflare Pages / LB globally edge-distributed.

Forward trigger: non-US pilot, regulatory data-residency, p99 latency floor, single-region availability incident.

D10 — Two environments + plan sizing

  • Two Render blueprints: infra/render/production.yaml and infra/render/staging.yaml.
  • Staging: Starter tier, single-color per service.
  • Production: Standard tier at alpha; blue/green pairs for web services (api, dashboard, operations); workers/cron/docs stay single-instance.
  • Two GitHub Environments: staging (auto-deploy on push-main); production (tag-triggered with deployment protection rules).
  • Two Supabase environments: main branch (prod) + persistent preview branch (staging), each with its own core.deployments counter.
  • Two Render Environment Groups: spectral-staging-runtime, spectral-production-runtime; each carries rotating key material.
  • Two Cloudflare Pages targets per docs project.

Alternatives considered

Blue/green worker service pairs. Rejected; SKIP LOCKED + dedup already make overlap correct, and a second color merely amplifies wakeups for zero customer benefit.

External deploy-pipeline drain monitor. Rejected; second source of truth competing with substrate guarantees.

Render preDeployCommand for migrations. Rejected; GH Actions orchestrates all deploys.

Static-mount docs on Start services. Rejected (ADR-046 D5 narrowed by D3 here in favor of Pages on day 1).

Supabase GitHub integration. Rejected; push/merge-only, kills tag-based trunk dev.

True DB-layer blue/green on Supabase. Not achievable.

Multi-generation worker subscription. Rejected; violates the structural guarantee.

Pre-insert classification trigger. Rejected; publishers classify events.

One service per context. Rejected (category error — contexts are code boundaries, not deploy boundaries).

Stored procedure for core.deployments insert. Rejected in favor of sequence (Postgres-idiomatic).

LLM/email/storage checks in /health. Rejected; only actionable total-outage dependencies.

/version/detail deferred to post-alpha. Rejected; ships at alpha.

Tag-as-generation. Rejected; SQL ordering needs monotonic integers.

Per-service core.deployments rows. Rejected; generation is event-substrate-scoped.

Consequences

  • Cross-version worker coexistence solved structurally, not by human discipline.
  • Handler-evolution cognitive load collapsed to a three-criterion flag rule.
  • Migration safety hardened at the tool layer (lint + gate + dry-run).
  • Secret rotation = deploy side-effect; no rotation runbook class for the relevant key material.
  • Public /health + /version minimize attack-surface leak; diagnostic info behind auth.
  • Uniform autoDeploy: false + GH Actions across Render + Cloudflare Pages + Supabase.
  • Independent docs deploy cadence via Pages.
  • Eight deployables per env (six Render + two Pages) — the Pages projects are zero-runtime-cost.
  • ~200 LOC new substrate (generation counter + legacy-drain + key-exchange middleware).
  • Supabase branching coupling — both envs affected if branching degrades.
  • Two new tables (core.deployments, core.workers) — small operational surface.
  • Customer-facing API request routing during rollover remains rolling — unavoidable without API-level versioning; to be handled at the API-version level when external call sites demand it.
  • TA-15 / ADR-060 D-runtime reinforces the single-workers architecture: agent runtimes (Spectral / Ops / World) all live in workers; framework-layer composition between contexts happens at the workers entrypoint.
  • Race C (Render pod crash-restart env-snapshot semantics during rolling deploy) materially mitigated for SPECTRAL_GENERATION via per-service placement (D5); still matters for other shared values via env group (mitigation captured in docs/runbooks/deployment-topology.md).

References

  • ADR-065 — spectral.core admission discipline
  • ADR-031 — single-library structure
  • ADR-039 — JWKS-local; scope taxonomy
  • ADR-040 — backup-nightly cron
  • ADR-041 — direct-to-Postgres listener (D9 dedicated connection)
  • ADR-042 — retention-run cron
  • ADR-044 — outbox + envelope contract; D5 partial supersession noted there
  • ADR-046 — Render PaaS
  • ADR-049 — five container images mapped to six Render services
  • ADR-052 — Cloudflare CNAME flip routing for blue/green; Pages Function pattern
  • ADR-053 — CD pipeline orchestration (cutover sequence; pre-merge dry-run; concurrency mutex)
  • ADR-060 — D-runtime reinforces single-workers; framework-layer composition seam
  • ADR-063 — D2 inter-context inheritance superseded
  • TA-19 disposition — SPEC-322 comment ddce0896
  • TA-19 verification — SPEC-322 comment 00e22416
  • TA-20 image-count refinement bookkeeping — SPEC-322 comment e14df5df
  • TA-26 cutover sequence + env-var placement — SPEC-322 comment 7b81b6fc
  • TA-15 D2 inheritance supersession — SPEC-322 comment 11e52812
  • .github/deploy-manifest.yml — path-filter contract
  • infra/render/{production,staging}.yaml — blueprint placeholders
  • supabase/migrations/20260422170000_core_deployments.sql
  • supabase/migrations/20260422170100_core_workers.sql
  • supabase/migrations/20260422170200_core_outbox.sql
  • supabase/migrations/20260422170300_core_event_handled.sql
  • docs/runbooks/deployment-topology.md — operational runbook
  • docs/runbooks/legacy-drain.md — legacy-drain workflow