Deployment topology runbook
Inspecting Spectral’s one-container topology — deployment-generation stamping, the outbox, worker heartbeats, and /version. The topology decision is ADR-109; the deploy procedure is deployment.md.
Topology
One Cloudflare app container runs both entrypoints (API + workers) from infra/docker/app.Dockerfile via app_supervisor.py; the World Agent and the outbox consumers/handlers run in the workers entrypoint, and predicate execution is in-process (alpha). The core.outbox consumer keeps OutboxListener.listen() (LISTEN/NOTIFY on the session pooler, with a poll fallback); generation stamping, the WHERE generation = $MY_GENERATION claim filter, and event_handled dedup are unchanged.
Inspect the current generation
-- Latest deployment generation
SELECT generation, reference, deployed_at, deployed_by, tag
FROM core.deployments
ORDER BY generation DESC
LIMIT 5;
-- What generation are workers reporting
SELECT generation, count(*), max(last_seen) AS most_recent_beat
FROM core.workers
WHERE state = 'running'
GROUP BY generation
ORDER BY generation DESC;

Healthy: the workers’ max generation matches the core.deployments head. Lag = a deploy is in flight or workers haven’t picked up the new generation yet.
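The head-versus-workers comparison can be scripted once both query results are in hand. The following is a minimal sketch only; `generation_lag` and its argument shapes are hypothetical and not part of Spectral's tooling.

```python
from typing import Iterable, Optional


def generation_lag(head_generation: int,
                   worker_generations: Iterable[int]) -> Optional[int]:
    """Return how far the newest running worker trails the deployments head.

    0 means healthy. A positive value means a deploy is in flight or workers
    have not picked up the new generation yet. None means the workers query
    returned no running workers at all.
    """
    gens = list(worker_generations)
    if not gens:
        return None
    return head_generation - max(gens)
```

Feed it the top `generation` from core.deployments and the `generation` column from the workers query; anything other than 0 deserves a second look.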
Inspect the outbox by generation
SELECT generation, status, count(*)
FROM core.outbox
WHERE deleted_at IS NULL
GROUP BY generation, status
ORDER BY generation DESC, status;

Healthy: PENDING rows at the latest generation; older generations age out to zero unclaimed.
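To spot older generations that have not aged out, the query result can be filtered in a few lines. This is a sketch under the assumption that PENDING is the status of unclaimed rows; the helper name `stale_pending` is hypothetical.

```python
from typing import Iterable, List, Tuple

# (generation, status, count), in the column order the query returns
Row = Tuple[int, str, int]


def stale_pending(rows: Iterable[Row], latest_generation: int) -> List[Tuple[int, int]]:
    """Return (generation, count) for PENDING rows older than the latest generation.

    An empty list matches the healthy case: only the latest generation
    still has unclaimed work. Results are ordered newest generation first.
    """
    return sorted(
        ((gen, n) for gen, status, n in rows
         if status == "PENDING" and gen < latest_generation and n > 0),
        reverse=True,
    )
```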
Generation-stamp invariants
- SPECTRAL_GENERATION is set per deploy on the container (not in shared env), so “gen-N events processed by gen-N code” is structural.
- The generation is allocated atomically via INSERT INTO core.deployments … RETURNING generation, so there is no race on allocation.
- A workflow concurrency mutex (concurrency: deploy-production) prevents concurrent production deploys.
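The allocation invariant is easy to see in miniature. The sketch below uses an in-memory SQLite database as a stand-in for Postgres, a reduced hypothetical `deployments` table, and `cursor.lastrowid` in place of `RETURNING generation`; it illustrates the idea that the database, not the caller, hands out the next generation, and is not the production statement.

```python
import sqlite3


def allocate_generation(conn: sqlite3.Connection, reference: str) -> int:
    """Insert a deployment row and return the generation the database assigned."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO deployments (reference) VALUES (?)", (reference,)
        )
        return cur.lastrowid


# Stand-in schema (assumption): the real core.deployments has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deployments ("
    " generation INTEGER PRIMARY KEY AUTOINCREMENT,"
    " reference TEXT NOT NULL)"
)

first = allocate_generation(conn, "deploy-a")
second = allocate_generation(conn, "deploy-b")
```

Because the insert and the number assignment are a single statement, two deploys can never read the same "next" value and both write it back.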
Cutover: the new container claims its generation’s rows; the prior generation’s rows simply stop being claimed — no blue-green color flip, no legacy-generation reaper. Forward-only expand/contract migrations keep both generations safe during the overlap (ADR-109 D5).
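The cutover behavior follows directly from the claim filter. As an in-memory sketch (the `OutboxRow` shape and the non-PENDING status value are assumptions made for illustration), a worker only ever sees rows stamped with its own generation:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class OutboxRow:
    id: int
    generation: int
    status: str


def claimable(rows: Iterable[OutboxRow], my_generation: int) -> List[OutboxRow]:
    """Mimic the WHERE generation = $MY_GENERATION claim filter in memory."""
    return [
        r for r in rows
        if r.status == "PENDING" and r.generation == my_generation
    ]
```

A gen-7 worker calling `claimable(rows, 7)` never receives gen-6 rows, which is why no color flip is needed: once the gen-6 container is gone, its rows simply have no claimant.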
Shutdown drain
Graceful shutdown uses the container’s SIGTERM→SIGKILL window to finish the in-flight turn (each handler bounded by HANDLER_MAX) and commit the outbox cursor. Rows stranded by a crash are recovered by the reaper once the claim TTL lapses; there is no separate drain service.
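The drain pattern can be sketched as a stop flag that is only checked between turns. This is illustrative, not the worker's actual loop: `run_worker`, `commit_cursor`, and `install_sigterm_handler` are hypothetical names, and the HANDLER_MAX bound on each handler is not enforced here.

```python
import signal
import threading
from typing import Callable, Iterable

stop_requested = threading.Event()


def _on_sigterm(signum, frame) -> None:
    # Only record the request; the loop decides when it is safe to stop.
    stop_requested.set()


def install_sigterm_handler() -> None:
    """Register the handler; call once from the main thread at startup."""
    signal.signal(signal.SIGTERM, _on_sigterm)


def run_worker(turns: Iterable[Callable[[], None]],
               commit_cursor: Callable[[int], None]) -> int:
    """Run turns until a stop is requested.

    The flag is checked before a turn starts, so a turn that is already
    in flight always finishes and its cursor position is committed.
    """
    processed = 0
    for turn in turns:
        if stop_requested.is_set():
            break
        turn()
        processed += 1
        commit_cursor(processed)
    return processed
```

If the process is killed before the loop reaches its next check, the uncommitted rows are the ones the reaper recovers after the claim TTL lapses.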
/version
curl https://api.runspectral.com/version
# → {"service":"api","environment":"production","generation":…,"commit_sha":"…","deployed_at":"…"}

/version/detail (auth-gated) returns the full version.json (sha, lock hashes, built-at) plus runtime/OS + check statuses. version.json is produced at build by infra/docker/build-version.sh and COPYd into /app/version.json.
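A post-deploy check can compare the `generation` field of the /version body against the generation just allocated. A minimal sketch, assuming the response body has already been fetched as a string; `check_version` is a hypothetical helper.

```python
import json


def check_version(body: str, expected_generation: int) -> bool:
    """Return True when the /version payload reports the expected generation."""
    payload = json.loads(body)
    return payload.get("generation") == expected_generation
```

A False result right after a deploy usually just means the new container is not serving yet; a persistent False is worth investigating.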
Worker heartbeat
SELECT worker_id, generation, environment, state, last_seen, started_at
FROM core.workers
ORDER BY last_seen DESC;

Each worker process maintains one core.workers row: it registers on boot (state='running'), refreshes last_seen on a ~30 s heartbeat, and on shutdown marks state='exited'. When a consumer’s connection to Supabase drops, the worker auto-reconnects with bounded backoff and the row reflects state='reconnecting' for the duration (per-consumer detail is in the correlated transition logs); a consumer that exhausts its reconnect budget fails loud.

A stale heartbeat (no last_seen update in > 60 s) indicates a worker crash or disconnect. The reaper handles outbox-row recovery; the core.workers row itself is left in place for ops visibility.
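The staleness rule can be applied to the query result directly. The sketch below assumes rows reduced to (worker_id, state, last_seen) with timezone-aware timestamps; `stale_workers` is a hypothetical helper, and the 60 s threshold is the one stated in this runbook.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

STALE_AFTER = timedelta(seconds=60)  # two missed ~30 s heartbeats

# (worker_id, state, last_seen)
WorkerRow = Tuple[str, str, datetime]


def stale_workers(rows: Iterable[WorkerRow], now: datetime) -> List[str]:
    """Return worker_ids that have not exited but missed the heartbeat window."""
    return [
        worker_id
        for worker_id, state, last_seen in rows
        if state != "exited" and now - last_seen > STALE_AFTER
    ]
```

Rows in state 'exited' are skipped because a clean shutdown is expected to stop beating.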
Related
- ADR-109 — hosting topology + generation discipline.
- deployment.md: the deploy procedure · edge.md · event-substrate.md.