
Deployment topology runbook

Inspecting Spectral’s one-container topology: deployment-generation stamping, the outbox, worker heartbeats, and /version. The topology decision is recorded in ADR-109; the deploy procedure is in deployment.md.

Topology

One Cloudflare app container, built from infra/docker/app.Dockerfile, runs both entrypoints (API and workers) under app_supervisor.py. The World Agent and the outbox consumers and handlers run in the workers entrypoint, and predicate execution is in-process (alpha). The core.outbox consumer still uses OutboxListener.listen() (LISTEN/NOTIFY on the session pooler, with a poll fallback). Generation stamping, the WHERE generation = $MY_GENERATION claim filter, and event_handled dedup are unchanged.
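A LISTEN/NOTIFY consumer with a poll fallback reduces to "wake on a notification or after a timeout, whichever comes first". The Python sketch below shows only that shape; it is not OutboxListener.listen(), and the names wait_for_work and poll_interval (and the 0.05 s interval in the demo) are hypothetical. An asyncio.Event stands in for the LISTEN channel.

```python
import asyncio


async def wait_for_work(notified: asyncio.Event, poll_interval: float) -> str:
    """Wait for a NOTIFY or for poll_interval seconds, whichever comes first.

    `notified` stands in for the LISTEN channel; a real listener would set it
    from the database driver's notification callback (hypothetical wiring).
    """
    try:
        await asyncio.wait_for(notified.wait(), timeout=poll_interval)
    except asyncio.TimeoutError:
        return "poll"  # fallback: scan the outbox even though nothing was signalled
    notified.clear()  # consume the wake-up so the next call blocks again
    return "notify"


async def demo() -> list:
    notified = asyncio.Event()
    reasons = [await wait_for_work(notified, poll_interval=0.05)]  # nothing sent yet
    notified.set()  # simulate a NOTIFY fired by an outbox insert
    reasons.append(await wait_for_work(notified, poll_interval=0.05))
    return reasons


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['poll', 'notify']
```

In this pattern the timeout path means a missed NOTIFY (for example, one sent while the pooler connection was down) delays pickup by at most one poll interval rather than stalling the consumer.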

Inspect the current generation

-- Most recent deployment generations (head first)
SELECT generation, reference, deployed_at, deployed_by, tag
FROM core.deployments
ORDER BY generation DESC
LIMIT 5;
-- Generations reported by running workers
SELECT generation, count(*) AS workers, max(last_seen) AS most_recent_beat
FROM core.workers
WHERE state = 'running'
GROUP BY generation
ORDER BY generation DESC;

Healthy: the highest generation reported by running workers matches the core.deployments head. Lag means either a deploy is in flight or the workers have not yet picked up the new generation.
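The health rule can be expressed as a small check over the two query results. This is a hypothetical helper, not part of Spectral: it takes the head generation from the core.deployments query and the generation column from the core.workers query. Treating a worker generation above the head as an anomaly is this sketch's assumption, based on generations only being allocated by inserting into core.deployments.

```python
from typing import Iterable


def generation_health(deploy_head: int, worker_generations: Iterable[int]) -> str:
    """Compare the core.deployments head with generations reported by running workers.

    deploy_head: top `generation` from the core.deployments query.
    worker_generations: the `generation` column of the core.workers query.
    """
    gens = list(worker_generations)
    if not gens:
        return "no running workers"
    newest = max(gens)
    if newest == deploy_head:
        return "healthy"  # older generations may still be running during an overlap
    if newest < deploy_head:
        return "lagging"  # deploy in flight, or workers have not picked up the new generation
    return "anomaly: worker generation above deployments head"  # this sketch's assumption


if __name__ == "__main__":
    print(generation_health(42, [42, 41]))  # healthy
    print(generation_health(43, [42]))      # lagging
```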

Inspect the outbox by generation

SELECT generation, status, count(*)
FROM core.outbox
WHERE deleted_at IS NULL
GROUP BY generation, status
ORDER BY generation DESC, status;

Healthy: PENDING rows sit at the latest generation, and older generations age out to zero unclaimed rows.
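A hypothetical post-processing step for the outbox query: it flags older generations that still hold PENDING rows. The sketch assumes PENDING is the only unclaimed status and treats every other status as claimed or handled; the "DONE" status in the demo is a placeholder name.

```python
from collections import defaultdict


def leftover_pending(rows, latest_generation: int) -> dict:
    """Map each older generation to its PENDING row count (empty dict means healthy).

    rows: (generation, status, count) tuples from the outbox-by-generation query.
    Assumes PENDING is the only unclaimed status.
    """
    leftovers = defaultdict(int)
    for generation, status, count in rows:
        if generation < latest_generation and status == "PENDING":
            leftovers[generation] += count
    return dict(leftovers)


if __name__ == "__main__":
    rows = [
        (7, "PENDING", 12),
        (6, "PENDING", 3),
        (6, "DONE", 40),  # "DONE" is a placeholder status name
    ]
    print(leftover_pending(rows, latest_generation=7))  # {6: 3}
```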

Generation-stamp invariants

  1. SPECTRAL_GENERATION is set per deploy on the container (not in the shared env), so “gen-N events are processed by gen-N code” holds structurally.
  2. The generation is allocated atomically via INSERT INTO core.deployments … RETURNING generation, so allocation cannot race.
  3. A workflow concurrency mutex (concurrency: deploy-production) prevents concurrent production deploys.
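Invariant 1 can be illustrated with an in-memory sketch: the worker reads SPECTRAL_GENERATION once at boot, refuses to start without it, and only ever considers rows stamped with that value. OutboxRow, my_generation, and claimable are hypothetical names; in production the filter is the SQL claim predicate, not a Python list comprehension.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class OutboxRow:
    id: int
    generation: int


def my_generation(env: Mapping[str, str] = os.environ) -> int:
    """Read this container's generation once at boot; fail loud if it is missing."""
    raw = env.get("SPECTRAL_GENERATION")
    if raw is None:
        raise RuntimeError("SPECTRAL_GENERATION is not set on this container")
    return int(raw)


def claimable(rows: List[OutboxRow], generation: int) -> List[OutboxRow]:
    """In-memory analogue of the WHERE generation = $MY_GENERATION claim filter."""
    return [row for row in rows if row.generation == generation]


if __name__ == "__main__":
    gen = my_generation({"SPECTRAL_GENERATION": "12"})
    rows = [OutboxRow(1, 11), OutboxRow(2, 12), OutboxRow(3, 12)]
    print([row.id for row in claimable(rows, gen)])  # [2, 3]
```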

Cutover: the new container claims its own generation’s rows, and the prior generation’s rows simply stop being claimed. There is no blue-green color flip and no legacy-generation reaper. Forward-only expand/contract migrations keep both generations safe during the overlap (ADR-109 D5).
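To make the cutover concrete, the sketch below simulates one claim pass over stamped rows while a given set of generations is running. It is a hypothetical model, not Spectral code: rows are (row_id, generation) pairs, and claim TTLs and the crash reaper are ignored.

```python
from typing import Dict, Iterable, List, Set, Tuple


def claim_pass(
    rows: Iterable[Tuple[int, int]], running_generations: Set[int]
) -> Tuple[Dict[int, int], List[int]]:
    """Simulate one claim pass; rows are (row_id, generation) pairs.

    Returns (claimed_by, unclaimed): claimed_by maps row_id to the generation
    that claimed it. Claim TTLs and the crash reaper are deliberately ignored.
    """
    claimed_by: Dict[int, int] = {}
    unclaimed: List[int] = []
    for row_id, generation in rows:
        if generation in running_generations:
            # Each container claims only rows stamped with its own generation.
            claimed_by[row_id] = generation
        else:
            # Nothing running at that generation: the row is simply not claimed.
            # No color flip hands it to the newer generation.
            unclaimed.append(row_id)
    return claimed_by, unclaimed


if __name__ == "__main__":
    rows = [(1, 11), (2, 12), (3, 11)]
    print(claim_pass(rows, {11, 12}))  # ({1: 11, 2: 12, 3: 11}, []) during the overlap
    print(claim_pass(rows, {12}))      # ({2: 12}, [1, 3]) once gen 11 has exited
```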

Shutdown drain

Graceful shutdown uses the container’s SIGTERM→SIGKILL window to finish the in-flight turn (each handler bounded by HANDLER_MAX) and commit the outbox cursor. Rows stranded by a crash are recovered by the reaper once the claim TTL lapses; there is no separate drain service.
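The drain behaviour can be sketched with asyncio: SIGTERM sets a stop flag, the loop takes no new work once the flag is set, and the handler already in flight runs to completion (bounded by a timeout) before the cursor commit. The HANDLER_MAX value, drain_loop, install_sigterm, and the commit callback are assumptions for illustration; loop.add_signal_handler is only available on POSIX event loops.

```python
import asyncio
import signal

HANDLER_MAX = 5.0  # hypothetical per-handler bound in seconds


async def drain_loop(items, handle, stop: asyncio.Event, commit) -> list:
    """Process items until stop is set, letting the in-flight handler finish."""
    processed = []
    for item in items:
        if stop.is_set():
            break  # SIGTERM already received: take no new work
        # Bounding each handler keeps the drain inside the SIGTERM->SIGKILL window.
        await asyncio.wait_for(handle(item), timeout=HANDLER_MAX)
        processed.append(item)
    commit(processed)  # stand-in for committing the outbox cursor
    return processed


def install_sigterm(stop: asyncio.Event) -> None:
    """Turn SIGTERM into a stop request; call from inside the running loop (POSIX only)."""
    asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, stop.set)


async def demo() -> list:
    stop = asyncio.Event()
    committed = []

    async def handle(item):
        if item == 2:
            stop.set()  # simulate SIGTERM arriving while item 2 is in flight

    await drain_loop([1, 2, 3, 4], handle, stop, committed.extend)
    return committed


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [1, 2]
```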

/version

curl https://api.runspectral.com/version
# → {"service":"api","environment":"production","generation":…,"commit_sha":"…","deployed_at":"…"}
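A hypothetical post-deploy check: fetch /version and compare its generation with the core.deployments head. The key names come from the sample response; treating generation as an integer is an assumption, since the sample elides the value. fetch_version and check_version are illustrative names.

```python
import json
import urllib.request

VERSION_URL = "https://api.runspectral.com/version"
REQUIRED_KEYS = {"service", "environment", "generation", "commit_sha", "deployed_at"}


def fetch_version(url: str = VERSION_URL) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def check_version(payload: str, expected_generation: int) -> list:
    """Return problems with a /version body; an empty list means it matches."""
    body = json.loads(payload)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(body))]
    if body.get("generation") != expected_generation:
        problems.append(
            f"generation {body.get('generation')} != expected {expected_generation}"
        )
    return problems


# Usage (network call, not run here):
#   check_version(fetch_version(), expected_generation=<head from core.deployments>)
```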

/version/detail (auth-gated) returns the full version.json (sha, lock hashes, built-at) plus runtime/OS information and check statuses. version.json is produced at build time by infra/docker/build-version.sh and COPY’d into the image at /app/version.json.
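For orientation, a Python approximation of the kind of payload build-version.sh assembles. It assumes "lock hashes" means SHA-256 digests of dependency lock files, and the key names sha, lock_hashes, and built_at are hypothetical; build-version.sh defines the real schema.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional


def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_version_payload(
    sha: str, lock_files: Iterable[Path], built_at: Optional[datetime] = None
) -> dict:
    """Assemble a version.json-style dict (hypothetical keys: sha, lock_hashes, built_at)."""
    built_at = built_at or datetime.now(timezone.utc)
    return {
        "sha": sha,
        # One SHA-256 per lock file, keyed by file name.
        "lock_hashes": {path.name: sha256_file(path) for path in lock_files},
        "built_at": built_at.isoformat(),
    }
```

Serialized with json.dumps, a dict like this would be written out as version.json before the COPY step.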

Worker heartbeat

SELECT worker_id, generation, environment, state, last_seen, started_at
FROM core.workers
ORDER BY last_seen DESC;
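A hypothetical helper for reading the heartbeat query's output: it flags rows still marked running whose last_seen is more than 60 s old, the stale threshold used in this runbook. Skipping rows in other states is this sketch's choice, and last_seen values are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(seconds=60)  # runbook threshold: no last_seen update in > 60 s


def stale_workers(rows, now: datetime) -> list:
    """Return worker_ids of rows still marked 'running' whose heartbeat is stale.

    rows: (worker_id, state, last_seen) tuples from core.workers, with
    last_seen assumed to be timezone-aware.
    """
    return [
        worker_id
        for worker_id, state, last_seen in rows
        if state == "running" and now - last_seen > STALE_AFTER
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        ("w1", "running", now - timedelta(seconds=30)),  # within one ~30 s beat
        ("w2", "running", now - timedelta(seconds=90)),  # stale: likely crashed
        ("w3", "exited", now - timedelta(hours=1)),      # clean shutdown, ignored
    ]
    print(stale_workers(rows, now))  # ['w2']
```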

Each worker process maintains one core.workers row: it registers on boot (state='running'), refreshes last_seen on a ~30 s heartbeat, and marks state='exited' on shutdown. When a consumer’s connection to Supabase drops, the worker reconnects automatically with bounded backoff, and the row shows state='reconnecting' for the duration (per-consumer detail is in the correlated transition logs); a consumer that exhausts its reconnect budget fails loud. A stale heartbeat (no last_seen update for more than 60 s) indicates a worker crash or disconnect. The reaper handles outbox-row recovery; the stale row itself is left in place for ops visibility.
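Bounded reconnect backoff can be sketched as capped exponential delays with a fixed attempt budget. The base, cap, and max_attempts values, the full-jitter choice, and the set_state callback (standing in for updating the core.workers row) are all assumptions; the runbook only says the backoff is bounded and that an exhausted budget fails loud.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect(
    connect: Callable[[], T],
    *,
    base: float = 0.5,
    cap: float = 30.0,
    max_attempts: int = 8,
    set_state: Callable[[str], None] = lambda state: None,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Retry connect() with capped exponential backoff and full jitter.

    Raises RuntimeError once max_attempts is exhausted (fail loud).
    """
    set_state("reconnecting")  # mirrors core.workers.state for the duration
    last_error = None
    for attempt in range(max_attempts):
        try:
            conn = connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt + 1 < max_attempts:
                # Full jitter: sleep a random fraction of the capped exponential delay.
                sleep(rng() * min(cap, base * 2 ** attempt))
            continue
        set_state("running")
        return conn
    raise RuntimeError(f"reconnect budget of {max_attempts} attempts exhausted") from last_error
```

Injecting sleep and rng keeps the schedule deterministic under test; in a worker they default to real sleeping and random jitter.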