
Deployment Topology

The API/workers runtime is one Cloudflare app container running both entrypoints, and cutover is by deployment generation: gen-N events are processed only by gen-N code, as a structural guarantee rather than an operational discipline. Human-facing surfaces deploy separately as Cloudflare Pages projects. Topology is in ADR-109 and CD orchestration in ADR-053; the operational runbooks are docs/runbooks/deployment.md and docs/runbooks/rollback.md.

The container (infra/docker/app.Dockerfile, launched by app_supervisor.py) runs the FastAPI API and the background workers runtime as sibling processes; a thin Cloudflare Worker fronts it and proxies requests to the API. The two entrypoints remain separately launchable, which preserves the seam for re-splitting them later, but at alpha they deploy and run as one image.
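The sketch below illustrates the sibling-process layout. It launches two entrypoints and fails fast when either one exits, so the container restarts as a unit. The fail-fast policy, the supervise() helper, and the module names in EXAMPLE_COMMANDS are assumptions for illustration; app_supervisor.py may behave differently.

```python
import subprocess
import sys
import time
from typing import Dict, List, Tuple

# Hypothetical launch commands; the real ones live in app_supervisor.py.
EXAMPLE_COMMANDS = {
    "api": [sys.executable, "-m", "hypothetical_api_entrypoint"],
    "workers": [sys.executable, "-m", "hypothetical_workers_entrypoint"],
}


def supervise(commands: Dict[str, List[str]], poll_interval: float = 0.05) -> Tuple[str, int]:
    """Run each entrypoint as a sibling process; return (name, exit code) of the first to exit.

    Assumed fail-fast policy: when one child exits, the others are terminated so
    the container restarts as a unit instead of running half its entrypoints.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    try:
        while True:
            for name, proc in procs.items():
                code = proc.poll()
                if code is not None:
                    return name, code
            time.sleep(poll_interval)
    finally:
        # Stop any sibling that is still running, then reap every child.
        for proc in procs.values():
            if proc.poll() is None:
                proc.terminate()
        for proc in procs.values():
            proc.wait()
```

In this sketch each entrypoint is its own command. Re-splitting would therefore mean giving each command its own container, without touching the entrypoints themselves.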

Decision-module execution rides inside the API


The API entrypoint is not only the HTTP tier — it is the multi-tenant decision-module-execution fleet. On a decision request it loads the action module for (org, domain, action, world_model_version) from the content-addressed core.storage module store, verifies the content hash and operator-approval marker per ADR-080 D2 + D3, caches the loaded module, and executes it in-process inside the runtime sandbox per ADR-083. No separate predicate host is added at alpha (ADR-109 D2).
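Here is a minimal sketch of the load path. It rests on several assumptions:

- a dict-backed store keyed by (org, domain, action, world_model_version);
- SHA-256 as the content hash;
- a boolean standing in for the operator-approval marker;
- a module that exposes a decide() function.

StoredModule, ModuleLoader, and decide are hypothetical names. Plain exec() replaces the ADR-083 sandbox, so this sketch is not safe for untrusted code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# (org, domain, action, world_model_version)
ModuleKey = Tuple[str, str, str, str]


class ModuleVerificationError(Exception):
    """Raised when a stored module fails its content-hash or approval check."""


@dataclass
class StoredModule:
    content_hash: str  # hex SHA-256 the module is addressed by (assumed hash function)
    source: bytes      # module source code
    approved: bool     # stand-in for the operator-approval marker


@dataclass
class ModuleLoader:
    store: Dict[ModuleKey, StoredModule]
    _cache: Dict[ModuleKey, Callable] = field(default_factory=dict)

    def load(self, key: ModuleKey) -> Callable:
        """Return the module's decide() callable, verifying it on first load."""
        if key in self._cache:
            return self._cache[key]
        record = self.store[key]
        if hashlib.sha256(record.source).hexdigest() != record.content_hash:
            raise ModuleVerificationError(f"content hash mismatch for {key}")
        if not record.approved:
            raise ModuleVerificationError(f"no operator approval for {key}")
        namespace: dict = {}
        # Plain exec() for illustration only; the real path executes in the ADR-083 sandbox.
        exec(record.source.decode("utf-8"), namespace)
        decide = namespace["decide"]  # hypothetical entry-point name
        self._cache[key] = decide
        return decide
```

The key includes world_model_version and the hash is checked before caching. A cached entry therefore always corresponds to one verified, pinned module version.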

Module versions are content-addressed and version-pinned, so a loaded module is stable across deploys. The deployment-generation machinery described below therefore governs the event substrate, not the decision-serving path; on that path, correctness across a deploy comes from content hashing and version pinning. Per-tenant rate limits and the noisy-neighbor posture of ADR-084 apply to this fleet.

The World Agent runs in the workers entrypoint


Spectral runs one production agent, the World Agent, in the workers entrypoint, off the synchronous decision path (ADR-109 D3). Its turns are long-running, bursty, and bound by LLM API calls, the opposite profile from /api/decide. It is LangGraph-native, with a Supabase-backed checkpointer. The workers entrypoint is also where dependencies that span worlds and platform are wired up at startup (per agent tool invocation).

The event substrate is a single core.outbox table plus a single core.event_handled table keyed on (handler_name, idempotency_key); handler_name is scope-qualified.
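A minimal sketch of how a handler might use core.event_handled to apply its effects at most once per (handler_name, idempotency_key). It makes the following assumptions:

- SQLite stands in for Postgres, with INSERT OR IGNORE playing the role of INSERT ... ON CONFLICT DO NOTHING.
- The table name drops the core. schema prefix.
- The scope-qualified handler names are hypothetical.
- The marker row and the handler's own writes share one transaction.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE event_handled (
    handler_name    TEXT NOT NULL,  -- scope-qualified (naming scheme hypothetical)
    idempotency_key TEXT NOT NULL,
    PRIMARY KEY (handler_name, idempotency_key)
);
"""


def handle_once(conn: sqlite3.Connection, handler_name: str, idempotency_key: str,
                handler: Callable[[], None]) -> bool:
    """Run handler at most once per (handler_name, idempotency_key).

    The marker insert and the handler's writes share one transaction, so a crash
    before commit leaves no marker and the event can be retried.
    Returns True if the handler ran, False if the event was a duplicate.
    """
    with conn:  # commits on success, rolls back if handler raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO event_handled (handler_name, idempotency_key) VALUES (?, ?)",
            (handler_name, idempotency_key),
        )
        if cur.rowcount == 0:
            return False  # this handler already processed this event
        handler()
        return True
```

The handler name is part of the key, so two handlers in different scopes can each process the same event once.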

SPECTRAL_GENERATION is a container variable set at deploy time, so it changes atomically with the image and survives restarts. Shared-config changes never bump it.

  • The deploy allocates a fresh generation via INSERT INTO core.deployments RETURNING generation — atomic, single round-trip.
  • Publishers stamp every outbox row with the running SPECTRAL_GENERATION.
  • The outbox consumer uses LISTEN/NOTIFY over the session pooler (with a polling fallback) and claims only its own generation’s rows: WHERE generation = $MY_GENERATION AND status = 'pending'.
  • A new container comes up at the next generation and claims only its own rows; the prior generation’s rows simply stop being claimed once it is no longer deployed. There is no blue-green color flip and no legacy-generation reaper.
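The generation flow in the list above can be sketched end to end, with SQLite standing in for Postgres. Several details are assumptions:

- The schemas, the image column, the 'in_flight' status spelling, and the function names are hypothetical.
- lastrowid stands in for RETURNING.
- LISTEN/NOTIFY wakeups are omitted.

On Postgres, a claim would more plausibly be a single UPDATE ... WHERE id IN (SELECT ... FOR UPDATE SKIP LOCKED) RETURNING. That form keeps concurrent consumers from claiming the same row.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE deployments (generation INTEGER PRIMARY KEY AUTOINCREMENT, image TEXT);
CREATE TABLE outbox (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    generation INTEGER NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    payload    TEXT NOT NULL
);
"""


def allocate_generation(conn: sqlite3.Connection, image: str) -> int:
    # One INSERT allocates the next generation (RETURNING on Postgres, lastrowid here).
    with conn:
        cur = conn.execute("INSERT INTO deployments (image) VALUES (?)", (image,))
    return cur.lastrowid


def publish(conn: sqlite3.Connection, my_generation: int, payload: str) -> None:
    # Every outbox row is stamped with the publisher's running generation.
    with conn:
        conn.execute("INSERT INTO outbox (generation, payload) VALUES (?, ?)",
                     (my_generation, payload))


def claim(conn: sqlite3.Connection, my_generation: int, limit: int = 10) -> List[Tuple[int, str]]:
    """Claim pending rows of this generation only, marking them in_flight."""
    with conn:
        rows = conn.execute(
            "SELECT id, payload FROM outbox "
            "WHERE generation = ? AND status = 'pending' ORDER BY id LIMIT ?",
            (my_generation, limit),
        ).fetchall()
        conn.executemany("UPDATE outbox SET status = 'in_flight' WHERE id = ?",
                         [(row_id,) for row_id, _ in rows])
    return rows
```

A consumer would pass int(os.environ["SPECTRAL_GENERATION"]) as my_generation. A container then only ever claims rows stamped by its own deploy, and gen-1 rows still pending after gen-2 takes over simply stay unclaimed.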

The reaper returns stuck IN_FLIGHT rows within its own generation to pending (crash recovery). There is no cross-generation orphan sweep, because one would violate the structural guarantee.
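A sketch of the generation-scoped reaper, again on SQLite. The claimed_at column, the 300-second STUCK_AFTER_SECONDS lease, and the reap_stuck() name are hypothetical. What matters is that the WHERE clause is pinned to my_generation, so rows from any other generation are never touched.

```python
import sqlite3
import time
from typing import Optional

STUCK_AFTER_SECONDS = 300  # hypothetical lease timeout

SCHEMA = """
CREATE TABLE outbox (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    generation INTEGER NOT NULL,
    status     TEXT NOT NULL,
    claimed_at REAL            -- hypothetical: epoch seconds when the row was claimed
);
"""


def reap_stuck(conn: sqlite3.Connection, my_generation: int, now: Optional[float] = None) -> int:
    """Return stuck in_flight rows of this generation to pending; report how many."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "UPDATE outbox SET status = 'pending', claimed_at = NULL "
            "WHERE generation = ? AND status = 'in_flight' AND claimed_at < ?",
            (my_generation, now - STUCK_AFTER_SECONDS),
        )
    return cur.rowcount
```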

Migrations are forward-only (ADR-032) and expand/contract (ADR-109): a deploy’s schema change must leave the prior generation’s code working against the new schema, so the two generations overlap safely during cutover. This is what makes generation-based cutover (and code-level rollback to the prior generation) safe.
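The sketch below makes the expand rule concrete. It adds a nullable column (the expand step), then shows prior-generation insert code still working against the new schema alongside next-generation code. The users table, the display_name column, and the function names are hypothetical, and SQLite stands in for Postgres.

```python
import sqlite3

# Expand step: a nullable column, so writers that do not know about it keep working.
EXPAND = "ALTER TABLE users ADD COLUMN display_name TEXT"
# A later contract step (once no running generation depends on the old shape)
# could tighten or drop things; a NOT NULL without DEFAULT here would instead
# break prior_generation_insert during the overlap.


def prior_generation_insert(conn: sqlite3.Connection, email: str) -> None:
    # Gen-N code, written before display_name existed.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def next_generation_insert(conn: sqlite3.Connection, email: str, display_name: str) -> None:
    # Gen-N+1 code, which writes the new column.
    conn.execute("INSERT INTO users (email, display_name) VALUES (?, ?)",
                 (email, display_name))
```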

An AST-level compat lint (ADR-053 D9) runs at PR time. It rejects any migration containing DROP COLUMN, DROP TABLE, ALTER COLUMN TYPE to an incompatible target, ADD COLUMN NOT NULL without DEFAULT, or ADD UNIQUE on a populated column, unless the file carries an explicit -- compat: breaking (reason: ...) marker. One production environment runs today; a staging environment will land as a thin sibling once a Supabase stage project exists.
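Here is a crude approximation of a subset of the lint rules, for illustration only. It applies regular expressions to naively split statements rather than walking an AST, which has two consequences:

- It misses forms such as Postgres's ADD without the COLUMN keyword.
- It can be fooled by ';' or '--' inside string literals.

The ALTER COLUMN TYPE and ADD UNIQUE rules need type and data knowledge, so they are omitted. compat_violations() and the exact marker pattern are hypothetical.

```python
import re
from typing import List

# Hypothetical pattern for the opt-out marker: -- compat: breaking (reason: ...)
BREAKING_MARKER = re.compile(r"^--\s*compat:\s*breaking\s*\(reason:\s*[^)]+\)",
                             re.IGNORECASE | re.MULTILINE)


def _has(pattern: str, text: str) -> bool:
    return re.search(pattern, text, re.IGNORECASE) is not None


def compat_violations(sql: str) -> List[str]:
    """Return the breaking operations found in one migration file."""
    if BREAKING_MARKER.search(sql):
        return []  # explicitly acknowledged as breaking
    # Drop line comments, then split into statements (naive: ignores string literals).
    code = re.sub(r"--[^\n]*", "", sql)
    violations = []
    for stmt in filter(str.strip, code.split(";")):
        if _has(r"\bDROP\s+COLUMN\b", stmt):
            violations.append("DROP COLUMN")
        if _has(r"\bDROP\s+TABLE\b", stmt):
            violations.append("DROP TABLE")
        if (_has(r"\bADD\s+COLUMN\b", stmt) and _has(r"\bNOT\s+NULL\b", stmt)
                and not _has(r"\bDEFAULT\b", stmt)):
            violations.append("ADD COLUMN NOT NULL without DEFAULT")
    return violations
```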

The container exposes /health through the edge as the deploy go/no-go check (ADR-053 D6). It is the liveness probe and reports the running version plus per-feature wiring (outbox, per-domain LLM credential); there is no separate /version surface.

core.workers is the heartbeat and diagnostic table for the workers runtime, which has no HTTP surface. Each worker process registers a row on boot, refreshes a last_seen heartbeat, and records its state as one of:

  • running
  • reconnecting (its consumer’s Supabase connection dropped and is being reopened with bounded backoff)
  • exited

A stalled or reconnecting consumer is therefore observable across processes even though the workers expose no HTTP endpoint.
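One way a diagnostic check over core.workers might flag stalled or reconnecting consumers, sketched in plain Python. The WorkerRow shape, the 60-second STALE_AFTER threshold, and the choice to skip exited rows are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

STALE_AFTER = timedelta(seconds=60)  # hypothetical heartbeat threshold


@dataclass
class WorkerRow:
    # Hypothetical projection of a core.workers row.
    worker_id: str
    status: str          # 'running' | 'reconnecting' | 'exited'
    last_seen: datetime  # heartbeat timestamp, timezone-aware UTC


def unhealthy_workers(rows: List[WorkerRow], now: datetime) -> List[str]:
    """Flag workers that are reconnecting or whose heartbeat has gone stale."""
    flagged = []
    for row in rows:
        if row.status == "exited":
            continue  # assumed: exited workers are not expected to heartbeat
        if row.status == "reconnecting" or now - row.last_seen > STALE_AFTER:
            flagged.append(row.worker_id)
    return flagged
```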

The deploy-manifest coverage lint (tools/quality/check_deploy_manifest_coverage.py) asserts every directory under apps/ and src/spectral/ is mapped or declared non_deployed: — catching a component that silently fails to ship in the container or Pages workflows.
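A minimal sketch of the coverage check. It assumes the lint looks at immediate subdirectories of each scanned root, and that the manifest reduces to a mapping of deployed directories plus a set of non_deployed ones. These shapes, the dunder/dot-directory filter, and the function names are hypothetical; tools/quality/check_deploy_manifest_coverage.py may work differently.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def list_dirs(root: Path, scanned: Iterable[str]) -> List[str]:
    """Immediate subdirectories of each scanned root, as POSIX paths relative to root."""
    found = []
    for top in scanned:
        for child in sorted((root / top).iterdir()):
            # Hypothetical filter: skip caches and hidden directories.
            if child.is_dir() and not child.name.startswith(("__", ".")):
                found.append(child.relative_to(root).as_posix())
    return found


def uncovered(dirs: Iterable[str], deployed: Dict[str, str], non_deployed: Set[str]) -> List[str]:
    """Directories that are neither mapped to a deploy target nor declared non_deployed."""
    return [d for d in dirs if d not in deployed and d not in non_deployed]
```

A CI step could call uncovered(list_dirs(Path("."), ["apps", "src/spectral"]), ...) and fail the build when the result is non-empty.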