
ADR-085: Module store consistency — content-addressed storage with event-driven projection

Context

Decision modules are deployable code artifacts compiled by the world agent (per ADR-076, ADR-081). Bundles are stored persistently in a module store; api pod replicas fetch bundles on cache miss (per ADR-076 D1 as clarified in Session 3d.2), verify content hash + operator approval (per ADR-080), and execute predicates inside the Layer 2 app sandbox (per ADR-083).

This ADR settles the consistency model: commit-then-signal upload, replica honor-or-refuse semantics, and staleness protection. Three concerns underpin it:

1. Upload atomicity — how do new bundles become readable to api pods without exposure to partial / half-uploaded bytes?
2. Honor-or-refuse semantics — when a pod requests a specific module, the store must either serve the requested bytes exactly or refuse the request; silent stale serves are not acceptable for binding decisions.
3. Staleness protection — when the operator activates a new world-model version, api pods must invalidate any stale active-version resolution so subsequent requests route to the newly active version.

The module store substrate (object storage) is shared between spectral.worlds (writer) and spectral.platform (reader). The routing state (which version is active for (org, domain)) crosses the context boundary; it follows the contract-surface pattern of ADR-065 D2 (producer-typed payloads) and ADR-065 D4 (consumer projection), reinforced by ADR-064 (notification-shaped reads).

Decision

D1 — Module store is content-addressed and immutable

Module bundles are stored under keys derived from their sha256 content hash (per ADR-080 D1). The object-storage substrate is the context-agnostic core.storage object store (core.storage.infrastructure, the eighth core area per ADR-099 D5/D6) — a durable, write-once, key→bytes store consumed directly by both spectral.worlds (writer) and spectral.platform (reader). It holds objects at modules/{sha256}.bundle. Content-addressing is a worlds-side consumer policy over that substrate, not a property of the store: ContentHash and the sha256-as-key derivation live in worlds, and the platform loader recomputes the hash locally (ADR-080 D2); core.storage itself holds no hashing or module naming. Objects are immutable — once written, the bytes at a key never change. A new bundle is a new object at a new key; supersession is by no longer pointing at the old key.

Backend: durability is Postgres; the filesystem is a cache tier. The durable backend is PostgresObjectStore over the core.object_store table (an opaque text key → bytea, write-once), so module bytes ride Supabase backups/PITR (ADR-040) and survive a compute recycle. The FilesystemObjectStore is demoted to a cache tier: CachingObjectStore composes the filesystem (local-disk hot reads) over the durable Postgres backend. Writes go durable-first; a read miss falls through to Postgres and re-warms the cache. This is wired identically in every environment (the durable DSN is the same connection the rest of the platform uses), so behavior is uniform: a database reset clears bundles, and re-author/re-deploy rebuilds them. The “object-storage backend (R2)” once floated as the durable implementation is not adopted: the locked posture keeps Supabase as the single durable substrate (no second backup/DR surface). The cross-context direct read remains clean for the reason below: the store is a content-addressed key→bytes substrate, not a relational coupling, whether the backend is a filesystem path or a bytea row keyed by the same hash.
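
The tiering described above can be sketched as follows. This is a minimal illustration, not the actual CachingObjectStore: both tiers are in-memory dictionaries, and the names DictStore and CachingStoreSketch are hypothetical stand-ins for the filesystem and Postgres backends.

```python
from typing import Dict, Optional


class DictStore:
    """In-memory stand-in for one storage tier (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.reads = 0  # counts get() calls so the fall-through is observable

    def put(self, key: str, data: bytes) -> None:
        # Write-once: the first write wins; later writes to the key are ignored.
        self.objects.setdefault(key, data)

    def get(self, key: str) -> Optional[bytes]:
        self.reads += 1
        return self.objects.get(key)


class CachingStoreSketch:
    """Cache tier composed over a durable tier (assumed shape, not the real class)."""

    def __init__(self, cache: DictStore, durable: DictStore) -> None:
        self.cache = cache
        self.durable = durable

    def put(self, key: str, data: bytes) -> None:
        self.durable.put(key, data)  # durable-first: the write lands here before the cache
        self.cache.put(key, data)

    def get(self, key: str) -> Optional[bytes]:
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        data = self.durable.get(key)  # read miss falls through to the durable tier
        if data is not None:
            self.cache.put(key, data)  # re-warm the cache tier
        return data
```

Clearing the cache tier's dictionary simulates a compute recycle: the next read falls through to the durable tier once, and later reads of the same key are served from the re-warmed cache.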

Content-addressed storage makes the upload-atomicity concern (1) trivial: pods fetch by content hash, and the hash is known only after the bundle is fully built. There is no name under which a partial bundle could be exposed to readers — partial uploads exist under no readable key. Pods also cannot accidentally fetch a “wrong version” by name; the hash IS the identity.
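
A short sketch of why no partial bundle is ever readable: the key can only be computed from the complete bytes. The function bundle_key and the class WriteOnceStore are hypothetical names for illustration; only the modules/{sha256}.bundle layout comes from this ADR.

```python
import hashlib
from typing import Dict, Optional


def bundle_key(bundle_bytes: bytes) -> str:
    """Derive the object key from the sha256 of the fully built bundle."""
    digest = hashlib.sha256(bundle_bytes).hexdigest()
    return f"modules/{digest}.bundle"


class WriteOnceStore:
    """In-memory stand-in for a write-once key -> bytes store (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        existing = self._objects.get(key)
        if existing is not None and existing != data:
            # Immutability: the bytes at a key never change once written.
            raise ValueError(f"refusing to overwrite immutable object at {key}")
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)
```

Re-uploading identical bytes is a no-op under this sketch, while a changed bundle hashes to a different key and therefore lands as a new object.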

This is the special case where direct read access across context boundaries is structurally clean: the object store is a content-addressed substrate, not a state-holding database. A pod in spectral.platform fetches modules/{hash}.bundle by hash known from local projection state (D2); there is no schema or state coupling to spectral.worlds. The discipline of ADR-063 (no inter-context SQL grants) targets relational-database access patterns, not content-addressed artifact fetch.

D2 — Routing state owned by worlds; projected into platform via events

The state that resolves (org_id, domain, action, world_model_version) → content_hash and (org_id, domain) → active world_model_version is owned by spectral.worlds. Tables in the worlds schema hold the canonical records. The active version is owned by the world: (org, domain) → active world_model_version is the projection of a world-owned pointer under the 1:1 domain→world link (ADR-098); if that link relaxes (ADR-098 D5), the active-version routing keys gain a world_id selector.

spectral.platform does not read these tables directly. Per ADR-065 D2 + D4 and ADR-064 D3, platform consumes producer-typed events from worlds and projects into platform-local tables:

world_model_version_published — emitted by worlds when a world-model version’s module deployments complete (all enshrined actions for that version have bundles uploaded and content_hash recorded). Payload names (org_id, domain, world_model_version) and the action → content_hash list.
world_model_version_activated — emitted by worlds when the operator (via the operator approval flow per ADR-080 D3) activates a version for (org_id, domain). Payload names (org_id, domain, world_model_version); the prior active version is implicit (platform’s projection holds it).

Platform’s projection consumer materializes the events into platform-local tables (e.g., platform.module_routing for (org, domain, action, version) → content_hash; platform.world_model_active_version for (org, domain) → version). Decision-server pods read from these projections; no cross-context SQL grant.

Event payload modules live in worlds.contracts.events.world_model_version_published and worlds.contracts.events.world_model_version_activated per ADR-065 D2. Platform-side parsing follows the consumer-ACL pattern per ADR-065 D4.
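
The two payloads and the platform-side projection might look roughly like the sketch below. The field names follow the payload descriptions above, but the class names, the action_hashes field, and the in-memory RoutingProjection are assumptions for illustration; the real projection targets platform-local tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WorldModelVersionPublished:
    org_id: str
    domain: str
    world_model_version: str
    action_hashes: Dict[str, str]  # action -> content_hash (assumed field name)


@dataclass(frozen=True)
class WorldModelVersionActivated:
    org_id: str
    domain: str
    world_model_version: str


class RoutingProjection:
    """In-memory stand-in for the platform-local projection tables."""

    def __init__(self) -> None:
        # (org, domain, action, version) -> content_hash
        self.module_routing: Dict[Tuple[str, str, str, str], str] = {}
        # (org, domain) -> active version
        self.active_version: Dict[Tuple[str, str], str] = {}

    def apply_published(self, event: WorldModelVersionPublished) -> None:
        # Keyed upserts make redelivery of the same event harmless.
        for action, content_hash in event.action_hashes.items():
            key = (event.org_id, event.domain, action, event.world_model_version)
            self.module_routing[key] = content_hash

    def apply_activated(self, event: WorldModelVersionActivated) -> None:
        self.active_version[(event.org_id, event.domain)] = event.world_model_version

    def resolve(self, org_id: str, domain: str, action: str,
                version: Optional[str] = None) -> Optional[str]:
        """Return the content hash for a pinned version, else for the active one."""
        if version is None:
            version = self.active_version.get((org_id, domain))
        if version is None:
            return None
        return self.module_routing.get((org_id, domain, action, version))
```

Because both handlers are keyed upserts, applying the same event twice leaves the projection unchanged, which is the property at-least-once delivery requires of the consumer.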

D3 — Commit-then-signal upload sequence

The world agent’s upload sequence guarantees consistency through ordering:

1. Build. World agent produces the bundle (composition root, rules, manifest, attestation per ADR-080 D4).
2. Hash. Compute the bundle’s sha256.
3. Upload. Write the bundle bytes to the object store at modules/{sha256}.bundle. Wait for the object store’s durability acknowledgement (write-ack).
4. Persist deployment record. Insert the row in the worlds-side deployments table: (org, domain, action, world_model_version, content_hash, gate outcomes, build provenance).
5. Verify hash before signalling. Re-read the object store entry (optional but recommended at v0 to catch object-store inconsistency); verify the bytes still hash to the recorded content_hash.
6. Signal. When all enshrined actions in a world-model version have completed steps 1–5, the worlds-side publisher emits world_model_version_published via the outbox (per ADR-044). Platform consumes and projects.
7. Activation. When the operator activates the version (per ADR-080 D3 approval flow), the worlds-side activation step updates the active-version pointer and emits world_model_version_activated via the outbox.

The “commit-then-signal” property: the bundle is durable in the object store AND the deployment record is committed in the worlds DB BEFORE any event is emitted. Platform pods never receive a signal about a module they cannot then fetch. The outbox guarantees at-least-once event delivery per ADR-044; idempotent consumption per ADR-065 handler-name conventions handles duplicates.
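
The ordering can be sketched in a few lines, assuming in-memory stand-ins: a dictionary for the object store, a dictionary for the deployments table, and a list for the outbox. The function names deploy_action and publish_if_complete are hypothetical, and the build step is represented only by the bundle bytes passed in.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

DeploymentKey = Tuple[str, str, str, str]  # (org, domain, action, version)


def deploy_action(store: Dict[str, bytes],
                  deployments: Dict[DeploymentKey, str],
                  org: str, domain: str, action: str, version: str,
                  bundle: bytes) -> str:
    """Hash, upload, persist, and verify one already-built bundle."""
    content_hash = hashlib.sha256(bundle).hexdigest()            # hash
    key = f"modules/{content_hash}.bundle"
    store[key] = bundle                                          # upload (dict write stands in for write-ack)
    deployments[(org, domain, action, version)] = content_hash   # persist deployment record
    if hashlib.sha256(store[key]).hexdigest() != content_hash:   # verify before signalling
        raise RuntimeError(f"object store returned different bytes for {key}")
    return content_hash


def publish_if_complete(deployments: Dict[DeploymentKey, str],
                        outbox: List[dict],
                        org: str, domain: str, version: str,
                        enshrined_actions: Sequence[str]) -> bool:
    """Emit the published event only once every enshrined action has a record."""
    hashes: Dict[str, str] = {}
    for action in enshrined_actions:
        content_hash = deployments.get((org, domain, action, version))
        if content_hash is None:
            return False  # not every action is committed yet: no signal
        hashes[action] = content_hash
    outbox.append({
        "type": "world_model_version_published",
        "org_id": org,
        "domain": domain,
        "world_model_version": version,
        "action_hashes": hashes,
    })
    return True
```

The signal is derived entirely from committed deployment records, so an event can never name a content hash whose bytes were not already written to the store.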

D4 — Cache model in `api` pod replicas

api pods cache loaded modules by (org_id, domain, action, world_model_version) → loaded module + content_hash. Each pod holds its own in-process cache; no inter-pod cache coordination.

Two cache layers operate independently:

Module cache — (org, domain, action, version) → loaded module. Populated on first request (cache miss → fetch from object store via content hash from local projection → verify hash + approval per ADR-080 → load). Eviction is bounded by aggregate cardinality: modules deploy per (org, domain, action) per ADR-076 D1, and the resulting population (on the order of a few thousand modules at 100-org scale) fits comfortably in process memory. v0 uses a simple LRU with a configurable cap. No active eviction on version activation; modules from prior versions remain in cache and serve pinned-version requests until evicted naturally. This in-process cache sits above the object store’s own filesystem cache tier (D1): a process-cold module-cache miss reads the filesystem cache, which on its own miss (e.g. just after a compute recycle wiped local disk) falls through to the durable Postgres backend and re-warms the filesystem, so the durable backend is reached at most once per object per process.
Active-version resolution cache — (org, domain) → active world_model_version. Populated from the platform-local projection. Invalidated on receipt of a world_model_version_activated event for the relevant (org, domain). After invalidation, the next request resolves fresh from the projection.

The separation matters: version activation flips routing for new requests but does not need to evict loaded modules. Pinned-version requests (world_model_version in the request body per ADR-077 D2) continue to serve from any cached version they hit.
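
A minimal sketch of the two independent caches, assuming an OrderedDict-based LRU and a lookup callable standing in for the platform-local projection. The class names ModuleCache and ActiveVersionCache and the on_activated hook are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Optional, Tuple


class ModuleCache:
    """LRU cache of loaded modules with a configurable cap."""

    def __init__(self, cap: int) -> None:
        self._cap = cap
        self._entries: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: Hashable, module: object) -> None:
        self._entries[key] = module
        self._entries.move_to_end(key)
        while len(self._entries) > self._cap:
            self._entries.popitem(last=False)  # evict the least recently used entry


class ActiveVersionCache:
    """Caches (org, domain) -> active version until an activation event arrives."""

    def __init__(self, lookup: Callable[[str, str], Optional[str]]) -> None:
        self._lookup = lookup  # reads the platform-local projection
        self._resolved: Dict[Tuple[str, str], Optional[str]] = {}

    def resolve(self, org: str, domain: str) -> Optional[str]:
        key = (org, domain)
        if key not in self._resolved:
            self._resolved[key] = self._lookup(org, domain)
        return self._resolved[key]

    def on_activated(self, org: str, domain: str) -> None:
        # Invalidate only; the next request resolves fresh from the projection.
        self._resolved.pop((org, domain), None)
```

Note that on_activated never touches ModuleCache: activation changes which version new requests resolve to, while modules already loaded for prior versions stay available to pinned-version requests.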

D5 — Honor-or-refuse semantics

When a pod fetches a module by content hash:

Object exists and its bytes match the hash: load and serve.
Object exists but its bytes do NOT match the hash: integrity failure per ADR-080 D2. Log, alert, and fail the request with 502/unavailable. Treat the cached projection as suspect and trigger a re-fetch of the projection (the (org, domain, action, version) → content_hash mapping) from the worlds-side events.
Object does not exist: refuse — fail the request with 502/unavailable. Never substitute another version or another module. Log + alert; this is a system unavailability signal, not a graceful degradation case. Triggers operator investigation (the projection believes a module should exist that the store does not have).

There is no fallback to a “close enough” module. The decision contract is binding; serving a wrong module would violate the contract more severely than failing the request.
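
The three cases reduce to a fetch that either returns verified bytes or raises. The sketch below assumes a mapping as the object store; ModuleUnavailable and fetch_module are hypothetical names, and logging, alerting, and the projection re-fetch are left out.

```python
import hashlib
from typing import Mapping


class ModuleUnavailable(Exception):
    """Raised instead of serving; a caller would map this to 502/unavailable."""

    def __init__(self, reason: str, content_hash: str) -> None:
        super().__init__(f"{reason}: {content_hash}")
        self.reason = reason
        self.content_hash = content_hash


def fetch_module(store: Mapping[str, bytes], content_hash: str) -> bytes:
    """Return the exact requested bytes, or refuse. Never substitutes another object."""
    data = store.get(f"modules/{content_hash}.bundle")
    if data is None:
        # The projection names a module the store does not have.
        raise ModuleUnavailable("missing", content_hash)
    if hashlib.sha256(data).hexdigest() != content_hash:
        # Stored bytes do not hash to the requested identity.
        raise ModuleUnavailable("integrity_failure", content_hash)
    return data
```

There is deliberately no branch that looks up a different hash or version: the only successful outcome is the bytes whose sha256 equals the requested content hash.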

Alternatives considered

Direct cross-context SQL access from api pods to worlds deployments table. Rejected. Violates ADR-063’s no-inter-context-SQL-grants posture. Forces platform to track worlds’ schema changes; couples deployment cadences; loses the consumer-projection pattern that ADR-065 D4 + ADR-064 establish for inter-context state.

Polling-based cache invalidation. Rejected. Adds polling cost across the fleet (every pod polls the projection on a timer); invalidation latency is bounded by the poll interval (typically tens of seconds at best); the LISTEN/NOTIFY substrate already exists per ADR-044 and gives sub-second invalidation.

TTL-based cache without event invalidation. Rejected for active-version resolution (TTLs that are short enough to be safe are short enough to defeat caching; TTLs that are long enough to be useful introduce staleness during activation windows). The module cache itself can use TTL-style eviction (LRU is the simpler v0 mechanism) because modules are content-addressed and immutable.

Non-content-addressed storage with named versioning (e.g., modules/{org}/{domain}/{action}/v{N}.bundle). Rejected. Reintroduces the half-uploaded-bytes problem (a reader could fetch a partially-written object at the named key). Forces atomicity discipline at the store layer rather than letting content-addressing handle it structurally. Loses the ADR-080 D1 hash-as-identity convention.

Coordinated cache eviction across pods on version activation (e.g., all pods evict simultaneously when an event arrives). Rejected. Each pod’s cache is independent; staggered eviction across the fleet is acceptable (a pod that hasn’t processed the event yet serves the prior active version for at most the event-delivery latency, which is bounded). Coordinating eviction adds complexity without buying customer-visible benefit — pinned-version requests handle the explicit-version-needed case.

Module store on a different substrate (e.g., Postgres bytea, persistent disk, NFS). Rejected for v0. The core.storage object store (ADR-099) is the standard substrate for immutable artifact storage; content-addressed access is well-supported; durability semantics are well-understood. Postgres bytea bloats the database; persistent disk doesn’t replicate; NFS adds operational complexity.

Consequences

The module store substrate is the core.storage object store (ADR-099); the worlds-side consumer addresses bundles by content hash. Operational specifics — key layout, retention policy, replication — are implementation-epic concerns.
Worlds-side tables hold the canonical routing state: worlds.module_deployments (illustrative name) for (org, domain, action, version) → content_hash; worlds.world_model_active_version for (org, domain) → version.
Platform-side projections materialize from worlds events: platform.module_routing and platform.world_model_active_version_view (or equivalent naming when the implementation epic lands). Per ADR-065 D4 consumer-ACL pattern.
Two new producer-typed event payloads land in worlds.contracts.events.*: world_model_version_published and world_model_version_activated. Bilateral contract tests per ADR-066 cover platform-side parsing.
The world-agent upload pipeline (per ADR-081 D4) gains the commit-then-signal sequence as a structural property. Implementation lands in Phase 4 build-plan work.
api pod cache invalidation rides the existing LISTEN/NOTIFY substrate per ADR-044. Pods subscribe to the platform-projected world_model_version_activated event stream; cache invalidation is per-pod-in-process.
Honor-or-refuse semantics (D5) integrate with ADR-080 D2’s hash-check-at-load failure handling — both produce 502/unavailable rather than a degraded decision; the audit chain records the failure for operator investigation.
The module store consistency model is settled by this ADR and the concern is closed; the related execution sandbox (A1 → ADR-083), noisy-neighbor handling (A2 → ADR-084), and module integrity (3d.1 → ADR-080) concerns continue to apply atop this consistency model without modification.
