Data Retention
Retention in Spectral is a reference-graph problem, not a time problem. Traces cited by applied change sets cannot be deleted by age — that breaks auditability, regression baselines, and signal replay. The model is state-based, with state derived via views from the reference graph plus deleted_at. Decision lineage in ADR-042.
The four states
spectral.core.retention.states.RetentionState:
- ACTIVE — default state on ingestion. Governed by baseline TTL per RetentionPolicy; None opts out.
- REFERENCED — cited by one or more live downstream artifacts (applied change sets, active rules, unexpired windows). Retained indefinitely while referenced.
- ORPHANED — integrity anomaly: a row was REFERENCED but lost its upstream sponsor in a way schema discipline should have prevented. Detection runs as a reconciliation job; any non-zero result is an operator alert, never auto-cleanup.
- TOMBSTONED — soft-deleted via deleted_at, in the grace window pending hard-delete cascade. Cascades transitively at state-computation time.
Pattern lineage: generational-GC weak hypothesis (Bacon/Cheng/Rajan; Oracle HotSpot) plus eDiscovery legal-hold preservation; Cassandra tombstones; Snowflake Time Travel.
Derived state via views
No retention_state column anywhere. Per-context SQL views compute state from the reference graph plus deleted_at at query time. Drift-free by construction.
The one stored bit: deleted_at timestamptz NULL on every workspace-scoped table. Migration convention.
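A minimal sketch of the derived-state idea, using SQLite from Python's standard library as a stand-in for the production database. The table names (trace, trace_reference), the live flag, and the view name are hypothetical; only deleted_at and the state names come from the text above. The precedence (a tombstone wins over a live reference) is an assumption, and ORPHANED detection and TTL expiry are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; only deleted_at mirrors the convention described above.
    CREATE TABLE trace (
        id INTEGER PRIMARY KEY,
        deleted_at TEXT NULL            -- the one stored bit
    );
    CREATE TABLE trace_reference (
        trace_id INTEGER NOT NULL REFERENCES trace(id),
        source TEXT NOT NULL,           -- the citing artifact, e.g. a change set
        live INTEGER NOT NULL           -- 1 while the citing artifact is live
    );

    -- State is computed at query time; no state column is ever written.
    CREATE VIEW trace_retention_state AS
    SELECT t.id,
           CASE
               WHEN t.deleted_at IS NOT NULL THEN 'TOMBSTONED'
               WHEN EXISTS (SELECT 1 FROM trace_reference r
                            WHERE r.trace_id = t.id AND r.live = 1) THEN 'REFERENCED'
               ELSE 'ACTIVE'
           END AS state
    FROM trace t;
""")

conn.executemany(
    "INSERT INTO trace (id, deleted_at) VALUES (?, ?)",
    [(1, None), (2, None), (3, "2024-01-01T00:00:00Z")],
)
conn.execute("INSERT INTO trace_reference VALUES (2, 'change_set:hypothetical', 1)")

# Each row's state falls out of the view; flipping `live` or setting
# deleted_at changes the answer on the next read without any backfill.
states = dict(conn.execute("SELECT id, state FROM trace_retention_state"))
print(states)
```

Because the view reads the reference rows directly, there is no second copy of the state that could disagree with the graph.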
RetentionPolicy
spectral.core.retention.policy.RetentionPolicy (frozen pydantic):
- active_ttl_days: int | None (None = opt out)
- tombstoned_grace_days: int
- disposal: DisposalPosture ∈ {HARD_DELETE, STRIP_PAYLOAD, RETAIN_METADATA}
DEFAULT_POLICY = (active_ttl_days=365, tombstoned_grace_days=30, disposal=HARD_DELETE). STRIP_PAYLOAD is enabled per registry entry when the (entity_type, content_class) pair calls for content stripping rather than full deletion (e.g., design-partner contracts with Safeguards flow-down — payload is removed while structural metadata and evaluation history persist).
POLICY_REGISTRY keys on (entity_type, content_class). RetentionPolicy.resolve(entity_type, content_class) returns a registered entry or falls back to DEFAULT_POLICY. Registry entries land alongside the enforcement code that consumes them.
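The resolve-with-fallback behavior can be sketched as follows. This is an illustration, not the module's actual code: a frozen dataclass stands in for the frozen pydantic model so the example has no third-party dependency, the enum values are made up, and the single registry entry (including its key) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DisposalPosture(Enum):
    HARD_DELETE = "hard_delete"
    STRIP_PAYLOAD = "strip_payload"
    RETAIN_METADATA = "retain_metadata"


@dataclass(frozen=True)  # stand-in for the frozen pydantic model
class RetentionPolicy:
    active_ttl_days: Optional[int]  # None = opt out of TTL expiry
    tombstoned_grace_days: int
    disposal: DisposalPosture

    @classmethod
    def resolve(cls, entity_type: str, content_class: str) -> "RetentionPolicy":
        # Registered entry if present, otherwise the default policy.
        return POLICY_REGISTRY.get((entity_type, content_class), DEFAULT_POLICY)


DEFAULT_POLICY = RetentionPolicy(
    active_ttl_days=365,
    tombstoned_grace_days=30,
    disposal=DisposalPosture.HARD_DELETE,
)

# Hypothetical entry for illustration; the real catalog is the Retention Registry.
POLICY_REGISTRY = {
    ("trace", "partner_contract"): RetentionPolicy(
        active_ttl_days=365,
        tombstoned_grace_days=30,
        disposal=DisposalPosture.STRIP_PAYLOAD,
    ),
}

print(RetentionPolicy.resolve("trace", "partner_contract").disposal)
print(RetentionPolicy.resolve("trace", "unregistered") is DEFAULT_POLICY)
```

Keying on the pair rather than on entity_type alone lets one entity type carry different disposal postures for different content classes.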
For the per-(entity_type, content_class) catalog of registry entries — event-substrate
records, optimization-engine records, ChangeSet family, Spectral-Agent conversational
records, worlds-side operator-action records, and per-agent memory + consumer-state
configurations — see Retention Registry.
Two notes the registry alone doesn’t make obvious:
- Contract-test invariant on the event-substrate pair. A contract test pins event_handled.active_ttl > outbox.active_ttl + outbox.grace. The dedup discipline depends on the inequality holding regardless of how either policy evolves; tightening one without the other breaks dedup correctness across the full outbox window.
- Records vs memory. The platform-side workspace-scoped records (ChangeSet family + Spectral-Agent conversational records — see the registry for the full list) are domain records with their own per-(entity, content_class) policies — not agent memory. They follow the four-state lifecycle here. The universal three-tier agent memory schema in ADR-058 governs the agent-memory rows on the registry separately.
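The inequality in the first note can be pinned by a test along these lines. This is a sketch only: the TtlPair type, the helper name, and the day counts are hypothetical, and the real values live in the Retention Registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtlPair:
    active_ttl_days: int
    grace_days: int


# Hypothetical values chosen only to make the example concrete.
OUTBOX = TtlPair(active_ttl_days=7, grace_days=3)
EVENT_HANDLED = TtlPair(active_ttl_days=14, grace_days=3)


def dedup_window_is_covered(event_handled: TtlPair, outbox: TtlPair) -> bool:
    # A handled-event record has to outlive the full outbox window
    # (TTL plus grace) so dedup lookups stay valid for that whole window.
    return event_handled.active_ttl_days > outbox.active_ttl_days + outbox.grace_days


def test_event_handled_outlives_outbox_window() -> None:
    assert dedup_window_is_covered(EVENT_HANDLED, OUTBOX)


test_event_handled_outlives_outbox_window()
```

The comparison is strict, matching the > in the note: an event_handled TTL exactly equal to the outbox window fails the check.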
REFERENCED ⇌ ACTIVE transition
Uses last_referenced_at + grace. When a row loses its last reference, ACTIVE TTL countdown restarts from the de-reference moment + grace, not from original creation. Prevents premature expiry of historically-referenced rows.
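The restart rule can be sketched as a small date calculation. The function name and the dereference_grace_days parameter are hypothetical, and the sketch assumes last_referenced_at records the moment the last reference went away; the text does not say which grace value applies, so it is passed in explicitly.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def active_expiry(
    created_at: datetime,
    last_referenced_at: Optional[datetime],
    active_ttl_days: Optional[int],
    dereference_grace_days: int,
) -> Optional[datetime]:
    """Return when an ACTIVE row's TTL runs out, or None if the policy opts out."""
    if active_ttl_days is None:
        return None
    ttl = timedelta(days=active_ttl_days)
    if last_referenced_at is None:
        # Never referenced: the countdown runs from creation.
        return created_at + ttl
    # Previously referenced: the countdown restarts at de-reference + grace.
    return last_referenced_at + timedelta(days=dereference_grace_days) + ttl


created = datetime(2023, 1, 1, tzinfo=timezone.utc)
dereferenced = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Counting from creation would have expired this row on 2024-01-01,
# months before it lost its last reference.
print(active_expiry(created, dereferenced, 365, 30))
```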
TOMBSTONED cascade
Cascades transitively at state-computation time — ancestor tombstone implies descendant effective-tombstone in the view logic. Hard-delete cascade runs at grace expiry (operator-scripted today; template documented).
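The transitive rule can be illustrated outside SQL with an ancestor walk. This is a sketch under simplifying assumptions: each row has at most one parent, and the row identifiers and the parent_of and deleted_at mappings are hypothetical stand-ins for what the views read from the reference graph.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def is_effectively_tombstoned(
    row_id: str,
    parent_of: Dict[str, Optional[str]],
    deleted_at: Dict[str, Optional[datetime]],
) -> bool:
    """True if the row itself or any ancestor carries a deleted_at timestamp."""
    seen = set()
    current: Optional[str] = row_id
    while current is not None and current not in seen:
        if deleted_at.get(current) is not None:
            return True
        seen.add(current)  # guards against accidental cycles
        current = parent_of.get(current)
    return False


# workspace -> change_set -> trace; only the workspace is soft-deleted.
parent_of = {"trace:1": "change_set:1", "change_set:1": "workspace:1", "workspace:1": None}
deleted_at = {"workspace:1": datetime(2024, 1, 1, tzinfo=timezone.utc)}

print(is_effectively_tombstoned("trace:1", parent_of, deleted_at))
```

Nothing is written to the descendants: their own deleted_at stays NULL, and the effective state is recomputed on every read.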
Audit posture
The “no hard DELETE” rule applies inherently to persistent-tier memory and to any-tier memory load-bearing for audit (action-linkage). Action-linkage examples: rule promotion, change-set acceptance, published decision. Transient-tier memory without action-linkage can be removed when its scope ends.
Per agent memory primitives, transient tiers (interaction, session) inherit lifecycle from their enclosing scope (chat thread, operator session) — they do not register a literal TTL with the retention framework.
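Read as a predicate, the rule above might look like the following sketch. The enum, its member names, and the function name are assumptions made for illustration; the text names the interaction and session tiers as transient and refers to a persistent tier, but does not define a type for them.

```python
from enum import Enum


class MemoryTier(Enum):
    INTERACTION = "interaction"  # transient
    SESSION = "session"          # transient
    PERSISTENT = "persistent"


TRANSIENT_TIERS = {MemoryTier.INTERACTION, MemoryTier.SESSION}


def may_remove_at_scope_end(tier: MemoryTier, action_linked: bool) -> bool:
    # Only transient-tier memory with no action-linkage is removable;
    # persistent-tier and audit-relevant rows are never hard-deleted.
    return tier in TRANSIENT_TIERS and not action_linked


print(may_remove_at_scope_end(MemoryTier.SESSION, action_linked=False))
print(may_remove_at_scope_end(MemoryTier.SESSION, action_linked=True))
```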
Same-context inheritance only
Operations or worlds data must NOT persistently reference platform data. Operations tooling queries platform data at read-time but does not create persistent cross-schema reference links. Customer-offboarding cascade therefore propagates through the platform reference graph only; OPERATIONS and worlds data are unaffected by design.
Late-corruption floor
docs/runbooks/disaster-recovery.md codifies the floor: as long as every workspace-scoped entity’s active_ttl_days remains ≥ 30 days, the nightly pg_dump covers the late-corruption recovery window. If any PLATFORM TTL tightens below 30 days, the nightly-backup lifecycle rule extends to match (active_ttl + tombstoned_grace).
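A check for the floor could be sketched as below. The function name, the constant name, and the sample entity names and TTLs are hypothetical. The sketch only flags entities under the floor; it does not model how the backup lifecycle rule is extended.

```python
from typing import Dict, List, Optional

LATE_CORRUPTION_FLOOR_DAYS = 30


def entities_below_floor(
    active_ttl_days_by_entity: Dict[str, Optional[int]],
    floor_days: int = LATE_CORRUPTION_FLOOR_DAYS,
) -> List[str]:
    """Return entity types whose active TTL has dropped below the floor."""
    return sorted(
        name
        for name, ttl in active_ttl_days_by_entity.items()
        if ttl is not None and ttl < floor_days  # None = opted out of TTL expiry
    )


# Hypothetical TTLs; a non-empty result means the backup rule needs review.
print(entities_below_floor({"trace": 365, "scratch": 7, "pinned": None}))
```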
See also
- ADR-042 — decision lineage
- Event substrate — outbox + event_handled retention
- Agent memory primitives — Primitive 6 (lifecycle)
- docs/runbooks/disaster-recovery.md — late-corruption-floor procedure
- Architecture — schema topology