Decisions

ADR-040: Disaster recovery — Supabase-native managed backups + PITR, on a regenerable alpha

Context

Spectral runs on a single Supabase project (Postgres + pgvector; schemas core, worlds, platform). Solo-builder, pre-revenue alpha. The architecture is regenerability-dominant: the alpha database (the dogfood world, seed data) is reproducible from cold_start_seed + the authoring harness, and the tax-prep world is a dev/demo asset — no irreplaceable data lives in the system until a design partner co-implements their domain. DR investment tracks that threshold.

Decision

D1 — DR is Supabase-native managed backups, not a self-run pipeline

Disaster recovery relies on Supabase’s own managed backups plus point-in-time recovery (PITR). There is no self-run pg_dump/encrypt/object-store backup job. The reasoning: a backup only counts as a real backup if it runs off the serving platform. Supabase’s managed backups satisfy that on a separate control plane, whereas a self-run pipeline hosted on the compute platform shares fate with what it protects and adds operational overhead the regenerable alpha doesn’t need.

D2 — Tier by data value

Pre-customer alpha (now): the DB is regenerable (cold_start_seed + the authoring harness), so Supabase daily snapshots are sufficient; there is no irreplaceable data to protect.
Before customer launch / first partner data: adopt Supabase Pro — managed daily backups + PITR (~2-minute RPO within the retention window). Partner rule corpora are load-bearing and NOT regenerable (there is no synthetic eval path for partner domains). Adopting Pro also unblocks a persistent staging branch (Free has no branching).

D3 — Partner-corpus git export (reserved)

The first design-partner corpus is load-bearing and not regenerable. A spectral db export-rules contract is reserved now (a resolver stub) and implemented alongside the first partner-corpus migration, giving a provider-independent recovery path: replay the git serialization against a fresh project.
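To make the "git serialization" idea concrete, here is a minimal sketch of a deterministic export and replay. It is not the reserved export-rules contract itself: the rule record shape (the `id` and `world` fields), the `rules/<world>/<id>.json` layout, and the function names are all assumptions made for illustration. The point it shows is that sorted keys and a stable file order give byte-identical output for identical data, so git diffs reflect real rule changes only.

```python
import json


def serialize_rule(rule: dict) -> str:
    """Canonical JSON: sorted keys and a trailing newline for stable git diffs."""
    return json.dumps(rule, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def export_tree(rules: list) -> dict:
    """Map a relative file path to file content, one file per rule.

    The `world` and `id` fields are hypothetical; the real corpus schema
    is not defined in this ADR.
    """
    tree = {}
    for rule in rules:
        path = f"rules/{rule['world']}/{rule['id']}.json"
        if path in tree:
            raise ValueError(f"duplicate rule path: {path}")
        tree[path] = serialize_rule(rule)
    # Sort by path so the export order does not depend on query order.
    return dict(sorted(tree.items()))


def replay(tree: dict) -> list:
    """Parse an exported tree back into rule records, in path order."""
    return [json.loads(text) for _, text in sorted(tree.items())]
```

In a real implementation the exported tree would be written to a git working copy and `replay` would feed inserts against a fresh project; both steps are omitted here because the contract is still a stub.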

D4 — RPO/RTO targets and drill cadence

Pre-PITR alpha: RPO ~24 h (daily snapshot), RTO ~60 min (restore + verify).
Post-PITR: RPO ~2 min, RTO ~60 min.
Quarterly functional restore drill — restore the latest managed backup to a throwaway project, schema-checksum + row-count spot-check, tear down within the billing hour. Mandatory drill after any multi-schema migration.
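The drill's two checks can be sketched as pure functions over data already fetched from the source and the restored project. This is an illustrative sketch, not the runbook's actual tooling: the function names and the optional `tolerance` parameter are assumptions, and the column tuples are assumed to come from a query over `information_schema.columns` (`table_schema`, `table_name`, `column_name`, `data_type`).

```python
from __future__ import annotations

import hashlib
from typing import Iterable, Mapping, Optional


def schema_checksum(columns: Iterable[tuple[str, str, str, str]]) -> str:
    """SHA-256 over (schema, table, column, data_type) rows in sorted order."""
    digest = hashlib.sha256()
    for row in sorted(columns):
        digest.update("|".join(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def row_count_mismatches(
    expected: Mapping[str, int],
    restored: Mapping[str, int],
    tolerance: float = 0.0,
) -> dict[str, tuple[int, Optional[int]]]:
    """Return tables whose restored row count is missing or out of tolerance.

    `tolerance` is a fraction of the expected count; it is a hypothetical
    knob for comparing against a source that kept changing after the
    backup was taken.
    """
    mismatches: dict[str, tuple[int, Optional[int]]] = {}
    for table, want in expected.items():
        got = restored.get(table)
        if got is None or abs(got - want) > tolerance * max(want, 1):
            mismatches[table] = (want, got)
    return mismatches
```

A drill passes when the two checksums are equal and `row_count_mismatches` returns an empty dict for the spot-checked tables.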

D5 — PITR activation triggers

Activate Pro + PITR when any of the following fires (customer launch is the floor: adopt before then regardless): the first design-partner co-implementation persists partner data; a compromised-credential incident or near-miss occurs; daily change stays above 10% of DB size for more than 14 days; or the first failure that PITR would have covered actually occurs.
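The trigger list can be expressed as a small check, sketched below under stated assumptions. The ADR does not say how daily change is measured, so the per-day ratio (changed bytes divided by DB size) is an assumed input, and "for > 14 days" is read as more than 14 consecutive days. All names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Sequence

CHURN_THRESHOLD = 0.10  # daily change as a fraction of DB size
SUSTAINED_DAYS = 14     # must be exceeded on MORE than this many consecutive days


@dataclass(frozen=True)
class TriggerInputs:
    partner_data_persisted: bool = False
    credential_incident: bool = False   # includes near-misses
    pitr_covered_failure: bool = False
    # Oldest first; how the ratio is measured is an assumption of this sketch.
    daily_change_ratio: Sequence[float] = field(default_factory=tuple)


def sustained_churn(
    ratios: Sequence[float],
    threshold: float = CHURN_THRESHOLD,
    days: int = SUSTAINED_DAYS,
) -> bool:
    """True if the ratio stayed above threshold for more than `days` days in a row."""
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio > threshold else 0
        if run > days:
            return True
    return False


def fired_triggers(inputs: TriggerInputs) -> list[str]:
    """Names of the activation triggers that currently hold; any one is enough."""
    fired = []
    if inputs.partner_data_persisted:
        fired.append("partner_data_persisted")
    if inputs.credential_incident:
        fired.append("credential_incident")
    if sustained_churn(inputs.daily_change_ratio):
        fired.append("sustained_churn")
    if inputs.pitr_covered_failure:
        fired.append("pitr_covered_failure")
    return fired
```

A non-empty result from `fired_triggers` means the PITR activation playbook applies. Customer launch remains the floor regardless of what this check returns.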

Alternatives considered

A self-run nightly pg_dump → encrypted object store. Rejected: it runs on/near the serving platform (shares fate), duplicates what Supabase’s managed backups already provide off-platform, and is operational overhead the regenerable alpha doesn’t need.
PITR on day one (~$105/month). Rejected for the pre-partner alpha: the high-probability failures (a bad migration, an accidental DELETE) cost the solo builder a day of regenerable work, not customer data. Triggered by the data-value threshold instead.
Team tier ($599/month). Rejected: it buys SOC 2 paperwork that is not yet needed, not DR capability.

Consequences

DR is a Supabase configuration plus a runbook, not a backup pipeline to own. docs/runbooks/disaster-recovery.md is the operational contract (restore playbooks by failure scenario, the quarterly drill checklist, the PITR activation playbook, and the partner-onboarding DR step).
No self-run backup workflow, no second object-store surface, and no backup credentials in the deploy.
The decision-module store inherits this coverage: its durable backend is Postgres (core.object_store bytea, ADR-085 D1), so module bundles ride the same managed backups + PITR as the rest of the database. This is also why the module store’s durable backend is Postgres rather than R2 — an R2 backend would reintroduce exactly the “second object-store surface” (with its own backup/DR story) this posture rejects.
core.users recovery is via user re-invite (the auth.users primary is Supabase-managed; the mirror rebuilds from invite acceptance), documented in the runbook.
