ADR-040: Disaster recovery — Supabase-native managed backups + PITR, on a regenerable alpha
Context
Spectral runs on a single Supabase project (Postgres + pgvector; schemas core, worlds, platform). Solo-builder, pre-revenue alpha. The architecture is regenerability-dominant: the alpha database (the dogfood world, seed data) is reproducible from cold_start_seed + the authoring harness, and the tax-prep world is a dev/demo asset — no irreplaceable data lives in the system until a design partner co-implements their domain. DR investment tracks that threshold.
Decision
D1 — DR is Supabase-native managed backups, not a self-run pipeline
Disaster recovery is Supabase’s own managed backups + point-in-time recovery. There is no self-run pg_dump/encrypt/object-store backup job. The reasoning: a backup must run off the serving platform to be a real backup — Supabase’s managed backups satisfy that on a separate control plane, whereas a self-run pipeline hosted on the compute platform shares fate with what it protects and is operational overhead the regenerable alpha doesn’t need.
D2 — Tier by data value
- Pre-customer alpha (now): the DB is regenerable (cold_start_seed + the authoring harness), so Supabase daily snapshots are sufficient; there is no irreplaceable data to protect.
- Before customer launch / first partner data: adopt Supabase Pro, which provides managed daily backups + PITR (~2-minute RPO within the retention window). Partner rule corpora are load-bearing and NOT regenerable (there is no synthetic eval path for partner domains). Adopting Pro also unblocks a persistent staging branch (Free has no branching).
D3 — Partner-corpus git export (reserved)
The first design-partner corpus is load-bearing and not regenerable. A spectral db export-rules contract is reserved now (a resolver stub) and implemented alongside the first partner-corpus migration, giving a provider-independent recovery path: replay the git serialization against a fresh project.
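A provider-independent replay only works if the git serialization is deterministic, so that the same corpus always produces the same bytes and diffs stay reviewable. The sketch below is one possible shape for that serialization, not the reserved contract itself: the function names, the JSON Lines layout, and the rule_id key are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Iterable, Mapping


def serialize_rules(rules: Iterable[Mapping[str, object]], key: str = "rule_id") -> str:
    """Render rule rows as JSON Lines, sorted by `key`, with sorted object keys.

    Stable row and key ordering makes the export byte-for-byte reproducible.
    `rule_id` is a hypothetical primary-key name.
    """
    ordered = sorted(rules, key=lambda row: str(row[key]))
    lines = [
        json.dumps(dict(row), sort_keys=True, ensure_ascii=False, separators=(",", ":"))
        for row in ordered
    ]
    return "\n".join(lines) + ("\n" if lines else "")


def parse_rules(text: str) -> list[dict]:
    """Inverse of serialize_rules: read the JSON Lines back into row dicts for replay."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def export_digest(text: str) -> str:
    """SHA-256 of the serialized export, usable as an integrity check after replay."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Comparing export_digest of the committed file against a fresh export from the restored project is one cheap way to confirm that a replay reproduced the corpus.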
D4 — RPO/RTO targets and drill cadence
- Pre-PITR alpha: RPO ~24 h (daily snapshot), RTO ~60 min (restore + verify).
- Post-PITR: RPO ~2 min, RTO ~60 min.
- Quarterly functional restore drill — restore the latest managed backup to a throwaway project, schema-checksum + row-count spot-check, tear down within the billing hour. Mandatory drill after any multi-schema migration.
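The drill's verification step (schema checksum plus row-count spot-check) can be sketched as pure functions over data already pulled from the source and restored projects. This is a minimal illustration under stated assumptions: the function names are hypothetical, the column tuples are assumed to come from a query such as one over information_schema.columns, and the tolerance parameter is an addition to allow for writes that land on the source after the backup was taken.

```python
import hashlib
from typing import Iterable, Mapping

Column = tuple[str, str, str, str]  # (schema, table, column, data_type)


def schema_checksum(columns: Iterable[Column]) -> str:
    """Hash the column tuples in sorted order so query ordering does not matter."""
    digest = hashlib.sha256()
    for row in sorted(columns):
        digest.update("\x1f".join(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def row_count_mismatches(
    source: Mapping[str, int], restored: Mapping[str, int], tolerance: float = 0.0
) -> list[str]:
    """List tables whose restored row count deviates from the source count.

    `tolerance` is a fraction of the source count; a table missing from
    `restored` is treated as having zero rows.
    """
    problems = []
    for table, expected in sorted(source.items()):
        actual = restored.get(table, 0)
        if abs(actual - expected) > expected * tolerance:
            problems.append(f"{table}: expected {expected}, restored {actual}")
    return problems


def drill_passes(
    source_columns: Iterable[Column],
    restored_columns: Iterable[Column],
    source_counts: Mapping[str, int],
    restored_counts: Mapping[str, int],
    tolerance: float = 0.0,
) -> bool:
    """A drill passes when the schemas hash identically and no table is out of tolerance."""
    same_schema = schema_checksum(source_columns) == schema_checksum(restored_columns)
    return same_schema and not row_count_mismatches(source_counts, restored_counts, tolerance)
```

Keeping the checks free of database calls means the same functions can run against a recorded drill log as well as a live throwaway project.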
D5 — PITR activation triggers
Activate Pro + PITR when any of the following triggers fires (customer launch is the floor: adopt before then regardless): the first design-partner co-implementation that persists partner data; a compromised-credential incident or near-miss; sustained daily change > 10% of DB size for > 14 days; the first failure that PITR would have covered actually occurring.
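The trigger list can be written down as a small check, which makes the churn threshold unambiguous. The sketch below is illustrative only: the TriggerState fields and function names are hypothetical, and "sustained" is read here as more than 14 consecutive days above the 10% threshold, which is an interpretation rather than something the ADR states.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TriggerState:
    """Manually maintained flags for the non-numeric triggers (hypothetical names)."""
    partner_data_persisted: bool = False
    credential_incident_or_near_miss: bool = False
    pitr_covered_failure_hit: bool = False


def sustained_churn(
    daily_change_ratio: Sequence[float], threshold: float = 0.10, min_days: int = 14
) -> bool:
    """True if the daily change ratio stayed above `threshold` for more than
    `min_days` consecutive days (assumed reading of "sustained")."""
    streak = 0
    for ratio in daily_change_ratio:
        streak = streak + 1 if ratio > threshold else 0
        if streak > min_days:
            return True
    return False


def fired_triggers(state: TriggerState, daily_change_ratio: Sequence[float]) -> list[str]:
    """Return a description of every trigger that has fired; any entry means activate."""
    fired = []
    if state.partner_data_persisted:
        fired.append("first design-partner co-implementation persisting their data")
    if state.credential_incident_or_near_miss:
        fired.append("compromised-credential incident or near-miss")
    if sustained_churn(daily_change_ratio):
        fired.append("sustained daily change > 10% of DB size for > 14 days")
    if state.pitr_covered_failure_hit:
        fired.append("first PITR-covered failure actually hit")
    return fired
```

A non-empty result from fired_triggers means Pro + PITR should be activated; the customer-launch floor is a calendar commitment and is deliberately left out of the check.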
Alternatives considered
- A self-run nightly pg_dump → encrypted object store. Rejected: it runs on/near the serving platform (shares fate), duplicates what Supabase’s managed backups already provide off-platform, and is operational overhead the regenerable alpha doesn’t need.
- PITR on day one (~$105/month). Rejected for the pre-partner alpha: the high-probability failures (a bad migration, an accidental DELETE) cost the solo builder a day of regenerable work, not customer data. Triggered by the data-value threshold instead.
- Team tier ($599/month). Rejected: it buys SOC2 paperwork not yet needed, not DR capability.
Consequences
- DR is a Supabase configuration plus a runbook, not a backup pipeline to own. docs/runbooks/disaster-recovery.md is the operational contract (restore playbooks by failure scenario, the quarterly drill checklist, the PITR activation playbook, and the partner-onboarding DR step).
- No self-run backup workflow, no second object-store surface, and no backup credentials in the deploy.
- The decision-module store inherits this coverage: its durable backend is Postgres (core.object_store bytea, ADR-085 D1), so module bundles ride the same managed backups + PITR as the rest of the database. This is also why the module store’s durable backend is Postgres rather than R2: an R2 backend would reintroduce exactly the “second object-store surface” (with its own backup/DR story) this posture rejects.
- core.users recovery is via user re-invite (the auth.users primary is Supabase-managed; the mirror rebuilds from invite acceptance), documented in the runbook.