ADR-045: Test Supabase instance lifecycle — testcontainers in CI, supabase start locally, transaction-rollback isolation
Status: Accepted (2026-04-21)
Context
This ADR is a carry-forward confirmation of the v0.2 testing strategy (real Supabase for integration; four pytest markers; per-layer coverage floors; property-based invariants via Hypothesis; LLM fixture recording; role-context fixtures) plus four substantive deltas around the TA-3 / TA-5 / TA-14 / TA-18 decisions that changed the infrastructure substrate underneath the fixtures.
No landscape or adversarial research pass was needed. The deltas are mechanical consequences of already-decided spikes:
- TA-3 D3 / D8: session-var pattern via SET LOCAL inside an explicit transaction; request_scope context manager.
- TA-5: outbox + SECURITY DEFINER status-transition function; LISTEN/NOTIFY substrate.
- TA-14 D7: langgraph framework-owned schema.
- TA-18 D4a / D13: JWKS-local JWT validation; session-var-bound RLS.
Decision
D1 — Real Supabase / real DB for every integration test (carry-forward, reinforced)
Per the AGENTS.md rule and a prior incident: no DB mocks under any condition.
D2 — Marker taxonomy enforced by root tests/conftest.py
Total markers: 6 (in pyproject.toml). Primary markers (required on every test item): 4 (unit, contract, integration, e2e). Auxiliary tags (additive, not required): 2 (system, nightly). The conftest fails collection on any test item that carries none of the four primary markers, so unmarked tests cannot silently slip past the CI-tier filters. ADR-061 D1 adds a fifth primary marker, live_drift, for the nightly LLM live-drift workflow.
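A minimal sketch of the collection check, assuming the primary set also includes ADR-061's live_drift. The helper find_unmarked and the error wording are illustrative and are not the committed implementation from commit 66bcd29. The pytest_collection_modifyitems hook, Item.iter_markers() and pytest.UsageError are standard pytest APIs.

```python
from collections.abc import Iterable, Mapping

# Primary markers from D2, plus ADR-061 D1's live_drift (assumed primary here).
PRIMARY_MARKERS = frozenset({"unit", "contract", "integration", "e2e", "live_drift"})


def find_unmarked(markers_by_nodeid: Mapping[str, Iterable[str]]) -> list[str]:
    """Return node IDs that carry none of the primary markers, in input order."""
    return [
        nodeid
        for nodeid, names in markers_by_nodeid.items()
        if PRIMARY_MARKERS.isdisjoint(names)
    ]


def pytest_collection_modifyitems(session, config, items):
    # iter_markers() yields markers applied at function, class and module level.
    markers_by_nodeid = {
        item.nodeid: {mark.name for mark in item.iter_markers()} for item in items
    }
    unmarked = find_unmarked(markers_by_nodeid)
    if unmarked:
        import pytest  # deferred so find_unmarked() stays importable on its own

        raise pytest.UsageError(
            "tests missing a primary marker "
            f"({', '.join(sorted(PRIMARY_MARKERS))}):\n  " + "\n  ".join(unmarked)
        )
```

An auxiliary tag on its own (for example, only nightly) still fails the check, because auxiliary tags are additive.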
D3 — Coverage floors 90 / 80 / 60 with one-month ratchet
Domain ≥ 90%, application ≥ 80%, infrastructure ≥ 60%. The floors land as targets in the tools/quality/check_coverage.py scaffold, but enforcement stays disabled for the first month after the first real test suite lands. Month 1: tracking only (report actual coverage; flag gaps in PR comments). Month 2+: enforce. This keeps paper discipline from blocking real work during the initial test ramp.
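One way to express the ratchet in check_coverage.py is sketched below. The function names, the 30-day approximation of "one month" and the first_suite_landed date parameter are all assumptions. Only the three floors come from D3.

```python
from datetime import date, timedelta

# Per-layer floors from D3, in percent.
FLOORS = {"domain": 90.0, "application": 80.0, "infrastructure": 60.0}
# Assumption: "one month" of tracking-only mode is approximated as 30 days.
TRACKING_WINDOW = timedelta(days=30)


def coverage_gaps(actual: dict[str, float]) -> dict[str, float]:
    """Return {layer: shortfall in percentage points} for layers below their floor.

    A layer missing from the report is treated as 0% covered (assumption).
    """
    return {
        layer: round(floor - actual.get(layer, 0.0), 2)
        for layer, floor in FLOORS.items()
        if actual.get(layer, 0.0) < floor
    }


def should_fail(actual: dict[str, float], first_suite_landed: date, today: date) -> bool:
    """Month 1 only reports gaps; from month 2 onward any gap fails the check."""
    enforcing = today >= first_suite_landed + TRACKING_WINDOW
    return enforcing and bool(coverage_gaps(actual))
```

During month 1 the gaps would still be computed and posted to the PR comment. The only thing that changes at the ratchet point is the exit status.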
D4 — Local dev DB: supabase start (Supabase CLI)
Full Supabase stack on Docker (Postgres + Auth + Storage + Realtime). Developer-friendly. Migrations apply via supabase db reset. Local-CI divergence is bounded because CI tests do not exercise Auth / Storage / Realtime services.
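D4 and D5 imply that the same test session must target different databases locally and in CI. A hedged sketch of that selection follows. TEST_DATABASE_URL is a hypothetical override variable. The local DSN assumes the Supabase CLI defaults (port 54322, user and password postgres), which supabase status can confirm. CI=true is the variable GitHub Actions sets.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumption: Supabase CLI default local DB URL; confirm with `supabase status`.
LOCAL_SUPABASE_DSN = "postgresql://postgres:postgres@127.0.0.1:54322/postgres"


def resolve_test_dsn(env: Mapping[str, str] = os.environ) -> str | None:
    """Return a DSN to connect to, or None when a testcontainer should be booted.

    TEST_DATABASE_URL is a hypothetical explicit override. GitHub Actions sets
    CI=true, in which case the session fixture starts a testcontainer (D5).
    """
    if env.get("TEST_DATABASE_URL"):
        return env["TEST_DATABASE_URL"]
    if env.get("CI", "").lower() == "true":
        return None
    return LOCAL_SUPABASE_DSN
```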
D5 — CI DB: testcontainers-python + supabase/postgres:15 (Postgres-only substrate)
No Auth / Storage / Realtime service containers. Defensible because:
- ADR-039 D4a JWKS-local means JWT verification runs via PyJWT with an in-test keypair — no Supabase Auth service call.
- ADR-044 chose LISTEN/NOTIFY, not Realtime postgres_changes.
- Storage is not in the alpha context surface.
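The first bullet depends on JWT verification never leaving the test process. A sketch of an in-test keypair round trip follows, assuming PyJWT and cryptography are installed. The claim names (aud "authenticated", the workspace_id custom claim, the space-delimited scope) are illustrative assumptions, and ADR-039 D4a defines the real claim contract. jwt.encode, jwt.decode and ec.generate_private_key are real APIs.

```python
from __future__ import annotations

import time


def build_claims(
    user_id: str,
    workspace_id: str,
    scopes: list[str],
    ttl_s: int = 300,
    now: int | None = None,
) -> dict:
    """Controlled claims for a test token; claim names here are illustrative."""
    issued = int(time.time()) if now is None else now
    return {
        "sub": user_id,
        "aud": "authenticated",  # assumption: Supabase-style audience value
        "workspace_id": workspace_id,  # hypothetical custom claim
        "scope": " ".join(scopes),  # assumption: space-delimited, OAuth style
        "iat": issued,
        "exp": issued + ttl_s,
    }


def sign_and_verify(claims: dict) -> dict:
    """Sign with a throwaway ES256 keypair, then verify locally with PyJWT.

    Nothing here calls Supabase Auth. In the application the public key would
    come from the locally cached JWKS rather than from this keypair.
    """
    import jwt  # PyJWT; imported lazily so build_claims works without it
    from cryptography.hazmat.primitives.asymmetric import ec

    private_key = ec.generate_private_key(ec.SECP256R1())
    token = jwt.encode(claims, private_key, algorithm="ES256", headers={"kid": "test-key"})
    return jwt.decode(
        token, private_key.public_key(), algorithms=["ES256"], audience="authenticated"
    )
```

For example, sign_and_verify(build_claims("u1", "w1", ["scan:read"])) should return the same claims. An asymmetric algorithm is used because D13 separately checks that asymmetric signing works end to end.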
Session-scoped CI startup sequence: testcontainer boot → migrations applied → AsyncPostgresSaver.setup() for langgraph.* → SECURITY DEFINER functions installed → grants between contexts applied.
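The sequence above is order-sensitive: for instance, GRANT EXECUTE on a SECURITY DEFINER function fails if the function does not exist yet. A sketch that pins the order in one place and fails fast on a missing step is shown below. The step names are hypothetical. In practice each callable would wrap testcontainers, the migration runner, AsyncPostgresSaver.setup() and the SQL installs.

```python
from collections.abc import Callable, Mapping

# D5's session-scoped startup order; each later step depends on the earlier ones.
STARTUP_ORDER = (
    "boot_container",
    "apply_migrations",
    "setup_langgraph_checkpointer",
    "install_security_definer_functions",
    "apply_cross_context_grants",
)


def run_startup(steps: Mapping[str, Callable[[], None]]) -> list[str]:
    """Run every startup step exactly once, in STARTUP_ORDER; fail fast on gaps."""
    missing = [name for name in STARTUP_ORDER if name not in steps]
    if missing:
        raise ValueError(f"startup steps not provided: {missing}")
    completed: list[str] = []
    for name in STARTUP_ORDER:
        steps[name]()  # an exception here aborts the session before any test runs
        completed.append(name)
    return completed
```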
D6 — Per-test isolation: transaction rollback on async psycopg3 connection
Each test acquires a connection, opens a transaction with BEGIN, and issues ROLLBACK at teardown. DDL-only tests (migration validation) opt out with @pytest.mark.no_rollback. Because ADR-041 D3 issues SET LOCAL inside its own explicit transaction, that transaction nests inside the per-test transaction as a savepoint; these semantics are validated in the D13 first-integration pass.
D7 — Parallel isolation: pytest-xdist with shared testcontainer + per-test transactions
One DB serves the whole run and is shared by all N workers; each test isolates itself in its own transaction. Schema-per-worker and DB-per-worker are deferred to D14 triggers.
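Under pytest-xdist each worker is a separate process, so a session-scoped fixture runs once per worker rather than once per run. Sharing one container therefore needs cross-process coordination, and the pytest-xdist documentation suggests a file lock in a directory all workers share. The sketch below is a stdlib-only version of that idea. It assumes a POSIX host (fcntl), and run_once and its JSON cache are hypothetical names.

```python
import fcntl
import json
from collections.abc import Callable
from pathlib import Path


def run_once(shared_dir: Path, key: str, produce: Callable[[], dict]) -> dict:
    """Return the JSON-serializable result of produce(), computing it only once.

    The first worker to take the lock calls produce() and caches the result;
    the others block on the lock, then read the cached value. POSIX-only (fcntl).
    """
    shared_dir.mkdir(parents=True, exist_ok=True)
    cache = shared_dir / f"{key}.json"
    with open(shared_dir / f"{key}.lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # released when the file is closed
        if cache.exists():
            return json.loads(cache.read_text())
        value = produce()
        cache.write_text(json.dumps(value))
        return value
```

In conftest, shared_dir would be tmp_path_factory.getbasetemp().parent, and produce() would boot the container, run the startup sequence and return something like {"dsn": ...}. When worker_id is "master" (no xdist), call produce() directly, as the pytest-xdist recipe does, because that parent directory is then not scoped to a single run. This sketch does not handle stopping the container after the last worker finishes.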
D8 — Seeding: migrations-only for schema + per-test fixture factories for data
No session-scoped data beyond what migrations create (enum rows, reference data). Factories are small, composable builders, one per domain type (for example, async def create_workspace(db, **overrides)).
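A sketch of the create_workspace factory shape named above follows. The core.workspaces table and its id, name and slug columns are hypothetical. db is assumed to be the per-test psycopg3 AsyncConnection, whose execute() returns a cursor.

```python
from __future__ import annotations

import uuid
from typing import Any


def workspace_row(**overrides: Any) -> dict[str, Any]:
    """Default column values; callers override only what the test asserts on."""
    suffix = uuid.uuid4().hex[:8]  # unique per call, so tests never collide on slugs
    row = {"id": uuid.uuid4(), "name": f"Workspace {suffix}", "slug": f"ws-{suffix}"}
    unknown = overrides.keys() - row.keys()
    if unknown:
        raise TypeError(f"unknown workspace columns: {sorted(unknown)}")
    return {**row, **overrides}


async def create_workspace(db, **overrides: Any) -> dict[str, Any]:
    """Insert one workspace inside the test's transaction and return the stored row.

    The table and columns are hypothetical. The row disappears with the
    per-test ROLLBACK, so no cleanup is needed.
    """
    row = workspace_row(**overrides)
    cur = await db.execute(
        "INSERT INTO core.workspaces (id, name, slug) "
        "VALUES (%(id)s, %(name)s, %(slug)s) RETURNING id, name, slug",
        row,
    )
    stored = await cur.fetchone()
    return dict(zip(("id", "name", "slug"), stored))
```

Composition follows the same pattern. For example, a hypothetical create_member(db, workspace=None) would call create_workspace(db) only when no workspace is passed in.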
D9 — Role / auth fixture taxonomy reworked for 0.3.0
Replaces v0.2 as_owner / as_member / as_operator / as_anon with:
| Fixture | Scope | Purpose |
|---|---|---|
| postgres_test | session | testcontainer + migrations + langgraph + SECURITY DEFINER fns |
| db | function | async psycopg3 connection with per-test auto-rollback transaction |
| as_workspace_member(account_id, workspace_id) | context manager | SET LOCAL app.account_id + app.workspace_id (per ADR-039 D13) |
| as_bc_role(bc) | context manager | SET LOCAL ROLE spectral_{bc}_app; transaction-scoped role switch |
| jwt_for(user_id, workspace_id, scopes) | function | PyJWT-signed test JWT with controlled claims (per ADR-039 D4a) |
| llm_replay_client(fixture_path) | function | recorded-response LLM client (carry-forward) |
SET LOCAL ROLE is chosen over SET ROLE so the role switch is transaction-scoped and rolls back with the test transaction. Final semantics are validated in D13.
D10 — Directory reorg: tests/scanning/ → tests/platform/
Follows the SPEC-306 D11 context rename. A trivial git mv (both are empty stub packages today).
D11 — LLM fixture recording pattern (carry-forward; resolved by ADR-061)
Record once against the live provider; replay deterministically. ADR-061 D2 lands the concrete mechanism: pytest-recording per-test cassettes at tests/<context>/_fixtures/llm/<test-id>.yaml, re-recorded by setting the RECORD_NEW_FIXTURES=1 environment flag. Sensitive content is redacted at recording time per ADR-061 D8.
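The path convention and flag semantics can be made concrete with two small helpers. The exact wiring belongs to ADR-061, so this is only a sketch. The filename sanitization rule and the choice of VCR.py's "new_episodes" mode for RECORD_NEW_FIXTURES=1 are assumptions; "none" is VCR.py's replay-only mode.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping
from pathlib import Path


def cassette_path(context: str, test_id: str, root: Path = Path("tests")) -> Path:
    """tests/<context>/_fixtures/llm/<test-id>.yaml with a filename-safe test ID."""
    safe_id = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_id).strip("_")
    return root / context / "_fixtures" / "llm" / f"{safe_id}.yaml"


def record_mode(env: Mapping[str, str] = os.environ) -> str:
    """VCR.py record mode: replay-only by default, record new episodes on request."""
    return "new_episodes" if env.get("RECORD_NEW_FIXTURES") == "1" else "none"
```

With pytest-recording, the directory could feed the vcr_cassette_dir fixture and the mode could be passed via its --record-mode option. VCR.py's filter_headers setting (for example, for an authorization header) is one place the D8 redaction could hook in.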
D12 — Staging is not a CI target at alpha
Staging exists for manual QA / partner demos. CI uses ephemeral testcontainers only. ADR-062 D7 makes the dedicated test-only Supabase project a forward-triggered upgrade.
D13 — First-integration-test epic includes a validation pass
Before the first real integration test merges, verify:
- supabase/postgres:15 ships with pgvector enabled (CREATE EXTENSION vector succeeds in a test container).
- auth.users table exists in the image (needed as the FK target of the ADR-039 D4b core.users mirror); if absent, migrations synthesize a minimal auth.users in a test-only migration.
- Asymmetric JWT signing supported (needed for the ADR-039 D4a pre-check).
- SET LOCAL inside nested transactions (savepoints) preserves the session var across savepoint boundaries and resets on ROLLBACK, with no leakage to subsequent tests.
- SET LOCAL ROLE inside a per-test transaction resets cleanly on ROLLBACK, leaving no residual role on the connection returned to the pool.
- TA-22 carry-forward additions: PKCE cookie split-reassembly through the Cloudflare proxy + Pages Function; end-to-end JWT header size through the Cloudflare upstream buffer.
- TA-26 carry-forward additions: branch-lifecycle exercise (create branch + apply migrations + run integration tests against branch URL + delete branch) round-tripping correctly via the Supabase Management API; smoke-test invocation contract (single CLI call, exits 0 on green).
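Several checklist items reduce to single-value SQL probes. In the sketch below, fetchval is a hypothetical helper that runs a query on the connection and returns the first column of the first row. The first two probes are meant to be run as-is against a fresh container. The leakage check runs on a connection that has just been rolled back. One nuance: current_setting(name, true) returns NULL for a custom variable that was never set, but it can return '' once the variable has been defined earlier in the session, so both count as unset.

```python
from collections.abc import Callable

# Read-only probes for the D13 checklist; each query returns a single value.
PROBES = {
    "pgvector_available": "SELECT count(*) > 0 FROM pg_available_extensions WHERE name = 'vector'",
    "auth_users_exists": "SELECT to_regclass('auth.users') IS NOT NULL",
    "account_id_after_rollback": "SELECT current_setting('app.account_id', true)",
    "current_user_after_rollback": "SELECT current_user",
}


def leaks_after_rollback(fetchval: Callable[[str], object], login_role: str) -> list[str]:
    """Return the D13 leakage checks that failed on a just-rolled-back connection."""
    failures = []
    # NULL (never set) and '' (defined earlier in the session) both mean "unset".
    if fetchval(PROBES["account_id_after_rollback"]) not in (None, ""):
        failures.append("app.account_id survived ROLLBACK")
    # After a clean rollback, current_user is back to the connection's login role.
    if fetchval(PROBES["current_user_after_rollback"]) != login_role:
        failures.append("SET LOCAL ROLE survived ROLLBACK")
    return failures
```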
A fallback path is documented in docs/runbooks/testing.md if supabase/postgres:15 disappoints: vanilla postgres:15 with pgvector installed manually (the stock image does not bundle it) + CREATE EXTENSION vector in a test-setup migration + a synthesized minimal auth.users table.
D14 — Forward triggers explicit
| Trigger | Response |
|---|---|
| First-integration-test image-contents anomaly (extension, auth.users, asymmetric JWT) | Fall back to vanilla postgres:15 + manual extension install per D13 |
| First SET LOCAL / SET LOCAL ROLE nested-txn leakage observed under per-test rollback | Re-open per-test isolation mechanism; candidate alternatives include schema-reset-per-test or database-per-test |
| First xdist-attributable flake (deadlock, ordering, shared-state interference) | Move to schema-per-worker (not retry-in-place); post-alpha if frequency justifies |
| First DDL-testing parallelism bottleneck | Move to database-per-worker |
| First partner pilot requiring shared-staging functional-test gate | Reconsider D12; add staging CI target |
Alternatives considered
Full supabase start in CI. Rejected — slower startup, more moving parts, Auth/Storage/Realtime not exercised by integration tests. Revisit if Realtime- or Storage-dependent tests emerge.
Supabase Branching (per-PR ephemeral branches). Deferred post-alpha (Pro-tier required; testcontainers is free and faster at alpha volumes).
Dedicated hosted spectral-test-ci project. Rejected — shared state across concurrent PR runs creates flake; ephemeral-per-run is cleaner.
Schema-per-worker at alpha. Rejected; transaction-per-test is sufficient. Named as D14 forward trigger.
DB-per-worker at alpha. Rejected; same reasoning.
Full DB reset between every test. Rejected — too slow; transaction rollback idiomatic.
Session-scoped data fixtures (shared workspace across tests). Rejected — implicit state is a testing anti-pattern; factories preferred.
Dropping property-based testing. Rejected — invariant coverage is the strongest test form; carry-forward.
Dropping LLM replay. Rejected — live LLM in PR CI is cost-prohibitive and flaky.
Docker Compose service containers in Actions instead of testcontainers. Considered. testcontainers-python gives better per-test lifecycle control; Compose is functionally equivalent but less flexible. Chose testcontainers.
Hard-enforce coverage floors from day 1. Rejected; the one-month ratchet (D3) prevents paper-discipline blocking real test ramp.
Consequences
- Zero new infrastructure at alpha. Local dev uses the existing supabase start; CI adds testcontainers-python as a dev dependency when the first integration test lands.
- Root tests/conftest.py enforces marker presence; existing tests already comply (commit 66bcd29).
- tests/scanning/ → tests/platform/ is a trivial git mv; both are empty stub packages.
- Codex developer-guide/testing.mdx needs a close-pass rewrite: packages/* → src/spectral/*, testcontainers substrate, reworked role fixtures, SECURITY DEFINER test pattern, langgraph schema setup, JWT fixture recipe.
- GitHub Actions workflow gains a testcontainers-python step when the first integration test lands.
- Factory module (tests/factories/) lands with the first integration test suite that needs it.
- The spectral_test_agents package’s scan-pipeline E2E backbone is out of scope for TA-23. Resolved by ADR-061 D10: test-agents are reference implementations, not a test harness.
- Coverage enforcement starts tracking-only for month 1 (D3 ratchet).
- D13 validation pass is a prerequisite for the first integration-test PR merge.
References
- ADR-065: spectral.core admission discipline (no surface added here)
- ADR-031: single-library structure
- ADR-032: schema topology
- ADR-036: observability substrate
- ADR-039: JWKS-local; session-var RLS
- ADR-041: psycopg 3 + session vars + request_scope
- ADR-042: deleted_at convention
- ADR-043: langgraph schema
- ADR-044: outbox + SECURITY DEFINER pattern testable
- ADR-052: TA-22 first-integration carry-forward additions
- ADR-053: TA-26 pre-merge dry-run gate
- ADR-061: D11 mechanism; live-drift marker
- ADR-062: staging Environment scoping; D7 forward trigger
- TA-23 disposition: SPEC-326 comment 755a6d50
- TA-23 verification: SPEC-326 comment cc355c9f
- TA-22 first-integration carry-forwards: SPEC-326 comment 4380c6d9
- TA-26 first-integration carry-forwards: SPEC-326 comment c5162d76
- TA-24/25 hand-offs resolved: SPEC-326 comment 81b1b295
- tests/conftest.py (commit 66bcd29): marker enforcement
- Codex developer-guide/testing.mdx: close-pass rewrite
- docs/runbooks/testing.md: close-pass operational runbook
Addendum: ADR-004 — SQLite In-Memory for API Test Database
ADR-004 (Accepted 2026-03-21; retired by this ADR) split the test substrate by suite: core-package tests ran against real Postgres (because RLS, SET ROLE, and request.jwt.claims are PostgreSQL-specific), while API tests ran against SQLite-in-memory for HTTP-shape testing on the basis that those tests didn’t exercise RLS.
Why a future reader should know about ADR-004:
- The “don’t mock the database” doctrine in AGENTS.md rules out the SQLite-in-memory substrate for any test that touches data: mocking the DB hides migration and RLS failures, and the SQLite/Postgres dialect gap is real even for shape tests once forward triggers and CHECK constraints land.
- Per-epic integration-test acceptance criteria are non-negotiable; an API test that needs database state must hit a real database (testcontainers in CI; supabase start locally) so the integration substrate is shared with the core suite.
- The driver behind ADR-004 (the friction of supabase start for shape tests) is addressed here by transaction-rollback isolation, schema-isolated tests, and per-suite fixtures rather than by swapping substrates.
Git history at the commit retiring ADR-004 preserves the original text.