ADR-045: Test Supabase instance lifecycle — testcontainers in CI, supabase start locally, transaction-rollback isolation

Status: Accepted (2026-04-21)

Context

This ADR is a carry-forward confirmation of the v0.2 testing strategy (real Supabase for integration; four pytest markers; per-layer coverage floors; property-based invariants via Hypothesis; LLM fixture recording; role-context fixtures) plus four substantive deltas around the TA-3 / TA-5 / TA-14 / TA-18 decisions that changed the infrastructure substrate underneath the fixtures.

No landscape or adversarial research pass was needed. The deltas are mechanical consequences of already-decided spikes:

  • TA-3 D3 / D8: session-var pattern via SET LOCAL inside an explicit transaction; request_scope context manager.
  • TA-5: outbox + SECURITY DEFINER status-transition function; LISTEN/NOTIFY substrate.
  • TA-14 D7: langgraph framework-owned schema.
  • TA-18 D4a / D13: JWKS-local JWT validation; session-var-bound RLS.

Decision

D1 — Real Supabase / real DB for every integration test (carry-forward, reinforced)

Per AGENTS.md rule and prior incident: no DB mocks under any condition.

D2 — Marker taxonomy enforced by root tests/conftest.py

Total markers: 6 (registered in pyproject.toml). Primary markers: 4 (unit, contract, integration, e2e); every test item must carry one. Auxiliary tags: 2 (system, nightly); these are additive and not required. The conftest fails collection on any test item that carries none of the four primary markers, which prevents unmarked tests from being silently excluded by CI-tier marker filters. ADR-061 D1 adds a fifth primary marker, live_drift, for the nightly LLM live-drift workflow.
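A minimal sketch of the collection-time check described above, assuming the enforcement lives in a pytest_collection_modifyitems hook. The helper name primary_markers_on and the error wording are illustrative and are not taken from the actual tests/conftest.py.

```python
# tests/conftest.py (sketch, not the committed implementation)
PRIMARY_MARKERS = frozenset({"unit", "contract", "integration", "e2e"})


def primary_markers_on(marker_names):
    """Return the primary markers present among a test item's marker names."""
    return PRIMARY_MARKERS.intersection(marker_names)


def pytest_collection_modifyitems(config, items):
    """Abort the run if any collected test item lacks a primary marker."""
    import pytest  # local import keeps primary_markers_on usable without pytest

    unmarked = [
        item.nodeid
        for item in items
        if not primary_markers_on(m.name for m in item.iter_markers())
    ]
    if unmarked:
        raise pytest.UsageError(
            "tests missing a primary marker "
            f"({', '.join(sorted(PRIMARY_MARKERS))}):\n  " + "\n  ".join(unmarked)
        )
```

Raising at collection time fails the run before any test executes, so an unmarked test cannot drop out of a `-m integration` style selection unnoticed.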

D3 — Coverage floors 90 / 80 / 60 with one-month ratchet

Domain ≥ 90%, application ≥ 80%, infrastructure ≥ 60%. Floors land as targets in tools/quality/check_coverage.py scaffold, but enforcement starts disabled for the first month after the first real test suite lands. Month 1: tracking only (report actual coverage; flag gaps in PR comments). Month 2+: enforce. Prevents paper-discipline blocking real work during initial test ramp.
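The ratchet can be sketched as a single switch between reporting and enforcing. This is an illustration only: the layer keys, the Gap type, and the enforce flag are assumptions, and the real tools/quality/check_coverage.py scaffold may be shaped differently.

```python
from dataclasses import dataclass

# Floors from D3, in percent.
FLOORS = {"domain": 90.0, "application": 80.0, "infrastructure": 60.0}


@dataclass(frozen=True)
class Gap:
    layer: str
    actual: float
    floor: float


def coverage_gaps(actual_by_layer):
    """List every layer whose measured coverage is below its floor.

    A layer with no measurement is treated as 0% so it cannot pass silently.
    """
    gaps = []
    for layer, floor in FLOORS.items():
        actual = actual_by_layer.get(layer, 0.0)
        if actual < floor:
            gaps.append(Gap(layer, actual, floor))
    return gaps


def exit_code(actual_by_layer, enforce):
    """Report gaps always; fail (return 1) only when enforcement is on."""
    gaps = coverage_gaps(actual_by_layer)
    for gap in gaps:
        print(f"{gap.layer}: {gap.actual:.1f}% < floor {gap.floor:.1f}%")
    return 1 if (gaps and enforce) else 0
```

During month 1 the check would run with enforce=False (gaps are printed, the job stays green); from month 2 the same call with enforce=True turns a gap into a failing exit code.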

D4 — Local dev DB: supabase start (Supabase CLI)

Full Supabase stack on Docker (Postgres + Auth + Storage + Realtime). Developer-friendly. Migrations apply via supabase db reset. Local-CI divergence is bounded because CI tests do not exercise Auth / Storage / Realtime services.

D5 — CI DB: testcontainers-python + supabase/postgres:15 (Postgres-only substrate)

No Auth / Storage / Realtime service containers. Defensible because:

  • ADR-039 D4a JWKS-local means JWT verification runs via PyJWT with an in-test keypair — no Supabase Auth service call.
  • ADR-044 chose LISTEN/NOTIFY, not Realtime postgres_changes.
  • Storage is not in the alpha context surface.
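To illustrate the first point, a test JWT can be minted and verified entirely in-process. The sketch below assumes ES256 as the asymmetric algorithm and uses hypothetical claim names (workspace_id, scope); this ADR fixes neither. It relies on PyJWT and the cryptography package, imported lazily so that the claim builder stays usable without them.

```python
import time


def build_claims(user_id, workspace_id, scopes, *, now=None, ttl_seconds=300):
    """Build the claim set for a test token (claim names are assumptions)."""
    issued_at = int(time.time() if now is None else now)
    return {
        "sub": user_id,
        "workspace_id": workspace_id,
        "scope": " ".join(scopes),
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,
    }


def make_test_keypair():
    """Generate a throwaway P-256 keypair for one test session."""
    from cryptography.hazmat.primitives.asymmetric import ec

    private_key = ec.generate_private_key(ec.SECP256R1())
    return private_key, private_key.public_key()


def jwt_for(private_key, user_id, workspace_id, scopes, **claim_kwargs):
    """Sign a test JWT locally; no Supabase Auth service is contacted."""
    import jwt  # PyJWT

    claims = build_claims(user_id, workspace_id, scopes, **claim_kwargs)
    return jwt.encode(claims, private_key, algorithm="ES256")
```

A test would then verify the token against the matching public key, for example with `jwt.decode(token, public_key, algorithms=["ES256"])`, which stands in for the JWKS lookup the application performs.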

Session-scoped CI startup sequence: testcontainer boot → migrations applied → AsyncPostgresSaver.setup() for langgraph.* → SECURITY DEFINER functions installed → grants between contexts applied.

D6 — Per-test isolation: transaction rollback on async psycopg3 connection

Each test acquires a connection and opens a transaction (BEGIN); teardown issues ROLLBACK. Tests that only exercise DDL (migration validation) opt out with @pytest.mark.no_rollback. The ADR-041 D3 pattern issues SET LOCAL inside its own explicit transaction, which under the per-test transaction becomes a nested savepoint; the resulting semantics are validated in the D13 first-integration pass.
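The rollback wrapper can be sketched as an async context manager around a psycopg 3 AsyncConnection. It assumes autocommit is off, so the first statement opens the transaction implicitly. The name isolated_connection and the commit-on-opt-out behavior are assumptions, not the project's actual db fixture.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def isolated_connection(conn, *, rollback=True):
    """Yield conn for the duration of one test, then undo its work.

    conn is expected to behave like psycopg.AsyncConnection (autocommit off).
    With rollback=False (the no_rollback opt-out) the work is committed instead;
    that choice is an assumption made for this sketch.
    """
    try:
        yield conn
    finally:
        # Runs on success and on test failure alike.
        if rollback:
            await conn.rollback()
        else:
            await conn.commit()
```

Inside a pytest fixture the flag could be derived from the marker, for example `rollback = request.node.get_closest_marker("no_rollback") is None`.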

D7 — Parallel isolation: pytest-xdist with shared testcontainer + per-test transactions

One DB for the run; N workers share; each test isolates via its own transaction. Schema-per-worker / DB-per-worker is deferred to a D14 trigger.

D8 — Seeding: migrations-only for schema + per-test fixture factories for data

No session-scoped data beyond what migrations create (enum rows, reference data). Factories are small builders per domain type (async def create_workspace(db, **overrides)), composable.
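A factory in this style might look like the sketch below. The signature comes from the text above; the core.workspaces table, its columns, and the default values are hypothetical. db is assumed to be a psycopg 3 async connection, which accepts %(name)s placeholders with a dict of parameters.

```python
import uuid


async def create_workspace(db, **overrides):
    """Insert one workspace row and return the values that were written.

    Table and column names are illustrative, not taken from the real schema.
    """
    row = {
        "id": str(uuid.uuid4()),
        "account_id": str(uuid.uuid4()),
        "name": "test-workspace",
    }
    unknown = overrides.keys() - row.keys()
    if unknown:
        # Fail loudly on typos instead of silently ignoring an override.
        raise TypeError(f"unknown workspace fields: {sorted(unknown)}")
    row.update(overrides)
    await db.execute(
        "INSERT INTO core.workspaces (id, account_id, name) "
        "VALUES (%(id)s, %(account_id)s, %(name)s)",
        row,
    )
    return row
```

Factories compose by passing one factory's output into the next, for example `await create_workspace(db, account_id=account["id"])`, and every row disappears with the per-test rollback.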

D9 — Role / auth fixture taxonomy reworked for 0.3.0

Replaces v0.2 as_owner / as_member / as_operator / as_anon with:

| Fixture | Scope | Purpose |
| --- | --- | --- |
| postgres_test | session | testcontainer + migrations + langgraph + SECURITY DEFINER fns |
| db | function | async psycopg3 conn with per-test auto-rollback txn |
| as_workspace_member(account_id, workspace_id) | context mgr | SET LOCAL app.account_id + app.workspace_id (per ADR-039 D13) |
| as_bc_role(bc) | context mgr | SET LOCAL ROLE spectral_{bc}_app (txn-scoped role switch) |
| jwt_for(user_id, workspace_id, scopes) | function | PyJWT-signed test JWT with controlled claims (per ADR-039 D4a) |
| llm_replay_client(fixture_path) | function | recorded-response LLM client (carry-forward) |

SET LOCAL ROLE is chosen over SET ROLE so the role switch is txn-scoped and rolls back with the test transaction. Final semantics validated in D13.
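The two context-manager fixtures can be sketched as follows. Passing conn explicitly (the real fixtures presumably close over db) and the role-name validation rule are assumptions of this sketch. Since SET LOCAL does not accept bind parameters, the session variables go through set_config(name, value, true), which is the transaction-local equivalent.

```python
import re
from contextlib import asynccontextmanager

# Conservative pattern for the {bc} part of spectral_{bc}_app (an assumption).
_BC_NAME = re.compile(r"[a-z][a-z0-9_]*")


@asynccontextmanager
async def as_workspace_member(conn, account_id, workspace_id):
    """Bind the RLS session variables for the current transaction."""
    await conn.execute(
        "SELECT set_config('app.account_id', %s, true)", (str(account_id),)
    )
    await conn.execute(
        "SELECT set_config('app.workspace_id', %s, true)", (str(workspace_id),)
    )
    yield conn


@asynccontextmanager
async def as_bc_role(conn, bc):
    """Switch to the bounded-context role for the current transaction."""
    if not _BC_NAME.fullmatch(bc):
        # The role name is interpolated as an identifier, so reject anything odd.
        raise ValueError(f"invalid bounded-context name: {bc!r}")
    await conn.execute(f'SET LOCAL ROLE "spectral_{bc}_app"')
    yield conn
```

As written, both settings outlive the with-block and last until the surrounding per-test transaction ends. Whether the real fixtures scope them more tightly, for example with a savepoint, is left to the D13 validation.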

D10 — Directory reorg: tests/scanning/ → tests/platform/

Per the SPEC-306 D11 context rename. Trivial git mv (both are empty stub packages today).

D11 — LLM fixture recording pattern (carry-forward; resolved by ADR-061)

Record once against live provider, replay deterministically. ADR-061 D2 lands the concrete mechanism: pytest-recording per-test cassettes at tests/<context>/_fixtures/llm/<test-id>.yaml; updated via RECORD_NEW_FIXTURES=1 env flag. Sensitive content redacted at recording time per ADR-061 D8.
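The path convention and the environment flag can be expressed as two small helpers. This is a sketch only: the mapping of RECORD_NEW_FIXTURES=1 to the VCR record mode "new_episodes" (and "none" otherwise) is an assumption, and ADR-061 D2 remains the authority on the actual wiring.

```python
import os
from pathlib import Path


def cassette_path(context, test_id, root=Path("tests")):
    """Resolve tests/<context>/_fixtures/llm/<test-id>.yaml for one test."""
    return root / context / "_fixtures" / "llm" / f"{test_id}.yaml"


def record_mode(environ=None):
    """Choose a VCR record mode from the RECORD_NEW_FIXTURES flag.

    "none" replays existing cassettes only, so an unrecorded request fails
    instead of reaching the live provider; "new_episodes" records what is missing.
    """
    env = os.environ if environ is None else environ
    return "new_episodes" if env.get("RECORD_NEW_FIXTURES") == "1" else "none"
```

Defaulting to replay-only keeps PR CI deterministic: a test without a cassette fails loudly rather than calling the provider.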

D12 — Staging is not a CI target at alpha

Staging exists for manual QA / partner demos. CI uses ephemeral testcontainers only. ADR-062 D7 makes the dedicated test-only Supabase project a forward-triggered upgrade.

D13 — First-integration-test epic includes a validation pass

Before the first real integration test merges, verify:

  • supabase/postgres:15 ships with pgvector enabled (CREATE EXTENSION vector succeeds in a test container)
  • auth.users table exists in the image (needed for the ADR-039 D4b core.users mirror’s FK target); if absent, migrations synthesize a minimal auth.users in a test-only migration
  • Asymmetric JWT signing supported (needed for the ADR-039 D4a pre-check)
  • SET LOCAL inside nested transactions (savepoints) preserves the session-var across savepoint boundaries and resets on ROLLBACK — no leakage to subsequent tests
  • SET LOCAL ROLE inside a per-test txn resets cleanly on ROLLBACK — no residual role on the connection returned to the pool
  • TA-22 carry-forward additions: PKCE cookie split-reassembly through Cloudflare proxy + Pages Function; JWT header-size end-to-end through Cloudflare upstream buffer
  • TA-26 carry-forward additions: branch-lifecycle exercise (create branch + apply migrations + run integration tests against branch URL + delete branch) round-tripping correctly via the Supabase Management API; smoke-test invocation contract (single CLI call, exits 0 on green)

A fallback path is documented in docs/runbooks/testing.md in case supabase/postgres:15 fails any of these checks: vanilla postgres:15, with CREATE EXTENSION vector in a test-setup migration and a synthesized minimal auth.users table.
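Several of the checks in the list above reduce to single catalog queries. The sketch below assumes a psycopg 3 async connection and the session-variable name app.account_id; the function names are illustrative. The pgvector query is only an availability pre-check and does not replace running CREATE EXTENSION vector.

```python
async def fetch_value(conn, query, params=None):
    """Run a single-value query and return the first column of the first row."""
    cur = await conn.execute(query, params)
    row = await cur.fetchone()
    return row[0] if row else None


async def pgvector_available(conn):
    """True if the image ships the vector extension (installable, not yet created)."""
    return await fetch_value(
        conn,
        "SELECT EXISTS (SELECT 1 FROM pg_available_extensions WHERE name = 'vector')",
    )


async def has_auth_users(conn):
    """True if auth.users exists; to_regclass yields NULL for a missing relation."""
    return await fetch_value(conn, "SELECT to_regclass('auth.users') IS NOT NULL")


async def session_var_leaked(conn, name="app.account_id"):
    """True if a transaction-local setting survived the previous test's ROLLBACK.

    A custom setting that was set locally and then reverted can read back as ''
    rather than NULL, so both are treated as unset.
    """
    value = await fetch_value(conn, "SELECT current_setting(%s, true)", (name,))
    return value not in (None, "")


async def role_leaked(conn):
    """True if a role switch is still in effect on the connection."""
    return await fetch_value(conn, "SELECT current_user <> session_user")
```

Running session_var_leaked and role_leaked at the start of a test that follows one using SET LOCAL and SET LOCAL ROLE gives a direct signal for the first two leakage triggers in D14.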

D14 — Forward triggers explicit

| Trigger | Response |
| --- | --- |
| First-integration-test image-contents anomaly (extension, auth.users, asymmetric JWT) | Fall back to vanilla postgres:15 + manual extension install per D13 |
| First SET LOCAL / SET LOCAL ROLE nested-txn leakage observed under per-test rollback | Re-open per-test isolation mechanism; candidate alternatives include schema-reset-per-test or database-per-test |
| First xdist-attributable flake (deadlock, ordering, shared-state interference) | Move to schema-per-worker (not retry-in-place); post-alpha if frequency justifies |
| First DDL-testing parallelism bottleneck | Move to database-per-worker |
| First partner pilot requiring shared-staging functional-test gate | Reconsider D12; add staging CI target |

Alternatives considered

Full supabase start in CI. Rejected — slower startup, more moving parts, Auth/Storage/Realtime not exercised by integration tests. Revisit if Realtime- or Storage-dependent tests emerge.

Supabase Branching (per-PR ephemeral branches). Deferred post-alpha (Pro-tier required; testcontainers is free and faster at alpha volumes).

Dedicated hosted spectral-test-ci project. Rejected — shared state across concurrent PR runs creates flake; ephemeral-per-run is cleaner.

Schema-per-worker at alpha. Rejected; transaction-per-test is sufficient. Named as D14 forward trigger.

DB-per-worker at alpha. Rejected; same reasoning.

Full DB reset between every test. Rejected — too slow; transaction rollback is the idiomatic alternative.

Session-scoped data fixtures (shared workspace across tests). Rejected — implicit state is a testing anti-pattern; factories preferred.

Dropping property-based testing. Rejected — invariant coverage is the strongest test form; carry-forward.

Dropping LLM replay. Rejected — live LLM in PR CI is cost-prohibitive and flaky.

Docker Compose service containers in Actions instead of testcontainers. Considered. testcontainers-python gives better per-test lifecycle control; Compose is functionally equivalent but less flexible. Decision: testcontainers-python.

Hard-enforce coverage floors from day 1. Rejected; the one-month ratchet (D3) prevents paper-discipline blocking real test ramp.

Consequences

  • Zero new infrastructure at alpha. Local dev uses existing supabase start; CI adds testcontainers-python as a dev dependency when the first integration test lands.
  • Root tests/conftest.py enforces marker presence; existing tests already comply (commit 66bcd29).
  • tests/scanning/ → tests/platform/ is a trivial git mv; both are empty stub packages.
  • Codex developer-guide/testing.mdx needs a close-pass rewrite: packages/* → src/spectral/*, testcontainers substrate, reworked role fixtures, SECURITY DEFINER test pattern, langgraph schema setup, JWT fixture recipe.
  • GitHub Actions workflow gains a testcontainers-python step when the first integration test lands.
  • Factory module (tests/factories/) lands with the first integration test suite that needs it.
  • spectral_test_agents package’s scan-pipeline E2E backbone is out of scope for TA-23. Resolved by ADR-061 D10 — test-agents are reference implementations, not a test harness.
  • Coverage enforcement starts tracking-only for month 1 (D3 ratchet).
  • D13 validation pass is a prerequisite for the first integration-test PR merge.

References

  • ADR-065 — spectral.core admission discipline (no surface added here)
  • ADR-031 — single-library structure
  • ADR-032 — schema topology
  • ADR-036 — observability substrate
  • ADR-039 — JWKS-local; session-var RLS
  • ADR-041 — psycopg 3 + session vars + request_scope
  • ADR-042 — deleted_at convention
  • ADR-043 — langgraph schema
  • ADR-044 — outbox + SECURITY DEFINER pattern testable
  • ADR-052 — TA-22 first-integration carry-forward additions
  • ADR-053 — TA-26 pre-merge dry-run gate
  • ADR-061 — D11 mechanism; live-drift marker
  • ADR-062 — staging Environment scoping; D7 forward trigger
  • TA-23 disposition — SPEC-326 comment 755a6d50
  • TA-23 verification — SPEC-326 comment cc355c9f
  • TA-22 first-integration carry-forwards — SPEC-326 comment 4380c6d9
  • TA-26 first-integration carry-forwards — SPEC-326 comment c5162d76
  • TA-24/25 hand-offs resolved — SPEC-326 comment 81b1b295
  • tests/conftest.py (commit 66bcd29) — marker enforcement
  • Codex developer-guide/testing.mdx — close-pass rewrite
  • docs/runbooks/testing.md — close-pass operational runbook

Addendum: ADR-004 — SQLite In-Memory for API Test Database

ADR-004 (Accepted 2026-03-21; retired by this ADR) split the test substrate by suite: core-package tests ran against real Postgres (because RLS, SET ROLE, and request.jwt.claims are PostgreSQL-specific), while API tests ran against SQLite-in-memory for HTTP-shape testing on the basis that those tests didn’t exercise RLS.

Why a future reader should know about ADR-004:

  • The “don’t mock the database” doctrine in AGENTS.md rules out the SQLite-in-memory substrate for any test that touches data: mocking the DB hides migration and RLS failures, and the SQLite/Postgres dialect gap is real even for shape tests once forward triggers and CHECK constraints land.
  • Per-epic integration-test acceptance criteria are non-negotiable; an API test that needs database state must hit a real database (testcontainers in CI; supabase start locally) so the integration substrate is shared with the core suite.
  • The driver behind ADR-004 — friction of supabase start for shape tests — is addressed here by transaction-rollback isolation, schema-isolated tests, and per-suite fixtures rather than by swapping substrates.

Git history at the commit retiring ADR-004 preserves the original text.