Testing runbook

Operational procedures for the local + CI test substrate, per-test isolation, the D13 first-integration validation pass, the D14 trigger ladder, and the backup-nightly bats harness.

System reference: Codex how-to/testing.mdx · ADR-045 · ADR-061.


Local dev DB

supabase start (Supabase CLI) brings up the full Supabase stack on Docker (Postgres + Auth + Storage + Realtime). Migrations apply via supabase db reset.

supabase start # boot the stack
supabase db reset # apply all migrations from supabase/migrations/
supabase status # show connection strings; ports used

Local-CI divergence is bounded because CI tests do not exercise Auth / Storage / Realtime services.


CI DB

CI runs testcontainers-python against the supabase/postgres:15 image (Postgres only). A session-scoped fixture performs the setup once per test session:

  1. Boot testcontainer.
  2. Apply migrations.
  3. AsyncPostgresSaver.setup() for langgraph.*.
  4. Install SECURITY DEFINER functions.
  5. Apply inter-context grants.

Per-test isolation: each test runs inside a transaction on the async psycopg3 connection, and that transaction is rolled back at teardown. DDL-only tests can opt out with @pytest.mark.no_rollback.
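The rollback-per-test pattern can be sketched as a small async context manager. This is a minimal illustration, assuming a psycopg3 AsyncConnection opened with autocommit off; the helper name rollback_after is hypothetical and the project's db fixture may be structured differently.

```python
import contextlib


@contextlib.asynccontextmanager
async def rollback_after(conn):
    """Yield `conn` to the test body, then discard everything it wrote.

    `conn` is assumed to be a psycopg3 AsyncConnection with autocommit off,
    so the first statement opens a transaction implicitly.
    """
    try:
        yield conn
    finally:
        # Runs whether the test passed or raised, so no rows are left behind.
        await conn.rollback()
```

A function-scoped async fixture could wrap its yield in `async with rollback_after(conn)`, and skip the wrapper for tests that carry the no_rollback marker.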


Marker enforcement

Root tests/conftest.py fails collection on any test item missing one of the primary markers (unit, contract, integration, e2e). A fifth primary marker live_drift is used only by the nightly LLM live-drift workflow per ADR-061.

import pytest

@pytest.mark.unit
async def test_thing(): ...

Run a tier in isolation:

uv run pytest -m "unit"
uv run pytest -m "unit or contract"
uv run pytest -m integration
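The collection-time marker check could look roughly like the sketch below. It is an assumption about shape, not a copy of the real tests/conftest.py: the helper name primary_markers_on is hypothetical, and the sketch only requires at least one primary marker, since the runbook does not say whether exactly one is mandated.

```python
PRIMARY_MARKERS = frozenset({"unit", "contract", "integration", "e2e", "live_drift"})


def primary_markers_on(marker_names):
    """Return the primary markers present among a test item's marker names."""
    return PRIMARY_MARKERS.intersection(marker_names)


def pytest_collection_modifyitems(config, items):
    """pytest hook: abort the run if any collected item lacks a primary marker."""
    import pytest  # imported lazily so the helper above stays importable alone

    unmarked = [
        item.nodeid
        for item in items
        if not primary_markers_on(m.name for m in item.iter_markers())
    ]
    if unmarked:
        raise pytest.UsageError(
            "tests missing a primary marker:\n  " + "\n  ".join(unmarked)
        )
```

Keeping the membership check in a plain function makes it unit-testable without spinning up a pytest session.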

Role and auth fixtures

| Fixture | Scope | Purpose |
| --- | --- | --- |
| postgres_test | session | testcontainer + migrations + langgraph + SECURITY DEFINER fns |
| db | function | async psycopg3 conn with per-test auto-rollback txn |
| as_workspace_member(account_id, workspace_id) | context mgr | SET LOCAL app.account_id / app.workspace_id |
| as_context_role(context) | context mgr | SET LOCAL ROLE spectral_{context}_app (txn-scoped) |
| jwt_for(user_id, workspace_id, scopes) | function | PyJWT-signed test JWT with controlled claims |
| llm_replay_client(fixture_path) | function | recorded-response LLM client |

SET LOCAL ROLE is chosen over SET ROLE so the role switch is txn-scoped and rolls back with the test transaction.
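The two context managers could be sketched as follows. This is an illustration under assumptions: the connection is passed explicitly (the real fixtures may bind it differently), conn is a psycopg3 AsyncConnection inside an open transaction, and set_local_role_sql is a hypothetical helper. Because SET ROLE takes an identifier rather than a bind parameter, the context name is validated before interpolation; the session variables go through set_config(..., true), which is the parameterizable equivalent of SET LOCAL.

```python
import contextlib
import re

_CONTEXT_NAME = re.compile(r"[a-z][a-z0-9_]*")


def set_local_role_sql(context):
    """Build the SET LOCAL ROLE statement for a bounded context."""
    if not _CONTEXT_NAME.fullmatch(context):
        raise ValueError(f"invalid context name: {context!r}")
    return f'SET LOCAL ROLE "spectral_{context}_app"'


@contextlib.asynccontextmanager
async def as_context_role(conn, context):
    """Run the enclosed block as the context's app role (txn-scoped)."""
    await conn.execute(set_local_role_sql(context))
    yield
    # Reached only on a clean exit. If the block raised, the transaction may
    # be aborted, and the per-test rollback clears the role anyway.
    await conn.execute("SET LOCAL ROLE NONE")


@contextlib.asynccontextmanager
async def as_workspace_member(conn, account_id, workspace_id):
    """Set app.account_id / app.workspace_id for the current transaction."""
    await conn.execute(
        "SELECT set_config('app.account_id', %s, true),"
        " set_config('app.workspace_id', %s, true)",
        (str(account_id), str(workspace_id)),
    )
    yield
```

Both settings are transaction-local, so the per-test rollback is what ultimately guarantees that nothing survives into the next test.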


D13 first-integration validation pass

Before the first real integration test merges, verify (gate items):

  1. supabase/postgres:15 ships with pgvector enabled (CREATE EXTENSION vector succeeds).
  2. auth.users exists in the image (needed for the core.users mirror FK target). If absent, migrations synthesize a minimal auth.users in a test-only migration.
  3. Asymmetric JWT signing supported (per ADR-039 D4a).
  4. SET LOCAL inside nested transactions (savepoints) preserves the session-var across savepoint boundaries and resets on ROLLBACK — no leakage to subsequent tests.
  5. SET LOCAL ROLE inside per-test txn resets cleanly on ROLLBACK — no residual role on the connection returned to the pool.
  6. PKCE cookie split-reassembly through Cloudflare proxy + Pages Function (per ADR-052 carry-forward).
  7. JWT header-size end-to-end through Cloudflare upstream buffer (per ADR-052 carry-forward).
  8. Branch lifecycle exercise — create branch, apply migrations, run integration tests against branch URL, delete branch — round-trips via the Supabase Management API + the supabase CLI invocation pattern in tools/ops/premerge_dryrun.sh.
  9. Smoke-test invocation contract — single CLI call that takes a branch connection string and exits 0 on green / non-zero on any failure.
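The invocation contract in item 9 could be met by an entry point shaped like the sketch below. Everything specific here is an assumption for illustration: the SMOKE_DATABASE_URL variable name, the choice to run the integration tier with -x, and the injectable run parameter (present only so the function can be exercised without spawning a process).

```python
import argparse
import os
import subprocess


def main(argv=None, run=subprocess.call):
    """Run the smoke suite against one branch DB and return the exit status."""
    parser = argparse.ArgumentParser(description="branch smoke test")
    parser.add_argument("db_url", help="branch connection string")
    args = parser.parse_args(argv)

    # Hypothetical: the suite is assumed to read its target from this variable.
    env = {**os.environ, "SMOKE_DATABASE_URL": args.db_url}
    status = run(["uv", "run", "pytest", "-m", "integration", "-x"], env=env)
    # Collapse every failure mode into a plain non-zero status.
    return 0 if status == 0 else 1
```

A script wrapper would end with `raise SystemExit(main())`, so callers such as a pre-merge dry run only need to inspect the process exit code.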

Fallback

If supabase/postgres:15 fails the image-content checks above (extension, auth.users, asymmetric JWT), fall back to vanilla postgres:15 with CREATE EXTENSION vector in a test-setup migration and a synthesized minimal auth.users table.


D14 trigger ladder

| Trigger | Response |
| --- | --- |
| First-integration-test image-contents anomaly (extension, auth.users, asymmetric JWT) | Fall back to vanilla postgres:15 + manual extension install per D13 |
| First SET LOCAL / SET LOCAL ROLE nested-txn leakage observed under per-test rollback | Re-open per-test isolation mechanism; alternatives include schema-reset-per-test or database-per-test |
| First xdist-attributable flake (deadlock, ordering, shared-state interference) | Move to schema-per-worker (not retry-in-place) |
| First DDL-testing parallelism bottleneck | Move to database-per-worker |
| First partner pilot requiring shared-staging functional-test gate | Reconsider D12 (staging is not a CI target at alpha); add staging CI target |

Coverage floors

Domain ≥ 90%, application ≥ 80%, infrastructure ≥ 60%. The floors land as targets in the tools/quality/check_coverage.py scaffold, with enforcement disabled for the first month after the first real test suite lands. Month 1 is tracking only (PR comments); from month 2 onward the floors are enforced.
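The tracking-then-enforce behaviour can be sketched as below. The function names and the shape of the measured input are hypothetical; the actual check_coverage.py scaffold may read coverage data and report differently.

```python
FLOORS = {"domain": 90.0, "application": 80.0, "infrastructure": 60.0}


def floor_violations(measured, floors=FLOORS):
    """Return {layer: (measured_pct, floor_pct)} for layers below their floor."""
    return {
        layer: (measured.get(layer, 0.0), floor)
        for layer, floor in floors.items()
        if measured.get(layer, 0.0) < floor
    }


def exit_code(measured, enforce):
    """Tracking mode always passes; enforce mode fails on any violation."""
    violations = floor_violations(measured)
    for layer, (got, floor) in sorted(violations.items()):
        print(f"{layer}: {got:.1f}% is below the {floor:.0f}% floor")
    return 1 if (enforce and violations) else 0
```

With enforce set to False the script still prints each shortfall, which matches the month 1 "tracking only" posture; flipping the flag is the only change needed in month 2.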


LLM testing posture (three tiers)

Per ADR-061:

  • Unit / contract: FakeLLMProvider (in spectral.core.llm.testing); deterministic; zero external calls
  • Integration: pytest-recording per-test cassettes at tests/<context>/_fixtures/llm/<test-id>.yaml; replay byte-perfect
  • Live drift detection: nightly workflow (.github/workflows/nightly-live-drift.yml per ADR-061; lands with the deploy substrate); LIVE_PROVIDER=1 env bypasses VCR; compares to recorded cassettes via similarity threshold (0.85 default; per-test override)
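The threshold comparison in the live-drift tier can be illustrated with a short sketch. The runbook does not name the similarity metric, so difflib's sequence ratio is used here purely as a stand-in, and has_drifted is a hypothetical name.

```python
import difflib

DEFAULT_THRESHOLD = 0.85


def has_drifted(recorded, live, threshold=DEFAULT_THRESHOLD):
    """True when the live response is less similar to the cassette than allowed."""
    similarity = difflib.SequenceMatcher(None, recorded, live).ratio()
    return similarity < threshold
```

A per-test override then amounts to passing a different threshold for tests whose responses are known to vary more.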

Cassette recording sessions:

RECORD_NEW_FIXTURES=1 uv run pytest tests/platform/integration/test_scan.py -m integration

Always review the fresh cassette diff for sensitive content before commit; the cassette redaction lint blocks Authorization: Bearer ... patterns.

See docs/runbooks/llm-testing.md for drift triage + threshold calibration.


Backup-nightly bats + fake-gcs harness

tools/ops/backup/backup-nightly.sh pipes pg_dump through age into rclone rcat, which writes the dump to GCS. The integration test harness uses bats (Bash Automated Testing System) plus fake-gcs-server to exercise the full pipe locally and in CI.

The harness lives under tests/ops/backup/ (close-pass scaffold; lands when first integration test consumer needs it). Compose profile backup in infra/local/compose.yml brings up backup-nightly + fake-gcs-server for local exercise:

pnpm compose:up:backup
pnpm compose:run backup-nightly bash tools/ops/backup/backup-nightly.sh
# Verify the dump uploaded to fake-gcs:
curl http://localhost:4443/storage/v1/b/spectral-backups/o
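The curl call above returns a JSON object listing. A small helper such as the following could turn that body into a list of object names for an assertion. It assumes the response follows the GCS JSON API objects.list shape (an optional items array of objects with a name field); object_names is a hypothetical helper, not part of the harness.

```python
import json


def object_names(listing_json):
    """Extract object names from an objects.list response body."""
    listing = json.loads(listing_json)
    # An empty bucket may omit "items" entirely.
    return [obj["name"] for obj in listing.get("items", [])]
```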

Cassette redaction lint

tools/quality/check_cassette_redaction.py blocks Authorization: Bearer ... patterns and similar in committed cassettes. Wired into pre-push gate. Lands with the first cassette commit (until then, a dead lint with no inputs).
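The core of such a lint can be sketched as a regex scan. The pattern and function name below are assumptions, not the contents of check_cassette_redaction.py; the pattern tolerates the YAML layout in which the header name and its value sit on separate lines.

```python
import re

# "Authorization", any run of punctuation or whitespace (including a YAML
# line break and list dash), then "Bearer <token>". Case-insensitive.
BEARER = re.compile(r"authorization\W+bearer\s+\S+", re.IGNORECASE)


def find_bearer_tokens(text):
    """Return 1-based line numbers where a bearer credential starts."""
    return [text.count("\n", 0, m.start()) + 1 for m in BEARER.finditer(text)]
```

A pre-push wrapper would run this over each committed cassette and exit non-zero when any file yields a non-empty list.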


See also

  • ADR-045 — Test substrate
  • ADR-061 — LLM testing strategy
  • ADR-062 — CI secrets handling
  • Codex testing
  • docs/runbooks/llm-testing.md — recording sessions + drift triage
  • docs/runbooks/ci-secrets.md — Environment scoping + rotation