
qa-replay green requires the worker's chat cassettes complete AND the Supabase env exported

Two independent run-state failures that each look like a code regression but are not. Both surfaced while driving the local qa/playwright.config.ts replay gate at a Wave-2 merge.

Problem

A full local qa replay (pnpm exec playwright test --config qa/playwright.config.ts) failed in two clusters that masqueraded as merged-code regressions:

  1. Customer cluster: dashboard-reads, portfolio, world-model-card, and the grant-dependent auth-gating specs all failed with "SUPABASE_ANON_KEY is required for the customer grant". Operations specs passed (the cockpit does its GoTrue grant server-side via a cookie; only the customer specs run the grant in the test process).
  2. Operations chat cluster: authoring-loop + candidate-review (and their downstream publish-deploy / world-model-card) timed out waiting for getByTestId('turn-assistant'). The worker log showed openai.APIConnectionError and/or CassetteMissError during the 'agent' task. decide-round-trip (which exercises the API’s cassette replay) passed.

Investigation

Cluster 1 — env export

  • The seed steps worked (operator + customer seeded), so the DB and API were fine.
  • The Playwright child process (pnpm exec) inherits only exported env vars.
  • tools/dev/resolve_supabase_env.sh emits bare KEY=val lines (it is designed for >> "$GITHUB_ENV", not for sourcing). eval "$(bash tools/dev/resolve_supabase_env.sh)" sets shell variables without exporting them: echo saw SUPABASE_ANON_KEY set, but the Playwright child never received it.
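
The export behavior described above can be reproduced in isolation. The sketch below is a minimal demonstration, not the project's tooling: emit_env is a hypothetical stand-in for resolve_supabase_env.sh, the DEMO_* names and values are placeholders, and child_sees is an illustrative helper that reports what a spawned process inherits.

```shell
# Hypothetical stand-in for resolve_supabase_env.sh: prints bare KEY=val lines.
emit_env() {
  printf '%s\n' \
    'DEMO_SUPABASE_URL=http://127.0.0.1:54321' \
    'DEMO_SUPABASE_ANON_KEY=placeholder-anon-key'
}

# Report the value a child process sees for the named variable, or "unset".
child_sees() {
  sh -c 'printf "%s" "${'"$1"':-unset}"'
}

# Plain eval: the variables exist in this shell but are not exported.
unset DEMO_SUPABASE_URL DEMO_SUPABASE_ANON_KEY
eval "$(emit_env)"
plain_result="$(child_sees DEMO_SUPABASE_ANON_KEY)"

# eval under set -a: every assignment is exported automatically.
unset DEMO_SUPABASE_URL DEMO_SUPABASE_ANON_KEY
set -a; eval "$(emit_env)"; set +a
exported_result="$(child_sees DEMO_SUPABASE_ANON_KEY)"

echo "plain eval:  child sees $plain_result"
echo "set -a eval: child sees $exported_result"
```

In the first case the parent shell can echo the variable, which is why the failure is easy to misread; only the child's view differs.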

Cluster 2 — incomplete world-agent cassettes

  • The worker WAS in replay mode (env confirmed) and codegen replayed, but the agent’s chat turn made a live call. openai.APIConnectionError is the openai SDK wrapping a transport-layer CassetteMissError after its retry loop (the cassette mock transport raised on a miss).
  • The committed cassette set under qa/cassettes/world-agent-0.1.0__codegen-template-1/ had 3 files; the current authoring loop makes more distinct LLM requests than that (reasoning → codegen → reasoning per propose, plus a codegen turn at approve). The unrecorded request (key 4407035…) missed.
  • This was not a code change — the World Agent / codegen prompt path was byte-identical to the pre-merge baseline. The recorded set was simply incomplete for a clean cross-session replay.

Root Cause

  1. Env: resolve_supabase_env.sh output is unexported when eval’d, so SUPABASE_ANON_KEY (and SUPABASE_URL) never reach the Playwright process that runs the customer GoTrue grant.
  2. Cassettes: the committed world-agent cassettes did not cover every LLM request the authoring/approve flow issues; the missing request raised CassetteMissError, which the openai SDK surfaced as APIConnectionError, so the agent turn never settled.

Solution

Env — auto-export when sourcing

# WRONG — sets unexported shell vars; the pnpm/playwright child can't see them
eval "$(bash tools/dev/resolve_supabase_env.sh)"
# RIGHT — auto-export everything the script sets
set -a; eval "$(bash tools/dev/resolve_supabase_env.sh 2>/dev/null)"; set +a
export SPECTRAL_OPERATOR_PASSWORD=operator-dev-password \
SPECTRAL_QA_CUSTOMER_PASSWORD=qa-customer-dev-password \
SPECTRAL_QA_DECISION_KEY=sp_live_qaFIXEDtestkey000000000000000000000000
pnpm exec playwright test --config qa/playwright.config.ts

Cassettes — re-record against the live provider, then replay-verify

The agent path was unchanged, so the fix is to complete the recorded fixtures, not the code:

# 1. Boot in RECORD mode (needs the live xai cred — the default World Agent model is xai/grok-4.3)
SPECTRAL_LLM_CASSETTE_MODE=record SPECTRAL_LLM_CASSETTE_DIR="$PWD/qa/cassettes" \
tools/dev/start.sh --full
uv run python tools/dev/cold_start_seed.py # operator, so qa_record can authenticate
uv run python tools/dev/qa_record.py # drives propose + approve; records every Grok call
# 2. Verify byte-reproducible: reboot in REPLAY, db reset, reseed, then
uv run python tools/dev/qa_record.py # credential-free replay self-check
pnpm exec playwright test --config qa/playwright.config.ts # expect 105 passed / 6 skipped / 0 failed
# 3. Redaction-scan before committing the cassettes
uv run python tools/quality/check_cassette_redaction.py

Prevention

Best Practices

  • Always wrap the eval of resolve_supabase_env.sh in set -a / set +a for any process that spawns a child needing the Supabase env (Playwright, seed scripts driven via pnpm).
  • Treat openai.APIConnectionError in the worker during an 'agent' task as a cassette miss first (the SDK wraps transport exceptions), not a network/cred problem; grep the worker log for CassetteMissError and the missing key … to confirm.
  • When the World Agent prompt/tooling or the authoring flow changes (or, at minimum, whenever chat specs hit a cassette miss), re-record with qa_record.py rather than hand-editing cassettes.
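
The log triage in the second practice can be sketched as a small shell helper. This is an illustration under assumptions: classify_worker_log and its three labels are hypothetical names, the worker log location is not given here and must be supplied by the caller, and the helper matches only the two exception names quoted in this document.

```shell
# Classify a worker log by the exception names it contains.
# Prints one of: cassette-miss-confirmed, cassette-miss-suspected, no-llm-error.
classify_worker_log() {
  log_file="$1"
  if grep -q 'CassetteMissError' "$log_file"; then
    # The transport-level miss is visible directly.
    echo "cassette-miss-confirmed"
  elif grep -q 'APIConnectionError' "$log_file"; then
    # Only the SDK wrapper is visible; treat it as a miss until disproved.
    echo "cassette-miss-suspected"
  else
    echo "no-llm-error"
  fi
}

# Usage (the path is a placeholder, not a real project path):
#   classify_worker_log path/to/worker.log
```

A "suspected" result still deserves a look at the surrounding log lines before any network or credential debugging starts.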

Warning Signs

  • SUPABASE_ANON_KEY is required on customer specs while operations specs pass → env not exported.
  • Chat specs time out on turn-assistant with APIConnectionError/CassetteMissError in the worker, while decide-round-trip passes → incomplete/stale world-agent cassettes (the API’s decide-path cassettes are separate and may be fine).
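
The first warning sign can also be caught before the run starts. The following is a minimal preflight sketch, assuming a POSIX shell; require_exported is a hypothetical helper, not part of the repo. It checks the output of env, which lists only exported variables, so a variable that is merely set in the current shell fails the check.

```shell
# Return 0 only if every named variable is exported (visible to child processes).
require_exported() {
  missing=0
  for name in "$@"; do
    # env prints only exported variables, which is exactly what a child inherits.
    if ! env | grep -q "^${name}="; then
      echo "not exported: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: gate the replay run on the two variables the customer grant needs.
#   require_exported SUPABASE_URL SUPABASE_ANON_KEY \
#     && pnpm exec playwright test --config qa/playwright.config.ts
```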

References

  • tools/dev/qa_record.py (record driver + credential-free replay self-check)
  • tools/dev/resolve_supabase_env.sh, qa/README.md
  • .github/workflows/ci.yml qa-replay job (authoritative env + run sequence)
  • Sibling run-state gotchas: docs/solutions/test-failures/qa-replay-latent-drift-classes.md