Test Failures
qa-replay green requires the worker's chat cassettes complete AND the Supabase env exported
Two independent run-state failures that each look like a code regression but are not. Both
surfaced while driving the local qa/playwright.config.ts replay gate at a Wave-2 merge.
Problem
A full local qa replay (pnpm exec playwright test --config qa/playwright.config.ts) failed
in two clusters that masqueraded as merged-code regressions:
- Customer cluster: dashboard-reads, portfolio, world-model-card, and the grant-dependent
  auth-gating specs all failed with "SUPABASE_ANON_KEY is required for the customer grant".
  Operations specs passed (the cockpit does its GoTrue grant server-side via a cookie; only
  the customer specs run the grant in the test process).
- Operations chat cluster: authoring-loop + candidate-review (and their downstream
  publish-deploy / world-model-card) timed out waiting for getByTestId('turn-assistant').
  The worker log showed openai.APIConnectionError and/or CassetteMissError during the
  'agent' task. decide-round-trip (which exercises the API's cassette replay) passed.
Investigation
Cluster 1 — env export
- The seed steps worked (operator + customer seeded), so the DB and API were fine.
- The Playwright child process (pnpm exec) inherits only exported env vars.
  tools/dev/resolve_supabase_env.sh emits bare KEY=val lines (it is designed for
  >> "$GITHUB_ENV", not for sourcing). eval "$(bash tools/dev/resolve_supabase_env.sh)"
  sets shell variables without exporting them: echo saw SUPABASE_ANON_KEY set, but the
  Playwright child never received it.
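The inheritance rule behind Cluster 1 can be reproduced in isolation. The sketch below is an illustration only: emit_env is a hypothetical stand-in for the resolver script, and DEMO_ANON_KEY is a made-up variable name, not one the project uses.

```bash
# Hypothetical stand-in for a script that prints bare KEY=val lines.
emit_env() { printf 'DEMO_ANON_KEY=abc123\n'; }

# Case 1: plain eval sets a shell variable but does not export it.
unset DEMO_ANON_KEY
eval "$(emit_env)"
unexported_child=$(sh -c 'printf "%s" "${DEMO_ANON_KEY:-<unset>}"')

# Case 2: set -a marks every variable assigned while it is on for export.
unset DEMO_ANON_KEY
set -a; eval "$(emit_env)"; set +a
exported_child=$(sh -c 'printf "%s" "${DEMO_ANON_KEY:-<unset>}"')

echo "parent shell sees:       $DEMO_ANON_KEY"
echo "child after plain eval:  $unexported_child"
echo "child after set -a eval: $exported_child"
```

In the first case the parent shell can echo the value while the child reports it as unset, which matches the symptom where echo looked fine but the customer specs still failed.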
Cluster 2 — incomplete world-agent cassettes
- The worker WAS in replay mode (env confirmed) and codegen replayed, but the agent's chat
  turn made a live call. openai.APIConnectionError is the openai SDK wrapping a
  transport-layer CassetteMissError after its retry loop (the cassette mock transport
  raised on a miss).
- The committed cassette set under qa/cassettes/world-agent-0.1.0__codegen-template-1/ had
  3 files; the current authoring loop makes more distinct LLM requests than that
  (reasoning → codegen → reasoning per propose, plus a codegen turn at approve). The
  unrecorded request (key 4407035…) missed.
- This was not a code change: the World Agent / codegen prompt path was byte-identical to
  the pre-merge baseline. The recorded set was simply incomplete for a clean cross-session
  replay.
Root Cause
- Env: resolve_supabase_env.sh output is unexported when eval'd, so SUPABASE_ANON_KEY (and
  SUPABASE_URL) never reach the Playwright process that runs the customer GoTrue grant.
- Cassettes: the committed world-agent cassettes did not cover every LLM request the
  authoring/approve flow issues; the missing request raised CassetteMissError, which the
  openai SDK surfaced as APIConnectionError, so the agent turn never settled.
Solution
Env: auto-export when sourcing

    # WRONG: sets unexported shell vars; the pnpm/playwright child can't see them
    eval "$(bash tools/dev/resolve_supabase_env.sh)"

    # RIGHT: auto-export everything the script sets
    set -a; eval "$(bash tools/dev/resolve_supabase_env.sh 2>/dev/null)"; set +a
    export SPECTRAL_OPERATOR_PASSWORD=operator-dev-password \
           SPECTRAL_QA_CUSTOMER_PASSWORD=qa-customer-dev-password \
           SPECTRAL_QA_DECISION_KEY=sp_live_qaFIXEDtestkey000000000000000000000000
    pnpm exec playwright test --config qa/playwright.config.ts

Cassettes: re-record against the live provider, then replay-verify
The agent path was unchanged, so the fix is to complete the recorded fixtures, not the code:
    # 1. Boot in RECORD mode (needs the live xai cred; the default World Agent model is xai/grok-4.3)
    SPECTRAL_LLM_CASSETTE_MODE=record SPECTRAL_LLM_CASSETTE_DIR="$PWD/qa/cassettes" \
      tools/dev/start.sh --full
    uv run python tools/dev/cold_start_seed.py   # operator, so qa_record can authenticate
    uv run python tools/dev/qa_record.py         # drives propose + approve; records every Grok call

    # 2. Verify byte-reproducible: reboot in REPLAY, db reset, reseed, then
    uv run python tools/dev/qa_record.py         # credential-free replay self-check
    pnpm exec playwright test --config qa/playwright.config.ts   # expect 105 passed / 6 skipped / 0 failed

    # 3. Redaction-scan before committing the cassettes
    uv run python tools/quality/check_cassette_redaction.py

Prevention
Best Practices
- Always wrap the eval of resolve_supabase_env.sh in set -a / set +a for any process that
  spawns a child needing the Supabase env (Playwright, seed scripts driven via pnpm).
- Treat openai.APIConnectionError in the worker during an 'agent' task as a cassette miss
  first (the SDK wraps transport exceptions), not a network/cred problem: grep the worker
  log for CassetteMissError and the missing key … to confirm.
- When the World Agent prompt/tooling or the authoring flow changes (and at least when chat
  specs miss), re-record with qa_record.py rather than hand-editing cassettes.
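One way to make the first practice hard to forget is a small wrapper that exports the resolver output only for the command it runs. This is a sketch, not something that exists in the repo: run_with_exported_env is a hypothetical helper name, and it assumes the resolver keeps printing bare KEY=val lines as described above.

```bash
# Hypothetical helper: run a command with a resolver's KEY=val output exported.
# Usage: run_with_exported_env <resolver-script> <command> [args...]
run_with_exported_env() {
  resolver=$1
  shift
  (
    # Subshell: the exported variables do not leak into the calling shell.
    set -a
    eval "$(bash "$resolver" 2>/dev/null)"
    set +a
    "$@"
  )
}

# Example invocation (paths taken from this document):
# run_with_exported_env tools/dev/resolve_supabase_env.sh \
#   pnpm exec playwright test --config qa/playwright.config.ts
```

Running the command inside a subshell keeps the caller's environment unchanged, so a stale key from an earlier session cannot linger into later commands.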
Warning Signs
- "SUPABASE_ANON_KEY is required" on customer specs while operations specs pass → env not
  exported.
- Chat specs time out on turn-assistant with APIConnectionError / CassetteMissError in the
  worker, while decide-round-trip passes → incomplete/stale world-agent cassettes (the
  API's decide-path cassettes are separate and may be fine).
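The second warning sign can be checked mechanically against a saved worker log. The sketch below is an assumption-laden illustration: classify_worker_log and its three labels are hypothetical, the log path is whatever your local worker writes to, and it matches only the two exception names mentioned above rather than any particular log format.

```bash
# Hypothetical triage helper: classify a worker log by exception name.
# Usage: classify_worker_log <path-to-worker-log>
classify_worker_log() {
  log_file=$1
  if grep -q 'CassetteMissError' "$log_file"; then
    # The miss is named explicitly: re-record the cassettes.
    echo "cassette-miss-confirmed"
  elif grep -q 'APIConnectionError' "$log_file"; then
    # The SDK may be wrapping a transport-level miss: check cassettes first.
    echo "cassette-miss-suspected"
  else
    echo "no-signature"
  fi
}
```

A "suspected" result is a prompt to look at the cassette set before chasing network or credential problems, in line with the practice listed above.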
References
- tools/dev/qa_record.py (record driver + credential-free replay self-check)
- tools/dev/resolve_supabase_env.sh, qa/README.md
- .github/workflows/ci.yml qa-replay job (authoritative env + run sequence)
- Sibling run-state gotchas: docs/solutions/test-failures/qa-replay-latent-drift-classes.md