Local qa-replay run before merge catches latent drift the unit suites cannot see
Problem
The SPEC-604 merge gate ran the full-stack qa replay suite locally and found four latent breaks already on main — none caused by the branch under review. The suite had simply not been executed since several earlier merges (FORCE-RLS, the masked-identifier pass, the version-history overhaul): under direct-merge-to-main, CI only runs when something is pushed, so the repo accrues qa drift invisibly while every unit suite stays green.
Root Cause(s) — the four drift classes
- `SET ROLE platform_role` worlds reads without `app.world_id` (`tools/dev/qa_customer_seed.py::_deploy`). Under `FORCE ROW LEVEL SECURITY` (SPEC-564) the enshrined-rule SELECT returned zero rows → `NoEnshrinedRulesError`. This is the second confirmed instance of the class predicted when SPEC-564 landed (the first was integration-test teardown). Fix: `SELECT set_config('app.world_id', <world>, false)` after `SET ROLE` in any tooling that reads/writes worlds tables as `platform_role`.
- qa helpers coupled to UI display conventions. `_worlds.ts` `createWorld` read the created world id from the success panel’s `<code>` text, which the masked-identifier pass (`730218e1`) reduced to a last-6 handle, so every world-scoped test drove `/worlds/<6-chars>/…` and 422’d. Fix on both sides of the convention: the masked `<code>` carries the full id in `data-id`/`title` (which the convention promises), and the helper reads `data-id` with a text fallback. Corollary: a test that asserts `toContainText(worldId)` against a masked display only “passed” while the helper was returning the short id; assert `worldId.slice(-6)` visible + `[data-id="${worldId}"]` present instead.
- Copy assertions vs. overhauled surfaces. `publish-deploy.spec.ts` asserted “Version 1” / “Published:”, which is pre-version-history-overhaul row copy. Any surface rework must re-run the suite or the assertions rot.
- Bare `or()` locators with co-visible alternatives. Playwright `or()` fails strict mode when more than one alternative is visible at once (empty-state text + section heading; alert + version line). Racy against parallel workers that change data mid-run. Fix: append `.first()` when alternatives can co-render.
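
The helper-side fixes for the second and fourth drift classes can be sketched as follows. This is a minimal illustration, not the code in `_worlds.ts`: `LocatorLike`, `pickWorldId`, `readWorldId`, `maskedHandle`, `dataIdSelector`, and `eitherOf` are hypothetical names, and `LocatorLike` is a local stand-in that mirrors only the Playwright `Locator` methods used here (`getAttribute`, `innerText`, `or`, `first`).

```typescript
// Structural stand-in for the handful of Playwright Locator methods used here,
// so the sketch type-checks without importing @playwright/test.
interface LocatorLike {
  getAttribute(name: string): Promise<string | null>;
  innerText(): Promise<string>;
  or(other: LocatorLike): LocatorLike;
  first(): LocatorLike;
}

// Prefer the full id carried in data-id; fall back to the visible text so the
// helper still works on a surface that does not set the attribute.
function pickWorldId(dataId: string | null, visibleText: string): string {
  const fromAttribute = dataId?.trim();
  return fromAttribute ? fromAttribute : visibleText.trim();
}

// Hypothetical async wrapper around the <code> element in the success panel.
async function readWorldId(code: LocatorLike): Promise<string> {
  return pickWorldId(await code.getAttribute("data-id"), await code.innerText());
}

// What a masked display shows: the last six characters of the id.
function maskedHandle(worldId: string): string {
  return worldId.slice(-6);
}

// Selector for asserting that the full id is present on the masked element.
function dataIdSelector(worldId: string): string {
  return `[data-id="${worldId}"]`;
}

// or() matches either locator; first() narrows to a single element so strict
// mode does not fail when both alternatives are visible at the same time.
function eitherOf(a: LocatorLike, b: LocatorLike): LocatorLike {
  return a.or(b).first();
}
```

In a spec, the masked-display assertion would then check that `maskedHandle(worldId)` is visible and that `dataIdSelector(worldId)` resolves, rather than asserting the full id as text.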
Solution
Run the replay gate locally as part of every merge gate while merges go directly to main (sequence and gotchas):

    bash tools/dev/start.sh --stop && supabase db reset
    SPECTRAL_LLM_CASSETTE_MODE=replay SPECTRAL_LLM_CASSETTE_DIR=qa/cassettes \
      XAI_API_KEY=placeholder-for-replay tools/dev/start.sh --full
    set -a; eval "$(bash tools/dev/resolve_supabase_env.sh)"; set +a   # qa needs SUPABASE_ANON_KEY
    uv run python tools/dev/cold_start_seed.py
    SPECTRAL_MODULE_STORE_ROOT="$PWD/.local/module-store" \
      uv run python tools/dev/qa_customer_seed.py   # must match the booted store
    SPECTRAL_OPERATOR_PASSWORD=… SPECTRAL_QA_CUSTOMER_PASSWORD=… SPECTRAL_QA_DECISION_KEY=… \
      pnpm exec playwright test --config qa/playwright.config.ts

Boot-state pitfall: a lingering old API on :8000 makes the replay-mode boot log `Address already in use` while `/health` still answers from the stale process. Kill the pid-file processes AND whatever holds the ports before re-booting, or the suite runs against a non-replay API.
Three more run-state pitfalls (the first two from the Wave 0 merge train, 2026-06-09; the third from the Wave 1 cockpit train):
- Stale `spectral_workers` from a prior session starve chat replay. `start.sh --stop` does not reliably kill an orphaned worker runtime; a leftover one competes for agent-task chat rows, so World-Agent-driven scenarios (authoring-loop, candidate-review, world-model-card, publish-deploy) time out waiting for `turn-assistant` while every other test passes. Before the replay boot: kill the port holders (8000/3000/3001) AND `pkill -f spectral_workers`. (One worker runtime shows as two processes; the multiprocessing spawn child is normal.)
- The first suite run against a cold stack flakes on chat-heavy `beforeAll` hooks (now absorbed by a per-surface warm-up setup project). The first chat-driving hook on a freshly booted stack pays every one-time cost at once: Vite on-demand compilation of the Assistant route graph, the worker’s first agent-task (LangGraph compile + model-client init + cassette load + Realtime channel join), and cold DB pools. Together these exceeded the 60s `turn-assistant` wait and failed whichever chat spec ran first. Boot-script health-waits only prove the HTTP listeners answer, not that those paths are warm. The config now wires a Playwright setup project per surface (`<surface>/tests/_warmup.setup.ts`, listed in the surface project’s `dependencies`) that pays that cost ONCE against a generous bound: the operations warm-up drives a full chat propose round-trip, and the customer warm-up signs in and loads the portfolio, so every timed spec runs warm. The first run is now the real verdict; there is no “throwaway run” step. A cold full run goes green in one pass (operations 67 passed, customer 60 passed, 0 fail). Any first-run failure is real (selector, assertion, 4xx, or genuine cold-path regression), so investigate it; the Wave 0 train caught a real rail-status bug exactly this way. Note: CI sets `retries: 2`, which used to silently mask this class on the first retry; locally `retries: 0` exposed it.
- The worker Realtime bridge degrades after many warm runs in one session. Across a long merge train (~10+ qa-replay runs without a reboot, Wave 1 cockpit), the worker’s Realtime WebSocket connections start failing (`code: 1006` + `join push timeout for channel realtime:world-agent:conversation:…` in `.workers.log`), which starves every chat-seed `beforeAll` and makes the suite runtime balloon (35s → 1.8m → 3.6m) with progressively more chat-dependent specs failing. This is NOT branch drift; it is accumulated stack state. A warm re-run alone does not recover it; a clean reboot does (`start.sh --stop` + kill port-holders + `pkill -f spectral_workers` + `supabase db reset` + reboot + reseed). Reboot the stack every few merges across a long train, and reboot-then-verify the moment runtime balloons or the failing set grows between runs.
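
The per-surface warm-up wiring described above can be sketched as a list of Playwright project entries. This is an illustration under stated assumptions, not the contents of `qa/playwright.config.ts`: the `<surface>-warmup` project names, the `*.spec.ts` glob, and the `ProjectSketch` type are hypothetical, while `name`, `testMatch`, and `dependencies` are the Playwright project options the text refers to.

```typescript
// Local stand-in for the subset of Playwright project options used below
// (name, testMatch, dependencies), to keep the sketch import-free.
interface ProjectSketch {
  name: string;
  testMatch: string;
  dependencies?: string[];
}

// One warm-up setup project plus the timed project that depends on it.
function surfaceProjects(surface: string): ProjectSketch[] {
  const warmupName = `${surface}-warmup`; // hypothetical naming scheme
  return [
    // Runs first and pays the cold-start cost once for this surface.
    { name: warmupName, testMatch: `**/${surface}/tests/_warmup.setup.ts` },
    // The surface's specs start only after the warm-up project has passed.
    {
      name: surface,
      testMatch: `**/${surface}/tests/**/*.spec.ts`, // assumed spec layout
      dependencies: [warmupName],
    },
  ];
}

// The two surfaces named in the text.
const projects: ProjectSketch[] = [
  ...surfaceProjects("operations"),
  ...surfaceProjects("customer"),
];
```

An array shaped like `projects` is what would be passed as the `projects` field of the Playwright config; because each surface depends only on its own warm-up, a warm-up failure on one surface does not block the other.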
Prevention
- Treat “unit suites green” as necessary, not sufficient, for changes that touch UI copy, display conventions, seeds, or RLS posture — each has a qa-side consumer the unit suites never execute.
- A UI copy/convention sweep must cover `qa/`: generated `*.spec.ts`, the NL spec sources under `qa/*/specs/` (regen would resurrect stale assertions), and the helper files (`_*.ts`; they are not `*.spec.ts` and a spec-glob grep misses them).
- Grep `SET ROLE platform_role` across `tools/` and `tests/` whenever an RLS posture change lands; every hit needs an `app.world_id` audit.
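
The `SET ROLE platform_role` audit can be partly mechanized. The sketch below is a rough heuristic, not an existing repo tool: `auditRoleSwitches`, `RoleSwitchHit`, and the five-line lookahead are assumptions made for illustration. It flags each role switch and reports whether `app.world_id` is mentioned on the same line or within the next few lines; anything reported as unscoped still needs a human read.

```typescript
interface RoleSwitchHit {
  line: number;    // 1-based line number of the SET ROLE statement
  scoped: boolean; // true if app.world_id appears shortly after it
}

// Scan one file's text for role switches and check for a nearby app.world_id.
function auditRoleSwitches(source: string, lookahead = 5): RoleSwitchHit[] {
  const lines = source.split("\n");
  const hits: RoleSwitchHit[] = [];
  lines.forEach((text, index) => {
    if (!/SET\s+ROLE\s+platform_role/i.test(text)) return;
    // Look at the matching line plus the next `lookahead` lines.
    const following = lines.slice(index, index + 1 + lookahead).join("\n");
    hits.push({ line: index + 1, scoped: following.includes("app.world_id") });
  });
  return hits;
}
```

Running this over every file under `tools/` and `tests/` and listing the hits with `scoped: false` gives a starting worklist. The lookahead window is deliberately small, so a `set_config` call placed far from its `SET ROLE` shows up as unscoped and gets reviewed by hand.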
References
- SPEC-604 merge `a31899ea`; gate-repair commit `283b465`
- SPEC-564 FORCE-RLS merge `7a72ec99`; masked-identifier pass `730218e1`