Operations

Verify Remediation

Generated

This page is generated from qa/operations/specs/verify-remediation.md — the source of truth. Edit the spec, not this page.

Last run: not yet recorded (run the replay suite to populate status).

Overview

When the operator runs verify-at-review on a proposed rule (“Generate & run checks”), the rule’s deterministic logic is generated and its checks run. A failure carries a remediation tier that decides what the operator can do about it:

Semantic: the natural-language rule itself is ambiguous or underspecified, so consistent logic can’t be generated. The operator never edits code. Instead, they see a distinct treatment that shows the actionable feedback and a Restate in chat call-to-action, which opens the World-Agent authoring chat pre-filled with the rule and the feedback. Restating the rule re-extracts the spec and regenerates the logic (the existing revise→re-verify re-entry).
Mechanical: the agent’s own bounded auto-repair has already run and exhausted its attempts. The operator sees the plain actionable feedback (today’s treatment) and can retry or send the rule back. There is no chat-steer CTA.

A successful verify shows neither failure treatment.
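
The tier-to-treatment mapping described above can be sketched as a small pure function. This is a minimal illustration only: the type and function names (`VerifyResult`, `Treatment`, `treatmentFor`) are hypothetical and are not taken from the apps/operations code.

```typescript
// Hypothetical types for illustration; the real result shape may differ.
type RemediationTier = "semantic" | "mechanical";

type VerifyResult =
  | { ok: true }
  | { ok: false; tier: RemediationTier; feedback: string };

type Treatment =
  | { kind: "none" }
  | { kind: "semantic"; feedback: string; showRestateInChat: true }
  | { kind: "mechanical"; feedback: string; showRestateInChat: false };

// Decide which failure treatment (if any) the review surface should show.
function treatmentFor(result: VerifyResult): Treatment {
  // A successful verify shows neither failure treatment.
  if (result.ok) {
    return { kind: "none" };
  }
  // Semantic failures get the feedback plus the Restate in chat CTA.
  if (result.tier === "semantic") {
    return { kind: "semantic", feedback: result.feedback, showRestateInChat: true };
  }
  // Mechanical failures keep the plain feedback line, with no chat-steer CTA.
  return { kind: "mechanical", feedback: result.feedback, showRestateInChat: false };
}
```

Modelling the treatment as a discriminated union keeps the "no Restate CTA on mechanical failures" rule in the type itself, so a component cannot render the CTA for the wrong tier by accident.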

Preconditions

Signed in as the seeded operator.
A ruleset with a proposed rule whose verify-at-review yields a semantic failure — i.e. an ambiguous / underspecified rule (the semantic fixture rule, minted via the chat propose turn and served from its recorded cassette).
On the proposed-rule detail / review surface, with the proposed rule selected.

Scenarios

1. A semantic failure shows the distinct chat-steer treatment

Select the ambiguous proposed rule and run “Generate & run checks”
Expected: A distinct semantic-remediation treatment is shown (not the plain error line): the actionable feedback plus a Restate in chat call-to-action. The operator is not offered any code-editing affordance.

2. The Restate CTA routes into chat-steer, pre-filled

From the semantic treatment, activate Restate in chat
Expected: The operator lands on the World-Agent authoring chat (the Assistant page) for the same ruleset, with the composer pre-filled with the rule’s text and the codegen feedback so they can restate it.
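
One way the pre-fill in scenario 2 could be assembled is sketched below. Everything here is an assumption made for illustration: the `/assistant` path, the `ruleset` and `prefill` query parameters, and the function names are hypothetical, and the real app may pass the pre-fill through router state or a store instead of the URL.

```typescript
// Hypothetical pre-fill payload; field names are illustrative only.
interface RestatePrefill {
  rulesetId: string;
  composerText: string;
}

// Combine the rule text and the codegen feedback into one composer message.
function buildRestatePrefill(
  rulesetId: string,
  ruleText: string,
  feedback: string
): RestatePrefill {
  const composerText = [
    `Rule: ${ruleText.trim()}`,
    `Feedback: ${feedback.trim()}`,
  ].join("\n\n");
  return { rulesetId, composerText };
}

// Encode the pre-fill into a link for the same ruleset (assumed route shape).
function toAssistantUrl(prefill: RestatePrefill): string {
  const params = new URLSearchParams({
    ruleset: prefill.rulesetId,
    prefill: prefill.composerText,
  });
  return `/assistant?${params.toString()}`;
}
```

An e2e assertion for this scenario would then check two things: that the destination is scoped to the same ruleset, and that the composer contains both the rule’s text and the feedback.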

3. A mechanical failure keeps the plain feedback treatment (unit-covered)

Run verify-at-review on a rule whose failure is mechanical (auto-repair exhausted)
Expected: The plain actionable-feedback treatment is shown (the “could not generate the deterministic logic” line) with no Restate-in-chat call-to-action.
Coverage: the live codegen cannot be steered to the mechanical tier on demand (a deterministic mechanical failure is impractical to record), so this branch is covered by the apps/operations RTL unit test (candidate-verify.dom.test.tsx), not the e2e replay.

4. A successful verify shows neither failure treatment (unit-covered)

Run verify-at-review on a rule that compiles cleanly
Expected: Neither the semantic treatment nor the plain error line is shown; the generated deterministic logic surfaces in the evidence bundle.
Coverage: the success branch is covered by the same RTL unit test; the e2e replay focuses on the genuinely-new semantic chat-steer surface (scenarios 1–2).

Test Data

Label	Value	Notes
Semantic fixture rule	`FIXTURE_AMBIGUOUS_RULE_TEXT`: a rule for which the verify-at-review codegen cannot generate a predicate that satisfies its behavioral spec.	Mirrored in `qa_fixtures.py` / `qa_record.py` / `tests/_chat.ts`. The verify-at-review codegen is recorded as a `semantic` failure. Because the codegen is non-deterministic, the recorder asserts the `semantic` classification at record time and fails loudly if it does not hold; re-record if the rule ever compiles.
Clean fixture rule	`A taxpayer with gross income below the filing threshold need not file.`	The existing `FIXTURE_RULE_TEXT`; used by the authoring-loop / candidate-review warmup.
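
The record-time guard mentioned in the notes for the semantic fixture can be sketched as follows. The table places the actual recorder in `qa_record.py`; this sketch is written in TypeScript only for illustration, and the names `Classification` and `assertSemanticAtRecordTime` are hypothetical.

```typescript
// Hypothetical classification labels for a recorded verify-at-review run.
type Classification = "semantic" | "mechanical" | "success";

// Fail loudly at record time if the non-deterministic codegen did not
// produce the semantic failure that the fixture is meant to capture.
function assertSemanticAtRecordTime(
  ruleText: string,
  observed: Classification
): void {
  if (observed !== "semantic") {
    throw new Error(
      `Expected a semantic failure but recorded "${observed}"; ` +
        `re-record the cassette. Rule: ${ruleText}`
    );
  }
}
```

Failing at record time, rather than at replay time, means a cassette that no longer exercises the semantic branch is never saved in the first place.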
