Verify Remediation

Generated

This page is generated from qa/operations/specs/verify-remediation.md — the source of truth. Edit the spec, not this page.

Last run: not yet recorded (run the replay suite to populate status).

Overview

When the operator runs verify-at-review on a proposed rule (“Generate & run checks”), the rule’s deterministic logic is generated and its checks run. A failure carries a remediation tier that decides what the operator can do about it:

  • Semantic: the natural-language rule itself is ambiguous or underspecified, so consistent logic can’t be generated. The operator never edits code. Instead, they get a distinct treatment that shows the actionable feedback and a Restate in chat call-to-action, which opens the World-Agent authoring chat pre-filled with the rule and the feedback. Restating the rule re-extracts the spec and regenerates the logic through the existing revise→re-verify re-entry.
  • Mechanical: the agent’s own bounded auto-repair has already run and exhausted its attempts. The operator sees the plain actionable feedback (the existing treatment) and can retry or send the rule back. No chat-steer CTA is offered.

A successful verify shows neither failure treatment.
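
The tier-to-treatment mapping described above can be sketched as a small pure function. This is only an illustration: the type names, field names, and action labels (`VerifyOutcome`, `Treatment`, `treatmentFor`, `"restate-in-chat"`, and so on) are hypothetical and are not taken from the apps/operations code.

```typescript
// Hypothetical types: the spec defines the behaviour, not these names.
type VerifyOutcome =
  | { status: "success" }
  | { status: "failure"; tier: "semantic" | "mechanical"; feedback: string };

type Treatment =
  | { kind: "none" }
  | { kind: "semantic"; feedback: string; actions: readonly string[] }
  | { kind: "plain"; feedback: string; actions: readonly string[] };

// Decide which failure treatment (if any) the review surface should show.
function treatmentFor(outcome: VerifyOutcome): Treatment {
  if (outcome.status === "success") {
    // A successful verify shows neither failure treatment.
    return { kind: "none" };
  }
  switch (outcome.tier) {
    case "semantic":
      // Distinct treatment: feedback plus the chat-steer call-to-action.
      return {
        kind: "semantic",
        feedback: outcome.feedback,
        actions: ["restate-in-chat"],
      };
    case "mechanical":
      // Plain feedback: retry or send back, with no chat-steer CTA.
      return {
        kind: "plain",
        feedback: outcome.feedback,
        actions: ["retry", "send-back"],
      };
  }
}
```

Modelling the outcome as a discriminated union keeps the "no code-editing affordance" rule easy to check, since no branch ever returns an edit action.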

Preconditions

  • Signed in as the seeded operator.
  • A ruleset with a proposed rule whose verify-at-review yields a semantic failure — i.e. an ambiguous / underspecified rule (the semantic fixture rule, minted via the chat propose turn and served from its recorded cassette).
  • On the proposed-rule detail / review surface, with the proposed rule selected.

Scenarios

1. A semantic failure shows the distinct chat-steer treatment

  • Select the ambiguous proposed rule and run “Generate & run checks”
  • Expected: A distinct semantic-remediation treatment is shown (not the plain error line): the actionable feedback plus a Restate in chat call-to-action. The operator is not offered any code-editing affordance.

2. The Restate CTA routes into chat-steer, pre-filled

  • From the semantic treatment, activate Restate in chat
  • Expected: The operator lands on the World-Agent authoring chat (the Assistant page) for the same ruleset, with the composer pre-filled with the rule’s text and the codegen feedback so they can restate it.
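
One way the pre-fill in scenario 2 could work is to carry the draft text in the link that the CTA follows. The sketch below assumes a query-parameter hand-off. The route shape (`/rulesets/:id/assistant`), the `draft` parameter, and the helper names are all hypothetical, and the real app may pass the draft through client state instead.

```typescript
// Hypothetical context carried from the semantic treatment to the chat.
interface RestateContext {
  rulesetId: string;
  ruleText: string;
  feedback: string;
}

// Compose the text the chat composer would be pre-filled with:
// the rule's text followed by the codegen feedback.
function composerPrefill(ctx: RestateContext): string {
  return `${ctx.ruleText}\n\nCodegen feedback: ${ctx.feedback}`;
}

// Build a link to the Assistant page for the same ruleset.
// The path and the "draft" parameter are assumptions for illustration.
function restateInChatHref(ctx: RestateContext): string {
  const draft = encodeURIComponent(composerPrefill(ctx));
  return `/rulesets/${encodeURIComponent(ctx.rulesetId)}/assistant?draft=${draft}`;
}
```

An e2e check for this scenario would then assert two things: the landing page belongs to the same ruleset, and the composer contains both the rule text and the feedback.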

3. A mechanical failure keeps the plain feedback treatment (unit-covered)

  • Run verify-at-review on a rule whose failure is mechanical (auto-repair exhausted)
  • Expected: The plain actionable-feedback treatment is shown (the “could not generate the deterministic logic” line) with no Restate-in-chat call-to-action.
  • Coverage: the live codegen cannot be steered to the mechanical tier on demand (a deterministic mechanical failure is impractical to record), so this branch is covered by the apps/operations RTL unit test (candidate-verify.dom.test.tsx), not the e2e replay.

4. A successful verify shows neither failure treatment (unit-covered)

  • Run verify-at-review on a rule that compiles cleanly
  • Expected: Neither the semantic treatment nor the plain error line is shown; the generated deterministic logic surfaces in the evidence bundle.
  • Coverage: the success branch is covered by the same RTL unit test; the e2e replay focuses on the genuinely new semantic chat-steer surface (scenarios 1–2).

Test Data

| Label | Value | Notes |
| --- | --- | --- |
| Semantic fixture rule | FIXTURE_AMBIGUOUS_RULE_TEXT: a rule whose verify-at-review codegen cannot generate a predicate satisfying its behavioral spec. | Mirrored in qa_fixtures.py / qa_record.py / tests/_chat.ts. The verify-at-review codegen is recorded as a semantic failure. Because the codegen is non-deterministic, the recorder asserts the semantic classification at record time and fails loudly if it does not hold; re-record if the rule ever compiles. |
| Clean fixture rule | A taxpayer with gross income below the filing threshold need not file. | The existing FIXTURE_RULE_TEXT; used by the authoring-loop / candidate-review warmup. |
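
The record-time guard on the semantic fixture can be pictured as a fail-loud assertion. The sketch below uses TypeScript for illustration only, with hypothetical names (`Classification`, `assertSemanticAtRecordTime`); the file name qa_record.py suggests the actual recorder is Python, and its real interface is not shown on this page.

```typescript
// Hypothetical classification of a verify-at-review run.
type Classification = "semantic" | "mechanical" | "success";

// Fail loudly if the run being recorded is not a semantic failure,
// so a cassette with the wrong outcome is never saved silently.
function assertSemanticAtRecordTime(observed: Classification): void {
  if (observed !== "semantic") {
    throw new Error(
      `Semantic fixture recorded as "${observed}"; expected "semantic". Re-record the cassette.`
    );
  }
}
```

Because the codegen is non-deterministic, a guard like this turns an unlucky recording into an immediate error at record time instead of a confusing replay failure later.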