Decisions
ADR-029: US Individual Tax Preparation as the first world-model domain
Status: Accepted (2026-04-20)
Context
The 0.3.0 rebuild needs a first world model domain to:
- Bootstrap the three-source corpus (ADR-022) and the minimum-viable rule set that unblocks end-to-end scan-pipeline integration testing.
- Anchor the packages/test-agents harness so CI exercises the full optimization loop against realistic stimuli rather than synthetic dummies (SPEC-249 amendment, per SPEC-293).
- Double as a long-term demo and dev test-bed: the domain operators pick has to stay interesting enough to carry Spectral through alpha and onto first customers.
Four domains were on the short list: US Individual Tax Prep, Healthcare Prior Authorization, Legal Contract Analysis, and Building-Code Compliance. Selection criteria:
- Public Tier-1 sources accessible online. The world model’s authority (per ADR-025) depends on traceability to independently-established sources. Paywalled, restricted, or SME-gated material creates a bottleneck at the seeding step.
- Diverse and interesting rules. Toy domains produce toy rules. The rule space has to exercise the full provenance/mutation/holdout machinery.
- Solo-builder validatable. Until first design partners land, the operator authoring rules is the founder. Domains that require a domain expert on call to validate each rule stall dogfooding.
- Long-term demo value. The domain stays in the test-agents harness indefinitely; it must not become embarrassing once customers are live.
Decision
US Individual Federal Tax Preparation (Form 1040 and its schedules) is the first world-model domain.
- Authoritative sources (Tier-1): IRS publications — Pub 17 (overall individual tax guide), Pub 501 (dependents, standard deduction, filing info), Pub 503 (child and dependent care), Pub 526 (charitable contributions), Pub 936 (home mortgage interest), the Form 1040 Instructions, and related schedules (A, B, C, D, etc.). All freely available on irs.gov.
- Initial problem space: filing-status determination + standard deduction. Narrow enough to reach the 50-enshrined-rule minimum-viable floor (per SPEC-239 Corpus Bootstrap) in a tractable time; broad enough to include edge cases that exercise provenance-tier and mutation-test behavior.
- Minimum viable world model: 50 enshrined rules in that initial problem space.
- Healthcare Prior Authorization is preserved as a secondary domain, targeted at the first design partner rather than dev bootstrap — the founder has the expertise, but SME review at every rule would block solo iteration.
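The 50-enshrined-rule floor in the decision above can be expressed as a simple check. The following is a minimal sketch, not the project's actual data model: the `Rule`, `RuleStatus`, `enshrined_count`, and `meets_floor` names, the two status values, and the rule IDs are all hypothetical. Only the floor of 50 and the two problem-space areas come from this ADR.

```python
from dataclasses import dataclass
from enum import Enum

MIN_VIABLE_RULES = 50  # floor stated in this ADR (per SPEC-239)


class RuleStatus(Enum):
    """Hypothetical lifecycle states; the real review states may differ."""
    CANDIDATE = "candidate"
    ENSHRINED = "enshrined"


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record, reduced to what the floor check needs."""
    rule_id: str
    problem_space: str  # e.g. "filing_status" or "standard_deduction"
    status: RuleStatus


def enshrined_count(rules: list[Rule]) -> int:
    """Count only rules that have been promoted to enshrined."""
    return sum(1 for r in rules if r.status is RuleStatus.ENSHRINED)


def meets_floor(rules: list[Rule], floor: int = MIN_VIABLE_RULES) -> bool:
    """True once the enshrined rules reach the minimum-viable floor."""
    return enshrined_count(rules) >= floor


# Usage: 49 enshrined rules plus 5 candidates is still below the floor.
rules = [Rule(f"fs-{i:03d}", "filing_status", RuleStatus.ENSHRINED) for i in range(49)]
rules += [Rule(f"sd-{i:03d}", "standard_deduction", RuleStatus.CANDIDATE) for i in range(5)]
print(meets_floor(rules))  # False: candidates do not count toward the floor

rules.append(Rule("sd-100", "standard_deduction", RuleStatus.ENSHRINED))
print(meets_floor(rules))  # True: exactly 50 enshrined rules
```

Counting only enshrined rules keeps drafts in review from satisfying the floor prematurely.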
Alternatives considered
- Healthcare Prior Authorization (rejected as first domain). Strong founder expertise (real operational experience). Rejected because Tier-1 source material is fragmented across payer-specific clinical policies, many behind login walls; SME review per rule would block solo iteration. Retained as secondary for when a design partner engages.
- Legal Contract Analysis (rejected). Public contracts exist but “rules” are jurisdiction-dependent and often come from case law rather than statutory text. Provenance chains become indirect, and the solo-builder-validatable criterion fails — legal reasoning is not a domain the founder can independently validate.
- Building-Code Compliance (rejected). Codes are public but fragmented across municipalities, with significant local amendments. Bootstrapping a single world model against one jurisdiction is doable, but the long-term-demo criterion suffers: customers expect “building code compliance” to mean their jurisdiction, which multiplies the world-model surface.
- Financial Compliance (FINRA / SEC) (rejected). Source material is dense and mostly freely available, but the rule structure is highly procedural and heavily driven by ongoing regulatory guidance. Keeping the world model current would demand continuous source monitoring that the solo builder cannot sustain.
Consequences
- packages/test-agents reshapes around tax. The three prior test agents (Contract Analysis / Codebase Q&A / Knowledge Extraction) are retired. A single tax_prep agent with a pluggable OTEL emitter replaces them: agent-level consolidation, emitter-level format diversity. See SPEC-249 (amended under SPEC-293) and SPEC-278 for the domain reference content.
- Seeding dogfoods the Operations app. The operator walkthrough (exploring IRS source material, drafting candidate rules, promoting through review) runs via the Operations app epics SPEC-253/254/255/256. This exercises the Ops-app surface before any customer touches it.
- Public source material simplifies provenance documentation. Every rule can cite its IRS publication + section. Provenance metadata is carried as typed fields on the producer-owned event payload per ADR-065 D2. For tax, publication, section, and revision become first-class typed fields on the relevant worlds.contracts.events.* payload model rather than a generic dict.
- SME consultation is not blocking for dev iteration. The founder validates rules against public sources during bootstrap; a tax professional can be pulled in before beta if the world model acquires customer-facing weight, but iteration does not wait.
- Healthcare PA re-enters scope with the first design partner. When a healthcare partner engages, a second world model is seeded for prior-auth. Multi-domain support exists from day one by construction — world models are domain-scoped (ADR-001), not singleton.
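The typed provenance fields named in the consequences above (publication, section, revision) might look roughly like the sketch below. It uses standard-library dataclasses as an assumption; the actual worlds.contracts.events.* models may use a different modeling library. The class names `TaxProvenance` and `RuleEventPayload`, the `rule_id` and `domain` fields, and the sample values (including the revision label) are hypothetical placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TaxProvenance:
    """Hypothetical typed provenance; field names follow this ADR."""
    publication: str  # e.g. "Pub 501"
    section: str      # section heading within the publication
    revision: str     # revision label of the cited publication

    def __post_init__(self) -> None:
        # Reject blank values so every rule carries a usable citation.
        for name, value in asdict(self).items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")


@dataclass(frozen=True)
class RuleEventPayload:
    """Hypothetical stand-in for a worlds.contracts.events.* payload model."""
    rule_id: str
    domain: str
    provenance: TaxProvenance  # typed fields instead of a generic dict

    def citation(self) -> str:
        p = self.provenance
        return f"IRS {p.publication}, {p.section} (rev. {p.revision})"


# Usage: the values below are illustrative placeholders, not real rule data.
payload = RuleEventPayload(
    rule_id="rule-0001",
    domain="us_individual_tax",
    provenance=TaxProvenance(
        publication="Pub 501",
        section="Filing Status",
        revision="2025",
    ),
)
print(payload.citation())  # IRS Pub 501, Filing Status (rev. 2025)
```

Compared with a generic dict, a typed model fails at construction time when a citation field is missing or blank, which is the property the provenance consequence relies on.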
Related
- ADR-001: three-context structure (world models are scoped to spectral.worlds)
- ADR-022: eval generation architecture (three-source corpus the tax domain seeds)
- ADR-025: system card authority basis (why Tier-1 public sources matter)
- SPEC-278: Codex tax-domain reference content (the consumer of this selection)
- SPEC-239: EvalSet corpus bootstrap (50-rule floor)
- SPEC-249: test-agents consolidation around tax_prep
- Codex: How Spectral Works and the first-customer walkthrough cross-reference this ADR for rationale.