ADR-100: Rule-provenance model — four axes and authoritative-source taxonomy reconciliation

Context

There are two pictures of rule provenance in the corpus, and they do not match.

The designed model (Codex world-model.mdx “Two-dimensional provenance” + ADR-080 D4 + ADR-082 D2) is richer than a single field. A rule carries:

  • a two-dimensional provenance — an authoritative-source dimension (a source-strength taxonomy, designed as Authoritative / Curated / Distilled / Observed) and a code dimension (the generated predicate’s lineage to the natural-language rule, the world-agent version, the eval-suite version, the generation-run identifier); plus
  • a separate severity tier axis (T1/T2/T3, driving suppression and aggregation); and
  • a separate lifecycle status axis (candidate | enshrined | retired).

The World Model Card (Codex system-card.mdx) discloses the authoritative-source composition per version — the reader counts how many rules sit at each source-strength tier to judge how strong the version’s “established and governed before the module ran” claim is.

The implemented model ships a single domain enum ProvenanceTier = inferred | observed | asserted (src/spectral/worlds/domain/authoring/types.py, SPEC-446/447), stored on worlds.rules, with a ChangeProvenanceTier use case and a tier_changed audit action. This alphabet matches none of the four designed axes: it is neither the source-strength taxonomy (it has no Authoritative/Curated/Distilled/Observed), nor the code dimension, nor the severity tier, nor the lifecycle. It encodes an authorship/confirmation concept (inferred = system-derived, observed = operator-confirmed, asserted = operator-stated) that overlaps the lifecycle and the created_by/citation facts already recorded elsewhere.

The implementation cites ADR-026 as its authority. ADR-026 defines no provenance alphabet — it decides version-as-authority and names three restatement categories (assertion change, code regeneration, behavioral correction). The citation is unsupported.

The drift has a concrete downstream cost. The WorldModelCard publication event carries rule_health_distribution: dict[str, Any] (worlds.contracts.events.world_model_card_published), projected into the platform System Card (system-card.mdx provenance summary), which the disclosure already expects to compose over the four-tier source taxonomy. With the shipped enum the card would render the wrong alphabet — inferred/observed/asserted counts in place of a source-strength distribution. ADR-090 D2 added a fifth concern: web-research-sourced rules need a provenance marker, and D2 explicitly left the choice open (“extended Distilled with sub-classification, or a new Researched tier”).

This ADR reconciles the model to the designed four axes (a fifth — the emitted outcome — is ratified in ADR-106; see D1), fixes the source-taxonomy value set (including the web-research tier ADR-090 D2 left open), discards the unsupported enum, and corrects the citation. It specifies the target; the implementation reconciliation is deferred to a later session (D6).

Decision

D1 — Rule provenance is two-dimensional; severity and lifecycle are separate co-disclosed axes

A rule carries provenance along two dimensions:

  • (a) authoritative-source dimension: where the rule came from, a source-strength taxonomy (D2).
  • (b) code dimension: where the predicate code came from, the generated code’s lineage to the natural-language rule (D5).

Two further axes are co-disclosed with provenance on the World Model Card but are not provenance, and must not be conflated with it:

  • severity tier: T1/T2/T3, governing suppression and aggregation (world-model.mdx “Severity tiers”).
  • lifecycle status: the three-status machine candidate → enshrined → retired.

Four axes are named here. A fifth independent axis — the rule’s emitted outcome (the four-state status a matched rule contributes: GREEN | GREEN-SKIP | YELLOW | RED) — is ratified in ADR-106, which also implements the winner_takes_all aggregation that consumes severity: severity orders which matched rule’s outcome wins; it does not generate the outcome. Each axis is independent: a T1 rule may be Distilled; an Authoritative rule may still be a candidate; the independence holds across all five axes. Collapsing any pair into one enum (as the shipped inferred/observed/asserted did) is the defect this ADR corrects.
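
The independence of the rule-level axes can be sketched as a record with one field per axis. This is a minimal illustration, not the shipped domain model: `RuleAxes`, `SeverityTier`, and `LifecycleStatus` are hypothetical names, the source-tier values are the ones fixed in D2, and the lowercase `t1`/`t2`/`t3` tokens are assumed from the `t2` default in D5. The code dimension and the emitted outcome are left out because they live in the module manifest (D5) and in ADR-106 respectively.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AuthoritativeSourceTier(Enum):
    AUTHORITATIVE = "authoritative"
    CURATED = "curated"
    DISTILLED = "distilled"
    RESEARCHED = "researched"
    OBSERVED = "observed"
    ASSISTANT_DRAFTED = "assistant_drafted"


class SeverityTier(Enum):
    T1 = "t1"
    T2 = "t2"
    T3 = "t3"


class LifecycleStatus(Enum):
    CANDIDATE = "candidate"
    ENSHRINED = "enshrined"
    RETIRED = "retired"


@dataclass(frozen=True)
class RuleAxes:
    """Hypothetical record: one field per axis, never one combined enum."""

    source_tier: AuthoritativeSourceTier
    lifecycle: LifecycleStatus
    severity: SeverityTier = SeverityTier.T2  # default t2 per D5


# The two examples from the text. The lifecycle of the first one is an
# arbitrary choice made for this illustration.
t1_distilled = RuleAxes(
    AuthoritativeSourceTier.DISTILLED, LifecycleStatus.ENSHRINED, SeverityTier.T1
)
authoritative_candidate = RuleAxes(
    AuthoritativeSourceTier.AUTHORITATIVE, LifecycleStatus.CANDIDATE
)

# Independence means every combination is representable: 6 * 3 * 3 = 54.
ALL_COMBINATIONS = [
    RuleAxes(source, lifecycle, severity)
    for source, lifecycle, severity in product(
        AuthoritativeSourceTier, LifecycleStatus, SeverityTier
    )
]
```

A single collapsed enum cannot express most of these 54 combinations, which is the practical cost of conflating the axes.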

D2 — Authoritative-source axis: a six-value source-strength taxonomy

The authoritative-source dimension has six values, strongest to weakest:

authoritative > curated > distilled > researched > observed > assistant_drafted

  • authoritative — sourced directly from a recognized domain authority (regulatory text, standards-body publications, published specifications). The strongest “precedes the system” claim: grounded in sources established independently of any AI system’s behavior.
  • curated — sourced from high-quality secondary material with traceable lineage (expert-written references, peer-reviewed analysis, well-attributed domain literature). Strong, but secondary to the primary sources of authoritative.
  • distilled — derived from LLM synthesis over operator-supplied primary or secondary sources without a direct quotation chain. Useful for coverage; weaker provenance, subject to drift if the underlying sources change before re-distillation.
  • researched — derived from sources the World Agent located through bounded web research (D3), not supplied by the operator. Ranked below distilled because the source selection itself is machine-mediated.
  • observed — derived from patterns in live decision traffic via override-pattern signals from the Customer Dashboard. It describes behavior already present in customer-flagged decisions, arriving after operational practice rather than before it.
  • assistant_drafted (SPEC-654) — drafted by the World Agent on an operator’s direction from a chat prompt, with no source corpus behind it. The weakest source-strength: unlike every family above it, it cites no source material at all — not operator-supplied (distilled), not machine-located (researched), not observed in live traffic (observed). Its grounding is operator intent expressed in chat, captured in the rule’s provenance_source envelope (the directing operator + the chat session), not a source the reader can independently weigh. Zero cited sources is the expected state for this family — the former distilled default produced a contradiction (“distilled from source material” alongside zero cited sources) that the operator surface reported as “No recorded origin · 0 cited sources · distilled tier”; this family resolves it. The design/positioning session that motivated assistant_drafted is recorded in ADR-103 D7.

The value set is stored as a domain enum on worlds.rules.provenance_tier. The enum is named AuthoritativeSourceTier. This is recommended over keeping the name ProvenanceTier with a corrected value set, which would still read as “the one provenance field” and invite re-collapse. Token casing is lowercase in the database and the enum (authoritative, curated, distilled, researched, observed, assistant_drafted); Codex prose may title-case. The tier is assigned at distillation/authoring time per ADR-090 D1: it is a fact about how the rule was sourced, set when it is sourced.
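
As a minimal sketch of the enum described above, assuming a plain Python `Enum` whose declaration order encodes strength: the class name and lowercase tokens come from this ADR, while `rank` and `is_stronger_than` are hypothetical helpers added only for illustration.

```python
from enum import Enum


class AuthoritativeSourceTier(Enum):
    """Source-strength taxonomy from D2, declared strongest to weakest."""

    AUTHORITATIVE = "authoritative"
    CURATED = "curated"
    DISTILLED = "distilled"
    RESEARCHED = "researched"
    OBSERVED = "observed"
    ASSISTANT_DRAFTED = "assistant_drafted"

    @property
    def rank(self) -> int:
        # Hypothetical helper: 0 is strongest. Relies on declaration order.
        return list(type(self)).index(self)

    def is_stronger_than(self, other: "AuthoritativeSourceTier") -> bool:
        return self.rank < other.rank


# Parse the lowercase token as it would be stored in the database.
tier = AuthoritativeSourceTier("researched")
stronger_than_observed = tier.is_stronger_than(AuthoritativeSourceTier.OBSERVED)
stronger_than_distilled = tier.is_stronger_than(AuthoritativeSourceTier.DISTILLED)
```

The sketch deliberately avoids a `str` mixin: with one, `<` would compare tokens alphabetically and silently disagree with the strength order.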

D3 — Web-research provenance is a distinct researched tier, ranked below distilled

ADR-090 D2 left the web-research marker open between “extended Distilled with sub-classification” and “a new Researched tier.” This ADR resolves it as a new researched tier, not an extended-distilled sub-classification.

Rationale: a web-found source is machine-selected. The World Agent decides what to research and which results to adopt — more AI-mediation and weaker governance than distilled, where the operator supplied the corpus and the synthesis is over an operator-chosen body. The two are different source-governance regimes, not shades of one, so the World Model Card must count them separately for an honest composition disclosure. A sub-classification of distilled would hide researched rules inside the distilled count and overstate the version’s governance. researched ranks below distilled and above observed: weaker than operator-supplied distillation, stronger than after-the-fact traffic observation. This closes ADR-090 D2’s open choice.

D4 — The implemented inferred/observed/asserted enum is discarded; ChangeProvenanceTier and tier_changed are retired

The shipped ProvenanceTier = inferred | observed | asserted enum is discarded. The authorship/confirmation concept it encoded is not lost — it is subsumed by axes that already exist:

  • who authored / confirmed it → the lifecycle status (D1) plus created_by.
  • what it was sourced from → the provenance_source citation recorded per rule.
  • the history of those facts → the authoring audit trail.

No separate enum is needed to carry it.

The ChangeProvenanceTier use case and the tier_changed audit action are retired. Source strength is a fact set once at sourcing, not an operator confidence dial to be turned later. A genuine re-sourcing of a rule — its underlying authority changed, or a stronger source was found — is a substantive change to the rule and flows through the ADR-026 restatement mechanism (an assertion-change or rule restatement, recorded in release notes), not through an in-place tier mutation that leaves no version boundary.
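
The difference between an in-place tier change and a restatement can be illustrated with an immutable snapshot. This is a sketch under stated assumptions, not the ADR-026 mechanism itself: `RuleVersion`, `world_version`, `restate_with_new_source`, and the `rule-1` identifier are all hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class RuleVersion:
    """Hypothetical immutable snapshot of a rule within one world-model version."""

    rule_id: str
    world_version: int
    provenance_tier: str


def restate_with_new_source(rule: RuleVersion, new_tier: str) -> RuleVersion:
    """Return a new snapshot at the next version instead of mutating in place."""
    return replace(rule, world_version=rule.world_version + 1, provenance_tier=new_tier)


original = RuleVersion("rule-1", world_version=3, provenance_tier="distilled")
restated = restate_with_new_source(original, "curated")

try:
    # The retired path: turning the tier like a dial on an existing record.
    original.provenance_tier = "curated"
    mutated_in_place = True
except FrozenInstanceError:
    mutated_in_place = False
```

The old snapshot keeps its original tier, so the change is visible as a version boundary and is not silently overwritten.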

The unsupported ADR-026 citation is corrected. ADR-026 governs version-as-authority and restatement; it does not define a provenance alphabet. The authority for the authoritative-source taxonomy is this ADR + ADR-082 D2 + ADR-080 D4 + Codex world-model.mdx.

D5 — Code-provenance lives in the module manifest; severity is a rule column; the card re-types to a structured summary

  • Code dimension (D1(b)) is not a worlds.rules authoring column. It lives in the ADR-080 D4 module manifest — world_agent_version, eval_framework_version (eval-suite version), and the generation-run identifier — and is projected onto the World Model Card from there. The code dimension is regenerated when the natural-language form or configuration dependencies change; it has a module-build cadence, not a rule-authoring cadence, which is why it belongs to the manifest and not to a rule column.
  • Severity tier (T1/T2/T3) is a new worlds.rules column (default t2), a separate axis from source provenance per D1.
  • World Model Card re-typing. The rule_health_distribution: dict[str, Any] field on the WorldModelCard publication event (worlds.contracts.events.world_model_card_published) is re-typed to a structured provenance_summary keyed by the six source tiers (authoritative/curated/distilled/researched/observed/assistant_drafted), so the System Card disclosure renders a typed composition rather than an opaque dict. A content-contract test pins the six-tier alphabet so producer and consumer cannot drift off it (bilateral contract test per ADR-065 D6).
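
One possible shape for the typed summary and its alphabet pin is sketched below. A stdlib dataclass stands in for whatever model class the event actually uses. `ProvenanceSummary`, `SOURCE_TIERS`, `summarize`, and `alphabet_is_pinned` are hypothetical names; only the six tier tokens come from this ADR.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import asdict, dataclass, fields

SOURCE_TIERS = (
    "authoritative", "curated", "distilled",
    "researched", "observed", "assistant_drafted",
)


@dataclass(frozen=True)
class ProvenanceSummary:
    """Hypothetical typed replacement for the opaque dict[str, Any] payload."""

    authoritative: int = 0
    curated: int = 0
    distilled: int = 0
    researched: int = 0
    observed: int = 0
    assistant_drafted: int = 0

    def __post_init__(self) -> None:
        # Counts are non-negative; a negative count is a producer bug.
        for name, count in asdict(self).items():
            if count < 0:
                raise ValueError(f"{name} must be >= 0, got {count}")


def summarize(tiers: Iterable[str]) -> ProvenanceSummary:
    """Count rules per source tier, rejecting tokens outside the alphabet."""
    counts = Counter(tiers)
    unknown = set(counts) - set(SOURCE_TIERS)
    if unknown:
        raise ValueError(f"unknown source tiers: {sorted(unknown)}")
    return ProvenanceSummary(**counts)


def alphabet_is_pinned() -> bool:
    """Contract check: the summary fields are exactly the six tiers, in order."""
    return tuple(f.name for f in fields(ProvenanceSummary)) == SOURCE_TIERS
```

With this shape, a token from the discarded alphabet such as `inferred` fails at the producer instead of reaching the System Card as an unexpected key.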

D6 — This is a pre-Stream-E reconciliation; lifecycle remains a separate axis

This reconciliation lands before Stream E. The held Stream-E plan (SPEC-493/514) assumed the distilled tier already existed in the implementation; it does not (the shipped enum is inferred/observed/asserted). Settling the source taxonomy here removes that false premise before the Stream-E rule work plans against it.

The lifecycle axis is intentionally separate from provenance and remains candidate | enshrined | retired. Enshrined means accepted into the world rule catalog; it does not mean published, deployed, frozen, or caller-visible. Publication/deployment and input-contract freeze are version-boundary concerns handled outside this ADR, notably by the action ontology reconciliation work in ADR-107.

Consequences

  • The authoritative-source axis is six values (authoritative > curated > distilled > researched > observed > assistant_drafted; the sixth added by SPEC-654), closing ADR-090 D2. The card composition disclosure has a stable alphabet to count over.
  • The shipped ProvenanceTier = inferred | observed | asserted enum, the ChangeProvenanceTier use case, and the tier_changed audit action are retired; the enum is replaced by AuthoritativeSourceTier on worlds.rules.provenance_tier. Re-sourcing flows through ADR-026 restatement.
  • A new worlds.rules severity column (T1/T2/T3, default t2) makes severity a first-class axis distinct from provenance. Code provenance stays in the ADR-080 D4 manifest and is not duplicated as a rule column.
  • The WorldModelCard publication event’s rule_health_distribution: dict[str, Any] re-types to a structured provenance_summary keyed by the source tiers (additive-event-versioning discipline per ADR-044 D11; consumer ACL + bilateral contract test per ADR-065 D2/D4/D6). SPEC-654 adds the assistant_drafted count as an additive field (ge=0, default 0) under the same discipline, with no wire-version bump. The platform System Card projection consumes the typed shape.
  • The ADR-026 citation in the implementation is corrected to this ADR + ADR-082 D2 + ADR-080 D4; ADR-026 itself is unchanged in substance (cite-correction only).
  • Implementation reconciliation is deferred. The migration (enum rename + value set + new severity column + event re-type) and the test-reference reconciliation (~85 references to the discarded enum and tier_changed) are a later session’s work; this ADR specifies the target.
  • Lifecycle is not expanded (D6). It remains a three-status axis, separate from provenance, severity, outcome, and code provenance.

References

  • ADR-026 — world-model version as unit of authority + restatement mechanism (citation corrected here; not substantively changed; re-sourcing flows through its restatement categories)
  • ADR-044 D11 — additive payload versioning for the re-typed WorldModelCard event
  • ADR-065 D2/D4/D6 — producer-typed event payload + consumer ACL + bilateral contract test for the provenance_summary alphabet
  • ADR-080 D4 — build-provenance attestation in the module manifest; home of the code-provenance dimension (D5)
  • ADR-082 D2 — version-scoped World Model Card content: tier + assertion provenance + code provenance
  • ADR-090 D1/D2 — distillation-as-authoring tier assignment; web-research gap-fill (D2’s open tier choice closed here)
  • Codex world-model.mdx — two-dimensional provenance, severity tiers, and the three-status lifecycle
  • Codex system-card.mdx — World Model Card provenance-composition disclosure consuming the provenance_summary
  • Codex evolution-loop.mdx — gates and publication transaction that set provenance and lifecycle