Source Material & Distillation
Operator surface for the distillation pipeline: source material ingestion (storage + provenance anchoring), LLM-guided candidate distillation against source documents, and the distillation-request workflow. Distilled candidates flow through both Evolution Loop gates (implementation-readiness + conformity) and then into the enshrinement queue. Nothing becomes enshrined without human sign-off.
- Source material ingestion: operators upload, parse, and register Authoritative-tier source documents (statute, regulatory guidance, scholarly publication) into the SourceMaterial inventory; provenance metadata anchored on every row per ADR-065
- Provenance anchoring: every distilled candidate inherits a provenance trail back to the SourceMaterial rows that contributed to its extraction; the chain is end-to-end traceable (per the audit trail below)
- LLM-guided distillation: operator-parameterized extraction (focus area, problem space, expected tier) runs against selected source materials; the pipeline indexes source documents via the EmbeddingProvider protocol per ADR-038 for similarity-based chunk retrieval, then invokes the LLM on retrieved chunks to draft RuleCandidate rows
- Distillation-request workflow: operators submit a run, monitor per-run and per-source status, and review output candidates
- Two-gate gating: distilled candidates pass through both the implementation-readiness gate (the World Agent generates predicate code and verifies it matches the natural-language form) and the conformity gate (authoritative-source verification + no-contradiction-with-existing-rules) before reaching the enshrinement queue per Evolution Loop
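The similarity-based chunk retrieval step can be pictured with a minimal sketch. Everything here is illustrative: the `embed` method signature, the `Chunk` type, the toy `KeywordCountProvider`, and the `retrieve` helper are assumptions, not the protocol defined by ADR-038.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical shape; the real protocol is specified by ADR-038."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


@dataclass(frozen=True)
class Chunk:
    source_material_id: str  # provenance anchor back to the SourceMaterial row
    text: str


class KeywordCountProvider:
    """Toy stand-in for a real embedding model: counts a fixed vocabulary."""

    VOCAB = ("deduction", "income", "filing", "penalty")

    def embed(self, texts):
        return [[float(t.lower().count(w)) for w in self.VOCAB] for t in texts]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(provider, chunks, focus_area, k=2):
    """Return the k chunks most similar to the operator's focus area."""
    query_vec = provider.embed([focus_area])[0]
    chunk_vecs = provider.embed([c.text for c in chunks])
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


chunks = [
    Chunk("sm-1", "Standard deduction amounts for each filing status."),
    Chunk("sm-2", "Penalty for late payment."),
    Chunk("sm-1", "Gross income includes wages."),
]
top = retrieve(KeywordCountProvider(), chunks, "deduction", k=1)
```

Each retrieved chunk keeps its `source_material_id`, which is what would let a drafted candidate point back at the rows that contributed to it.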
Governance invariants
- Distillation is a proposal path, not a commit path. Distilled candidates are staged for operator review and pass through the conformity gate; they do not auto-enshrine.
- Source materials carry durable provenance. Candidates distilled from a source inherit that source’s provenance chain via the distillation_run_outputs junction (per domain-model: DistillationRunOutput).
- SourceMaterial is append-only. Metadata corrections produce a new row plus retirement of the prior row; URL re-parsing produces a new ingestion run, not in-place mutation. Retirement is a state transition (no row deletion). The same retire-as-state-transition rule that authoring applies to world models, rules, and rule relationships extends uniformly to source materials.
- Distillation runs have no cancel/abort. Operators wait for a terminal state (completed or failed). The state machine is designed for clean extension: cancellation can be added later as an additive transition out of in_progress, not a redesign of the workflow.
- Concurrent operator actions on the same source material or distillation run are serialized at the row level: one wins; the other receives a clean conflict response. This mirrors the row-level serialization on ApprovalDecision and the world-model authoring surfaces.
- Every distillation-mutate API is idempotent on (operator_id, correlation_id). Source-material ingest, source-material retire, and distillation-run submit all carry a correlation_id recorded on the corresponding world_authoring_audit row (and on distillation_run.correlation_id for run submissions); a re-issued request with the same correlation ID returns the original outcome.
- Distillation failures are operator-visible. Silent failure is not allowed: per-source status surfaces partial output and explicit errors; run-level status reaches a terminal failed state with a coded failure_reason_code retained on STRIP_PAYLOAD plus free-text detail stripped on disposal.
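The idempotency invariant above can be illustrated with a small in-memory sketch. The `RunSubmissions` class, its `submit` method, and the shape of the outcome dictionary are hypothetical; a real implementation would presumably rely on a database constraint rather than a Python dict.

```python
class RunSubmissions:
    """In-memory stand-in for a store keyed on (operator_id, correlation_id)."""

    def __init__(self):
        self._by_key = {}
        self._next_id = 1

    def submit(self, operator_id, correlation_id, source_material_ids):
        key = (operator_id, correlation_id)
        if key in self._by_key:
            # Re-issued request: hand back the original outcome, create nothing.
            return self._by_key[key]
        outcome = {
            "run_id": f"run-{self._next_id}",  # hypothetical identifier format
            "source_material_ids": tuple(source_material_ids),
        }
        self._next_id += 1
        self._by_key[key] = outcome
        return outcome


store = RunSubmissions()
first = store.submit("op-1", "corr-a", ["sm-1", "sm-2"])
replay = store.submit("op-1", "corr-a", ["sm-1", "sm-2"])  # same correlation ID
other = store.submit("op-1", "corr-b", ["sm-1"])  # new correlation ID
```

Here `replay` is the same outcome object as `first`, while `other` is a distinct run, which is the behavior a client retrying after a timeout would depend on.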
Audit trail
Every operator authoring action on this surface appends a row to the
WorldAuthoringAudit record family (per
domain-model — WorldAuthoringAudit).
The discriminators introduced for distillation are:
- target_type = source_material, action = ingested: operator ingests a new source material
- target_type = source_material, action = retired: operator retires a source from the inventory
- target_type = distillation_run, action = requested: operator submits a new distillation run
The audit row is appended alongside the entity-state mutation in the same transaction; it
is never appended outside the originating use-case transaction. The
(operator_id, target_type, target_id, action, occurred_at, correlation_id) row remains
queryable for the full active window per
data-retention — operator-action records.
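As a rough sketch of the same-transaction rule, the example below uses SQLite from the Python standard library. The table layouts are simplified assumptions built from the column names mentioned above, and `retire_source` is a hypothetical helper, not the project's use case code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_material (id TEXT PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE world_authoring_audit (
        operator_id TEXT NOT NULL,
        target_type TEXT NOT NULL,
        target_id TEXT NOT NULL,
        action TEXT NOT NULL,
        occurred_at TEXT NOT NULL,
        correlation_id TEXT NOT NULL
    );
    INSERT INTO source_material VALUES ('sm-1', 'active');
    """
)


def retire_source(conn, operator_id, source_id, correlation_id, occurred_at):
    # `with conn` commits on success and rolls back if any statement raises,
    # so the state transition and the audit row land together or not at all.
    with conn:
        # Retirement is a state transition; the row is never deleted.
        conn.execute(
            "UPDATE source_material SET state = 'retired' WHERE id = ?",
            (source_id,),
        )
        conn.execute(
            "INSERT INTO world_authoring_audit "
            "VALUES (?, 'source_material', ?, 'retired', ?, ?)",
            (operator_id, source_id, occurred_at, correlation_id),
        )


retire_source(conn, "op-1", "sm-1", "corr-1", "2025-01-01T00:00:00Z")
```

If the audit insert fails (for instance, because a required column is missing), the state change is rolled back too, so no mutation exists without its audit row.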
WorldAuthoringAudit is not agent memory (per the
records-vs-memory framing). It is
operator-scoped (app.user_id) with no domain RLS, distinct from ApprovalDecision
(same identity domain, different action surface — enshrinement-gate decision vs authoring).
The DistillationRun workflow record itself lives alongside the audit row — the audit
row records the operator action (“operator X requested distillation Y at time Z”); the
DistillationRun row carries the workflow lifecycle state (status, started_at,
completed_at, failure_reason_code) and is the system of record for run progress.
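The lifecycle state carried by the DistillationRun row can be pictured as a small transition table. This sketch covers only the three statuses named on this page (in_progress, completed, failed); the `RunStatus` enum, the `TRANSITIONS` table, and the helper functions are illustrative assumptions, and any earlier states a run may pass through are omitted.

```python
from enum import Enum


class RunStatus(str, Enum):
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed moves. A later cancellation feature would be one more enum member
# plus one more entry in the IN_PROGRESS set, leaving the rest untouched.
TRANSITIONS = {
    RunStatus.IN_PROGRESS: {RunStatus.COMPLETED, RunStatus.FAILED},
    RunStatus.COMPLETED: set(),
    RunStatus.FAILED: set(),
}


def is_terminal(status):
    """A status is terminal when no transition leads out of it."""
    return not TRANSITIONS[status]


def advance(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = advance(RunStatus.IN_PROGRESS, RunStatus.FAILED)
```

Keeping the allowed moves in data rather than in branching logic is one way to make the "additive transition" extension cheap.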
UI surfaces
This workflow lives in the ruleset workspace’s Sources tab, where the source library, the “pull candidate rules from a source” action, and the in-progress extraction sit together. Candidate triage happens in the separate Review tab.
The Sources tab folds together:
- Source library: list / inspect / ingest / retire source materials; shows provenance envelope (publication, section, revision)
- Distillation request: operator-parameterized form (world model, source selection, focus area, problem space, expected tier) submitting a new DistillationRun
- In-progress extraction: a live view of run lifecycle and per-source extraction state (live progress, step timeline) inline below the library; surfaces partial output as it lands; explicit error display on per-source failure
Produced RuleCandidate rows surface in the Review tab — the two-pane review queue with
the full provenance trail (run → source materials → candidate) in the evidence bundle, where
the operator advances candidates through the conformity gate or retires them in place.
Every UI surface has a corresponding API endpoint per the API/UI parity rule — the UI is a thin client over the same routes.
API surfaces
The HTTP routes the UI consumes as an API client (per the parity contract) include:
- POST /worlds/:world_id/source-materials: operator ingest of a new source material (idempotent on (operator_id, correlation_id))
- POST /worlds/:world_id/source-materials/:id/retire: operator retire of an existing source material (idempotent)
- POST /worlds/:world_id/distillations: operator submit of a new distillation run (idempotent on (requested_by, correlation_id))
- GET /worlds/:world_id/distillations/:id: run status (run-level + per-source)
- GET /worlds/:world_id/distillations/:id/candidates: produced RuleCandidate rows with provenance trail
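A client of the submit route might build its request along these lines. Only the route path comes from the list above; the JSON field names, the choice to carry correlation_id in the body, and the `build_submit_request` helper are assumptions for illustration. The request is constructed but not sent.

```python
import json
import uuid
from urllib.request import Request


def build_submit_request(base_url, world_id, source_material_ids, focus_area,
                         correlation_id=None):
    """Build (but do not send) a POST to the distillation submit route."""
    body = {
        # Hypothetical field names; the real request schema is not shown here.
        "source_material_ids": list(source_material_ids),
        "focus_area": focus_area,
        # Reusing the same correlation ID on retry is what makes the call
        # safe to re-issue; a fresh one is generated only for a new submit.
        "correlation_id": correlation_id or str(uuid.uuid4()),
    }
    return Request(
        url=f"{base_url}/worlds/{world_id}/distillations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_submit_request(
    "https://ops.example.test", "w-1", ["sm-1"], "deductions",
    correlation_id="corr-a",
)
```

A caller that times out would rebuild the request with the same correlation_id rather than generating a new one, so the server can return the original outcome.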
Auth middleware sets app.user_id via SET LOCAL per
ADR-041 D4; RLS predicates compare against
app.user_id per ADR-039. The
apps/operations deployable consumes these routes via HTTP per
ADR-047 and never imports
spectral.worlds internals; the architecture validator at STRICT=True enforces this.
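The import boundary lends itself to a static check. The sketch below is not the project's architecture validator, and it does not model what STRICT=True toggles; it only illustrates, with the standard `ast` module and a hypothetical `forbidden_imports` helper, how imports of spectral.worlds could be detected in a source file.

```python
import ast

FORBIDDEN_PREFIX = "spectral.worlds"


def _is_forbidden(name):
    return name == FORBIDDEN_PREFIX or name.startswith(FORBIDDEN_PREFIX + ".")


def forbidden_imports(source):
    """Return the sorted forbidden module paths imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Cover both `from spectral.worlds import x` and
            # `from spectral import worlds`.
            names = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue
        found.update(n for n in names if _is_forbidden(n))
    return sorted(found)


clean = forbidden_imports("import json\nimport urllib.request\n")
leaky = forbidden_imports("from spectral import worlds\n")
```

The source is parsed, never executed, so the check can run over a deployable's files without importing them.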
Related reading
- Operations App overview: API/UI parity, context boundary
- Authoring: sibling operator surface for world-model + rule + relationship authoring (parallel audit family)
- World Model System / Evolution Loop: the governed path candidates travel
- Domain Model (SourceMaterial / DistillationRun): entity definitions
- Data Retention: POLICY_REGISTRY entries for source materials and distillation runs
- Source materials (tax-prep): IRS publication inventory backing the initial world model