
Source Material & Distillation

Operator surface for the distillation pipeline: source material ingestion (storage + provenance anchoring), LLM-guided candidate distillation against source documents, and the distillation-request workflow. Distilled candidates flow into the conformity gate for operator review and then into the enshrinement queue. Nothing becomes enshrined without human sign-off.


  • Source material ingestion — operators upload, parse, and register Authoritative-tier source documents (IRS publications, statute excerpts, regulatory guidance) into the SourceMaterial inventory; provenance metadata anchored on every row per ADR-065
  • Provenance anchoring — every distilled candidate inherits a provenance trail back to the SourceMaterial rows that contributed to its extraction; the chain is end-to-end traceable (per the audit trail below)
  • LLM-guided distillation — operator-parameterized extraction (focus area, problem space, expected tier) runs against selected source materials; the pipeline indexes source documents via the EmbeddingProvider protocol per ADR-038 for similarity-based chunk retrieval, then invokes the LLM on retrieved chunks to draft RuleCandidate rows
  • Distillation-request workflow — operators (or the Operations Agent) submit a run, monitor per-run and per-source status, and review output candidates
  • Conformity gating — distilled candidates pass through the conformity gate before reaching the enshrinement queue (per Evolution Loop)
  • Distillation is a proposal path, not a commit path. Distilled candidates are staged for operator review and pass through the conformity gate; they do not auto-enshrine.
  • Source materials carry durable provenance. Candidates distilled from a source inherit that source’s provenance chain via the distillation_run_outputs junction (per domain-model — DistillationRunOutput).
  • SourceMaterial is append-only. A metadata correction produces a new row and retires the prior one; re-parsing a URL produces a new ingestion run rather than an in-place mutation. Retirement is a state transition, never a row deletion. The same retire-as-state-transition rule that authoring applies to world models, rules, and rule relationships extends uniformly to source materials.
  • Distillation runs have no cancel/abort. Operators wait for a terminal state (completed or failed). The state machine is designed for clean extension: cancellation can be added later as an additive transition out of in_progress, without redesigning the workflow.
  • Concurrent operator actions on the same source material or distillation run are serialized at the row level — one wins; the other receives a clean conflict response. Mirrors the row-level serialization on ApprovalDecision and the world-model authoring surfaces.
  • Every distillation-mutate API is idempotent on (operator_id, correlation_id). Source-material ingest, source-material retire, and distillation-run submit all carry a correlation_id recorded on the corresponding world_authoring_audit row (and on distillation_run.correlation_id for run submissions); a re-issued request with the same correlation ID returns the original outcome.
  • Distillation failures are operator-visible. Silent failure is not allowed: per-source status surfaces partial output and explicit errors, and run-level status reaches a terminal failed state with a coded failure_reason_code. The code is retained on STRIP_PAYLOAD; the accompanying free-text detail is stripped on disposal.
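
The idempotency rule on (operator_id, correlation_id) can be illustrated with a minimal in-memory sketch. It is not the actual implementation: the IdempotentMutations class, the submit_run function, and the dictionary standing in for the world_authoring_audit lookup are all hypothetical names introduced here for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class IdempotentMutations:
    """Hypothetical in-memory stand-in for the audit-backed idempotency lookup."""

    _outcomes: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def execute(
        self, operator_id: str, correlation_id: str, mutate: Callable[[], dict]
    ) -> dict:
        key = (operator_id, correlation_id)
        if key in self._outcomes:
            # Re-issued request: return the original outcome without mutating again.
            return self._outcomes[key]
        outcome = mutate()
        self._outcomes[key] = outcome
        return outcome


def submit_run() -> dict:
    # Hypothetical mutation: every real execution mints a fresh run id.
    return {"run_id": str(uuid.uuid4())}


store = IdempotentMutations()
first = store.execute("op-1", "corr-abc", submit_run)
replay = store.execute("op-1", "corr-abc", submit_run)  # same key, same outcome
other = store.execute("op-2", "corr-abc", submit_run)   # different operator, new run
```

Keying on the pair rather than on correlation_id alone means two operators who happen to reuse a correlation ID do not see each other's outcomes.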

Every operator authoring action on this surface appends a row to the WorldAuthoringAudit record family (per domain-model — WorldAuthoringAudit). The discriminators introduced for distillation are:

  • target_type = source_material, action = ingested — operator ingests a new source material
  • target_type = source_material, action = retired — operator retires a source from the inventory
  • target_type = distillation_run, action = requested — operator submits a new distillation run

The audit row is appended alongside the entity-state mutation in the same transaction; it is never appended outside the originating use-case transaction. The (operator_id, target_type, target_id, action, occurred_at, correlation_id) row remains queryable for the full active window per data-retention — operator-action records.
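
The same-transaction rule can be sketched with SQLite from the Python standard library. This is an illustration under assumptions, not the production schema: the table layouts, the 'active' status value, and the retire_source_material function are hypothetical, and the simulate_audit_failure flag exists only to show the rollback.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_material (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE world_authoring_audit (
        operator_id TEXT, target_type TEXT, target_id TEXT,
        action TEXT, occurred_at TEXT, correlation_id TEXT
    );
    INSERT INTO source_material VALUES ('sm-1', 'active');
    """
)


def retire_source_material(conn, operator_id, source_id, correlation_id,
                           simulate_audit_failure=False):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the state transition and the audit row land together or not at all.
    with conn:
        cur = conn.execute(
            "UPDATE source_material SET status = 'retired' "
            "WHERE id = ? AND status = 'active'",
            (source_id,),
        )
        if cur.rowcount != 1:
            raise LookupError(f"{source_id} is not an active source material")
        if simulate_audit_failure:
            raise RuntimeError("audit append failed")
        conn.execute(
            "INSERT INTO world_authoring_audit VALUES (?, ?, ?, ?, ?, ?)",
            (operator_id, "source_material", source_id, "retired",
             datetime.now(timezone.utc).isoformat(), correlation_id),
        )


def status_of(conn, source_id):
    row = conn.execute(
        "SELECT status FROM source_material WHERE id = ?", (source_id,)
    ).fetchone()
    return row[0]


try:
    retire_source_material(conn, "op-1", "sm-1", "corr-1",
                           simulate_audit_failure=True)
except RuntimeError:
    pass
status_after_failure = status_of(conn, "sm-1")  # retire was rolled back

retire_source_material(conn, "op-1", "sm-1", "corr-1")
status_after_success = status_of(conn, "sm-1")
audit_rows = conn.execute(
    "SELECT target_type, action FROM world_authoring_audit"
).fetchall()
```

Note that retirement here is an UPDATE of the status column; no row is deleted, which matches the retire-as-state-transition rule.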

WorldAuthoringAudit is not agent memory (per the records-vs-memory framing). It is operator-scoped (app.user_id) with no workspace RLS, distinct from ApprovalDecision (same identity domain, different action surface — enshrinement-gate decision vs authoring) and from operations_agent_approval (platform-side, Ops-Agent call-time approval). Three operator-action audit families remain stable across the worlds / platform split.

The DistillationRun workflow record itself lives alongside the audit row — the audit row records the operator action (“operator X requested distillation Y at time Z”); the DistillationRun row carries the workflow lifecycle state (status, started_at, completed_at, failure_reason_code) and is the system of record for run progress.
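
One way to picture the lifecycle state on the DistillationRun row is a small transition table. The sketch below uses the in_progress, completed, and failed statuses named on this page; the pending initial state, the RunStatus enum, and the transition helper are assumptions made for illustration.

```python
from enum import Enum


class RunStatus(str, Enum):
    PENDING = "pending"          # hypothetical initial state, not named in the text
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions. Terminal states map to an empty set.
ALLOWED = {
    RunStatus.PENDING: {RunStatus.IN_PROGRESS},
    RunStatus.IN_PROGRESS: {RunStatus.COMPLETED, RunStatus.FAILED},
    RunStatus.COMPLETED: set(),
    RunStatus.FAILED: set(),
}


def is_terminal(status: RunStatus) -> bool:
    return not ALLOWED[status]


def transition(current: RunStatus, target: RunStatus) -> RunStatus:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"illegal transition {current.value} -> {target.value}"
        )
    return target


status = transition(RunStatus.PENDING, RunStatus.IN_PROGRESS)
status = transition(status, RunStatus.FAILED)
```

With this shape, a later cancellation feature would be one new enum member plus one extra entry in the in_progress set; the existing transitions stay untouched.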

The operator UI surfaces this workflow as four panels:

  1. Source inventory — list / inspect / ingest / retire source materials; shows provenance envelope (publication, section, revision)
  2. Distillation request — operator-parameterized form (world model, source selection, focus area, problem space, expected tier) submitting a new DistillationRun
  3. Run status — live view of run lifecycle and per-source extraction state; surfaces partial output as it lands; explicit error display on per-source failure
  4. Candidate review — produced RuleCandidate rows with full provenance trail (run → source materials → candidate); operator can advance candidates to the conformity gate or retire them in place

Every UI surface has a corresponding API endpoint per the dual-occupant API/UI parity rule; the Operations Agent consumes the same endpoints to initiate and monitor runs.

The HTTP routes consumed by both the UI and the Operations Agent (per the parity contract) include:

  • POST /worlds/:world_id/source-materials — operator ingest of a new source material (idempotent on (operator_id, correlation_id))
  • POST /worlds/:world_id/source-materials/:id/retire — operator retire of an existing source material (idempotent)
  • POST /worlds/:world_id/distillations — operator submit of a new distillation run (idempotent on (requested_by, correlation_id))
  • GET /worlds/:world_id/distillations/:id — run status (run-level + per-source)
  • GET /worlds/:world_id/distillations/:id/candidates — produced RuleCandidate rows with provenance trail
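
A client of these routes might build its requests along the following lines. This sketch only constructs the requests and does not send them. The base URL, the JSON field names (taken loosely from the distillation request form), and the choice to carry correlation_id in the request body are all assumptions; the actual wire format is not specified here.

```python
import json
import uuid
from typing import Optional
from urllib.request import Request

BASE_URL = "https://worlds.example.internal"  # hypothetical host


def build_submit_run_request(
    world_id: str, payload: dict, correlation_id: Optional[str] = None
) -> Request:
    """Build POST /worlds/:world_id/distillations.

    The caller should reuse the same correlation_id when retrying, so the
    server can return the original outcome instead of starting a second run.
    """
    correlation_id = correlation_id or str(uuid.uuid4())
    body = json.dumps({**payload, "correlation_id": correlation_id})
    return Request(
        url=f"{BASE_URL}/worlds/{world_id}/distillations",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_run_status_request(world_id: str, run_id: str) -> Request:
    """Build GET /worlds/:world_id/distillations/:id."""
    return Request(
        url=f"{BASE_URL}/worlds/{world_id}/distillations/{run_id}",
        method="GET",
    )


submit = build_submit_run_request(
    "w-1",
    {
        "source_material_ids": ["sm-1"],   # hypothetical field names
        "focus_area": "depreciation",
        "expected_tier": "Authoritative",
    },
    correlation_id="corr-1",
)
status = build_run_status_request("w-1", "run-42")
```

Because the UI and the Operations Agent share these endpoints, the same builders would serve both occupants.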

Auth middleware sets app.user_id via SET LOCAL per ADR-041 D4; RLS predicates compare against app.user_id per ADR-039. The apps/operations deployable consumes these routes via HTTP per ADR-047 and never imports spectral.worlds internals; the architecture validator at STRICT=True enforces this.
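
As a rough sketch of the middleware step, the function below binds the operator identity for the current transaction. It is an assumption-laden illustration: SET LOCAL does not accept bind parameters, so the sketch uses PostgreSQL's set_config function with is_local set to true, which has the same transaction-scoped effect. The bind_operator name, the %s placeholder style (as used by psycopg-style drivers), and the RecordingCursor test double are hypothetical.

```python
class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor; records calls instead of running them."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=()):
        self.calls.append((sql, tuple(params)))


def bind_operator(cursor, user_id: str) -> None:
    # set_config(name, value, is_local=true) scopes the setting to the current
    # transaction, like SET LOCAL, while keeping user_id a bound parameter.
    cursor.execute("SELECT set_config('app.user_id', %s, true)", (user_id,))


cursor = RecordingCursor()
bind_operator(cursor, "operator-123")
```

An RLS predicate could then read the value back with current_setting('app.user_id'); because the setting is transaction-local, it does not leak to the next request on a pooled connection.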