Source Material & Distillation

Operator surface for the distillation pipeline: source material ingestion (storage + provenance anchoring), LLM-guided candidate distillation against source documents, and the distillation-request workflow. Distilled candidates flow through both Evolution Loop gates (implementation-readiness + conformity) and then into the enshrinement queue. Nothing becomes enshrined without human sign-off.


  • Source material ingestion — operators upload, parse, and register Authoritative-tier source documents (statute, regulatory guidance, scholarly publication) into the SourceMaterial inventory; provenance metadata anchored on every row per ADR-065
  • Provenance anchoring — every distilled candidate inherits a provenance trail back to the SourceMaterial rows that contributed to its extraction; the chain is end-to-end traceable (per the audit trail below)
  • LLM-guided distillation — operator-parameterized extraction (focus area, problem space, expected tier) runs against selected source materials; the pipeline indexes source documents via the EmbeddingProvider protocol per ADR-038 for similarity-based chunk retrieval, then invokes the LLM on retrieved chunks to draft RuleCandidate rows
  • Distillation-request workflow — operators submit a run, monitor per-run and per-source status, and review output candidates
  • Two-gate flow: distilled candidates pass through both the implementation-readiness gate (the World Agent generates predicate code and verifies that it matches the natural-language form) and the conformity gate (authoritative-source verification plus a no-contradiction check against existing rules) before reaching the enshrinement queue, per the Evolution Loop

The surface holds the following invariants:

  • Distillation is a proposal path, not a commit path. Distilled candidates are staged for operator review and pass through the conformity gate; they do not auto-enshrine.
  • Source materials carry durable provenance. Candidates distilled from a source inherit that source’s provenance chain via the distillation_run_outputs junction (per domain-model — DistillationRunOutput).
  • SourceMaterial is append-only. A metadata correction produces a new row and retires the prior one; re-parsing a URL produces a new ingestion run rather than an in-place mutation. Retirement is a state transition, never a row deletion: the same retire-as-state-transition rule that authoring applies to world models, rules, and rule relationships extends uniformly to source materials.
  • Distillation runs have no cancel/abort. Operators wait for a terminal state (completed or failed). The state machine is designed for clean extension: cancellation can be added later as an additive transition out of in_progress without redesigning the workflow.
  • Concurrent operator actions on the same source material or distillation run are serialized at the row level: one action wins and the other receives a clean conflict response. This mirrors the row-level serialization on ApprovalDecision and the world-model authoring surfaces.
  • Every mutating distillation API is idempotent on (operator_id, correlation_id). Source-material ingest, source-material retire, and distillation-run submit all carry a correlation_id recorded on the corresponding world_authoring_audit row (and, for run submissions, on distillation_run.correlation_id); a re-issued request with the same correlation ID returns the original outcome.
  • Distillation failures are operator-visible. Silent failure is not allowed: per-source status surfaces partial output and explicit errors, and a failed run reaches the terminal failed state carrying a coded failure_reason_code (retained under STRIP_PAYLOAD disposal) plus free-text detail that is stripped on disposal.
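
The retrieval step in the LLM-guided distillation item (index source documents via an embedding provider, then pick the most similar chunks to hand to the LLM) can be sketched as follows. This is a minimal sketch under stated assumptions: EmbeddingProvider is a hypothetical Protocol standing in for the ADR-038 interface, whose real method names this page does not give; ToyHashEmbedder, Chunk, cosine, and retrieve are illustrative names; and the hashed bag-of-words embedder only substitutes for a real embedding model so the example runs without external services.

```python
import math
import zlib
from dataclasses import dataclass
from typing import List, Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical stand-in for the ADR-038 protocol: texts in, vectors out."""

    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class ToyHashEmbedder:
    """Illustrative provider: hashed bag-of-words counts, not a real embedding model."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # crc32 is deterministic across processes, unlike built-in hash()
                vec[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
            vectors.append(vec)
        return vectors


@dataclass(frozen=True)
class Chunk:
    source_material_id: str  # carried through so drafts can cite their source
    text: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(provider: EmbeddingProvider, chunks: Sequence[Chunk], query: str, k: int = 3) -> List[Chunk]:
    """Return the k chunks most similar to the query (input order kept on ties)."""
    chunk_vectors = provider.embed([c.text for c in chunks])
    (query_vector,) = provider.embed([query])
    scored = [(cosine(query_vector, v), c) for v, c in zip(chunk_vectors, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort is stable
    return [c for _, c in scored[:k]]
```

Keeping source_material_id on every chunk is what allows a drafted RuleCandidate to point back to the SourceMaterial rows that contributed to it.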

Every operator authoring action on this surface appends a row to the WorldAuthoringAudit record family (per domain-model — WorldAuthoringAudit). The discriminators introduced for distillation are:

  • target_type = source_material, action = ingested — operator ingests a new source material
  • target_type = source_material, action = retired — operator retires a source from the inventory
  • target_type = distillation_run, action = requested — operator submits a new distillation run

The audit row is appended alongside the entity-state mutation in the same transaction; it is never appended outside the originating use-case transaction. The (operator_id, target_type, target_id, action, occurred_at, correlation_id) row remains queryable for the full active window per data-retention — operator-action records.
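
The same-transaction rule can be illustrated with Python's built-in sqlite3 module. This is a hedged sketch, not the production schema: the table layouts are heavily reduced, ingest_source_material and its simulate_failure flag are hypothetical, and SQLite stands in for the real database. The CHECK constraint admits only the three discriminator pairs listed above, and the connection's context manager commits both writes together or rolls both back.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical, reduced schema: only the audit columns named in the text are modeled.
SCHEMA = """
CREATE TABLE source_material (
    id       TEXT PRIMARY KEY,
    world_id TEXT NOT NULL,
    title    TEXT NOT NULL
);
CREATE TABLE world_authoring_audit (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    operator_id    TEXT NOT NULL,
    target_type    TEXT NOT NULL,
    target_id      TEXT NOT NULL,
    action         TEXT NOT NULL,
    occurred_at    TEXT NOT NULL,
    correlation_id TEXT NOT NULL,
    CHECK (
        (target_type = 'source_material' AND action IN ('ingested', 'retired'))
        OR (target_type = 'distillation_run' AND action = 'requested')
    )
);
"""


def ingest_source_material(conn, *, operator_id, world_id, title, correlation_id, simulate_failure=False):
    """Insert the entity row and its audit row inside one transaction."""
    material_id = str(uuid.uuid4())
    # `with conn` commits on success and rolls back if anything inside raises.
    with conn:
        conn.execute(
            "INSERT INTO source_material (id, world_id, title) VALUES (?, ?, ?)",
            (material_id, world_id, title),
        )
        if simulate_failure:  # illustration only: fail between the two writes
            raise RuntimeError("simulated failure before the audit append")
        conn.execute(
            "INSERT INTO world_authoring_audit "
            "(operator_id, target_type, target_id, action, occurred_at, correlation_id) "
            "VALUES (?, 'source_material', ?, 'ingested', ?, ?)",
            (operator_id, material_id, datetime.now(timezone.utc).isoformat(), correlation_id),
        )
    return material_id
```

A failure after the entity insert leaves neither row behind, so an audit row can never describe a mutation that did not happen, and vice versa.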

WorldAuthoringAudit is not agent memory (per the records-vs-memory framing). It is operator-scoped (scoped by app.user_id) with no domain RLS. It is distinct from ApprovalDecision, which shares the same identity domain but records a different action surface: enshrinement-gate decisions rather than authoring actions.

The DistillationRun workflow record itself lives alongside the audit row — the audit row records the operator action (“operator X requested distillation Y at time Z”); the DistillationRun row carries the workflow lifecycle state (status, started_at, completed_at, failure_reason_code) and is the system of record for run progress.
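
The run lifecycle described here and in the no-cancel invariant can be modeled as a small transition table. This is a sketch under assumptions: it assumes runs begin in in_progress (any queued state the real model has is omitted), that completed_at is stamped on either terminal state, and that failure_reason_code is present exactly on failed runs; RunStatus, TRANSITIONS, InvalidTransition, and finish are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Terminal states map to an empty set. Adding cancellation later would be one
# additive change: a CANCELLED member plus IN_PROGRESS -> CANCELLED here.
TRANSITIONS = {
    RunStatus.IN_PROGRESS: frozenset({RunStatus.COMPLETED, RunStatus.FAILED}),
    RunStatus.COMPLETED: frozenset(),
    RunStatus.FAILED: frozenset(),
}


class InvalidTransition(Exception):
    pass


@dataclass(frozen=True)
class DistillationRun:
    id: str
    status: RunStatus
    started_at: datetime
    completed_at: Optional[datetime] = None
    failure_reason_code: Optional[str] = None

    @property
    def is_terminal(self) -> bool:
        return not TRANSITIONS[self.status]

    def finish(self, target: RunStatus, failure_reason_code: Optional[str] = None) -> "DistillationRun":
        """Return a new record in the target terminal state; never mutates in place."""
        if target not in TRANSITIONS[self.status]:
            raise InvalidTransition(f"{self.status.value} -> {target.value}")
        if (target is RunStatus.FAILED) != (failure_reason_code is not None):
            raise ValueError("failure_reason_code is required on failed runs and only on failed runs")
        return replace(
            self,
            status=target,
            completed_at=datetime.now(timezone.utc),
            failure_reason_code=failure_reason_code,
        )
```

Because the table is the only place transitions are declared, extending the workflow means adding entries rather than rewriting the checks.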

This workflow lives in the ruleset workspace’s Sources tab, where the source library, the “pull candidate rules from a source” action, and the in-progress extraction sit together. Candidate triage is the separate Review tab.

The Sources tab folds together:

  1. Source library — list / inspect / ingest / retire source materials; shows provenance envelope (publication, section, revision)
  2. Distillation request — operator-parameterized form (world model, source selection, focus area, problem space, expected tier) submitting a new DistillationRun
  3. In-progress extraction — a live view of run lifecycle and per-source extraction state (live progress, step timeline) inline below the library; surfaces partial output as it lands; explicit error display on per-source failure
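
Because the UI is a thin client, the distillation request form maps onto a request body for the submit route. The sketch below shows one way to validate that mapping client-side; the class name, the field names (source_material_ids, focus_area, problem_space, expected_tier), and the validation rules are assumptions for illustration, not the documented wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class DistillationRequest:
    """Hypothetical payload for POST /worlds/:world_id/distillations."""

    world_id: str
    source_material_ids: Tuple[str, ...]
    focus_area: str
    problem_space: str
    expected_tier: str
    correlation_id: str  # makes resubmission idempotent

    def __post_init__(self) -> None:
        if not self.source_material_ids:
            raise ValueError("select at least one source material")
        if len(set(self.source_material_ids)) != len(self.source_material_ids):
            raise ValueError("duplicate source material in selection")
        for name in ("focus_area", "problem_space", "expected_tier", "correlation_id"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")

    def route(self) -> str:
        return f"/worlds/{self.world_id}/distillations"

    def body(self) -> str:
        payload = asdict(self)
        payload.pop("world_id")  # carried in the path, not the body
        payload["source_material_ids"] = list(self.source_material_ids)
        return json.dumps(payload, sort_keys=True)
```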

Produced RuleCandidate rows surface in the Review tab — the two-pane review queue with the full provenance trail (run → source materials → candidate) in the evidence bundle, where the operator advances candidates through the conformity gate or retires them in place.
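
Assembling the run → source materials → candidate trail for the evidence bundle amounts to a walk over the distillation_run_outputs junction. The sketch assumes one junction row per (run, candidate, contributing source) and exactly one producing run per candidate; the real DistillationRunOutput columns may differ, and provenance_trail and ProvenanceTrail are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SourceMaterial:
    id: str
    publication: str
    section: str
    revision: str


@dataclass(frozen=True)
class DistillationRunOutput:
    # Assumed junction shape: one row per (run, candidate, contributing source).
    distillation_run_id: str
    rule_candidate_id: str
    source_material_id: str


@dataclass(frozen=True)
class ProvenanceTrail:
    distillation_run_id: str
    rule_candidate_id: str
    sources: Tuple[SourceMaterial, ...]


def provenance_trail(
    candidate_id: str,
    outputs: Iterable[DistillationRunOutput],
    materials: Dict[str, SourceMaterial],
) -> ProvenanceTrail:
    rows = [o for o in outputs if o.rule_candidate_id == candidate_id]
    if not rows:
        raise LookupError(f"no distillation output recorded for {candidate_id}")
    run_ids = {o.distillation_run_id for o in rows}
    if len(run_ids) != 1:
        raise ValueError(f"{candidate_id} is linked to several runs: {sorted(run_ids)}")
    # Retired sources stay resolvable because SourceMaterial rows are never deleted.
    source_ids = sorted({o.source_material_id for o in rows})
    return ProvenanceTrail(run_ids.pop(), candidate_id, tuple(materials[s] for s in source_ids))
```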

Every UI surface has a corresponding API endpoint per the API/UI parity rule — the UI is a thin client over the same routes.

The HTTP routes the UI consumes as an API client (per the parity contract) include:

  • POST /worlds/:world_id/source-materials — operator ingest of a new source material (idempotent on (operator_id, correlation_id))
  • POST /worlds/:world_id/source-materials/:id/retire — operator retire of an existing source material (idempotent on (operator_id, correlation_id))
  • POST /worlds/:world_id/distillations — operator submit of a new distillation run (idempotent on (requested_by, correlation_id), where requested_by holds the submitting operator's ID)
  • GET /worlds/:world_id/distillations/:id — run status (run-level + per-source)
  • GET /worlds/:world_id/distillations/:id/candidates — produced RuleCandidate rows with provenance trail
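
The idempotency contract on the POST routes (a re-issued request with the same key returns the original outcome without a second mutation) can be sketched in memory. This is an assumption-laden illustration: IdempotentExecutor is a hypothetical name, a real service would persist the key and outcome in the same transaction as the mutation (for example behind a unique constraint), and this sketch chooses not to record attempts whose handler raises, so they can be retried.

```python
from typing import Any, Callable, Dict, Tuple


class IdempotentExecutor:
    """Replays the first recorded outcome for an (operator_id, correlation_id) key.

    In-memory and single-threaded; concurrent requests are assumed to be
    serialized elsewhere (row-level locking in the real system).
    """

    def __init__(self) -> None:
        self._outcomes: Dict[Tuple[str, str], Any] = {}

    def run(self, operator_id: str, correlation_id: str, handler: Callable[[], Any]) -> Any:
        key = (operator_id, correlation_id)
        if key in self._outcomes:
            return self._outcomes[key]  # re-issued request: no second mutation
        outcome = handler()  # if this raises, nothing is recorded
        self._outcomes[key] = outcome
        return outcome
```

Scoping the key by operator means two operators who happen to reuse a correlation ID never see each other's outcomes.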

Auth middleware sets app.user_id via SET LOCAL per ADR-041 D4; RLS predicates compare against app.user_id per ADR-039. The apps/operations deployable consumes these routes via HTTP per ADR-047 and never imports spectral.worlds internals; the architecture validator at STRICT=True enforces this.
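
A boundary rule like "apps/operations never imports spectral.worlds internals" is typically checked statically. The sketch below uses Python's ast module to flag such imports in one file's source; it is a hypothetical illustration of the kind of check involved, not the project's architecture validator, whose implementation this page does not describe. It catches both `import spectral.worlds...` and `from spectral import worlds`, but not attribute access through a bare `import spectral`.

```python
import ast
from typing import List, Tuple

FORBIDDEN_PREFIX = "spectral.worlds"


def forbidden_imports(source: str, forbidden: str = FORBIDDEN_PREFIX) -> List[Tuple[int, str]]:
    """Return (line number, module path) for each import statement that reaches `forbidden`."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            paths = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also expand `from spectral import worlds` to `spectral.worlds`.
            paths = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # relative imports and non-import nodes are ignored
        bad = [p for p in paths if p == forbidden or p.startswith(forbidden + ".")]
        if bad:
            hits.append((node.lineno, bad[0]))
    return hits
```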