Scan Walkthrough

This is the engineering counterpart to the first-customer walkthrough — same scenario, same customer (Priya at Ledger), with engineering depth in place of product framing.

For the underlying mechanics each phase exercises, the seven-phase scan pipeline is documented across Optimization Engine, Event System, Memory System, and the agent pages. This page shows how all seven compose against a single scan.

The scan worker picks up Priya’s workspace from the schedule, instantiates a Scan row in apps/api’s database with state=scheduled, and dispatches an AgentTask for the scan orchestrator. The orchestrator runs in apps/workers (per ADR-060) and is the framework-layer composition root for the seven phases.

Before phase 1, the scan orchestrator runs preflight (per Optimization Engine — Scan preflight). Preflight is not a phase in the pipeline; it is an orchestrator pre-check that writes a ScanReadinessObservation to the Scan row.

  • Asks spectral.worlds — can an eval set be produced for us-federal-individual-tax v0.1.0 with Priya’s evaluation framework? Worlds answers via the callee-owned EvalSetProvider Tier 2 Protocol at spectral.worlds.contracts.protocols.eval_set_provider (per ADR-065 D3).
  • Asks the curation service — are conformance samples available? Priya doesn’t have curated samples yet, so the answer is no.
  • Writes the readiness observation: mode = synthetic_only, curation_samples_count = 0, evalset_available = true. Preflight does not block the scan; synthetic-only is a valid mode. (See Configuration matrix.)
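The preflight write can be sketched as a small record plus a pre-check function. This is a sketch under assumptions: the ScanReadinessObservation field names come from the bullets above, while run_preflight, the two service interfaces it calls, and the "mixed" mode name are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanReadinessObservation:
    """Written to the Scan row by preflight; preflight never blocks the scan."""
    mode: str                    # "synthetic_only" when no curated samples exist
    curation_samples_count: int
    evalset_available: bool

def run_preflight(worlds, curation, workspace) -> ScanReadinessObservation:
    # Ask worlds (via the callee-owned Protocol) whether an eval set is producible.
    evalset_ok = worlds.can_produce_eval_set(
        world_id="us-federal-individual-tax",
        version="0.1.0",
        framework=workspace.evaluation_framework,
    )
    # Ask the curation service how many conformance samples exist.
    samples = curation.count_conformance_samples(workspace.id)
    # "mixed" is a hypothetical name for the non-synthetic-only mode.
    mode = "synthetic_only" if samples == 0 else "mixed"
    return ScanReadinessObservation(mode, samples, evalset_ok)
```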

The scan transitions to state=running.

The Observe phase consumes the readiness observation and runs Priya’s customer agent against the eval-set stimuli.

  • Synchronous call into worlds. Observe synchronously requests the eval set from spectral.worlds via the EvalSetProvider Protocol. The bridge tool that composes the Protocol into the platform runtime lives in apps/workers/tools/ (per ADR-065 D5). This is the single platform → worlds call-and-wait path of the entire scan.
  • Output. A list of ScanTrace records, one per stimulus, each carrying a provenance field that names the stimulus source (rule-grounded eval-set sample vs. exploratory probe).
  • Partition. Working set vs. holdout via the eval set’s two-layer holdout structure. Working set drives optimization; holdout validates that improvements generalize.
  • Stays inside spectral.platform for the rest of the pipeline. The synthetic-track dependency on worlds is satisfied; from here the scan is platform-internal until it emits signal events at the end.
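The callee-owned Protocol and the working/holdout split can be sketched as follows. EvalSample, its field names, and the produce_eval_set signature are illustrative assumptions, not the actual contract at spectral.worlds.contracts.protocols.eval_set_provider.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class EvalSample:
    stimulus: str
    provenance: str   # e.g. rule-grounded "eval_set" vs. "exploratory_probe"
    holdout: bool     # membership in the eval set's holdout layer

class EvalSetProvider(Protocol):
    """Callee-owned Tier 2 Protocol: worlds defines it, the platform calls it."""
    def produce_eval_set(
        self, world_id: str, version: str, framework_id: str
    ) -> list[EvalSample]: ...

def partition(samples: list[EvalSample]):
    """Working set drives optimization; holdout validates generalization."""
    working = [s for s in samples if not s.holdout]
    holdout = [s for s in samples if s.holdout]
    return working, holdout
```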

Pure platform-side scoring calibration: this phase reads the trace distribution and adjusts bootstrap CI parameters and per-rubric-dimension scoring thresholds for the workspace. No hops into worlds; no events. Output is a calibration record attached to the Scan row.
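The calibration math itself is not specified here; as one plausible shape, a percentile-bootstrap confidence interval over per-trace scores looks like this (the function name, resample count, and alpha default are all assumptions):

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a trace-score distribution."""
    rng = random.Random(seed)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```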

The Diagnose phase clusters failures into FailureCluster records (spectral.platform.domain.clustering).

  • Quarantines infrastructure failures and parse failures before clustering — the clusterer only sees quality EvalResult rows.
  • The clustering prompt receives only rubric-scorer explanations and scores. World-model authority outputs do not cross the clustering prompt boundary; two-authority opacity is enforced at the input shape, not by post-hoc filtering (per ADR-014).
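Enforcing opacity at the input shape can be sketched like this: the clustering input type simply has no world-model fields, so nothing needs to be filtered out after the fact. The type and function names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClusteringInput:
    """The only shape the clustering prompt ever sees: rubric-scorer scores
    and explanations. World-model authority fields do not exist on this type,
    so they cannot cross the prompt boundary."""
    trace_id: str
    rubric_scores: dict
    rubric_explanation: str

def to_clustering_inputs(eval_results):
    # Infrastructure and parse failures are quarantined first; the clusterer
    # only sees quality rows.
    return [
        ClusteringInput(r["trace_id"], r["rubric_scores"], r["rubric_explanation"])
        for r in eval_results
        if r["kind"] == "quality"
    ]
```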

Event emitted: every detected cluster fires a platform.failure_cluster.detected event with the producer-typed payload at spectral.platform.contracts.events.failure_cluster_detected (per ADR-065 D2). The event has two consumer paths off the same wire shape:

  • The Operations Agent upserts platform.rule_candidates_pending on every detection so operators see the cluster surface in their queue.
  • The World Agent in spectral.worlds applies a consumer-side promotion-threshold filter (frequency_pct >= 10, effect_size >= 15, actionable = true, computed over the event stream) and decides whether the pattern becomes a rule candidate, a rule revision, or noise (per Evolution Loop) only on the higher bar.
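The two consumer paths share one wire shape; only the World Agent applies the higher bar. A minimal sketch, with the payload class name assumed and the aggregation elided (the real thresholds are computed over the event stream, not a single event):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureClusterDetected:
    """Producer-typed payload shape; fields taken from the filter above."""
    cluster_id: str
    frequency_pct: float   # aggregated over the event stream
    effect_size: float
    actionable: bool

def crosses_promotion_bar(evt: FailureClusterDetected) -> bool:
    # Consumer-side filter applied only by the World Agent; the Operations
    # Agent consumes every detection with no threshold at all.
    return evt.frequency_pct >= 10 and evt.effect_size >= 15 and evt.actionable
```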

For Priya’s scan, six of nine failures cluster into a “MFS classification” group at Pub 501 §MFS. The cluster crosses the detection threshold; one platform.failure_cluster.detected event fires.

The Evaluate phase runs two scoring authorities in parallel on every trace (Optimization Engine — Two-authority evaluation):

  • World-model scorer. Answers “Did the agent produce the response the rule says it should?” Inputs (ground truth, scoring dimensions, stimulus_weight) are packaged into each eval-set sample by spectral.worlds; rule internals never cross into platform.
  • Rubric scorer. LLM-as-judge against Priya’s evaluation framework rubric. Answers “How does this output score on the rubric’s dimensions?” — and produces the natural-language explanations the Diagnose phase reasoned over.

Outputs combine into a CompositeScore (world_model_score, rubric_score, blended_delta, convergence_delta, synthetic_scores, conformance_scores). Because this scan is synthetic_only, conformance_scores is null and convergence_delta carries an absence marker.
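A sketch of the composite shape, using None as the absence marker (the actual marker type is not specified in this walkthrough, and the constructor is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CompositeScore:
    world_model_score: float
    rubric_score: float
    blended_delta: float
    convergence_delta: Optional[float]   # None stands in for the absence marker
    synthetic_scores: dict
    conformance_scores: Optional[dict]

def composite_for_synthetic_only(world_model_score, rubric_score,
                                 blended_delta, synthetic_scores):
    # synthetic_only scans have no conformance samples: conformance_scores is
    # null and convergence_delta carries the absence marker.
    return CompositeScore(world_model_score, rubric_score, blended_delta,
                          None, synthetic_scores, None)
```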

Event emitted (notification flow, into worlds): Evaluate emits one rubric.divergence event per scan regardless of conformance-sample availability (typed payload at spectral.platform.contracts.events.rubric_divergence). The World Agent aggregates divergence across workspaces as a world-model-evolution signal.

Stage 1 customers see recommendations only; “managed apply” is a Stage 2 capability (How Spectral Works — three integration depths).

Optimize generates change-set candidates — bundled prompt edits, hyperparameter shifts, and rule-specific guidance derived from the failure clusters and the rubric-scorer explanations. A tournament across candidates (per ADR-020) ranks them on CompositeScore delta vs. the baseline. The winner becomes the proposed ChangeSet for the scan.
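The tournament reduces to a ranking by score delta against the baseline. A minimal sketch: the candidate representation and score_fn are assumptions, and the real ADR-020 tournament may be pairwise rather than a single sort.

```python
def tournament(candidates, baseline_score, score_fn):
    """Rank change-set candidates by CompositeScore delta vs. the baseline;
    the top-ranked candidate becomes the proposed ChangeSet."""
    ranked = sorted(
        candidates,
        key=lambda c: score_fn(c) - baseline_score,
        reverse=True,
    )
    return ranked[0], ranked
```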

No interaction with worlds. No events emitted at this phase.

Safety runs the conformity check: does the proposed change set pass the world model’s structural invariants? This is the same conformity discipline that gates rule-candidate enshrinement on the worlds side, applied here as a forward check against change-set modifications.

For Priya’s scan, all proposed modifications conform. Safety passes. No events.

The Verdict engine runs eight gates (delta threshold, agent regression, dimension regression, holdout generalization gap, bootstrap CI, output similarity, Pareto cost/latency, sanity downgrade) plus a convergence gate. The eight gates are pure functions in spectral.platform.domain.verdict (no infrastructure imports).

Holdout protocol. The holdout-generalization-gap gate consumes the synthetic eval set’s holdout partition exclusively. Conformance samples (when present) are convergence anchors, not holdout inputs.
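As a pure-function sketch of this gate (the signature and tolerance value are assumptions; the real gates live in spectral.platform.domain.verdict):

```python
def holdout_generalization_gap_gate(working_delta: float,
                                    holdout_delta: float,
                                    max_gap: float = 0.05) -> bool:
    """Pure function, no infrastructure imports: the improvement measured on
    the working set must carry over to the synthetic holdout partition."""
    return (working_delta - holdout_delta) <= max_gap
```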

For Priya’s scan: 41 of 50 checks pass; the holdout gap is within tolerance; the delta-threshold gate passes; the Pareto gate is neutral. Final outcome: go_nogo = caution — some MFS-classification regressions in the rubric-dimension delta. (Autonomy modes never auto-accept caution regardless of configuration.)
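The autonomy rule in the parenthetical above reduces to a small pure function; the go_nogo values and mode names beyond caution itself are assumptions.

```python
def resolve_autonomy(go_nogo: str, autonomy_mode: str) -> str:
    """caution (and no_go) is never auto-accepted, regardless of mode."""
    if go_nogo == "go" and autonomy_mode == "auto_accept":
        return "auto_accepted"
    return "needs_operator_review"
```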

Events emitted (intra-platform, then into worlds):

  • verdict.issued — Verdict engine emits to the platform-internal substrate; the verdict-issued handler consumes it and the Spectral Agent kicks off a proactive conversation.
  • scan.convergence.delta — emitted per scan; the payload supports absence-marker semantics. For Priya’s scan, the delta carries an absence marker (no conformance samples).
  • scan.completed — emitted to the substrate so worlds consumers can react; the scan-completed handler in spectral.platform.application.changeset consumes it, stamps the EvaluationAuthorityRef, and finalizes the change-set row.
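A sketch of the scan-completed handler's stamping step, with the event payload, the ref shape, and the state names all assumed:

```python
def on_scan_completed(event: dict, changesets: dict) -> dict:
    """Consume scan.completed, stamp the EvaluationAuthorityRef, and finalize
    the change-set row (the real handler lives in
    spectral.platform.application.changeset)."""
    cs = changesets[event["change_set_id"]]
    cs["evaluation_authority_ref"] = {
        "scan_id": event["scan_id"],
        "authorities": ["world_model", "rubric"],
    }
    cs["state"] = "finalized"
    return cs
```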

The cascade illustrates the discipline between worlds and platform at runtime:

  • One synchronous call between contexts — Observe → EvalSetProvider. Everything else is event-driven.
  • Five events emitted by the platform pipeline. Two flow into worlds (platform.failure_cluster.detected, rubric.divergence) and seed the worlds-side Evolution Loop. The other three (verdict.issued, scan.convergence.delta, scan.completed) are consumed by platform-side handlers and drive the Spectral Agent and the change-set lifecycle.
  • Zero SQL grants between contexts at any layer. Every hop is either a Protocol call or an event with a producer-typed payload + consumer-side ACL.

After the scan completes, Priya’s workspace has these new rows:

  • Scan — state=completed, composite_score, verdict.go_nogo=caution
  • ScanTrace[] — one per stimulus, with provenance attribution
  • EvalResult[] — two per ScanTrace (one per scoring authority)
  • FailureCluster[] — including the MFS-classification cluster
  • VerdictResult — the eight-gate outcome
  • RubricDivergenceRecord — the per-scan rubric-vs-worldmodel divergence delta
  • ChangeSet — the proposed bundle, with attached AgentPerformanceCard and the EvaluationAuthorityRef stamped by the scan-completed handler
  • A Conversation row created by the Spectral Agent’s verdict-issued handler with initiated_by = agent and trigger_event_id = verdict.issued

On the worlds side, the Evolution Loop has two new inputs queued (platform.failure_cluster.detected and rubric.divergence). Single-workspace divergence remains a scan observation and does not initiate rule revision; cross-workspace aggregation across many customers is what drives world-model evolution.

The Spectral Agent’s proactive conversation lands in Priya’s dashboard:

Spectral Agent: “Your first scan finished. Verdict: 41 / 50 checks passing. The biggest cluster is MFS classification — 6 of your 9 failures were in the considered-unmarried path at Pub 501 §MFS. Want me to walk through the proposed change set?”

The customer-side narrative continues in the first-customer walkthrough. For the mechanics behind each phase, see:

  • Optimization Engine — full per-phase mechanics, the composite-score schema, the verdict-gate inventory, the holdout protocol
  • Event System — every typed event with payload shape and consumer attribution
  • Contract Surfaces — the doctrine the cascade above enforces
  • Agent Architecture — the Spectral Agent’s verdict-issued handler and the proactive conversation lifecycle
  • Evolution Loop — what the World Agent does with the events platform emits into worlds