Scan Walkthrough
This is the engineering counterpart to the customer-narrative first-customer walkthrough — same scenario, same customer (Priya at Ledger), engineering depth in place of product framing.
For the underlying mechanics each phase exercises, the seven-phase scan pipeline is documented across Optimization Engine, Event System, Memory System, and the agent pages. This page shows how all seven compose against a single scan.
Trigger
The scan worker picks up Priya’s workspace from the schedule, instantiates a Scan row in apps/api’s database with state=scheduled, and dispatches an AgentTask for the scan orchestrator. The orchestrator runs in apps/workers (per ADR-060) and is the framework-layer composition root for the seven phases.
Preflight
Before phase 1, the scan orchestrator runs preflight (per Optimization Engine — Scan preflight). Preflight is not a phase in the pipeline; it is an orchestrator pre-check that writes a ScanReadinessObservation to the Scan row.
- Asks spectral.worlds — can an eval set be produced for us-federal-individual-tax v0.1.0 with Priya’s evaluation framework? Worlds answers via the callee-owned EvalSetProvider Tier 2 Protocol at spectral.worlds.contracts.protocols.eval_set_provider (per ADR-065 D3).
- Asks the curation service — are conformance samples available? Priya doesn’t have curated samples yet, so the answer is no.
- Writes the readiness observation — mode = synthetic_only, curation_samples_count = 0, evalset_available = true. Preflight does not block the scan; synthetic-only is a valid mode. (See Configuration matrix.)
The scan transitions to state=running.
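As a minimal sketch of the preflight outcome, the observation can be modeled as a small immutable record. The field names come from the walkthrough; the field types, the `run_preflight` helper, and the non-synthetic mode label are assumptions made for illustration and are not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanReadinessObservation:
    """Field names follow the walkthrough; the types are assumptions."""
    mode: str
    curation_samples_count: int
    evalset_available: bool


def run_preflight(evalset_available: bool, curation_samples_count: int) -> ScanReadinessObservation:
    """Hypothetical helper that folds the two preflight answers into one observation."""
    # Assumption: zero curated samples means synthetic-only. The other label
    # is illustrative and does not come from the documentation.
    if curation_samples_count == 0:
        mode = "synthetic_only"
    else:
        mode = "synthetic_and_conformance"
    return ScanReadinessObservation(
        mode=mode,
        curation_samples_count=curation_samples_count,
        evalset_available=evalset_available,
    )


# Priya's workspace: an eval set is available, no curated samples yet.
observation = run_preflight(evalset_available=True, curation_samples_count=0)
# Preflight records the observation and never blocks; the scan moves on to running.
scan_state = "running"
```

The point of the sketch is that preflight produces data rather than a pass/fail decision: the observation is written, and the scan proceeds either way.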
Phase 1 — Observe
The Observe phase consumes the readiness observation and runs Priya’s customer agent against the eval-set stimuli.
- Synchronous call into worlds. Observe synchronously requests the eval set from spectral.worlds via the EvalSetProvider Protocol. The bridge tool that composes the Protocol into the platform runtime lives in apps/workers/tools/ (per ADR-065 D5). This is the single platform → worlds call-and-wait path of the entire scan.
- Output. A list of ScanTrace records, one per stimulus, each carrying a provenance field that names the stimulus source (rule-grounded eval-set sample vs. exploratory probe).
- Partition. Working set vs. holdout via the eval set’s two-layer holdout structure. Working set drives optimization; holdout validates that improvements generalize.
- Stays inside spectral.platform for the rest of the pipeline. The synthetic-track dependency on worlds is satisfied; from here the scan is platform-internal until it emits signal events at the end.
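The call-and-partition step above can be sketched with a structural Protocol. Everything here is illustrative: the `get_eval_set` method name, the `EvalSample` fields, the provenance labels, and the single `is_holdout` flag (a deliberate simplification of the two-layer holdout structure) are assumptions, not the actual contract at spectral.worlds.contracts.protocols.eval_set_provider.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EvalSample:
    stimulus: str
    provenance: str   # assumed labels: "rule_grounded" or "exploratory_probe"
    is_holdout: bool  # simplification of the two-layer holdout structure


class EvalSetProvider(Protocol):
    """Illustrative shape only; the real Protocol's methods are not shown on this page."""

    def get_eval_set(self, world: str, version: str) -> list[EvalSample]: ...


class InMemoryProvider:
    """Stand-in for the worlds-side implementation, used only for this sketch."""

    def get_eval_set(self, world: str, version: str) -> list[EvalSample]:
        return [
            EvalSample("stimulus-1", "rule_grounded", False),
            EvalSample("stimulus-2", "rule_grounded", True),
            EvalSample("stimulus-3", "exploratory_probe", False),
        ]


def observe(provider: EvalSetProvider, world: str, version: str):
    # The single synchronous platform -> worlds call of the scan.
    samples = provider.get_eval_set(world, version)
    # Working set drives optimization; holdout is reserved for generalization checks.
    working = [s for s in samples if not s.is_holdout]
    holdout = [s for s in samples if s.is_holdout]
    return working, holdout


working_set, holdout_set = observe(InMemoryProvider(), "us-federal-individual-tax", "v0.1.0")
```

Because the Protocol is structural, the platform side depends only on the method shape, which is what lets the callee own the implementation.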
Phase 2 — Calibrate
Pure platform-side scoring calibration. The phase reads the trace distribution and adjusts bootstrap CI parameters and per-rubric-dimension scoring thresholds for the workspace. No hops into worlds; no events. Output is a calibration record attached to the Scan row.
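To make "bootstrap CI parameters" concrete, here is a minimal percentile-bootstrap sketch over a list of trace scores. It is a generic textbook technique, not the platform's calibration code; the function name, the default resample count, the alpha level, and the sample scores are all assumptions.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Illustrative per-trace scores, not real scan data.
scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.59, 0.74, 0.66, 0.70]
low, high = bootstrap_ci(scores)
```

In this framing, `n_resamples` and `alpha` are the kind of parameters a calibration step could tune per workspace: fewer traces generally produce a wider interval at the same alpha.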
Phase 3 — Diagnose
The Diagnose phase clusters failures into FailureCluster records (spectral.platform.domain.clustering).
- Quarantines infrastructure failures and parse failures before clustering — the clusterer only sees quality EvalResult rows.
- The clustering prompt receives only rubric-scorer explanations and scores. World-model authority outputs do not cross the clustering prompt boundary; two-authority opacity is enforced at the input shape, not by post-hoc filtering (per ADR-014).
Event emitted: every detected cluster fires a platform.failure_cluster.detected event with the producer-typed payload at spectral.platform.contracts.events.failure_cluster_detected (per ADR-065 D2). The event has two consumer paths off the same wire shape:
- The Operations Agent upserts platform.rule_candidates_pending on every detection so operators see the cluster surface in their queue.
- The World Agent in spectral.worlds applies a consumer-side promotion-threshold filter (frequency_pct >= 10, effect_size >= 15, actionable = true, computed over the event stream) and, only for patterns that clear this higher bar, decides whether the pattern becomes a rule candidate, a rule revision, or noise (per Evolution Loop).
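The two consumer paths can be sketched as two handlers reading the same event shape. The thresholds are the ones quoted above; the payload fields other than those three, the handler names, and the in-memory queue are hypothetical. The sketch also assumes the threshold values arrive already aggregated, whereas the walkthrough says they are computed over the event stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureClusterDetected:
    """Illustrative payload; the real typed payload lives in the contracts package."""
    cluster_label: str
    frequency_pct: float
    effect_size: float
    actionable: bool


def passes_promotion_threshold(event: FailureClusterDetected) -> bool:
    # Consumer-side bar quoted in the walkthrough.
    return event.frequency_pct >= 10 and event.effect_size >= 15 and event.actionable


pending_queue: dict[str, FailureClusterDetected] = {}
promotion_candidates: list[FailureClusterDetected] = []


def operations_agent_handler(event: FailureClusterDetected) -> None:
    # Upsert on every detection, with no filtering.
    pending_queue[event.cluster_label] = event


def world_agent_handler(event: FailureClusterDetected) -> None:
    # Only detections that clear the higher bar are considered further.
    if passes_promotion_threshold(event):
        promotion_candidates.append(event)


# Illustrative events, not Priya's actual numbers.
events = [
    FailureClusterDetected("cluster_a", frequency_pct=40.0, effect_size=20.0, actionable=True),
    FailureClusterDetected("cluster_b", frequency_pct=4.0, effect_size=30.0, actionable=True),
]
for event in events:
    operations_agent_handler(event)
    world_agent_handler(event)
```

Keeping the filter on the consumer side means the producer emits one wire shape and each consumer applies its own bar.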
For Priya’s scan, six of nine failures cluster into a “MFS classification” group at Pub 501 §MFS. The cluster crosses the detection threshold; one platform.failure_cluster.detected event fires.
Phase 4 — Evaluate
The Evaluate phase runs two scoring authorities in parallel on every trace (Optimization Engine — Two-authority evaluation):
- World-model scorer. Answers “Did the agent produce the response the rule says it should?” Inputs (ground truth, scoring dimensions, stimulus_weight) are packaged into each eval-set sample by spectral.worlds; rule internals never cross into platform.
- Rubric scorer. LLM-as-judge against Priya’s evaluation framework rubric. Answers “How does this output score on the rubric’s dimensions?” — and produces the natural-language explanations the Diagnose phase reasoned over.
Outputs combine into a CompositeScore (world_model_score, rubric_score, blended_delta, convergence_delta, synthetic_scores, conformance_scores). Because this scan is synthetic_only, conformance_scores is null and convergence_delta carries an absence marker.
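One way to picture the absence marker is as a distinct type rather than a sentinel number, so a missing convergence delta can never be mistaken for a delta of zero. The six field names come from the text above; the field types, the `Absent` class, the reason string, and the difference-of-means formula are assumptions for this sketch.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Optional, Union


@dataclass(frozen=True)
class Absent:
    """Explicit absence marker, distinct from a numeric delta of 0.0."""
    reason: str


@dataclass(frozen=True)
class CompositeScore:
    world_model_score: float
    rubric_score: float
    blended_delta: float
    convergence_delta: Union[float, Absent]
    synthetic_scores: list[float]
    conformance_scores: Optional[list[float]]


def convergence_delta(synthetic, conformance) -> Union[float, Absent]:
    if not conformance:
        return Absent(reason="no_conformance_samples")
    # Assumption: difference of mean scores; the real formula is not given on this page.
    return fmean(synthetic) - fmean(conformance)


# Illustrative values for a synthetic-only scan.
synthetic = [0.8, 0.7, 0.9]
score = CompositeScore(
    world_model_score=0.82,
    rubric_score=0.78,
    blended_delta=0.0,  # placeholder value
    convergence_delta=convergence_delta(synthetic, None),
    synthetic_scores=synthetic,
    conformance_scores=None,
)
```

Downstream code then has to branch on the type explicitly, which matches the "explicit absence-marker semantics" described for the convergence event.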
Event emitted (notification flow, into worlds): Evaluate emits one rubric.divergence event per scan regardless of conformance-sample availability (typed payload at spectral.platform.contracts.events.rubric_divergence). The World Agent aggregates divergence across workspaces as a world-model-evolution signal.
Phase 5 — Optimize
Stage 1 customers see recommendations only; “managed apply” is a Stage 2 capability (How Spectral Works — three integration depths).
Optimize generates change-set candidates — bundled prompt edits, hyperparameter shifts, and rule-specific guidance derived from the failure clusters and the rubric-scorer explanations. A tournament across candidates (per ADR-020) ranks them on CompositeScore delta vs. the baseline. The winner becomes the proposed ChangeSet for the scan.
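The ranking step can be illustrated with a simple sort on score delta against the baseline. This is a deliberately reduced stand-in: the tournament design in ADR-020 is not described on this page, and the `Candidate` type, the candidate names, and the scalar `composite_score` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    composite_score: float  # assumed scalar summary of a CompositeScore


def rank_candidates(candidates, baseline_score):
    """Order candidates by improvement over the baseline, best first."""
    return sorted(
        candidates,
        key=lambda c: c.composite_score - baseline_score,
        reverse=True,
    )


# Illustrative candidates and baseline, not real scan output.
baseline = 0.70
candidates = [
    Candidate("prompt_edit_only", 0.74),
    Candidate("prompt_edit_plus_rule_guidance", 0.79),
    Candidate("hyperparameter_shift", 0.71),
]
ranked = rank_candidates(candidates, baseline)
winner = ranked[0]  # becomes the proposed change set in this sketch
```

With a shared baseline, ranking on delta and ranking on raw score give the same order; the delta form matters once candidates are compared across different baselines.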
No interaction with worlds. No events emitted at this phase.
Phase 6 — Safety
Safety runs the conformity check: does the proposed change set pass the world model’s structural invariants? This is the same conformity discipline that gates rule-candidate enshrinement on the worlds side, applied here as a forward check against change-set modifications.
For Priya’s scan, all proposed modifications conform. Safety passes. No events.
Phase 7 — Verdict
The Verdict engine runs eight gates (delta threshold, agent regression, dimension regression, holdout generalization gap, bootstrap CI, output similarity, Pareto cost/latency, sanity downgrade) plus a convergence gate. The eight gates are pure functions in spectral.platform.domain.verdict (no infrastructure imports).
Holdout protocol. The holdout-generalization-gap gate consumes the synthetic eval set’s holdout partition exclusively. Conformance samples (when present) are convergence anchors, not holdout inputs.
For Priya’s scan: 41 of 50 checks pass; the holdout gap is within tolerance; the delta-threshold gate passes; the Pareto gate is neutral. Final outcome: go_nogo = caution — some MFS-classification regressions in the rubric-dimension delta. (Autonomy modes never auto-accept caution regardless of configuration.)
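A sketch of what "gates as pure functions" can look like, using three of the eight gates. The gate names follow the list above and `caution` is the outcome named in the walkthrough; the thresholds, the `go` and `no_go` labels, the set of blocking gates, and the way gate results combine into an outcome are all assumptions for illustration.

```python
from typing import NamedTuple


class GateResult(NamedTuple):
    name: str
    passed: bool


def delta_threshold_gate(delta: float, min_delta: float) -> GateResult:
    return GateResult("delta_threshold", delta >= min_delta)


def holdout_gap_gate(working_score: float, holdout_score: float, max_gap: float) -> GateResult:
    return GateResult("holdout_generalization_gap", working_score - holdout_score <= max_gap)


def dimension_regression_gate(dimension_deltas: dict[str, float]) -> GateResult:
    # Fails if any rubric dimension got worse.
    return GateResult("dimension_regression", all(d >= 0 for d in dimension_deltas.values()))


def decide(results, blocking=frozenset({"delta_threshold"})) -> str:
    """Assumed combination rule: blocking failures stop the change, others downgrade it."""
    failed = {r.name for r in results if not r.passed}
    if failed & blocking:
        return "no_go"
    return "caution" if failed else "go"


def may_auto_accept(go_nogo: str, autonomy_enabled: bool) -> bool:
    # Caution is never auto-accepted, whatever the autonomy configuration.
    return autonomy_enabled and go_nogo == "go"


# Illustrative inputs shaped like Priya's outcome: delta and holdout pass, one dimension regresses.
results = [
    delta_threshold_gate(delta=0.04, min_delta=0.02),
    holdout_gap_gate(working_score=0.80, holdout_score=0.77, max_gap=0.05),
    dimension_regression_gate({"mfs_classification": -0.03, "filing_status": 0.02}),
]
verdict = decide(results)
```

Each gate takes plain values and returns a plain result, so the whole verdict can be unit-tested without a database or event bus.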
Events emitted (intra-platform, then into worlds):
- verdict.issued — Verdict engine emits to the platform-internal substrate; the verdict-issued handler consumes it and the Spectral Agent kicks off a proactive conversation.
- scan.convergence.delta — emitted per scan with explicit absence-marker semantics. For Priya’s scan, the delta carries an explicit absence marker (no conformance samples).
- scan.completed — emitted to the substrate so worlds consumers can react; the scan-completed handler in spectral.platform.application.changeset consumes it, stamps the EvaluationAuthorityRef, and finalizes the change-set row.
Cascade between contexts
The cascade illustrates the discipline between worlds and platform at runtime:
- One synchronous call between contexts — Observe → EvalSetProvider. Everything else is event-driven.
- Four events emitted by the platform pipeline. Two flow into worlds (platform.failure_cluster.detected, rubric.divergence) and seed the worlds-side Evolution Loop. Two stay platform-internal (verdict.issued, scan.convergence.delta) and drive the Spectral Agent + change-set-lifecycle handlers.
- Zero SQL grants between contexts at any layer. Every hop is either a Protocol call or an event with a producer-typed payload + consumer-side ACL.
What landed in the database
After the scan completes, Priya’s workspace has these new rows:
- Scan — state=completed, composite_score, verdict.go_nogo=caution
- ScanTrace[] — one per stimulus, with provenance attribution
- EvalResult[] — two per ScanTrace (one per scoring authority)
- FailureCluster[] — including the MFS-classification cluster
- VerdictResult — the eight-gate outcome
- RubricDivergenceRecord — the per-scan rubric-vs-worldmodel divergence delta
- ChangeSet — the proposed bundle, with attached AgentPerformanceCard and the EvaluationAuthorityRef stamped by the scan-completed handler
- A Conversation row created by the Spectral Agent’s verdict-issued handler with initiated_by = agent and trigger_event_id = verdict.issued
On the worlds side, the Evolution Loop has two new inputs queued (platform.failure_cluster.detected and rubric.divergence). Single-workspace divergence remains a scan observation and does not initiate rule revision; cross-workspace aggregation across many customers is what drives world-model evolution.
What Priya sees
The Spectral Agent’s proactive conversation lands in her dashboard:
Spectral Agent: “Your first scan finished. Verdict: 41 / 50 checks passing. The biggest cluster is MFS classification — 6 of your 9 failures were in the considered-unmarried path at Pub 501 §MFS. Want me to walk through the proposed change set?”
The customer-side narrative continues in the first-customer walkthrough.
See also
- Optimization Engine — full per-phase mechanics, the composite-score schema, the verdict-gate inventory, the holdout protocol
- Event System — every typed event with payload shape and consumer attribution
- Contract Surfaces — the doctrine the cascade above enforces
- Agent Architecture — the Spectral Agent’s verdict-issued handler and the proactive conversation lifecycle
- Evolution Loop — what the World Agent does with the events platform emits into worlds