
First-Customer Walkthrough

This is a narrated walkthrough — read it through; you don’t execute the steps yourself. It covers the full external-customer experience, from workspace creation to the first agent performance card.

It pairs with the operator walkthrough, where Maya publishes the tax-prep world model the customer now consumes.

The customer — call the company Ledger — ships an LLM-backed tax-prep assistant built on a small tree of agents. They are an early design partner; the world model Maya just published (v0.1.0 per the operator walkthrough) is the domain authority their assistant will be scanned against.

Ledger’s lead engineer — call her Priya — is doing the integration. Her goal: from “nothing in Spectral” to “first agent performance card reviewed and a change set approved.”

1 · Priya creates a workspace and selects the tax domain

Priya creates her Ledger account, accepts a workspace invitation from a Spectral operator (today, workspaces are bootstrapped by Spectral operators via admin invitation), and logs in to the dashboard.

The onboarding flow asks her to pick the domain the workspace will be scanned against. The list has one entry — us-federal-individual-tax v0.1.0 — the world model Maya just published. The list will grow as more world models are published.

  • Customer UI: Dashboard → Onboarding → Choose your domain
  • Spectral-agent involvement: None yet (the onboarding specialist engages in step 3)
  • State change: Workspace row created with account_id, workspace_id, and world_model_version_ref = {authority_ref for v0.1.0}
  • Codex detail: Access Control, World Model System — Version Attribution
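
The state change above can be sketched as a record. This is a hypothetical shape — the field names follow the walkthrough's wording (account_id, workspace_id, world_model_version_ref), but the exact schema and ref format are assumptions:

```python
# Hypothetical sketch of the workspace row created at onboarding.
# Field names mirror the walkthrough; the schema is an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkspaceRow:
    account_id: str
    workspace_id: str
    world_model_version_ref: str  # authority_ref for the chosen world-model version

ws = WorkspaceRow(
    account_id="acct_ledger",          # illustrative IDs, not real values
    workspace_id="ws_ledger_tax",
    world_model_version_ref="us-federal-individual-tax@0.1.0",
)
```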

2 · Priya instruments Ledger’s agent with OTEL

The onboarding flow shows Priya the trace-ingestion endpoint and a domain-specific Quickstart: how to wire OpenTelemetry (OTEL — the open standard for distributed-trace data) into her tax-prep agent, which spans to emit, and what span attributes to include. Spectral is deliberately not in the hot path (per How Spectral Works); all integration is trace-level.

Three documented onboarding paths cover the common customer stacks:

  • Native OTLP (the default) — direct OTEL instrumentation, no framework wrapper
  • Anthropic-format — if Ledger is already using Anthropic’s Messages API spans
  • OpenAI-format — if Ledger is already using OpenAI Responses-API spans

The ingestion path itself accepts a wider matrix of trace shapes than the documented Quickstart; the apps/test-agents tax-prep backbone exercises the full coverage matrix (instrumentation framework × LLM-vendor span shape) per Test Agents — Pluggable OTEL emitter. A customer whose stack matches a documented path onboards self-serve; stacks outside the Quickstart land in operator-assisted onboarding.

  • Customer action: Add Spectral’s trace endpoint + API key to the app’s OTEL configuration
  • Customer UI: Dashboard → Onboarding → Trace Ingestion
  • State change: First ScanTrace rows arrive; WorkspaceOnboarding.first_trace_at is set
  • Codex detail: API Usage
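
To make the trace-level integration concrete, here is a stdlib-only sketch that hand-builds an OTLP-style JSON span for one agent step and prepares the POST to the ingestion endpoint. The endpoint URL, header name, and attribute keys are illustrative assumptions; in practice you would wire the OTEL SDK's OTLP exporter rather than hand-roll the payload:

```python
# Sketch only: endpoint, API key header, and attribute names are assumptions.
import json
import time
import urllib.request

SPECTRAL_ENDPOINT = "https://ingest.example.invalid/v1/traces"  # from Dashboard -> Onboarding
API_KEY = "SPECTRAL_API_KEY"

def make_span(name: str, attributes: dict) -> dict:
    # Build one span in the OTLP/HTTP JSON shape (simplified: resource and
    # scope metadata omitted for brevity).
    now_ns = time.time_ns()
    return {
        "name": name,
        "startTimeUnixNano": str(now_ns),
        "endTimeUnixNano": str(now_ns + 1_000_000),
        "attributes": [
            {"key": k, "value": {"stringValue": str(v)}} for k, v in attributes.items()
        ],
    }

span = make_span("classify_filing_status", {"tax.filing_status": "married_filing_separately"})
body = json.dumps({"resourceSpans": [{"scopeSpans": [{"spans": [span]}]}]}).encode()
req = urllib.request.Request(
    SPECTRAL_ENDPOINT, data=body, method="POST",
    headers={"Content-Type": "application/json", "x-api-key": API_KEY},
)
# urllib.request.urlopen(req)  # left commented: no live endpoint in this sketch
```

A real integration would let the OTEL SDK's batch processor handle buffering and retries; the point here is only that Spectral sits behind a trace endpoint, not in the request path.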

3 · The Spectral Agent’s onboarding specialist suggests parameterization

Once traces start arriving, the Spectral Agent (customer-facing, per Agent Architecture) opens a proactive conversation. The onboarding specialist introduces itself, confirms traces are landing, and suggests a parameterization for the first scheduled scan:

Spectral Agent (onboarding specialist): “I’m seeing ~200 traces so far. Based on the spans you’re emitting, your agent covers filing-status classification and standard-deduction computation — both in the world model’s scope. I’d suggest starting with filing-status as your primary focus vector and treating standard-deduction as a secondary coverage area. You can expand later. Want me to set that as your first scan’s parameterization?”

The Spectral Agent reads the world model through an OHS Protocol — a callee-owned, in-process call surface that worlds publishes for synchronous reads (ADR-065 D3, with the selection rationale in ADR-070 Tier 2). The agent combines the world-model read with the observed traces, then surfaces coverage areas (which world-model rules this agent exercises) and focus vectors (which areas to weight more heavily in evaluation).

Priya approves the proposal with one edit: she wants MFS edge cases surfaced specifically. The parameterization saves to the workspace’s evaluation framework.
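
The approved parameterization might look like the following sketch. The key names (coverage_areas, focus_vectors, surface_edge_cases) are assumptions drawn from the walkthrough's vocabulary, not a documented API shape:

```python
# Hedged sketch of the saved parameterization; key names are assumptions.
parameterization = {
    "coverage_areas": ["filing-status", "standard-deduction"],
    "focus_vectors": [
        {"area": "filing-status", "weight": "primary"},
        {"area": "standard-deduction", "weight": "secondary"},
    ],
    "surface_edge_cases": ["MFS"],  # Priya's one edit
}

def primary_focus(p: dict) -> str:
    # Return the area weighted as the primary focus vector.
    return next(v["area"] for v in p["focus_vectors"] if v["weight"] == "primary")
```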

4 · The first scheduled scan runs the seven-phase pipeline

The first scheduled scan fires. Before phase 1 the scan orchestrator runs preflight — a quick pre-check that asks spectral.worlds whether an eval set can be produced and the curation service whether conformance samples are available, then writes a readiness observation to the scan row (mode = Full or synthetic_only). Preflight is the first synchronous call into worlds and does not block the scan when synthetic-only is the resulting mode.

The seven-phase scan pipeline (see Optimization Engine) then executes:

  1. Observe — second sync call into worlds: requests the eval set via the EvalSetProvider Tier 2 Protocol. Stage 1 uses world-model-grounded stimuli only; customer-directed exploratory-probe stimuli land at Stage 2/3. The Eval Generation page documents the full Customer Parameterization API including the probe surface.
  2. Calibrate — bootstrap CI calibration over the observed agent’s sample distribution.
  3. Diagnose — failure clustering against rule violations.
  4. Evaluate — two-authority evaluation (world-model authority + customer-framework authority).
  5. Optimize — candidate generation for the change set. Stage 1 customers see recommendations only; the apply step is Stage 2.
  6. Safety — conformity gate against world-model rules.
  7. Verdict — final verdict result emitted.
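
The seven phases above can be sketched as an ordered dispatch loop. Phase names come from the walkthrough; the handler signatures and accumulated-state model are assumptions:

```python
# Minimal sketch of the seven-phase scan pipeline as ordered dispatch.
PHASES = ["observe", "calibrate", "diagnose", "evaluate", "optimize", "safety", "verdict"]

def run_scan(scan: dict, handlers: dict) -> dict:
    # Each phase reads the accumulated scan state and returns an update,
    # so a later phase sees everything earlier phases produced.
    for phase in PHASES:
        scan.update(handlers[phase](scan))
    return scan

# Stub handlers that just mark each phase done (p=p pins the loop variable).
handlers = {p: (lambda s, p=p: {p + "_done": True}) for p in PHASES}
result = run_scan({"scan_id": "scan_001", "mode": "full"}, handlers)
```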

A scan.completed event fires. The platform’s change-set handler stamps the evaluation_authority_ref and prepares any resulting change set. The verdict engine then issues a verdict.issued event; the Spectral Agent picks it up and creates a proactive conversation with the verdict summary (per Agent Architecture).

  • Customer UI: Dashboard → Scans → (this scan) — updates live as the scan executes
  • State change: Scan, EvalResult[], FailureCluster[], VerdictResult, ChangeSet rows created; scan.convergence.delta and scan.completed events emitted on the substrate per ADR-044 (the worlds-bound subset additionally per ADR-017)
  • Codex detail: Optimization Engine, Domain Model
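
The post-scan event chain can be sketched with a stand-in bus: a scan.completed handler stamps the evaluation_authority_ref on the change set, and a verdict.issued handler opens the proactive conversation. Event and field names follow the walkthrough; the bus itself is illustrative:

```python
# Stand-in event bus; real events flow over the substrate per ADR-044.
from collections import defaultdict

subscribers = defaultdict(list)

def on(event_type):
    # Register a handler for an event type.
    def register(fn):
        subscribers[event_type].append(fn)
        return fn
    return register

def emit(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)

change_set = {}
conversations = []

@on("scan.completed")
def stamp_change_set(event):
    # Platform change-set handler: stamp the evaluation authority.
    change_set["evaluation_authority_ref"] = event["authority_ref"]

@on("verdict.issued")
def open_conversation(event):
    # Spectral Agent: proactive conversation with the verdict summary.
    conversations.append({
        "initiated_by": "agent",
        "trigger_event_id": "scan.completed",
        "summary": event["verdict_summary"],
    })

emit("scan.completed", {"authority_ref": "us-federal-individual-tax@0.1.0"})
emit("verdict.issued", {"verdict_summary": "41/50 checks passing"})  # from the verdict engine
```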

5 · Change set + agent performance card + proactive conversation

The scan emits a change set — the bundle of proposed modifications (prompt edits, parameter shifts, rule-specific guidance) that Spectral’s optimization pass produced. Attached to the change set is the first agent performance card:

  • Identity (which agent, which world model version, which scan)
  • Coverage metrics (which rules were exercised, which were missed)
  • Scoring (verdict summary, composite-score breakdown)
  • Failure clusters (what went wrong, grouped by root cause)
  • Attribution — EvaluationAuthorityRef pointing to world model v0.1.0
  • A posture-acknowledgment footer noting the card is a preview artifact, not a compliance record
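
The card's sections above map naturally onto a record like the following. Field names mirror the bullet list and are assumptions, not a documented schema; the example values echo this walkthrough:

```python
# Hypothetical shape of the agent performance card; schema is an assumption.
from dataclasses import dataclass

@dataclass
class AgentPerformanceCard:
    agent_id: str
    world_model_version: str
    scan_id: str
    rules_exercised: list          # coverage: rules the agent exercised
    rules_missed: list             # coverage: rules never hit
    verdict_summary: str
    failure_clusters: list         # grouped by root cause
    evaluation_authority_ref: str  # points at the world-model version
    footer: str = "Preview artifact, not a compliance record."

card = AgentPerformanceCard(
    agent_id="ledger-tax-prep",
    world_model_version="0.1.0",
    scan_id="scan_001",
    rules_exercised=["filing-status", "standard-deduction"],
    rules_missed=["dependents"],
    verdict_summary="41/50 checks passing",
    failure_clusters=["MFS considered-unmarried path (Pub 501 \u00a7MFS)"],
    evaluation_authority_ref="us-federal-individual-tax@0.1.0",
)
```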

The Spectral Agent opens a proactive conversation:

Spectral Agent: “Your first scan finished. Verdict: 41 / 50 checks passing. The biggest cluster is MFS classification — 6 of your 9 failures were in the considered-unmarried path at Pub 501 §MFS. Want me to walk through the proposed change set?”

Priya reads the conversation, clicks through to the change-set detail, and reviews the recommendations. At Stage 1, recommendations are read-and-apply-yourself: the customer applies changes to their own templates. Managed template hosting and application is a Stage 2 capability.

  • Spectral-agent involvement: Scan-analyst specialist; proactive scan-completed handler
  • Customer UI: Dashboard → Change Sets → (this change set) → Review
  • State change: ChangeSet row with attached AgentPerformanceCard; Spectral Agent conversation created with initiated_by = agent, trigger_event_id = scan.completed
  • Codex detail: System Card, Optimization Engine — Change Sets

6 · Priya reviews, approves, change set accepted

Priya walks through each proposed modification:

  • Approve — the modification transitions proposed → accepted
  • Reject — proposed → rejected with a reason
  • Ask the agent — she can reply in the chat and ask the scan-analyst specialist to explain the rationale; the agent reads the scan’s failure clusters and the cited rules to answer
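
The per-modification review reduces to a small state machine: proposed → accepted or proposed → rejected, with a reason required on rejection. A minimal sketch, with illustrative names:

```python
# Sketch of the per-modification review state machine; names are illustrative.
VALID_TRANSITIONS = {"proposed": {"accepted", "rejected"}}

def review(mod, outcome, reason=None):
    # Apply a review outcome, enforcing the allowed transitions and the
    # rule that a rejection carries a reason.
    if outcome not in VALID_TRANSITIONS.get(mod["status"], set()):
        raise ValueError(f"cannot move {mod['status']} -> {outcome}")
    if outcome == "rejected" and not reason:
        raise ValueError("a rejection requires a reason")
    return {**mod, "status": outcome, "reason": reason}

rejected = review({"id": "m1", "status": "proposed"}, "rejected", reason="breaks MFJ prompt")
accepted = review({"id": "m2", "status": "proposed"}, "accepted")
```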

At Stage 1, approval is a record that Ledger will apply the change themselves on their side. Spectral does not touch the customer’s templates — that is Stage 2.

When Priya finishes reviewing, she marks the change set accepted as a whole. The workspace now has its first full scan-to-acceptance cycle on record.

  • Customer UI: Dashboard → Change Sets → Review → per-modification actions + final Accept
  • State change: ChangeSet.status = accepted; each modification row carries its proposed → accepted|rejected outcome
  • Codex detail: Optimization Engine — Change Set Lifecycle

7 · Agent performance card downloadable as PDF

With the change set accepted, the agent performance card is finalized. Priya downloads the PDF from the change-set detail view. The PDF carries the posture-acknowledgment footer on every page.

The card is the artifact Ledger shares internally — engineering leadership, product — and, when regulated use becomes relevant, with downstream reviewers. Today it’s an evidence bundle, not a compliance artifact; the footer makes that distinction explicit.

Distribution is via ad-hoc PDF download. Richer distribution mechanisms — signed links, scheduled email, machine-readable feeds — are described in system card distribution channels as customer-demand contingencies, not roadmap commitments.

  • Customer UI: Dashboard → Change Sets → (this one) → Download Agent Performance Card
  • State change: Download event logged; no mutation to the card itself (the card is authoritative for the change set it’s attached to)
  • Codex detail: System Card

Where this walkthrough meets the operator walkthrough

The two walkthroughs intersect at three points:

| Moment | Customer (this walkthrough) | Operator (operator walkthrough) |
| --- | --- | --- |
| World model is referenced | Step 1: workspace scoped to v0.1.0 | Published in step 9 of the operator walkthrough |
| A rule is cited in a failure cluster | Step 5: agent cites Pub 501 §MFS | Step 6 of the operator walkthrough: Maya enshrined those rules |
| A question the Spectral Agent can’t answer | Step 6: if Priya asks a question whose answer lives inside the operator reasoning — e.g., “why does this world model not cover dependents?” — the framework-advisor specialist explains the deferral and points at Codex; it does not forward the question to Maya. Operator attention is reached through platform observability signals, not customer-to-operator routing. | Ongoing — operators monitor workspace signals |

The Spectral Agent never delegates to a human operator synchronously. If a customer asks something the agent cannot answer, the answer is either “the world model doesn’t cover that yet” (with a cited reason) or a request for clarification. Operators see trends through observability, not through ad-hoc escalation.

If a customer needs a human, the path is asynchronous and out-of-band. Today that is founder-on-call via a dedicated channel for design partners; a support-portal surface lands as customer count warrants. The customer’s recourse, when the agent cannot answer, is the explicit coverage-gap signal — which routes into the Evolution Loop’s discovery mechanism per Eval Generation — plus that out-of-band channel. Not a synchronous human handoff inside the chat.

By the end of this walkthrough, the following customer-facing surfaces have been operated end-to-end:

| Surface | Exercised by |
| --- | --- |
| Workspace creation + domain selection | Step 1 |
| OTEL trace ingestion (three documented Quickstart paths) | Step 2 |
| Spectral Agent onboarding specialist | Step 3 |
| Evaluation-framework parameterization over world-model scope | Step 3 |
| Full seven-phase scan pipeline with world-model-grounded stimuli | Step 4 |
| Event-driven proactive conversation | Step 5 |
| Change-set lifecycle with per-modification review | Step 6 |
| Agent performance card generation + PDF download | Step 7 |
| Posture-acknowledgment footer on every rendered card | Steps 5, 7 |
| Attribution envelope spanning worlds and platform (EvaluationAuthorityRef) | Step 5 |

Stage 1 is a complete onboarding-and-evidence cycle on its own — it isn’t waiting on Stage 2 or 3 to be useful. But it’s worth being explicit about the surfaces this walkthrough doesn’t exercise:

  • Managed template hosting. Spectral does not hold Ledger’s prompt templates at Stage 1.
  • Automatic change-set application. Every change is customer-applied.
  • Customer-directed exploratory-probe stimuli. Stage 1 uses world-model-grounded stimuli only.

These all live on the product roadmap. Stage 2 / 3 scope deltas live at future-considerations — autonomy modes (second alpha wave).