How Spectral Works
Autonomous AI engineering that compounds. Spectral measures, optimizes, and improves complex AI systems against domain standards your team didn’t have to author.
Spectral is built around two pillars that operate as separate authorities.
The customer-facing platform is where customers run their agent-improvement work: workspaces, scans, recommendations, change sets, and the Spectral Agent that explains scan verdicts and walks through optimization recommendations.
The internal world-model system is the standard the platform measures against. Spectral-built, internally curated, applied identically across every customer in the same domain.
Two control planes operate the two pillars. Customers drive the platform from the dashboard. Spectral staff drive the Operations app that maintains the world-model system.
Why two pillars
The world-model system answers a structural problem in agent optimization: what is the rubric the agent is being optimized against?
If the customer authors the rubric — the assumption most evaluation tools make — optimization rewards what the customer already thought to test for. The rubric drifts toward the system’s existing behavior. Numbers rise on the eval; the agent doesn’t actually get better. Paper improvement: the gap between scoring well and being better.
Spectral takes a different approach. The rubric is a separate, Spectral-built artifact: a world model that encodes the behavioral expectations of a domain, grounds itself in authoritative sources (statute, regulatory guidance, scholarly publication), is curated through human-gated evolution, and applies identically across every customer in the domain. The platform’s eval sets derive from a world model, not from a customer’s private scenario list. The customer steers which aspects of the domain to evaluate; the criteria themselves come from a source the customer did not author.
That is the differentiator. Other tools let teams build their own evaluations — workbenches for engineering eval suites. Spectral builds the domain standard, derives evals from it, and grounds every claim it makes about a customer’s system in an external authority.
Sidecar posture
A sidecar service runs alongside the system it serves, observing without intercepting. Spectral never sits in the request path.
All observation flows through OpenTelemetry traces and spans — OTEL, the open standard for distributed-trace data. All output is delivered as recommendations or managed configuration changes. The design holds two properties simultaneously: low risk (Spectral can’t break production because it isn’t in the request path) and low friction (customers instrument with OTEL, which many already have, and start getting value immediately).
Three integration depths
Section titled “Three integration depths”Spectral integrates with a customer’s system at one of three depths. Each depth builds on the previous, earning the right to do more through demonstrated value.
Stage 1 — Observe and Recommend
Spectral ingests OTEL traces, maps them to the customer’s agents, and understands the system structure: which agents exist, how they depend on each other, what prompts and parameters they use.
From there it evaluates the system against its evaluation framework, diagnoses failure patterns, and delivers actionable recommendations — not “change your prompt” but “your context assembly in Agent A is including full document bodies when only metadata is needed, causing truncation that affects Agent B downstream.”
The customer reads the recommendations and applies changes themselves. Spectral proves its value before asking for more access.
Stage 2 — Manage
Spectral hosts and manages the customer’s prompt templates directly. Templates sync from source or are imported; hyperparameters are tracked alongside them.
Recommendations apply directly to managed templates. The customer reviews proposed change sets and controls what goes live through the lifecycle: proposed → accepted → superseded (or validated / rejected).
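The review lifecycle can be pictured as a small state machine. The state names below come from the text; the exact transition set is one plausible reading of "proposed → accepted → superseded (or validated / rejected)", labeled as an assumption:

```python
# Allowed change-set transitions: one plausible reading of the
# lifecycle named above, not Spectral's actual rules.
TRANSITIONS = {
    "proposed": {"accepted", "rejected"},
    "accepted": {"superseded", "validated", "rejected"},
    # Terminal states: nothing moves out of them.
    "rejected": set(),
    "superseded": set(),
    "validated": set(),
}

def advance(state: str, to: str) -> str:
    """Move a change set to a new state, refusing anything the
    customer hasn't sanctioned through the review flow."""
    if to not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {to}")
    return to
```

Modeling the lifecycle as explicit transitions is what makes "the customer controls what goes live" checkable rather than aspirational.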
Stage 3 — Automate within governance
The deepest integration. Spectral updates managed templates dynamically, within governance boundaries the customer defines: version management for full lifecycle control, approval gates that require human review for specific mutation types, always-available rollback, gradual rollout, kill switch, and audit trail. The customer defines the boundaries; Spectral operates within them.
Runtime model. Managed configurations deliver via the REST surface defined in ADR-006. The customer polls or fetches on deploy — simple, stateless, customer-controlled cadence. Specific endpoint shape lands with the managed-templates epic.
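The poll-on-cadence model can be sketched as follows. Since the endpoint shape is explicitly not yet defined, the fetch function is a stand-in the caller supplies, not a real Spectral API; the version field and loop shape are assumptions chosen to illustrate the stateless, customer-controlled cadence:

```python
import time
from typing import Callable

def poll_managed_config(fetch: Callable[[], dict],
                        apply: Callable[[dict], None],
                        interval_s: float,
                        max_polls: int) -> int:
    """Stateless poll loop: fetch the current managed configuration
    and apply it only when the version changes. The cadence is
    entirely customer-controlled."""
    seen_version = None
    applied = 0
    for _ in range(max_polls):
        config = fetch()          # e.g. a GET against the REST surface
        if config["version"] != seen_version:
            apply(config)
            seen_version = config["version"]
            applied += 1
        time.sleep(interval_s)
    return applied
```

Because the loop holds no server-side state, fetching on deploy and polling on a schedule are the same code path, just different cadences.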
Integration depth and autonomy mode are orthogonal
Two axes shape what Spectral does in any given workspace.
- Integration depth — the three stages above. How deeply Spectral integrates into the customer workflow.
- Autonomy mode — a workspace-level setting that controls what happens to accepted change sets: observe-only, recommend, manual, bounded auto, plus a kill switch.
A customer can adjust either axis without touching the other. See Optimization Engine — Autonomy mode vs integration tier for the side-by-side mapping.
Typical pairings: Stage 1 usually runs observe-only or recommend; Stage 2 runs recommend or manual; Stage 3 runs bounded auto.
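The orthogonality, and the typical pairings, can be stated directly in code. The identifiers below are derived from the names on this page; the pairing table encodes the "typical" mapping above, not an enforced constraint:

```python
from itertools import product

STAGES = ["observe_and_recommend", "manage", "automate"]         # integration depth
MODES = ["observe_only", "recommend", "manual", "bounded_auto"]  # autonomy mode

# Every combination is representable: the axes are orthogonal.
valid_pairs = set(product(STAGES, MODES))

# The *typical* pairings named in the text, stage -> usual modes.
TYPICAL = {
    "observe_and_recommend": {"observe_only", "recommend"},
    "manage": {"recommend", "manual"},
    "automate": {"bounded_auto"},
}
```

Keeping the axes as two independent settings is what lets a customer deepen integration without loosening autonomy, or vice versa.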
Data handling
At Stage 1, Spectral sees only telemetry data: traces, spans, and the attributes customers choose to include. At Stages 2 and 3 it also manages prompt templates — IP-sensitive, but not end-user PII. The customer’s runtime data stays in their system unless they include it in the traces they send. See Security Boundaries and Access Control for the encryption and access posture.
How the two pillars connect
The pillars communicate through three explicit seams. None lets the platform reach into the world-model system’s authoring data; none lets the world-model system reach into a customer workspace.
- Published world-model versions. When the world-model system publishes a new version, it mints an opaque authority reference. The platform’s scan pipeline retrieves the reference at scan time but never reads rule content or waits on the world-model system synchronously. The customer’s evaluation always traces back to a published authority the customer did not author.
- World signal events. When a customer scan surfaces a coverage gap or a probe lands in unknown territory, the platform emits a world signal event back to the world-model system. The operations team aggregates these signals as evidence for evolution-loop proposals — the customer-reality → operator-authority feedback path that compounds knowledge across the customer base. See Evolution Loop.
- System cards. Both pillars generate cards. The world-model system publishes one per version; the platform generates one per agent system per scan. Both are external artifacts of conformance, methodology-disclosed, grounded in the authority reference rather than in customer self-evaluation. See System Card.
Architecture covers the codebase topology (spectral.platform / spectral.worlds / spectral.core) that backs this. Contract Surfaces is the canonical reference for how the two pillars exchange data without coupling — producer-owned typed event payloads and callee-owned protocols rather than direct imports across contexts.
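A producer-owned typed event payload for the world-signal seam might look like the sketch below. The field names are assumptions chosen to illustrate the one-way shape (the platform emits, the world-model system consumes a serialized payload), not Spectral's actual contract:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class WorldSignalEvent:
    """Emitted by the platform when a scan surfaces a coverage gap.
    The producing context owns this type; the consuming side reads
    only the serialized payload, never the class itself."""
    workspace_id: str           # which customer workspace surfaced the signal
    world_model_version: str    # the authority reference in force at scan time
    signal_kind: str            # e.g. "coverage_gap" (illustrative value)
    evidence_ref: str           # pointer to the scan artifact, not raw data

    def to_payload(self) -> dict:
        # Serialized at the seam; no direct imports across contexts.
        return asdict(self)
```

Passing a payload rather than sharing the type is what keeps the two pillars decoupled: either side can evolve its internals without breaking the other.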
The customer control plane
Customers drive Spectral from the dashboard at app.runspectral.com. The control plane covers four surfaces.
- The Spectral Agent. Conversational interface to scan results, change-set explanations, autonomy adjustments, and workspace configuration. Acts on the authenticated customer’s behalf within their workspace and permissions.
- Workspace governance. Membership, retention, agent configuration, and autonomy mode. Workspace-scoped, RLS-isolated.
- Approval seam. What runs autonomously and what the customer reviews — controlled by the autonomy mode and integration depth in tandem, both customer-adjustable.
- Autonomy progression. Observe-only and manual today; recommend, bounded auto, and the kill switch unlock as the customer’s autonomy posture earns them.
The first-customer walkthrough takes a customer end-to-end through their first cycle on this surface.
The operational control plane
The world-model system is internal to Spectral. Authoring, distillation, enshrinement, and publication all happen on the Operations app — the operational control plane. Operators work across four primary workflows: authoring, source-material ingestion and distillation, enshrinement (every promotion is a governed human gate), and publication with release notes. The Operations Agent drafts and proposes; it cannot enshrine.
The audience asymmetry is load-bearing. The customer-facing platform is RLS-isolated per workspace and product-shaped. The operational control plane is single-tenant and authoring-shaped (per ADR-047), running on Spectral-staff identities under an operations scope that never crosses into customer sessions. That separation is what lets the world-model system stay the authority customers cite.
The operator walkthrough narrates the full bootstrap of the first world model end-to-end on this surface. The Operations subtree in System Design covers each operator workflow in detail.
What’s next
- Operator walkthrough — Maya bootstraps the first world model from nothing through published v0.1.0 on the operational control plane.
- First-customer walkthrough — a customer’s first end-to-end cycle on the Spectral platform.
For the technical architecture in detail, see Architecture, the World Model System subtree, and the Optimization Engine.