
Primitives

This page is the lookup surface for Spectral’s core primitives — the building blocks the rest of the system is built on. Each section is a per-primitive reference: definition, structure, lifecycle, and the relationships that constrain it.

For the conceptual narrative — how the primitives compose into the platform, what each one is for, why this shape and not another — see the System Design overview.


Workspace

The top-level container. A Workspace represents a customer’s multi-agent system as a single optimizable entity.

Each Workspace is discrete — it has its own members, permissions, change sets, and optimization history. See Access Control for the full role and isolation model.

A Workspace contains one or more Workspace Agents and an Evaluation Framework instance. Together, these define what the customer’s system does and how Spectral measures it.

Workspace Agent

An individual agent within a workspace — one member of the customer’s agent team. Workspace Agents are identified through one of two paths:

  • Declared: The customer defines the agent team structure during workspace setup, naming each agent and its role
  • Inferred: Spectral maps incoming OTEL traces to agent identities within the workspace based on span attributes (agent.name, agent.type) and behavioral patterns

Workspace Agents have defined dependencies between them. They execute in topological order based on these dependencies, and downstream agents receive upstream agent outputs as context.
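The dependency-ordered execution described above can be sketched with Python’s standard-library topological sorter. This is an illustrative assumption about the mechanics, not Spectral’s actual API — `run_agents` and the trace shapes are hypothetical:

```python
from graphlib import TopologicalSorter

def run_agents(deps, execute):
    """Execute agents in topological order, passing upstream outputs as context.

    deps: dict mapping agent name -> set of upstream agent names.
    execute(name, context) -> output, where context maps each upstream
    agent name to its output.
    """
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        # Downstream agents receive their dependencies' outputs as context.
        context = {up: outputs[up] for up in deps.get(name, ())}
        outputs[name] = execute(name, context)
    return outputs
```

For example, with `{"retriever": set(), "writer": {"retriever"}}`, the retriever runs first and the writer receives its output in `context`.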

Each Workspace Agent has two core components that define its behavior: a Prompt Template and Hyperparameters.

Prompt Template

The prompt configuration for a single agent. Not a flat text blob — Spectral understands prompt templates structurally:

  • Base prompt text — the core instruction
  • Context selection strategy — how the agent assembles context (retrieval settings, what gets included, truncation behavior)
  • Chain-of-thought configuration — none, brief, or full reasoning injection
  • Few-shot examples — grounding examples that demonstrate expected behavior
  • Upstream context injection — how outputs from dependency agents are incorporated

This structural understanding is what makes Spectral’s recommendations actionable. “Your context assembly is including full document bodies when only metadata is needed” is useful. “Change your prompt” is not.
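The structured view above might be modeled roughly as follows. Field names here are assumptions for illustration, not Spectral’s schema:

```python
from dataclasses import dataclass, field

@dataclass
class PromptTemplate:
    base_prompt: str                       # the core instruction
    context_strategy: dict                 # retrieval settings, inclusion rules, truncation
    chain_of_thought: str = "none"         # "none" | "brief" | "full"
    few_shot_examples: list = field(default_factory=list)   # grounding examples
    upstream_injection: dict = field(default_factory=dict)  # how dependency outputs are incorporated
```

Because each component is addressable, a recommendation can target a single field (e.g. `context_strategy`) rather than the prompt as an opaque whole.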

Template Inference: The ideal onboarding path — Spectral pulls traces from OTEL spans, identifies conversation starts, and infers the prompt template via segmentation. Low friction for the customer, high value from day one, and a meaningful technical differentiator.

Post-alpha. Trace-based prompt template inference ships post-alpha. The alpha relies on the explicit-template fallback: customers declare their templates explicitly during onboarding. Inference becomes the default when a future enabler lands.

Hyperparameters

Per-agent model configuration parameters declared via OTEL span attributes:

  • Model provider (gen_ai.system)
  • Model name (gen_ai.request.model)
  • Model version
  • Temperature
  • Max tokens
  • Confidence thresholds
  • Retrieval settings (top-k, similarity thresholds)
  • Tool invocation thresholds
  • Chain-of-thought mode

Spectral establishes a baseline from observed hyperparameters and recommends changes when optimization identifies improvements. Hyperparameter mutations are informed by failure analysis — specific failure patterns map to specific parameter adjustments (e.g., fabrication patterns suggest lowering temperature and enabling chain-of-thought).
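The failure-pattern-to-adjustment mapping can be sketched as a simple lookup, in the spirit of the fabrication example above. The pattern names and adjustments here are illustrative assumptions:

```python
# Hypothetical mapping: failure pattern -> hyperparameter adjustments.
MUTATIONS = {
    "fabrication": [("temperature", "decrease"), ("chain_of_thought", "enable")],
    "truncated_output": [("max_tokens", "increase")],
    "irrelevant_retrieval": [("retrieval_top_k", "decrease"),
                             ("similarity_threshold", "increase")],
}

def suggest_adjustments(failure_patterns):
    """Collect deduplicated adjustments for the observed failure patterns."""
    seen, suggestions = set(), []
    for pattern in failure_patterns:
        for adjustment in MUTATIONS.get(pattern, []):
            if adjustment not in seen:
                seen.add(adjustment)
                suggestions.append(adjustment)
    return suggestions
```

The point of the sketch is the shape of the logic: mutations are targeted responses to diagnosed failures, not random parameter sweeps.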


Observation: Traces, Samples, and Retention


With the workspace structure established — agents, their prompts, their parameters — the next question is: how does Spectral observe what these agents are actually doing?

Trace

A raw observation from OTEL — an immutable record of what happened in the customer’s agent system.

Traces arrive continuously from the customer’s OTEL instrumentation. They are high-volume, append-only, and contain the actual data that flowed through the agent system: inputs, outputs, model parameters, latency, cost, and span relationships.

Traces are the firehose. Spectral needs to extract signal from this volume — which is where Samples come in.

Trace stores follow a <context>_<entity> naming convention, separated by purpose:

| Store | Purpose |
| --- | --- |
| otel_traces | Customer production traces ingested via OTLP — permanent, never belongs to a scan |
| scan_traces + scan_observations | Scan-phase LLM calls — uniform trace record with observe-phase extensions |
| platform.agent_traces | Spectral Agent conversational traces (per ADR-043) |

Scan cost is derived from scan_traces directly (SUM(cost_usd)), and evaluation scores live in scan_evals as an extension of scan_traces. See Domain Model for entity details and Optimization Engine for the pipeline mechanics.

Sample

A curated, retention-aware artifact derived from a Trace. Samples are the evaluation data that the optimization loop operates on.

A Sample contains:

  • Structural metadata — classification of what kind of input this represents, what segment of the input space it covers
  • Stratification tags — platform-owned classifications enabling representative coverage of the production input space (e.g., trace domain, behavioral pattern, severity band). Opaque to spectral.worlds — no reference to rule taxonomy, rule IDs, or any structure from spectral.worlds.
  • Ground truth (optional) — known-correct output, when provided by the customer or inferred
  • Evaluation history — all evaluation results ever produced against this Sample across Change Sets
  • Trace reference — provenance link to the originating trace

A Sample never contains copies of trace data. This is the data retention boundary — the most important invariant in the data model. The Sample references the trace while it exists and functions without it when it doesn’t.

| State | Trace status | Capabilities |
| --- | --- | --- |
| Active | Within retention window | Full participation in optimization — re-execution uses live trace data |
| Historical | Removed per retention policy | Evaluation history and structural metadata preserved. Cannot be used for re-execution. Trace reference becomes a provenance tombstone. |
| Archived | N/A | Explicitly removed from active use. Retained for audit trail. |
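The lifecycle invariant can be sketched as follows — a Sample keeps only Spectral’s own metadata, and trace expiry demotes it rather than breaking it. Field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sample:
    metadata: dict               # structural classification, stratification tags
    evaluation_history: list     # all results ever produced against this Sample
    trace_ref: Optional[str]     # provenance link to the originating trace
    state: str = "active"

    def on_trace_expired(self):
        # Retention removed the trace: keep provenance, lose re-execution.
        self.state = "historical"
        self.trace_ref = f"tombstone:{self.trace_ref}"

    def can_reexecute(self):
        return self.state == "active"
```

Note that `on_trace_expired` mutates only the Sample’s own fields — no customer data was ever held, so nothing needs to be scrubbed.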

Sample Set

A versioned collection of Samples with working/holdout splits. This is what the optimization loop actually runs against.

  • Version — the Sample Set is versioned because the composition of what you’re evaluating against matters
  • Working/holdout split — partitioned per optimization run, not permanently. Working Samples drive optimization; holdout Samples validate that improvements generalize to unseen inputs.
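A minimal sketch of the per-run partitioning, assuming a random split (the actual strategy may be stratified; this function is hypothetical):

```python
import random

def split_samples(sample_ids, holdout_fraction=0.2, seed=None):
    """Partition samples into (working, holdout) for a single optimization run.

    The split is drawn fresh per run — it is not a permanent property
    of the Sample Set.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = max(1, int(len(ids) * holdout_fraction))
    return ids[cut:], ids[:cut]  # (working, holdout)
```

Working Samples drive the search; the holdout slice is touched only to check that a candidate improvement generalizes.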

The Sample collection is not a static test suite. It continuously refreshes from incoming production traffic:

  • New traces arrive → curation selects representative inputs → new active Samples
  • Active Samples age into historical as their traces expire per retention policy
  • The system maintains minimum active coverage per input space segment
  • If a segment loses all active Samples, it is flagged for refresh from incoming traffic
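The coverage check in the last two bullets might look like this. The segment/state record shape and threshold are illustrative assumptions:

```python
from collections import Counter

def segments_needing_refresh(samples, min_active=3):
    """Flag input-space segments whose active-Sample count fell below minimum,
    so curation can replenish them from incoming traffic."""
    active_counts = Counter(s["segment"] for s in samples if s["state"] == "active")
    all_segments = {s["segment"] for s in samples}
    return sorted(seg for seg in all_segments if active_counts[seg] < min_active)
```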

Retention does not break the system — it is a natural lifecycle stage. Production traffic continuously replenishes the active Sample pool. The retention policy controls how long full re-execution fidelity is available, not whether the system functions.

| Customer type | Traffic pattern | Sample behavior |
| --- | --- | --- |
| Continuous release | Constant fresh traffic | Always has active Samples, continuous refresh |
| Periodic release | Traffic between releases | Samples refresh naturally during active periods |
| Point-in-time | Traffic stops after engagement | Samples degrade to historical — optimization is complete |
| Self-improving | Evolving traffic patterns | Sample collection evolves with the system, capturing distribution shifts |

Traces are subject to retention policies with reasonable defaults, controllable by the account admin.

| Setting | Controlled by | Description |
| --- | --- | --- |
| Trace retention period | Admin | How long raw trace data is held (default 90 days) |
| Retention clock start | System | Begins after the trace has been processed and any sample derivation is complete |
| Sample retention | Admin | Samples are long-lived by default; admin can configure removal |

The data boundary: Samples never contain copies of trace data. At derivation time, the system extracts Spectral’s own analysis (structural classification, stratification, evaluation results) but not the customer’s data. When the trace is removed through retention, the Sample’s trace reference becomes a provenance tombstone — metadata and evaluation history persist, but no customer data remains beyond the retention window.
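The derivation-time boundary can be made concrete with a sketch: the Sample receives Spectral’s analysis of the trace, never the trace payload. `classify` and `stratify` stand in for hypothetical analysis functions:

```python
def derive_sample(trace, classify, stratify):
    """Derive a Sample from a trace, copying analysis but never payload."""
    return {
        "trace_ref": trace["id"],        # provenance link only
        "metadata": classify(trace),     # Spectral's structural classification
        "tags": stratify(trace),         # platform-owned stratification tags
        "evaluation_history": [],
        # Deliberately absent: trace inputs, outputs, or any payload copy.
    }
```

Because no payload ever crosses into the Sample, retention-driven trace deletion requires no scrubbing pass over derived artifacts.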

Reproducibility after retention: When traces are removed, the specific optimization run cannot be replayed. But in LLM systems, re-running produces different results regardless (non-determinism, model updates). What customers actually need is explainability — and that is fully preserved through the Change Set’s reasoning, evaluation history, framework snapshot, and Sample Set reference.


With workspace structure, observation data, and retention understood, the final primitive ties everything together: the Change Set.

Change Set

A Change Set is the unit of change management — a package that captures the complete state of a Workspace’s optimization at a point in time.

| Field | Description |
| --- | --- |
| Version | The version of this Change Set |
| Baseline | Reference to the predecessor Change Set that was evaluated against — answers “compared to what?” |
| Evaluation Framework | Snapshot reference to the framework version under which results were generated |
| Sample Set | Reference to the Sample Set version this Change Set was evaluated against |
| Agents[] | Versioned configuration for each agent in the team |
| Explainability | Reasoning, decision log, and summary of changes (see Explainability) |
| Status | Lifecycle state: proposed → accepted → superseded (or validated if no changes warranted, rejected if declined) |

A Change Set references three context dimensions: what it changed from (Baseline), what it was measured with (Evaluation Framework), and what it was measured on (Sample Set). Results are only interpretable when all three are known.
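The record shape implied by the table might look like the following. Field names follow the table but are assumptions about the concrete schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChangeSet:
    version: int
    baseline: Optional[int]        # predecessor Change Set — "compared to what?"
    framework_snapshot: str        # Evaluation Framework version used
    sample_set_version: str        # Sample Set version evaluated against
    agents: dict = field(default_factory=dict)  # agent name -> versioned config
    status: str = "proposed"       # proposed | accepted | superseded | validated | rejected
```

All three context dimensions (baseline, framework snapshot, sample set) are required fields: a result stored without any one of them would not be interpretable later.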

Each entry in the Agents array contains:

| Field | Description |
| --- | --- |
| Version | Independent version for this agent’s configuration — Agent A might be v5 while Agent B is v3 if B didn’t change |
| Prompt Template | The agent’s full prompt configuration (structured, not flat text) |
| Hyperparameters | The agent’s model parameters |

Version exists at two levels:

  • Change Set Version — the workspace-level snapshot. Every change, even to a single agent, creates a new Change Set Version because system-level interaction effects are what matter.
  • Agent Version — independent per agent within a Change Set. Comparing agent versions across Change Sets immediately shows which agents changed.
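The two-level bump rule above can be sketched in a few lines — a change to any agent advances the workspace-level Change Set version, while only the changed agents’ own versions advance. The helper is hypothetical:

```python
def next_versions(changeset_version, agent_versions, changed_agents):
    """Bump the Change Set version; bump only the agents that changed.

    agent_versions: dict of agent name -> current version.
    changed_agents: set of agent names whose configuration changed.
    """
    new_agent_versions = {
        name: version + 1 if name in changed_agents else version
        for name, version in agent_versions.items()
    }
    return changeset_version + 1, new_agent_versions
```

Comparing `agent_versions` across two Change Sets then immediately shows which agents changed between them.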

A Change Set always references its Baseline — the Change Set it was evaluated against. This creates a chain: each Change Set knows what came before it and why the changes were made. Customers can read their version history as a narrative of how their system evolved.

When an optimization run finds nothing to improve, the system still produces a Change Set with validated status. This captures what was evaluated, what was considered, and why no changes were warranted. The audit trail is maintained — the customer knows the system is working even when it’s quiet.

| Stage | Change Set role |
| --- | --- |
| Stage 1 | Proposed Change Sets are advisory — the customer reads the recommendations and applies changes manually |
| Stage 2 | Proposed Change Sets can be applied to managed templates — the customer reviews and accepts |
| Stage 3 | Change Sets can be accepted automatically, subject to lifecycle gates and version management |

Evaluation Framework

The measurement framework that tells Spectral what to evaluate, what “better” means, and where improvement has business value. Without an Evaluation Framework, Spectral is optimizing blindly.

Evaluation Frameworks exist at three layers:

| Layer | Owner | Purpose |
| --- | --- | --- |
| Global template | Platform (Spectral) | Base frameworks derived from World Models and maintained by Spectral, organized by domain and use case pattern. See World Model System |
| Workspace instance | Workspace owner | Fork of a global template, customized for this customer’s specific situation. Tracks its global parent version for merge purposes |
| Change Set snapshot | Change Set | Point-in-time reference to the framework version under which optimization results were generated |

An Evaluation Framework contains rubrics with weighted dimensions, an objective function, hard constraints, and economic parameters. For the full specification, see Optimization Engine.


World Model

A World Model is a Spectral-managed standard that encodes the behavioral expectations of a specific problem domain. It exists independently of any customer’s system.

The EvaluationFramework answers what to measure and what constitutes better. The World Model answers where those criteria come from. Without an external standard, evaluation criteria drift — they evolve to reflect what a customer’s system already does well, not what the domain actually requires. A customer can only construct evaluations for scenarios they have encountered. Synthetic generation without grounding produces volume, not value.

The World Model solves this. Every EvaluationFramework is derived from the World Model, not authored by the customer. The customer’s system operates in a problem space — a subset of the broader domain the World Model represents. The EvaluationFramework generated for that system is scoped to the customer’s problem space but grounded in the domain standard. The customer steers coverage by selecting vectors and focus areas. The criteria themselves originate from the World Model.

This changes what optimization means. Performance gains against a World Model-derived EvaluationFramework reflect genuine domain conformance — improvement measured against an external standard, not convergence toward a self-referential rubric. The loop moves agents toward what the domain actually requires.

For the full specification, see World Model System.