Primitives
This page is the lookup surface for Spectral’s core primitives — the building blocks the rest of the system is built on. Each section is a per-primitive reference: definition, structure, lifecycle, and the relationships that constrain it.
For the conceptual narrative — how the primitives compose into the platform, what each one is for, why this shape and not another — see the System Design overview.
Workspace
The top-level container. A Workspace represents a customer’s multi-agent system as a single optimizable entity.
Each Workspace is discrete — it has its own members, permissions, change sets, and optimization history. See Access Control for the full role and isolation model.
A Workspace contains one or more Workspace Agents and an Evaluation Framework instance. Together, these define what the customer’s system does and how Spectral measures it.
Workspace Agent
An individual agent within a workspace — one member of the customer’s agent team. Workspace Agents are identified through one of two paths:
- Declared: The customer defines the agent team structure during workspace setup, naming each agent and its role
- Inferred: Spectral maps incoming OTEL traces to agent identities within the workspace based on span attributes (`agent.name`, `agent.type`) and behavioral patterns (see the sketch after this list)
Workspace Agents have defined dependencies between them. They execute in topological order based on these dependencies, and downstream agents receive upstream agent outputs as context.
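A minimal sketch of that execution model, assuming a dependency map from each agent to its upstream agents (the agent names and runner shape are illustrative, not Spectral’s actual scheduler):

```python
from graphlib import TopologicalSorter

# Hypothetical agent team: each agent lists the agents it depends on.
deps = {"researcher": set(), "writer": {"researcher"}, "reviewer": {"writer"}}

def run_team(deps, agents):
    """Execute agents in topological order, passing upstream outputs as context."""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: outputs[d] for d in deps[name]}  # dependency outputs
        outputs[name] = agents[name](context=upstream)
    return outputs

# Stand-in agents that just echo what context they received.
outputs = run_team(deps, {
    name: (lambda context, name=name: f"{name} ran with {sorted(context)}")
    for name in deps
})
```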
Each Workspace Agent has two core components that define its behavior: a Prompt Template and Hyperparameters.
Prompt Template
The prompt configuration for a single agent. Not a flat text blob — Spectral understands prompt templates structurally:
- Base prompt text — the core instruction
- Context selection strategy — how the agent assembles context (retrieval settings, what gets included, truncation behavior)
- Chain-of-thought configuration — none, brief, or full reasoning injection
- Few-shot examples — grounding examples that demonstrate expected behavior
- Upstream context injection — how outputs from dependency agents are incorporated
This structural understanding is what makes Spectral’s recommendations actionable. “Your context assembly is including full document bodies when only metadata is needed” is useful. “Change your prompt” is not.
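As a rough illustration of that structural view, a Prompt Template might be modeled like the sketch below; the field names are invented here and are not the platform’s actual schema:

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class PromptTemplate:
    """Illustrative shape of a structured prompt template."""
    base_prompt: str                                        # core instruction
    context_strategy: dict = field(default_factory=dict)    # retrieval, inclusion, truncation
    chain_of_thought: Literal["none", "brief", "full"] = "none"
    few_shot_examples: list[str] = field(default_factory=list)
    upstream_injection: dict = field(default_factory=dict)  # how dependency outputs come in
```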
Template Inference: The ideal onboarding path — Spectral pulls traces from OTEL spans, identifies conversation starts, and infers the prompt template via segmentation. Low friction for the customer, high value from day one, and a meaningful technical differentiator.
Post-alpha. Trace-based prompt template inference ships post-alpha. Alpha relies on the explicit-template fallback: customers declare their templates explicitly during onboarding. Inference becomes the default when a future enabler lands.
Hyperparameters
Per-agent model configuration parameters declared via OTEL span attributes:
- Model provider (`gen_ai.system`)
- Model name (`gen_ai.request.model`)
- Model version
- Temperature
- Max tokens
- Confidence thresholds
- Retrieval settings (top-k, similarity thresholds)
- Tool invocation thresholds
- Chain-of-thought mode
Spectral establishes a baseline from observed hyperparameters and recommends changes when optimization identifies improvements. Hyperparameter mutations are informed by failure analysis — specific failure patterns map to specific parameter adjustments (e.g., fabrication patterns suggest lowering temperature and enabling chain-of-thought).
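A sketch of that failure-pattern mapping; the patterns, parameters, and deltas below are invented examples, not the engine’s actual rule table:

```python
# Hypothetical mapping from observed failure patterns to parameter adjustments.
MUTATIONS = {
    "fabrication": {"temperature": -0.3, "chain_of_thought": "full"},
    "truncated_output": {"max_tokens": +512},
    "irrelevant_retrieval": {"retrieval_top_k": -2},
}

def propose_adjustments(failure_patterns, current):
    """Apply the adjustment for each detected pattern to the current hyperparameters."""
    proposed = dict(current)
    for pattern in failure_patterns:
        for param, delta in MUTATIONS.get(pattern, {}).items():
            if isinstance(delta, (int, float)):
                proposed[param] = proposed.get(param, 0) + delta
            else:
                proposed[param] = delta  # categorical change, e.g. chain-of-thought mode
    return proposed
```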
Observation: Traces, Samples, and Retention
With the workspace structure established — agents, their prompts, their parameters — the next question is: how does Spectral observe what these agents are actually doing?
Trace
A raw observation from OTEL — an immutable record of what happened in the customer’s agent system.
Traces arrive continuously from the customer’s OTEL instrumentation. They are high-volume, append-only, and contain the actual data that flowed through the agent system: inputs, outputs, model parameters, latency, cost, and span relationships.
Traces are the firehose. Spectral needs to extract signal from this volume — which is where Samples come in.
Trace stores follow a `<context>_<entity>` naming convention, separated by purpose:
| Store | Purpose |
|---|---|
| `otel_traces` | Customer production traces ingested via OTLP — permanent, never belongs to a scan |
| `scan_traces` + `scan_observations` | Scan-phase LLM calls — uniform trace record with observe-phase extensions |
| `platform.agent_traces` | Spectral Agent conversational traces (per ADR-043) |
Scan cost is derived from `scan_traces` directly (`SUM(cost_usd)`), and evaluation scores live in `scan_evals` as an extension of `scan_traces`. See Domain Model for entity details and Optimization Engine for the pipeline mechanics.
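A minimal sketch of that cost derivation, assuming a SQL store exposing the `scan_traces` table described above (the `scan_id` column is an assumption here):

```python
import sqlite3

def scan_costs(conn: sqlite3.Connection) -> dict[str, float]:
    """Total cost per scan, summed directly from scan_traces."""
    rows = conn.execute(
        "SELECT scan_id, SUM(cost_usd) FROM scan_traces GROUP BY scan_id"
    )
    return {scan_id: total for scan_id, total in rows}
```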
Sample
A curated, retention-aware artifact derived from a Trace. Samples are the evaluation data that the optimization loop operates on.
A Sample contains:
- Structural metadata — classification of what kind of input this represents, what segment of the input space it covers
- Stratification tags — platform-owned classifications enabling representative coverage of the production input space (e.g., trace domain, behavioral pattern, severity band). Opaque to `spectral.worlds` — no reference to rule taxonomy, rule IDs, or any structure from `spectral.worlds`
- Ground truth (optional) — known-correct output, when provided by the customer or inferred
- Evaluation history — all evaluation results ever produced against this Sample across Change Sets
- Trace reference — provenance link to the originating trace
A Sample never contains copies of trace data. This is the data retention boundary — the most important invariant in the data model. The Sample references the trace while it exists and functions without it when it doesn’t.
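A sketch of that invariant as a type; the field names are illustrative, and the point is what the type does not contain:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Sample:
    """Illustrative Sample shape. No field ever holds trace payload data."""
    trace_ref: str                      # provenance link only, never the trace body
    structural_metadata: dict           # Spectral's own classification of the input
    stratification_tags: list[str]      # platform-owned coverage tags
    ground_truth: Optional[str] = None  # known-correct output, if available
    evaluation_history: list[dict] = field(default_factory=list)
```

Because only the reference crosses the boundary, removing the trace leaves the Sample structurally intact.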
Sample Lifecycle
| State | Trace status | Capabilities |
|---|---|---|
| Active | Within retention window | Full participation in optimization — re-execution uses live trace data |
| Historical | Removed per retention policy | Evaluation history and structural metadata preserved. Cannot be used for re-execution. Trace reference becomes a provenance tombstone. |
| Archived | N/A | Explicitly removed from active use. Retained for audit trail. |
Sample Set
A versioned collection of Samples with working/holdout splits. This is what the optimization loop actually runs against.
- Version — the Sample Set is versioned because the composition of what you’re evaluating against matters
- Working/holdout split — partitioned per optimization run, not permanently. Working Samples drive optimization; holdout Samples validate that improvements generalize to unseen inputs.
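A minimal sketch of the per-run partition; the holdout fraction and seeding are illustrative assumptions:

```python
import random

def split_for_run(sample_ids: list[str], holdout_fraction: float = 0.2, seed: int = 0):
    """Partition a Sample Set for one optimization run: working drives the loop,
    holdout validates that improvements generalize to unseen inputs."""
    rng = random.Random(seed)  # per-run seed; the split is not permanent
    shuffled = sample_ids[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]  # (working, holdout)
```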
Continuous Refresh
The Sample collection is not a static test suite. It continuously refreshes from incoming production traffic:
- New traces arrive → curation selects representative inputs → new active Samples
- Active Samples age into historical as their traces expire per retention policy
- The system maintains minimum active coverage per input space segment
- If a segment loses all active Samples, it is flagged for refresh from incoming traffic
Retention does not break the system — it is a natural lifecycle stage. Production traffic continuously replenishes the active Sample pool. The retention policy controls how long full re-execution fidelity is available, not whether the system functions.
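A sketch of the coverage check behind that refresh behavior; the coverage floor and the `segment` key are illustrative:

```python
from collections import Counter

def segments_needing_refresh(active_samples: list[dict], known_segments: set[str],
                             floor: int = 5) -> set[str]:
    """Flag input-space segments whose active Sample count fell below the floor,
    including segments that lost all active Samples to retention."""
    counts = Counter(s["segment"] for s in active_samples)
    return {seg for seg in known_segments if counts[seg] < floor}
```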
| Customer type | Traffic pattern | Sample behavior |
|---|---|---|
| Continuous release | Constant fresh traffic | Always has active Samples, continuous refresh |
| Periodic release | Traffic between releases | Samples refresh naturally during active periods |
| Point-in-time | Traffic stops after engagement | Samples degrade to historical — optimization is complete |
| Self-improving | Evolving traffic patterns | Sample collection evolves with the system, capturing distribution shifts |
Data Retention
Traces are subject to retention policies with reasonable defaults, controllable by the account admin.
| Setting | Controlled by | Description |
|---|---|---|
| Trace retention period | Admin | How long raw trace data is held (default 90 days) |
| Retention clock start | System | Begins after the trace has been processed and any sample derivation is complete |
| Sample retention | Admin | Samples are long-lived by default; admin can configure removal |
The data boundary: Samples never contain copies of trace data. At derivation time, the system extracts Spectral’s own analysis (structural classification, stratification, evaluation results) but not the customer’s data. When the trace is removed through retention, the Sample’s trace reference becomes a provenance tombstone — metadata and evaluation history persist, but no customer data remains beyond the retention window.
Reproducibility after retention: When traces are removed, the specific optimization run cannot be replayed. But in LLM systems, re-running produces different results regardless (non-determinism, model updates). What customers actually need is explainability — and that is fully preserved through the Change Set’s reasoning, evaluation history, framework snapshot, and Sample Set reference.
Change Set
With workspace structure, observation data, and retention understood, the final primitive ties everything together: the Change Set.
A Change Set is the unit of change management — a package that captures the complete state of a Workspace’s optimization at a point in time.
Structure
| Field | Description |
|---|---|
| Version | The version of this Change Set |
| Baseline | Reference to the predecessor Change Set that was evaluated against — answers “compared to what?” |
| Evaluation Framework | Snapshot reference to the framework version under which results were generated |
| Sample Set | Reference to the Sample Set version this Change Set was evaluated against |
| Agents[] | Versioned configuration for each agent in the team |
| Explainability | Reasoning, decision log, and summary of changes (see Explainability) |
| Status | Lifecycle state: proposed → accepted → superseded (or validated if no changes warranted, rejected if declined) |
A Change Set references three context dimensions: what it changed from (Baseline), what it was measured with (Evaluation Framework), and what it was measured on (Sample Set). Results are only interpretable when all three are known.
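A hedged sketch of that structure; the field names are illustrative, and the statuses follow the lifecycle above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    SUPERSEDED = "superseded"
    VALIDATED = "validated"  # no changes warranted
    REJECTED = "rejected"

@dataclass
class ChangeSet:
    version: str
    baseline: Optional[str]     # predecessor Change Set: "compared to what?"
    framework_snapshot: str     # what it was measured with
    sample_set_version: str     # what it was measured on
    agents: list[dict]          # per-agent versioned configuration
    explainability: dict        # reasoning, decision log, summary
    status: Status = Status.PROPOSED
```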
Configuration (within Agents[])
Each entry in the Agents array contains:
| Field | Description |
|---|---|
| Version | Independent version for this agent’s configuration — Agent A might be v5 while Agent B is v3 if B didn’t change |
| Prompt Template | The agent’s full prompt configuration (structured, not flat text) |
| Hyperparameters | The agent’s model parameters |
Versioning
Version exists at two levels:
- Change Set Version — the workspace-level snapshot. Every change, even to a single agent, creates a new Change Set Version because system-level interaction effects are what matter.
- Agent Version — independent per agent within a Change Set. Comparing agent versions across Change Sets immediately shows which agents changed.
A Change Set always references its Baseline — the Change Set it was evaluated against. This creates a chain: each Change Set knows what came before it and why the changes were made. Customers can read their version history as a narrative of how their system evolved.
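A small sketch of walking that baseline chain, assuming Change Sets stored as plain records keyed by version:

```python
def history(change_sets: dict[str, dict], head_version: str):
    """Walk Baseline references from the newest Change Set back to the first,
    yielding the evolution narrative newest-first."""
    version = head_version
    while version in change_sets:
        change_set = change_sets[version]
        yield change_set
        version = change_set.get("baseline")  # None at the root ends the walk
```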
Validated Change Sets
When an optimization run finds nothing to improve, the system still produces a Change Set with validated status. This captures what was evaluated, what was considered, and why no changes were warranted. The audit trail is maintained — the customer knows the system is working even when it’s quiet.
Relationship to the Integration Ladder
| Stage | Change Set role |
|---|---|
| Stage 1 | Proposed Change Sets are advisory — the customer reads the recommendations and applies changes manually |
| Stage 2 | Proposed Change Sets can be applied to managed templates — the customer reviews and accepts |
| Stage 3 | Change Sets can be accepted automatically, subject to lifecycle gates and version management |
Evaluation Framework
The measurement framework that tells Spectral what to evaluate, what “better” means, and where improvement has business value. Without an Evaluation Framework, Spectral is optimizing blindly.
Evaluation Frameworks exist at three layers:
| Layer | Owner | Purpose |
|---|---|---|
| Global template | Platform (Spectral) | Base frameworks derived from World Models and maintained by Spectral, organized by domain and use case pattern. See World Model System |
| Workspace instance | Workspace owner | Fork of a global template, customized for this customer’s specific situation. Tracks its global parent version for merge purposes |
| Change Set snapshot | Change Set | Point-in-time reference to the framework version under which optimization results were generated |
An Evaluation Framework contains rubrics with weighted dimensions, an objective function, hard constraints, and economic parameters. For the full specification, see Optimization Engine.
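A hedged sketch of that shape; the dimensions, weights, and parameters are invented for illustration, and the authoritative specification is in Optimization Engine:

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationFramework:
    """Illustrative framework shape: weighted rubric dimensions plus constraints."""
    rubric_weights: dict[str, float] = field(
        default_factory=lambda: {"accuracy": 0.5, "groundedness": 0.3, "tone": 0.2}
    )
    hard_constraints: list[str] = field(default_factory=list)       # must-pass checks
    economic_params: dict[str, float] = field(default_factory=dict)  # e.g. cost ceilings

    def objective(self, scores: dict[str, float]) -> float:
        """Weighted sum over rubric dimensions; hard constraints gate separately."""
        return sum(w * scores.get(dim, 0.0) for dim, w in self.rubric_weights.items())
```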
World Model
A World Model is a Spectral-managed standard that encodes the behavioral expectations of a specific problem domain. It exists independently of any customer’s system.
The EvaluationFramework answers what to measure and what constitutes better. The World Model answers where those criteria come from. Without an external standard, evaluation criteria drift — they evolve to reflect what a customer’s system already does well, not what the domain actually requires. A customer can only construct evaluations for scenarios they have encountered. Synthetic generation without grounding produces volume, not value.
The World Model solves this. Every EvaluationFramework is derived from the World Model, not authored by the customer. The customer’s system operates in a problem space — a subset of the broader domain the World Model represents. The EvaluationFramework generated for that system is scoped to the customer’s problem space but grounded in the domain standard. The customer steers coverage by selecting vectors and focus areas. The criteria themselves originate from the World Model.
This changes what optimization means. Performance gains against a World Model-derived EvaluationFramework reflect genuine domain conformance — improvement measured against an external standard, not convergence toward a self-referential rubric. The loop moves agents toward what the domain actually requires.
For the full specification, see World Model System.