
Primitives

This page is the lookup surface for Spectral’s core primitives — the building blocks the rest of the system is built on. Each section is a per-primitive reference: definition, structure, lifecycle, and the relationships that constrain it.

For the conceptual narrative — how the primitives compose into the platform, what each one is for, why this shape and not another — see the System Design overview.


Workspace

The top-level container. A Workspace represents a customer’s multi-agent system as a single optimizable entity.

Each Workspace is discrete — it has its own members, permissions, change sets, and optimization history. See Access Control for the full role and isolation model.

A Workspace contains one or more Workspace Agents and an Evaluation Framework instance. Together, these define what the customer’s system does and how Spectral measures it.

Workspace Agent

An individual agent within a workspace — one member of the customer’s agent team. Workspace Agents are identified through one of two paths:

  • Declared: The customer defines the agent team structure during workspace setup, naming each agent and its role
  • Inferred: Spectral maps incoming OTEL traces to agent identities within the workspace based on span attributes (agent.name, agent.type) and behavioral patterns

Workspace Agents have defined dependencies between them. They execute in topological order based on these dependencies, and downstream agents receive upstream agent outputs as context.
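The dependency-ordered execution described above can be sketched with Python’s standard-library topological sorter. This is an illustrative assumption about the mechanics, not Spectral’s actual API — `run_agents` and the trace shapes are hypothetical:

```python
from graphlib import TopologicalSorter

def run_agents(deps, execute):
    """Execute agents in topological order, passing upstream outputs as context.

    deps: dict mapping agent name -> set of upstream agent names.
    execute(name, context) -> output, where context maps each upstream
    agent name to its output.
    """
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        # Downstream agents receive their dependencies' outputs as context.
        context = {up: outputs[up] for up in deps.get(name, ())}
        outputs[name] = execute(name, context)
    return outputs
```

For example, with `{"retriever": set(), "writer": {"retriever"}}`, the retriever runs first and the writer receives its output in `context`.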

Each Workspace Agent has two core components that define its behavior: a Prompt Template and Hyperparameters.

Prompt Template

The prompt configuration for a single agent. Not a flat text blob — Spectral understands prompt templates structurally:

  • Base prompt text — the core instruction
  • Context selection strategy — how the agent assembles context (retrieval settings, what gets included, truncation behavior)
  • Chain-of-thought configuration — none, brief, or full reasoning injection
  • Few-shot examples — grounding examples that demonstrate expected behavior
  • Upstream context injection — how outputs from dependency agents are incorporated

This structural understanding is what makes Spectral’s recommendations actionable. “Your context assembly is including full document bodies when only metadata is needed” is useful. “Change your prompt” is not.
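The structured view above might be modeled roughly as follows. Field names here are assumptions for illustration, not Spectral’s schema:

```python
from dataclasses import dataclass, field

@dataclass
class PromptTemplate:
    base_prompt: str                       # the core instruction
    context_strategy: dict                 # retrieval settings, inclusion rules, truncation
    chain_of_thought: str = "none"         # "none" | "brief" | "full"
    few_shot_examples: list = field(default_factory=list)   # grounding examples
    upstream_injection: dict = field(default_factory=dict)  # how dependency outputs are incorporated
```

Because each component is addressable, a recommendation can target a single field (e.g. `context_strategy`) rather than the prompt as an opaque whole.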

Template Inference: The ideal onboarding path — Spectral pulls traces from OTEL spans, identifies conversation starts, and infers the prompt template via segmentation. Low friction for the customer, high value from day one, and a meaningful technical differentiator.

Post-alpha. Trace-based prompt template inference ships post-alpha. The alpha relies on the explicit-template fallback: customers declare their templates explicitly during onboarding. Inference becomes the default when a future enabler lands.

Hyperparameters

Per-agent model configuration parameters declared via OTEL span attributes:

  • Model provider (gen_ai.system)
  • Model name (gen_ai.request.model)
  • Model version
  • Temperature
  • Max tokens
  • Confidence thresholds
  • Retrieval settings (top-k, similarity thresholds)
  • Tool invocation thresholds
  • Chain-of-thought mode

Spectral establishes a baseline from observed hyperparameters and recommends changes when optimization identifies improvements. Hyperparameter mutations are informed by failure analysis — specific failure patterns map to specific parameter adjustments (e.g., fabrication patterns suggest lowering temperature and enabling chain-of-thought).
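The failure-pattern-to-adjustment mapping can be sketched as a simple lookup, in the spirit of the fabrication example above. The pattern names and adjustments here are illustrative assumptions:

```python
# Hypothetical mapping: failure pattern -> hyperparameter adjustments.
MUTATIONS = {
    "fabrication": [("temperature", "decrease"), ("chain_of_thought", "enable")],
    "truncated_output": [("max_tokens", "increase")],
    "irrelevant_retrieval": [("retrieval_top_k", "decrease"),
                             ("similarity_threshold", "increase")],
}

def suggest_adjustments(failure_patterns):
    """Collect deduplicated adjustments for the observed failure patterns."""
    seen, suggestions = set(), []
    for pattern in failure_patterns:
        for adjustment in MUTATIONS.get(pattern, []):
            if adjustment not in seen:
                seen.add(adjustment)
                suggestions.append(adjustment)
    return suggestions
```

The point of the sketch is the shape of the logic: mutations are targeted responses to diagnosed failures, not random parameter sweeps.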


Observation: Traces, Samples, and Retention


With the workspace structure established — agents, their prompts, their parameters — the next question is: how does Spectral observe what these agents are actually doing?

Trace

A raw observation from OTEL — an immutable record of what happened in the customer’s agent system.

Traces arrive continuously from the customer’s OTEL instrumentation. They are high-volume, append-only, and contain the actual data that flowed through the agent system: inputs, outputs, model parameters, latency, cost, and span relationships.

Traces are the firehose. Spectral needs to extract signal from this volume — which is where Samples come in.

Trace stores follow a <context>_<entity> naming convention, separated by purpose:

| Store | Purpose |
| --- | --- |
| otel_traces | Customer production traces ingested via OTLP — permanent, never belongs to a scan |
| scan_traces + scan_observations | Scan-phase LLM calls — uniform trace record with observe-phase extensions |
| platform.agent_traces | Spectral Agent conversational traces (per ADR-043) |

Scan cost is derived from scan_traces directly (SUM(cost_usd)), and evaluation scores live in scan_evals as an extension of scan_traces. See Domain Model for entity details and Optimization Engine for the pipeline mechanics.

Sample

A curated, retention-aware artifact derived from a Trace. Samples are the evaluation data that the optimization loop operates on.

A Sample contains:

  • Structural metadata — classification of what kind of input this represents, what segment of the input space it covers
  • Stratification tags — platform-owned classifications enabling representative coverage of the production input space (e.g., trace domain, behavioral pattern, severity band). Opaque to spectral.worlds — no reference to rule taxonomy, rule IDs, or any structure from spectral.worlds.
  • Ground truth (optional) — known-correct output, when provided by the customer or inferred
  • Evaluation history — all evaluation results ever produced against this Sample across Change Sets
  • Trace reference — provenance link to the originating trace

A Sample never contains copies of trace data. This is the data retention boundary — the most important invariant in the data model. The Sample references the trace while it exists and functions without it when it doesn’t.

| State | Trace status | Capabilities |
| --- | --- | --- |
| Active | Within retention window | Full participation in optimization — re-execution uses live trace data |
| Historical | Removed per retention policy | Evaluation history and structural metadata preserved. Cannot be used for re-execution. Trace reference becomes a provenance tombstone. |
| Archived | N/A | Explicitly removed from active use. Retained for audit trail. |
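The lifecycle invariant can be sketched as follows — a Sample keeps only Spectral’s own metadata, and trace expiry demotes it rather than breaking it. Field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sample:
    metadata: dict               # structural classification, stratification tags
    evaluation_history: list     # all results ever produced against this Sample
    trace_ref: Optional[str]     # provenance link to the originating trace
    state: str = "active"

    def on_trace_expired(self):
        # Retention removed the trace: keep provenance, lose re-execution.
        self.state = "historical"
        self.trace_ref = f"tombstone:{self.trace_ref}"

    def can_reexecute(self):
        return self.state == "active"
```

Note that `on_trace_expired` mutates only the Sample’s own fields — no customer data was ever held, so nothing needs to be scrubbed.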

Sample Set

A versioned collection of Samples with working/holdout splits. This is what the optimization loop actually runs against.

  • Version — the Sample Set is versioned because the composition of what you’re evaluating against matters
  • Working/holdout split — partitioned per optimization run, not permanently. Working Samples drive optimization; holdout Samples validate that improvements generalize to unseen inputs.
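A minimal sketch of the per-run partitioning, assuming a random split (the actual strategy may be stratified; this function is hypothetical):

```python
import random

def split_samples(sample_ids, holdout_fraction=0.2, seed=None):
    """Partition samples into (working, holdout) for a single optimization run.

    The split is drawn fresh per run — it is not a permanent property
    of the Sample Set.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = max(1, int(len(ids) * holdout_fraction))
    return ids[cut:], ids[:cut]  # (working, holdout)
```

Working Samples drive the search; the holdout slice is touched only to check that a candidate improvement generalizes.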

The Sample collection is not a static test suite. It continuously refreshes from incoming production traffic:

  • New traces arrive → curation selects representative inputs → new active Samples
  • Active Samples age into historical as their traces expire per retention policy
  • The system maintains minimum active coverage per input space segment
  • If a segment loses all active Samples, it is flagged for refresh from incoming traffic
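The coverage check in the last two bullets might look like this. The segment/state record shape and threshold are illustrative assumptions:

```python
from collections import Counter

def segments_needing_refresh(samples, min_active=3):
    """Flag input-space segments whose active-Sample count fell below minimum,
    so curation can replenish them from incoming traffic."""
    active_counts = Counter(s["segment"] for s in samples if s["state"] == "active")
    all_segments = {s["segment"] for s in samples}
    return sorted(seg for seg in all_segments if active_counts[seg] < min_active)
```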

Retention does not break the system — it is a natural lifecycle stage. Production traffic continuously replenishes the active Sample pool. The retention policy controls how long full re-execution fidelity is available, not whether the system functions.

| Customer type | Traffic pattern | Sample behavior |
| --- | --- | --- |
| Continuous release | Constant fresh traffic | Always has active Samples, continuous refresh |
| Periodic release | Traffic between releases | Samples refresh naturally during active periods |
| Point-in-time | Traffic stops after engagement | Samples degrade to historical — optimization is complete |
| Self-improving | Evolving traffic patterns | Sample collection evolves with the system, capturing distribution shifts |

Traces are subject to retention policies with reasonable defaults, controllable by the account admin.

| Setting | Controlled by | Description |
| --- | --- | --- |
| Trace retention period | Admin | How long raw trace data is held (default 90 days) |
| Retention clock start | System | Begins after the trace has been processed and any sample derivation is complete |
| Sample retention | Admin | Samples are long-lived by default; admin can configure removal |

The data boundary: Samples never contain copies of trace data. At derivation time, the system extracts Spectral’s own analysis (structural classification, stratification, evaluation results) but not the customer’s data. When the trace is removed through retention, the Sample’s trace reference becomes a provenance tombstone — metadata and evaluation history persist, but no customer data remains beyond the retention window.
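The derivation-time boundary can be made concrete with a sketch: the Sample receives Spectral’s analysis of the trace, never the trace payload. `classify` and `stratify` stand in for hypothetical analysis functions:

```python
def derive_sample(trace, classify, stratify):
    """Derive a Sample from a trace, copying analysis but never payload."""
    return {
        "trace_ref": trace["id"],        # provenance link only
        "metadata": classify(trace),     # Spectral's structural classification
        "tags": stratify(trace),         # platform-owned stratification tags
        "evaluation_history": [],
        # Deliberately absent: trace inputs, outputs, or any payload copy.
    }
```

Because no payload ever crosses into the Sample, retention-driven trace deletion requires no scrubbing pass over derived artifacts.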

Reproducibility after retention: When traces are removed, the specific optimization run cannot be replayed. But in LLM systems, re-running produces different results regardless (non-determinism, model updates). What customers actually need is explainability — and that is fully preserved through the Change Set’s reasoning, evaluation history, framework snapshot, and Sample Set reference.


With workspace structure, observation data, and retention understood, the final primitive ties everything together: the Change Set.

Change Set

A Change Set is the unit of change management — a package that captures the complete state of a Workspace’s optimization at a point in time.

| Field | Description |
| --- | --- |
| Version | The version of this Change Set |
| Baseline | Reference to the predecessor Change Set that was evaluated against — answers “compared to what?” |
| Evaluation Framework | Snapshot reference to the framework version under which results were generated |
| Sample Set | Reference to the Sample Set version this Change Set was evaluated against |
| Agents[] | Versioned configuration for each agent in the team |
| Explainability | Reasoning, decision log, and summary of changes (see Explainability) |
| Status | Lifecycle state: proposed → accepted → superseded (or validated if no changes warranted, rejected if declined) |

A Change Set references three context dimensions: what it changed from (Baseline), what it was measured with (Evaluation Framework), and what it was measured on (Sample Set). Results are only interpretable when all three are known.
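The record shape implied by the table might look like the following. Field names follow the table but are assumptions about the concrete schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChangeSet:
    version: int
    baseline: Optional[int]        # predecessor Change Set — "compared to what?"
    framework_snapshot: str        # Evaluation Framework version used
    sample_set_version: str        # Sample Set version evaluated against
    agents: dict = field(default_factory=dict)  # agent name -> versioned config
    status: str = "proposed"       # proposed | accepted | superseded | validated | rejected
```

All three context dimensions (baseline, framework snapshot, sample set) are required fields: a result stored without any one of them would not be interpretable later.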

Each entry in the Agents array contains:

| Field | Description |
| --- | --- |
| Version | Independent version for this agent’s configuration — Agent A might be v5 while Agent B is v3 if B didn’t change |
| Prompt Template | The agent’s full prompt configuration (structured, not flat text) |
| Hyperparameters | The agent’s model parameters |

Version exists at two levels:

  • Change Set Version — the workspace-level snapshot. Every change, even to a single agent, creates a new Change Set Version because system-level interaction effects are what matter.
  • Agent Version — independent per agent within a Change Set. Comparing agent versions across Change Sets immediately shows which agents changed.
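The two-level bump rule above can be sketched in a few lines — a change to any agent advances the workspace-level Change Set version, while only the changed agents’ own versions advance. The helper is hypothetical:

```python
def next_versions(changeset_version, agent_versions, changed_agents):
    """Bump the Change Set version; bump only the agents that changed.

    agent_versions: dict of agent name -> current version.
    changed_agents: set of agent names whose configuration changed.
    """
    new_agent_versions = {
        name: version + 1 if name in changed_agents else version
        for name, version in agent_versions.items()
    }
    return changeset_version + 1, new_agent_versions
```

Comparing `agent_versions` across two Change Sets then immediately shows which agents changed between them.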

A Change Set always references its Baseline — the Change Set it was evaluated against. This creates a chain: each Change Set knows what came before it and why the changes were made. Customers can read their version history as a narrative of how their system evolved.

When an optimization run finds nothing to improve, the system still produces a Change Set with validated status. This captures what was evaluated, what was considered, and why no changes were warranted. The audit trail is maintained — the customer knows the system is working even when it’s quiet.

| Stage | Change Set role |
| --- | --- |
| Stage 1 | Proposed Change Sets are advisory — the customer reads the recommendations and applies changes manually |
| Stage 2 | Proposed Change Sets can be applied to managed templates — the customer reviews and accepts |
| Stage 3 | Change Sets can be accepted automatically, subject to lifecycle gates and version management |

Evaluation Framework

The measurement framework that tells Spectral what to evaluate, what “better” means, and where improvement has business value. Without an Evaluation Framework, Spectral is optimizing blindly.

Evaluation Frameworks exist at three layers:

| Layer | Owner | Purpose |
| --- | --- | --- |
| Global template | Platform (Spectral) | Base frameworks derived from World Models and maintained by Spectral, organized by domain and use case pattern. See World Model System |
| Workspace instance | Workspace owner | Fork of a global template, customized for this customer’s specific situation. Tracks its global parent version for merge purposes |
| Change Set snapshot | Change Set | Point-in-time reference to the framework version under which optimization results were generated |

An Evaluation Framework contains rubrics with weighted dimensions, an objective function, hard constraints, and economic parameters. For the full specification, see Optimization Engine.


World Model

A World Model is a Spectral-managed standard that encodes the behavioral expectations of a specific problem domain. It exists independently of any customer’s system.

The EvaluationFramework answers what to measure and what constitutes better. The World Model answers where those criteria come from. Without an external standard, evaluation criteria drift — they evolve to reflect what a customer’s system already does well, not what the domain actually requires. A customer can only construct evaluations for scenarios they have encountered. Synthetic generation without grounding produces volume, not value.

The World Model solves this. Every EvaluationFramework is derived from the World Model, not authored by the customer. The customer’s system operates in a problem space — a subset of the broader domain the World Model represents. The EvaluationFramework generated for that system is scoped to the customer’s problem space but grounded in the domain standard. The customer steers coverage by selecting vectors and focus areas. The criteria themselves originate from the World Model.

This changes what optimization means. Performance gains against a World Model-derived EvaluationFramework reflect genuine domain conformance — improvement measured against an external standard, not convergence toward a self-referential rubric. The loop moves agents toward what the domain actually requires.

For the full specification, see World Model System.