
Observability Principles

This page establishes the observability principles Spectral upholds regardless of which specific stack we choose. These are load-bearing — they hold the line on data classification, on the minimum shape of telemetry every LLM call and every scan phase emits, and on traceability across contexts.


The first-class boundary — customer data vs operational data


Spectral handles two categorically different classes of data. Commingling them is an architectural violation, not a policy preference.

Customer data

What it is. Workspace-scoped traces customers ingest, samples derived from those traces, scan artifacts (EvalResults, FailureClusters, VerdictResults, ChangeSets, AgentPerformanceCards), and any conversation content between the customer and the Spectral Agent.

Where it lives. Customer-owned schemas under Supabase RLS. Every row carries workspace_id (and account_id where applicable) from the first migration (per Architecture — Multi-tenancy isolation). A customer querying with their JWT never sees another workspace’s data because Postgres itself enforces the boundary.

Who can see it. The owning workspace’s authenticated users, plus Spectral operators holding the operations account role (Spectral staff — used for customer support, incident response, and the operational surfaces in the Operations app).

Retention. Tied to the customer’s agreement. Deletion propagates on workspace retirement.

Operational data

What it is. Spectral’s own telemetry: structured logs, internal OTEL spans (API request traces, worker-job traces, scan-phase spans), agent reasoning traces (Spectral Agent, Operations Agent, World Agent internal LangGraph state), LLM call metadata.

Where it lives. Platform-owned stores: Grafana Cloud LGTM (logs/metrics/traces), Pydantic Logfire (LLM-call telemetry), and Sentry (error capture), per ADR-036. What is fixed is that it is not in customer-scoped tables. Operational storage carries no workspace_id as a primary organizing key — it may carry the field as a label for correlation, but the storage does not enforce workspace isolation.

Who can see it. Spectral engineering and operations. Customers do not see operational data.

Retention. Platform-owned policy per ADR-042. Shorter than customer-data retention in most cases — operational telemetry does not need to outlive incident-investigation windows.

The customer ingestion endpoint (/api/v1/traces — API-key authenticated, write:traces scope) receives customer data only. Operational telemetry has its own backend, distinct from customer-data storage — vendor-swappable via OTEL_EXPORTER_OTLP_ENDPOINT per ADR-036 (see Observability stack for the vendor inventory). Specifically:

  • Application log statements never include customer PII (taxpayer-identifiable content, full trace payloads with user prompts, private conversation text). Workspace IDs and scan IDs are fine; their contents are not.
  • Operational spans that cross customer-data surfaces carry references (scan IDs, sample IDs, trace IDs) rather than copying the customer data into the operational record.
  • If debugging requires joining operational telemetry with customer data, the join happens at query time against the customer-data store — never by copying customer data into the operational store.

This boundary is enforced by convention and code review today. A structured-linter rule (flagging forbidden payload shapes in log.info(...) calls) is a future hardening item.
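To make the convention concrete, here is a minimal sketch of what the future structured-linter rule could check: rejecting log fields whose names suggest customer payload content. The field names in FORBIDDEN_FIELDS are illustrative, not Spectral's actual schema.

```python
# Hypothetical sketch of the future structured-linter rule: field names
# below are illustrative stand-ins for content-bearing payload shapes.
FORBIDDEN_FIELDS = {"prompt", "completion", "conversation", "payload", "user_message"}

def check_log_fields(fields: dict) -> list[str]:
    """Return the field names that would copy customer content into a log record."""
    return sorted(set(fields) & FORBIDDEN_FIELDS)

# IDs are fine; contents are not.
assert check_log_fields({"workspace_id": "…", "scan_id": "…"}) == []
assert check_log_fields({"scan_id": "…", "prompt": "full user text"}) == ["prompt"]
```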


Canonical log fields

All application logs flow through structlog configured to emit JSON. Every log record carries the canonical fields below when the context applies; fields are omitted rather than set to null.

| Field | Type | When present | Purpose |
| --- | --- | --- | --- |
| workspace_id | UUID | Any log in a request / job that has resolved a workspace | Multi-tenant correlation. Indexed in the operational store. |
| account_id | UUID | Same as workspace_id | Account-level rollup correlation. |
| scan_id | UUID | Any log emitted inside the scan pipeline | Scan-scoped correlation. |
| world_model_version | string | Any log that cites a specific world-model version (EvalSet request, verdict) | Authority correlation — lets you slice telemetry by world-model version. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Every log | Which package emitted the log. Critical for context-aware filtering. |
| phase | enum(observe, calibrate, diagnose, evaluate, optimize, safety, verdict, distill, evolve, publish, agent, …) | Logs inside a named pipeline phase | Phase-level slice of scan and world-model activity. |
| trace_id | string | Every log emitted within an OTEL span context | Correlates logs to spans (see Trace context propagation). |

Free-form message fields are allowed, but they are secondary. Dashboards, alerts, and post-incident queries key on the canonical fields above. A log that only carries a free-form message is debuggable by a human but not by the observability pipeline.

Emitter discipline. Each context constructs a pre-bound logger at composition time (bc=worlds or bc=platform), so downstream call sites inherit the field automatically. Adding a new context-aware field is a composition-root change, not a per-call-site change.
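The pre-bound-logger discipline can be sketched with a stdlib-only stand-in for structlog's bind() — the class below is illustrative, not Spectral's actual logger wiring:

```python
import json

# Minimal stand-in for a structlog bound logger: context fields are bound
# once at the composition root and inherited by every downstream call site.
class BoundLogger:
    def __init__(self, **bound):
        self._bound = bound

    def bind(self, **fields) -> "BoundLogger":
        return BoundLogger(**{**self._bound, **fields})

    def info(self, event: str, **fields) -> str:
        record = {"event": event, **self._bound, **fields}
        line = json.dumps(record, sort_keys=True)
        print(line)
        return line

# Composition root: each context gets a logger pre-bound with its bc field.
worlds_log = BoundLogger(bc="worlds")

# A downstream call site inherits bc automatically and adds its own context.
line = worlds_log.bind(phase="distill").info("rule.candidate.proposed", scan_id="…")
```

Adding a new context-aware field means changing one bind() at the root, which is exactly why this is a composition-root change rather than a per-call-site change.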


LLM call telemetry

Every LLM call emits a structured telemetry record. This is the minimum shape — the specific destination is Pydantic Logfire per ADR-036, but the shape itself is invariant across substrate choices.

| Field | Type | Notes |
| --- | --- | --- |
| model | string | Provider + model identifier (anthropic/claude-opus-4-7, openai/gpt-5.2, google/gemini-2.5-flash). Exact strings; never aliased. |
| account_id | UUID \| null | Customer-tenancy field per ADR-033. Nullable for OPERATIONS-only calls. |
| workspace_id | UUID \| null | Present when the call is attributable to a specific customer workspace. Null for OPERATIONS-only calls (e.g., Ops Agent / World Agent internal reasoning). |
| input_tokens | int | Prompt-token count. |
| output_tokens | int | Completion-token count. |
| latency_ms | int | Wall-clock latency of the call. |
| cost_usd | decimal | Dollar cost of the call, computed from provider pricing at call time (per ADR-035 D6 via genai-prices with fallback registry). |
| purpose | enum | The 8-value PurposeKey taxonomy below. Required. |
| content_class | enum(PLATFORM, OPERATIONS, SYNTHETIC) | The content classification driving Stream A redaction. Required. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Which context initiated the call. |
| scan_id | UUID \| null | Present when the call is inside a scan run. |
| trace_id | string | OTel trace context for correlation with the enclosing operation. |
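As a sketch, the minimum shape above maps onto a record type like the following. Field names follow the table; the class name and its location are illustrative, not Spectral's actual module layout.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from uuid import UUID

@dataclass(frozen=True)
class LLMCallRecord:
    model: str                  # exact provider/model string, never aliased
    account_id: UUID | None     # null for OPERATIONS-only calls
    workspace_id: UUID | None   # null for OPERATIONS-only calls
    input_tokens: int
    output_tokens: int
    latency_ms: int
    cost_usd: Decimal           # computed from provider pricing at call time
    purpose: str                # one of the 8 PurposeKey values
    content_class: str          # PLATFORM | OPERATIONS | SYNTHETIC
    bc: str                     # which context initiated the call
    scan_id: UUID | None
    trace_id: str
```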

PurposeKey

A closed 8-value enum in spectral.core.llm.purposes.PurposeKey (per ADR-035 D3). If a new purpose is needed, it gets added here, not invented at the call site.

| Value | Meaning |
| --- | --- |
| scoring | Evaluation scoring (high volume, cost-sensitive). |
| detection | Anti-deception, parse validation. |
| reasoning | Diagnosis, optimization, calibration rewrites, rule distillation (quality-critical). |
| agent_turn | Spectral Agent / Ops Agent conversational turn. |
| agent_tool | Agent tool-invocation call (may differ from turn). |
| world_agent | World Agent exploration / hypothesis. |
| customer_replay | Re-executing customer agents during the observe phase. |
| embedding | Embedding generation (full policy in embeddings). |

spectral.core.llm.content_class.ContentClass is a closed 3-value enum (per ADR-036 D6):

| Value | Meaning |
| --- | --- |
| PLATFORM | Customer content processed or generated in platform (conformance-track customer traces, Spectral Agent conversations, customer replay). |
| OPERATIONS | Spectral-operated reasoning (World Agent, Ops Agent, internal distillation). |
| SYNTHETIC | Test-agent-generated synthetic content (no customer PII). |
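Both taxonomies are closed enums; a sketch of their shape (values mirror the tables above — the real homes are the spectral.core.llm modules named in the prose):

```python
from enum import Enum

class PurposeKey(str, Enum):
    scoring = "scoring"
    detection = "detection"
    reasoning = "reasoning"
    agent_turn = "agent_turn"
    agent_tool = "agent_tool"
    world_agent = "world_agent"
    customer_replay = "customer_replay"
    embedding = "embedding"

class ContentClass(str, Enum):
    PLATFORM = "PLATFORM"
    OPERATIONS = "OPERATIONS"
    SYNTHETIC = "SYNTHETIC"

# Closed enums: 8 purposes, 3 content classes, no open-ended additions.
assert len(PurposeKey) == 8 and len(ContentClass) == 3
```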

Resolver mapping at the composition root (never per-call developer discretion):

  • world_agent → always OPERATIONS
  • scoring / detection / reasoning → SYNTHETIC on the synthetic track; PLATFORM on the conformance track
  • agent_turn (Spectral Agent) → always PLATFORM
  • customer_replay → always PLATFORM
  • agent_tool → inherits from parent span’s content_class
  • embedding → caller-determined
  • Ops Agent / platform-internal reasoning → OPERATIONS

For PLATFORM-class calls, prompt / completion / tool-arg fields are stripped before export to third-party observability (Logfire, Sentry). See observability stack for the three-stream architecture and the two-layer enforcement.
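A minimal sketch of that stripping step, assuming dict-shaped records and illustrative content-field names (the real enforcement is the two-layer mechanism described on the stack page):

```python
# Content-bearing fields stripped from PLATFORM-class records before export.
# Field names are illustrative stand-ins for prompt / completion / tool-arg data.
CONTENT_FIELDS = ("prompt", "completion", "tool_args")

def redact_for_export(record: dict) -> dict:
    """Drop content fields from PLATFORM-class records; pass others through."""
    if record.get("content_class") != "PLATFORM":
        return record
    return {k: v for k, v in record.items() if k not in CONTENT_FIELDS}

exported = redact_for_export(
    {"content_class": "PLATFORM", "model": "anthropic/claude-opus-4-7",
     "prompt": "customer text", "input_tokens": 42}
)
assert "prompt" not in exported and exported["input_tokens"] == 42
```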

Why this matters. Cost attribution, rate-limit investigation, model-choice decisions, and provider-drift detection all key on purpose. A generic “LLM call count” dashboard is not actionable; a dashboard sliced by purpose × model × bc × content_class is.


Trace context propagation

A scan completes in spectral.platform and emits scan.convergence.delta — a domain event consumed by spectral.worlds per ADR-017. The WorldAgent then reasons over the event and potentially proposes rule candidates. The entire path must be traceable as one logical operation.

W3C Trace Context (traceparent / tracestate) — or an equivalent propagation envelope — is carried across every event between contexts. The consumer side of an event opens a new span as a child of the producer’s span. One logical operation can be walked start-to-finish even when it spans worlds and platform, multiple workers, and multiple LLM calls.

  • HTTP requests — standard OTEL HTTP propagation. Incoming traceparent is honoured; if absent, a new root span is started.
  • Domain events — events typed in spectral.core carry an envelope field that includes the producer’s trace_id and span_id (the propagation shape itself is a spectral.core contract; changes to it follow spectral.core governance ADR-065).
  • Agent tool calls — tool invocations inherit the agent’s span context; LLM calls made by tools inherit further still.
  • Worker-job dispatch — the AgentTask row carries the trace context; the worker picks it up and opens child spans.
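The domain-event case can be sketched stdlib-only: the producer stamps its trace context onto the envelope, and the consumer opens a new span under the same trace_id, parented on the producer's span. Names here are illustrative; the real envelope shape is a spectral.core contract.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class EventEnvelope:
    event_type: str
    trace_id: str        # producer's trace context, carried across the boundary
    parent_span_id: str
    payload: dict = field(default_factory=dict)

def publish(event_type: str, payload: dict, *, trace_id: str, span_id: str) -> EventEnvelope:
    # Producer side: the current span's context rides along with the event.
    return EventEnvelope(event_type, trace_id, span_id, payload)

def consume(env: EventEnvelope) -> dict:
    # Consumer side: a new span in the SAME trace, parented on the producer's span.
    return {"trace_id": env.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_span_id": env.parent_span_id}

env = publish("scan.convergence.delta", {"scan_id": "…"},
              trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex[:16])
child = consume(env)
assert child["trace_id"] == env.trace_id            # one trace_id threads the path
assert child["parent_span_id"] == env.parent_span_id
```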

If a scan produces a verdict that causes a world-model rule-candidate proposal, the operator looking at the proposal can walk back through:

RuleCandidate (worlds, distill phase)
← scan.convergence.delta event (carries trace context)
← verdict phase span (platform)
← scan phase span (platform)
← HTTP scan request (API)
← customer ingestion (API)

— with one trace_id threading the entire path. No guessing, no manual correlation.

Without this propagation:

  • Incident forensics degrades to database archaeology (looking up rows by timestamps and hoping they correlate).
  • Cost-attribution across worlds and platform becomes impossible (an LLM call in the evaluate phase cannot be attributed to the scan that originated it).
  • The “why did this candidate appear?” question has no mechanical answer — operators end up guessing from timing.

The rule is non-optional. An event published without a trace context is a bug.


The principles above are in force from commit one. The concrete tooling that realizes them — vendor inventory, export destinations, per-stream redaction, retention defaults, alert rules — lives at Observability Stack per ADR-036. This page is the doctrine; that page is the inventory. Principles do not change between the two; only the runtime destination of each signal does.


Not covered on this page:

  • Tool choices. Specific vendors (Grafana Cloud / Pydantic Logfire / Sentry) and the alpha-posture matrix live in Observability stack per ADR-036.
  • Retention policies. Per-class retention rules live in Data retention per ADR-042.
  • Alerting discipline. Who gets paged, for what, and through which channel — captured in the operational runbooks (docs/runbooks/) alongside on-call rotation.
  • Customer-visible observability. Dashboards the customer sees are part of the product surface, not the operational observability plane. See Optimization Engine for customer-visible verdict and scoring surfaces.

Related pages:

  • Architecture — the three-context topology these principles apply across
  • Event System — the event shape that carries trace context between contexts
  • Access Control — role-and-scope model (who can see what)
  • Testing — per-layer strategy, including integration tests that cross between contexts
  • Observability stack — vendors, alpha posture, and content-class taxonomy per ADR-036
  • Data retention — per-class retention policies per ADR-042