
Observability Principles

This page establishes the observability principles Spectral upholds regardless of which specific stack we choose. These are load-bearing — they hold the line on data classification, on the minimum shape of telemetry every LLM call and every scan phase emits, and on traceability across contexts.


The first-class boundary — customer data vs operational data


Spectral handles two categorically different classes of data. Commingling them is an architectural violation, not a policy preference.

Customer data

What it is. Workspace-scoped traces customers ingest, samples derived from those traces, scan artifacts (EvalResults, FailureClusters, VerdictResults, ChangeSets, AgentPerformanceCards), and any conversation content between the customer and the Spectral Agent.

Where it lives. Customer-owned schemas under Supabase RLS. Every row carries workspace_id (and account_id where applicable) from the first migration (per Architecture — Multi-tenancy isolation). A customer querying with their JWT never sees another workspace’s data because Postgres itself enforces the boundary.

Who can see it. The owning workspace’s authenticated users, plus Spectral operators holding the operations account role (Spectral staff — used for customer support, incident response, and the operational surfaces in the Operations app).

Retention. Tied to the customer’s agreement. Deletion propagates on workspace retirement.

Operational data

What it is. Spectral’s own telemetry: structured logs, internal OTEL spans (API request traces, worker-job traces, scan-phase spans), agent reasoning traces (Spectral Agent, Operations Agent, World Agent internal LangGraph state), LLM call metadata.

Where it lives. Platform-owned stores: Grafana Cloud LGTM (logs/metrics/traces), Pydantic Logfire (LLM-call telemetry), and Sentry (error capture), per ADR-036. What is fixed is that it is not in customer-scoped tables. Operational storage carries no workspace_id as a primary organizing key — it may carry the field as a label for correlation, but the storage does not enforce workspace isolation.

Who can see it. Spectral engineering and operations. Customers do not see operational data.

Retention. Platform-owned policy per ADR-042. Shorter than customer-data retention in most cases — operational telemetry does not need to outlive incident-investigation windows.

The customer ingestion endpoint (/api/v1/traces — API-key authenticated, write:traces scope) receives customer data only. Operational telemetry has its own backend, distinct from customer-data storage — vendor-swappable via OTEL_EXPORTER_OTLP_ENDPOINT per ADR-036 (see Observability stack for the vendor inventory). Specifically:

  • Application log statements never include customer PII (taxpayer-identifiable content, full trace payloads with user prompts, private conversation text). Workspace IDs and scan IDs are fine; their contents are not.
  • Operational spans that cross customer-data surfaces carry references (scan IDs, sample IDs, trace IDs) rather than copying the customer data into the operational record.
  • If debugging requires joining operational telemetry with customer data, the join happens at query time against the customer-data store — never by copying customer data into the operational store.

This boundary is enforced by convention and code review today. A structured-linter rule (flagging forbidden payload shapes in log.info(...) calls) is a future hardening item.
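To make the convention concrete, here is a minimal sketch of what the future structured-linter rule could check: rejecting log fields whose names suggest customer payload content. The field names in FORBIDDEN_FIELDS are illustrative, not Spectral's actual schema.

```python
# Hypothetical sketch of the future structured-linter rule: field names
# below are illustrative stand-ins for content-bearing payload shapes.
FORBIDDEN_FIELDS = {"prompt", "completion", "conversation", "payload", "user_message"}

def check_log_fields(fields: dict) -> list[str]:
    """Return the field names that would copy customer content into a log record."""
    return sorted(set(fields) & FORBIDDEN_FIELDS)

# IDs are fine; contents are not.
assert check_log_fields({"workspace_id": "…", "scan_id": "…"}) == []
assert check_log_fields({"scan_id": "…", "prompt": "full user text"}) == ["prompt"]
```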


Canonical log fields

All application logs flow through structlog configured to emit JSON. Every log record carries the canonical fields below when the context applies; fields are omitted rather than set to null.

| Field | Type | When present | Purpose |
| --- | --- | --- | --- |
| workspace_id | UUID | Any log in a request / job that has resolved a workspace | Multi-tenant correlation. Indexed in the operational store. |
| account_id | UUID | Same as workspace_id | Account-level rollup correlation. |
| scan_id | UUID | Any log emitted inside the scan pipeline | Scan-scoped correlation. |
| world_model_version | string | Any log that cites a specific world-model version (EvalSet request, verdict) | Authority correlation — lets you slice telemetry by world-model version. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Every log | Which package emitted the log. Critical for context-aware filtering. |
| phase | enum(observe, calibrate, diagnose, evaluate, optimize, safety, verdict, distill, evolve, publish, agent, …) | Logs inside a named pipeline phase | Phase-level slice of scan and world-model activity. |
| trace_id | string | Every log emitted within an OTEL span context | Correlates logs to spans (see Trace context propagation). |

Free-form message fields are allowed, but they are secondary. Dashboards, alerts, and post-incident queries key on the canonical fields above. A log that only carries a free-form message is debuggable by a human but not by the observability pipeline.

Emitter discipline. Each context constructs a pre-bound logger at composition time (bc=worlds or bc=platform), so downstream call sites inherit the field automatically. Adding a new context-aware field is a composition-root change, not a per-call-site change.
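The pre-bound-logger discipline can be sketched with a stdlib-only stand-in for structlog's bind() — the class below is illustrative, not Spectral's actual logger wiring:

```python
import json

# Minimal stand-in for a structlog bound logger: context fields are bound
# once at the composition root and inherited by every downstream call site.
class BoundLogger:
    def __init__(self, **bound):
        self._bound = bound

    def bind(self, **fields) -> "BoundLogger":
        return BoundLogger(**{**self._bound, **fields})

    def info(self, event: str, **fields) -> str:
        record = {"event": event, **self._bound, **fields}
        line = json.dumps(record, sort_keys=True)
        print(line)
        return line

# Composition root: each context gets a logger pre-bound with its bc field.
worlds_log = BoundLogger(bc="worlds")

# A downstream call site inherits bc automatically and adds its own context.
line = worlds_log.bind(phase="distill").info("rule.candidate.proposed", scan_id="…")
```

Adding a new context-aware field means changing one bind() at the root, which is exactly why this is a composition-root change rather than a per-call-site change.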


LLM call telemetry

Every LLM call emits a structured telemetry record. This is the minimum shape — the specific destination is Pydantic Logfire per ADR-036, but the shape itself is invariant across substrate choices.

| Field | Type | Notes |
| --- | --- | --- |
| model | string | Provider + model identifier (anthropic/claude-opus-4-7, openai/gpt-5.2, google/gemini-2.5-flash). Exact strings; never aliased. |
| account_id | UUID \| null | Customer-tenancy field per ADR-033. Nullable for OPERATIONS-only calls. |
| workspace_id | UUID \| null | Present when the call is attributable to a specific customer workspace. Null for OPERATIONS-only calls (e.g., Ops Agent / World Agent internal reasoning). |
| input_tokens | int | Prompt-token count. |
| output_tokens | int | Completion-token count. |
| latency_ms | int | Wall-clock latency of the call. |
| cost_usd | decimal | Dollar cost of the call, computed from provider pricing at call time (per ADR-035 D6 via genai-prices with fallback registry). |
| purpose | enum | The 8-value PurposeKey taxonomy below. Required. |
| content_class | enum(PLATFORM, OPERATIONS, SYNTHETIC) | The content classification driving Stream A redaction. Required. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Which context initiated the call. |
| scan_id | UUID \| null | Present when the call is inside a scan run. |
| trace_id | string | OTel trace context for correlation with the enclosing operation. |
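As a sketch, the minimum shape above maps onto a record type like the following. Field names follow the table; the class name and its location are illustrative, not Spectral's actual module layout.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from uuid import UUID

@dataclass(frozen=True)
class LLMCallRecord:
    model: str                  # exact provider/model string, never aliased
    account_id: UUID | None     # null for OPERATIONS-only calls
    workspace_id: UUID | None   # null for OPERATIONS-only calls
    input_tokens: int
    output_tokens: int
    latency_ms: int
    cost_usd: Decimal           # computed from provider pricing at call time
    purpose: str                # one of the 8 PurposeKey values
    content_class: str          # PLATFORM | OPERATIONS | SYNTHETIC
    bc: str                     # which context initiated the call
    scan_id: UUID | None
    trace_id: str
```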

PurposeKey

A closed 8-value enum in spectral.core.llm.purposes.PurposeKey (per ADR-035 D3). If a new purpose is needed, it gets added here, not invented at the call site.

| Value | Meaning |
| --- | --- |
| scoring | Evaluation scoring (high volume, cost-sensitive). |
| detection | Anti-deception, parse validation. |
| reasoning | Diagnosis, optimization, calibration rewrites, rule distillation (quality-critical). |
| agent_turn | Spectral Agent / Ops Agent conversational turn. |
| agent_tool | Agent tool-invocation call (may differ from turn). |
| world_agent | World Agent exploration / hypothesis. |
| customer_replay | Re-executing customer agents during the observe phase. |
| embedding | Embedding generation (full policy in embeddings). |

spectral.core.llm.content_class.ContentClass is a closed 3-value enum (per ADR-036 D6):

| Value | Meaning |
| --- | --- |
| PLATFORM | Customer content processed or generated in platform (conformance-track customer traces, Spectral Agent conversations, customer replay). |
| OPERATIONS | Spectral-operated reasoning (World Agent, Ops Agent, internal distillation). |
| SYNTHETIC | Test-agent-generated synthetic content (no customer PII). |
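Both taxonomies are closed enums; a sketch of their shape (values mirror the tables above — the real homes are the spectral.core.llm modules named in the prose):

```python
from enum import Enum

class PurposeKey(str, Enum):
    scoring = "scoring"
    detection = "detection"
    reasoning = "reasoning"
    agent_turn = "agent_turn"
    agent_tool = "agent_tool"
    world_agent = "world_agent"
    customer_replay = "customer_replay"
    embedding = "embedding"

class ContentClass(str, Enum):
    PLATFORM = "PLATFORM"
    OPERATIONS = "OPERATIONS"
    SYNTHETIC = "SYNTHETIC"

# Closed enums: 8 purposes, 3 content classes, no open-ended additions.
assert len(PurposeKey) == 8 and len(ContentClass) == 3
```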

Resolver mapping at the composition root (never per-call developer discretion):

  • world_agent → always OPERATIONS
  • scoring / detection / reasoning → SYNTHETIC on the synthetic track; PLATFORM on the conformance track
  • agent_turn (Spectral Agent) → always PLATFORM
  • customer_replay → always PLATFORM
  • agent_tool → inherits from parent span’s content_class
  • embedding → caller-determined
  • Ops Agent / platform-internal reasoning → OPERATIONS

For PLATFORM-class calls, prompt / completion / tool-arg fields are stripped before export to third-party observability (Logfire, Sentry). See observability stack for the three-stream architecture and the two-layer enforcement.
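A minimal sketch of that stripping step, assuming dict-shaped records and illustrative content-field names (the real enforcement is the two-layer mechanism described on the stack page):

```python
# Content-bearing fields stripped from PLATFORM-class records before export.
# Field names are illustrative stand-ins for prompt / completion / tool-arg data.
CONTENT_FIELDS = ("prompt", "completion", "tool_args")

def redact_for_export(record: dict) -> dict:
    """Drop content fields from PLATFORM-class records; pass others through."""
    if record.get("content_class") != "PLATFORM":
        return record
    return {k: v for k, v in record.items() if k not in CONTENT_FIELDS}

exported = redact_for_export(
    {"content_class": "PLATFORM", "model": "anthropic/claude-opus-4-7",
     "prompt": "customer text", "input_tokens": 42}
)
assert "prompt" not in exported and exported["input_tokens"] == 42
```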

Why this matters. Cost attribution, rate-limit investigation, model-choice decisions, and provider-drift detection all key on purpose. A generic “LLM call count” dashboard is not actionable; a dashboard sliced by purpose × model × bc × content_class is.


Trace context propagation

A scan completes in spectral.platform and emits scan.convergence.delta — a domain event consumed by spectral.worlds per ADR-017. The WorldAgent then reasons over the event and potentially proposes rule candidates. The entire path must be traceable as one logical operation.

W3C Trace Context (traceparent / tracestate) — or an equivalent propagation envelope — is carried across every event between contexts. The consumer side of an event opens a new span as a child of the producer’s span. One logical operation can be walked start-to-finish even when it spans worlds and platform, multiple workers, and multiple LLM calls.

  • HTTP requests — standard OTEL HTTP propagation. Incoming traceparent is honoured; if absent, a new root span is started.
  • Domain events — events typed in spectral.core carry an envelope field that includes the producer’s trace_id and span_id (the propagation shape itself is a spectral.core contract; changes to it follow spectral.core governance ADR-065).
  • Agent tool calls — tool invocations inherit the agent’s span context; LLM calls made by tools inherit further still.
  • Worker-job dispatch — the AgentTask row carries the trace context; the worker picks it up and opens child spans.
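The domain-event case can be sketched stdlib-only: the producer stamps its trace context onto the envelope, and the consumer opens a new span under the same trace_id, parented on the producer's span. Names here are illustrative; the real envelope shape is a spectral.core contract.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class EventEnvelope:
    event_type: str
    trace_id: str        # producer's trace context, carried across the boundary
    parent_span_id: str
    payload: dict = field(default_factory=dict)

def publish(event_type: str, payload: dict, *, trace_id: str, span_id: str) -> EventEnvelope:
    # Producer side: the current span's context rides along with the event.
    return EventEnvelope(event_type, trace_id, span_id, payload)

def consume(env: EventEnvelope) -> dict:
    # Consumer side: a new span in the SAME trace, parented on the producer's span.
    return {"trace_id": env.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_span_id": env.parent_span_id}

env = publish("scan.convergence.delta", {"scan_id": "…"},
              trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex[:16])
child = consume(env)
assert child["trace_id"] == env.trace_id            # one trace_id threads the path
assert child["parent_span_id"] == env.parent_span_id
```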

If a scan produces a verdict that causes a world-model rule-candidate proposal, the operator looking at the proposal can walk back through:

RuleCandidate (worlds, distill phase)
← scan.convergence.delta event (carries trace context)
← verdict phase span (platform)
← scan phase span (platform)
← HTTP scan request (API)
← customer ingestion (API)

— with one trace_id threading the entire path. No guessing, no manual correlation.

Without this propagation:

  • Incident forensics degrades to database archaeology (looking up rows by timestamps and hoping they correlate).
  • Cost-attribution across worlds and platform becomes impossible (an LLM call in the evaluate phase cannot be attributed to the scan that originated it).
  • The “why did this candidate appear?” question has no mechanical answer — operators end up guessing from timing.

The rule is non-optional. An event published without a trace context is a bug.


The principles above are in force from commit one. The concrete tooling that realizes them — vendor inventory, export destinations, per-stream redaction, retention defaults, alert rules — lives at Observability Stack per ADR-036. This page is the doctrine; that page is the inventory. Principles do not change between the two; only the runtime destination of each signal does.


Not covered on this page:

  • Tool choices. Specific vendors (Grafana Cloud / Pydantic Logfire / Sentry) and the alpha-posture matrix live in Observability stack per ADR-036.
  • Retention policies. Per-class retention rules live in Data retention per ADR-042.
  • Alerting discipline. Who gets paged, for what, and through which channel — captured in the operational runbooks (docs/runbooks/) alongside on-call rotation.
  • Customer-visible observability. Dashboards the customer sees are part of the product surface, not the operational observability plane. See Optimization Engine for customer-visible verdict and scoring surfaces.

Related pages:

  • Architecture — the three-context topology these principles apply across
  • Event System — the event shape that carries trace context between contexts
  • Access Control — role-and-scope model (who can see what)
  • Testing — per-layer strategy, including integration tests that cross between contexts
  • Observability stack — vendors, alpha posture, and content-class taxonomy per ADR-036
  • Data retention — per-class retention policies per ADR-042