Observability Principles
This page establishes the observability principles Spectral upholds regardless of which specific stack we choose. These are load-bearing: they hold the line on data classification, on the minimum shape of telemetry every LLM call and every scan phase emits, and on traceability across contexts.
The first-class boundary — customer data vs operational data
Spectral handles two categorically different classes of data. Commingling them is an architectural violation, not a policy preference.
Customer data
What it is. Workspace-scoped traces customers ingest, samples derived from those traces, scan artifacts (EvalResults, FailureClusters, VerdictResults, ChangeSets, AgentPerformanceCards), and any conversation content between the customer and the Spectral Agent.
Where it lives. Customer-owned schemas under Supabase RLS. Every row carries workspace_id
(and account_id where applicable) from the first migration
(per Architecture — Multi-tenancy isolation). A customer
querying with their JWT never sees another workspace’s data because Postgres itself enforces the
boundary.
Who can see it. The owning workspace’s authenticated users, plus Spectral operators holding
the operations account role (Spectral staff — used for customer support, incident response,
and the operational surfaces in the Operations app).
Retention. Tied to the customer’s agreement. Deletion propagates on workspace retirement.
Operational data
What it is. Spectral’s own telemetry: structured logs, internal OTEL spans (API request traces, worker-job traces, scan-phase spans), agent reasoning traces (Spectral Agent, Operations Agent, World Agent internal LangGraph state), LLM call metadata.
Where it lives. Platform-owned stores at Grafana Cloud LGTM (logs/metrics/traces) + Pydantic
Logfire (LLM-call telemetry) + Sentry (error capture) per
ADR-036. What is fixed is that it is not in
customer-scoped tables. Operational storage carries no workspace_id as a primary organizing
key — it may carry the field as a label for correlation, but the storage does not enforce
workspace isolation.
Who can see it. Spectral engineering and operations. Customers do not see operational data.
Retention. Platform-owned policy per ADR-042. Shorter than customer-data retention in most cases — operational telemetry does not need to outlive incident-investigation windows.
Mixing the two is a category violation
The customer ingestion endpoint (/api/v1/traces — API-key authenticated, write:traces scope)
receives customer data only. Operational telemetry has its own backend, distinct from
customer-data storage — vendor-swappable via OTEL_EXPORTER_OTLP_ENDPOINT per
ADR-036 (see Observability stack
for the vendor inventory). Specifically:
- Application log statements never include customer PII (taxpayer-identifiable content, full trace payloads with user prompts, private conversation text). Workspace IDs and scan IDs are fine; their contents are not.
- Operational spans that cross customer-data surfaces carry references (scan IDs, sample IDs, trace IDs) rather than copying the customer data into the operational record.
- If debugging requires joining operational telemetry with customer data, the join happens at query time against the customer-data store — never by copying customer data into the operational store.
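The reference-not-copy rule above can be made concrete. The sketch below is illustrative only: the helper name, the allow-list, and its contents are assumptions, since the actual boundary is enforced by convention and code review rather than by any specific function.

```python
import json

# Hypothetical allow-list of reference fields permitted to cross into the
# operational store. Everything else is treated as potential customer content.
OPERATIONAL_SAFE_KEYS = {"workspace_id", "account_id", "scan_id", "sample_id", "trace_id"}

def operational_record(event: str, **fields) -> str:
    """Build an operational log line that carries references, never payloads."""
    leaked = set(fields) - OPERATIONAL_SAFE_KEYS
    if leaked:
        raise ValueError(
            f"customer-content keys not allowed in operational telemetry: {sorted(leaked)}"
        )
    return json.dumps({"event": event, **fields})

# References to customer data are fine:
operational_record("scan.phase.completed", scan_id="c0ffee", trace_id="4bf9")
# Copying customer content is rejected:
# operational_record("scan.phase.completed", user_prompt="...")  # raises ValueError
```

A structured-linter rule would enforce the same shape statically; this runtime guard shows the intent of that check.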
This boundary is enforced by convention and code review today. A structured-linter rule
(forbidden payload shapes in log.info(...) calls) is a future hardening item.
Structured logging — canonical fields
All application logs flow through structlog configured to emit JSON. Every log record carries the canonical fields below when the context applies; fields are omitted rather than set to null.
| Field | Type | When present | Purpose |
|---|---|---|---|
| workspace_id | UUID | Any log in a request / job that has resolved a workspace | Multi-tenant correlation. Indexed in the operational store. |
| account_id | UUID | Same as workspace_id | Account-level rollup correlation. |
| scan_id | UUID | Any log emitted inside the scan pipeline | Scan-scoped correlation. |
| world_model_version | string | Any log that cites a specific world-model version (EvalSet request, verdict) | Authority correlation — lets you slice telemetry by world-model version. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Every log | Which package emitted the log. Critical for context-aware filtering. |
| phase | enum(observe, calibrate, diagnose, evaluate, optimize, safety, verdict, distill, evolve, publish, agent, …) | Logs inside a named pipeline phase | Phase-level slice of scan and world-model activity. |
| trace_id | string | Every log emitted within an OTEL span context | Correlates logs to spans (see Trace context propagation). |
Free-form message fields are allowed, but they are secondary. Dashboards, alerts, and post-incident queries key on the canonical fields above. A log that only carries a free-form message is debuggable by a human but not by the observability pipeline.
Emitter discipline. Each context constructs a pre-bound logger at composition time
(bc=worlds or bc=platform), so downstream call sites inherit the field automatically.
Adding a new context-aware field is a composition-root change, not a per-call-site change.
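The pre-bound-logger pattern can be sketched with a minimal stdlib stand-in. This is not structlog’s actual API; it is an illustration of the bind-at-composition-root discipline, with hypothetical field values.

```python
import json
import sys

class BoundLogger:
    """Minimal stand-in for a structlog-style bound logger (illustration only)."""

    def __init__(self, stream=sys.stdout, **bound):
        self._stream = stream
        self._bound = bound

    def bind(self, **fields) -> "BoundLogger":
        # Returns a new logger carrying the merged context, so downstream
        # call sites inherit bc / workspace_id / scan_id automatically.
        return BoundLogger(self._stream, **{**self._bound, **fields})

    def info(self, event: str, **fields) -> None:
        # Omit-not-null discipline: fields that were never bound or passed
        # simply do not appear in the record.
        record = {"event": event, **self._bound, **fields}
        self._stream.write(json.dumps(record) + "\n")

# Composition root for the worlds context:
log = BoundLogger().bind(bc="worlds")
# A scan-scoped child; canonical fields accumulate:
scan_log = log.bind(scan_id="7f3a", phase="distill")
scan_log.info("rule_candidate.proposed")
```

With structlog itself, the same shape comes from `structlog.get_logger().bind(...)` at the composition root; the point is that adding a field touches one place, not every call site.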
LLM call telemetry
Every LLM call emits a structured telemetry record. This is the minimum shape — the specific destination is Pydantic Logfire per ADR-036, but the shape itself is invariant across substrate choices.
| Field | Type | Notes |
|---|---|---|
| model | string | Provider + model identifier (anthropic/claude-opus-4-7, openai/gpt-5.2, google/gemini-2.5-flash). Exact strings; never aliased. |
| account_id | UUID \| null | Customer-tenancy field per ADR-033. Nullable for OPERATIONS-only calls. |
| workspace_id | UUID \| null | Present when the call is attributable to a specific customer workspace. Null for OPERATIONS-only calls (e.g., Ops Agent / World Agent internal reasoning). |
| input_tokens | int | Prompt-token count. |
| output_tokens | int | Completion-token count. |
| latency_ms | int | Wall-clock latency of the call. |
| cost_usd | decimal | Dollar cost of the call, computed from provider pricing at call time (per ADR-035 D6 via genai-prices with fallback registry). |
| purpose | enum | The 8-value PurposeKey taxonomy below. Required. |
| content_class | enum(PLATFORM, OPERATIONS, SYNTHETIC) | The content classification driving Stream A redaction. Required. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Which context initiated the call. |
| scan_id | UUID \| null | Present when the call is inside a scan run. |
| trace_id | string | OTel trace context for correlation with the enclosing operation. |
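The minimum shape above can be expressed as a record type. The class below is an illustrative sketch following the table’s field names, not Spectral’s actual record type, and the example values are invented.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class LLMCallRecord:
    """Sketch of the invariant telemetry shape for a single LLM call."""
    model: str
    purpose: str            # one of the 8 PurposeKey values
    content_class: str      # PLATFORM | OPERATIONS | SYNTHETIC
    bc: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    cost_usd: float
    trace_id: str
    account_id: Optional[str] = None    # null for OPERATIONS-only calls
    workspace_id: Optional[str] = None  # null for OPERATIONS-only calls
    scan_id: Optional[str] = None       # present only inside a scan run

rec = LLMCallRecord(
    model="anthropic/claude-opus-4-7", purpose="scoring",
    content_class="SYNTHETIC", bc="platform",
    input_tokens=1200, output_tokens=80, latency_ms=640,
    cost_usd=0.0042, trace_id="4bf9",
)
assert asdict(rec)["workspace_id"] is None  # not attributable to a workspace
```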
The PurposeKey taxonomy
A closed 8-value enum in spectral.core.llm.purposes.PurposeKey (per
ADR-035 D3). If a new purpose is needed, it gets added here, not
invented at the call site.
| Value | Meaning |
|---|---|
| scoring | Evaluation scoring (high volume, cost-sensitive). |
| detection | Anti-deception, parse validation. |
| reasoning | Diagnosis, optimization, calibration rewrites, rule distillation (quality-critical). |
| agent_turn | Spectral Agent / Ops Agent conversational turn. |
| agent_tool | Agent tool-invocation call (may differ from turn). |
| world_agent | World Agent exploration / hypothesis. |
| customer_replay | Re-executing customer agents during the observe phase. |
| embedding | Embedding generation (full policy in embeddings). |
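As a sketch of the closed taxonomy (the real enum lives at spectral.core.llm.purposes.PurposeKey per ADR-035 D3; member spellings here are assumptions):

```python
from enum import Enum

class PurposeKey(str, Enum):
    """Closed 8-value purpose taxonomy; new purposes are added here,
    never invented at the call site."""
    SCORING = "scoring"
    DETECTION = "detection"
    REASONING = "reasoning"
    AGENT_TURN = "agent_turn"
    AGENT_TOOL = "agent_tool"
    WORLD_AGENT = "world_agent"
    CUSTOMER_REPLAY = "customer_replay"
    EMBEDDING = "embedding"

# Call sites select a member; free strings fail loudly.
assert len(PurposeKey) == 8
assert PurposeKey("scoring") is PurposeKey.SCORING
```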
The ContentClass taxonomy
spectral.core.llm.content_class.ContentClass is a closed 3-value enum
(per ADR-036 D6):
| Value | Meaning |
|---|---|
| PLATFORM | Customer content processed or generated in platform (conformance-track customer traces, Spectral Agent conversations, customer replay). |
| OPERATIONS | Spectral-operated reasoning (World Agent, Ops Agent, internal distillation). |
| SYNTHETIC | Test-agent-generated synthetic content (no customer PII). |
Resolver mapping at the composition root (never per-call developer discretion):
- world_agent → always OPERATIONS
- scoring / detection / reasoning → SYNTHETIC on the synthetic track; PLATFORM on the conformance track
- agent_turn (Spectral Agent) → always PLATFORM
- customer_replay → always PLATFORM
- agent_tool → inherits from parent span’s content_class
- embedding → caller-determined
- Ops Agent / platform-internal reasoning → OPERATIONS
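The resolver mapping can be sketched as a single function wired at the composition root. The function name and argument shape (`track`, `parent`) are assumptions made for illustration, not the real API.

```python
from enum import Enum
from typing import Optional

class ContentClass(str, Enum):
    PLATFORM = "PLATFORM"
    OPERATIONS = "OPERATIONS"
    SYNTHETIC = "SYNTHETIC"

def resolve_content_class(purpose: str, *, track: Optional[str] = None,
                          parent: Optional[ContentClass] = None) -> ContentClass:
    """Illustrative composition-root resolver; never per-call discretion."""
    if purpose == "world_agent":
        return ContentClass.OPERATIONS
    if purpose in ("scoring", "detection", "reasoning"):
        return ContentClass.PLATFORM if track == "conformance" else ContentClass.SYNTHETIC
    if purpose in ("agent_turn", "customer_replay"):
        return ContentClass.PLATFORM
    if purpose == "agent_tool":
        if parent is None:
            raise ValueError("agent_tool inherits content_class from its parent span")
        return parent
    if purpose == "embedding":
        raise ValueError("embedding content_class is caller-determined")
    # Ops Agent / platform-internal reasoning:
    return ContentClass.OPERATIONS
```

Because the mapping lives in one function, a new purpose or track forces an explicit decision here rather than an ad-hoc choice at a call site.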
For PLATFORM-class calls, prompt / completion / tool-arg fields are stripped before export to third-party observability (Logfire, Sentry). See observability stack for the three-stream architecture and the two-layer enforcement.
Why this matters. Cost attribution, rate-limit investigation, model-choice decisions, and
provider-drift detection all key on purpose. A generic “LLM call count” dashboard is not
actionable; a dashboard sliced by purpose × model × bc × content_class is.
Trace context propagation across contexts
A scan completes in spectral.platform and emits scan.convergence.delta — a domain event
consumed by spectral.worlds per
ADR-017. The WorldAgent then
reasons over the event and potentially proposes rule candidates. The entire path must be
traceable as one logical operation.
The rule
W3C Trace Context (traceparent / tracestate) — or an equivalent propagation envelope — is
carried across every event between contexts. The consumer side of an event opens a new span as a
child of the producer’s span. One logical operation can be walked start-to-finish even when it
spans worlds and platform, multiple workers, and multiple LLM calls.
What carries the context
- HTTP requests — standard OTEL HTTP propagation. Incoming traceparent is honoured; if absent, a new root span is started.
- Domain events — events typed in spectral.core carry an envelope field that includes the producer’s trace_id and span_id (the propagation shape itself is a spectral.core contract; changes to it follow spectral.core governance ADR-065).
- Agent tool calls — tool invocations inherit the agent’s span context; LLM calls made by tools inherit further still.
- Worker-job dispatch — the AgentTask row carries the trace context; the worker picks it up and opens child spans.
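The envelope discipline can be sketched end to end. The classes and field names below mirror the envelope description above but are illustrative, not the real spectral.core contract; in production the same roles are played by OTEL propagators.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str

def start_root_span() -> Span:
    return Span(trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex)

def publish(event_type: str, payload: dict, producer: Span) -> dict:
    # An event published without trace context is a bug, so the envelope
    # always carries the producer's trace_id and span_id.
    return {"type": event_type, "payload": payload,
            "trace_id": producer.trace_id, "span_id": producer.span_id}

def consume(envelope: dict) -> Span:
    # Consumer side: a new span under the same trace_id, so the logical
    # operation stays walkable across contexts.
    return Span(trace_id=envelope["trace_id"], span_id=uuid.uuid4().hex)

verdict_span = start_root_span()                       # platform, verdict phase
event = publish("scan.convergence.delta", {"scan_id": "c0ffee"}, verdict_span)
worlds_span = consume(event)                           # worlds, distill phase
assert worlds_span.trace_id == verdict_span.trace_id   # one trace, both contexts
```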
The guarantee
If a scan produces a verdict that causes a world-model rule-candidate proposal, the operator looking at the proposal can walk back through:
RuleCandidate (worlds, distill phase) ← scan.convergence.delta event (carries trace context) ← verdict phase span (platform) ← scan phase span (platform) ← HTTP scan request (API) ← customer ingestion (API) — with one trace_id threading the entire path. No guessing, no manual correlation.
What breaks if we lose propagation
- Incident forensics degrades to database archaeology (looking up rows by timestamps and hoping they correlate).
- Cost-attribution across worlds and platform becomes impossible (an LLM call in the evaluate phase cannot be attributed to the scan that originated it).
- The “why did this candidate appear?” question has no mechanical answer — operators end up guessing from timing.
The rule is non-optional. An event published without a trace context is a bug.
Tooling realization
The principles above are in force from commit one. The concrete tooling that realizes them — vendor inventory, export destinations, per-stream redaction, retention defaults, alert rules — lives at Observability Stack per ADR-036. This page is the doctrine; that page is the inventory. Principles do not change between the two; only the runtime destination of each signal does.
What this page does NOT cover
- Tool choices. Specific vendors (Grafana Cloud / Pydantic Logfire / Sentry) and the alpha-posture matrix live in Observability stack per ADR-036.
- Retention policies. Per-class retention rules live in Data retention per ADR-042.
- Alerting discipline. Who gets paged, for what, and through which channel — captured in the operational runbooks (docs/runbooks/) alongside on-call rotation.
- Customer-visible observability. Dashboards the customer sees are part of the product surface, not the operational observability plane. See Optimization Engine for customer-visible verdict and scoring surfaces.
Related reading
- Architecture — the three-context topology these principles apply across
- Event System — the event shape that carries trace context between contexts
- Access Control — role-and-scope model (who can see what)
- Testing — per-layer strategy, including integration tests that cross between contexts
- Observability stack — vendors, alpha posture, and content-class taxonomy per ADR-036
- Data retention — per-class retention policies per ADR-042