Observability Principles

This page establishes the observability principles Spectral upholds regardless of which specific stack we choose. These principles are load-bearing: they fix the data-classification boundary, the minimum shape of the telemetry that every LLM call and every decision-execution phase emits, and traceability across contexts.


The first-class boundary — customer data vs operational data

Spectral handles two categorically different classes of data. Commingling them is an architectural violation, not a policy preference.

Customer data

What it is. Domain-scoped decision records and audit-chain entries (per /decide invocation), override-pattern signals from customer-flagged decisions, System Card snapshots, and conversation content between the customer and the post-release World Agent customer-mode chat affordance per ADR-081 D5.

Where it lives. Customer-owned schemas under Supabase RLS. Every row carries org_id and domain_id per ADR-086 D6 (see Architecture — Multi-tenancy isolation). A customer querying with their JWT never sees another org/domain’s data because Postgres itself enforces the boundary.

Who can see it. The owning domain’s authenticated users, plus Spectral operators holding the operations org role (Spectral staff — used for customer support, incident response, and the operational surfaces in the Operations app).

Retention. Tied to the customer’s agreement. Deletion propagates on domain retirement.

Operational data

What it is. Spectral’s own telemetry: structured logs, internal OTEL spans (API request traces, worker-job traces, decision-execution spans), World Agent reasoning traces (internal LangGraph state), LLM call metadata.

Where it lives. Platform-owned stores: Grafana Cloud LGTM (logs/metrics/traces), Pydantic Logfire (LLM-call telemetry), and Sentry (error capture), per ADR-036. The vendors may change; what is fixed is that operational data never lives in customer-scoped tables. Operational storage does not use domain_id as a primary organizing key: it may carry the field as a label for correlation, but the storage does not enforce domain isolation.

Who can see it. Spectral engineering and operations. Customers do not see operational data.

Retention. Platform-owned policy per ADR-042. Shorter than customer-data retention in most cases — operational telemetry does not need to outlive incident-investigation windows.

The decision API surface (POST /decide, API-key authenticated with decide:domain scope per ADR-086 D4, plus the Customer Dashboard’s read routes) receives and produces customer data only. Operational telemetry goes to its own backend, distinct from customer-data storage and vendor-swappable via OTEL_EXPORTER_OTLP_ENDPOINT per ADR-036 (see Observability stack for the vendor inventory). Specifically:

  • Application log statements never include customer PII (decision-context content, audit-chain payloads, private conversation text). Org/domain IDs and decision IDs are fine; their contents are not.
  • Operational spans that cross customer-data surfaces carry references (decision IDs, audit-chain entry IDs, override-pattern signal IDs) rather than copying the customer data into the operational record.
  • If debugging requires joining operational telemetry with customer data, the join happens at query time against the customer-data store — never by copying customer data into the operational store.

This boundary is currently enforced by convention and code review. A structured-linter rule (forbidden payload shapes in log.info(...) calls) is a future hardening item.
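To make the hardening idea concrete, here is a minimal sketch of what such a linter check might look like, using only the standard-library ast module. The forbidden keyword names and the function name are hypothetical placeholders, not Spectral's actual payload keys or tooling; a real rule would need an agreed list and would likely also inspect dict literals and f-strings.

```python
import ast

# Hypothetical names for keyword arguments that would carry customer content.
FORBIDDEN_KWARGS = {"decision_context", "audit_payload", "conversation_text"}
# Method names treated as log calls (log.info(...), logger.error(...), ...).
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def find_forbidden_log_kwargs(source: str) -> list[tuple[int, str]]:
    """Return (line number, keyword) pairs for log calls passing forbidden keywords."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only attribute calls such as log.info(...) are inspected.
        if not (isinstance(func, ast.Attribute) and func.attr in LOG_METHODS):
            continue
        for kw in node.keywords:
            # kw.arg is None for **kwargs expansion, which never matches.
            if kw.arg in FORBIDDEN_KWARGS:
                violations.append((node.lineno, kw.arg))
    return sorted(violations)
```

Run against a file's source text, a check like this flags `log.info("decided", decision_context=ctx)` while leaving `log.info("decided", decision_id=d)` alone, which mirrors the rule that IDs are fine and contents are not.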


All application logs flow through structlog configured to emit JSON. Every log record carries the canonical fields below when the context applies; fields are omitted rather than set to null.

| Field | Type | When present | Purpose |
| --- | --- | --- | --- |
| domain_id | UUID | Any log in a request / job that has resolved a domain | Multi-tenant correlation. Indexed in the operational store. |
| org_id | UUID | Same as domain_id | Org-level rollup correlation. |
| decision_id | UUID | Any log emitted inside /decide execution | Decision-scoped correlation. |
| world_model_version | string | Any log that cites a specific world-model version (decision response, audit-chain entry) | Authority correlation: lets you slice telemetry by world-model version. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Every log | Which package emitted the log. Critical for context-aware filtering. |
| phase | enum(auth, module_load, context_establish, predicate_eval, aggregate, distill, evolve, publish, agent, …) | Logs inside a named pipeline phase | Phase-level slice of decision-execution and world-model activity. |
| trace_id | string | Every log emitted within an OTEL span context | Correlates logs to spans (see Trace context propagation). |

Free-form message fields are allowed, but they are secondary. Dashboards, alerts, and post-incident queries key on the canonical fields above. A log that only carries a free-form message is debuggable by a human but not by the observability pipeline.

Emitter discipline. Each context constructs a pre-bound logger at composition time (bc=worlds or bc=platform), so downstream call sites inherit the field automatically. Adding a new context-aware field is a composition-root change, not a per-call-site change.
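The pre-bound logger and the omit-rather-than-null rule can be sketched with the standard library alone. The BoundLogger class below is a hypothetical stand-in for a structlog bound logger (whose own bind() method plays the same role), and drop_none_fields follows structlog's processor signature (logger, method_name, event_dict); none of this is Spectral's actual composition-root code.

```python
import json
from typing import Any


def drop_none_fields(logger: Any, method_name: str, event_dict: dict) -> dict:
    """structlog-style processor: omit fields whose value is None instead of emitting null."""
    return {key: value for key, value in event_dict.items() if value is not None}


class BoundLogger:
    """Hypothetical stand-in for a structlog bound logger."""

    def __init__(self, **bound: Any) -> None:
        self._bound = dict(bound)

    def bind(self, **fields: Any) -> "BoundLogger":
        # Returns a new logger; the parent's bound fields are left untouched.
        return BoundLogger(**{**self._bound, **fields})

    def render(self, event: str, **fields: Any) -> str:
        # Bound fields first, then per-call fields, then the message under "event".
        record = {**self._bound, **fields, "event": event}
        return json.dumps(drop_none_fields(self, "info", record), sort_keys=True)


# Composition root: bind the context once; call sites inherit bc automatically.
worlds_log = BoundLogger(bc="worlds")
line = worlds_log.bind(phase="distill").render(
    "candidate proposed", domain_id=None, trace_id="abc"
)
```

Here `line` carries bc, phase, trace_id and event, and has no domain_id key at all, since no domain was resolved.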


Every LLM call emits a structured telemetry record. This is the minimum shape — the specific destination is Pydantic Logfire per ADR-036, but the shape itself is invariant across substrate choices.

| Field | Type | Notes |
| --- | --- | --- |
| model | string | Provider + model identifier (anthropic/claude-opus-4-7, openai/gpt-5.2, google/gemini-2.5-flash). Exact strings; never aliased. |
| org_id | UUID or null | Customer-tenancy field per ADR-033 + ADR-086 D1. Nullable for OPERATIONS-only calls. |
| domain_id | UUID or null | Present when the call is attributable to a specific customer domain. Null for OPERATIONS-only calls (e.g., World Agent internal reasoning). |
| input_tokens | int | Prompt-token count. |
| output_tokens | int | Completion-token count. |
| latency_ms | int | Wall-clock latency of the call. |
| cost_usd | decimal | Dollar cost of the call, computed from provider pricing at call time (per ADR-035 D6 via genai-prices with fallback registry). |
| purpose | enum | The 8-value PurposeKey taxonomy below. Required. |
| content_class | enum(PLATFORM, OPERATIONS, SYNTHETIC) | The content classification driving Stream A redaction. Required. |
| bc | enum(worlds, platform, core, app:api, app:workers, app:operations) | Which context initiated the call. |
| decision_id | UUID or null | Present when the call is inside a /decide execution. |
| trace_id | string | OTEL trace context for correlation with the enclosing operation. |
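As an illustration of the minimum shape, the record could be modelled as a frozen dataclass that rejects values outside the closed enums. This is a sketch under assumptions: the class name LLMCallRecord is hypothetical, UUIDs are held as plain strings for brevity, and the validation rules are only the ones the table states directly.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

PURPOSES = frozenset({
    "code_generation", "applies_when_generation", "distillation", "reasoning",
    "agent_turn", "agent_tool", "world_agent", "embedding",
})
CONTENT_CLASSES = frozenset({"PLATFORM", "OPERATIONS", "SYNTHETIC"})
CONTEXTS = frozenset({"worlds", "platform", "core", "app:api", "app:workers", "app:operations"})


@dataclass(frozen=True)
class LLMCallRecord:  # hypothetical name, not Spectral's actual type
    model: str                  # exact provider/model string, never aliased
    org_id: Optional[str]       # None for OPERATIONS-only calls
    domain_id: Optional[str]    # None when not attributable to a customer domain
    input_tokens: int
    output_tokens: int
    latency_ms: int
    cost_usd: Decimal
    purpose: str
    content_class: str
    bc: str
    decision_id: Optional[str]  # set only inside a /decide execution
    trace_id: str

    def __post_init__(self) -> None:
        if self.purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {self.purpose!r}")
        if self.content_class not in CONTENT_CLASSES:
            raise ValueError(f"unknown content_class: {self.content_class!r}")
        if self.bc not in CONTEXTS:
            raise ValueError(f"unknown bc: {self.bc!r}")
        if min(self.input_tokens, self.output_tokens, self.latency_ms) < 0:
            raise ValueError("token counts and latency must be non-negative")
        if not self.model or not self.trace_id:
            raise ValueError("model and trace_id are required")
```

Validating at construction time means a call site cannot emit a record with an invented purpose or a missing trace_id; the error surfaces where the record is built rather than in a dashboard query later.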

purpose is a closed enum, spectral.core.llm.purposes.PurposeKey (per ADR-035 D3). If a new purpose is needed, it gets added here, not invented at the call site.

| Value | Meaning |
| --- | --- |
| code_generation | World Agent generates predicate code from natural-language rules (highest-capability tier). |
| applies_when_generation | World Agent generates the optional context-only filter alongside a predicate. |
| distillation | Operator-driven distillation runs against source materials. |
| reasoning | Diagnosis, coverage reflection, restatement drafting. |
| agent_turn | World Agent chat surface conversational turn. |
| agent_tool | Agent tool-invocation call (may differ from turn). |
| world_agent | World Agent exploration / hypothesis. |
| embedding | Embedding generation (full policy in embeddings). |

spectral.core.llm.content_class.ContentClass is a closed 3-value enum (per ADR-036 D6):

| Value | Meaning |
| --- | --- |
| PLATFORM | Customer content processed or generated in platform (decision-context inputs, audit-chain entries, post-release World Agent customer-mode chat content). |
| OPERATIONS | Spectral-operated reasoning (World Agent, internal distillation). |
| SYNTHETIC | Test-agent-generated synthetic content (no customer PII). |

The content class is resolved at the composition root, never at per-call developer discretion:

  • world_agent (operator mode) → always OPERATIONS
  • world_agent (customer mode, post-release per ADR-081 D5) → always PLATFORM
  • code_generation / applies_when_generation / distillation / reasoning → always OPERATIONS (authoring-time)
  • agent_turn (World Agent operator mode) → always OPERATIONS
  • agent_turn (World Agent customer-mode chat) → always PLATFORM
  • agent_tool → inherits from parent span’s content_class
  • embedding → caller-determined
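The mapping above could be expressed as a single resolver function. This is a sketch, not the actual spectral.core code: the enum member spellings, the function name, and the customer_mode, parent and caller_choice parameters are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Optional


class PurposeKey(str, Enum):
    CODE_GENERATION = "code_generation"
    APPLIES_WHEN_GENERATION = "applies_when_generation"
    DISTILLATION = "distillation"
    REASONING = "reasoning"
    AGENT_TURN = "agent_turn"
    AGENT_TOOL = "agent_tool"
    WORLD_AGENT = "world_agent"
    EMBEDDING = "embedding"


class ContentClass(str, Enum):
    PLATFORM = "PLATFORM"
    OPERATIONS = "OPERATIONS"
    SYNTHETIC = "SYNTHETIC"


# Authoring-time purposes are always OPERATIONS.
_AUTHORING = frozenset({
    PurposeKey.CODE_GENERATION, PurposeKey.APPLIES_WHEN_GENERATION,
    PurposeKey.DISTILLATION, PurposeKey.REASONING,
})


def resolve_content_class(
    purpose: PurposeKey,
    *,
    customer_mode: bool = False,                   # hypothetical flag: customer-mode chat
    parent: Optional[ContentClass] = None,         # parent span's class, for agent_tool
    caller_choice: Optional[ContentClass] = None,  # explicit class, for embedding
) -> ContentClass:
    if purpose in _AUTHORING:
        return ContentClass.OPERATIONS
    if purpose in (PurposeKey.WORLD_AGENT, PurposeKey.AGENT_TURN):
        return ContentClass.PLATFORM if customer_mode else ContentClass.OPERATIONS
    if purpose is PurposeKey.AGENT_TOOL:
        if parent is None:
            raise ValueError("agent_tool needs the parent span's content_class")
        return parent
    if purpose is PurposeKey.EMBEDDING:
        if caller_choice is None:
            raise ValueError("embedding calls must state their content_class")
        return caller_choice
    raise ValueError(f"unmapped purpose: {purpose!r}")
```

Raising when agent_tool has no parent class or embedding has no caller choice is a design choice of this sketch: it prefers a loud failure over silently defaulting a call to the less restrictive class.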

For PLATFORM-class calls, prompt / completion / tool-arg fields are stripped before export to third-party observability (Logfire, Sentry). See observability stack for the three-stream architecture and the two-layer enforcement.
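A minimal sketch of the stripping step, assuming telemetry records are plain dicts and that the sensitive fields are keyed prompt, completion and tool_args (hypothetical key names). The sketch fails closed: anything not explicitly classed OPERATIONS or SYNTHETIC is stripped, which is an assumption layered on top of the rule stated above rather than a documented behaviour.

```python
# Hypothetical key names for the fields that carry content.
CONTENT_FIELDS = frozenset({"prompt", "completion", "tool_args"})
# Classes whose content may leave the platform unredacted.
EXPORTABLE_CLASSES = frozenset({"OPERATIONS", "SYNTHETIC"})


def redact_for_export(record: dict) -> dict:
    """Return a copy of the record that is safe to send to third-party observability."""
    if record.get("content_class") in EXPORTABLE_CLASSES:
        return dict(record)
    # PLATFORM, or a missing/unknown class: drop content, keep metadata.
    return {key: value for key, value in record.items() if key not in CONTENT_FIELDS}
```

Metadata such as model, token counts, purpose and trace_id survives redaction, so cost and latency dashboards keep working for PLATFORM-class calls.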

Why this matters. Cost attribution, rate-limit investigation, model-choice decisions, and provider-drift detection all key on purpose. A generic “LLM call count” dashboard is not actionable; a dashboard sliced by purpose × model × bc × content_class is.
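To illustrate the slicing, the following sketch totals cost over the four dimensions named above. It assumes records are dicts keyed by the telemetry field names; the function name is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_slice(records):
    """Total cost_usd per (purpose, model, bc, content_class) slice."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for record in records:
        key = (record["purpose"], record["model"], record["bc"], record["content_class"])
        # Go through str() so float inputs do not drag binary rounding into the sum.
        totals[key] += Decimal(str(record["cost_usd"]))
    return dict(totals)
```

Grouping on the full tuple keeps an expensive purpose on one model visible instead of averaging it into a global call count.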


An override-pattern signal aggregation in spectral.platform emits an override_pattern_signal.aggregated event, a domain event consumed by spectral.worlds. The World Agent then reasons over the event and potentially proposes rule candidates. The entire path must be traceable as one logical operation.

W3C Trace Context (traceparent / tracestate) — or an equivalent propagation envelope — is carried across every event between contexts. The consumer side of an event opens a new span as a child of the producer’s span. One logical operation can be walked start-to-finish even when it spans worlds and platform, multiple workers, and multiple LLM calls.

  • HTTP requests — standard OTEL HTTP propagation. Incoming traceparent is honoured; if absent, a new root span is started.
  • Domain events — events typed in spectral.core carry an envelope field that includes the producer’s trace_id and span_id (the propagation shape itself is a spectral.core contract; changes to it follow spectral.core governance per ADR-065).
  • Agent tool calls — tool invocations inherit the agent’s span context; LLM calls made by tools inherit further still.
  • Worker-job dispatch — the AgentTask row carries the trace context; the worker picks it up and opens child spans.
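The HTTP and domain-event rules above can be sketched without any OTEL SDK. The traceparent layout (version, 32-hex trace-id, 16-hex parent-id, flags) follows the W3C Trace Context format; everything else is an assumption for illustration: the SpanContext type, the function names, the lowercase traceparent header key, and the trace_id / span_id envelope keys. In practice the OTEL propagators would do this work.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional

_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                         # 32 lowercase hex characters
    span_id: str                          # 16 lowercase hex characters
    parent_span_id: Optional[str] = None  # None for a root span
    flags: str = "01"

    def traceparent(self) -> str:
        return f"00-{self.trace_id}-{self.span_id}-{self.flags}"


def parse_traceparent(header: str) -> SpanContext:
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero ids are invalid in W3C Trace Context.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {header!r}")
    return SpanContext(trace_id, span_id, flags=flags)


def child_of(parent: SpanContext) -> SpanContext:
    """New span in the same trace, parented on the given span."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id, parent.flags)


def span_for_http_request(headers: Mapping[str, str]) -> SpanContext:
    # Assumes header names were already lowercased by the web framework.
    header = headers.get("traceparent")
    if header is None:
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8))  # new root span
    return child_of(parse_traceparent(header))


def span_for_event(envelope: Mapping[str, str]) -> SpanContext:
    trace_id, span_id = envelope.get("trace_id"), envelope.get("span_id")
    if not trace_id or not span_id:
        raise ValueError("event published without trace context")
    return child_of(SpanContext(trace_id, span_id))
```

The consumer never mints a new trace_id for an event; it only mints a span_id, which is what keeps one logical operation walkable across contexts.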

If an override-pattern signal aggregation causes a world-model rule-candidate proposal, the operator looking at the proposal can walk back through:

RuleCandidate (worlds, distill phase)
← override_pattern_signal.aggregated event (carries trace context)
← override-pattern signal record (platform, decision-flagging path)
← /decide invocation span (platform)
← HTTP /decide request (API)

One trace_id threads the entire path. No guessing, no manual correlation.
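Mechanically, the walk-back is a parent-pointer traversal over the spans that share a trace_id. The sketch below assumes spans have already been fetched for one trace as dicts with span_id, parent_span_id and name keys; those key names and the function name are hypothetical.

```python
def walk_back(spans, leaf_span_id):
    """Return span names from the given span back to the root of its trace."""
    by_id = {span["span_id"]: span for span in spans}
    path, seen = [], set()
    current = by_id.get(leaf_span_id)
    while current is not None:
        if current["span_id"] in seen:
            raise ValueError("cycle in span parent links")
        seen.add(current["span_id"])
        path.append(current["name"])
        # A root span has no parent, which ends the walk.
        current = by_id.get(current.get("parent_span_id"))
    return path


spans = [
    {"span_id": "a", "parent_span_id": None, "name": "HTTP /decide request"},
    {"span_id": "b", "parent_span_id": "a", "name": "/decide invocation"},
    {"span_id": "c", "parent_span_id": "b", "name": "override-pattern signal record"},
    {"span_id": "d", "parent_span_id": "c", "name": "override_pattern_signal.aggregated"},
    {"span_id": "e", "parent_span_id": "d", "name": "RuleCandidate"},
]
path = walk_back(spans, "e")
```

With the five illustrative spans above, `path` starts at RuleCandidate and ends at the HTTP /decide request, matching the order of the chain shown earlier.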

Without end-to-end trace propagation:

  • Incident forensics degrades to database archaeology (looking up rows by timestamps and hoping they correlate).
  • Cost-attribution across worlds and platform becomes impossible (an LLM call during authoring-time work cannot be attributed to the decision activity that motivated it).
  • The “why did this candidate appear?” question has no mechanical answer — operators end up guessing from timing.

The rule is non-optional. An event published without a trace context is a bug.


The principles above are in force from commit one. The concrete tooling that realizes them — vendor inventory, export destinations, per-stream redaction, retention defaults, alert rules — lives at Observability Stack per ADR-036. This page is the doctrine; that page is the inventory. Principles do not change between the two; only the runtime destination of each signal does.


Out of scope for this page:

  • Tool choices. Specific vendors (Grafana Cloud / Pydantic Logfire / Sentry) and the posture matrix live in Observability stack per ADR-036.
  • Retention policies. Per-class retention rules live in Data retention per ADR-042.
  • Alerting discipline. Who gets paged, for what, and through which channel — captured in the operational runbooks (docs/runbooks/) alongside on-call rotation.
  • Customer-visible observability. Dashboards the customer sees are part of the product surface, not the operational observability plane. See Customer Dashboard and System Card for the customer-facing decision and operational-record surfaces.

Related pages:

  • Architecture — the three-context topology these principles apply across
  • Event System — the event shape that carries trace context between contexts
  • Access Control — role-and-scope model (who can see what)
  • Testing — per-layer strategy, including integration tests that cross between contexts
  • Observability stack — vendors, posture, and content-class taxonomy per ADR-036
  • Data retention — per-class retention policies per ADR-042