Memory System

Spectral’s optimization agents draw on a hierarchical knowledgebase that is autonomously defined and managed. Knowledge flows upward through compounding — each tier boundary filters, generalizes, and optionally sanitizes observations before promotion.

This system is what enables Spectral to get smarter over time: individual optimization attempts produce observations, the best of those observations become workspace memory, and the best workspace memory routes as world signal events to the World Model system, where it compounds across the entire customer base in a given domain.

The universal lifecycle vocabulary across every Spectral agent is interaction / session / persistent per ADR-058 D1 — tiers carry durability, not domain. Each agent parameterizes the universal lifecycle with its own anchor entity and its own session-end rules; the agent-domain event scopes that accumulate observations within a session are not tiers.

The Spectral Agent’s parameterization is:

  • Interaction tier → cycle scope — one optimization attempt within a run.
  • Session tier → run scope — all cycles grouped in a single optimization run.
  • Persistent tier → workspace scope — cross-run observations specific to one workspace.

The Tier 1 / Tier 2 / Tier 3 numbering below is the Spectral Agent’s parameterization of the universal lifecycle, not a separate tier model. Cycle and run are scan-event scopes — agent-domain event scopes within which observations accumulate — not memory tiers in their own right. See agent memory primitives for the cross-agent template; the World Agent and Operations Agent parameterize the same universal lifecycle with different anchors and session-end rules.
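The parameterization above can be sketched as plain data — a minimal illustration of how one agent binds the universal lifecycle to its own scopes (the dictionary name and field names here are hypothetical, not the platform schema):

```python
# Hypothetical sketch: the Spectral Agent's binding of the universal
# lifecycle tiers (interaction / session / persistent) to its own scopes.
SPECTRAL_AGENT_TIERS = {
    "interaction": {"scope": "cycle", "shorthand": "T1",
                    "description": "one optimization attempt within a run"},
    "session":     {"scope": "run", "shorthand": "T2",
                    "description": "all cycles in a single optimization run"},
    "persistent":  {"scope": "workspace", "shorthand": "T3",
                    "description": "cross-run observations for one workspace"},
}
```

Other agents (World Agent, Operations Agent) would bind the same three tier keys to different scopes; the tier names are universal, the scope values are per-agent.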

Interaction tier (cycle scope) — T1 in Spectral Agent shorthand

Scope: A single optimization attempt within a run. A cycle is one optimization attempt that may be accepted or rejected by the evaluator. The cycle is the scan-event scope; the memory tier is interaction.

Lifecycle: Created during the cycle. At cycle completion, a compounding step evaluates whether any observations warrant promotion to Tier 2. All unpromoted observations are dropped.

Example: “Lowering temperature to 0.2 eliminated fabrication in Agent A’s output for this specific case pattern.”

Session tier (run scope) — T2 in Spectral Agent shorthand

Scope: All optimization cycles grouped in a single optimization run, initiated automatically or by the user. The run is the scan-event scope; the memory tier is session.

Lifecycle: Observations promoted from Tier 1 accumulate here for the duration of the run. At run completion — whether a Change Set is proposed or a validated Change Set is returned — compounding evaluates promotion to Tier 3. All unpromoted observations are dropped.

Example: “Across 4 cycles, prompt rewrites targeting context assembly consistently outperformed hyperparameter-only changes for this agent team’s failure patterns.”

Persistent tier (workspace scope) — T3 in Spectral Agent shorthand

Scope: Cross-run observations specific to a single workspace. Persists across optimization runs.

Lifecycle: Observations promoted from Tier 2 persist here. Periodic compounding evaluates observations for promotion. Two classes of observation exist at Tier 3:

  1. Domain observations — behavioral patterns with relevance to the problem domain: agent failure modes in specific domain contexts, behavioral responses to domain-specific inputs, hallucination patterns tied to domain content. These are candidates for promotion as world signal events to the World Model system. Promotion requires passing the conformity gate (see World Model System / Evolution Loop).
  2. Workspace-specific observations — prompt structural patterns, hyperparameter sensitivities, team topology observations specific to how this workspace is configured. These have no domain generalization value. They stay at Tier 3, subject to decay. They never route to the World Model.

The critical distinction: if an observation is about how this workspace’s agents are configured and interact, it stays at Tier 3. If it is about how agents behave in a domain context, it is a domain observation and a candidate for world signal routing.
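The distinction can be sketched as a two-way branch — a hypothetical classifier (the function name and the `concerns` field are illustrative, not the real gateway API):

```python
def classify_t3(observation: dict) -> str:
    """Hypothetical two-way classification at the persistent tier.

    Returns "workspace" if the observation concerns how this workspace's
    agents are configured and interact, or "domain" if it concerns how
    agents behave in a domain context (a world-signal candidate).
    """
    concerns = observation.get("concerns")
    if concerns == "workspace_configuration":
        return "workspace"   # stays at Tier 3, subject to decay
    if concerns == "domain_behavior":
        return "domain"      # candidate for world signal routing
    raise ValueError("observation must declare what it concerns")
```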

Sanitization gate: Domain observations routed as world signal events must be expressible without workspace-specific details. The workspace context is stripped; the domain behavioral pattern is what routes. If sanitization would destroy the value of the observation, it stays at Tier 3.

Example staying at T3: “Agent B in this workspace performs better when Agent A’s output is truncated to 200 tokens before injection — Agent B’s context window fills with upstream verbosity and its instruction-following degrades.” This is workspace topology knowledge with no domain generalization value.

Example routing as world signal event: “In healthcare agent teams handling medication reconciliation, hallucination rate increases significantly when upstream context omits the patient’s current medication list.” This is a domain behavioral observation.

The compounding flywheel works through the following sequence:

  1. Optimization runs produce observations that promote through T1 → T2 → T3.
  2. Domain observations at T3 that clear the conformity gate route as world signal events to the World Model system.
  3. World signal events feed the evolution loop — enshrined observations strengthen and evolve world model rules.
  4. Stronger world model rules produce better EvaluationFrameworks for every workspace in that domain.
  5. Better frameworks produce better optimization, which produces better observations.

The flywheel compounds across the entire customer base in a domain, not just within a single account. Every workspace’s domain observations contribute — after sanitization and conformity checking — to a world model that benefits every customer operating in that domain.

The flywheel concentrates signal into the world model and amplifies signal out of it: one enshrined rule benefits every customer in the domain, so operator decisions scale sub-linearly with customer count.

The bottleneck is real nonetheless. The Operations Agent is the operator’s leverage point against scale — cross-customer deduplication of failure patterns, classification by world-model coverage zone, prioritization by impact and confidence — so the operator surface is “decide on the frontier of new patterns” rather than “process every customer’s queue serially.” Mechanical pre-filters (the conformity gate) and batch curation surfaces compound the leverage. The operator team itself scales alongside customer count; alpha is single-operator dogfooding, the trajectory is a small dedicated curation team.

Empirical thresholds will sharpen as Spectral sees customer fan-out beyond design partners.

Each observation uses a structured envelope around a natural language body:

Structured envelope (enables retrieval and deduplication):

  • Agent type tags
  • Problem category
  • Optimization strategy applied
  • Tier metadata (origin tier, promotion history)

Natural language body (carries the actual insight):

  • The observation itself
  • Context that informed it
  • Conditions under which it applies

The structure enables efficient, relevance-based retrieval. The natural language carries insights that are difficult to formalize into rigid schemas.
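The envelope-plus-body shape above might be modeled as a dataclass — an illustrative sketch, not the canonical schema (field names are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    # Structured envelope — enables retrieval and deduplication.
    agent_types: list[str]
    problem_category: str
    strategy: str
    origin_tier: str
    promotion_history: list[str] = field(default_factory=list)
    # Natural language body — carries the actual insight.
    body: str = ""
    context: str = ""
    conditions: str = ""
```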

Compounding is the evaluation process that runs at each tier boundary. It performs two functions:

  1. Filtering — is this observation valuable enough to keep?
  2. Generalizing — can it be stated in a way that’s useful at the broader scope?

At the Tier 3 boundary, compounding performs a two-stream evaluation:

  1. Domain observation? → evaluate for world signal event routing through the conformity gate.
  2. Workspace-specific? → evaluate for decay-exempt status or continued retention at Tier 3.

Unpromoted observations at Tier 1 and Tier 2 are dropped after compounding. This prevents unbounded memory growth while ensuring that valuable learnings are preserved at the appropriate scope.
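Filtering, generalizing, and deduplicating at a tier boundary can be sketched in a few lines — a hypothetical `compound` function where the predicates and key function are supplied by the caller (all names are illustrative):

```python
def compound(observations, target_tier, is_valuable, generalize, key):
    """Hypothetical tier-boundary compounding.

    Valuable observations are generalized and promoted; duplicates of
    existing memories at the target tier are skipped; everything else
    is dropped.
    """
    existing = {key(m) for m in target_tier}
    promoted = []
    for obs in observations:
        if not is_valuable(obs):
            continue            # filtered: dropped at the boundary
        gen = generalize(obs)
        if key(gen) in existing:
            continue            # deduplicated against the target tier
        promoted.append(gen)
        existing.add(key(gen))
    return promoted
```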

Deduplication runs against existing memories at the target tier to prevent redundant observations from accumulating.

Two compounding evaluations may target the same persistent-tier slot when concurrent flows reach the same tier boundary in overlapping windows. The invariant is that the resulting memory state is deterministic under concurrent races — never a torn write, never a duplicated promotion, never a lost decay-exemption flag. Memory write paths run under SERIALIZABLE transaction isolation, backstopped by a partial unique index on the target slot, so the second writer either observes the first writer’s commit (and short-circuits the duplicate) or fails the unique constraint and retries against the post-commit state. Tests reproduce the race directly in tests/platform/integration/memory/ rather than relying on time-of-day flakiness.
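The second-writer behavior can be sketched in isolation — a hypothetical retry loop where `slot_read` and `slot_write` stand in for the database operations, and `UniqueViolation` stands in for the constraint error (the real path runs inside a SERIALIZABLE transaction; this is only the control flow):

```python
class UniqueViolation(Exception):
    """Stand-in for the database's unique-constraint violation error."""

def promote_with_retry(slot_read, slot_write, observation, max_retries=3):
    """Hypothetical write path for a contended persistent-tier slot.

    The second writer either observes the first writer's commit and
    short-circuits the duplicate, or hits the partial unique index and
    retries against the post-commit state.
    """
    for _ in range(max_retries):
        current = slot_read()
        if current is not None:
            return current              # first writer committed; short-circuit
        try:
            return slot_write(observation)
        except UniqueViolation:
            continue                    # retry against post-commit state
    raise RuntimeError("promotion did not converge")
```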

Observations at Tier 3 are subject to time decay. Decay is not simply a relevance weighting mechanism — it is a forcing function for disposition. Every observation at Tier 3 is on a clock that demands: confirm this or let it go.

  • Tier 1 and Tier 2 are already ephemeral — observations are dropped at the end of their cycle or run. Decay is not applicable.
  • Tier 3 observations decay over time. The decay creates pressure toward one of three disposition outcomes:
    1. Route as world signal event — the observation is domain-relevant and clears the conformity gate, exiting the memory system for the World Model.
    2. Mark decay-exempt — the observation is workspace-specific and durably valuable at Tier 3, exempt from further decay.
    3. Archive — the observation meets neither bar and is archived at the end of its decay window.

The point of decay is not to gradually fade observations into irrelevance. It is to force continued evaluation toward a disposition decision — either the observation earns a durable outcome, or it is eventually archived.
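The disposition decision can be sketched as a single function — hypothetical names, with the gate and value checks passed in as predicates:

```python
def dispose(observation, clears_conformity_gate, durably_valuable, decay_expired):
    """Hypothetical forcing function toward one of three disposition outcomes
    (plus "pending" while the decay clock is still running)."""
    if observation["class"] == "domain" and clears_conformity_gate(observation):
        return "route_world_signal"   # exits memory for the World Model
    if observation["class"] == "workspace" and durably_valuable(observation):
        return "decay_exempt"         # durable at Tier 3, no further decay
    if decay_expired(observation):
        return "archive"              # decay window ended, no disposition earned
    return "pending"                  # the clock keeps ticking
```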

Decay operates through two separate mechanisms:

  • Continuous evaluation on retrieval: When the retrieval system queries memories, time decay is a factor in relevance scoring. Older, undisposed observations carry progressively less weight in retrieval results. This ensures that the optimization agent naturally favors fresher or confirmed knowledge.
  • Periodic cleanup: A separate periodic process identifies observations that have fully decayed — reached the end of their decay window without disposition. These observations are archived, not deleted — they remain available for auditability but are no longer returned by the retrieval system.

These are different concerns: retrieval weighting is real-time and affects optimization quality; cleanup is a maintenance operation that manages storage and keeps the active memory set clean.
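The retrieval-time weighting might look like the following — a sketch assuming exponential decay with an illustrative 30-day half-life and a linear confirmation boost (both parameters are assumptions, not documented values):

```python
def decay_weight(age_days: float, half_life_days: float = 30.0,
                 confirmations: int = 0) -> float:
    """Hypothetical retrieval-time decay score.

    Exponential decay by age, boosted by confirmation count, clamped to 1.0.
    Older, undisposed observations carry progressively less weight.
    """
    base = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    boost = 1.0 + 0.1 * confirmations           # illustrative boost factor
    return min(1.0, base * boost)
```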

Fully decayed observations are archived rather than deleted. Archived observations:

  • Are no longer returned by the retrieval system
  • Remain queryable for audit, analysis, and historical review
  • May inform future analysis of decay policy effectiveness

Individual observations can be marked as decay-exempt by either the compounding engine (autonomously, when confidence is high) or by a human (workspace owner or admin). Decay-exempt status means the observation has indefinite lifespan at Tier 3 without needing further confirmation.

Exemption should be set when a workspace-specific observation has been validated as durably true — for example, a pattern tied to this workspace’s agent architecture that will remain relevant as long as that architecture exists, but that has no domain generalization value and therefore will never route as a world signal event.

At the start of each optimization cycle, the optimization agent retrieves relevant memories from all applicable tiers:

  • Tier 1: Current cycle (initially empty)
  • Tier 2: Current run’s accumulated observations
  • Tier 3: This workspace’s persistent memory

Retrieval is relevance-based, using the structured envelope to match against the current optimization context (agent type, problem category, optimization strategy being considered). More specific tiers are weighted appropriately — a recent cycle observation about this exact agent type is more relevant than a broad cross-run pattern.

Retrieval uses three parallel signals merged via Reciprocal Rank Fusion (RRF):

  • Structured envelope matching — filter by agent type, problem category, optimization strategy. Fast, precise, catches exact pattern matches.
  • Semantic search via pgvector — embed query against memory embeddings. Catches relevant memories with different envelope tags but similar patterns.
  • Time decay scoring — exponential decay with confirmation/frequency boost.

RRF normalizes all signals to a common scale (k=60) without requiring tuned weights, with more specific tiers weighted higher. The pgvector extension is enabled natively in Supabase.

No single signal alone finds all relevant results — structured matching misses analogous patterns, and semantic search misses exact matches; RRF combines them naturally. At memory-system scale (hundreds to low thousands of memories per workspace), a brute-force pgvector scan is performant.
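Reciprocal Rank Fusion itself is compact enough to sketch directly — each ranked list contributes `1 / (k + rank)` per item, and the sums determine the merged order (this is the standard RRF formulation; the tier-weighting mentioned above is omitted for clarity):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists without tuned weights.

    An item's score is the sum of 1 / (k + rank) over every list that
    ranks it; items ranked highly by multiple signals rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

An item ranked first by both the envelope match and the semantic search outranks items that appear in only one list, which is exactly the behavior the merged retrieval needs.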

The World Agent maintains a separate memory system, architecturally independent of the optimization agent memory above. The full architecture is documented at World Agent — a single anchor (world_id), the universal three-tier lifecycle (interaction / session / persistent), semantic + procedural typology distribution at the persistent tier, contextual world_version provenance rather than version pinning, and the workshop framing (memory holds reasoning + references, never canonical rule content).

Workshop discipline at the tool → memory boundary

A cross-agent invariant captured in agent memory primitives: agent memory is a workshop, not a canonical-content cache. Tool-output paths that contain canonical content (rule body, scan trace, customer PII) do not round-trip into memory rows verbatim. Sanitization happens at the memory-write boundary; the repository gateway enforces typology-driven classification; the trigram trigger backstops doctrine drift. Tool calls that fetch canonical content for reasoning are fresh-read each time, not cached in memory.

Rule interpretation lives with the rule corpus and its tools. The World Agent’s memory has no promotion path to rule storage; rule candidates are proposed through the evolution loop.

Several spectral.platform entities — InterventionLog, RegressionRecord, FailureCluster, FeedbackSignal, RubricDivergenceRecord, and others — capture canonical optimization activity. They are not agent memory: per ADR-058 D14, agent memory carries reasoning + references and explicitly does not mirror canonical scan artifacts. Records are produced by scan operations and other system functions, not by agents. Records hold what the system did; memory holds what the agent reasoned about it.

The bridge between records and memory is provenance, not duplication. When an agent forms value-added conclusions about a record, the resulting observation may be written to memory tiers via the spectral_agent_memory gateway; the gateway stores a cross-reference back to the source record via (source_type, source_ref) in spectral_agent_memory_corroborations. The gateway rejects records-verbatim memory writes at the write boundary: memory rows whose body fields equal the source record’s fields with no value-added observation are not accepted. This is a specific instance of the workshop discipline at the tool → memory boundary (per agent-memory-primitives primitive 13: reference-only invariants).
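The write-boundary rejection can be sketched as a predicate — a hypothetical check (not the real gateway API) that refuses a memory body mirroring the source record with nothing added:

```python
def accept_memory_write(record: dict, body: dict) -> bool:
    """Hypothetical gateway check: reject records-verbatim memory writes.

    A body whose fields equal the source record's fields with no
    value-added observation is not accepted; any divergence or added
    reasoning field passes.
    """
    verbatim = all(body.get(k) == v for k, v in record.items())
    has_value_add = any(k not in record for k in body)
    return has_value_add or not verbatim
```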

For more on the records side and how it connects to memory, see:

  • Agent Architecture — the three-agent topology (Spectral Agent, World Agent, Operations Agent) and how each agent uses the memory tiers this page describes.
  • Optimization Engine — how T1 / T2 / T3 observations feed scan analysis and how RegressionRecord flows back from records into tournament avoidance signals.
  • Explainability — how persistent-tier reasoning surfaces to the customer via AgentPerformanceCard rationale.
  • Agent Memory Primitives — the canonical schema and gateway behavior backing the tiered memory described here.