Evolution Loop
The Evolution Loop is the governed workflow by which live agent failures become structural improvements to the World Model. It is not an automated process — human sign-off is required at the enshrinement gate. The loop turns the aggregate signal from millions of agent evaluation cycles into a continuously improving domain standard.
Signal Path
World signal events enter the loop from spectral.platform. These are the platform’s failure clusters and
promoted persistent-tier memory observations, routed back to worlds as input to rule evolution (see
the Compounding Flywheel section of Memory System for the upstream mechanism). When a persistent-tier
workspace memory observation crosses the domain observation boundary (that is, it is a domain-relevant pattern
that clears the sanitization gate), it is published
as a platform.t3_memory.written event (typed payload at
spectral.platform.contracts.events.t3_memory_written) and consumed by spectral.worlds.
Event semantics are at-least-once delivery with idempotent handlers. Ordering is best-effort. The loop is statistically robust — a missed signal is unfortunate, not catastrophic. The conformity gate and human sign-off are the integrity controls, not event ordering. This lets the communication between worlds and platform stay simple while preserving the guarantees that matter.
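A minimal sketch of an idempotent worlds-side handler, assuming each platform.t3_memory.written event carries a unique event id. The T3MemoryWritten fields and the ObservationAccumulator class are hypothetical stand-ins, not the real contract at spectral.platform.contracts.events.t3_memory_written. Deduplicating on the event id makes redelivery a no-op, and accumulating into sets makes the result independent of arrival order, which is why at-least-once delivery with best-effort ordering is enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T3MemoryWritten:
    # Hypothetical fields; the real typed payload lives in
    # spectral.platform.contracts.events.t3_memory_written.
    event_id: str
    workspace_id: str
    pattern_key: str


class ObservationAccumulator:
    """Hypothetical handler for at-least-once, unordered delivery."""

    def __init__(self) -> None:
        self._seen_event_ids: set[str] = set()
        # pattern_key -> workspaces that have reported the pattern
        self.observations: dict[str, set[str]] = {}

    def handle(self, event: T3MemoryWritten) -> bool:
        """Apply an event once; return False for a redelivered duplicate."""
        if event.event_id in self._seen_event_ids:
            return False
        self._seen_event_ids.add(event.event_id)
        # Set insertion is commutative, so arrival order does not change the result.
        self.observations.setdefault(event.pattern_key, set()).add(event.workspace_id)
        return True
```

Replaying an event, or delivering two events in reverse order, leaves observations unchanged.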
From Signal to Candidate
Incoming world signal events accumulate as observations within spectral.worlds. When an observation accumulates sufficient supporting evidence across multiple workspaces and evaluation contexts, it is promoted to Rule Candidate status and enters the review queue.
“Sufficient evidence” is not a fixed threshold. It is a governed judgment informed by the observation’s provenance tier, the number of independent supporting signals, and the World Agent’s assessment of domain relevance. A pattern corroborated by signals from many independent workspaces carries more weight than one corroborated by signals from a single customer’s agent runs.
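Because the judgment is governed rather than thresholded, code can at most assemble the inputs that the World Agent and reviewers weigh. A minimal sketch, assuming each supporting signal records its workspace, customer, and provenance tier (the WorldSignal type, the "t3" tier label, and the summarize_evidence helper are hypothetical): counting distinct workspaces and customers, rather than raw signals, keeps one customer’s repeated runs from looking like broad corroboration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldSignal:
    # Hypothetical shape of one supporting signal.
    workspace_id: str
    customer_id: str
    provenance_tier: str


def summarize_evidence(signals: list[WorldSignal]) -> dict:
    """Collect evidence inputs for a candidate; deliberately returns no verdict."""
    return {
        "signal_count": len(signals),
        # Independence is measured by distinct sources, not by signal volume.
        "independent_workspaces": len({s.workspace_id for s in signals}),
        "independent_customers": len({s.customer_id for s in signals}),
        "by_provenance_tier": dict(Counter(s.provenance_tier for s in signals)),
    }
```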
The World Agent’s Role
The World Agent monitors incoming signals, identifies emerging patterns, and proposes rule candidates for human review. It maintains discovery continuity across versions through its version-spanning memory: long-running domain threads that survive world-model version boundaries.
The World Agent proposes; it does not decide. Every promotion to Enshrined requires human sign-off. The World Agent’s function is to surface and structure — not to commit changes to the standard.
Conformity Gate
Before a rule candidate can be promoted to Provisional or Enshrined, it must pass the conformity gate: demonstrate that the proposed rule does not contradict the existing rule set, including other pending candidates. See Eval Generation for the full conformity gate specification.
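The full specification lives in Eval Generation. As an illustration only, the sketch below reduces a rule to a hypothetical (condition, behavior, required) triple and treats two rules as contradictory when they cover the same condition and behavior with opposite polarity; real contradiction detection is presumably richer. The sketch does show two properties stated here: pending candidates are checked alongside enshrined rules, and the result is a mechanical pass/fail record listing the conflicting rule ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Toy representation, not the real rule schema.
    rule_id: str
    condition: str   # situation the rule applies to
    behavior: str    # behavior it constrains
    required: bool   # True = "must", False = "must not"


@dataclass(frozen=True)
class ConformityResult:
    candidate_id: str
    passed: bool
    conflicts: tuple[str, ...]


def conformity_check(candidate: Rule, enshrined: list[Rule], pending: list[Rule]) -> ConformityResult:
    """Toy contradiction check against enshrined rules and other pending candidates."""
    others = [r for r in (*enshrined, *pending) if r.rule_id != candidate.rule_id]
    conflicts = tuple(
        r.rule_id
        for r in others
        if r.condition == candidate.condition
        and r.behavior == candidate.behavior
        and r.required != candidate.required
    )
    return ConformityResult(candidate.rule_id, passed=not conflicts, conflicts=conflicts)
```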
Human Sign-off
The enshrinement gate requires explicit human approval. The reviewer is not performing a consistency check; the conformity gate has already done that. The reviewer is exercising domain judgment: does this rule belong in this world model, and does it accurately represent what the domain requires?
This is a deliberate design choice. Automation handles consistency; humans handle meaning. Separating the two gates makes each one legible and auditable — the conformity gate produces a mechanical pass/fail record, and the enshrinement gate produces a documented human decision.
Reviewer surface
RuleCandidate records that pass the conformity gate enter the pending_approval state. The Operations app (apps/operations UI; apps/api/operator/* endpoints) lists them in the enshrinement review queue with full evidence: the rule statement and proposed provenance tier, the supporting signals that drove the proposal, a cross-workspace aggregation summary, the confidence score, the conformity gate result, a conflict diff against the existing rule set, and the World Agent’s reasoning summary. The reviewer cannot edit rule content from this surface; they can take exactly one of three actions per candidate:
- Approve: the candidate moves to enshrined. The World Agent emits worldmodel.observation.enshrined after the call returns.
- Reject: a rationale is required; it is captured and appended to the WorldModelCard evolution history for the next world-model version.
- Request revision: the candidate is returned to the World Agent with reviewer notes and cycles back through proposal.
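These reviewer actions amount to a small state machine. A minimal sketch, assuming string action names; only pending_approval and enshrined appear in the text, while the proposed and rejected labels are hypothetical and the Provisional tier is omitted. There is deliberately no edit transition, and a rejection without a rationale is refused.

```python
from enum import Enum
from typing import Optional


class CandidateState(Enum):
    PROPOSED = "proposed"                  # hypothetical label
    PENDING_APPROVAL = "pending_approval"  # passed the conformity gate
    ENSHRINED = "enshrined"
    REJECTED = "rejected"                  # hypothetical label


# (state, action) -> next state. No entry lets a reviewer edit rule content.
TRANSITIONS = {
    (CandidateState.PROPOSED, "conformity_passed"): CandidateState.PENDING_APPROVAL,
    (CandidateState.PENDING_APPROVAL, "approve"): CandidateState.ENSHRINED,
    (CandidateState.PENDING_APPROVAL, "reject"): CandidateState.REJECTED,
    # Revision sends the candidate back through proposal (and the conformity gate).
    (CandidateState.PENDING_APPROVAL, "request_revision"): CandidateState.PROPOSED,
}


def transition(state: CandidateState, action: str, rationale: Optional[str] = None) -> CandidateState:
    """Return the next state, or raise ValueError for a disallowed action."""
    if action == "reject" and not (rationale and rationale.strip()):
        raise ValueError("reject requires a rationale")
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {state.value}") from None
```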
Transport shape
Call flow, not events. The three reviewer actions invoke the World Agent’s enshrinement use
case handlers directly from apps/api/operator/*. The call site is the framework-layer seam,
not a typed Protocol indirection. This is a single-Python-consumer call flow consumed by apps/*,
which validator rule 7 places at Tier 1 of the inter-context mechanism ladder (see
ADR-070).
Use case handlers. The three actions ship as
worlds.application.enshrinement.{ApproveCandidate, RejectCandidate, RequestRevision}, invoked
from apps/api/operator/* per
ADR-047 D1.
Emit-after-return. The World Agent emits worldmodel.observation.enshrined after the call
returns — call-flow-driven notification, not event-driven promotion. The operator decision and
the consequent state change happen synchronously inside the call; the event is the downstream
broadcast for consumers (cards, audit) to react to. See
Contract Surfaces.
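A minimal sketch of the emit-after-return shape. ApproveCandidate here is a hypothetical stand-in for worlds.application.enshrinement.ApproveCandidate (its real constructor and call signature are not documented on this page), and WorldAgentEmitter is an invented in-memory substitute for the real event emitter. The point is ordering: the state change happens inside the synchronous call, and the event is emitted only if that call returns.

```python
from dataclasses import dataclass, field


@dataclass
class ApproveCandidate:
    """Hypothetical stand-in for the enshrinement use case handler."""
    states: dict  # candidate_id -> state label

    def __call__(self, candidate_id: str, operator_id: str) -> str:
        # operator_id would feed the ApprovalDecision record in a fuller sketch.
        if self.states.get(candidate_id) != "pending_approval":
            raise ValueError(f"{candidate_id} is not pending approval")
        self.states[candidate_id] = "enshrined"  # state change happens inside the call
        return candidate_id


@dataclass
class WorldAgentEmitter:
    """Invented in-memory emitter; records (topic, payload) pairs."""
    published: list = field(default_factory=list)

    def emit(self, topic: str, payload: dict) -> None:
        self.published.append((topic, payload))


def approve_endpoint(handler: ApproveCandidate, emitter: WorldAgentEmitter,
                     candidate_id: str, operator_id: str) -> dict:
    """Shape of an apps/api/operator/* route: direct call, then emit after return."""
    enshrined_id = handler(candidate_id, operator_id)  # raises before any emit on failure
    emitter.emit("worldmodel.observation.enshrined", {"candidate_id": enshrined_id})
    return {"status": "enshrined", "candidate_id": enshrined_id}
```

A failed call raises before the emit line, so consumers never see an enshrinement event for a decision that did not take effect.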
Approval persists an ApprovalDecision row keyed by (candidate_id, operator_id, decided_at), with the rationale captured verbatim. This is a record of operator activity, not agent memory. The bridge between an operator decision and any agent memory observation about it is provenance via (source_type, source_ref), as described in Memory System.
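A minimal sketch of the record and the provenance bridge, assuming frozen dataclasses. The (candidate_id, operator_id, decided_at) key and the (source_type, source_ref) pair come from the text; the "approval_decision" source_type value, the Provenance type, and the pipe-joined source_ref encoding are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ApprovalDecision:
    """Operator-activity record, not agent memory."""
    candidate_id: str
    operator_id: str
    decided_at: datetime
    rationale: str  # stored verbatim, never normalized

    @property
    def key(self) -> tuple[str, str, datetime]:
        return (self.candidate_id, self.operator_id, self.decided_at)


@dataclass(frozen=True)
class Provenance:
    # Hypothetical type for the (source_type, source_ref) pair.
    source_type: str
    source_ref: str


def provenance_for(decision: ApprovalDecision) -> Provenance:
    """Link a memory observation back to the decision without copying it into memory."""
    # Hypothetical encoding of the decision key as a single reference string.
    ref = "|".join((decision.candidate_id, decision.operator_id, decision.decided_at.isoformat()))
    return Provenance(source_type="approval_decision", source_ref=ref)
```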
Accumulate, then publish
Approval enshrines a single rule. It does not trigger a new world-model version. Rules accumulate in the enshrined set until an operator separately greenlights publication — see Versioning and Release.
Rule-Level Health Signal
Every rule in a published world model carries a rolling discriminative health signal that answers one question: is this rule still producing meaningful evaluation signal? Per ADR-028 (statistical uniqueness per scan), the signal is rule-scoped rather than EvalSet-scoped.
Definition
For each rule r, the health signal is computed over a rolling window of completed scans that
referenced r (via EvalResult.generating_rule_ref attribution):
| Metric | Meaning |
|---|---|
| score_variance[r] | Variance of per-sample scores across all samples referencing r in the window |
| score_mean[r] | Mean of those scores |
| sample_count[r] | Number of samples referencing r in the window |
| last_updated[r] | Most recent scan-completed event that contributed to the window |
The rolling window is rule-age-aware: for rules with fewer than N samples in history, the window is “all samples ever”; for established rules, the window is the last K samples (K chosen per world model, managed config).
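A minimal sketch of one rule’s rolling window, under stated assumptions: scores are floats, N and K are passed in as plain integers, and variance is the population variance (the text does not say which estimator is used). Retaining max(N, K) scores is enough for both regimes: every sample while the rule is young, the last K once it is established. RuleWindow and its methods are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import fmean, pvariance
from typing import Optional


@dataclass
class RuleWindow:
    """Hypothetical rolling, rule-age-aware score window for one rule r."""
    n_young: int   # N: below this many lifetime samples, use all samples ever
    k_window: int  # K: window length once the rule is established
    total_seen: int = field(default=0, init=False)
    last_updated: Optional[str] = field(default=None, init=False)
    _scores: deque = field(init=False, default_factory=deque)

    def __post_init__(self) -> None:
        # Holds every sample while young and at least the last K afterwards.
        self._scores = deque(maxlen=max(self.n_young, self.k_window))

    def add(self, score: float, scan_event_id: str) -> None:
        self._scores.append(score)
        self.total_seen += 1
        self.last_updated = scan_event_id

    def window(self) -> list:
        scores = list(self._scores)
        if self.total_seen < self.n_young:
            return scores                 # young rule: all samples ever
        return scores[-self.k_window:]    # established rule: last K samples

    def metrics(self) -> dict:
        w = self.window()
        return {
            "score_variance": pvariance(w) if w else 0.0,
            "score_mean": fmean(w) if w else None,
            "sample_count": len(w),
            "last_updated": self.last_updated,
        }
```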
Interpretation
The pair (score_variance[r], score_mean[r]) classifies the rule’s current health:
| Variance | Mean | Classification | Implication |
|---|---|---|---|
| High | Any | Discriminating | Rule is working as intended — it separates good configs from bad. |
| Low | High (≥ 0.9) | Plausibly solved / too easy | Candidate for retirement or for refinement into a harder edge case. |
| Low | Low (≤ 0.3) | Plausibly too hard or underspecified | Candidate for revision — the rule may be failing not because agents are wrong but because the rule is unclear. |
| Low | Middle (0.3–0.9) | Inconclusive | Needs more signal before a decision; flagged for the World Agent to watch. |
The thresholds (0.3 and 0.9 for the mean, and the High/Low variance cutoff) are managed configuration, set per world model rather than hard-coded.
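The interpretation table can be read as a small classification function. A minimal sketch, assuming scores lie in [0, 1] and that a variance at or above the cutoff counts as High (the text does not specify tie-breaking). The 0.3 and 0.9 defaults come from the table; variance_cutoff has no documented default and must be supplied from managed configuration. RuleHealth and classify_rule_health are hypothetical names.

```python
from enum import Enum


class RuleHealth(Enum):
    # Hypothetical labels mirroring the interpretation table.
    DISCRIMINATING = "discriminating"
    PLAUSIBLY_SOLVED = "plausibly_solved"
    PLAUSIBLY_TOO_HARD = "plausibly_too_hard_or_underspecified"
    INCONCLUSIVE = "inconclusive"


def classify_rule_health(
    score_variance: float,
    score_mean: float,
    variance_cutoff: float,   # managed config; no default documented
    low_mean: float = 0.3,    # managed config
    high_mean: float = 0.9,   # managed config
) -> RuleHealth:
    if score_variance >= variance_cutoff:
        return RuleHealth.DISCRIMINATING      # separates good configs from bad
    if score_mean >= high_mean:
        return RuleHealth.PLAUSIBLY_SOLVED    # retirement or harder-edge-case candidate
    if score_mean <= low_mean:
        return RuleHealth.PLAUSIBLY_TOO_HARD  # revision candidate: rule may be unclear
    return RuleHealth.INCONCLUSIVE            # needs more signal; World Agent watches
```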
Who computes it
A RuleHealthService in spectral.worlds consumes scan-result attributions through the
notification flow and updates the rolling window per rule referenced by each scan’s EvalResult
attribution. The computation is idempotent per scan (same scan can be reprocessed without drift).
The service exposes the current health vector to the World Agent via an intra-worlds read; the World
Agent does not compute the signal itself. (This service tracks worlds-corpus rule health only —
per-rule confidence / density rolling windows. The platform-side FailureClusterService is a
distinct service that aggregates customer scan-failure observations into operator-triage signals;
see ADR-057 for that signal protocol.)
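A minimal sketch of the per-scan idempotency, assuming each scan has a stable scan id and its attributions arrive as (generating_rule_ref, score) pairs. The class reuses the RuleHealthService name from the text, but its methods are hypothetical, and windowing is left out to keep the focus on deduplication: a scan that has already been applied is skipped, so reprocessing cannot double-count.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


class RuleHealthService:
    """Hypothetical sketch: per-rule scores fed by scan attributions."""

    def __init__(self) -> None:
        self._applied_scans: set = set()
        self._scores: dict = defaultdict(list)  # rule_ref -> scores (window omitted)

    def ingest_scan(self, scan_id: str, attributions: Iterable[Tuple[str, float]]) -> bool:
        """Apply one scan's attributions; return False if the scan was already applied."""
        if scan_id in self._applied_scans:
            return False
        pairs = list(attributions)  # materialize before mutating any state
        for rule_ref, score in pairs:
            self._scores[rule_ref].append(score)
        self._applied_scans.add(scan_id)
        return True

    def health_vector(self, rule_ref: str) -> dict:
        """Read-only view for the World Agent, which never computes the signal itself."""
        scores = self._scores.get(rule_ref, [])
        mean: Optional[float] = sum(scores) / len(scores) if scores else None
        return {"sample_count": len(scores), "score_mean": mean}
```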
What triggers a rule-lifecycle action
Nothing automatic. The health signal is an input to the World Agent’s evolution loop: it appears in the agent’s coverage reports, in gap analyses, and in operator-facing summaries. The World Agent may propose a rule candidate for revision, retirement, or refinement based on the signal, but every such proposal passes through the same conformity gate and human sign-off path as any other candidate.
Automatic retirement would bypass human authority over the domain standard, which is the whole point of the governed loop.
Where the signal surfaces
- World Agent coverage reports: flag rules in “plausibly solved” or “underspecified” states when the operator asks about coverage.
- Operator dashboards in the Operations app: surface the health vector per rule for ad-hoc review.
- WorldModelCard metadata: aggregates the distribution of rule-health states per version (how many are discriminating, how many are flagged, etc.) without exposing per-rule signal externally.
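The card-level aggregate can be produced from per-rule classifications without carrying any rule identifiers forward. A minimal sketch, assuming classifications arrive as a mapping from rule ref to a state label (card_health_distribution and the labels are hypothetical): only counts leave the function.

```python
from collections import Counter
from typing import Dict


def card_health_distribution(rule_states: Dict[str, str]) -> Dict[str, int]:
    """Count rules per health state; rule refs are dropped, so no per-rule signal escapes."""
    counts = Counter(rule_states.values())
    return dict(sorted(counts.items()))  # stable key order for the card metadata
```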
What the signal does NOT do
- Does not flow to customers. The signal is world-model-internal; a customer reading a system card sees aggregate distribution, not per-rule health.
- Does not automatically demote or retire rules. Those remain governed enshrinement-path decisions.
- Does not affect scan outcomes. Rules with “plausibly solved” status still contribute to scoring; the signal is a trigger for evolution review, not a scoring weight.
Versioning and Release
Publication is operator-triggered, not enshrinement-triggered. Rules enshrine one at a time as reviewers approve them, and the new-rules set accumulates in the world model’s working state. When a meaningful set of changes has accumulated (new enshrinements, retirements, revisions), an operator triggers world-model-version publication from the operational control plane. The publication action mints a new WorldModel version and a new EvaluationAuthorityRef (ADR-030) from the current enshrined rule set; the prior version is retained.
The publication call has the same call-flow shape as enshrinement: the operator invokes the use case handler worlds.application.publication.PublishWorldModelVersion directly from apps/api/operator/* per ADR-070 Tier 1. A single atomic transaction covers four writes: the WorldModel version mint, the EvaluationAuthorityRef mint, the WorldModelCard mint (via mint_world_model_card_at_publication), and the outbox row for worlds.world_model_card.published. After commit, the World Agent emits worldmodel.version.published. Downstream consumers, including the WorldModelCard auto-generation pipeline, react to the worlds.world_model_card.published projection event per ADR-065 D2.
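The atomic publication step follows the transactional outbox pattern. A minimal sketch using Python’s built-in sqlite3, where the table names, the authority-ref format, and the card body are hypothetical, and the card insert stands in for mint_world_model_card_at_publication: all four rows commit together or not at all, and worldmodel.version.published is emitted only after the commit succeeds.

```python
import json
import sqlite3
import uuid

# Hypothetical schema; one table per artifact minted at publication.
SCHEMA = """
CREATE TABLE world_model_version (version TEXT PRIMARY KEY, rule_ids TEXT NOT NULL);
CREATE TABLE evaluation_authority_ref (ref TEXT PRIMARY KEY, version TEXT NOT NULL);
CREATE TABLE world_model_card (version TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE outbox (id TEXT PRIMARY KEY, topic TEXT NOT NULL, payload TEXT NOT NULL);
"""


def publish_world_model_version(conn, version, enshrined_rule_ids, emit):
    """Mint version, authority ref, card, and outbox row atomically; emit after commit."""
    authority_ref = f"authority:{version}"  # hypothetical ref format
    with conn:  # sqlite3: commit on success, roll back and re-raise on exception
        conn.execute("INSERT INTO world_model_version VALUES (?, ?)",
                     (version, json.dumps(sorted(enshrined_rule_ids))))
        conn.execute("INSERT INTO evaluation_authority_ref VALUES (?, ?)",
                     (authority_ref, version))
        # Stand-in for mint_world_model_card_at_publication.
        conn.execute("INSERT INTO world_model_card VALUES (?, ?)",
                     (version, json.dumps({"rule_count": len(enshrined_rule_ids)})))
        conn.execute("INSERT INTO outbox VALUES (?, ?, ?)",
                     (str(uuid.uuid4()), "worlds.world_model_card.published",
                      json.dumps({"version": version})))
    # Reached only if the transaction committed.
    emit("worldmodel.version.published", {"version": version, "authority_ref": authority_ref})
    return authority_ref
```

A relay process would then read the outbox table and publish its rows; that part is outside this sketch.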
Version releases follow semver conventions with a breaking-change discipline: if the new version’s rule set would produce materially different evaluation results for the same agent system, it is a breaking change and increments the major version.
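The bump rule can be sketched as a pure function, assuming plain MAJOR.MINOR.PATCH strings. The text only fixes the major-version rule; treating every non-breaking release as a minor bump is an assumption, and deciding whether results differ materially remains a human judgment passed in as the breaking flag. next_world_model_version is a hypothetical name.

```python
def next_world_model_version(current: str, breaking: bool) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a world-model publication."""
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking:
        # Materially different results for the same agent system: bump major.
        return f"{major + 1}.0.0"
    # Assumption: non-breaking releases bump minor and reset patch.
    return f"{major}.{minor + 1}.0"
```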
Release notes document rule changes, including corrections. A rule found to have been misread from its source is corrected in the release notes of the version that fixes it. Prior system cards generated against the version with the misread rule are not retroactively invalidated — they are correct assessments against the published standard at that time. The published version is the unit of authority, and authority attaches to what was published, not to what the standard later became.
What’s Next
- System Card: the external audit artifact published against each world-model version
- Version Attribution: how spectral.platform consumes the version on the EvalSet attribution envelope without becoming version-aware
- Operations App / Publication: the operator surface for triggering version publication