World Model System

The World Model System is Spectral’s mechanism for establishing and evolving the behavioral standard a domain requires. It exists because any single customer’s eval set is bounded by their experience — the failures they have seen, the scenarios they have anticipated, the rules they already know. A domain’s actual behavioral territory is strictly larger than any one customer’s view of it. Evaluating against a customer-authored eval set is evaluating against that customer’s map, not the territory.

The World Model System replaces that map with a governed domain standard. Rules are grounded in authoritative sources, curated through human-gated evolution, and made available to every customer operating in the same domain. Optimization runs against eval sets derived from the world model — not from a customer’s private scenario list.

Spectral’s optimization pipeline, Memory System, and tournament all operate against EvaluationFrameworks derived from World Models. (The operational artifact generated per scan from an EvaluationFramework is the EvalSet; see Eval Generation for the full distinction.) The World Model System supplies the standard; the platform supplies the optimization. The two are architecturally separate contexts that communicate through world signal events: the platform’s failure clusters and promoted memory observations, routed back to worlds as input to rule evolution. See Memory System for the mechanism.
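To make the signal path concrete, here is a minimal Python sketch, assuming signals are plain in-memory events tagged with the domain they concern. `SignalKind`, `WorldSignal`, its `domain` and `summary` fields, and `route_signals` are hypothetical names for illustration, not Spectral’s API. Each signal is delivered to the world model for its domain as evidence for rule evolution; nothing here changes a rule directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SignalKind(Enum):
    # The two signal sources named above (hypothetical enum).
    FAILURE_CLUSTER = "failure_cluster"
    PROMOTED_MEMORY_OBSERVATION = "promoted_memory_observation"


@dataclass(frozen=True)
class WorldSignal:
    # Hypothetical event shape flowing from platform to worlds.
    kind: SignalKind
    domain: str   # which world model should receive the signal
    summary: str  # natural-language description of the evidence


def route_signals(signals: list[WorldSignal]) -> dict[str, list[WorldSignal]]:
    """Group signals by domain so each world model receives only its own evidence.

    Routed signals are inputs to rule evolution; they do not modify rules.
    """
    inbox: dict[str, list[WorldSignal]] = defaultdict(list)
    for signal in signals:
        inbox[signal.domain].append(signal)
    return dict(inbox)


signals = [
    WorldSignal(SignalKind.FAILURE_CLUSTER, "healthcare-coding", "example failure cluster"),
    WorldSignal(SignalKind.PROMOTED_MEMORY_OBSERVATION, "healthcare-coding", "example observation"),
]
routed = route_signals(signals)
print(len(routed["healthcare-coding"]))  # 2
```

Grouping by domain reflects the point above that rules are domain-scoped: a failure cluster surfaced for one customer becomes evidence for the shared standard, not for that customer’s private scenario list.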

A World Model is a Spectral-managed standard encoding the behavioral expectations of a specific problem domain. It is domain-scoped, exists independently of any customer’s system, and is the source of truth from which EvaluationFrameworks are generated. For the full specification, see World Model.

A Rule is the atomic unit of a world model — a natural language assertion about expected behavior in the domain, with structured metadata covering provenance, authority, and lifecycle status. Rules move through a five-status lifecycle from Candidate to Provisional to Pending Approval to Enshrined to Retired. For the full specification, see World Model.
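The lifecycle can be pictured as a small state machine. The sketch below assumes a strictly forward progression through the five statuses in the order listed and treats Retired as terminal; it also assumes, following the human-gated evolution described above, that the step into Enshrined requires a named human approver. `Rule`, `RuleStatus`, `advance`, and the `source` and `approved_by` fields are hypothetical; the authoritative lifecycle is specified in World Model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RuleStatus(Enum):
    CANDIDATE = "candidate"
    PROVISIONAL = "provisional"
    PENDING_APPROVAL = "pending_approval"
    ENSHRINED = "enshrined"
    RETIRED = "retired"


# Assumed forward-only ordering; the real lifecycle may permit other transitions.
_NEXT = {
    RuleStatus.CANDIDATE: RuleStatus.PROVISIONAL,
    RuleStatus.PROVISIONAL: RuleStatus.PENDING_APPROVAL,
    RuleStatus.PENDING_APPROVAL: RuleStatus.ENSHRINED,
    RuleStatus.ENSHRINED: RuleStatus.RETIRED,
}


@dataclass
class Rule:
    rule_id: str
    assertion: str  # natural-language statement of expected behavior
    source: str     # provenance: the authoritative source it is grounded in
    status: RuleStatus = RuleStatus.CANDIDATE
    approved_by: Optional[str] = None  # human sign-off recorded at enshrinement

    def advance(self, approver: Optional[str] = None) -> None:
        """Move the rule one step forward in the lifecycle."""
        if self.status not in _NEXT:
            raise ValueError(f"{self.rule_id} is {self.status.value}; no further status")
        target = _NEXT[self.status]
        if target is RuleStatus.ENSHRINED:
            # The human gate: no approver, no enshrinement (status is unchanged).
            if not approver:
                raise PermissionError("enshrinement requires human sign-off")
            self.approved_by = approver
        self.status = target


rule = Rule("r-001", "Example assertion about expected behavior.", "example-source")
rule.advance()                       # Candidate -> Provisional
rule.advance()                       # Provisional -> Pending Approval
rule.advance(approver="reviewer-1")  # Pending Approval -> Enshrined
print(rule.status.value)             # enshrined
```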

A Domain is the full behavioral territory a world model governs. It is the complete space of behaviors, scenarios, and edge cases that the domain standard is responsible for characterizing, regardless of whether any specific customer’s system encounters them.

A Problem Space is the subset of a domain that a customer’s system operates in. A customer deploying a healthcare coding agent does not operate across the entire healthcare domain — they operate in a specific slice of it. The problem space is that slice.
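One way to picture the relationship, assuming each rule is tagged with the scenarios it speaks to: the domain’s rules are shared, and a problem space simply selects the enshrined rules whose scenarios overlap the customer’s slice. The scenario tags, `DomainRule`, and `rules_for_problem_space` are hypothetical illustrations, and the data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRule:
    rule_id: str
    scenarios: frozenset[str]  # hypothetical scenario tags this rule speaks to
    enshrined: bool            # only enshrined (Known) rules drive evaluation


def rules_for_problem_space(
    domain_rules: list[DomainRule], problem_space: frozenset[str]
) -> list[str]:
    """Select the enshrined domain rules that touch a customer's slice."""
    return [
        rule.rule_id
        for rule in domain_rules
        if rule.enshrined and rule.scenarios & problem_space
    ]


# One shared rule set for the whole domain (illustrative data only).
healthcare_coding = [
    DomainRule("r1", frozenset({"inpatient"}), enshrined=True),
    DomainRule("r2", frozenset({"outpatient", "telehealth"}), enshrined=True),
    DomainRule("r3", frozenset({"telehealth"}), enshrined=False),  # not yet Known
]

print(rules_for_problem_space(healthcare_coding, frozenset({"telehealth"})))  # ['r2']
```

Because the selection reads from the shared rule set, two customers in the same domain with overlapping slices evaluate against the same enshrined rules rather than against their own scenario lists.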

A World Agent is the internal resident of a World Model. It explores domain coverage, proposes rule candidates, and maintains discovery continuity across versions. It is not customer-facing. For the full specification, see World Agent.

A System Card is the external artifact Spectral generates to characterize conformance — either the world model’s own composition or an agent system’s performance against it. For the full specification, see System Card.
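A rough data-structure sketch of the two card types, reusing the class names `WorldModelCard` and `AgentPerformanceCard` that appear in the context layout below. Every field shown, the `conformance_rate` calculation, and `describe` are assumptions for illustration; the actual mandatory fields are defined in System Card.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class WorldModelCard:
    """Characterizes the world model's own composition (fields hypothetical)."""
    world_model_id: str
    version: str
    rule_counts: dict[str, int] = field(default_factory=dict)  # status -> count


@dataclass
class AgentPerformanceCard:
    """Characterizes an agent system's conformance against one world model version."""
    world_model_id: str
    version: str
    agent_system_id: str
    passed: int
    total: int

    @property
    def conformance_rate(self) -> float:
        # Guard against an empty eval run rather than dividing by zero.
        return self.passed / self.total if self.total else 0.0


SystemCard = Union[WorldModelCard, AgentPerformanceCard]


def describe(card: SystemCard) -> str:
    if isinstance(card, WorldModelCard):
        return f"{card.world_model_id}@{card.version}: {sum(card.rule_counts.values())} rules"
    return (
        f"{card.agent_system_id} vs {card.world_model_id}@{card.version}: "
        f"{card.conformance_rate:.0%}"
    )


print(describe(AgentPerformanceCard("wm-health", "1.2.0", "acme-agent", 45, 50)))
# acme-agent vs wm-health@1.2.0: 90%
```

Both card types carry the world model version, since a conformance claim is only meaningful relative to a specific version of the standard.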

A World Model’s coverage of its domain is partitioned into three zones.

Known — enshrined rules the system acts on. These are the load-bearing rules that EvaluationFrameworks evaluate against and that system cards report on.

Unknown — discoverable rules that exist in the domain but have not yet been found and enshrined. The system deliberately leaves room to explore this zone and to challenge rules that are already Known. The Evolution Loop is the mechanism for moving rules from Unknown to Known.

Unknowable — rules that exist in the domain but cannot yet be perceived with current methods. No current discovery process can surface them. The world model acknowledges this zone rather than pretending completeness.

The agent evaluation surface has a symmetric three-zone structure that partitions the agent’s behaviors rather than the world-model rule set. To avoid label collision with the rule-set zones above, the evaluation zones are named Observed / Unobserved / Unobservable:

Observed — behaviors the agent exhibits that current evaluation runs have already surfaced.

Unobserved — behaviors discoverable through eval execution but not yet surfaced. Better coverage, more runs, or new scenario generation could find them.

Unobservable — behaviors that exist in the agent system but cannot be detected through current eval metrics. No current evaluation method can surface them.

The two zone systems map onto each other (rule-set Known ↔ evaluation Observed; rule-set Unknown ↔ evaluation Unobserved; rule-set Unknowable ↔ evaluation Unobservable) without sharing labels. The intersection of rule-set Unknowable and evaluation Unobservable is where the system is most humble. Incompleteness is a first-class design property, not a gap.
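The parallel-but-distinct naming can be encoded directly as two separate enumerations plus an explicit correspondence table, so rule-set zones and evaluation zones cannot be mistaken for one another. This is a hypothetical sketch; the names `RuleZone`, `EvalZone`, `ZONE_CORRESPONDENCE`, and `is_least_certain` are not taken from Spectral’s code.

```python
from enum import Enum


class RuleZone(Enum):
    """Partition of the domain's rules (world-model side)."""
    KNOWN = "known"            # enshrined, acted on
    UNKNOWN = "unknown"        # discoverable, not yet enshrined
    UNKNOWABLE = "unknowable"  # not perceivable with current methods


class EvalZone(Enum):
    """Partition of the agent's behaviors (evaluation side)."""
    OBSERVED = "observed"
    UNOBSERVED = "unobserved"
    UNOBSERVABLE = "unobservable"


# The correspondence described above: parallel structure, distinct labels.
ZONE_CORRESPONDENCE: dict[RuleZone, EvalZone] = {
    RuleZone.KNOWN: EvalZone.OBSERVED,
    RuleZone.UNKNOWN: EvalZone.UNOBSERVED,
    RuleZone.UNKNOWABLE: EvalZone.UNOBSERVABLE,
}


def is_least_certain(rule_zone: RuleZone, eval_zone: EvalZone) -> bool:
    # The intersection where the system is most humble about its claims.
    return rule_zone is RuleZone.UNKNOWABLE and eval_zone is EvalZone.UNOBSERVABLE
```

Because members of different `Enum` classes never compare equal, `RuleZone.KNOWN == EvalZone.OBSERVED` is `False` even though the two zones correspond; the mapping table is the single place the correspondence is stated.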

The World Model System is implemented as a context separate from the optimization pipeline. Three contexts structure the code:

  • spectral.worlds (World Model context): rule storage, the World Agent, the Evolution Loop, and WorldModelCard generation
  • spectral.platform (Platform context): optimization pipeline, Memory System, tournament, and AgentPerformanceCard generation
  • spectral.core: substrate transport plus cross-cutting plumbing (events, auth, db, retention, llm, embeddings, tools)

Both worlds and platform depend on core; core depends on neither.
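The dependency rule (worlds and platform may import core; core imports neither; worlds and platform never import each other) is simple enough to check mechanically. The sketch below uses Python’s standard `ast` module to flag absolute imports that cross a forbidden boundary. The `ALLOWED` table mirrors the rule stated above; the checker itself is a hypothetical illustration, not Spectral’s actual tooling.

```python
import ast
from typing import Optional

# Allowed cross-context dependencies, as stated above.
ALLOWED: dict[str, set[str]] = {
    "spectral.core": set(),
    "spectral.worlds": {"spectral.core"},
    "spectral.platform": {"spectral.core"},
}


def context_of(module: str) -> Optional[str]:
    """Map a dotted module name to its owning context, if any."""
    for ctx in ALLOWED:
        if module == ctx or module.startswith(ctx + "."):
            return ctx
    return None


def boundary_violations(importer: str, source: str) -> list[str]:
    """Return imported names in `source` that cross a forbidden context boundary."""
    home = context_of(importer)
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Join module and name so `from spectral import worlds` is caught too.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # relative imports are skipped in this sketch
        for name in names:
            target = context_of(name)
            if home and target and target != home and target not in ALLOWED[home]:
                found.append(name)
    return found


src = "import spectral.core.events\nfrom spectral.worlds.rules import Rule\n"
print(boundary_violations("spectral.platform.tournament", src))
# ['spectral.worlds.rules.Rule']
```

A check like this could run in CI over every module in each context. Relative imports are skipped only for brevity; a complete check would resolve them against the importer’s package, since `from ..platform import x` can also cross a boundary.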

Worlds publishes typed contract surfaces — events for notification flow, callee-owned protocols for synchronous call flow — that platform consumes without importing worlds internals. The integrity controls for the Evolution Loop itself — the conformity gate and human sign-off — sit on top of these mechanisms, not in lieu of them.

The full mechanism is specified in Contract Surfaces. Worlds and platform never import each other; integration happens through three mechanisms: events, OHS Protocols (Open Host Service: callee-owned typed Protocols), and framework-layer bridge tools. For this page, the takeaway is that rule data flows out of worlds as events, and the World Agent’s synchronous capabilities are exposed through callee-owned protocols, never as imports.
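A compressed sketch of two of the three mechanisms, under stated assumptions: a frozen dataclass stands in for an event that carries rule data out of worlds, a `typing.Protocol` stands in for a callee-owned synchronous capability of the World Agent, and a minimal in-process `EventBus` stands in for core’s transport. All class and method names (`RuleEnshrined`, `WorldAgentQueries`, `coverage_gaps`, `EventBus`, `StubWorldAgent`, `fetch_coverage_gaps`) are hypothetical; Contract Surfaces is the authoritative specification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


# Contract surface published by worlds (hypothetical names throughout).
@dataclass(frozen=True)
class RuleEnshrined:
    """Notification event: rule data leaves worlds as an immutable payload."""
    world_model_id: str
    rule_id: str
    assertion: str


class WorldAgentQueries(Protocol):
    """Callee-owned protocol: worlds defines the shape, platform calls it."""

    def coverage_gaps(self, world_model_id: str) -> list[str]: ...


# Minimal stand-in for core's event transport.
class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[Any], None]]] = {}

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: Any) -> None:
        # Deliver only to handlers registered for this exact event type.
        for handler in self._handlers.get(type(event), []):
            handler(event)


# Worlds-internal implementation; platform never imports this class.
class StubWorldAgent:
    def coverage_gaps(self, world_model_id: str) -> list[str]:
        return [f"{world_model_id}: unexplored scenario"]


# Platform-side code depends only on the contract types.
def fetch_coverage_gaps(agent: WorldAgentQueries, world_model_id: str) -> list[str]:
    return agent.coverage_gaps(world_model_id)


bus = EventBus()
received: list[RuleEnshrined] = []
bus.subscribe(RuleEnshrined, received.append)
bus.publish(RuleEnshrined("wm-health", "r-001", "Example assertion."))
print(len(received))  # 1
print(fetch_coverage_gaps(StubWorldAgent(), "wm-health"))  # ['wm-health: unexplored scenario']
```

Because `Protocol` uses structural typing, platform-side code can accept any object with a matching `coverage_gaps` method without importing the class that implements it.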

  • World Model — rule architecture, provenance tiers, five-status lifecycle, versioning
  • World Agent — the internal resident, memory architecture, role in evolution
  • Eval Generation — three-source corpus, mutation testing, holdout strategy, conformity gate
  • Evolution Loop — signal path, candidate promotion, human sign-off, release discipline
  • System Card — authority basis, mandatory fields, card types, versioning