ADR-027: Eval corpus as internal world asset

Status: Accepted (2026-04-20)

Source: migrated from planning/swms-decisions.md ADR-037 as part of SPEC-270.

Context

The eval corpus is the pool of generated instances per rule from which EvalSets are drawn, and it is a world model asset. It must not leak structural information across the context boundary. Internal identifiers, corpus position signals, instance shape metadata, or anything else that lets spectral.platform infer holdout boundaries or corpus organization would enable eval corpus distillation: a sophisticated customer could accumulate patterns inferred from eval shape, identifiers, and data, and use them to reconstruct the underlying corpus structure.

Decision

The eval corpus and holdout registry are strictly internal to packages/worlds. What crosses the context boundary is a fully sanitized EvalSet: generated instances with no internal identifiers, no corpus position signals, and no shape metadata. The attribution envelope carries only the world model version and rule references. packages/spectral receives eval instances as opaque inputs, not as slices of a known corpus, and it has no read or write access to the holdout registry.
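As an illustration of the boundary described above, here is a minimal Python sketch of a sanitization step. It is not the project's implementation: every name in it (`CorpusInstance`, `AttributionEnvelope`, `EvalSet`, `sanitize`, and all field names) is hypothetical, and shuffling the delivery order is an assumed way of removing position signals that the ADR does not prescribe.

```python
import random
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CorpusInstance:
    """Hypothetical internal record; stays inside the worlds context."""
    instance_id: str          # internal identifier
    corpus_index: int         # position within the per-rule pool
    is_holdout: bool          # holdout registry membership
    shape: dict[str, Any]     # instance shape metadata
    rule_ref: str             # rule the instance was generated for
    payload: dict[str, Any]   # the generated content itself


@dataclass(frozen=True)
class AttributionEnvelope:
    """The only reference between contexts: version and rule references."""
    world_model_version: str
    rule_refs: tuple[str, ...]


@dataclass(frozen=True)
class EvalSet:
    envelope: AttributionEnvelope
    instances: tuple[dict[str, Any], ...]  # opaque payloads only


def sanitize(drawn: list[CorpusInstance], world_model_version: str,
             rng: random.Random) -> EvalSet:
    """Build the boundary-crossing EvalSet from internally drawn instances."""
    # Copy payloads only; ids, indices, holdout flags and shape stay behind.
    payloads = [dict(inst.payload) for inst in drawn]
    # Assumption: shuffle so delivery order does not mirror corpus order.
    rng.shuffle(payloads)
    rule_refs = tuple(sorted({inst.rule_ref for inst in drawn}))
    envelope = AttributionEnvelope(world_model_version, rule_refs)
    return EvalSet(envelope, tuple(payloads))
```

In this sketch the consumer-facing `EvalSet` type has no field for an identifier, index, or shape, so a leak would require changing the type rather than forgetting a filter.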

Consequences

  • EvalSets delivered to spectral.platform contain no internal corpus identifiers or structural metadata.
  • The attribution envelope is the only reference shared between contexts. It carries the world model version and rule identifiers, never corpus internals.
  • Holdout validation is a spectral.worlds-internal process. spectral.platform does not know which instances are holdouts.
  • The eval corpus is a world model version asset: it persists for the life of the version and is archived (not deleted) on version retirement.
  • Holdout instances persist for the life of the version and are excluded from active generation. They are accessed only during explicit holdout validation runs.
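
The holdout and lifecycle consequences above can be sketched as a small per-version container. This is a hedged illustration only: `VersionCorpus`, `VersionState`, and the method names are hypothetical, and two behaviours are assumptions rather than statements from this ADR: that "excluded from active generation" means holdouts are never eligible for EvalSet draws, and that an archived corpus refuses further draws.

```python
from dataclasses import dataclass, field
from enum import Enum


class VersionState(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class VersionCorpus:
    """Hypothetical per-version corpus; internal to the worlds context."""
    version: str
    instances: dict[str, dict] = field(default_factory=dict)  # id -> payload
    holdout_ids: set[str] = field(default_factory=set)        # holdout registry
    state: VersionState = VersionState.ACTIVE

    def drawable(self) -> list[dict]:
        """Payloads eligible for EvalSets: everything except holdouts."""
        if self.state is not VersionState.ACTIVE:
            # Assumption: an archived corpus no longer serves draws.
            raise RuntimeError("corpus is archived; no active draws")
        return [p for i, p in self.instances.items()
                if i not in self.holdout_ids]

    def holdouts_for_validation(self) -> list[dict]:
        """Intended only for an explicit holdout validation run."""
        return [self.instances[i] for i in sorted(self.holdout_ids)]

    def retire(self) -> None:
        """Archive on version retirement; nothing is deleted."""
        self.state = VersionState.ARCHIVED
```

Retirement here only flips a state flag, so both the corpus and its holdout registry remain available for later inspection.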