ADR-014: EvaluationFramework as shared contractual type — customer-directed parameterization
Status: Accepted (2026-04-20)
Source: migrated from planning/swms-decisions.md ADR-023 as part of SPEC-270.
Context
The EvaluationFramework is the primary integration surface between spectral.worlds and spectral.platform. worlds generates evaluation frameworks from world model rules; platform executes optimization scans against them. In the prior Spectral architecture, EvaluationFramework was a platform-owned type with rubrics typed as list[dict[str, Any]] — schemaless, unvalidated, and carrying no provenance metadata. This is structurally incompatible with SWMS-generated frameworks, which carry rule attribution, provenance tier, and world model version.
An earlier iteration of this ADR left room for customer-authored frameworks existing alongside world-model-generated frameworks. The design interview clarified that customer framework authorship is not a supported authoring model: customers parameterize world-model eval generation rather than authoring independent frameworks that would require grounding against the world model as a separate step.
Decision
Placement of EvaluationFramework/RubricDefinition/EvalSample in spectral.core as rich domain types: superseded by ADR-065. Per ADR-065 D1, no domain types live in the kernel; contracts between contexts that carry typed payloads relocate to <producer>.contracts.events.* per D2. The mandatory authority/authority_version field convention (per ADR-015) remains the pinning mechanism between contexts. The context distribution (worlds generates as output of the eval generation pipeline; platform executes) remains authoritative.
RubricDefinition replaces the former list[dict[str, Any]] with a typed Pydantic model carrying dimensions, weights, scoring guidance, hard constraint thresholds, and an opaque attribution envelope. EvalSample is a first-class primitive — the unit that worlds generates and platform executes.
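To illustrate what moving from list[dict[str, Any]] to a validated type buys, here is a minimal sketch. The ADR specifies a Pydantic model; this sketch deliberately swaps in standard-library dataclasses with explicit checks so it runs without dependencies. Every field name (name, weight, scoring_guidance, hard_threshold, attribution) and the rule that weights sum to 1 are assumptions made for illustration, not the actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class RubricDimension:
    """One scored dimension of a rubric (hypothetical shape)."""

    name: str
    weight: float
    scoring_guidance: str
    hard_threshold: float | None = None  # None: no hard constraint (assumed)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("dimension name must be non-empty")
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError(f"weight for {self.name!r} must be in [0, 1]")


@dataclass(frozen=True)
class RubricDefinition:
    """Typed stand-in for the former list[dict[str, Any]] rubrics field."""

    dimensions: tuple[RubricDimension, ...]
    # Opaque attribution envelope: carried through, never interpreted here.
    attribution: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.dimensions:
            raise ValueError("a rubric needs at least one dimension")
        total = sum(d.weight for d in self.dimensions)
        # Assumed invariant for this sketch: weights form a convex combination.
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"dimension weights must sum to 1, got {total}")
```

With a shape like this, a malformed rubric fails at construction time instead of surfacing later during a scan.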
World model presence in assessment is structural and mandatory; there is no assessment path that bypasses world model grounding. Customers do not author independent evaluation frameworks. Customers parameterize world-model eval generation by selecting metrics, measurement vectors, and coverage areas that direct the generation process. The output is always a world-model-generated eval set with customer-directed parameters.
Unknown-territory behavior is handled at generation time: when customer steering parameters point at territory outside current world model coverage, the system returns a coverage gap notification to the customer and routes the candidate observation internally as a discovery signal. There is no ingestion-time grounding step because there is no independently authored framework to ground.
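The generation-time handling of unknown territory can be sketched as a partition of the customer's requested coverage areas. This is an illustration under stated assumptions: SteeringParameters, GenerationOutcome, plan_generation, and the representation of world model coverage as a set of area names are all hypothetical and do not come from ADR-022.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SteeringParameters:
    """Customer-selected inputs that direct generation (hypothetical shape)."""

    metrics: tuple[str, ...]
    measurement_vectors: tuple[str, ...]
    coverage_areas: tuple[str, ...]


@dataclass(frozen=True)
class GenerationOutcome:
    generated_areas: tuple[str, ...]    # areas the world model can ground
    coverage_gaps: tuple[str, ...]      # reported back to the customer
    discovery_signals: tuple[str, ...]  # routed internally, not to the customer


def plan_generation(
    params: SteeringParameters, world_model_coverage: frozenset[str]
) -> GenerationOutcome:
    """Split requested areas into covered areas and coverage gaps."""
    covered = tuple(a for a in params.coverage_areas if a in world_model_coverage)
    gaps = tuple(a for a in params.coverage_areas if a not in world_model_coverage)
    # Each gap both notifies the customer and seeds an internal discovery signal.
    return GenerationOutcome(
        generated_areas=covered,
        coverage_gaps=gaps,
        discovery_signals=gaps,
    )
```

Because the only input is a set of parameters, there is nothing to ground at ingestion: areas outside coverage are detected before any eval sample is produced.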
LLM-assisted rubric generation is removed from packages/spectral; this capability is replaced by worlds eval generation.
Consequences
- The untyped rubrics: list[dict[str, Any]] field is eliminated. All rubric structures are validated at the type level.
- The three-layer instantiation model (global template → workspace instance → changeset snapshot) remains valid; the top layer’s data source changes from LLM generation to world model generation.
- rubric_gen.py LLM-assisted generation is retired from spectral.platform. There is no customer-authored-framework authoring path in the new design.
- EvalSample as a first-class type enables clean holdout split operations, world-model-suggested holdout configuration, and deviation record attribution.
- spectral’s local EvaluationFramework representation references the core type rather than replacing it; evaluation_framework_id on ChangeSet continues to reference the local representation.
- Customer-directed parameterization is a first-class input to the generation pipeline. The World Agent interprets customer steering parameters against world model structure at generation time.
- Goodhart-resistance is reinforced: even when a customer directs eval scenarios through parameterization, the world model remains the adjudicating standard.
- The generation-time coverage gap notification and internal discovery routing are specified in ADR-022 (eval generation architecture) and integrate with the world signal path in ADR-017.
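One of the consequences above is that a first-class EvalSample enables clean holdout split operations. The ADR does not say how splits are computed; the sketch below assumes a deterministic hash-bucket split keyed on a sample identifier. The fields sample_id and rule_id, the salt, and both function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSample:
    """Minimal stand-in for the real type (field names are assumptions)."""

    sample_id: str
    rule_id: str  # hypothetical attribution back to a world model rule


def is_holdout(sample: EvalSample, holdout_fraction: float, salt: str = "holdout-v1") -> bool:
    """Assign a sample to the holdout set from a stable hash of its id."""
    digest = hashlib.sha256(f"{salt}:{sample.sample_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    # Integer comparison avoids float rounding at the 0.0 and 1.0 extremes.
    return bucket < int(holdout_fraction * 2**64)


def split_holdout(
    samples: list[EvalSample], holdout_fraction: float
) -> tuple[list[EvalSample], list[EvalSample]]:
    """Return (train, holdout); the same inputs always give the same split."""
    if not 0.0 <= holdout_fraction <= 1.0:
        raise ValueError("holdout_fraction must be in [0, 1]")
    train: list[EvalSample] = []
    holdout: list[EvalSample] = []
    for sample in samples:
        (holdout if is_holdout(sample, holdout_fraction) else train).append(sample)
    return train, holdout
```

Keying the split on the sample itself rather than on list position means a sample stays on the same side of the split when the eval set is regenerated with additional samples.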