Future Considerations
Capabilities identified during product planning that are not yet scheduled but should inform architectural decisions. These are not backlog items — they are directional signals that prevent us from painting ourselves into a corner.
Configurable Optimization Vectors
Current state: The optimization engine operates along a fixed quality vs. cost Pareto frontier.
Future direction: Extend to customer-defined optimization axes — latency, safety, compliance, or other dimensions specific to the customer’s domain.
Why it matters now: Different stakeholders within a customer team may prioritize different axes (engineers care about latency, product cares about safety, leadership cares about cost). In multi-agent systems, different agents may warrant different optimization priorities. The Configuration and optimization primitives should not hardcode quality/cost as the only dimensions.
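One way to keep the primitives open to customer-defined axes is to treat each axis as data rather than as a hardcoded field. The sketch below is illustrative only: `Axis`, `Candidate`, `dominates`, and `pareto_frontier` are hypothetical names, not Spectral's actual primitives. It shows that the same frontier computation works for quality/cost today and for additional axes such as latency later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """Hypothetical axis descriptor: a name plus the direction of 'better'."""
    name: str
    higher_is_better: bool


@dataclass
class Candidate:
    """Hypothetical candidate Configuration with one score per axis."""
    label: str
    scores: dict  # axis name -> measured value


def dominates(a, b, axes):
    """True if a is at least as good as b on every axis and strictly better on one."""
    strictly_better = False
    for axis in axes:
        av, bv = a.scores[axis.name], b.scores[axis.name]
        if not axis.higher_is_better:
            av, bv = -av, -bv  # flip so that larger always means better
        if av < bv:
            return False
        if av > bv:
            strictly_better = True
    return strictly_better


def pareto_frontier(candidates, axes):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c, axes) for other in candidates)]


# Adding an axis is a data change, not a code change.
axes = [Axis("quality", True), Axis("cost", False), Axis("latency_ms", False)]
candidates = [
    Candidate("a", {"quality": 0.90, "cost": 1.0, "latency_ms": 800}),
    Candidate("b", {"quality": 0.85, "cost": 0.5, "latency_ms": 400}),
    Candidate("c", {"quality": 0.80, "cost": 0.6, "latency_ms": 500}),
]
frontier = pareto_frontier(candidates, axes)
```

In this example, candidate `c` is dominated by `b` on all three axes, so only `a` and `b` remain on the frontier.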
Severity Variant Linking
Current state: Diagnosis identifies failure clusters independently.
Future direction: Link severity variants of the same root cause (e.g., “timeout after 30s” and “timeout after 5s”) to reduce noise in diagnosis output and provide cleaner explainability.
Approach options: NLP-based similarity or hierarchical clustering within the diagnosis phase.
Why it matters now: As the diagnosis engine handles more complex multi-agent systems, the number of failure clusters will grow. Without variant linking, customers may see redundant clusters that are really the same underlying issue at different severity levels. This is a quality-of-output concern that should inform how the diagnosis pipeline structures its clustering — not a separate feature bolted on later.
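As a point of comparison for the approach options above, the sketch below uses a deliberately simpler baseline than NLP similarity or hierarchical clustering: it masks numeric parameters in cluster labels and groups labels that share the resulting signature. The function names and the unit list are assumptions for illustration; a real implementation would live inside the diagnosis clustering step.

```python
import re
from collections import defaultdict

# A number with an optional short unit suffix, e.g. "30s", "5 s", "250ms", "429".
_NUMBER = re.compile(r"\d+(?:\.\d+)?(?:\s*(?:ms|s|m|h))?\b")


def signature(label: str) -> str:
    """Mask numeric parameters so severity variants collapse to one key."""
    return _NUMBER.sub("<n>", label.lower()).strip()


def link_variants(cluster_labels):
    """Group cluster labels that differ only in their numeric parameters."""
    groups = defaultdict(list)
    for label in cluster_labels:
        groups[signature(label)].append(label)
    return dict(groups)


linked = link_variants(["timeout after 30s", "Timeout after 5s", "schema mismatch"])
```

Here the two timeout labels land under the single signature `timeout after <n>`, while `schema mismatch` stays in its own group. This baseline only catches variants that differ in numbers; paraphrased variants would still need one of the similarity-based options.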
DSPy as an Optimization Strategy
Current state: Spectral’s own internal prompts are hand-crafted and rely on persona-based patterns (“You are an expert…”), which have been shown to have minimal impact.
Future direction: Two-phase adoption:
- Replace internal prompts with DSPy-compiled optimization (improving Spectral itself)
- Add DSPy as one of the optimization strategies the engine can apply when producing customer Configurations
DSPy is an implementation detail — customers never see or know it’s there. They see the Configuration with its explainability. The engine selects the best optimization strategy for the situation.
Cross-Agent Context Awareness
Current state: Evaluation operates on individual agent outputs with limited awareness of system-level context.
Future direction: Enable the evaluation pipeline to reason about cross-case and cross-agent context — detecting systemic patterns like rate limiting, correlated failures, or emergent behaviors that only appear when analyzing the workspace as a whole.
Why it matters now: This is not a deferral — it is core to Spectral’s multi-agent value proposition. The optimization engine already diagnoses at the team level, but deeper context awareness in the evaluation phase itself would catch patterns earlier in the pipeline. Architectural decisions in the evaluation phase should preserve the ability to pass cross-agent and cross-case context through the evaluation pipeline.
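To make "systemic patterns" concrete, the sketch below flags failure kinds that hit several distinct agents within a short time window, which is one plausible signal for rate limiting or correlated failures. The `Failure` record, its fields, and the thresholds are hypothetical; the point is only that the detector needs cross-agent and cross-case records in one place, which per-agent evaluation cannot provide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Hypothetical evaluation record for one failed case."""
    agent: str
    case_id: str
    ts: float   # seconds since epoch
    kind: str   # e.g. "rate_limited"


def systemic_patterns(failures, window_s=60.0, min_agents=2):
    """Return {kind: agents} for kinds seen on >= min_agents agents within window_s."""
    by_kind = defaultdict(list)
    for f in failures:
        by_kind[f.kind].append(f)

    found = {}
    for kind, items in by_kind.items():
        items.sort(key=lambda f: f.ts)
        start = 0
        for end in range(len(items)):
            # Shrink the window from the left until it spans at most window_s.
            while items[end].ts - items[start].ts > window_s:
                start += 1
            agents = {f.agent for f in items[start:end + 1]}
            if len(agents) >= min_agents:
                found[kind] = sorted(agents)
                break
    return found


failures = [
    Failure("planner", "c1", 0.0, "rate_limited"),
    Failure("writer", "c2", 10.0, "rate_limited"),
    Failure("writer", "c3", 500.0, "schema_error"),
]
patterns = systemic_patterns(failures)
```

With these inputs, `rate_limited` is reported for both `planner` and `writer`, while the isolated `schema_error` is not.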
Alpha Deferrals
The sections above capture directional product signals. This section captures a different class of item — work that is intentionally out of scope for alpha but must be reconsidered before (or during) design-partner engagement. Each entry states what is deferred, why it is acceptable to defer for alpha, and the trigger that moves it back into scope.
Execution epics for most of this work live in SPEC-302 — Platform foundations hardening (post-alpha). The Tech Arch Review parent SPEC-257 covers observability and other architectural choices that require research before commitment.
Compliance
SOC2 Type II
Deferred: Full SOC2 Type II audit and attestation.
Why deferred: Alpha runs against internal operators and named design partners under NDA; no regulated-use claims are made (see the alpha-preview footer under System Card Rendering, below). SOC2 evidence-collection windows are 6–12 months long, so starting before the control set stabilizes is wasted effort.
Trigger to reconsider: First design-partner contract that requires a SOC2 report, or any prospect in a regulated vertical (financial services, healthcare) with procurement gating on it.
HIPAA BAA-readiness
Deferred: Signing Business Associate Agreements and the controls required to back them (encryption discipline, breach-notification procedures, workforce training, audit logging tuned to PHI access).
Why deferred: Tax prep is the chosen alpha domain (ADR-030). HIPAA only becomes relevant if healthcare prior-auth or a similar domain becomes a second customer-facing vertical.
Trigger to reconsider: A domain-selection decision to pursue healthcare prior-auth, clinical-coding, or any other HIPAA-scoped vertical.
Data residency
Deferred: Region-pinned deployments, per-workspace residency controls, cross-region replication policy.
Why deferred: The default Supabase deployment is single-region and US-hosted, and alpha customers are US-domiciled. Alpha has no requirement for EU / APAC data residency.
Trigger to reconsider: First non-US design partner, any customer with a contractual data-residency clause, or an internal decision to pursue EU GDPR or UK-DPA posture.
Encryption-at-rest
Deferred: Customer-managed keys, envelope encryption for sensitive columns, key rotation policy beyond Supabase defaults.
Why deferred: Supabase provides AES-256 encryption-at-rest by default for all managed Postgres storage — this is sufficient for alpha. The upgrade path (pgsodium column-level encryption or externally-managed KMS) is preserved because it layers on top of the default storage encryption rather than replacing it.
Trigger to reconsider: Customer contractual requirement for customer-managed keys, or procurement gate on column-level encryption for specific data classes (SSNs, tax IDs, financial account numbers).
PII redaction in logs
Deferred: Systematic PII redaction policy for structured logs and trace payloads.
Why deferred: Alpha logging is internal-facing only. No log data is shared with customers; no log data leaves the Spectral-operated platform. Trace payloads captured from customer agents are treated as customer data and are already workspace-scoped under RLS.
Trigger to reconsider: Any plan to share logs with customers (self-serve log search, exported audit bundles), any third-party log-aggregation integration, or the first customer whose contract prohibits PII in logs under any circumstance.
Platform hardening
Multi-tenant RLS adversarial testing
Deferred: Adversarial test suite simulating hostile workspace members, cross-workspace leakage probes across every workspace-scoped table, CI-blocking RLS contract suite beyond smoke-level coverage.
Why deferred: SPEC-247 (alpha platform foundations) delivers an RLS-capable schema as a non-negotiable from commit one, with smoke-level coverage proving that wrong-workspace queries return empty. That is the alpha bar. Full adversarial testing is a design-partner prerequisite, not an alpha prerequisite.
Trigger to reconsider: First external design partner accessing the platform, or any finding during alpha that suggests the smoke-level coverage missed a class of leakage. Owned by SPEC-302.
Advanced invite flows
Deferred: Self-serve workspace creation, workspace-to-workspace invite transfers, bulk invite, domain-verified auto-accept, full lifecycle UI.
Why deferred: Alpha uses admin-bootstrap only — Spectral operators create workspaces and invite the first members. The invite primitives (workspace_invites table, accept endpoint) are in place; additional UX on top is not load-bearing until customer-led workspace creation is supported.
Trigger to reconsider: First design partner that wants to self-administer workspace membership, or a sales motion that depends on trial-signup flow. Owned by SPEC-302.
Billing + usage metering
Deferred: Entirely — billing model is explicitly undefined at alpha.
Why deferred: Alpha engagements are unpriced / design-partner contracts. Building metering infrastructure before the commercial model is agreed risks baking in the wrong primitives.
Trigger to reconsider: Commercial-model decision from product / leadership. Until then, SPEC-302 holds the placeholder and will not be refined.
System card distribution channels
Deferred: Distribution beyond ad-hoc PDF download — signed expiring links, scheduled email to stakeholder lists, regulatory-filing-friendly bundles, machine-readable feeds for downstream compliance tooling.
Why deferred: PDF download is sufficient for alpha: operators and design partners can share the file manually. The rendering pipeline (SPEC-243) produces the artifact; distribution is a separate concern.
Trigger to reconsider: First customer workflow that requires scheduled delivery (e.g., quarterly compliance reports), or any regulatory regime that dictates a machine-readable format.
Observability
Addressed by the Tech Arch Review (SPEC-257).
Operational observability stack — settled
The operational stack (metrics, logs, traces for the platform itself) is now described declaratively in the Codex observability stack page and captured in ADR-036. Alpha uses Grafana Cloud (or a platform-native equivalent), Pydantic Logfire for LLM observability, and Sentry for error tracking. All of them consume OTLP and have credible self-host escape hatches (LGTM, Langfuse, GlitchTip).
LLM-specific observability — settled
The LLM-specific stack lives in the same ADR-036, which defines a three-stream architecture:
- Payload-free spans → Logfire + Grafana
- Payload-bearing records → business-object-contextual tables
- Cost/usage summary → core.llm_usage
Content-class-driven routing (PLATFORM / OPERATIONS / SYNTHETIC) handles redaction at the OTel exporter boundary. See also ADR-061 for the three-tier test posture.
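The three-stream split can be pictured as one LLM call fanning out into three differently shaped records. The sketch below is a minimal illustration under stated assumptions: `LlmCall`, `split_streams`, and every field name are hypothetical, and the content-class routing is not modeled because its rules are defined in ADR-036 rather than here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmCall:
    """Hypothetical in-memory view of a single LLM call."""
    workspace_id: str
    model: str
    prompt: str
    completion: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: int


def split_streams(call: LlmCall):
    """Fan one call out into (payload-free span, payload-bearing record, usage summary)."""
    # Stream 1: span attributes carry no prompt or completion text.
    span = {
        "model": call.model,
        "latency_ms": call.latency_ms,
        "input_tokens": call.input_tokens,
        "output_tokens": call.output_tokens,
    }
    # Stream 2: the payload stays with the business object, scoped to the workspace.
    record = {
        "workspace_id": call.workspace_id,
        "prompt": call.prompt,
        "completion": call.completion,
    }
    # Stream 3: cost/usage summary, the kind of row core.llm_usage would hold.
    usage = {
        "workspace_id": call.workspace_id,
        "model": call.model,
        "input_tokens": call.input_tokens,
        "output_tokens": call.output_tokens,
        "cost_usd": call.cost_usd,
    }
    return span, record, usage
```

Keeping payload text out of the span dictionary entirely, rather than scrubbing it later, is the design choice this sketch is meant to highlight.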
Retention + alerting discipline
Partially addressed by SPEC-307 (TA-4) as of 2026-04-21: the four-state retention vocabulary (ACTIVE / REFERENCED / ORPHANED / TOMBSTONED), the RetentionPolicy shape, and the 1-year alpha default landed in spectral.core.retention. Enforcement mechanisms (TTL cron, cascade worker, integrity job) are deferred to named D11 triggers with measurable thresholds.
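For orientation, the vocabulary and default described above might look roughly like the following. This is a sketch, not the contents of spectral.core.retention: the field name `retention_days` and both helper methods are assumptions, and no meaning is assigned to the four states beyond their names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class RetentionState(Enum):
    """The four-state vocabulary named in SPEC-307."""
    ACTIVE = "active"
    REFERENCED = "referenced"
    ORPHANED = "orphaned"
    TOMBSTONED = "tombstoned"


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy shape; only the 365-day default comes from the text above."""
    retention_days: int = 365

    def expires_at(self, created_at: datetime) -> datetime:
        return created_at + timedelta(days=self.retention_days)

    def is_expired(self, created_at: datetime, now: datetime) -> bool:
        return now >= self.expires_at(created_at)


policy = RetentionPolicy()
created = datetime(2025, 1, 1, tzinfo=timezone.utc)
expired = policy.is_expired(created, datetime(2026, 1, 1, tzinfo=timezone.utc))
```

Note that this only computes whether a record is past its window; acting on that result is exactly the enforcement work that remains deferred.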
Still deferred:
- Alert routing (who gets paged for what), runbook library for incident response.
- Per-entity-type policy values beyond the 365-day default.
- Concrete TTL enforcement implementation (view-based retention-state computation scaffolds land with each context’s first workspace-scoped migration).
Why still deferred: Alpha operations are run during business hours by a small team; informal alerting is acceptable until the SPEC-307 D11 triggers fire. Paging discipline is a post-alpha hardening concern (SPEC-302).
Trigger to reconsider: First off-hours incident that required paging, first customer SLA commitment, any compliance regime with mandated retention windows, or any SPEC-307 D11 trigger firing.
Supply chain
SBOM generation
Deferred: Producing a Software Bill of Materials per release and publishing it alongside the build artifact.
Why deferred: Alpha has no external consumers of SBOMs — no customer procurement gate requires it, and nothing is distributed outside the Spectral-operated platform.
Trigger to reconsider: First customer procurement questionnaire that asks for an SBOM, any federal or regulated-vertical prospect (SBOMs are increasingly mandatory under Executive Order 14028 and similar regimes), or an upstream compromise incident in a dependency, which would drive the decision regardless.
Sigstore signing
Deferred: Signing release artifacts with Sigstore / cosign and publishing the attestations.
Why deferred: Alpha builds are deployed to Spectral-operated infrastructure only. There is no distribution surface where signature verification would be enforced by a third party.
Trigger to reconsider: Adding any distribution channel outside Spectral-operated infrastructure (e.g., a customer-hosted component, a CLI, a container image intended for customer pull), or an SBOM trigger firing (SBOM + signing are typically adopted together).
Dependency provenance verification
Deferred: Verifying build-time provenance for third-party dependencies (e.g., npm audit signatures, verified attestations via slsa-verifier).
Why deferred: The alpha dependency surface is managed via uv.lock and pnpm-lock.yaml pinning; Dependabot surfaces known-vulnerable versions. Provenance verification is a layer on top that has cost without alpha-scoped benefit.
Trigger to reconsider: Concurrent with SBOM / signing adoption, or a supply-chain incident affecting the direct dependency set.
System card rendering — alpha-preview footer
Deferred: Production-grade system cards suitable for regulated-use attestation.
Why deferred: Alpha system cards are produced by a system that is itself under active construction; the evidence gathering, aggregation, and attribution pipelines are all alpha-quality. Using them for regulated-use decisions would overstate the maturity of the underlying evaluation.
What alpha ships instead: Every rendered WorldModelCard and AgentPerformanceCard carries a visible footer that
reads “Alpha preview — not for regulated use”. This is load-bearing: it is how we preserve the option to ship system
cards during alpha (for internal review, design-partner review, and narrative purposes) without implying a compliance
posture we do not have.
Trigger to reconsider: Footer removal is gated on (a) the evaluation pipeline maturing past alpha, (b) a compliance posture being formally attested, and (c) product / legal sign-off that the card can stand alone.
Execution: SPEC-303 (child of SPEC-243) enforces the footer at every render surface (in-product, PDF, HTML share) with an exact-string test asserting it.
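An exact-string footer check of the kind SPEC-303 describes could be as small as the following sketch. The renderer and checker are hypothetical stand-ins for the real render surfaces; only the footer text itself comes from this section, written with a Unicode escape so the dash character cannot be silently altered by an editor.

```python
# Exact footer text from this section; \u2014 is the dash between the two phrases.
ALPHA_FOOTER = "Alpha preview \u2014 not for regulated use"


def render_card_text(title: str, body: str) -> str:
    """Hypothetical plain-text render surface that always appends the footer."""
    return f"{title}\n\n{body}\n\n{ALPHA_FOOTER}"


def check_footer(rendered: str) -> None:
    """Fail loudly unless the rendered card ends with the exact footer string."""
    if not rendered.rstrip().endswith(ALPHA_FOOTER):
        raise AssertionError("alpha-preview footer missing or altered")


card = render_card_text("WorldModelCard", "Summary of evaluated rules.")
check_footer(card)
```

Comparing against one shared constant, instead of retyping the string per surface, keeps the in-product, PDF, and HTML checks from drifting apart.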
WorldModelCard signing — auditor verifiability
Deferred: Cryptographic signing of the published WorldModelCard and its EvaluationAuthorityRef.
Why deferred: The alpha trust model is operational, meaning auditors trust Spectral’s published URL plus the rule-count-by-tier verification documented on the system-card Codex page. Signing adds cryptographic independence that becomes load-bearing only when third-party auditors need to verify cards without the URL-trust assumption.
What alpha ships instead: The authority_ref is a plain opaque handle (UUID + deterministic hash of the version’s rule set); verification relies on the auditor trusting Spectral’s published URL.
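A deterministic hash of a rule set requires a canonical serialization, so that the same rules always produce the same digest regardless of ordering. The sketch below shows one way to do that; the rule shape (`id`, `tier`, `text`), the choice of SHA-256, and the `uuid:hash` handle format are all assumptions, since the actual authority_ref encoding is not specified here.

```python
import hashlib
import json
import uuid


def rule_set_hash(rules) -> str:
    """SHA-256 over a canonical JSON form: rules sorted by id, keys sorted, no whitespace."""
    canonical = json.dumps(
        sorted(rules, key=lambda r: r["id"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def authority_ref(version_id: uuid.UUID, rules) -> str:
    """Hypothetical opaque handle: version UUID plus the rule-set digest."""
    return f"{version_id}:{rule_set_hash(rules)}"


rules = [
    {"id": "r2", "tier": 2, "text": "Flag missing schedule"},
    {"id": "r1", "tier": 1, "text": "Reject negative income"},
]
ref = authority_ref(uuid.UUID("12345678-1234-5678-1234-567812345678"), rules)
```

Because the digest depends only on rule content, an auditor who obtains the rule set can recompute it, but still has to trust the publisher for the rule set itself, which is the gap that signing would close.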
Post-alpha shape: The WorldModelCard and its authority_ref are cryptographically signed (Sigstore / cosign). An external auditor can verify the signature independently against the public transparency log, without trusting the URL.
Trigger to reconsider: External auditors needing cryptographic verification beyond the URL-trust assumption — e.g., regulated-use attestation, third-party audit firms that don’t accept publisher trust as the basis for verification.
Distinct from: Supply chain — Sigstore signing covers signing release artifacts (build authenticity for customers); this entry covers signing published authority documents (document authenticity for external auditors). Same tool, different concerns.
Execution: SPEC-473 tracks the post-alpha workstream and carries the Codex-update AC for system-design/world-model-system/system-card.mdx.
Customer BYO credentials
Deferred: Customer-supplied credential storage for workspaces (LLM provider keys, third-party API tokens, workspace-scoped secrets the customer brings).
Why deferred: Alpha customers use Spectral-managed credentials. The customer surface during alpha does not require BYO; introducing it would add encryption, key-management, rotation, and onboarding-UX complexity without alpha-scoped benefit.
What alpha ships instead: Spectral-managed credentials only, provisioned via tools/provision/setup.sh and distributed through Render Environment Groups per the secrets-management page.
Post-alpha shape: Encrypted blobs land in a platform-scoped core.workspace_secrets table. Per-workspace access goes through a SECURITY DEFINER function that checks auth.uid() against workspace membership. Blobs are encrypted with AEAD, using workspace_id as the AAD. GCP KMS holds the master key once the BYOK feature ships (the dev environment never needs KMS).
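Binding workspace_id as AAD means a blob encrypted for one workspace fails authentication if it is ever read under another workspace's identity. The sketch below demonstrates that property with AES-GCM from the third-party `cryptography` package. The choice of AES-GCM, the nonce-prefixed blob layout, and the function names are assumptions; the key is generated locally here, whereas the post-alpha shape would source it from KMS.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the size recommended for AES-GCM


def seal_secret(key: bytes, workspace_id: str, plaintext: bytes) -> bytes:
    """Encrypt plaintext, authenticating workspace_id as associated data."""
    nonce = os.urandom(NONCE_LEN)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, workspace_id.encode("utf-8"))
    return nonce + ciphertext  # hypothetical blob layout: nonce || ciphertext+tag


def open_secret(key: bytes, workspace_id: str, blob: bytes) -> bytes:
    """Decrypt a blob; raises InvalidTag if workspace_id does not match."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, ciphertext, workspace_id.encode("utf-8"))


key = AESGCM.generate_key(bit_length=256)  # stand-in for a KMS-protected key
blob = seal_secret(key, "ws-a", b"provider-api-key")

try:
    open_secret(key, "ws-b", blob)  # wrong workspace: authentication fails
    cross_workspace_read = True
except InvalidTag:
    cross_workspace_read = False
```

The AAD check is a second line of defense behind the membership check: even if a row were returned to the wrong workspace, the blob would not decrypt.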
Trigger to reconsider: A customer needing to bring their own LLM provider keys, their own third-party API tokens, or workspace-scoped secrets that Spectral cannot or should not hold.
Execution: SPEC-474 tracks the post-alpha workstream and carries the Codex-update AC for reference/secrets-management.mdx.
Autonomy modes — second alpha wave
Deferred: The recommend and bounded_auto autonomy modes, the four-gate framework for bounded auto-acceptance, and the per-workspace kill switch.
Why deferred: The alpha first wave ships only observe_only and manual, the two modes that exercise the AutonomyMode enum without requiring gate evaluation, kill-switch persistence, or auto-acceptance pathways. The deferred modes plug into the same AutomatedAcceptanceService without changing its shape; deferring them keeps the alpha autonomy surface narrow.
What alpha ships instead: observe_only (no changeset created) and manual (changeset created, every accept routes to approval.required). Both modes use the same on_scan_completed handler that the deferred modes plug into post-alpha.
Post-alpha shape:
- recommend: semantically identical to manual, but distinguished by intent recorded on the changeset for analytics.
- bounded_auto: evaluates four gates against composite_score_snapshot (min_blended_delta, min_world_model_score, max_rules_affected, require_validated_changeset) and auto-transitions to accepted when all pass.
- Workspace-level kill switch: forces approval.required on every subsequent changeset regardless of configured mode.
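The bounded_auto gate check and the kill-switch override described above can be sketched as follows. The four gate names come from this section; everything else is an assumption for illustration, including the `ScoreSnapshot` fields, the inclusive threshold comparisons, and the fallback to approval.required when any gate fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Per-workspace thresholds for the four gates."""
    min_blended_delta: float
    min_world_model_score: float
    max_rules_affected: int
    require_validated_changeset: bool


@dataclass(frozen=True)
class ScoreSnapshot:
    """Hypothetical stand-in for composite_score_snapshot."""
    blended_delta: float
    world_model_score: float
    rules_affected: int
    changeset_validated: bool


def evaluate_gates(cfg: GateConfig, snap: ScoreSnapshot) -> dict:
    """Return a pass/fail result per gate so the outcome stays explainable."""
    return {
        "min_blended_delta": snap.blended_delta >= cfg.min_blended_delta,
        "min_world_model_score": snap.world_model_score >= cfg.min_world_model_score,
        "max_rules_affected": snap.rules_affected <= cfg.max_rules_affected,
        "require_validated_changeset": (
            snap.changeset_validated or not cfg.require_validated_changeset
        ),
    }


def next_state(cfg: GateConfig, snap: ScoreSnapshot, kill_switch: bool = False) -> str:
    """Kill switch wins; otherwise accept only when every gate passes."""
    if kill_switch:
        return "approval.required"
    if all(evaluate_gates(cfg, snap).values()):
        return "accepted"
    return "approval.required"  # assumed fallback when a gate fails


cfg = GateConfig(0.02, 0.80, 5, True)
state = next_state(cfg, ScoreSnapshot(0.05, 0.90, 3, True))
```

Returning the per-gate results as a dictionary, rather than a single boolean, leaves room to record which gate blocked an auto-accept.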
Trigger to reconsider: SPEC-234 enters the implementation queue.
Distinct from: System card distribution channels and the other deferred items. The autonomy-mode work is post-alpha, but the modes themselves all ship; only the gate-evaluation pathways and the kill switch are net-new behavior.
Execution: SPEC-234 tracks the post-alpha workstream and carries Codex-update ACs for system-design/optimization-engine.mdx and how-spectral-works.mdx.
Autonomy modes — post-launch ladder
Deferred: The final two rungs of the autonomy ladder, auto_test and guarded_auto.
Why deferred: Both modes require operational signals that don’t exist at alpha. auto_test needs a trust-baseline mechanism that can only be calibrated after the first-wave and second-wave modes have run in production across multiple workspaces. guarded_auto requires anomaly-driven rollback infrastructure plus the operational record auto_test produces; it cannot be built in parallel with auto_test.
What alpha + second wave ship instead: The four foundational modes (observe_only, manual, recommend, bounded_auto) plus the workspace kill switch. Together they cover the full alpha autonomy surface.
Post-launch shape: auto_test auto-accepts changesets classified as non-breaking that pass gate thresholds and defers anything classified as breaking or failing gates to approval.required. Risk classification (breaking vs non-breaking) is a hard prerequisite. guarded_auto is the terminal rung: it auto-accepts within configured policy guardrails with anomaly-driven automatic rollback, and hard-depends on auto_test having been operationally validated.
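The auto_test rule stated above reduces to a two-input decision, sketched here to show why risk classification is a hard prerequisite: without a `Risk` value there is nothing to branch on. The `Risk` enum and function name are hypothetical, and the gate result is passed in as a boolean rather than recomputed.

```python
from enum import Enum


class Risk(Enum):
    """Hypothetical risk classification attached to a changeset."""
    NON_BREAKING = "non_breaking"
    BREAKING = "breaking"


def auto_test_decision(risk: Risk, gates_passed: bool) -> str:
    """Auto-accept only non-breaking changesets that pass their gates."""
    if risk is Risk.NON_BREAKING and gates_passed:
        return "accepted"
    # Breaking changes and gate failures both defer to a human.
    return "approval.required"


decision = auto_test_decision(Risk.NON_BREAKING, gates_passed=True)
```

guarded_auto is intentionally not sketched: its rollback behavior depends on anomaly signals that this section only names.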
Trigger to reconsider: Operational experience with the alpha modes accumulates the trust-baseline data the post-launch modes calibrate against; product roadmap prioritization makes the case for investing in the risk-classification and rollback infrastructure they require.
Execution: SPEC-232 tracks auto_test; SPEC-233 tracks guarded_auto. Both carry Codex-update ACs (per SPEC-292’s precedent) for system-design/optimization-engine.mdx.
Trace-based prompt template inference
Deferred: Spectral inferring a customer’s prompt template structurally by pulling traces from OTEL spans, identifying conversation starts, and segmenting the structure (base prompt, context strategy, chain-of-thought, few-shot examples, upstream context injection).
Why deferred: Trace-based inference is the ideal onboarding path: low friction for the customer, high day-one value, and a meaningful technical differentiator. But the segmentation algorithm requires either the OtelTrace + sample curation infrastructure, which is itself alpha-deferred, or new infrastructure built specifically for this purpose. Alpha relies on the Stage 1 fallback path so onboarding can ship before either prerequisite lands.
What alpha ships instead: The Stage 1 fallback path — customers declare their prompt templates explicitly during onboarding. Workspace Agents are configured per-agent with the same structural metadata (base prompt, context selection, chain-of-thought, few-shot, upstream injection); only the discovery mechanism differs.
Post-alpha shape: Trace-based inference becomes the default onboarding path. The customer instruments their agent system with OTEL, Spectral pulls a window of traces, and the segmentation algorithm produces a draft prompt-template structure that the customer confirms or adjusts. Customer-declared templates remain a fallback for cases where inference fails or the customer prefers explicit declaration.
Trigger to reconsider: Refinement of SPEC-472 begins, either when the post-alpha workstream opens or earlier if a distinct customer signal demands inference-based onboarding ahead of the rest of the post-alpha capabilities.
Execution: SPEC-472 tracks the post-alpha workstream and carries the Codex-update AC for the Prompt Template primitive section in reference/primitives.mdx (relocated from system-design/index.mdx in the cluster B index split).