Embeddings
Embeddings feed four non-user-facing retrieval paths: T3 agent memory retrieval, rule-candidate similarity, world-model artifact search, and customer-trace similarity. All four are within-purpose queries; no cross-purpose vector comparison is needed today. Decision lineage lives in ADR-038; the operational playbook is at docs/runbooks/embeddings.md.
Canonical model
BAAI/bge-small-en-v1.5 at 384 dimensions, loaded in-process in workers (and in apps/api for query-time embedding) via FastEmbed.
- 33M params, ~120 MB footprint, Apache 2.0 licensed
- CPU inference 5–15 ms per embedding (ONNX-backed)
- Zero additional infrastructure; zero ongoing cost
- MTEB retrieval ~63
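For reference, the in-process load is minimal via FastEmbed; the texts below are placeholders, and the model downloads on first construction:

```python
# Minimal in-process embedding via FastEmbed (ONNX runtime, CPU).
from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
# embed() returns a generator of 384-dim numpy arrays, one per input text.
vectors = list(model.embed(["carrier onboarding form 1583", "rule R-204"]))
```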
A single canonical model serves every context and every purpose. The EmbeddingProfileResolver Protocol in spectral.core.embeddings.protocol is the resolution surface every context consumes; concrete implementations live in worlds and platform infrastructure with DB-session access and wire into their repositories and handlers via DI at framework-layer composition. Hard-coding model-ID literals in world or platform code is discouraged.
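Sketched below is one plausible shape for that surface; the resolve method name and the EmbeddingProfile fields are assumptions about the contract, not a copy of it:

```python
# Sketch only: resolve() and the EmbeddingProfile fields are assumptions
# about the shape of the contract, not the actual definitions.
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class EmbeddingProfile:
    provider: str       # e.g. "fastembed"
    model: str          # e.g. "BAAI/bge-small-en-v1.5"
    model_version: str
    dimension: int      # 384 for the canonical model


@runtime_checkable
class EmbeddingProfileResolver(Protocol):
    async def resolve(self, account_id: str) -> EmbeddingProfile:
        """Return the active embedding profile for an account."""
        ...
```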
Profile registry
core.embedding_profile carries one active row per account (partial unique index on account_id WHERE deactivated_at IS NULL):

id, account_id, provider, model, model_version, dimension, created_at, activated_at, deactivated_at

The table is append-only. Rotation sets deactivated_at on the previous row and inserts a new active one. The audit trail answers “which vectors belong to which profile.”
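Rotation fits in one transaction. A hedged sketch assuming asyncpg and the columns above; the function itself is illustrative, not the real rotation path:

```python
# Illustrative rotation sketch; assumes asyncpg and the columns listed above.
import asyncpg


async def rotate_profile(conn: asyncpg.Connection, account_id: str,
                         provider: str, model: str,
                         model_version: str, dimension: int) -> None:
    async with conn.transaction():
        # Append-only: deactivate the current row rather than mutating it.
        await conn.execute(
            """
            UPDATE core.embedding_profile
               SET deactivated_at = now()
             WHERE account_id = $1 AND deactivated_at IS NULL
            """,
            account_id,
        )
        # The partial unique index admits exactly one active row per account.
        await conn.execute(
            """
            INSERT INTO core.embedding_profile
                (account_id, provider, model, model_version, dimension,
                 created_at, activated_at)
            VALUES ($1, $2, $3, $4, $5, now(), now())
            """,
            account_id, provider, model, model_version, dimension,
        )
```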
EmbeddingProvider protocol
spectral.core.embeddings.protocol.EmbeddingProvider:

```python
from typing import Protocol, runtime_checkable

# EmbeddingProfile and Embedding are defined alongside the protocol.


@runtime_checkable
class EmbeddingProvider(Protocol):
    async def embed(
        self, texts: list[str], *, profile: EmbeddingProfile
    ) -> list[Embedding]: ...
```

Concrete implementations live in worlds and platform infrastructure: InProcessFastEmbedProvider today; TEIProvider, GeminiProvider, and OpenAIProvider are reserved for the upgrade ladder. The TenantScopedEmbeddingProvider wrapper applies the LLM platform rate-limit and budget envelope.
Each batch embedding call writes one core.llm_usage row with purpose=EMBEDDING and the caller-determined content_class. In-process embedding means PLATFORM content never leaves the worker.
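As an illustration of where that row gets written, a wrapper in the spirit of TenantScopedEmbeddingProvider could decorate the provider; the record_usage callable and its keyword names are assumptions, not the real core.llm_usage writer:

```python
# Illustrative decorator; record_usage and its keyword names are assumptions.
from spectral.core.embeddings.protocol import EmbeddingProfile, EmbeddingProvider


class UsageRecordingProvider:
    """Wraps an EmbeddingProvider; one core.llm_usage row per batch call."""

    def __init__(self, inner: EmbeddingProvider, record_usage,
                 content_class: str) -> None:
        self._inner = inner
        self._record = record_usage          # hypothetical core.llm_usage writer
        self._content_class = content_class  # caller-determined

    async def embed(self, texts: list[str], *, profile: EmbeddingProfile):
        result = await self._inner.embed(texts, profile=profile)
        await self._record(purpose="EMBEDDING",
                           content_class=self._content_class,
                           model=profile.model,
                           batch_size=len(texts))
        return result
```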
Hybrid retrieval (RRF)
Every retrievable table carries both a vector(<dim>) column (semantic, HNSW-indexed) and a tsvector column (lexical, GIN-indexed). Retrieval helpers in spectral.core.embeddings.retrieval fuse the two via Reciprocal Rank Fusion with k=60. Built on vanilla Postgres FTS plus pgvector — zero additional extensions.
Pure vector similarity misses exact matches on domain vocabulary (rule IDs, form codes, error strings); RRF consistently outperforms single-method retrieval.
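The fusion step itself is small; below is a minimal sketch using the stated k=60. The document IDs and the shape of the two ranked lists are placeholders; the actual helpers live in spectral.core.embeddings.retrieval.

```python
# Minimal RRF sketch; the production helpers in
# spectral.core.embeddings.retrieval may differ in shape.
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Placeholder inputs: one ranking from pgvector cosine distance,
# one from Postgres FTS ts_rank.
semantic = ["doc3", "doc1", "doc7"]
lexical = ["doc1", "doc9", "doc3"]
print(rrf_fuse([semantic, lexical]))  # doc1 first: it ranks high in both lists
```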
Retrievable-table convention
Every retrievable table follows the same shape:
- embedding <vector|halfvec>(<dim>) + HNSW index on vector_cosine_ops
- embedding_model TEXT NOT NULL
- embedding_model_version TEXT NOT NULL
- embedding_dim INT NOT NULL
- source_content_hash TEXT NULL (re-embed skip-if-unchanged)
- search_tsv tsvector, generated from relevant text columns + GIN index
- search_lang TEXT DEFAULT 'english'
HNSW defaults: m=16, ef_construction=64 at build; tune ef_search per query. At alpha scale (≤ 1M rows), 4–8 GB of maintenance_work_mem at build time is fine.
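Concretely, those settings map to DDL like the following; the artifact table name is a placeholder, and psycopg is assumed for illustration:

```python
# Placeholder DDL matching the convention above; table name is illustrative.
import psycopg

with psycopg.connect("dbname=spectral") as conn:
    # Alpha-scale build budget for the HNSW graph.
    conn.execute("SET maintenance_work_mem = '8GB'")
    conn.execute(
        """
        CREATE INDEX ON artifact
            USING hnsw (embedding vector_cosine_ops)
            WITH (m = 16, ef_construction = 64)
        """
    )
    # ef_search trades recall for latency and is set per session/query.
    conn.execute("SET hnsw.ef_search = 100")
```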
Re-embedding lifecycle
Profile rotation is blue-green and event-driven:
- Migration adds an embedding_v2 vector(<new_dim>) column.
- The embedding_profile_rotated domain event (typed payload at spectral.platform.contracts.events.embedding_profile_rotated per ADR-065 D2) triggers a backfill worker.
- The worker re-embeds source content in batches into the new column (sketched after this list).
- A feature-flagged read path flips when the backfill hits 100%.
- A follow-up migration drops the old column and index.
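A hedged sketch of the backfill step, assuming asyncpg with a registered pgvector codec; the artifact table, source_text column, and batch size are placeholders:

```python
# Placeholder backfill loop; assumes pgvector's asyncpg codec is registered
# on the connection (pgvector.asyncpg.register_vector).
import asyncpg

from spectral.core.embeddings.protocol import EmbeddingProfile, EmbeddingProvider

BATCH = 256


async def backfill(conn: asyncpg.Connection, provider: EmbeddingProvider,
                   new_profile: EmbeddingProfile) -> None:
    while True:
        rows = await conn.fetch(
            "SELECT id, source_text FROM artifact "
            "WHERE embedding_v2 IS NULL ORDER BY id LIMIT $1",
            BATCH,
        )
        if not rows:
            break  # 100% backfilled; the feature-flagged read path can flip
        vectors = await provider.embed(
            [r["source_text"] for r in rows], profile=new_profile
        )
        await conn.executemany(
            "UPDATE artifact SET embedding_v2 = $2 WHERE id = $1",
            list(zip((r["id"] for r in rows), vectors)),
        )
```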
Upgrade ladder
Each step is a re-embedding job, not a re-architecture:
- Quality insufficient → upgrade to BAAI/bge-large-en-v1.5 (335M params, 1024-dim, ~1.3 GB; in-process if worker RAM allows).
- Worker RAM pressure OR embedding volume outpaces in-process → deploy a huggingface/text-embeddings-inference sidecar on Cloud Run CPU.
- Quality demands frontier OR an enterprise DPA demands a named provider → swap to a cloud API (Gemini / OpenAI).
All three steps swap the provider under the EmbeddingProvider protocol, re-embed via EmbeddingProfileRotated, and cut over.
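For illustration, step two could look like the sketch below. The /embed endpoint and {"inputs": ...} payload follow text-embeddings-inference's documented API, but treat the details as assumptions to verify against the deployed sidecar:

```python
# Sketch of a sidecar-backed provider satisfying the same protocol;
# verify the TEI endpoint shape against the deployed version.
import httpx

from spectral.core.embeddings.protocol import EmbeddingProfile


class TEIProvider:
    """EmbeddingProvider backed by a text-embeddings-inference sidecar."""

    def __init__(self, base_url: str) -> None:
        self._client = httpx.AsyncClient(base_url=base_url)

    async def embed(self, texts: list[str], *,
                    profile: EmbeddingProfile) -> list[list[float]]:
        resp = await self._client.post("/embed", json={"inputs": texts})
        resp.raise_for_status()
        return resp.json()  # one vector per input, in input order
```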
See also
- ADR-038 — decision lineage
- LLM platform — embedding purpose; rate-limit + budget
- World Agent memory — retrievable consumer
- Operations Agent memory — retrievable consumer
- docs/runbooks/embeddings.md — rotation + upgrade-ladder playbook