Embeddings
Embeddings feed four non-user-facing retrieval paths: T3 agent memory retrieval, rule-candidate similarity, world-model artifact search, and customer-trace similarity. All four are within-purpose queries; no cross-purpose vector comparison is needed today. Decision lineage lives in ADR-038; the operational playbook is at docs/runbooks/embeddings.md.
Canonical model
BAAI/bge-small-en-v1.5 at 384 dimensions, loaded in-process in workers (and in apps/api for query-time embedding) via FastEmbed.
- 33M params, ~120 MB footprint, Apache 2.0 licensed
- CPU inference 5–15 ms per embedding (ONNX-backed)
- Zero additional infrastructure; zero ongoing cost
- MTEB retrieval ~63
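For reference, the in-process load is minimal via FastEmbed; the texts below are placeholders, and the model downloads on first construction:

```python
# Minimal in-process embedding via FastEmbed (ONNX runtime, CPU).
from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
# embed() returns a generator of 384-dim numpy arrays, one per input text.
vectors = list(model.embed(["carrier onboarding form 1583", "rule R-204"]))
```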
A single canonical model serves every context and every purpose. The EmbeddingProfileResolver Protocol in spectral.core.embeddings.protocol is the resolution surface every context consumes; concrete implementations live in worlds and platform infrastructure with DB-session access and wire into their repositories and handlers via DI at framework-layer composition. Hard-coding model-ID literals in world or platform code is discouraged.
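Sketched below is one plausible shape for that surface; the resolve method name and the EmbeddingProfile fields are assumptions about the contract, not a copy of it:

```python
# Sketch only: resolve() and the EmbeddingProfile fields are assumptions
# about the shape of the contract, not the actual definitions.
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class EmbeddingProfile:
    provider: str       # e.g. "fastembed"
    model: str          # e.g. "BAAI/bge-small-en-v1.5"
    model_version: str
    dimension: int      # 384 for the canonical model


@runtime_checkable
class EmbeddingProfileResolver(Protocol):
    async def resolve(self, account_id: str) -> EmbeddingProfile:
        """Return the active embedding profile for an account."""
        ...
```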
Profile registry
core.embedding_profile carries one active row per account (partial unique index on account_id WHERE deactivated_at IS NULL):

id, account_id, provider, model, model_version, dimension, created_at, activated_at, deactivated_at

The table is append-only. Rotation sets deactivated_at on the previous row and inserts a new active one. The audit trail answers “which vectors belong to which profile.”
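Rotation fits in one transaction. A hedged sketch assuming asyncpg and the columns above; the function itself is illustrative, not the real rotation path:

```python
# Illustrative rotation sketch; assumes asyncpg and the columns listed above.
import asyncpg


async def rotate_profile(conn: asyncpg.Connection, account_id: str,
                         provider: str, model: str,
                         model_version: str, dimension: int) -> None:
    async with conn.transaction():
        # Append-only: deactivate the current row rather than mutating it.
        await conn.execute(
            """
            UPDATE core.embedding_profile
               SET deactivated_at = now()
             WHERE account_id = $1 AND deactivated_at IS NULL
            """,
            account_id,
        )
        # The partial unique index admits exactly one active row per account.
        await conn.execute(
            """
            INSERT INTO core.embedding_profile
                (account_id, provider, model, model_version, dimension,
                 created_at, activated_at)
            VALUES ($1, $2, $3, $4, $5, now(), now())
            """,
            account_id, provider, model, model_version, dimension,
        )
```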
EmbeddingProvider protocol
spectral.core.embeddings.protocol.EmbeddingProvider:

```python
from typing import Protocol, runtime_checkable

# EmbeddingProfile and Embedding are defined alongside the protocol.


@runtime_checkable
class EmbeddingProvider(Protocol):
    async def embed(
        self, texts: list[str], *, profile: EmbeddingProfile
    ) -> list[Embedding]: ...
```

Concrete implementations live in worlds and platform infrastructure: InProcessFastEmbedProvider today; TEIProvider, GeminiProvider, and OpenAIProvider are reserved for the upgrade ladder. The TenantScopedEmbeddingProvider wrapper applies the LLM platform rate-limit and budget envelope.
Each batch embedding call writes one core.llm_usage row with purpose=EMBEDDING and the caller-determined content_class. In-process embedding means PLATFORM content never leaves the worker.
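As an illustration of where that row gets written, a wrapper in the spirit of TenantScopedEmbeddingProvider could decorate the provider; the record_usage callable and its keyword names are assumptions, not the real core.llm_usage writer:

```python
# Illustrative decorator; record_usage and its keyword names are assumptions.
from spectral.core.embeddings.protocol import EmbeddingProfile, EmbeddingProvider


class UsageRecordingProvider:
    """Wraps an EmbeddingProvider; one core.llm_usage row per batch call."""

    def __init__(self, inner: EmbeddingProvider, record_usage,
                 content_class: str) -> None:
        self._inner = inner
        self._record = record_usage          # hypothetical core.llm_usage writer
        self._content_class = content_class  # caller-determined

    async def embed(self, texts: list[str], *, profile: EmbeddingProfile):
        result = await self._inner.embed(texts, profile=profile)
        await self._record(purpose="EMBEDDING",
                           content_class=self._content_class,
                           model=profile.model,
                           batch_size=len(texts))
        return result
```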
Hybrid retrieval (RRF)
Every retrievable table carries both a vector(<dim>) column (semantic, HNSW-indexed) and a tsvector column (lexical, GIN-indexed). Retrieval helpers in spectral.core.embeddings.retrieval fuse the two via Reciprocal Rank Fusion with k=60. Built on vanilla Postgres FTS plus pgvector — zero additional extensions.
Pure vector similarity misses exact matches on domain vocabulary (rule IDs, form codes, error strings); RRF consistently outperforms single-method retrieval.
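The fusion step itself is small; below is a minimal sketch using the stated k=60. The document IDs and the shape of the two ranked lists are placeholders; the actual helpers live in spectral.core.embeddings.retrieval.

```python
# Minimal RRF sketch; the production helpers in
# spectral.core.embeddings.retrieval may differ in shape.
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Placeholder inputs: one ranking from pgvector cosine distance,
# one from Postgres FTS ts_rank.
semantic = ["doc3", "doc1", "doc7"]
lexical = ["doc1", "doc9", "doc3"]
print(rrf_fuse([semantic, lexical]))  # doc1 first: it ranks high in both lists
```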
Retrievable-table convention
Every retrievable table follows the same shape:
- embedding <vector|halfvec>(<dim>) + HNSW index on vector_cosine_ops
- embedding_model TEXT NOT NULL
- embedding_model_version TEXT NOT NULL
- embedding_dim INT NOT NULL
- source_content_hash TEXT NULL (re-embed skip-if-unchanged)
- search_tsv tsvector, generated from relevant text columns + GIN index
- search_lang TEXT DEFAULT 'english'
HNSW defaults: m=16, ef_construction=64 at build; tune ef_search per query. At alpha scale (≤ 1M rows), 4–8 GB of maintenance_work_mem at build time is fine.
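Concretely, those settings map to DDL like the following; the artifact table name is a placeholder, and psycopg is assumed for illustration:

```python
# Placeholder DDL matching the convention above; table name is illustrative.
import psycopg

with psycopg.connect("dbname=spectral") as conn:
    # Alpha-scale build budget for the HNSW graph.
    conn.execute("SET maintenance_work_mem = '8GB'")
    conn.execute(
        """
        CREATE INDEX ON artifact
            USING hnsw (embedding vector_cosine_ops)
            WITH (m = 16, ef_construction = 64)
        """
    )
    # ef_search trades recall for latency and is set per session/query.
    conn.execute("SET hnsw.ef_search = 100")
```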
Re-embedding lifecycle
Profile rotation is blue-green and event-driven:
- Migration adds an embedding_v2 vector(<new_dim>) column.
- The embedding_profile_rotated domain event (typed payload at spectral.platform.contracts.events.embedding_profile_rotated per ADR-065 D2) triggers a backfill worker.
- The worker re-embeds source content in batches into the new column (sketched after this list).
- A feature-flagged read path flips when the backfill hits 100%.
- A follow-up migration drops the old column and index.
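A hedged sketch of the backfill step, assuming asyncpg with a registered pgvector codec; the artifact table, source_text column, and batch size are placeholders:

```python
# Placeholder backfill loop; assumes pgvector's asyncpg codec is registered
# on the connection (pgvector.asyncpg.register_vector).
import asyncpg

from spectral.core.embeddings.protocol import EmbeddingProfile, EmbeddingProvider

BATCH = 256


async def backfill(conn: asyncpg.Connection, provider: EmbeddingProvider,
                   new_profile: EmbeddingProfile) -> None:
    while True:
        rows = await conn.fetch(
            "SELECT id, source_text FROM artifact "
            "WHERE embedding_v2 IS NULL ORDER BY id LIMIT $1",
            BATCH,
        )
        if not rows:
            break  # 100% backfilled; the feature-flagged read path can flip
        vectors = await provider.embed(
            [r["source_text"] for r in rows], profile=new_profile
        )
        await conn.executemany(
            "UPDATE artifact SET embedding_v2 = $2 WHERE id = $1",
            list(zip((r["id"] for r in rows), vectors)),
        )
```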
Upgrade ladder
Each step is a re-embedding job, not a re-architecture:
- Quality insufficient → upgrade to BAAI/bge-large-en-v1.5 (335M params, 1024-dim, ~1.3 GB; in-process if worker RAM allows).
- Worker RAM pressure OR embedding volume outpaces in-process → deploy a huggingface/text-embeddings-inference sidecar on Cloud Run CPU.
- Quality demands frontier OR an enterprise DPA demands a named provider → swap to a cloud API (Gemini / OpenAI).
All three steps swap the provider under the EmbeddingProvider protocol, re-embed via EmbeddingProfileRotated, and cut over.
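For illustration, step two could look like the sketch below. The /embed endpoint and {"inputs": ...} payload follow text-embeddings-inference's documented API, but treat the details as assumptions to verify against the deployed sidecar:

```python
# Sketch of a sidecar-backed provider satisfying the same protocol;
# verify the TEI endpoint shape against the deployed version.
import httpx

from spectral.core.embeddings.protocol import EmbeddingProfile


class TEIProvider:
    """EmbeddingProvider backed by a text-embeddings-inference sidecar."""

    def __init__(self, base_url: str) -> None:
        self._client = httpx.AsyncClient(base_url=base_url)

    async def embed(self, texts: list[str], *,
                    profile: EmbeddingProfile) -> list[list[float]]:
        resp = await self._client.post("/embed", json={"inputs": texts})
        resp.raise_for_status()
        return resp.json()  # one vector per input, in input order
```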
See also
- ADR-038 — decision lineage
- LLM platform — embedding purpose; rate-limit + budget
- World Agent memory — retrievable consumer
- Operations Agent memory — retrievable consumer
- docs/runbooks/embeddings.md — rotation + upgrade-ladder playbook