Embeddings
Embeddings feed three non-user-facing retrieval paths: T3 agent memory retrieval, rule-candidate similarity, and world-model artifact search. All three are within-purpose queries; no cross-purpose vector comparison is needed today. Decision lineage in ADR-038; operational playbook at docs/runbooks/embeddings.md.
Canonical model
Section titled “Canonical model”BAAI/bge-small-en-v1.5 at 384-dim, loaded in-process in workers (and in apps/api for query-time embedding) via FastEmbed.
- 33M params, ~120 MB footprint, Apache 2.0 licensed
- CPU inference 5–15 ms per embedding (ONNX-backed)
- Zero additional infrastructure; zero ongoing cost
- MTEB retrieval ~63
Single canonical model across every context and every purpose. The EmbeddingProfileResolver Protocol in spectral.core.embeddings.protocol is the resolution surface every context consumes; its concrete implementations are DB-coupled (they read profile rows) and so live in worlds infrastructure, wiring into repositories + handlers via DI at framework-layer composition. Direct model-ID literals in worlds or platform code are a discouraged pattern.
Profile registry
Section titled “Profile registry”core.embedding_profile carries one active row per org (partial unique index on org_id WHERE deactivated_at IS NULL):
id, org_id, provider, model, model_version, dimension,created_at, activated_at, deactivated_atAppend-only. Rotation sets deactivated_at on the previous row and inserts a new active one. The audit trail answers “which vectors belong to which profile.”
EmbeddingProvider protocol
Section titled “EmbeddingProvider protocol”spectral.core.embeddings.protocol.EmbeddingProvider:
@runtime_checkableclass EmbeddingProvider(Protocol): async def embed(self, texts: list[str], *, profile: EmbeddingProfile) -> list[Embedding]: ...The bare EmbeddingProvider concretion is context-agnostic — it embeds text and knows nothing of any context’s tables — so it lives in the core infra zone (spectral.core.embeddings.infrastructure.fastembed) per ADR-099: InProcessFastEmbedProvider today; TEIProvider, GeminiProvider, OpenAIProvider reserved for upgrade. The DB-coupled machinery that uses it — the profile-resolver and hybrid-retriever — stays in worlds infrastructure. The TenantScopedEmbeddingProvider wrapper applies the LLM platform rate-limit and budget envelope.
Each batch embedding call writes one core.llm_usage row with purpose=EMBEDDING and the caller-determined content_class. In-process embedding means PLATFORM content never leaves the worker.
Hybrid retrieval (RRF)
Section titled “Hybrid retrieval (RRF)”Every retrievable table carries both a vector(<dim>) column (semantic, HNSW-indexed) and a tsvector column (lexical, GIN-indexed). Retrieval helpers in spectral.core.embeddings.retrieval fuse via Reciprocal Rank Fusion with k=60. Built on vanilla Postgres FTS plus pgvector — zero additional extensions.
Pure vector similarity misses exact matches on domain vocabulary (rule IDs, form codes, error strings); RRF consistently outperforms single-method retrieval.
Retrievable-table convention
Section titled “Retrievable-table convention”Every retrievable table follows the same shape:
embedding <vector|halfvec>(<dim>)+ HNSW index onvector_cosine_opsembedding_model TEXT NOT NULLembedding_model_version TEXT NOT NULLembedding_dim INT NOT NULLsource_content_hash TEXT NULL(re-embed skip-if-unchanged)search_tsv tsvectorgenerated from relevant text columns + GIN indexsearch_lang TEXT DEFAULT 'english'
HNSW defaults: m=16, ef_construction=64 at build; tune ef_search per query. At ≤ 1M rows, 4–8 GB maintenance_work_mem at build is fine.
Re-embedding lifecycle
Section titled “Re-embedding lifecycle”Profile rotation is blue-green and event-driven:
- Migration adds
embedding_v2 vector(<new_dim>)column. - The
embedding_profile_rotateddomain event (typed payload atspectral.platform.contracts.events.embedding_profile_rotatedper ADR-065 D2) triggers a backfill worker. - Worker re-embeds source content in batches into the new column.
- Feature-flagged read path flips when backfill hits 100%.
- Follow-up migration drops the old column and index.
Upgrade ladder
Section titled “Upgrade ladder”Each step is a re-embedding job, not a re-architecture:
- Quality insufficient → upgrade to
BAAI/bge-large-en-v1.5(335M params, 1024-dim, ~1.3 GB; in-process if worker RAM allows). - Worker RAM pressure OR embedding volume outpaces in-process → run
huggingface/text-embeddings-inferenceas a dedicated Cloudflare Container, off the in-process worker (one compute vendor per ADR-109). - Quality demands frontier OR enterprise DPA demands a named provider → swap to a cloud API (Gemini / OpenAI).
All three steps swap provider under the EmbeddingProvider protocol, re-embed via EmbeddingProfileRotated, cut over.
See also
Section titled “See also”- ADR-038 — decision lineage
- LLM platform —
embeddingpurpose; rate-limit + budget - World Agent memory — retrievable consumer
docs/runbooks/embeddings.md— rotation + upgrade-ladder playbook