Embeddings

Embeddings feed four non-user-facing retrieval paths: T3 agent memory retrieval, rule-candidate similarity, world-model artifact search, and customer-trace similarity. All four are within-purpose queries; no cross-purpose vector comparison is needed today. Decision lineage in ADR-038; operational playbook at docs/runbooks/embeddings.md.


BAAI/bge-small-en-v1.5 at 384-dim, loaded in-process in workers (and in apps/api for query-time embedding) via FastEmbed.

  • 33M params, ~120 MB footprint, Apache 2.0 licensed
  • CPU inference 5–15 ms per embedding (ONNX-backed)
  • Zero additional infrastructure; zero ongoing cost
  • MTEB retrieval ~63

Single canonical model across every context and every purpose. The EmbeddingProfileResolver Protocol in spectral.core.embeddings.protocol is the resolution surface every context consumes; concrete implementations live in worlds and platform infrastructure with DB-session access and wire into their repositories + handlers via DI at framework-layer composition. Direct model-ID literals in worlds or platform code are a discouraged pattern.
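As an illustration of the resolution surface, here is a minimal sketch of a resolver implementation. The `resolve` method name, the `EmbeddingProfile` field set, and `StaticProfileResolver` are assumptions for this sketch, not the actual protocol in spectral.core.embeddings.protocol; a real implementation would read the active core.embedding_profile row via its DB session.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class EmbeddingProfile:
    # Field names mirror the core.embedding_profile columns (subset).
    provider: str
    model: str
    model_version: str
    dimension: int


@runtime_checkable
class EmbeddingProfileResolver(Protocol):
    # Hypothetical method name for this sketch.
    async def resolve(self, account_id: str) -> EmbeddingProfile: ...


class StaticProfileResolver:
    """Illustrative resolver returning the canonical default model.

    The real implementations live in worlds/platform infrastructure and
    are wired in via DI at framework-layer composition.
    """

    async def resolve(self, account_id: str) -> EmbeddingProfile:
        return EmbeddingProfile(
            provider="fastembed",
            model="BAAI/bge-small-en-v1.5",
            model_version="1.5",
            dimension=384,
        )
```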


core.embedding_profile carries one active row per account (partial unique index on account_id WHERE deactivated_at IS NULL):

id, account_id, provider, model, model_version, dimension,
created_at, activated_at, deactivated_at

Append-only. Rotation sets deactivated_at on the previous row and inserts a new active one. The audit trail answers “which vectors belong to which profile.”


spectral.core.embeddings.protocol.EmbeddingProvider:

from typing import Protocol, runtime_checkable

@runtime_checkable
class EmbeddingProvider(Protocol):
    async def embed(self, texts: list[str], *, profile: EmbeddingProfile) -> list[Embedding]: ...

Concrete implementations live in worlds and platform infrastructure: InProcessFastEmbedProvider today; TEIProvider, GeminiProvider, OpenAIProvider reserved for upgrade. TenantScopedEmbeddingProvider wrapper applies the LLM platform rate-limit and budget envelope.
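Any object with a conforming async `embed` satisfies the protocol, which is what makes the provider swap in the upgrade ladder cheap. A deterministic test double, assuming a `list[float]` stand-in for the real `Embedding` type (the actual InProcessFastEmbedProvider wraps FastEmbed's ONNX runtime, not a hash):

```python
import hashlib
from typing import Protocol, runtime_checkable

Embedding = list[float]  # stand-in alias for the real type in spectral.core


@runtime_checkable
class EmbeddingProvider(Protocol):
    async def embed(self, texts: list[str], *, profile: object) -> list[Embedding]: ...


class StubEmbeddingProvider:
    """Deterministic test double: one fixed-width pseudo-vector per text."""

    def __init__(self, dimension: int = 384) -> None:
        self.dimension = dimension

    async def embed(self, texts: list[str], *, profile: object) -> list[Embedding]:
        out: list[Embedding] = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Tile the 32-byte digest out to the profile dimension.
            tiled = (digest * (self.dimension // len(digest) + 1))[: self.dimension]
            out.append([b / 255 for b in tiled])
        return out
```

Because the Protocol is `@runtime_checkable`, DI wiring can assert `isinstance(provider, EmbeddingProvider)` at composition time.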

Each batch embedding call writes one core.llm_usage row with purpose=EMBEDDING and the caller-determined content_class. In-process embedding means PLATFORM content never leaves the worker.


Every retrievable table carries both a vector(<dim>) column (semantic, HNSW-indexed) and a tsvector column (lexical, GIN-indexed). Retrieval helpers in spectral.core.embeddings.retrieval fuse via Reciprocal Rank Fusion with k=60. Built on vanilla Postgres FTS plus pgvector — zero additional extensions.

Pure vector similarity misses exact matches on domain vocabulary (rule IDs, form codes, error strings); RRF consistently outperforms single-method retrieval.
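The fusion itself is small. A sketch of RRF with the document's k=60, fusing one semantic and one lexical ranking (the helper name `rrf_fuse` is illustrative; the real helpers live in spectral.core.embeddings.retrieval):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).

    Each ranking is one retrieval method's result list (e.g. pgvector cosine
    search, Postgres FTS), best match first, with 1-based ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists (e.g. an exact rule-ID hit that is also semantically close) accumulates score from each, which is how RRF surfaces the exact matches pure vector similarity misses.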


Every retrievable table follows the same shape:

  • embedding <vector|halfvec>(<dim>) + HNSW index on vector_cosine_ops
  • embedding_model TEXT NOT NULL
  • embedding_model_version TEXT NOT NULL
  • embedding_dim INT NOT NULL
  • source_content_hash TEXT NULL (lets re-embedding skip rows whose content is unchanged)
  • search_tsv tsvector generated from relevant text columns + GIN index
  • search_lang TEXT DEFAULT 'english'

HNSW defaults: m=16, ef_construction=64 at build; tune ef_search per query. At alpha scale (≤ 1M rows), 4–8 GB of maintenance_work_mem at build time is fine.


Profile rotation is blue-green and event-driven:

  1. Migration adds embedding_v2 vector(<new_dim>) column.
  2. The embedding_profile_rotated domain event (typed payload at spectral.platform.contracts.events.embedding_profile_rotated per ADR-065 D2) triggers a backfill worker.
  3. Worker re-embeds source content in batches into the new column.
  4. Feature-flagged read path flips when backfill hits 100%.
  5. Follow-up migration drops the old column and index.
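The backfill in step 3 can be sketched as a batch loop. This is a synchronous, in-memory stand-in: `rows` and `embed_batch` substitute for the repository and the new-profile EmbeddingProvider, and a dict key stands in for the embedding_v2 column; the real worker is async and event-driven.

```python
from typing import Callable, Iterable


def backfill_embeddings(
    rows: Iterable[dict],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 64,
) -> int:
    """Re-embed source content in batches into the new column, leaving the
    old column untouched until the read path flips."""
    done = 0
    batch: list[dict] = []

    def flush() -> None:
        nonlocal done
        if not batch:
            return
        vectors = embed_batch([r["source_text"] for r in batch])
        for row, vec in zip(batch, vectors):
            row["embedding_v2"] = vec  # write to the new column only
        done += len(batch)
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush()
    flush()  # final partial batch
    return done
```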

Each step is a re-embedding job, not a re-architecture:

  1. Quality insufficient → upgrade to BAAI/bge-large-en-v1.5 (335M params, 1024-dim, ~1.3 GB; in-process if worker RAM allows).
  2. Worker RAM pressure OR embedding volume outpaces in-process → deploy huggingface/text-embeddings-inference sidecar on Cloud Run CPU.
  3. Quality demands frontier OR enterprise DPA demands a named provider → swap to a cloud API (Gemini / OpenAI).

All three upgrades swap the provider behind the EmbeddingProvider protocol, re-embed via the embedding_profile_rotated rotation flow, and cut over.