Embeddings runbook
Operational procedures for the embedding profile registry, in-process model lifecycle, and the upgrade ladder.
System reference: Codex system-design/embeddings.mdx · ADR-038.
## Profile registry

The active profile for an account is the single row in `core.embedding_profile` with `deactivated_at IS NULL`; a partial unique index enforces at most one active row per account.
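The enforcing index is not reproduced in this runbook; a minimal sketch of what such a partial unique index looks like (the index name is illustrative, not the migration's actual name):

```sql
-- At most one active (deactivated_at IS NULL) profile per account.
-- Uniqueness applies only to rows matching the WHERE predicate.
CREATE UNIQUE INDEX embedding_profile_one_active_per_account
    ON core.embedding_profile (account_id)
    WHERE deactivated_at IS NULL;
```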
### Inspect active profile
```sql
SELECT id, account_id, provider, model, model_version, dimension, activated_at
FROM core.embedding_profile
WHERE account_id = $1 AND deactivated_at IS NULL;
```

### Seed the first account
When a new account onboards, insert one active row referencing the canonical profile:
```sql
INSERT INTO core.embedding_profile (account_id, provider, model, model_version, dimension, activated_at)
VALUES ($1, 'fastembed', 'BAAI/bge-small-en-v1.5', '1', 384, now());
```

The `EmbeddingProfileResolver` Protocol in `spectral.core.embeddings.protocol` is the read surface for this table; concrete implementations in each context's infrastructure layer query the active row keyed on `account_id`.
## Profile rotation (blue-green)
Triggered by an upgrade per the ladder below. Sequence:
1. A migration adds an `embedding_v2 vector(<new_dim>)` column to every retrievable table.
2. Insert a new active row in `core.embedding_profile` for the target model. Set `deactivated_at = now()` on the previous row.
3. Publish the `EmbeddingProfileRotated` event. The backfill worker re-embeds source content in batches into `embedding_v2`.
4. Watch `core.llm_usage` (filtered by `purpose='embedding'`) for backfill cost.
5. When the backfill hits 100% for a retrievable table, flip the feature flag for the read path.
6. A follow-up migration drops the old column and its HNSW index.
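Steps 1–2 above can be sketched as SQL, assuming a Step 1 upgrade to the 1024-dim `bge-large` model; the retrievable table name `core.document` is illustrative, not confirmed schema:

```sql
-- Step 1: add the new-dimension column alongside the old one.
ALTER TABLE core.document ADD COLUMN embedding_v2 vector(1024);

-- Step 2: rotate the profile rows in one transaction so the partial
-- unique index never sees two active rows for the account.
BEGIN;
UPDATE core.embedding_profile
   SET deactivated_at = now()
 WHERE account_id = $1 AND deactivated_at IS NULL;
INSERT INTO core.embedding_profile (account_id, provider, model, model_version, dimension, activated_at)
VALUES ($1, 'fastembed', 'BAAI/bge-large-en-v1.5', '1', 1024, now());
COMMIT;
```

Ordering the `UPDATE` before the `INSERT` matters: the partial unique index rejects a second active row, so the old row must be deactivated first.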
## Upgrade ladder
Each step is a re-embedding job, not a re-architecture. Triggered by:
| Trigger | Action |
|---|---|
| Retrieval quality insufficient at alpha dogfooding | Step 1 — upgrade to BAAI/bge-large-en-v1.5 (1024-dim, ~1.3 GB; in-process if worker RAM allows) |
| Worker memory pressure OR embedding volume outpaces in-process | Step 2 — deploy huggingface/text-embeddings-inference sidecar on Cloud Run CPU; swap EmbeddingProvider impl |
| Quality demands frontier OR enterprise DPA names a provider | Step 3 — swap to a cloud API (Gemini gemini-embedding-001 preferred; OpenAI text-embedding-3-large if non-GCP) |
Each step uses the same blue-green rotation flow above.
## Worker memory check
In-process BGE-small adds ~120 MB per worker; track it via Render metrics. The worker subdivision trigger fires if a future canonical model upgrade pushes the in-process footprint past ~500 MB.
## Cost monitoring
Embedding calls write `core.llm_usage` rows with `purpose='embedding'`. Per-workspace daily cost comes from the `core.llm_usage_daily` rollup. Alpha-stage in-process embeddings are zero-cost; cost monitoring matters once Step 2 or Step 3 of the upgrade ladder lands.
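A per-workspace daily cost query against the rollup might look like the following; the column names (`workspace_id`, `day`, `cost_usd`) are assumptions about the rollup's shape, not confirmed schema:

```sql
-- Embedding spend per workspace over the last 7 days (column names assumed).
SELECT workspace_id, day, SUM(cost_usd) AS embedding_cost_usd
  FROM core.llm_usage_daily
 WHERE purpose = 'embedding'
   AND day >= current_date - interval '7 days'
 GROUP BY workspace_id, day
 ORDER BY day DESC, embedding_cost_usd DESC;
```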
## Troubleshooting
- Retrieval returns nothing for a known matching document → verify that `embedding_dim` on the row matches the active profile's `dimension`. Mid-rotation rows may have only `embedding_v2` populated; ensure the read path matches the feature flag.
- HNSW build OOMs → increase `maintenance_work_mem` for the build session (4–8 GB at alpha row counts).
- Trigram trigger rejects a write with errcode 23514 → the `body_text` similarity exceeded the 0.85 threshold against `worlds.rules.body_text` scoped by `world_id_context` (per ADR-038 D8, the rule-corpus integrity backstop that prevents the agent's reasoning from shadowing canonical rule content). Inspect the rejected row and the closest matching rule; if the row is legitimate, the rejection likely means the row belongs in agent memory rather than the rule corpus.
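To find the closest matching rule for a rejected row, a `pg_trgm` lookup along these lines can help; `$1` is the rejected `body_text`, `$2` is the scoping world, and the column names follow ADR-038's description rather than a verified schema:

```sql
-- Requires the pg_trgm extension; similarity() returns 0..1.
-- Rows with sim > 0.85 are the ones the trigger would reject.
SELECT id, similarity(body_text, $1) AS sim
  FROM worlds.rules
 WHERE world_id_context = $2
 ORDER BY sim DESC
 LIMIT 5;
```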
## See also
- ADR-038 — Embedding model + consistency + hybrid retrieval
- Codex embeddings
- Codex LLM platform — cost contract