
Embeddings runbook

Operational procedures for the embedding profile registry, in-process model lifecycle, and the upgrade ladder.

System reference: Codex system-design/embeddings.mdx · ADR-038.


Profile registry

The active profile for an account is the single row in core.embedding_profile with deactivated_at IS NULL; a partial unique index enforces at most one active row per account.
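A minimal sketch of that constraint (the index name is illustrative; the actual migration may differ):

-- At most one active profile per account: uniqueness applies only
-- to rows that have not been deactivated.
CREATE UNIQUE INDEX embedding_profile_one_active_per_account
    ON core.embedding_profile (account_id)
    WHERE deactivated_at IS NULL;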

Inspect active profile

SELECT id, account_id, provider, model, model_version, dimension, activated_at
FROM core.embedding_profile
WHERE account_id = $1 AND deactivated_at IS NULL;

Seed the first account

When a new account onboards, insert one active row referencing the canonical profile:

INSERT INTO core.embedding_profile
(account_id, provider, model, model_version, dimension, activated_at)
VALUES
($1, 'fastembed', 'BAAI/bge-small-en-v1.5', '1', 384, now());

The EmbeddingProfileResolver Protocol in spectral.core.embeddings.protocol is the read surface for this table; concrete implementations in each context’s infrastructure layer query the active row keyed on account_id.


Profile rotation (blue-green)

Triggered by an upgrade per the ladder below. Sequence:

  1. Migration adds embedding_v2 vector(<new_dim>) column to every retrievable table.
  2. Insert a new active row in core.embedding_profile for the target model and set deactivated_at = now() on the previous row (see the SQL sketch after this list).
  3. Publish EmbeddingProfileRotated event. Backfill worker re-embeds source content in batches into embedding_v2.
  4. Watch core.llm_usage (filtered by purpose='embedding') for backfill cost.
  5. When the backfill reaches 100% for every retrievable table (readiness query below), flip the feature flag for the read path.
  6. Follow-up migration drops the old column and HNSW index.
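
A hedged sketch of steps 1 and 2, using the Step 1 target model from the ladder below. worlds.rules stands in for a retrievable table (repeat the ALTER per table); treat the provider value as an assumption:

-- Step 1: add the new vector column (1024 matches bge-large-en-v1.5).
ALTER TABLE worlds.rules ADD COLUMN embedding_v2 vector(1024);

-- Step 2: rotate the profile atomically so exactly one row stays active.
BEGIN;
UPDATE core.embedding_profile
   SET deactivated_at = now()
 WHERE account_id = $1 AND deactivated_at IS NULL;
INSERT INTO core.embedding_profile
    (account_id, provider, model, model_version, dimension, activated_at)
VALUES
    ($1, 'fastembed', 'BAAI/bge-large-en-v1.5', '1', 1024, now());
COMMIT;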
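
For the step 5 readiness check, count rows still awaiting re-embedding (run against each retrievable table); backfill is complete for a table when remaining reaches zero:

SELECT count(*) AS remaining
FROM worlds.rules
WHERE embedding_v2 IS NULL;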

Upgrade ladder

Each step is a re-embedding job, not a re-architecture. Triggered by:

Trigger | Action
Retrieval quality insufficient at alpha dogfooding | Step 1 — upgrade to BAAI/bge-large-en-v1.5 (1024-dim, ~1.3 GB; in-process if worker RAM allows)
Worker memory pressure OR embedding volume outpaces in-process | Step 2 — deploy huggingface/text-embeddings-inference sidecar on Cloud Run CPU; swap EmbeddingProvider impl
Quality demands frontier OR enterprise DPA names a provider | Step 3 — swap to a cloud API (Gemini gemini-embedding-001 preferred; OpenAI text-embedding-3-large if non-GCP)

Each step uses the same blue-green rotation flow above.


Worker memory check

In-process BGE-small adds ~120 MB per worker; track it via Render metrics. The worker-subdivision trigger fires if a future canonical-model upgrade pushes the in-process footprint past ~500 MB.


Cost monitoring

Embedding calls write core.llm_usage rows with purpose='embedding'. Per-workspace daily cost comes from the core.llm_usage_daily rollup. Alpha-stage in-process embeddings are zero-cost; cost monitoring matters once Step 2 or Step 3 of the upgrade ladder lands.
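
A sketch of the per-workspace check, assuming the rollup exposes workspace_id, day, purpose, and a cost column — the rollup's exact column names are assumptions:

-- Last 7 days of embedding spend per workspace, from the daily rollup.
SELECT workspace_id, day, cost_usd
FROM core.llm_usage_daily
WHERE purpose = 'embedding'
  AND day >= current_date - 7
ORDER BY day DESC, cost_usd DESC;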


Troubleshooting

  • Retrieval returns nothing for a known matching document → verify that embedding_dim on the row matches the active profile’s dimension (see the diagnostics sketch after this list). Mid-rotation rows may have only embedding_v2 populated; make sure the read path matches the feature flag.
  • HNSW build OOMs → raise maintenance_work_mem for the build session (4–8 GB suffices at alpha row counts; sketch below).
  • Trigram trigger rejects a write with errcode 23514 → the body_text similarity exceeded the 0.85 threshold against worlds.rules.body_text, scoped by world_id_context. This is the ADR-038 D8 rule-corpus integrity backstop, which prevents the agent’s reasoning from shadowing canonical rule content. Inspect the rejected row and the closest matching rule (query below); if the row is legitimate, the rejection likely means it belongs in agent memory rather than the rule corpus.
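
Diagnostic sketches for the three cases above. worlds.rules, embedding_dim, and the 0.85 threshold come from this runbook; treating world_id_context as a column and the parameter slots are assumptions:

-- Dimension mismatch: compare the row against the active profile.
SELECT embedding_dim FROM worlds.rules WHERE id = $1;
SELECT dimension
FROM core.embedding_profile
WHERE account_id = $2 AND deactivated_at IS NULL;

-- HNSW build OOM: raise memory for the build session only.
SET maintenance_work_mem = '6GB';

-- Trigram rejection: closest canonical rules to the rejected text
-- (pg_trgm similarity; 0.85 is the D8 threshold).
SELECT id, similarity(body_text, $3) AS sim
FROM worlds.rules
WHERE world_id_context = $4
ORDER BY sim DESC
LIMIT 5;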

See also

  • ADR-038 — embedding model + consistency + hybrid retrieval (embedding profile lifecycle)
  • Codex embeddings
  • Codex LLM platform — cost contract