
Embeddings runbook

Operational procedures for the embedding profile registry, in-process model lifecycle, and the upgrade ladder.

System reference: Codex system-design/embeddings.mdx · ADR-038.


Profile registry

The active profile for an account is the single row in core.embedding_profile with deactivated_at IS NULL; a partial unique index enforces at most one active row per account.
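A minimal sketch of that constraint (the index name is illustrative; the actual migration may differ):

-- At most one active profile per account: uniqueness applies only
-- to rows that have not been deactivated.
CREATE UNIQUE INDEX embedding_profile_one_active_per_account
    ON core.embedding_profile (account_id)
    WHERE deactivated_at IS NULL;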

Inspect active profile

SELECT id, account_id, provider, model, model_version, dimension, activated_at
FROM core.embedding_profile
WHERE account_id = $1 AND deactivated_at IS NULL;

Seed the first account

When a new account onboards, insert one active row referencing the canonical profile:

INSERT INTO core.embedding_profile
(account_id, provider, model, model_version, dimension, activated_at)
VALUES
($1, 'fastembed', 'BAAI/bge-small-en-v1.5', '1', 384, now());

The EmbeddingProfileResolver Protocol in spectral.core.embeddings.protocol is the read surface for this table; concrete implementations in each context’s infrastructure layer query the active row keyed on account_id.


Profile rotation (blue-green)

Triggered by an upgrade per the ladder below. Sequence:

  1. Migration adds embedding_v2 vector(<new_dim>) column to every retrievable table.
  2. Insert a new active row in core.embedding_profile for the target model and set deactivated_at = now() on the previous row (see the SQL sketch after this list).
  3. Publish EmbeddingProfileRotated event. Backfill worker re-embeds source content in batches into embedding_v2.
  4. Watch core.llm_usage (filtered by purpose='embedding') for backfill cost.
  5. When the backfill reaches 100% for every retrievable table (readiness query below), flip the feature flag for the read path.
  6. Follow-up migration drops the old column and HNSW index.
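
A hedged sketch of steps 1 and 2, using the Step 1 target model from the ladder below. worlds.rules stands in for a retrievable table (repeat the ALTER per table); treat the provider value as an assumption:

-- Step 1: add the new vector column (1024 matches bge-large-en-v1.5).
ALTER TABLE worlds.rules ADD COLUMN embedding_v2 vector(1024);

-- Step 2: rotate the profile atomically so exactly one row stays active.
BEGIN;
UPDATE core.embedding_profile
   SET deactivated_at = now()
 WHERE account_id = $1 AND deactivated_at IS NULL;
INSERT INTO core.embedding_profile
    (account_id, provider, model, model_version, dimension, activated_at)
VALUES
    ($1, 'fastembed', 'BAAI/bge-large-en-v1.5', '1', 1024, now());
COMMIT;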
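
For the step 5 readiness check, count rows still awaiting re-embedding (run against each retrievable table); backfill is complete for a table when remaining reaches zero:

SELECT count(*) AS remaining
FROM worlds.rules
WHERE embedding_v2 IS NULL;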

Upgrade ladder

Each step is a re-embedding job, not a re-architecture. Triggered by:

Trigger | Action
Retrieval quality insufficient at alpha dogfooding | Step 1 — upgrade to BAAI/bge-large-en-v1.5 (1024-dim, ~1.3 GB; in-process if worker RAM allows)
Worker memory pressure OR embedding volume outpaces in-process | Step 2 — deploy huggingface/text-embeddings-inference sidecar on Cloud Run CPU; swap EmbeddingProvider impl
Quality demands frontier OR enterprise DPA names a provider | Step 3 — swap to a cloud API (Gemini gemini-embedding-001 preferred; OpenAI text-embedding-3-large if non-GCP)

Each step uses the same blue-green rotation flow above.


Worker memory check

In-process BGE-small adds ~120 MB per worker; track it via Render metrics. The worker-subdivision trigger fires if a future canonical-model upgrade pushes the in-process footprint past ~500 MB.


Cost monitoring

Embedding calls write core.llm_usage rows with purpose='embedding'. Per-workspace daily cost comes from the core.llm_usage_daily rollup. Alpha-stage in-process embeddings are zero-cost; cost monitoring matters once Step 2 or Step 3 of the upgrade ladder lands.
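
A sketch of the per-workspace check, assuming the rollup exposes workspace_id, day, purpose, and a cost column — the rollup's exact column names are assumptions:

-- Last 7 days of embedding spend per workspace, from the daily rollup.
SELECT workspace_id, day, cost_usd
FROM core.llm_usage_daily
WHERE purpose = 'embedding'
  AND day >= current_date - 7
ORDER BY day DESC, cost_usd DESC;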


Troubleshooting

  • Retrieval returns nothing for a known matching document → verify that embedding_dim on the row matches the active profile’s dimension (see the diagnostics sketch after this list). Mid-rotation rows may have only embedding_v2 populated; make sure the read path matches the feature flag.
  • HNSW build OOMs → raise maintenance_work_mem for the build session (4–8 GB suffices at alpha row counts; sketch below).
  • Trigram trigger rejects a write with errcode 23514 → the body_text similarity exceeded the 0.85 threshold against worlds.rules.body_text, scoped by world_id_context. This is the ADR-038 D8 rule-corpus integrity backstop, which prevents the agent’s reasoning from shadowing canonical rule content. Inspect the rejected row and the closest matching rule (query below); if the row is legitimate, the rejection likely means it belongs in agent memory rather than the rule corpus.
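
Diagnostic sketches for the three cases above. worlds.rules, embedding_dim, and the 0.85 threshold come from this runbook; treating world_id_context as a column and the parameter slots are assumptions:

-- Dimension mismatch: compare the row against the active profile.
SELECT embedding_dim FROM worlds.rules WHERE id = $1;
SELECT dimension
FROM core.embedding_profile
WHERE account_id = $2 AND deactivated_at IS NULL;

-- HNSW build OOM: raise memory for the build session only.
SET maintenance_work_mem = '6GB';

-- Trigram rejection: closest canonical rules to the rejected text
-- (pg_trgm similarity; 0.85 is the D8 threshold).
SELECT id, similarity(body_text, $3) AS sim
FROM worlds.rules
WHERE world_id_context = $4
ORDER BY sim DESC
LIMIT 5;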

See also

  • ADR-038 — embedding model + consistency + hybrid retrieval (embedding profile lifecycle)
  • Codex embeddings
  • Codex LLM platform — cost contract