ADR-008: Migrate Scan Pipeline LLM Provider to Pydantic AI
Status: Accepted (2026-04-04) — LiteLLMAdapter removed entirely per ADR-035; the rebuild ships Pydantic-AI-only with no LiteLLM fallback.
Context
The scan pipeline’s LLM calls were routed through `LiteLLMAdapter`, which wrapped
`litellm.completion()`. LiteLLM served as a unified interface across Anthropic,
Google, and OpenAI providers. Over time, several issues emerged:
- Global state leakage. LiteLLM uses module-level globals (`litellm.suppress_debug_info`, `litellm.drop_params`) that affect all threads in the process (see the snippet after this list).
- Cost calculation brittleness. `litellm.completion_cost()` fails silently on unknown or recently released model IDs, requiring a try/except fallback.
- Typing gaps. The litellm response type is `dict | ModelResponse` and provides no static guarantees; every caller manually indexes into `.choices[0].message.content`.
- Test friction. Mocking litellm requires patching module-level globals and constructing mock response objects that mirror litellm internals.
- Dependency weight. The full `litellm` package pulls in 100+ transitive dependencies; the `pydantic-ai-slim[anthropic,google,openai]` variant installs only what we need.
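For illustration, the globals in question are plain module attributes, so configuring them anywhere configures them everywhere:

```python
import litellm

# These are module-level attributes: setting them for one call site
# silently changes behavior for every thread in the process.
litellm.suppress_debug_info = True
litellm.drop_params = True
```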
LangChain/LangGraph and DeepAgents still depend on litellm transitively (agent
orchestration path), so litellm remains in the dependency graph. This ADR covers
removing the direct usage from our scan pipeline only.
Decision
Replace `LiteLLMAdapter` with `PydanticAIAdapter` as the concrete implementation of
the `LLMProvider` protocol for the scan pipeline.
What changed
| Layer | Before | After |
|---|---|---|
| Infrastructure adapter | `LiteLLMAdapter` (`litellm.completion`) | `PydanticAIAdapter` (pydantic-ai model API) |
| Capability detection | `litellm.get_model_info()` | Config-driven registry + Ollama `/api/show` |
| Cost calculation | `litellm.completion_cost()` | `genai-prices.calc_price()` → `MODEL_CONFIGS` fallback |
| Rate limiter | Defined in `client.py` | Moved to `rate_limiter.py`, re-exported from `client.py` |
| Composition root | `LiteLLMAdapter()` | `PydanticAIAdapter()` |
What did NOT change
- The `LLMProvider` protocol in `spectral.application.shared.protocols` is unchanged.
- `TracedLLMProvider` operates on the protocol; no changes needed.
- `ModelRouter` injects via the protocol; no changes needed (the default adapter was updated).
- LangChain/LangGraph/DeepAgents agent orchestration is completely untouched.
- The `provider/model` string format used in settings and `.env` files is unchanged.
Architecture
```
composition.py
└─ PydanticAIAdapter (infrastructure/shared/llm/pydantic_ai_adapter.py)
   └─ implements LLMProvider (application/shared/protocols.py)
      └─ wrapped by TracedLLMProvider
         └─ used by ModelRouter
```

All pydantic-ai imports are confined to `pydantic_ai_adapter.py`. No pydantic-ai
imports exist in the domain or application layers. The architecture validator enforces
this boundary.
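A minimal sketch of such a boundary check, assuming a conventional `src/spectral/...` layout (the real validator's implementation and paths may differ):

```python
# Sketch: fail if any domain/application module imports pydantic-ai.
import ast
from pathlib import Path

INNER_LAYERS = [Path("src/spectral/domain"), Path("src/spectral/application")]

def test_no_pydantic_ai_in_inner_layers() -> None:
    for layer in INNER_LAYERS:
        for module in layer.rglob("*.py"):
            tree = ast.parse(module.read_text(), filename=str(module))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    names = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom):
                    names = [node.module or ""]
                else:
                    continue
                assert not any(n.startswith("pydantic_ai") for n in names), module
```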
Protocol boundary approach
`PydanticAIAdapter` implements `LLMProvider` with two methods:
- `call(model, system, user, max_tokens, temperature)` → `LLMResponse`
- `call_with_tools(model, system, messages, tools, max_tokens)` → `ToolResponse`
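A sketch of the protocol as described above; the response dataclass fields are illustrative assumptions, not the project's exact types:

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass(frozen=True)
class LLMResponse:  # field set assumed for illustration
    text: str
    input_tokens: int
    output_tokens: int
    cost_usd: float

@dataclass(frozen=True)
class ToolResponse:  # field set assumed for illustration
    tool_calls: list = field(default_factory=list)
    text: str | None = None

class LLMProvider(Protocol):
    def call(self, model: str, system: str, user: str,
             max_tokens: int, temperature: float) -> LLMResponse: ...

    def call_with_tools(self, model: str, system: str, messages: list,
                        tools: list, max_tokens: int) -> ToolResponse: ...
```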
The pydantic-ai `Model.request()` method is async. Since the scan engine calls the protocol
synchronously from `ThreadPoolExecutor` workers, the adapter bridges async to sync via
`asyncio.run()` in each worker thread. If a running event loop is detected (e.g., in
async tests), it falls back to a `ThreadPoolExecutor` with a new loop.
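A minimal sketch of that bridge (the helper name `_run_sync` is ours, not the adapter's real API):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine

def _run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run an async pydantic-ai call from a synchronous worker thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Normal scan path: no loop in this worker thread, create one.
        return asyncio.run(coro)
    # A loop is already running (e.g. async tests): execute on a fresh
    # loop in a separate thread so we never block the caller's loop.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```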
Cost calculation
Cost is calculated with a fallback chain:
1. `genai-prices.calc_price()`: model-aware, regularly updated pricing data.
2. `MODEL_CONFIGS` per-tier rates: static rates hardcoded in `router.py`.
3. `0.0`: safe default if both fail.
This replaces `litellm.completion_cost()`, which failed silently on new or unknown models.
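A sketch of the chain, assuming the `genai_prices` `calc_price`/`Usage` interface and an illustrative shape for the static `MODEL_CONFIGS` rates:

```python
from dataclasses import dataclass
from genai_prices import Usage, calc_price

@dataclass(frozen=True)
class TierRates:  # illustrative shape of the static rates in router.py
    input_per_token: float
    output_per_token: float

MODEL_CONFIGS: dict[str, TierRates] = {}  # populated per tier in router.py

def estimate_cost(provider: str, model: str,
                  input_tokens: int, output_tokens: int) -> float:
    try:  # 1. model-aware pricing data
        price = calc_price(
            Usage(input_tokens=input_tokens, output_tokens=output_tokens),
            model_ref=model, provider_id=provider,
        )
        return float(price.total_price)
    except Exception:
        pass  # unknown model: fall through to static rates
    rates = MODEL_CONFIGS.get(f"{provider}/{model}")
    if rates is not None:  # 2. static per-tier rates
        return (input_tokens * rates.input_per_token
                + output_tokens * rates.output_per_token)
    return 0.0  # 3. safe default if both fail
```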
Capability detection
`capabilities.py` no longer calls `litellm.get_model_info()`. Instead:
- Cloud models (Anthropic, Google, OpenAI): capabilities are declared in a config-driven registry (the `_CLOUD_CAPABILITIES` list in `capabilities.py`). This is explicit, versioned in code, and not subject to litellm’s registry lag.
- Ollama models: Ollama’s `/api/show` HTTP endpoint is queried directly (unchanged; sketched below).
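For the Ollama path, the capability query is a plain HTTP call; a sketch using httpx (which response fields `capabilities.py` consumes is not specified here):

```python
import httpx

def fetch_ollama_model_info(model: str,
                            base_url: str = "http://localhost:11434") -> dict:
    # POST /api/show returns the model's metadata and capabilities as JSON.
    response = httpx.post(f"{base_url}/api/show",
                          json={"model": model}, timeout=10.0)
    response.raise_for_status()
    return response.json()
```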
Model instance caching
`PydanticAIAdapter` caches model instances per `(provider, model_name)` tuple.
Model objects are stateless once constructed, so caching is safe and avoids
re-creating HTTP client connections on every scan call.
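One way to express that caching, as a sketch (the adapter may use a plain dict rather than `functools.lru_cache`):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def _get_model(provider: str, model_name: str):
    # Safe to memoize: pydantic-ai model objects are stateless once built,
    # and reuse avoids re-creating HTTP clients on every scan call.
    return _create_model(provider, model_name)  # factory sketched under Extensibility
```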
Extensibility
To add a new provider:
1. Add an entry to `_PROVIDER_MAP` in `pydantic_ai_adapter.py`.
2. Add a branch in `_create_model()` to construct the pydantic-ai model.
3. Add capability entries to `_CLOUD_CAPABILITIES` in `capabilities.py`.

No changes to the `LLMProvider` protocol or any application-layer code are needed.
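A sketch of that factory branch (the pydantic-ai model classes are the library's own; `_PROVIDER_MAP` is summarized here as simple branches):

```python
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.models.openai import OpenAIModel

def _create_model(provider: str, model_name: str):
    # One branch per supported provider; a new provider adds a branch here
    # plus registry entries in _PROVIDER_MAP and _CLOUD_CAPABILITIES.
    if provider == "anthropic":
        return AnthropicModel(model_name)
    if provider == "google":
        return GoogleModel(model_name)
    if provider == "openai":
        return OpenAIModel(model_name)
    raise ValueError(f"Unsupported provider: {provider}")
```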
Alternatives considered
Keep LiteLLM for everything
Rejected. Global state leakage, cost tracking brittleness, and test friction were all real operational problems. The protocol boundary makes it straightforward to swap the adapter without touching application code.
Use LangChain for scan pipeline LLM calls
Rejected. LangChain is already used for agent orchestration (see ADR-007) and adds significant abstraction overhead for simple completion calls. The scan pipeline needs direct token-counted responses, not LangChain’s chain/runnable model.
Build a bespoke provider adapter per provider (Anthropic SDK, Google SDK, etc.)
Rejected. Pydantic AI already provides well-typed, async-first adapters for all three providers we use. Building our own would duplicate that work without adding value.