
ADR-008: Migrate Scan Pipeline LLM Provider to Pydantic AI

Status: Accepted (2026-04-04) — LiteLLMAdapter removed entirely per ADR-035; the rebuild ships Pydantic-AI-only with no LiteLLM fallback.


Context

The scan pipeline’s LLM calls were routed through LiteLLMAdapter, which wrapped litellm.completion(). LiteLLM served as a unified interface across Anthropic, Google, and OpenAI providers. Over time, several issues emerged:

  • Global state leakage. LiteLLM uses module-level globals (litellm.suppress_debug_info, litellm.drop_params) that affect all threads in the process (illustrated in the sketch after this list).
  • Cost calculation brittleness. litellm.completion_cost() fails silently on unknown or recently-released model IDs, requiring a try/except fallback.
  • Typing gaps. The litellm response type is dict | ModelResponse and provides no static guarantees — every caller manually indexes into .choices[0].message.content.
  • Test friction. Mocking litellm requires patching module-level globals and constructing mock response objects that mirror litellm internals.
  • Dependency weight. The full litellm package pulls in 100+ transitive dependencies; the pydantic-ai-slim[anthropic,google,openai] variant installs only what we need.
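
For illustration, the friction looked roughly like this in the old adapter (a simplified sketch, not the actual code; the model ID is a placeholder):

```python
import litellm

# Module-level globals: setting these affects every thread in the process.
litellm.suppress_debug_info = True
litellm.drop_params = True

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-latest",  # placeholder model ID
    messages=[{"role": "user", "content": "Summarize this scan finding."}],
)
# No static guarantees: every caller indexes into the response by hand.
text = response.choices[0].message.content

try:
    cost = litellm.completion_cost(completion_response=response)
except Exception:
    cost = 0.0  # completion_cost() is unreliable for newly released model IDs
```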

LangChain/LangGraph and DeepAgents still depend on litellm transitively (agent orchestration path), so litellm remains in the dependency graph. This ADR covers removing the direct usage from our scan pipeline only.


Decision

Replace LiteLLMAdapter with PydanticAIAdapter as the concrete implementation of the LLMProvider protocol for the scan pipeline.

What changed

| Layer | Before | After |
| --- | --- | --- |
| Infrastructure adapter | LiteLLMAdapter (litellm.completion) | PydanticAIAdapter (pydantic-ai model API) |
| Capability detection | litellm.get_model_info() | Config-driven registry + Ollama /api/show |
| Cost calculation | litellm.completion_cost() | genai-prices.calc_price() → MODEL_CONFIGS fallback |
| Rate limiter | Defined in client.py | Moved to rate_limiter.py, re-exported from client.py |
| Composition root | LiteLLMAdapter() | PydanticAIAdapter() |

What did NOT change

  • The LLMProvider protocol in spectral.application.shared.protocols — unchanged.
  • TracedLLMProvider — operates on the protocol, no changes needed.
  • ModelRouter — injects via protocol, no changes needed (default adapter updated).
  • LangChain/LangGraph/DeepAgents agent orchestration — completely untouched.
  • The provider/model string format used in settings and .env files.

Architecture

```
composition.py
└─ PydanticAIAdapter (infrastructure/shared/llm/pydantic_ai_adapter.py)
   └─ implements LLMProvider (application/shared/protocols.py)
      └─ wrapped by TracedLLMProvider
         └─ used by ModelRouter
```

All pydantic-ai imports are confined to pydantic_ai_adapter.py. No pydantic-ai imports exist in the domain or application layers. The architecture validator enforces this boundary.
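
A hypothetical wiring sketch for the composition root; the import paths for TracedLLMProvider and ModelRouter are assumptions:

```python
# composition.py — illustrative wiring, not the actual module.
from spectral.infrastructure.shared.llm.pydantic_ai_adapter import PydanticAIAdapter
from spectral.infrastructure.shared.llm.tracing import TracedLLMProvider  # path assumed
from spectral.application.scan.router import ModelRouter  # path assumed

def build_model_router() -> ModelRouter:
    provider = PydanticAIAdapter()           # concrete LLMProvider implementation
    traced = TracedLLMProvider(provider)     # decorates via the protocol only
    return ModelRouter(llm_provider=traced)  # router depends on the protocol
```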


Protocol boundary approach

PydanticAIAdapter implements LLMProvider with two methods:

  • call(model, system, user, max_tokens, temperature) → LLMResponse
  • call_with_tools(model, system, messages, tools, max_tokens) → ToolResponse
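
A minimal sketch of the protocol shape, assuming LLMResponse and ToolResponse are plain result dataclasses; the field names below are illustrative, while the method signatures follow the list above:

```python
from dataclasses import dataclass
from typing import Any, Protocol

@dataclass
class LLMResponse:
    # Illustrative fields; the real dataclass lives in the application layer.
    text: str
    input_tokens: int
    output_tokens: int
    cost_usd: float

@dataclass
class ToolResponse:
    # Illustrative fields.
    tool_calls: list[dict[str, Any]]
    text: str | None = None

class LLMProvider(Protocol):
    def call(self, model: str, system: str, user: str,
             max_tokens: int, temperature: float) -> LLMResponse: ...

    def call_with_tools(self, model: str, system: str, messages: list[dict[str, Any]],
                        tools: list[dict[str, Any]], max_tokens: int) -> ToolResponse: ...
```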

The pydantic-ai Model.request() is async. Since the scan engine calls the protocol synchronously from ThreadPoolExecutor workers, the adapter bridges async→sync via asyncio.run() in each worker thread. If a running event loop is detected (e.g., in async tests), it falls back to a ThreadPoolExecutor with a new loop.
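
A sketch of that bridging logic, with an illustrative helper name:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")

def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run an async pydantic-ai request from a synchronous worker thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Normal case: the ThreadPoolExecutor worker has no event loop,
        # so asyncio.run() can create and tear down one per call.
        return asyncio.run(coro)
    # A loop is already running (e.g. async tests): execute the coroutine
    # on a fresh loop in a separate thread and block on the result.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```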


Cost calculation

Cost is calculated with a fallback chain:

  1. genai-prices.calc_price() — model-aware, regularly updated pricing data.
  2. MODEL_CONFIGS per-tier rates — static rates hardcoded in router.py.
  3. 0.0 — safe default if both fail.

This replaces litellm.completion_cost(), which failed silently on new or unknown models.
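
A minimal sketch of the chain; the MODEL_CONFIGS shape, the placeholder rates, and the exact genai-prices call are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierRates:
    # Illustrative stand-in for the per-tier rates hardcoded in router.py.
    input_per_token: float
    output_per_token: float

MODEL_CONFIGS: dict[str, TierRates] = {
    "anthropic:claude-sonnet-4-0": TierRates(3e-06, 1.5e-05),  # placeholder rates
}

def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # 1. genai-prices: model-aware, regularly updated pricing data.
    try:
        from genai_prices import Usage, calc_price  # API shape assumed
        price = calc_price(Usage(input_tokens=input_tokens,
                                 output_tokens=output_tokens), model_ref=model)
        return float(price.total_price)
    except Exception:
        pass
    # 2. Static per-tier rates from MODEL_CONFIGS.
    rates = MODEL_CONFIGS.get(model)
    if rates is not None:
        return (input_tokens * rates.input_per_token
                + output_tokens * rates.output_per_token)
    # 3. Safe default: never let cost accounting break a scan.
    return 0.0
```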


Capability detection

capabilities.py no longer calls litellm.get_model_info(). Instead:

  • Cloud models (Anthropic, Google, OpenAI): capabilities are declared in a config-driven registry (_CLOUD_CAPABILITIES list in capabilities.py). This is explicit, versioned in code, and not subject to litellm’s registry lag (see the sketch after this list).
  • Ollama models: Ollama’s /api/show HTTP endpoint is queried directly (unchanged).
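
A sketch of the registry shape; the field names and capability values are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCapabilities:
    # Illustrative fields; the real entries live in capabilities.py.
    provider: str
    model: str
    supports_tools: bool
    max_context_tokens: int

_CLOUD_CAPABILITIES: list[ModelCapabilities] = [
    ModelCapabilities("anthropic", "claude-sonnet-4-0", True, 200_000),
    ModelCapabilities("openai", "gpt-4o", True, 128_000),
    ModelCapabilities("google", "gemini-2.0-flash", True, 1_000_000),
]

def get_capabilities(provider: str, model: str) -> ModelCapabilities | None:
    return next((c for c in _CLOUD_CAPABILITIES
                 if c.provider == provider and c.model == model), None)
```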

Model instance caching

PydanticAIAdapter caches model instances per (provider, model_name) tuple. Model objects are stateless once constructed, so caching is safe and avoids re-creating HTTP client connections on every scan call.
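
A sketch of the caching scheme; _create_model is the factory described under Extensibility below:

```python
from typing import Any

class PydanticAIAdapter:
    def __init__(self) -> None:
        self._models: dict[tuple[str, str], Any] = {}

    def _create_model(self, provider: str, model_name: str) -> Any:
        ...  # constructs the pydantic-ai model (see Extensibility)

    def _get_model(self, provider: str, model_name: str) -> Any:
        # Model objects are stateless once constructed, so one instance per
        # (provider, model_name) is reused; this avoids rebuilding HTTP
        # client connections on every scan call.
        key = (provider, model_name)
        if key not in self._models:
            self._models[key] = self._create_model(provider, model_name)
        return self._models[key]
```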


Extensibility

To add a new provider:

  1. Add an entry to _PROVIDER_MAP in pydantic_ai_adapter.py.
  2. Add a branch in _create_model() to construct the pydantic-ai model.
  3. Add capability entries to _CLOUD_CAPABILITIES in capabilities.py.

No changes to the LLMProvider protocol or any application-layer code are needed.
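
For illustration, the three touch points might look like this, using Mistral as a hypothetical addition (the _PROVIDER_MAP values and capability numbers are placeholders; the pydantic-ai model classes shown do exist):

```python
# pydantic_ai_adapter.py — steps 1 and 2 (values illustrative).
_PROVIDER_MAP = {
    "anthropic": "anthropic",
    "google": "google",
    "openai": "openai",
    "mistral": "mistral",  # step 1: register the new provider key
}

def _create_model(provider: str, model_name: str):
    # step 2: one branch per provider constructing the pydantic-ai model
    if provider == "anthropic":
        from pydantic_ai.models.anthropic import AnthropicModel
        return AnthropicModel(model_name)
    if provider == "mistral":
        from pydantic_ai.models.mistral import MistralModel
        return MistralModel(model_name)
    raise ValueError(f"Unknown provider: {provider}")

# capabilities.py — step 3: declare the new model's capabilities
# (ModelCapabilities as sketched under "Capability detection" above).
_CLOUD_CAPABILITIES.append(
    ModelCapabilities("mistral", "mistral-large-latest", True, 128_000),
)
```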


Alternatives considered

Keep LiteLLM for everything

Rejected. Global state leakage, cost tracking brittleness, and test friction were all real operational problems. The protocol boundary makes it straightforward to swap the adapter without touching application code.

Use LangChain for scan pipeline LLM calls

Rejected. LangChain is already used for agent orchestration (see ADR-007) and adds significant abstraction overhead for simple completion calls. The scan pipeline needs direct token-counted responses, not LangChain’s chain/runnable model.

Build a bespoke provider adapter per provider (Anthropic SDK, Google SDK, etc.)

Rejected. Pydantic AI already provides well-typed, async-first adapters for all three providers we use. Building our own would duplicate that work without adding value.