The Problem
Different LLM providers use different APIs:
- Anthropic: Messages API with `content` blocks, `tool_use`/`tool_result` types, and prompt caching
- OpenAI: Chat Completions API with `messages`, `tool_calls`, and the `function` format
agent-code needs to work identically regardless of which provider is configured.
Architecture
```
User prompt
    │
    ▼
Query Engine (provider-agnostic)
    │
    ▼
Provider Detection (auto from model name + base URL)
    │
    ├── Anthropic wire format → Anthropic Messages API
    └── OpenAI wire format   → OpenAI Chat Completions API
    │
    ▼
SSE Stream → Normalize → Unified ContentBlock types
    │
    ▼
Tool execution (same code path regardless of provider)
```
Provider Detection
`detect_provider()` in `llm/provider.rs` determines the provider from:
- Model name: `claude-*` → Anthropic, `gpt-*` → OpenAI, `grok-*` → xAI, etc.
- Base URL: `api.anthropic.com` → Anthropic, `api.openai.com` → OpenAI, etc.
- Environment: `AGENT_CODE_USE_BEDROCK` → Bedrock, `AGENT_CODE_USE_VERTEX` → Vertex

Each provider maps to a `WireFormat`:
| Wire Format | Providers |
|---|---|
| `Anthropic` | Anthropic, Bedrock, Vertex |
| `OpenAi` | OpenAI, xAI, Google, DeepSeek, Groq, Mistral, Together, Zhipu, Ollama, any compatible endpoint |
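The detection rules above can be sketched as follows. This is a hypothetical, simplified version of `detect_provider()`: the real function handles more model prefixes, more hosts, and the Bedrock/Vertex environment variables, and the precedence shown here (model name first, then base URL) is an assumption for illustration.

```rust
// Simplified sketch of provider detection and wire-format mapping.
// The enum variants mirror the table above; the function body is illustrative.
#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    OpenAi,
    XAi,
    Unknown,
}

#[derive(Debug, PartialEq)]
enum WireFormat {
    Anthropic,
    OpenAi,
}

fn detect_provider(model: &str, base_url: &str) -> Provider {
    // Model-name prefixes, as listed in the docs above.
    if model.starts_with("claude-") {
        return Provider::Anthropic;
    }
    if model.starts_with("gpt-") {
        return Provider::OpenAi;
    }
    if model.starts_with("grok-") {
        return Provider::XAi;
    }
    // Fall back to well-known base URLs.
    if base_url.contains("api.anthropic.com") {
        return Provider::Anthropic;
    }
    if base_url.contains("api.openai.com") {
        return Provider::OpenAi;
    }
    Provider::Unknown
}

fn wire_format(provider: &Provider) -> WireFormat {
    match provider {
        Provider::Anthropic => WireFormat::Anthropic,
        // Everything else speaks the OpenAI-compatible wire format.
        _ => WireFormat::OpenAi,
    }
}

fn main() {
    assert_eq!(detect_provider("claude-sonnet-4", ""), Provider::Anthropic);
    assert_eq!(wire_format(&detect_provider("grok-3", "")), WireFormat::OpenAi);
    println!("ok");
}
```

Mapping many providers onto two wire formats keeps the rest of the pipeline provider-agnostic: only the serializer and stream parser ever branch on `WireFormat`.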
Wire Formats
Anthropic (`llm/anthropic.rs`)
- Sends `messages` with `content` as an array of typed blocks
- Tool calls appear as `tool_use` content blocks in assistant messages
- Tool results are `tool_result` content blocks in user messages
- Supports `cache_control` breakpoints for prompt caching
- Extended thinking via `thinking` content blocks
OpenAI (`llm/openai.rs`)
- Sends `messages` with `content` as a string or array
- Tool calls appear in the `tool_calls` array on assistant messages
- Tool results are separate messages with `role: "tool"`
- Supports streaming via SSE with the `[DONE]` sentinel
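To make the difference concrete, here is the same tool call rendered in each wire format. The field names match the public Anthropic and OpenAI APIs, but the helper itself is hypothetical, not agent-code's actual serializer.

```rust
// Illustrative only: one tool call, two wire formats.
enum Wire {
    Anthropic,
    OpenAi,
}

fn tool_call_json(wire: &Wire, id: &str, name: &str, args_json: &str) -> String {
    match wire {
        // Anthropic: a `tool_use` content block inside the assistant message;
        // `input` is a plain JSON object.
        Wire::Anthropic => format!(
            r#"{{"type":"tool_use","id":"{id}","name":"{name}","input":{args_json}}}"#
        ),
        // OpenAI: an entry in the assistant message's `tool_calls` array;
        // `arguments` is a JSON-encoded *string*, so quotes must be escaped.
        Wire::OpenAi => {
            let escaped = args_json.replace('"', "\\\"");
            format!(
                r#"{{"id":"{id}","type":"function","function":{{"name":"{name}","arguments":"{escaped}"}}}}"#
            )
        }
    }
}

fn main() {
    let args = r#"{"path":"src/main.rs"}"#;
    let a = tool_call_json(&Wire::Anthropic, "tu_1", "read_file", args);
    let o = tool_call_json(&Wire::OpenAi, "tu_1", "read_file", args);
    assert!(a.contains(r#""type":"tool_use""#));
    assert!(o.contains(r#""type":"function""#));
    println!("{a}\n{o}");
}
```

The object-vs-string distinction for arguments is the main trap when converting between the two formats.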
Message Normalization
`llm/normalize.rs` ensures messages are valid before sending:
- Tool pairing: every `tool_use` block must have a matching `tool_result` in the next user message
- Alternation: user and assistant messages must alternate (APIs reject consecutive same-role messages)
- Empty handling: empty content arrays are removed or filled with placeholder text
This runs after every turn, before the next API call.
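A minimal sketch of one of these passes, alternation, assuming a toy `Message` type; the real `llm/normalize.rs` also handles tool pairing and empty content:

```rust
// Toy message type for illustration; the real type carries content blocks.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: &'static str, // "user" or "assistant"
    content: String,
}

// Merge consecutive same-role messages so roles strictly alternate,
// since both APIs reject back-to-back messages with the same role.
fn enforce_alternation(messages: Vec<Message>) -> Vec<Message> {
    let mut out: Vec<Message> = Vec::new();
    for msg in messages {
        match out.last_mut() {
            Some(prev) if prev.role == msg.role => {
                // Fold the duplicate-role message into the previous one.
                prev.content.push('\n');
                prev.content.push_str(&msg.content);
            }
            _ => out.push(msg),
        }
    }
    out
}

fn main() {
    let msgs = vec![
        Message { role: "user", content: "hi".into() },
        Message { role: "user", content: "there".into() },
        Message { role: "assistant", content: "hello".into() },
    ];
    let fixed = enforce_alternation(msgs);
    assert_eq!(fixed.len(), 2);
    assert_eq!(fixed[0].content, "hi\nthere");
    println!("ok");
}
```

Merging (rather than dropping) the offending message preserves conversation content while satisfying the API's alternation constraint.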
Stream Parsing
`llm/stream.rs` handles SSE (Server-Sent Events) parsing:
- Read `data:` lines from the HTTP response stream
- Parse JSON deltas (content block starts, text deltas, tool input deltas)
- Accumulate into complete `ContentBlock` instances
- Emit blocks to the UI (real-time text display) and executor (tool dispatch)

The stream parser handles both Anthropic's `content_block_delta` events and OpenAI's `choices[0].delta` format through the wire format abstraction.
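A sketch of just the SSE framing layer: pulling `data:` payloads out of a raw chunk and stopping at OpenAI's `[DONE]` sentinel. The JSON delta parsing and `ContentBlock` accumulation done by `llm/stream.rs` are omitted here.

```rust
// Extract SSE data payloads from a raw text chunk.
// Returns the payloads found plus whether the [DONE] sentinel was seen.
fn sse_payloads(raw: &str) -> (Vec<&str>, bool) {
    let mut payloads = Vec::new();
    let mut done = false;
    for line in raw.lines() {
        // SSE data lines look like `data: {...}`; other lines (event:, id:,
        // comments, blank separators) are skipped by this sketch.
        if let Some(rest) = line.strip_prefix("data:") {
            let payload = rest.trim_start();
            if payload == "[DONE]" {
                done = true;
                break;
            }
            payloads.push(payload);
        }
    }
    (payloads, done)
}

fn main() {
    let raw = "event: message\ndata: {\"delta\":\"hi\"}\n\ndata: [DONE]\n";
    let (payloads, done) = sse_payloads(raw);
    assert_eq!(payloads, vec!["{\"delta\":\"hi\"}"]);
    assert!(done);
    println!("ok");
}
```

A real implementation also has to buffer partial lines across network chunks, since a `data:` line can be split mid-payload by the transport.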
Error Recovery
| Error | Recovery |
|---|---|
| Rate limited (429) | Wait retry_after ms, retry up to 5 times |
| Overloaded (529) | 5s exponential backoff, fall back to smaller model after 3 attempts |
| Prompt too long (413) | Reactive compaction, then retry |
| Max output tokens | Inject continuation message, retry up to 3 times |
| Stream interrupted | Reconnect with exponential backoff |
The retry state machine in `llm/retry.rs` tracks attempts per error type and supports model fallback (e.g., Opus → Sonnet on overload).
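The policy in the table above can be sketched as a pure decision function. This is a hypothetical simplification of `llm/retry.rs`, covering only the 429 and 529 rows; the real state machine tracks per-error attempt counts across a whole session.

```rust
use std::time::Duration;

// Two of the error classes from the recovery table above.
enum ApiError {
    RateLimited { retry_after_ms: u64 }, // HTTP 429
    Overloaded,                          // HTTP 529
}

enum Action {
    Retry(Duration),
    FallBackToSmallerModel,
    GiveUp,
}

fn next_action(err: &ApiError, attempt: u32) -> Action {
    match err {
        // 429: honor the server's retry_after hint, up to 5 attempts.
        ApiError::RateLimited { retry_after_ms } => {
            if attempt < 5 {
                Action::Retry(Duration::from_millis(*retry_after_ms))
            } else {
                Action::GiveUp
            }
        }
        // 529: 5s base with exponential backoff; after 3 attempts,
        // fall back to a smaller model (e.g. Opus → Sonnet).
        ApiError::Overloaded => {
            if attempt < 3 {
                Action::Retry(Duration::from_secs(5 * 2u64.pow(attempt)))
            } else {
                Action::FallBackToSmallerModel
            }
        }
    }
}

fn main() {
    match next_action(&ApiError::Overloaded, 1) {
        Action::Retry(d) => assert_eq!(d, Duration::from_secs(10)),
        _ => panic!("expected retry"),
    }
    match next_action(&ApiError::Overloaded, 3) {
        Action::FallBackToSmallerModel => {}
        _ => panic!("expected fallback"),
    }
    println!("ok");
}
```

Keeping the decision function pure (error + attempt count in, action out) makes the backoff policy easy to unit-test without real network calls.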
Source: `llm/provider.rs` (detection), `llm/anthropic.rs` (Anthropic format), `llm/openai.rs` (OpenAI format), `llm/normalize.rs` (validation), `llm/stream.rs` (SSE parsing), `llm/retry.rs` (error recovery)