Category Guide
LiteLLM is an open-source Python library that lets you call 100+ LLM APIs using the OpenAI format. One interface, every provider. Here is how it works, what it costs, and where it fits in your stack.
Created by BerriAI, LiteLLM provides a unified interface for calling large language model APIs. You write one completion() call using the OpenAI format, and LiteLLM translates it into the correct request format for whichever provider you specify: Anthropic, Google Vertex AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral, and dozens of others.
The core problem LiteLLM solves is provider lock-in at the code level. Every LLM provider has its own SDK, its own request format, and its own response schema. If you start with OpenAI and want to try Anthropic, you rewrite your API calls, handle different error formats, and parse a different response structure. LiteLLM eliminates this by standardizing on the OpenAI format for all providers. Switching from model="gpt-4o" to model="claude-sonnet-4" is a one-line change.
LiteLLM comes in two forms. The SDK is a Python package you import directly into your application. The Proxy Server is a standalone service that exposes an OpenAI-compatible HTTP endpoint. The proxy adds operational features — load balancing across API keys, rate limiting per user or team, virtual API key management, and a spend tracking dashboard. Most production deployments use the proxy.
LiteLLM has become one of the most popular open-source LLM gateways, with over 18,000 GitHub stars. It is widely used in AI agent frameworks, internal developer platforms, and multi-model applications where teams need to route different tasks to different providers without maintaining separate integrations for each.
You call litellm.completion(model="provider/model-name", messages=[...]) and LiteLLM maps the request to the correct provider API. It handles authentication, request formatting, and response normalization. The response always comes back in the OpenAI format regardless of which provider served it.
LiteLLM uses a naming convention to identify providers: anthropic/claude-sonnet-4, bedrock/anthropic.claude-3-sonnet, vertex_ai/gemini-pro. OpenAI models do not need a prefix. This convention lets you switch providers by changing a string rather than importing a different SDK.
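A minimal sketch of the unified call and the prefix convention. It assumes the `litellm` package is installed and that provider API keys are set as environment variables; the model names are illustrative:

```python
# Model strings follow LiteLLM's provider-prefix convention;
# OpenAI models need no prefix.
MODELS = {
    "openai": "gpt-4o",
    "anthropic": "anthropic/claude-sonnet-4",
    "vertex": "vertex_ai/gemini-pro",
}

def ask(model: str, prompt: str) -> str:
    """Send one chat turn; the response is OpenAI-format for every provider."""
    import litellm  # imported here so the sketch reads without the package installed
    resp = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Switching providers is a one-string change:
# ask(MODELS["openai"], "Hello")
# ask(MODELS["anthropic"], "Hello")
```

The calling code never touches a provider SDK; only the model string changes.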
You can configure a list of models to try in order. If the primary model fails (rate limit, timeout, provider outage), LiteLLM automatically retries with the next model in the chain. This is one of LiteLLM's most used features for production reliability.
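One way to express such a chain is through `litellm.Router`, which takes a model list and a fallback mapping. A minimal sketch, assuming `litellm` is installed and keys are in the environment (the deployment names "primary" and "backup" are placeholders):

```python
# If a call to "primary" fails, the router retries on "backup".
FALLBACKS = [{"primary": ["backup"]}]

def make_router():
    from litellm import Router  # deferred so the sketch reads without the package
    return Router(
        model_list=[
            {"model_name": "primary",
             "litellm_params": {"model": "gpt-4o"}},
            {"model_name": "backup",
             "litellm_params": {"model": "anthropic/claude-sonnet-4"}},
        ],
        fallbacks=FALLBACKS,
    )

# router = make_router()
# router.completion(model="primary",
#                   messages=[{"role": "user", "content": "hi"}])
```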
LiteLLM supports streaming responses across all providers, normalizing the different streaming formats into a consistent interface. This matters for user-facing applications where time-to-first-token determines perceived responsiveness.
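With `stream=True`, completion() yields chunks in the OpenAI streaming shape regardless of provider. A sketch, with the same environment assumptions as above:

```python
def stream_reply(prompt: str):
    """Yield text deltas as they arrive, in the OpenAI chunk format."""
    import litellm
    for chunk in litellm.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no text (e.g. the final stop chunk)
            yield delta

# for piece in stream_reply("Tell me a joke"):
#     print(piece, end="", flush=True)
```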
The LiteLLM Proxy runs as a separate service and accepts standard OpenAI API calls over HTTP. Any language or framework that can call the OpenAI API can use LiteLLM Proxy without installing the Python SDK. The proxy adds load balancing, virtual API keys for team management, rate limiting, and a spend tracking UI.
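Because the proxy speaks the OpenAI API, any OpenAI client can target it by changing the base URL. A sketch using the standard `openai` Python SDK; the URL and virtual key below are placeholders:

```python
PROXY_URL = "http://localhost:4000"   # hypothetical local proxy address
VIRTUAL_KEY = "sk-my-virtual-key"     # a LiteLLM virtual key, not a provider key

def proxy_client():
    """Point the stock OpenAI SDK at the LiteLLM Proxy."""
    from openai import OpenAI
    return OpenAI(base_url=PROXY_URL, api_key=VIRTUAL_KEY)

# client = proxy_client()
# client.chat.completions.create(
#     model="anthropic/claude-sonnet-4",   # the proxy routes by model string
#     messages=[{"role": "user", "content": "hi"}],
# )
```

The same trick works from Node.js, Go, or any other language with an OpenAI-compatible client.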
LiteLLM itself is free and open source under the MIT license. There is no cost for the Python SDK or for self-hosting the proxy server. The costs you pay are the underlying LLM provider charges — OpenAI, Anthropic, Google, and others bill you directly for token usage.
| Tier | Price | Includes |
|---|---|---|
| Open Source (SDK) | Free | Unified API, 100+ providers, streaming, fallbacks |
| Open Source (Proxy) | Free (self-hosted) | Load balancing, rate limiting, virtual keys, spend tracking |
| Enterprise | Custom pricing | SSO, audit logs, priority support, SLA |
The hidden cost of LiteLLM is operational. Self-hosting the proxy means running and maintaining another service in your infrastructure — deployment, monitoring, upgrades, and ensuring high availability. If the proxy goes down, every LLM call in your application fails. Teams that do not want to manage this infrastructure can use managed alternatives like Portkey or Helicone, though those come with their own pricing.
| Feature | LiteLLM | Portkey | Helicone | Langfuse | MarginDash |
|---|---|---|---|---|---|
| Primary purpose | Gateway / routing | Gateway / reliability | Observability | Tracing / evals | Cost / margin |
| Multi-provider routing | Yes | Yes | Yes | No | No |
| Cost tracking | Yes | Yes | Yes | Yes | Yes |
| Per-customer cost | No | No | No | No | Yes |
| Revenue / margin | No | No | No | No | Yes |
| Prompt tracing | Basic logging | Yes | Yes | Yes | No |
| Cost simulator | No | No | No | No | Yes |
| Open source | Yes | No | No | Yes | No |
| Self-hostable | Yes | Cloud only | Cloud only | Yes | Cloud only |
LiteLLM focuses on routing and provider abstraction — it gives you a single API across all LLM providers. Portkey offers similar routing with a managed service model. Helicone and Langfuse focus on observability and debugging. MarginDash focuses on per-customer cost attribution and margin analysis — connecting AI costs to revenue to show which customers are profitable.
If your application calls OpenAI, Anthropic, and Google, LiteLLM saves you from maintaining three separate SDKs and handling three different response formats. The unified interface means your application code does not need to know which provider is serving a given request.
Provider outages happen. If your application depends on a single provider with no fallback, an outage means downtime. LiteLLM's fallback chains let you specify backup providers that kick in automatically when the primary is down.
Testing whether Claude performs better than GPT-4o for a specific task should not require rewriting your API integration. With LiteLLM, you change a model string and compare results. This is especially useful during development and evaluation phases.
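A side-by-side comparison can then be a short loop over model strings. A sketch under the same assumptions as the earlier examples (candidate models are illustrative):

```python
CANDIDATES = ["gpt-4o", "anthropic/claude-sonnet-4"]

def compare(prompt: str) -> dict:
    """Run one prompt against each candidate model and collect the replies."""
    import litellm
    results = {}
    for model in CANDIDATES:
        resp = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = resp.choices[0].message.content
    return results

# compare("Summarize this ticket in one sentence: ...")
```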
Organizations that want to give developers access to LLM APIs through a central gateway use the LiteLLM Proxy to manage API keys, enforce rate limits, and track spend per team. This is a common pattern for platform engineering teams supporting multiple product teams.
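Virtual keys are minted through the proxy's key-generation endpoint. A hypothetical sketch using only the standard library; the URL, master key, team name, and budget are all placeholders, and the response field name is an assumption based on the proxy's documented behavior:

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:4000/key/generate"  # proxy admin endpoint
MASTER_KEY = "sk-master-key"                      # placeholder master key

payload = {
    "team_id": "search-team",  # spend is attributed to this team
    "max_budget": 50.0,        # hard spend cap for the key, in USD
}

def mint_key() -> str:
    """Ask the proxy for a new virtual key scoped to the payload above."""
    req = urllib.request.Request(
        ADMIN_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["key"]
```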
LiteLLM may be unnecessary if you only use one provider and plan to stay with them, if your application is simple enough that direct API calls are manageable, or if you want a managed service and do not want to host infrastructure. In those cases, direct SDK usage or a managed gateway like Portkey may be simpler.
The LiteLLM library is Python-only. If your application is in Node.js, Go, or another language, you need to use the proxy server as an HTTP endpoint rather than importing the SDK directly. The proxy solves this but adds infrastructure complexity.
Running the LiteLLM Proxy in production means maintaining a highly available service. If the proxy goes down, every LLM call in your stack fails. You need monitoring, health checks, and likely a redundant deployment. This operational overhead is significant for small teams.
LiteLLM tracks spend per API key, per team, and per model. It does not track spend per individual customer in your application. If you are reselling AI features and need to know which end-customers are profitable, you need a separate cost attribution layer.
By standardizing on the OpenAI format, LiteLLM necessarily abstracts away some provider-specific features. If you need to use a capability unique to one provider — like Anthropic's extended thinking or Google's grounding — you may need to drop down to the native SDK for those calls.
See per-customer P&L across 100+ models from OpenAI, Anthropic, Google, and more. Connect to Stripe for revenue. Simulate savings from model swaps. Get budget alerts before spend goes sideways. Set up in 5 minutes.
Start Tracking Costs → (No credit card required)
Create an account, install the SDK, and see your first margin data in minutes.
See My Margin Data (No credit card required)