Category Guide

What Is LiteLLM?

LiteLLM is an open-source Python library that lets you call 100+ LLM APIs using the OpenAI format. One interface, every provider. Here is how it works, what it costs, and where it fits in your stack.

LiteLLM Explained

LiteLLM is an open-source Python library created by BerriAI that provides a unified interface for calling large language model APIs. You write one completion() call using the OpenAI format, and LiteLLM translates it into the correct request format for whichever provider you specify — Anthropic, Google Vertex AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral, and dozens of others.

The core problem LiteLLM solves is provider lock-in at the code level. Every LLM provider has its own SDK, its own request format, and its own response schema. If you start with OpenAI and want to try Anthropic, you rewrite your API calls, handle different error formats, and parse a different response structure. LiteLLM eliminates this by standardizing on the OpenAI format for all providers. Switching from model="gpt-4o" to model="anthropic/claude-sonnet-4" is a one-line change.

LiteLLM comes in two forms. The SDK is a Python package you import directly into your application. The Proxy Server is a standalone service that exposes an OpenAI-compatible HTTP endpoint. The proxy adds operational features — load balancing across API keys, rate limiting per user or team, virtual API key management, and a spend tracking dashboard. Most production deployments use the proxy.

LiteLLM has become one of the most popular open-source LLM gateways, with over 18,000 GitHub stars. It is widely used in AI agent frameworks, internal developer platforms, and multi-model applications where teams need to route different tasks to different providers without maintaining separate integrations for each.

How LiteLLM Works

Unified Completion Interface

You call litellm.completion(model="provider/model-name", messages=[...]) and LiteLLM maps the request to the correct provider API. It handles authentication, request formatting, and response normalization. The response always comes back in the OpenAI format regardless of which provider served it.
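The pattern above can be sketched in a few lines. This is a minimal illustration, not LiteLLM's documented surface beyond completion() itself: the helper function, its name, and the injectable complete parameter (included so the sketch can run without API keys) are our own additions, and the model names are examples.

```python
# Minimal sketch of LiteLLM's unified interface. The `complete` parameter
# is an illustration-only hook so the function can run without network
# access; by default it defers to litellm.completion.
def ask(model: str, prompt: str, complete=None):
    """Send one prompt to any provider through the OpenAI-format interface."""
    if complete is None:
        from litellm import completion  # lazy import; real calls need provider API keys
        complete = completion
    response = complete(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # LiteLLM normalizes every provider's reply into the OpenAI schema,
    # so this access path is the same regardless of who served the request.
    return response.choices[0].message.content
```

With this in place, ask("gpt-4o", ...) and ask("anthropic/claude-sonnet-4", ...) differ only in the model string.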

Provider Routing via Model Prefix

LiteLLM uses a naming convention to identify providers: anthropic/claude-sonnet-4, bedrock/anthropic.claude-3-sonnet, vertex_ai/gemini-pro. OpenAI models do not need a prefix. This convention lets you switch providers by changing a string rather than importing a different SDK.
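To make the convention concrete, here is a toy parser for the prefix scheme. This is not LiteLLM's internal routing code, just an illustration of how a "provider/model" string resolves, with the no-prefix case defaulting to OpenAI as described above.

```python
# Illustrative only: how the "provider/model" naming convention resolves.
def split_model(model: str) -> tuple[str, str]:
    """Return (provider, model_name) for a LiteLLM-style model string."""
    if "/" in model:
        provider, name = model.split("/", 1)
    else:
        # No prefix means an OpenAI model, per LiteLLM's convention.
        provider, name = "openai", model
    return provider, name
```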

Fallback Chains

You can configure a list of models to try in order. If the primary model fails (rate limit, timeout, provider outage), LiteLLM automatically retries with the next model in the chain. This is one of LiteLLM's most used features for production reliability.
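The logic LiteLLM automates here can be sketched by hand. The function below is our own illustration of a fallback chain, not LiteLLM's implementation; the complete parameter is injected so the sketch runs without network access (in real use you would pass litellm.completion).

```python
# A hand-rolled version of what LiteLLM's fallback chains automate:
# try each model in order until one succeeds.
def complete_with_fallbacks(models, messages, complete):
    last_error = None
    for model in models:
        try:
            return complete(model=model, messages=messages)
        except Exception as err:  # rate limit, timeout, provider outage...
            last_error = err
    # Every model in the chain failed; surface the final error.
    raise last_error
```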

Streaming Support

LiteLLM supports streaming responses across all providers, normalizing the different streaming formats into a consistent interface. This matters for user-facing applications where time-to-first-token determines perceived responsiveness.
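A sketch of consuming that normalized stream, assuming the OpenAI-style delta format that LiteLLM emits for every provider. The helper and its injectable complete parameter are our own additions so the example can run without API keys.

```python
# Consuming LiteLLM's normalized streaming output. `complete` is an
# illustration-only hook; by default it defers to litellm.completion.
def stream_text(model: str, prompt: str, complete=None):
    """Yield text deltas as the provider streams them back."""
    if complete is None:
        from litellm import completion  # lazy import; real calls need API keys
        complete = completion
    response = complete(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in response:
        # Chunks follow the OpenAI delta format regardless of provider.
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```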

LiteLLM Proxy Server

The LiteLLM Proxy runs as a separate service and accepts standard OpenAI API calls over HTTP. Any language or framework that can call the OpenAI API can use LiteLLM Proxy without installing the Python SDK. The proxy adds load balancing, virtual API keys for team management, rate limiting, and a spend tracking UI.
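Because the proxy speaks the standard OpenAI HTTP API, any client works, including nothing but the standard library. The sketch below assumes a proxy at a placeholder URL with a placeholder virtual key; the endpoint path is the standard OpenAI chat completions path, and the opener parameter is our own hook so the function can be exercised without a running proxy.

```python
# Calling the LiteLLM Proxy with only the standard library, to stress that
# any HTTP client works. The base URL and virtual key are placeholders.
import json
import urllib.request

def call_proxy(base_url, api_key, model, prompt, opener=urllib.request.urlopen):
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # virtual key issued by the proxy
            "Content-Type": "application/json",
        },
    )
    with opener(req) as resp:
        body = json.load(resp)
    # The proxy returns the standard OpenAI response schema.
    return body["choices"][0]["message"]["content"]
```

An official OpenAI SDK pointed at the proxy's base URL works the same way, in any language.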

LiteLLM Pricing

LiteLLM itself is free and open source under the MIT license. There is no cost for the Python SDK or for self-hosting the proxy server. The costs you pay are the underlying LLM provider charges — OpenAI, Anthropic, Google, and others bill you directly for token usage.

Tier | Price | Includes
Open Source (SDK) | Free | Unified API, 100+ providers, streaming, fallbacks
Open Source (Proxy) | Free (self-hosted) | Load balancing, rate limiting, virtual keys, spend tracking
Enterprise | Custom pricing | SSO, audit logs, priority support, SLA

The hidden cost of LiteLLM is operational. Self-hosting the proxy means running and maintaining another service in your infrastructure — deployment, monitoring, upgrades, and ensuring high availability. If the proxy goes down, every LLM call in your application fails. Teams that do not want to manage this infrastructure can use managed alternatives like Portkey or Helicone, though those come with their own pricing.

LiteLLM Alternatives Compared

Feature | LiteLLM | Portkey | Helicone | Langfuse | MarginDash
Primary purpose | Gateway / routing | Gateway / reliability | Observability | Tracing / evals | Cost / margin
Multi-provider routing | Yes | Yes | Yes | No | No
Cost tracking | Yes | Yes | Yes | Yes | Yes
Per-customer cost | No | No | No | No | Yes
Revenue / margin | No | No | No | No | Yes
Prompt tracing | Basic logging | Yes | Yes | Yes | No
Cost simulator | No | No | No | No | Yes
Open source | Yes | No | No | Yes | No
Self-hostable | Yes | Cloud only | Cloud only | Yes | Cloud only

LiteLLM focuses on routing and provider abstraction — it gives you a single API across all LLM providers. Portkey offers similar routing with a managed service model. Helicone and Langfuse focus on observability and debugging. MarginDash focuses on per-customer cost attribution and margin analysis — connecting AI costs to revenue to show which customers are profitable.

When to Use LiteLLM

You Use Multiple LLM Providers

If your application calls OpenAI, Anthropic, and Google, LiteLLM saves you from maintaining three separate SDKs and handling three different response formats. The unified interface means your application code does not need to know which provider is serving a given request.

You Need Automatic Failover

Provider outages happen. If your application depends on a single provider with no fallback, an outage means downtime. LiteLLM's fallback chains let you specify backup providers that kick in automatically when the primary is down.

You Want to Experiment with Models Quickly

Testing whether Claude performs better than GPT-4o for a specific task should not require rewriting your API integration. With LiteLLM, you change a model string and compare results. This is especially useful during development and evaluation phases.

You Are Building an Internal AI Platform

Organizations that want to give developers access to LLM APIs through a central gateway use the LiteLLM Proxy to manage API keys, enforce rate limits, and track spend per team. This is a common pattern for platform engineering teams supporting multiple product teams.

When You May Not Need LiteLLM

If you only use one provider and plan to stay with it, if your application is simple enough that direct API calls are manageable, or if you want a managed service and do not want to host infrastructure, then direct SDK usage or a managed gateway like Portkey may be simpler.

LiteLLM Limitations

Python-Only SDK

The LiteLLM library is Python-only. If your application is in Node.js, Go, or another language, you need to use the proxy server as an HTTP endpoint rather than importing the SDK directly. The proxy solves this but adds infrastructure complexity.

Self-Hosting Burden

Running the LiteLLM Proxy in production means maintaining a highly available service. If the proxy goes down, every LLM call in your stack fails. You need monitoring, health checks, and likely a redundant deployment. This operational overhead is significant for small teams.

Aggregate Spend Tracking Only

LiteLLM tracks spend per API key, per team, and per model. It does not track spend per individual customer in your application. If you are reselling AI features and need to know which end-customers are profitable, you need a separate cost attribution layer.

Provider-Specific Features May Be Unavailable

By standardizing on the OpenAI format, LiteLLM necessarily abstracts away some provider-specific features. If you need to use a capability unique to one provider — like Anthropic's extended thinking or Google's grounding — you may need to drop down to the native SDK for those calls.

Frequently Asked Questions

What is LiteLLM?
LiteLLM is an open-source Python library that provides a unified interface for calling 100+ LLM APIs using the OpenAI format. It translates your OpenAI-style API calls into the correct format for each provider — Anthropic, Google, AWS Bedrock, Azure, Cohere, and others — so you can switch models without rewriting code. It also includes a proxy server that acts as an LLM gateway with load balancing, rate limiting, and spend tracking.
Is LiteLLM free?
LiteLLM's Python SDK is free and open source (MIT license). The LiteLLM Proxy Server is also open source and free to self-host. BerriAI, the company behind LiteLLM, offers a managed Enterprise tier with SSO, audit logs, and premium support at custom pricing. You still pay the underlying LLM provider costs (OpenAI, Anthropic, etc.) regardless of which tier you use.
What is the difference between LiteLLM SDK and LiteLLM Proxy?
The LiteLLM SDK is a Python library you import and use directly in your code — it translates completion() calls into the correct format for each provider. The LiteLLM Proxy is a standalone server that exposes an OpenAI-compatible API endpoint. Your application makes standard OpenAI API calls to the proxy, and the proxy routes them to the configured provider. The proxy adds features like load balancing, rate limiting, virtual API keys, and a spend tracking dashboard.
How does LiteLLM handle cost tracking?
LiteLLM tracks spend at the proxy level — it logs token usage per request and calculates costs using its internal pricing data. You can view spend per API key, per team, and per model through the proxy dashboard. This works well for aggregate spend visibility, but LiteLLM does not connect costs to individual customers in your application or calculate margin against revenue. If you need per-customer profitability tracking, you need a dedicated cost attribution tool.
Does LiteLLM track costs per customer?
No. LiteLLM tracks spend per API key, per team, and per model — but not per customer in your application. If you charge customers for AI features and need to know which ones are profitable after API costs, you need a tool that ties token usage to individual customer identifiers and revenue. LiteLLM's spend tracking is designed for internal team budgeting, not per-customer unit economics.
What are the best LiteLLM alternatives?
For multi-provider routing: Portkey (managed gateway with reliability features), Helicone (observability-first with gateway capabilities), and Kong AI Gateway (enterprise API gateway with AI plugins). For cost tracking specifically: MarginDash (per-customer cost and margin analysis with Stripe integration), Helicone (request-level cost logging), and Langfuse (open-source tracing with basic cost tracking). The right choice depends on whether you need routing, observability, cost attribution, or a combination.

Track AI Costs Per Customer

See per-customer P&L across 100+ models from OpenAI, Anthropic, Google, and more. Connect to Stripe for revenue. Simulate savings from model swaps. Get budget alerts before spend goes sideways. Set up in 5 minutes.

Start Tracking Costs →

No credit card required
