Category Guide
LiteLLM is an open-source Python library that lets you call 100+ LLM APIs using the OpenAI format. One interface, every provider. Here is how it works, what it costs, and where it fits in your stack.
Created by BerriAI, LiteLLM provides a unified interface for calling large language model APIs. You write one completion() call using the OpenAI format, and LiteLLM translates it into the correct request format for whichever provider you specify: Anthropic, Google Vertex AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral, and dozens of others.
The core problem LiteLLM solves is provider lock-in at the code level. Every LLM provider has its own SDK, its own request format, and its own response schema. If you start with OpenAI and want to try Anthropic, you rewrite your API calls, handle different error formats, and parse a different response structure. LiteLLM eliminates this by standardizing on the OpenAI format for all providers. Switching from model="gpt-4o" to model="claude-sonnet-4" is a one-line change.
LiteLLM comes in two forms. The SDK is a Python package you import directly into your application. The Proxy Server is a standalone service that exposes an OpenAI-compatible HTTP endpoint. The proxy adds operational features — load balancing across API keys, rate limiting per user or team, virtual API key management, and a spend tracking dashboard. Most production deployments use the proxy.
LiteLLM has become one of the most popular open-source LLM gateways, with over 18,000 GitHub stars. It is widely used in AI agent frameworks, internal developer platforms, and multi-model applications where teams need to route different tasks to different providers without maintaining separate integrations for each.
You call litellm.completion(model="provider/model-name", messages=[...]) and LiteLLM maps the request to the correct provider API. It handles authentication, request formatting, and response normalization. The response always comes back in the OpenAI format regardless of which provider served it.
LiteLLM uses a naming convention to identify providers: anthropic/claude-sonnet-4, bedrock/anthropic.claude-3-sonnet, vertex_ai/gemini-pro. OpenAI models do not need a prefix. This convention lets you switch providers by changing a string rather than importing a different SDK.
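A minimal sketch of the unified call and the prefix convention. It assumes the `litellm` package is installed and that provider API keys are set as environment variables; the model names are illustrative:

```python
# Model strings follow LiteLLM's provider-prefix convention;
# OpenAI models need no prefix.
MODELS = {
    "openai": "gpt-4o",
    "anthropic": "anthropic/claude-sonnet-4",
    "vertex": "vertex_ai/gemini-pro",
}

def ask(model: str, prompt: str) -> str:
    """Send one chat turn; the response is OpenAI-format for every provider."""
    import litellm  # imported here so the sketch reads without the package installed
    resp = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Switching providers is a one-string change:
# ask(MODELS["openai"], "Hello")
# ask(MODELS["anthropic"], "Hello")
```

The calling code never touches a provider SDK; only the model string changes.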
You can configure a list of models to try in order. If the primary model fails (rate limit, timeout, provider outage), LiteLLM automatically retries with the next model in the chain. This is one of LiteLLM's most used features for production reliability.
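One way to express such a chain is through `litellm.Router`, which takes a model list and a fallback mapping. A minimal sketch, assuming `litellm` is installed and keys are in the environment (the deployment names "primary" and "backup" are placeholders):

```python
# If a call to "primary" fails, the router retries on "backup".
FALLBACKS = [{"primary": ["backup"]}]

def make_router():
    from litellm import Router  # deferred so the sketch reads without the package
    return Router(
        model_list=[
            {"model_name": "primary",
             "litellm_params": {"model": "gpt-4o"}},
            {"model_name": "backup",
             "litellm_params": {"model": "anthropic/claude-sonnet-4"}},
        ],
        fallbacks=FALLBACKS,
    )

# router = make_router()
# router.completion(model="primary",
#                   messages=[{"role": "user", "content": "hi"}])
```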
LiteLLM supports streaming responses across all providers, normalizing the different streaming formats into a consistent interface. This matters for user-facing applications where time-to-first-token determines perceived responsiveness.
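With `stream=True`, completion() yields chunks in the OpenAI streaming shape regardless of provider. A sketch, with the same environment assumptions as above:

```python
def stream_reply(prompt: str):
    """Yield text deltas as they arrive, in the OpenAI chunk format."""
    import litellm
    for chunk in litellm.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no text (e.g. the final stop chunk)
            yield delta

# for piece in stream_reply("Tell me a joke"):
#     print(piece, end="", flush=True)
```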
The LiteLLM Proxy runs as a separate service and accepts standard OpenAI API calls over HTTP. Any language or framework that can call the OpenAI API can use LiteLLM Proxy without installing the Python SDK. The proxy adds load balancing, virtual API keys for team management, rate limiting, and a spend tracking UI.
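Because the proxy speaks the OpenAI API, any OpenAI client can target it by changing the base URL. A sketch using the standard `openai` Python SDK; the URL and virtual key below are placeholders:

```python
PROXY_URL = "http://localhost:4000"   # hypothetical local proxy address
VIRTUAL_KEY = "sk-my-virtual-key"     # a LiteLLM virtual key, not a provider key

def proxy_client():
    """Point the stock OpenAI SDK at the LiteLLM Proxy."""
    from openai import OpenAI
    return OpenAI(base_url=PROXY_URL, api_key=VIRTUAL_KEY)

# client = proxy_client()
# client.chat.completions.create(
#     model="anthropic/claude-sonnet-4",   # the proxy routes by model string
#     messages=[{"role": "user", "content": "hi"}],
# )
```

The same trick works from Node.js, Go, or any other language with an OpenAI-compatible client.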
LiteLLM itself is free and open source under the MIT license. There is no cost for the Python SDK or for self-hosting the proxy server. The costs you pay are the underlying LLM provider charges — OpenAI, Anthropic, Google, and others bill you directly for token usage.
| Tier | Price | Includes |
|---|---|---|
| Open Source (SDK) | Free | Unified API, 100+ providers, streaming, fallbacks |
| Open Source (Proxy) | Free (self-hosted) | Load balancing, rate limiting, virtual keys, spend tracking |
| Enterprise | Custom pricing | SSO, audit logs, priority support, SLA |
The hidden cost of LiteLLM is operational. Self-hosting the proxy means running and maintaining another service in your infrastructure — deployment, monitoring, upgrades, and ensuring high availability. If the proxy goes down, every LLM call in your application fails. Teams that do not want to manage this infrastructure can use managed alternatives like Portkey or Helicone, though those come with their own pricing.
| Feature | LiteLLM | Portkey | Helicone | Langfuse | MarginDash |
|---|---|---|---|---|---|
| Primary purpose | Gateway / routing | Gateway / reliability | Observability | Tracing / evals | Cost / margin |
| Multi-provider routing | Yes | Yes | Yes | No | No |
| Cost tracking | Yes | Yes | Yes | Yes | Yes |
| Per-customer cost | No | No | No | No | Yes |
| Revenue / margin | No | No | No | No | Yes |
| Prompt tracing | Basic logging | Yes | Yes | Yes | No |
| Cost simulator | No | No | No | No | Yes |
| Open source | Yes | No | No | Yes | No |
| Self-hostable | Yes | Cloud only | Cloud only | Yes | Cloud only |
LiteLLM focuses on routing and provider abstraction — it gives you a single API across all LLM providers. Portkey offers similar routing with a managed service model. Helicone and Langfuse focus on observability and debugging. MarginDash focuses on per-customer cost attribution and margin analysis — connecting AI costs to revenue to show which customers are profitable.
If your application calls OpenAI, Anthropic, and Google, LiteLLM saves you from maintaining three separate SDKs and handling three different response formats. The unified interface means your application code does not need to know which provider is serving a given request.
Provider outages happen. If your application depends on a single provider with no fallback, an outage means downtime. LiteLLM's fallback chains let you specify backup providers that kick in automatically when the primary is down.
Testing whether Claude performs better than GPT-4o for a specific task should not require rewriting your API integration. With LiteLLM, you change a model string and compare results. This is especially useful during development and evaluation phases.
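A side-by-side comparison can then be a short loop over model strings. A sketch under the same assumptions as the earlier examples (candidate models are illustrative):

```python
CANDIDATES = ["gpt-4o", "anthropic/claude-sonnet-4"]

def compare(prompt: str) -> dict:
    """Run one prompt against each candidate model and collect the replies."""
    import litellm
    results = {}
    for model in CANDIDATES:
        resp = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = resp.choices[0].message.content
    return results

# compare("Summarize this ticket in one sentence: ...")
```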
Organizations that want to give developers access to LLM APIs through a central gateway use the LiteLLM Proxy to manage API keys, enforce rate limits, and track spend per team. This is a common pattern for platform engineering teams supporting multiple product teams.
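Virtual keys are minted through the proxy's key-generation endpoint. A hypothetical sketch using only the standard library; the URL, master key, team name, and budget are all placeholders, and the response field name is an assumption based on the proxy's documented behavior:

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:4000/key/generate"  # proxy admin endpoint
MASTER_KEY = "sk-master-key"                      # placeholder master key

payload = {
    "team_id": "search-team",  # spend is attributed to this team
    "max_budget": 50.0,        # hard spend cap for the key, in USD
}

def mint_key() -> str:
    """Ask the proxy for a new virtual key scoped to the payload above."""
    req = urllib.request.Request(
        ADMIN_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["key"]
```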
LiteLLM may be unnecessary if you only use one provider and plan to stay with them, if your application is simple enough that direct API calls are manageable, or if you want a managed service and do not want to host infrastructure. In those cases, direct SDK usage or a managed gateway like Portkey may be simpler.
The LiteLLM library is Python-only. If your application is in Node.js, Go, or another language, you need to use the proxy server as an HTTP endpoint rather than importing the SDK directly. The proxy solves this but adds infrastructure complexity.
Running the LiteLLM Proxy in production means maintaining a highly available service. If the proxy goes down, every LLM call in your stack fails. You need monitoring, health checks, and likely a redundant deployment. This operational overhead is significant for small teams.
LiteLLM tracks spend per API key, per team, and per model. It does not track spend per individual customer in your application. If you are reselling AI features and need to know which end-customers are profitable, you need a separate cost attribution layer.
By standardizing on the OpenAI format, LiteLLM necessarily abstracts away some provider-specific features. If you need to use a capability unique to one provider — like Anthropic's extended thinking or Google's grounding — you may need to drop down to the native SDK for those calls.
See per-customer P&L across 100+ models from OpenAI, Anthropic, Google, and more. Connect to Stripe for revenue. Simulate savings from model swaps. Get budget alerts before spend goes sideways. Set up in 5 minutes.
Start Tracking Costs → (No credit card required)
Create an account, install the SDK, and see your first margin data in minutes.
See My Margin Data (No credit card required)