Category Guide
Understand what an LLM gateway does, compare the leading options, and learn how to track costs across every provider your gateway routes to.
An LLM gateway is a unified API layer that sits between your application and one or more LLM providers — OpenAI, Anthropic, Google, AWS Bedrock, Azure, Groq, and others. Instead of your application code calling each provider's API directly, every request goes through the gateway. The gateway translates the request into the provider-specific format, routes it to the appropriate model, and returns a standardized response.
The core value of an LLM gateway is abstraction. Your application talks to a single API regardless of which provider serves the request. Switching from OpenAI to Anthropic — or routing different features to different providers — becomes a configuration change rather than a code change. This decoupling matters more as the number of models and providers you use grows.
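As a concrete sketch of that decoupling, the provider choice can live in a configuration table that application code merely looks up. The feature names and model identifiers below are hypothetical, not recommendations:

```python
# Sketch: provider/model choice lives in config, not in application code.
# Feature names and model identifiers are illustrative placeholders.
ROUTES = {
    "summarize": "anthropic/claude-sonnet",
    "classify": "openai/gpt-4o-mini",
}

def resolve_model(feature: str) -> str:
    """Return the model that serves a feature. Swapping providers means
    editing ROUTES, not the code that calls resolve_model()."""
    return ROUTES[feature]
```

Moving a feature from one provider to another is then a one-line change to `ROUTES`, with no edits at any call site.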
Beyond routing, most LLM gateways handle operational concerns that every production AI application needs: automatic failover when a provider is down, rate limiting to stay within provider quotas, request caching to avoid paying for identical calls, load balancing across multiple API keys or accounts, and centralized logging of every request and response.
The term "LLM gateway" is sometimes used interchangeably with "AI gateway" or "LLM proxy." In practice, a proxy is typically a simple pass-through that forwards requests to a single provider. A gateway adds routing logic, multi-provider support, and operational features on top. If you are using more than one LLM provider — or plan to — you are looking for a gateway, not a proxy.
| Feature | LiteLLM | Portkey | Helicone Gateway | Kong AI Gateway | Custom Build |
|---|---|---|---|---|---|
| Multi-provider routing | Yes | Yes | Yes | Yes | You build it |
| Automatic failover | Yes | Yes | Limited | Yes | You build it |
| Load balancing | Yes | Yes | No | Yes | You build it |
| Response caching | Yes | Yes | Yes | Yes | You build it |
| Rate limiting | Yes | Yes | Yes | Yes | You build it |
| Usage logging | Yes | Yes | Yes | Yes | You build it |
| Open source | Yes | No | No | Yes | N/A |
| Self-hostable | Yes | Cloud only | Cloud only | Yes | By definition |
LiteLLM is an open-source Python library and proxy server that provides an OpenAI-compatible API across 100+ models. It is the most popular open-source option for teams that want to self-host. Portkey is a managed gateway with a focus on reliability features — automatic retries, fallbacks, and load balancing. Helicone Gateway is primarily an observability platform that also offers gateway functionality for routing and caching. Kong AI Gateway extends the Kong API gateway with AI-specific plugins for multi-provider routing and rate limiting.
Unified API and routing. The gateway exposes a single endpoint that accepts requests in a common format. Routing rules determine which provider and model handle each request. Routes can be static (feature X always uses Anthropic) or dynamic (route based on input length, customer tier, or task complexity).
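A minimal routing rule combining both styles might look like the following sketch. The feature name, tier values, length threshold, and model identifiers are illustrative, not taken from any particular gateway:

```python
def route(feature: str, prompt: str, tier: str = "free") -> str:
    """Pick a model per request.
    Static rule: one feature is pinned to a single provider.
    Dynamic rules: long inputs and paid tiers get a larger model.
    All model names below are hypothetical placeholders."""
    if feature == "legal_review":            # static route
        return "anthropic/claude-opus"
    if len(prompt) > 8000 or tier == "pro":  # dynamic routes
        return "openai/gpt-4o"
    return "openai/gpt-4o-mini"              # default fast tier
```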
Failover and fallback. When a provider returns errors or becomes unresponsive, the gateway automatically retries the request with a fallback provider. A well-configured failover chain might route from OpenAI to Anthropic to Google, so a single provider outage does not become an application outage.
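A failover chain reduces to an ordered list of providers tried in sequence. In this sketch the `providers` callables stand in for real SDK clients; a production gateway would also add per-provider retries, timeouts, and error classification:

```python
def call_with_failover(request, providers):
    """Try each provider in order; on any error, fall through to the next.
    `providers` is a list of callables standing in for provider clients."""
    last_error = None
    for provider in providers:
        try:
            return provider(request)
        except Exception as exc:
            last_error = exc  # in practice: log, then try the next provider
    raise RuntimeError("all providers in the failover chain failed") from last_error
```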
Load balancing. If you have multiple API keys for the same provider, the gateway distributes requests across them. Some gateways also balance across providers based on latency or cost.
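The simplest form of key balancing is round-robin rotation, sketched below; real gateways may instead weight by observed latency, error rate, or cost:

```python
import itertools

class KeyBalancer:
    """Rotate requests across multiple API keys for one provider."""

    def __init__(self, api_keys):
        self._cycle = itertools.cycle(api_keys)

    def next_key(self) -> str:
        """Return the next key in round-robin order."""
        return next(self._cycle)
```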
Response caching. Identical requests return cached responses instead of making another API call. Especially effective for classification, extraction, and deterministic workloads.
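Such a cache is typically keyed on a hash of the exact request payload. A minimal in-memory sketch, with `call_provider` standing in for the real API call (production gateways add TTLs and a shared store such as Redis):

```python
import hashlib
import json

_cache = {}

def cached_completion(model, messages, call_provider):
    """Return a cached response for byte-identical requests;
    only call the provider on a cache miss."""
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = call_provider(model, messages)
    return _cache[key]
```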
Logging and observability. Every request passes through the gateway, making it the natural place to log usage data. This is the foundation for LLM monitoring and LLM observability.
The biggest cost lever an LLM gateway provides is intelligent model routing — sending different tasks to different models based on complexity. Not every request needs a frontier model. Classification, extraction, and summarization can often be handled by fast-tier models that cost a fraction of the price while scoring within a few percentage points on benchmarks like MMLU-Pro and GPQA.
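The savings from this kind of routing come down to blended-cost arithmetic. The per-token prices below are hypothetical placeholders, not real provider pricing:

```python
# Hypothetical prices, for illustration only (not real provider rates).
FRONTIER_PRICE = 10.00  # $ per 1M tokens on a frontier model (assumed)
FAST_PRICE = 0.60       # $ per 1M tokens on a fast-tier model (assumed)

def blended_cost_per_million(fast_share: float) -> float:
    """Cost per 1M tokens when `fast_share` of traffic is routed
    to the fast tier and the rest stays on the frontier model."""
    return (1 - fast_share) * FRONTIER_PRICE + fast_share * FAST_PRICE
```

Under these assumed prices, routing half of the traffic to the fast tier cuts the blended per-token cost by nearly half, since the fast tier contributes almost nothing to the total.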
Caching compounds the savings. If 15% of your requests are exact repeats of earlier ones, that is up to 15% of your API spend eliminated. The gateway handles this transparently.
The cost dimension that gateways do not solve is cost attribution. A gateway knows the total cost of all requests it routed, but it does not connect those costs to individual customers or revenue. That requires a separate layer — tools like MarginDash and the broader discipline of AI cost management.
Calling LLM provider APIs directly is the simplest approach and works well in the early stages. You import the OpenAI SDK, make a call, get a response. No additional infrastructure required. The tradeoffs emerge as your usage grows — more providers, more models, more customers, more things that can go wrong.
When direct API calls are enough: You use a single provider. Your request volume is low enough that rate limits are not a concern. You do not need automatic failover. Provider outages are tolerable. You have a small number of models and switching between them is infrequent. In this scenario, a gateway adds complexity without proportional value.
When you need a gateway: You use multiple providers and want a unified API. You need automatic failover when a provider goes down. You want to route different tasks to different models without changing application code. You need rate limiting across API keys. You want response caching to reduce redundant calls. You have multiple services making LLM calls and want centralized logging.
The latency tradeoff is real but usually small. A gateway adds a network hop and processing time for each request. For self-hosted gateways running in the same infrastructure, the added latency is typically single-digit milliseconds — negligible compared to LLM response times. For managed cloud gateways, the added latency depends on where the gateway is hosted relative to your application and the provider.
The operational tradeoff is more significant. A self-hosted gateway is another piece of infrastructure to deploy, monitor, and maintain. It needs to be highly available — if your gateway goes down, every LLM call in your application fails. Managed gateways offload this operational burden but introduce a dependency on a third-party service. Choose based on your team's infrastructure maturity and tolerance for external dependencies.
Your LLM gateway routes the traffic. MarginDash tracks the cost. See per-customer P&L across 100+ models, simulate savings from model swaps, and get budget alerts before spend goes sideways. Set up in 5 minutes.
Start Tracking Gateway Costs → No credit card required