Category Guide

LLM Gateway: Route, Monitor, and Control AI API Traffic

Understand what an LLM gateway does, compare the leading options, and learn how to track costs across every provider your gateway routes to.

What is an LLM Gateway?

An LLM gateway is a unified API layer that sits between your application and one or more LLM providers — OpenAI, Anthropic, Google, AWS Bedrock, Azure, Groq, and others. Instead of your application code calling each provider's API directly, every request goes through the gateway. The gateway translates the request into the provider-specific format, routes it to the appropriate model, and returns a standardized response.

The core value of an LLM gateway is abstraction. Your application talks to a single API regardless of which provider serves the request. Switching from OpenAI to Anthropic — or routing different features to different providers — becomes a configuration change rather than a code change. This decoupling matters more as the number of models and providers you use grows.
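To make "configuration change rather than code change" concrete, here is a minimal Python sketch of config-driven routing. The feature names, providers, and model IDs are illustrative and not tied to any particular gateway — the point is that swapping providers means editing the route table, not the call sites:

```python
# Sketch: provider selection as configuration, not code.
# Moving "summarize" from OpenAI to Anthropic is a one-line edit to ROUTES.

ROUTES = {
    "summarize": {"provider": "openai", "model": "gpt-4o-mini"},
    "chat":      {"provider": "anthropic", "model": "claude-sonnet"},  # hypothetical IDs
}

def resolve_route(feature: str) -> dict:
    """Return the provider/model pair configured for a feature."""
    try:
        return ROUTES[feature]
    except KeyError:
        raise ValueError(f"no route configured for feature {feature!r}")
```

A real gateway resolves routes like this on every request, then translates the payload into the chosen provider's wire format.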

Beyond routing, most LLM gateways handle operational concerns that every production AI application needs: automatic failover when a provider is down, rate limiting to stay within provider quotas, request caching to avoid paying for identical calls, load balancing across multiple API keys or accounts, and centralized logging of every request and response.
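Rate limiting is a good example of what these operational layers look like internally. Below is a minimal token-bucket limiter in Python — a common textbook sketch, not any particular gateway's implementation:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` requests/second,
    allowing bursts up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway typically keeps one bucket per API key or per customer, rejecting or queueing requests when `allow()` returns False.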

The term "LLM gateway" is sometimes used interchangeably with "AI gateway" or "LLM proxy." In practice, a proxy is typically a simple pass-through that forwards requests to a single provider. A gateway adds routing logic, multi-provider support, and operational features on top. If you are using more than one LLM provider — or plan to — you are looking for a gateway, not a proxy.

Top LLM Gateways Compared

Feature                | LiteLLM | Portkey    | Helicone Gateway | Kong AI Gateway | Custom Build
Multi-provider routing | Yes     | Yes        | Yes              | Yes             | You build it
Automatic failover     | Yes     | Yes        | Limited          | Yes             | You build it
Load balancing         | Yes     | Yes        | No               | Yes             | You build it
Response caching       | Yes     | Yes        | Yes              | Yes             | You build it
Rate limiting          | Yes     | Yes        | Yes              | Yes             | You build it
Usage logging          | Yes     | Yes        | Yes              | Yes             | You build it
Open source            | Yes     | No         | No               | Yes             | N/A
Self-hostable          | Yes     | Cloud only | Cloud only       | Yes             | By definition

LiteLLM is an open-source Python library and proxy server that provides an OpenAI-compatible API across 100+ models. It is the most popular open-source option for teams that want to self-host. Portkey is a managed gateway with a focus on reliability features — automatic retries, fallbacks, and load balancing. Helicone Gateway is primarily an observability platform that also offers gateway functionality for routing and caching. Kong AI Gateway extends the Kong API gateway with AI-specific plugins for multi-provider routing and rate limiting.

Key Features of an LLM Gateway

Unified API and routing. The gateway exposes a single endpoint that accepts requests in a common format. Routing rules determine which provider and model handle each request. Routes can be static (feature X always uses Anthropic) or dynamic (route based on input length, customer tier, or task complexity).
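A dynamic routing rule can be as simple as a function over request attributes. This Python sketch routes on prompt length and customer tier; the model names and the 4,000-character threshold are hypothetical:

```python
def pick_model(prompt: str, customer_tier: str = "standard") -> str:
    """Dynamic routing sketch: long inputs and premium customers get a
    stronger (hypothetical) model; everything else goes to a cheaper
    fast-tier model."""
    if customer_tier == "premium" or len(prompt) > 4000:
        return "frontier-model"
    return "fast-tier-model"
```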

Failover and fallback. When a provider returns errors or becomes unresponsive, the gateway automatically retries the request with a fallback provider. A well-configured failover chain might route from OpenAI to Anthropic to Google, so a single provider outage does not become an application outage.
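At its core, a failover chain reduces to "try each provider in order". This sketch uses plain callables as stand-ins for provider clients; a real gateway would match on specific status codes and timeouts rather than catching every exception:

```python
def call_with_failover(request, providers):
    """Try each provider in order; return the first successful response.
    `providers` is a list of callables standing in for provider clients."""
    errors = []
    for call in providers:
        try:
            return call(request)
        except Exception as exc:  # real gateways filter on 429/5xx/timeouts
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")
```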

Load balancing. If you have multiple API keys for the same provider, the gateway distributes requests across them. Some gateways also balance across providers based on latency or cost.
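Round-robin balancing across API keys takes only a few lines. A sketch, assuming interchangeable keys for a single provider:

```python
import itertools

def make_key_balancer(api_keys):
    """Return a function that hands out API keys round-robin,
    spreading requests evenly across keys for one provider."""
    cycle = itertools.cycle(api_keys)
    return lambda: next(cycle)
```

Latency- or cost-aware balancing replaces the fixed cycle with a choice based on recent per-provider metrics.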

Response caching. Identical requests return cached responses instead of making another API call. Especially effective for classification, extraction, and deterministic workloads.
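A response cache typically keys on a hash of the normalized request. A minimal in-memory sketch — real gateways usually add TTLs and often use a shared store rather than a process-local dict:

```python
import hashlib
import json

_cache = {}

def cached_call(model: str, messages: list, call_fn):
    """Return a cached response for identical (model, messages) requests;
    otherwise invoke `call_fn` once and store the result."""
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages},
                   sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(model, messages)
    return _cache[key]
```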

Logging and observability. Every request passes through the gateway, making it the natural place to log usage data. This is the foundation for LLM monitoring and LLM observability.

Cost Implications of Gateway Routing

The biggest cost lever an LLM gateway provides is intelligent model routing — sending different tasks to different models based on complexity. Not every request needs a frontier model. Classification, extraction, and summarization can often be handled by fast-tier models that cost a fraction as much while scoring within a few percentage points of frontier models on benchmarks like MMLU-Pro and GPQA.
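The arithmetic behind blended routing is straightforward. With hypothetical per-request prices — say $0.01 for a frontier model and $0.001 for a fast-tier model — routing 60% of traffic to the cheap tier cuts the blended cost per 1,000 requests from $10.00 to $4.60:

```python
def blended_cost_per_1k_requests(frontier_price: float,
                                 fast_price: float,
                                 fast_share: float) -> float:
    """Blended cost of 1,000 requests. Prices are per-request and
    hypothetical; `fast_share` is the fraction routed to the fast tier."""
    return 1000 * (fast_share * fast_price + (1 - fast_share) * frontier_price)
```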

Caching compounds the savings. If 15% of your requests are cache hits, that portion of your API spend is eliminated entirely — assuming cached requests cost roughly the same as the average request. The gateway handles this transparently.
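The savings estimate is simple multiplication, under the same assumptions — cache hits cost nothing and cached requests have roughly average cost:

```python
def cache_savings(monthly_spend: float, hit_rate: float) -> float:
    """Spend eliminated by caching, assuming cache hits cost ~0 and
    cached requests have roughly average per-request cost."""
    return monthly_spend * hit_rate
```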

The cost dimension that gateways do not solve is cost attribution. A gateway knows the total cost of all requests it routed, but it does not connect those costs to individual customers or revenue. That requires a separate layer — tools like MarginDash and the broader discipline of AI cost management.

LLM Gateway vs. Direct API Calls

Calling LLM provider APIs directly is the simplest approach and works well in the early stages. You import the OpenAI SDK, make a call, get a response. No additional infrastructure required. The tradeoffs emerge as your usage grows — more providers, more models, more customers, more things that can go wrong.

When direct API calls are enough: You use a single provider. Your request volume is low enough that rate limits are not a concern. You do not need automatic failover. Provider outages are tolerable. You have a small number of models and switching between them is infrequent. In this scenario, a gateway adds complexity without proportional value.

When you need a gateway: You use multiple providers and want a unified API. You need automatic failover when a provider goes down. You want to route different tasks to different models without changing application code. You need rate limiting across API keys. You want response caching to reduce redundant calls. You have multiple services making LLM calls and want centralized logging.

The latency tradeoff is real but usually small. A gateway adds a network hop and processing time for each request. For self-hosted gateways running in the same infrastructure, the added latency is typically single-digit milliseconds — negligible compared to LLM response times. For managed cloud gateways, the added latency depends on where the gateway is hosted relative to your application and the provider.

The operational tradeoff is more significant. A self-hosted gateway is another piece of infrastructure to deploy, monitor, and maintain. It needs to be highly available — if your gateway goes down, every LLM call in your application fails. Managed gateways offload this operational burden but introduce a dependency on a third-party service. Choose based on your team's infrastructure maturity and tolerance for external dependencies.

Frequently Asked Questions

What is an LLM gateway?
An LLM gateway is a unified API layer that sits between your application and multiple LLM providers like OpenAI, Anthropic, and Google. It provides a single endpoint for routing requests, handling failovers, caching responses, enforcing rate limits, and logging usage across all providers. Instead of integrating with each provider's API directly, your application talks to the gateway, which handles provider-specific details.
How does an LLM gateway reduce costs?
An LLM gateway reduces costs through intelligent routing — sending simpler tasks to cheaper models and reserving expensive frontier models for complex work. It can also cache identical requests to avoid redundant API calls, batch requests where possible, and enforce rate limits that prevent runaway usage. The cost savings depend on your workload, but teams with mixed-complexity tasks often see significant reductions by routing appropriately.
What is the difference between an LLM gateway and an LLM proxy?
The terms are often used interchangeably, but an LLM proxy typically just forwards requests to a single provider with minimal transformation, while an LLM gateway adds routing logic, failover, load balancing, caching, and observability across multiple providers. A proxy is a pass-through. A gateway is a control plane.
Can I use an LLM gateway with MarginDash?
Yes. MarginDash and LLM gateways solve different problems and work well together. The gateway routes your requests, handles failovers, and manages provider connections. MarginDash tracks the cost of those requests per customer, connects costs to revenue via Stripe, and shows you margin per customer. You add the MarginDash SDK alongside your gateway integration — it logs model name, token counts, and customer ID after each call regardless of which provider the gateway routed to.
Should I build my own LLM gateway or use an existing one?
For most teams, an existing gateway is the better starting point. Building a custom gateway means maintaining provider API compatibility, failover logic, retry handling, and caching infrastructure yourself. Open-source options like LiteLLM give you a working gateway quickly. Custom builds make sense when you have unique routing requirements, strict compliance needs, or the engineering capacity to maintain the infrastructure long-term.
What happens when an LLM provider goes down?
A well-configured LLM gateway detects provider failures and automatically routes requests to a fallback provider. For example, if OpenAI returns errors, the gateway can reroute to Anthropic or Google without your application code changing. This requires mapping equivalent models across providers and accepting that responses may differ slightly. Without a gateway, provider outages require manual intervention or application-level failover code.

Track Gateway Costs Per Customer

Your LLM gateway routes the traffic. MarginDash tracks the cost. See per-customer P&L across 100+ models, simulate savings from model swaps, and get budget alerts before spend goes sideways. Set up in 5 minutes.

Start Tracking Gateway Costs →

No credit card required
