Tool Review
Helicone is an open-source LLM observability platform. Here is what it does, what it costs, and where it fits alongside cost and margin tracking.
Helicone sits between your application and AI providers like OpenAI, Anthropic, and Google. It works as a proxy: you change the base URL in your API client, and Helicone logs every request and response as it passes through. No SDK is required for basic integration.
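As a sketch, the proxy swap for the OpenAI API looks like this. The gateway URL and `Helicone-Auth` header follow Helicone's documented integration at the time of writing; verify the current values against Helicone's docs before relying on them.

```python
# Minimal sketch of pointing an OpenAI-style client at Helicone's proxy.
# Base URL and Helicone-Auth header per Helicone's docs (verify before use).

HELICONE_BASE_URL = "https://oai.helicone.ai/v1"  # instead of https://api.openai.com/v1

def helicone_headers(provider_key: str, helicone_key: str) -> dict:
    """Headers for a proxied request: Helicone consumes its own headers
    and forwards everything else to the provider unchanged."""
    return {
        "Authorization": f"Bearer {provider_key}",  # provider auth, passed through
        "Helicone-Auth": f"Bearer {helicone_key}",  # authenticates with Helicone
        "Content-Type": "application/json",
    }

headers = helicone_headers("sk-provider-key", "sk-helicone-key")
```

Everything else about the request body stays exactly as it was; only the destination and the extra header change.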
Once requests flow through the proxy, Helicone provides a dashboard with cost tracking, latency metrics, error rates, and usage analytics. It also offers features like response caching, rate limiting, prompt management, and request retries — all handled at the proxy layer without changes to your application code.
Helicone is open source and can be self-hosted. They also run a managed cloud service with free and paid tiers. The proxy-based architecture means your AI traffic routes through their infrastructure (or yours, if self-hosted), which gives Helicone full visibility into request and response payloads.
**Request logging.** Every API call is logged with the full request, response, token counts, latency, and status code. You can search, filter, and drill into individual requests to debug issues. This is the core value: a complete audit trail of every LLM interaction.
**Cost tracking.** Helicone calculates cost per request from token counts and model pricing. You can see aggregate cost over time, cost per model, and cost per custom property (like user ID or feature name). This gives you a clear picture of what your AI calls cost in total.
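The per-request arithmetic is straightforward. A worked example, with placeholder prices rather than any provider's current rates:

```python
# Worked example of per-request cost from token counts and model pricing.
# Prices are illustrative placeholders, not real rates.

PRICE_PER_1M_TOKENS = {"example-model": {"input": 0.50, "output": 1.50}}  # USD

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    prices = PRICE_PER_1M_TOKENS[model]
    return (input_tokens / 1_000_000 * prices["input"]
            + output_tokens / 1_000_000 * prices["output"])

# 12k input + 4k output tokens at the placeholder rates:
cost = request_cost("example-model", 12_000, 4_000)
print(f"${cost:.4f}")  # → $0.0120
```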
**Caching.** Helicone can cache responses at the proxy layer. If the same prompt comes in again, it returns the cached response without hitting the provider. This saves money on repeated calls and reduces latency. You control cache policies via headers.
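A sketch of the header-based cache controls. The header names (`Helicone-Cache-Enabled`, `Cache-Control`) are taken from Helicone's docs as of writing; treat them as assumptions to verify.

```python
# Sketch of opting a request into Helicone's proxy cache via headers.
# Header names per Helicone's docs at time of writing -- verify before use.

def cache_headers(ttl_seconds: int = 3600) -> dict:
    return {
        "Helicone-Cache-Enabled": "true",
        "Cache-Control": f"max-age={ttl_seconds}",  # cache TTL for this request
    }

print(cache_headers(600))
```

Merge these into the request headers alongside your auth headers; requests without them bypass the cache entirely.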
**Rate limiting.** You can set rate limits per user, per API key, or globally. This prevents runaway usage from blowing up your AI bill. Rate limits are enforced at the proxy before requests reach the provider.
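Rate-limit policies are also expressed as request headers. The header name and `quota;w=window_seconds` format below are assumptions recalled from Helicone's docs; confirm the current syntax before use.

```python
# Sketch of a per-request rate-limit policy header.
# Header name and "quota;w=window_seconds" format are assumptions
# from Helicone's docs -- confirm the current syntax before use.

def rate_limit_headers(quota: int, window_seconds: int) -> dict:
    return {"Helicone-RateLimit-Policy": f"{quota};w={window_seconds}"}

print(rate_limit_headers(1000, 3600))  # 1,000 requests per hour
```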
**Prompt management.** Helicone provides prompt versioning and templating. You can manage prompt templates in their dashboard, track which version was used for each request, and compare performance across versions.
**Custom properties.** You can tag requests with arbitrary key-value pairs: customer ID, feature name, environment, experiment ID. These properties become filterable dimensions in the dashboard, letting you slice cost and usage data by any dimension you define.
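Properties are attached as request headers with a `Helicone-Property-` prefix, per Helicone's docs; the prefix is an assumption to verify against current documentation.

```python
# Sketch of tagging requests with custom properties. Helicone reads
# Helicone-Property-* request headers and exposes them as filterable
# dashboard dimensions. Prefix per Helicone's docs -- verify before use.

def property_headers(props: dict) -> dict:
    return {f"Helicone-Property-{key}": str(value) for key, value in props.items()}

print(property_headers({"User-Id": "cus_123", "Feature": "summarize"}))
```

Tagging every request with a customer ID up front is what makes per-customer cost slicing possible later.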
Helicone offers a free tier with a monthly request allowance large enough for small projects and prototyping. The free tier includes core features like request logging, cost tracking, and basic analytics.
Paid tiers add higher request volumes, longer data retention, advanced features like caching and rate limiting, and priority support. Enterprise plans include custom configurations, dedicated infrastructure, and SLAs.
Because Helicone is open source, self-hosting is always an option. Self-hosting eliminates per-request fees entirely, but you take on the operational burden of running a proxy that all your AI traffic routes through. Downtime or latency in a self-hosted Helicone instance directly impacts your application's AI features.
Helicone's pricing model is based on request volume rather than per-seat, which keeps costs predictable as your team grows. The trade-off is that high-volume applications can see costs scale quickly with request count.
| Feature | Helicone | MarginDash |
|---|---|---|
| Integration method | Proxy (URL change) | SDK (few lines of code) |
| Request logging | Yes | No |
| Prompt tracing | Yes | No |
| Response caching | Yes | No |
| Rate limiting | Yes | No |
| Cost tracking | Yes | Yes |
| Per-customer cost breakdown | Via custom properties | Yes |
| Revenue/margin per customer | No | Yes |
| Stripe integration | No | Yes |
| Cost simulator | No | Yes |
| Budget alerts | Yes | Yes |
| Open source | Yes | No |
| Data collected | Full request/response | Model + tokens + customer ID |
Helicone and MarginDash take fundamentally different approaches to integration. Helicone uses a proxy — you change the base URL in your API client (e.g., from api.openai.com to oai.helicone.ai) and all traffic routes through Helicone's servers. This gives Helicone full visibility into prompts and responses, which enables features like caching and prompt tracing.
MarginDash uses an SDK — you add a few lines of code after each API call to log the model name, token counts, and customer ID. Your API calls go directly to the provider. The SDK never sees your prompts or responses, only usage metadata.
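The SDK pattern can be sketched like this. The `record_usage` helper below is hypothetical; it illustrates the metadata-only payload the article describes, not MarginDash's actual API surface.

```python
# Hypothetical illustration of metadata-only logging: the API call goes
# directly to the provider, then a small call records usage afterward.
# Names here are illustrative, not MarginDash's real SDK.

def record_usage(model: str, input_tokens: int, output_tokens: int,
                 customer_id: str) -> dict:
    """Everything the logger sees. Note: no prompt or response text."""
    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "customer_id": customer_id,
    }

payload = record_usage("gpt-4o-mini", 812, 155, "cus_123")
assert "prompt" not in payload and "response" not in payload
```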
The proxy approach is simpler to set up — one URL change and you're done. But it means all your AI traffic flows through a third party. If the proxy goes down, your AI features go down. It also means the proxy provider has access to every prompt and response, which may require additional security review for sensitive applications.
The SDK approach requires a few more lines of code but keeps your AI traffic direct. There is no single point of failure between your app and the provider. The trade-off is that you don't get features that require seeing the full request, like response caching or prompt tracing.
Helicone is the right choice when your primary need is request-level observability. If you need to see every prompt and response, trace multi-step chains, debug specific failures, or cache repeated calls, Helicone does all of that through a single URL change.
It is especially useful when you need caching at the proxy layer. If your application makes repeated calls with identical prompts — common in retrieval-augmented generation (RAG) pipelines or template-based workflows — Helicone's caching can save significant cost and latency without any application-level changes.
Rate limiting is another strong use case. If you want to prevent individual users or API keys from sending too many requests, Helicone handles this at the proxy level. This is simpler than building rate limiting into your application.
Teams that want to self-host their observability stack should also consider Helicone. It is open source and can run on your own infrastructure, which eliminates per-request fees and keeps all data in your environment.
MarginDash is the right choice when your primary question is "which customers are profitable after AI costs?" If you charge customers for AI-powered features and need to connect API costs to revenue, MarginDash is purpose-built for that.
Per-customer P&L is the core difference. MarginDash connects to Stripe (or accepts revenue data via the API) and shows you revenue, cost, and margin for every customer. You can immediately see which customers are underwater and by how much.
The cost simulator is useful when you want to reduce costs without guessing. Pick a feature, swap the underlying model, and see projected savings. Models are ranked by intelligence-per-dollar using public benchmarks (MMLU-Pro, GPQA, AIME), so you're not just picking the cheapest option — you're finding alternatives that maintain quality.
Budget alerts let you set spending thresholds per customer, per feature, or across your entire organization. MarginDash emails you before a threshold is exceeded, so you can act before costs become a problem.
If you do not need to see prompts or responses — and many teams prefer not to send that data to third parties — MarginDash's SDK-only approach is a privacy advantage. It collects model name, token counts, and a customer identifier. Nothing else.
Connect AI costs to revenue and see which customers are profitable. Set up in 5 minutes with a few lines of SDK code.
Can you use Helicone and MarginDash together? Yes, and it is a common pattern for teams that care about both debugging and unit economics. The two tools solve different problems and do not conflict.
Use Helicone for the engineering side — log every request, trace prompt chains, cache repeated calls, debug failures, and set rate limits. Helicone gives your engineering team the observability they need to build and maintain reliable AI features.
Use MarginDash for the business side — track cost per customer, connect to Stripe for revenue, calculate margins, simulate model swaps, and set budget alerts. MarginDash gives your team the financial visibility to price correctly and stay profitable.
In practice, this means your API calls route through Helicone's proxy (for logging and caching), and you add a few lines of MarginDash SDK code after each call (for cost and revenue tracking). The two integrations are independent — Helicone handles the proxy layer, MarginDash handles the business metrics layer.
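A sketch of the combined pattern, assuming Helicone's documented gateway URL and using a hypothetical `record_usage` helper in place of the real MarginDash SDK:

```python
# Combined sketch: traffic proxied through Helicone for observability,
# plus a metadata-only log call for unit economics. Gateway URL and
# Helicone-Auth header per Helicone's docs; record_usage is hypothetical.

HELICONE_BASE_URL = "https://oai.helicone.ai/v1"

def proxy_headers(provider_key: str, helicone_key: str) -> dict:
    return {
        "Authorization": f"Bearer {provider_key}",
        "Helicone-Auth": f"Bearer {helicone_key}",
    }

def record_usage(model: str, input_tokens: int, output_tokens: int,
                 customer_id: str) -> dict:
    # Hypothetical stand-in for the MarginDash SDK call.
    return {"model": model, "input_tokens": input_tokens,
            "output_tokens": output_tokens, "customer_id": customer_id}

# 1) Send the chat request to HELICONE_BASE_URL with proxy_headers(...);
#    Helicone logs/caches it and forwards it to the provider.
# 2) When the response comes back, record the usage metadata:
usage = record_usage("gpt-4o-mini", 812, 155, "cus_123")
```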
LLM observability and unit economics tracking get grouped together because they both involve monitoring API calls. But they answer fundamentally different questions. Observability answers: What did my AI calls do? Where did the chain fail? What was the latency? Unit economics answers: Is this customer profitable? What is my margin per feature? What happens if I swap models?
The confusion happens because observability tools show cost data alongside traces. Helicone shows you that a request cost $0.04. But knowing a single request cost $0.04 is very different from knowing that Customer X generated $49 in monthly revenue and consumed $31 in AI costs, leaving $18 of margin (roughly 37%), thin enough that a single usage spike could push the account underwater.
The first is a data point. The second is actionable — you can adjust pricing, set usage limits, or use the cost simulator to find a cheaper model that maintains quality. For teams running AI in production at scale, both types of visibility matter. The observability tool keeps your AI features working. The unit economics tool keeps your business working.
The proxy vs SDK distinction has direct implications for data privacy. Helicone sees everything — every prompt, every response, every parameter. This is by design and is what enables features like caching and prompt tracing. For some applications, sending all prompts and responses through a third-party proxy requires additional security review, data processing agreements, or compliance approvals.
MarginDash sees only metadata — model name, input token count, output token count, customer ID, and optionally a feature label and revenue amount. No prompts. No responses. No user content of any kind. This is a deliberate architectural choice that simplifies compliance for teams handling sensitive data.
If your application processes healthcare data, financial records, or other regulated content, the difference between sending full request payloads and sending only token counts is significant from a compliance perspective. Self-hosting Helicone eliminates the third-party data concern but adds infrastructure overhead.
Create an account, install the SDK, and see your first margin data in minutes.