Blog · February 15, 2026

The Cheapest AI Models That Are Actually Good

Most teams pick from a shortlist of three or four well-known models. GPT-4.1, Claude 4.5 Sonnet, maybe Gemini. These are good models. They are not good value.

We ranked 272 models from our pricing database (397 models across 41 vendors) by intelligence per dollar. The results surprised us: the best value models deliver flagship-level benchmark scores at a fraction of the cost.

How we ranked them

For each model, we calculated:

  • Cost per 1,000 requests based on each vendor's list price
  • Intelligence Index — a composite benchmark score from MMLU-Pro, GPQA, and AIME
  • Value score — Intelligence Index divided by cost per 1,000 requests. Higher means more intelligence per dollar.

We filtered to 272 models that have both published pricing and benchmark scores. Models with zero or missing pricing were excluded.
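
As a concrete illustration of the ranking, here is a minimal sketch of the calculation. The records and field names are illustrative rather than the actual MarginDash schema; the three example models use figures from the tables below, and the filter thresholds match the top-10 cut used in the next section.

```python
# Illustrative ranking sketch -- field names and records are examples, not the MarginDash schema.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    vendor: str
    intelligence_index: float    # composite of MMLU-Pro, GPQA, and AIME
    cost_per_1k_requests: float  # vendor list price, USD

def value_score(m: Model) -> float:
    """Intelligence per dollar: Intelligence Index / cost per 1,000 requests."""
    return m.intelligence_index / m.cost_per_1k_requests

models = [
    Model("MiMo-V2-Flash", "Xiaomi", 41.4, 0.25),
    Model("GPT-5 mini (high)", "OpenAI", 41.0, 1.25),
    Model("o3-pro", "OpenAI", 40.7, 60.00),
]

# Keep models with published pricing, solid quality, and a low price point,
# then rank by value score (highest first).
ranked = sorted(
    (m for m in models
     if m.cost_per_1k_requests > 0
     and m.intelligence_index >= 25
     and m.cost_per_1k_requests < 2.50),
    key=value_score,
    reverse=True,
)

for i, m in enumerate(ranked, 1):
    print(f"{i}. {m.name} ({m.vendor}): value score {value_score(m):.1f}")
```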

One caveat: list price doesn't tell the full story for model swaps. Different models burn different token counts for the same task, so a model with a low sticker price can cost more per task. When we compare swap savings below, we normalize for token efficiency using data from the Artificial Analysis Intelligence Index (AAII).

The top 10 value picks

These models score at least 25 on the Intelligence Index (solid production quality) and cost under $2.50 per 1,000 requests. Sorted by value score — the most intelligence per dollar first.

| # | Model | Vendor | Intelligence Index | $/1K requests | Value score |
|---|-------|--------|--------------------|---------------|-------------|
| 1 | MiMo-V2-Flash | Xiaomi | 41.4 | $0.25 | 165.6 |
| 2 | GLM-4.7-Flash (Reasoning) | Z AI | 30.1 | $0.27 | 111.5 |
| 3 | GPT-5 nano (high) | OpenAI | 26.7 | $0.25 | 106.8 |
| 4 | Grok 4.1 Fast (Reasoning) | xAI | 38.5 | $0.45 | 85.6 |
| 5 | DeepSeek V3.2 (Reasoning) | DeepSeek | 41.6 | $0.49 | 84.9 |
| 6 | Grok 4 Fast (Reasoning) | xAI | 34.9 | $0.45 | 77.6 |
| 7 | gpt-oss-120B (high) | OpenAI | 33.3 | $0.45 | 74.0 |
| 8 | MiniMax-M2.1 | MiniMax | 39.5 | $0.90 | 43.9 |
| 9 | GPT-5 mini (high) | OpenAI | 41.0 | $1.25 | 32.8 |
| 10 | Gemini 3 Flash Preview (Reasoning) | Google | 46.4 | $2.00 | 23.2 |

The #1 spot goes to Xiaomi's MiMo-V2-Flash: an Intelligence Index of 41.4 — comparable to Claude 4.5 Sonnet (42.9) — at $0.25 per 1,000 requests.

Seven of the ten models in this list aren't from OpenAI or Anthropic. The best value in AI right now is coming from DeepSeek, Xiaomi, xAI, Z AI, MiniMax, and Google.

Flagship intelligence doesn't have to cost flagship prices

Here are models in our database scoring 40+ on the Intelligence Index, sorted by cost. The cheapest costs $0.25. The most expensive costs $60.00. Same benchmark tier.

| Model | Vendor | Intelligence Index | $/1K requests |
|-------|--------|--------------------|---------------|
| MiMo-V2-Flash | Xiaomi | 41.4 | $0.25 |
| DeepSeek V3.2 (Reasoning) | DeepSeek | 41.6 | $0.49 |
| GPT-5 mini (high) | OpenAI | 41.0 | $1.25 |
| GLM-4.7 (Reasoning) | Z AI | 42.0 | $1.55 |
| Kimi K2 Thinking | Kimi | 40.7 | $1.85 |
| Gemini 3 Flash Preview (Reasoning) | Google | 46.4 | $2.00 |
| Kimi K2.5 (Reasoning) | Kimi | 46.7 | $2.10 |
| GLM-5 (Reasoning) | Z AI | 49.6 | $2.60 |
| GPT-5 (medium) | OpenAI | 41.8 | $6.25 |
| GPT-5.1 (high) | OpenAI | 47.6 | $6.25 |
| Gemini 3 Pro Preview (high) | Google | 48.4 | $8.00 |
| Grok 4 | xAI | 41.4 | $10.50 |
| Claude 4.5 Sonnet (Reasoning) | Anthropic | 42.9 | $10.50 |
| Claude Opus 4.5 (Reasoning) | Anthropic | 49.7 | $17.50 |
| Claude Opus 4.6 (Adaptive Reasoning) | Anthropic | 53.0 | $17.50 |
| o3-pro | OpenAI | 40.7 | $60.00 |

The value picks are the rows under $3 per 1,000 requests; the common defaults run $6 and up.

Eight models clear the 40+ Intelligence Index bar for under $3 per 1,000 requests. The other eight, from OpenAI, Google, xAI, and Anthropic, deliver similar scores at $6 to $60 per 1,000 requests.

The price difference between the cheapest (MiMo at $0.25) and most expensive (o3-pro at $60.00) flagship model is 240x. The Intelligence Index difference is 0.7 points — o3-pro actually scores lower than MiMo.

[Chart: Intelligence Index (25 to 50) vs. cost per 1,000 requests on a log scale ($0.25 to $60.00). Value picks, from MiMo-V2-Flash at $0.25 to GLM-5 at $2.60, cluster at the cheap end; defaults including GPT-4.1 mini, Claude 4.5 Haiku, GPT-4.1, Claude 4.5 Sonnet, Claude Opus 4.6, and o3-pro span $1.20 to $60.00.]

Where the defaults land

These are the models most teams use without thinking twice. Here's how they compare to the value leaders — adjusted for token efficiency using AAII normalization.

| Default model | II | List $/1K* | Better value alternative | II | List $/1K* | Adj. savings |
|---------------|----|-----------|--------------------------|----|-----------|--------------|
| o3-pro | 40.7 | $60.00 | DeepSeek V3.2 (Reasoning) | 41.6 | $0.49 | 50x |
| Claude 4.5 Haiku (Non-reasoning) | 31.0 | $3.50 | Grok 4 Fast (Reasoning) | 34.9 | $0.45 | 21x |
| GPT-4.1 | 25.6 | $6.00 | gpt-oss-120B (high) | 33.3 | $0.45 | 16x |
| Claude 4.5 Sonnet (Non-reasoning) | 37.1 | $10.50 | MiMo-V2-Flash | 41.4 | $0.25 | 13x |

*List price per 1,000 requests. Adjusted savings account for token efficiency using AAII normalization — see methodology below.

In every case, the alternative scores higher on benchmarks and costs less — even after adjusting for token efficiency. The o3-pro to DeepSeek V3.2 swap is the most dramatic: higher intelligence score, 50x cheaper in normalized cost.
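
For the curious, here is a sketch of the shape of that adjustment. The relative token multipliers below are placeholders, not AAII measurements; the real comparison uses AAII token-consumption data as described in the methodology section.

```python
# Sketch of a token-efficiency-adjusted swap comparison.
# The relative_tokens values are placeholders, NOT actual AAII figures:
# they express how many tokens a model burns on the same workload
# relative to an arbitrary baseline of 1.0.

def adjusted_cost(list_price_per_1k: float, relative_tokens: float) -> float:
    """Estimated cost per 1,000 requests after scaling for token consumption."""
    return list_price_per_1k * relative_tokens

# Hypothetical inputs for the o3-pro -> DeepSeek V3.2 swap.
default_cost = adjusted_cost(list_price_per_1k=60.00, relative_tokens=1.0)
alternative_cost = adjusted_cost(list_price_per_1k=0.49, relative_tokens=2.4)  # reasoning models often burn more tokens

savings_multiple = default_cost / alternative_cost
print(f"Adjusted savings: {savings_multiple:.0f}x")  # ~51x with these placeholder multipliers
```

The design point is that a model's higher token burn scales its effective cost before the savings multiple is computed, which is why a cheap-looking reasoning model can save less than its sticker price suggests.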

Many enterprise teams stay on these models for compliance, security, or API stability reasons — not because they've compared the alternatives. That's the legacy tax.

Before you swap everything

Benchmarks aren't the full picture. There are real reasons teams choose higher-priced models:

  • API reliability and uptime. OpenAI and Anthropic have years of production API infrastructure. Newer providers may have less mature SLAs.
  • Latency and reasoning overhead. Many of the value leaders in this list are reasoning models. They may have fast time-to-first-token, but total response time can be significantly longer because the model "thinks" before answering. For latency-sensitive applications like real-time chat, test actual end-to-end response times, not just benchmarks (the sketch after this list shows one way to measure both).
  • Task-specific performance. Intelligence Index measures general reasoning. Your customer support chatbot might perform differently than the benchmarks predict. Always test on your own data.
  • Ecosystem and tooling. SDK support, function calling, structured outputs, and documentation vary by vendor.
  • Data residency and compliance. Some vendors may not meet your regulatory requirements.
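
As a starting point for that latency testing, here is a minimal sketch that measures time-to-first-token and end-to-end time for a streamed completion. It assumes an OpenAI-compatible chat completions endpoint via the official openai Python SDK; the base URL, API key, model name, and prompt are placeholders to replace with your own.

```python
import time
from openai import OpenAI  # pip install openai; many vendors expose OpenAI-compatible endpoints

# Placeholder endpoint, key, and model name -- substitute your provider's values.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

def measure_latency(model: str, prompt: str) -> tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds for one streamed request."""
    start = time.perf_counter()
    first_token_at = None
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Record when the first content token arrives; keep iterating to the end.
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.perf_counter()
    end = time.perf_counter()
    ttft = (first_token_at - start) if first_token_at else (end - start)
    return ttft, end - start

ttft, total = measure_latency("your-candidate-model", "Summarize our refund policy in two sentences.")
print(f"time to first token: {ttft:.2f}s, end-to-end: {total:.2f}s")
```

Running the same prompt set against your current model and the candidate shows whether any reasoning overhead actually matters for your use case.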

The point isn't that you should switch to MiMo tomorrow. It's that you should know what you're paying for — and whether the premium is justified by your actual requirements.

How we got these numbers

All pricing and Intelligence Index scores come from the MarginDash model database: 397 models across 41 vendors, synced daily from vendor pricing pages.

The “Where the defaults land” comparison table uses AAII-normalized costs. Different models consume different numbers of tokens for the same task — a model with a low list price can cost more per task if it burns significantly more tokens. We normalize using token consumption data from the Artificial Analysis Intelligence Index (AAII) benchmark to estimate what a model swap would actually cost in production. This is the same methodology our cost simulator uses.

All prices reflect standard real-time inference. Batch pricing, cached-input discounts, and volume agreements will shift the numbers — in some cases significantly.

You can explore all 397 models, filter by vendor, and run your own cost comparisons inside MarginDash — sign up free to access the model database and cost simulator.

Stop guessing which model is cheapest

MarginDash tracks your actual AI cost, revenue, and margin per customer — so you can see exactly what each model costs you in production.

See My Margin Data

No credit card required