Pricing Guide

OpenAI Codex Pricing: API Costs for Code Generation Models

Codex is OpenAI's code-specialized model line — optimized for writing, editing, and reviewing code. Below is every Codex model with current per-token pricing, an interactive cost calculator, and guidance on picking the right tier for your workload and budget.

Pricing verified against official OpenAI documentation. Updated daily.

Codex Model Pricing

List prices as published by OpenAI. Sorted by intelligence index (highest first).

| Model | Input / 1M | Output / 1M | Intelligence Index | Tier |
|---|---|---|---|---|
| GPT-5.3 Codex (xhigh) | $1.75 | $14.00 | 54.0 | Frontier |
| GPT-5.2 Codex (xhigh) | $1.75 | $14.00 | 49.0 | Frontier |
| GPT-5 Codex (high) | $1.25 | $10.00 | 44.6 | Frontier |
| GPT-5.1 Codex (high) | $1.25 | $10.00 | 43.1 | Frontier |
| GPT-5.1 Codex mini (high) | $0.25 | $2.00 | 38.6 | Mid-tier |

Codex Cost Calculator

Estimate your API costs based on token usage. Select a model and enter your expected input and output tokens.


Based on list pricing. Actual costs may vary with cached tokens or batch discounts.

Codex Pricing Tiers Explained: mini vs standard vs xhigh

mini (low cost, high throughput)

The most cost-effective option for high-volume code tasks. Handles routine code completions, simple refactoring, and boilerplate generation well. Best for CI/CD pipelines, automated code formatting, and bulk operations where speed and cost matter more than handling complex logic. Mini is particularly well-suited for predictable tasks — generating CRUD endpoints, writing unit tests for simple functions, converting data between formats, or adding type annotations. Output quality is nearly identical to higher tiers when the task itself doesn’t require deep reasoning.

standard (balanced)

Balances quality and cost for most development workflows. Handles multi-file edits, code review with contextual suggestions, and medium-complexity code generation. This is the safest default for teams unsure which tier to start with — it covers the vast majority of real-world tasks without the cost premium of xhigh. Most teams only upgrade specific features to xhigh after measuring where standard falls short.

xhigh (maximum quality)

Delivers the highest quality code output at a premium price. Use it for architecture-level code generation, complex refactoring across large codebases, and security-sensitive code review. The xhigh tier is most justified when errors are expensive to fix downstream — security audits, database migration scripts, and core business logic are cases where a wrong answer from a cheaper model costs more in engineering time than the difference in API pricing.

How OpenAI Codex Pricing Works

OpenAI Codex uses a per-token billing model. Every API request has two billable components: input tokens (the prompt you send, including system prompt, code context, and instructions) and output tokens (the generated code, explanations, or review comments). You pay for both, and output tokens are substantially more expensive because they require more compute to produce.

The price depends on which compute tier you select. Mini is cheapest and fastest, standard handles most production workloads, and xhigh delivers the highest quality for complex reasoning tasks. The tier you choose directly controls both cost and output quality, so the decision should be made per feature, not globally.

One detail that catches teams off guard is the pricing asymmetry: for every model in the table above, output tokens cost 8x more than input tokens. A code review that reads 5,000 tokens of diff and produces a 300-token comment is far cheaper than a scaffolding task that reads a 500-token spec and generates 5,000 tokens of code, even though both process a similar total token count. Understanding this asymmetry is key to estimating costs accurately.
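The asymmetry is easiest to see with a quick calculation. A minimal sketch, assuming GPT-5 Codex (high) list rates from the table above ($1.25 input / $10.00 output per million tokens):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 1.25, output_price: float = 10.00) -> float:
    """Cost in USD for one request, at per-million-token list rates."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# Code review: large input, small output
review = request_cost(5_000, 300)     # $0.00925
# Scaffolding: small input, large output
scaffold = request_cost(500, 5_000)   # $0.050625
```

Despite near-identical total token counts, the scaffolding request costs roughly five times more, because output tokens dominate the bill.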

Codex in CI/CD Pipelines: Cost Implications

Codex in CI/CD — automated code review on pull requests, PR summaries, test generation, security scanning — is compelling, but the cost dynamics differ from interactive usage. A developer might make 20–50 requests per day. A CI pipeline running on every push across a 10-person team can easily generate hundreds of API calls daily, each processing thousands of tokens of diff context.

The risk is that costs scale with engineering activity rather than deliberate decisions. Every commit triggers API calls, and during active sprints, feature branches multiply. Teams that integrate Codex into CI without monitoring often discover their bill has doubled during a particularly active month.

Several strategies help control CI/CD costs. Use the mini tier for straightforward tasks like PR summaries, linting suggestions, and boilerplate test generation — these don’t require the reasoning power of standard or xhigh. Cache review output for files that haven’t changed between commits. Send only the diff with minimal surrounding context instead of entire files. Run expensive checks (security audits with xhigh) only on pull requests targeting the main branch, not on every push to a feature branch.

Estimating Codex Costs for Real Development Workflows

Token counts vary significantly by task. A simple code completion might use 500 input tokens and generate 200 output tokens. A multi-file refactoring task could consume 10,000+ input tokens and produce 5,000 output tokens. The ratio matters because output tokens cost 8x more than input tokens — tasks that produce long code outputs cost disproportionately more than tasks that consume long inputs but produce short outputs.

Below are rough estimates for common workflows. Actual usage depends on how much context you send with each request. In practice, many workflows involve multiple round-trips (generation followed by correction), which multiplies token usage. Improving prompt quality and providing better context reduces round-trips — often a bigger cost lever than switching tiers.

| Workflow | Input tokens (approx) | Output tokens (approx) | Notes |
|---|---|---|---|
| Code completion | 300 – 1,000 | 100 – 500 | Single function or block |
| Code review | 2,000 – 8,000 | 500 – 2,000 | Diff + surrounding context |
| File generation | 1,000 – 3,000 | 1,000 – 5,000 | New file from spec or description |
| Multi-file refactoring | 5,000 – 20,000 | 3,000 – 10,000 | Multiple files read for context |
| Architecture planning | 10,000 – 50,000 | 5,000 – 20,000 | Large codebase context required |
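The ranges above can be turned into a rough monthly figure. A sketch, where the request volume and working days are assumptions you should replace with your own, and the rates are GPT-5 Codex (high) list prices:

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_price: float = 1.25, output_price: float = 10.00,
                 working_days: int = 22) -> float:
    """Rough monthly USD cost for a repeated workflow at list rates."""
    per_request = (input_tokens / 1e6 * input_price
                   + output_tokens / 1e6 * output_price)
    return per_request * requests_per_day * working_days

# Example: CI code review near the midpoint of the table's ranges,
# 100 requests per working day
monthly_cost(100, 5_000, 1_200)   # ≈ $40.15/month
```

Remember that multi-round-trip workflows multiply the per-request numbers before you scale by volume.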

Use the calculator above to estimate costs for your specific workflow. For production tracking, MarginDash logs actual token usage per request and calculates realized costs automatically.


Codex vs GitHub Copilot vs Cursor: Pricing Models Compared

OpenAI Codex, GitHub Copilot, and Cursor all help developers write code faster, but they use fundamentally different pricing models. Codex is a pay-per-token API — you pay exactly for what you use, with costs scaling linearly with token volume. GitHub Copilot is a flat-rate subscription at $19 per month per developer (Individual plan) or $39 per month per developer (Business plan). Cursor charges $20 per month per developer for its Pro plan. For individual developer productivity, subscriptions almost always win on cost.

Where the Codex API makes more sense is when you are building a product that uses code generation as a feature, or embedding it into CI/CD pipelines. Copilot and Cursor are end-user developer tools — they don’t offer an API you can embed in your own product. The Codex API gives you programmatic access, control over model selection, and the ability to scale across thousands of end-user requests. Many teams use both: subscriptions for day-to-day developer productivity, API for automation and product features.

| Tool | Pricing Model | Starting Price | API Access | Best For |
|---|---|---|---|---|
| OpenAI Codex API | Pay-per-token | Usage-based | Yes | Building products, CI/CD, automation |
| GitHub Copilot | Subscription | $19/mo per user | No | Individual developer productivity |
| Cursor | Subscription | $20/mo per user | No | Individual developer productivity |

Reducing Codex API Costs

The single most effective lever is to send less context. Many integrations default to sending entire files when only a few relevant functions are needed. Trimming input to the function being edited, its direct dependencies, and relevant type definitions can cut input token costs by 50–80% without meaningfully affecting output quality.

The second lever is tiering by task complexity. Use mini for boilerplate generation, test scaffolding, and simple completions. Use standard for code review and multi-file edits. Reserve xhigh for architecture-level decisions and security-sensitive code. Most teams find 60–70% of their Codex calls can run on mini without noticeable quality loss.
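A simple routing table captures this lever. The task labels here are hypothetical names your application would assign; the fallback is standard, since the document's own guidance calls it the safest default:

```python
# Map task categories to the cheapest tier that handles them well
TIER_BY_TASK = {
    "boilerplate": "mini",
    "test_scaffolding": "mini",
    "completion": "mini",
    "code_review": "standard",
    "multi_file_edit": "standard",
    "architecture": "xhigh",
    "security_review": "xhigh",
}

def pick_tier(task: str) -> str:
    # Unknown task types fall back to standard, the safest default
    return TIER_BY_TASK.get(task, "standard")
```

Auditing which tasks land in `mini` is a quick way to verify the 60–70% figure against your own traffic.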

Third, cache repeated patterns and monitor usage. If your application generates similar code patterns repeatedly (CRUD operations, API endpoint boilerplate), cache the output keyed by the structural pattern. And track token usage per feature and per pipeline — a small number of workflows almost always account for a disproportionate share of spend.

Building Products on Codex: Unit Economics

If you are building a product that uses the Codex API under the hood — a code review tool, a documentation generator, a test generation platform — then Codex pricing is not just an engineering cost. It is your cost of goods sold. Every API call you make on behalf of a customer comes directly out of your margin, making per-customer cost tracking essential from day one.

Codex usage varies dramatically by customer. One might review small pull requests with 500 tokens of context; another might refactor entire codebases with 50,000-token inputs. If both pay the same flat fee, one is highly profitable and the other may be underwater. The aggregate monthly bill tells you total cost but nothing about how it distributes across customers. Visibility into per-customer spend is the foundation for setting appropriate pricing tiers and maintaining healthy margins as your customer base and usage patterns evolve.
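A minimal per-customer ledger makes that distribution visible. This is a sketch, assuming GPT-5.1 Codex mini list rates ($0.25/$2.00 per million tokens) and a hypothetical `record` call made after every API response:

```python
from collections import defaultdict

class CostLedger:
    """Accumulate realized API cost per customer, in USD."""

    def __init__(self, input_price: float = 0.25, output_price: float = 2.00):
        self.input_price = input_price
        self.output_price = output_price
        self.costs: dict[str, float] = defaultdict(float)

    def record(self, customer_id: str, input_tokens: int, output_tokens: int) -> None:
        self.costs[customer_id] += (input_tokens / 1e6 * self.input_price
                                    + output_tokens / 1e6 * self.output_price)

    def cost(self, customer_id: str) -> float:
        return self.costs[customer_id]
```

With this in place, the small-PR customer and the whole-codebase customer from the example above show up as very different lines in the ledger, even though their flat fees are identical.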

Frequently Asked Questions

How much does OpenAI Codex cost?

OpenAI Codex pricing varies by model and compute tier. GPT-5 Codex (high) costs $1.25 per million input tokens and $10.00 per million output tokens. GPT-5.1 Codex mini (high) is more affordable at $0.25/$2.00. The latest GPT-5.3 Codex (xhigh) costs $1.75/$14.00.

Which Codex model should I use?

For most code generation tasks, GPT-5.1 Codex mini offers the best value — strong coding performance at $0.25/$2.00 per million tokens. For complex multi-file refactoring or architecture-level code generation, GPT-5.3 Codex (xhigh) provides the highest quality at $1.75/$14.00.

How do Codex compute tiers (mini, standard, xhigh) affect pricing?

Each compute tier has different per-token rates reflecting the amount of compute used per token. Mini is the cheapest and fastest, best for routine tasks like boilerplate generation and simple completions. Standard balances cost and quality for most production workloads. Xhigh uses the most compute and delivers the highest output quality, recommended for complex refactoring and security-critical code review.

Why are output tokens more expensive than input tokens?

Output tokens require more compute to generate — the model must run its full inference process for each output token, while input tokens are processed in parallel during a single forward pass. Across the Codex lineup, output tokens cost 8x more than input tokens. This means tasks that generate a lot of code (scaffolding, file generation) are proportionally more expensive than tasks that read a lot of code and produce short outputs (code review, bug detection).

How does Codex pricing compare to GitHub Copilot?

They use different pricing models. Codex is pay-per-token through the API — you pay for exactly what you use. GitHub Copilot is a flat-rate subscription at $19/month per developer. For individual developer productivity, the subscription is usually cheaper. For building products that use code generation as a feature, or for CI/CD automation, only the Codex API provides programmatic access.

Stop guessing. Start measuring.

Create an account, install the SDK, and see your first margin data in minutes.

See My Margin Data

No credit card required