
GPT-5.2-Codex: OpenAI's Agentic Coding Model

6 min read · OpenAI
[Figure: GPT-5.2-Codex benchmark results, comparing SWE-Bench Pro and Terminal-Bench 2.0 scores against the base GPT-5.2 model]

OpenAI released GPT-5.2-Codex on January 14, 2026, five weeks after the base GPT-5.2 model. It targets agentic coding: multi-step sessions where the model plans, writes code, runs tests, and iterates on failures.

The model scores 56.4% on SWE-Bench Pro (up from 55.6% on base GPT-5.2) and 64.0% on Terminal-Bench 2.0 (up from 62.2%). Both benchmarks test real-world coding tasks, not isolated code generation.

GPT-5.2-Codex vs GPT-5.2 vs Claude Opus 4.6

Benchmark              | GPT-5.2-Codex | GPT-5.2 | Claude Opus 4.6
SWE-Bench Pro          | 56.4%         | 55.6%   | —
Terminal-Bench 2.0     | 64.0%         | 62.2%   | #1
Context Window (input) | 400K          | 128K    | 200K (1M beta)
Output Tokens          | 128K          | 128K    | 128K

GPT-5.2-Codex balances cost and performance. Claude Opus 4.6 leads Terminal-Bench 2.0 and Humanity's Last Exam, while GPT-5.2-Codex competes on price and context window size.

Key Features for Developers

Context Compaction

Like Claude Opus 4.6's compaction feature, GPT-5.2-Codex compresses earlier context while preserving task state. This enables multi-hour coding sessions where the model tracks the full project even as the conversation exceeds the context window.
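The idea behind compaction can be sketched as a plain summarize-the-oldest-turns loop. This is an illustrative outline only, not OpenAI's actual implementation; `count_tokens` and `summarize` are crude hypothetical stand-ins.

```python
# Illustrative sketch of context compaction. The tokenizer and
# summarizer below are placeholders, not OpenAI's implementation.

def count_tokens(message: str) -> int:
    # Placeholder tokenizer: roughly one token per whitespace-separated word.
    return len(message.split())

def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to write this summary,
    # preserving task state (goals, file paths, open TODOs).
    return "SUMMARY: " + " | ".join(m[:40] for m in messages)

def compact(history: list[str], budget: int) -> list[str]:
    """Fold the oldest messages into one summary once the conversation
    exceeds the token budget, keeping the most recent turns verbatim."""
    if sum(count_tokens(m) for m in history) <= budget:
        return history
    keep = max(1, len(history) // 2)          # recent half stays intact
    older, recent = history[:-keep], history[-keep:]
    return [summarize(older)] + recent
```

The key design point is that older turns are compressed rather than dropped, so task state survives even when the raw transcript no longer fits in the window.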

Long-Horizon Task Completion

The model is optimized for tasks spanning many steps: large refactors, codebase migrations, and multi-file feature implementations. When an approach fails, GPT-5.2-Codex adjusts and retries rather than restarting the task.
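The adjust-and-retry pattern described above amounts to a loop that feeds test failures back into the next attempt. The sketch below is a generic illustration of that loop; none of these names come from an OpenAI API.

```python
# Sketch of a long-horizon agent loop: propose a change, run the tests,
# and feed the failure output into the next attempt instead of
# restarting from scratch. All names here are illustrative.
from typing import Callable, Optional

def agent_loop(
    propose_fix: Callable[[Optional[str]], None],  # model edits code, given prior failure output
    run_tests: Callable[[], tuple[bool, str]],     # returns (passed, test output)
    max_attempts: int = 5,
) -> Optional[int]:
    """Return the attempt number that passed, or None if all attempts failed."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        propose_fix(feedback)          # adjust the approach using the last failure
        passed, output = run_tests()
        if passed:
            return attempt
        feedback = output              # retry with the failure as context
    return None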

Built-In Vulnerability Detection

GPT-5.2-Codex includes vulnerability detection during code generation. Teams needing deeper scanning can use dedicated tools like Claude Code Security, which offers multi-stage verification with false positive filtering.

Windows Environment Support

OpenAI improved GPT-5.2-Codex's Windows development performance, addressing the Unix-centric optimization of earlier models.

GPT-5.2-Codex Pricing

Tier         | Cost per Million Tokens
Input        | $1.75
Output       | $14.00
Cached Input | $0.175 (90% discount)
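At these rates a session's cost is simple arithmetic. The sketch below hardcodes the listed prices and assumes cached tokens are billed as a subset of input tokens, which is how the discount is usually applied but is an assumption here.

```python
# Estimate a GPT-5.2-Codex session cost at the article's listed rates.
PRICE_INPUT  = 1.75  / 1_000_000   # $ per fresh input token
PRICE_CACHED = 0.175 / 1_000_000   # cached input, 90% off
PRICE_OUTPUT = 14.00 / 1_000_000   # $ per output token

def session_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """cached_tokens is the portion of input_tokens served from cache."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_INPUT
            + cached_tokens * PRICE_CACHED
            + output_tokens * PRICE_OUTPUT)
```

For example, a session with 2M input tokens (half of them cached) and 200K output tokens works out to $1.75 + $0.175 + $2.80 = $4.725, showing how heavily cache hits shape the bill on long agentic runs.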

GPT-5.2-Codex is available across all Codex surfaces for paid ChatGPT users and as a standalone API model.

What GPT-5.2-Codex Means for Agentic Coding

The release reflects an industry-wide shift from code completion to sustained coding agents. OpenAI's Codex, Anthropic's Claude Code, and GitHub Agentic Workflows all target multi-step engineering tasks with minimal human intervention.

Frequently Asked Questions

What is GPT-5.2-Codex?
GPT-5.2-Codex is OpenAI's coding-optimized variant of the GPT-5.2 model, released on January 14, 2026. It is built specifically for agentic coding workflows where the model runs sustained, multi-step software engineering sessions. It scores 56.4% on SWE-Bench Pro and 64.0% on Terminal-Bench 2.0, improving on the base GPT-5.2 model's 55.6% and 62.2% respectively. The model supports a 400K input and 128K output context window.
How much does GPT-5.2-Codex cost?
GPT-5.2-Codex costs $1.75 per million input tokens and $14 per million output tokens. Cached inputs receive a 90% discount, bringing the effective cached rate to $0.175 per million tokens. This makes it significantly cheaper than Claude Opus 4.6 at $5/$25 per million tokens, though the two models differ in benchmark performance and feature sets.
What is context compaction in GPT-5.2-Codex?
Context compaction is a feature that compresses earlier conversation context while preserving critical task state. This allows GPT-5.2-Codex to sustain multi-hour coding sessions without losing track of project scope. When a session approaches the context window limit, the model summarizes older context rather than dropping it, enabling longer and more complex coding tasks without restarting.
How does GPT-5.2-Codex compare to Claude Opus 4.6?
On Terminal-Bench 2.0, Claude Opus 4.6 holds the top score, ahead of GPT-5.2-Codex's 64.0%. On SWE-Bench Pro, GPT-5.2-Codex scores 56.4%. The two models take different approaches: GPT-5.2-Codex offers a larger input context (400K tokens vs. Claude's 200K standard) and lower pricing, while Claude Opus 4.6 offers agent teams and higher benchmark scores on reasoning tasks like Humanity's Last Exam.

