
GPT-5.2-Codex: OpenAI's Agentic Coding Model

6 min read · OpenAI
[Figure: GPT-5.2-Codex benchmark results, comparing SWE-Bench Pro and Terminal-Bench 2.0 scores against the base GPT-5.2 model]

OpenAI released GPT-5.2-Codex on January 14, 2026, five weeks after the base GPT-5.2 model. It targets agentic coding: multi-step sessions where the model plans, writes code, runs tests, and iterates on failures.

The model scores 56.4% on SWE-Bench Pro (up from 55.6% on base GPT-5.2) and 64.0% on Terminal-Bench 2.0 (up from 62.2%). Both benchmarks test real-world coding tasks, not isolated code generation.

GPT-5.2-Codex vs GPT-5.2 vs Claude Opus 4.6

Benchmark              | GPT-5.2-Codex | GPT-5.2 | Claude Opus 4.6
SWE-Bench Pro          | 56.4%         | 55.6%   | —
Terminal-Bench 2.0     | 64.0%         | 62.2%   | #1
Context Window (input) | 400K          | 128K    | 200K (1M beta)
Output Tokens          | 128K          | 128K    | 128K

GPT-5.2-Codex balances cost and performance. Claude Opus 4.6 leads Terminal-Bench 2.0 and Humanity's Last Exam, while GPT-5.2-Codex competes on price and context window size.

Key Features for Developers

Context Compaction

Like Claude Opus 4.6's compaction feature, GPT-5.2-Codex compresses earlier context while preserving task state. This enables multi-hour coding sessions where the model tracks the full project even as the conversation exceeds the context window.
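The idea behind compaction can be sketched as a plain summarize-the-oldest-turns loop. This is an illustrative outline only, not OpenAI's actual implementation; `count_tokens` and `summarize` are crude hypothetical stand-ins.

```python
# Illustrative sketch of context compaction. The tokenizer and
# summarizer below are placeholders, not OpenAI's implementation.

def count_tokens(message: str) -> int:
    # Placeholder tokenizer: roughly one token per whitespace-separated word.
    return len(message.split())

def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to write this summary,
    # preserving task state (goals, file paths, open TODOs).
    return "SUMMARY: " + " | ".join(m[:40] for m in messages)

def compact(history: list[str], budget: int) -> list[str]:
    """Fold the oldest messages into one summary once the conversation
    exceeds the token budget, keeping the most recent turns verbatim."""
    if sum(count_tokens(m) for m in history) <= budget:
        return history
    keep = max(1, len(history) // 2)          # recent half stays intact
    older, recent = history[:-keep], history[-keep:]
    return [summarize(older)] + recent
```

The key design point is that older turns are compressed rather than dropped, so task state survives even when the raw transcript no longer fits in the window.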

Long-Horizon Task Completion

The model is optimized for tasks spanning many steps: large refactors, codebase migrations, and multi-file feature implementations. When an approach fails, GPT-5.2-Codex adjusts and retries rather than restarting the task.
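The adjust-and-retry pattern described above amounts to a loop that feeds test failures back into the next attempt. The sketch below is a generic illustration of that loop; none of these names come from an OpenAI API.

```python
# Sketch of a long-horizon agent loop: propose a change, run the tests,
# and feed the failure output into the next attempt instead of
# restarting from scratch. All names here are illustrative.
from typing import Callable, Optional

def agent_loop(
    propose_fix: Callable[[Optional[str]], None],  # model edits code, given prior failure output
    run_tests: Callable[[], tuple[bool, str]],     # returns (passed, test output)
    max_attempts: int = 5,
) -> Optional[int]:
    """Return the attempt number that passed, or None if all attempts failed."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        propose_fix(feedback)          # adjust the approach using the last failure
        passed, output = run_tests()
        if passed:
            return attempt
        feedback = output              # retry with the failure as context
    return None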

Built-In Vulnerability Detection

GPT-5.2-Codex includes vulnerability detection during code generation. Teams needing deeper scanning can use dedicated tools like Claude Code Security, which offers multi-stage verification with false positive filtering.

Windows Environment Support

OpenAI improved GPT-5.2-Codex's Windows development performance, addressing the Unix-centric optimization of earlier models.

GPT-5.2-Codex Pricing

Tier         | Cost per Million Tokens
Input        | $1.75
Output       | $14.00
Cached Input | $0.175 (90% discount)
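At these rates a session's cost is simple arithmetic. The sketch below hardcodes the listed prices and assumes cached tokens are billed as a subset of input tokens, which is how the discount is usually applied but is an assumption here.

```python
# Estimate a GPT-5.2-Codex session cost at the article's listed rates.
PRICE_INPUT  = 1.75  / 1_000_000   # $ per fresh input token
PRICE_CACHED = 0.175 / 1_000_000   # cached input, 90% off
PRICE_OUTPUT = 14.00 / 1_000_000   # $ per output token

def session_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """cached_tokens is the portion of input_tokens served from cache."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_INPUT
            + cached_tokens * PRICE_CACHED
            + output_tokens * PRICE_OUTPUT)
```

For example, a session with 2M input tokens (half of them cached) and 200K output tokens works out to $1.75 + $0.175 + $2.80 = $4.725, showing how heavily cache hits shape the bill on long agentic runs.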

GPT-5.2-Codex is available across all Codex surfaces for paid ChatGPT users and as a standalone API model.

What GPT-5.2-Codex Means for Agentic Coding

The release reflects an industry-wide shift from code completion to sustained coding agents. OpenAI's Codex, Anthropic's Claude Code, and GitHub Agentic Workflows all target multi-step engineering tasks with minimal human intervention.

Frequently Asked Questions

What is GPT-5.2-Codex?
GPT-5.2-Codex is OpenAI's coding-optimized variant of the GPT-5.2 model, released on January 14, 2026. It is built specifically for agentic coding workflows where the model runs sustained, multi-step software engineering sessions. It scores 56.4% on SWE-Bench Pro and 64.0% on Terminal-Bench 2.0, improving on the base GPT-5.2 model's 55.6% and 62.2% respectively. The model supports a 400K input and 128K output context window.
How much does GPT-5.2-Codex cost?
GPT-5.2-Codex costs $1.75 per million input tokens and $14 per million output tokens. Cached inputs receive a 90% discount, bringing the effective cached rate to $0.175 per million tokens. This makes it significantly cheaper than Claude Opus 4.6 at $5/$25 per million tokens, though the two models differ in benchmark performance and feature sets.
What is context compaction in GPT-5.2-Codex?
Context compaction is a feature that compresses earlier conversation context while preserving critical task state. This allows GPT-5.2-Codex to sustain multi-hour coding sessions without losing track of project scope. When a session approaches the context window limit, the model summarizes older context rather than dropping it, enabling longer and more complex coding tasks without restarting.
How does GPT-5.2-Codex compare to Claude Opus 4.6?
On Terminal-Bench 2.0, Claude Opus 4.6 holds the top score, ahead of GPT-5.2-Codex's 64.0%. On SWE-Bench Pro, GPT-5.2-Codex scores 56.4%. The two models take different approaches: GPT-5.2-Codex offers a larger input context (400K tokens vs. Claude's 200K standard) and lower pricing, while Claude Opus 4.6 offers agent teams and higher benchmark scores on reasoning tasks like Humanity's Last Exam.

