Gemini 3.1 Pro Benchmark Results
Google DeepMind released Gemini 3.1 Pro on February 19, 2026. The model scores 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's result on the same benchmark.
Gemini 3.1 Pro targets tasks requiring multi-step reasoning: algorithm design, large-scale data synthesis, agentic workflows, and complex coding.
Gemini 3.1 Pro vs Claude Opus 4.6 vs GPT-5.2
| Benchmark | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.2-Codex |
|---|---|---|---|
| ARC-AGI-2 | 77.1% | — | — |
| RE-Bench (ML R&D) | 1.27 | — | — |
| Terminal-Bench 2.0 | — | #1 | 64.0% |
| Humanity's Last Exam | — | #1 | — |
| Context (input) | 1M | 200K (1M beta) | 400K |
| Context (output) | 64K | 128K | 128K |
Each model leads in different areas. Gemini 3.1 Pro tops novel reasoning benchmarks. Claude Opus 4.6 leads agentic coding and multidisciplinary reasoning. GPT-5.2-Codex offers competitive coding performance at lower pricing.
Key Features for Developers
Configurable Thinking Depth
Gemini 3.1 Pro introduces a thinking_level parameter controlling reasoning depth. Low thinking is fast and cheap for routine tasks. High thinking applies more computation to complex problems.
This is similar to Claude Opus 4.6's effort controls, though Gemini exposes the setting as an explicit API parameter rather than adaptive model behavior.
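As a minimal sketch, a generateContent request that sets the thinking level might be built like this. The nested `thinkingConfig`/`thinkingLevel` field names and the accepted values are assumptions inferred from the parameter name; verify them against the current Gemini API reference before use.

```python
import json

def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build a hypothetical generateContent request body with a thinking level.

    The "thinkingConfig"/"thinkingLevel" nesting is an assumption based on the
    documented thinking_level parameter, not a confirmed schema.
    """
    if thinking_level not in ("low", "high"):
        raise ValueError(f"unsupported thinking_level: {thinking_level}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level}
        },
    }

# Routine tasks stay on "low"; escalate to "high" for hard reasoning problems.
body = build_request("Prove that the sum of two even numbers is even.", "high")
print(json.dumps(body, indent=2))
```

Exposing this as a request-level knob means an application can route cheap traffic and hard problems to the same model with different cost profiles.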
Custom Tools Endpoint
A separate endpoint, gemini-3.1-pro-preview-customtools, is optimized for agentic applications combining shell commands with custom tools. It prioritizes correct tool selection and invocation, reducing errors when agents interact with external systems. This is relevant for developers building agents similar to GitHub Agentic Workflows, where tool selection accuracy directly affects automation reliability.
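A sketch of a request targeting the custom-tools endpoint, pairing the preview model name from above with a user-defined function declaration. The function-declaration shape follows the Gemini API's general tool format; the tool name and parameters here (`deploy_service`) are purely illustrative.

```python
# Model name taken from the custom-tools preview endpoint described above.
MODEL = "gemini-3.1-pro-preview-customtools"

def build_agent_request(prompt: str, tools: list) -> dict:
    """Build a hypothetical agent request: prompt plus custom tool declarations."""
    return {
        "model": MODEL,
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [{"functionDeclarations": tools}],
    }

# Hypothetical custom tool an agent could select alongside shell commands.
deploy_tool = {
    "name": "deploy_service",
    "description": "Deploy a named service to the staging environment.",
    "parameters": {
        "type": "object",
        "properties": {"service": {"type": "string"}},
        "required": ["service"],
    },
}

request = build_agent_request("Deploy the billing service.", [deploy_tool])
```

The endpoint's value is on the model side, picking the right declaration and filling its arguments correctly, so the client code stays this simple.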
YouTube URL Input
Developers can pass YouTube URLs directly into prompts. The model analyzes video content, enabling workflows that combine video understanding with code generation or documentation.
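A sketch of what such a prompt could look like as a request body, combining a YouTube URL with a text instruction. The `fileData`/`fileUri` part shape follows the Gemini API's convention for referencing media by URI; treat the exact field names as assumptions to check against the official docs.

```python
def build_video_request(youtube_url: str, instruction: str) -> dict:
    """Build a hypothetical request mixing a YouTube video with a text task."""
    return {
        "contents": [{
            "parts": [
                {"fileData": {"fileUri": youtube_url}},
                {"text": instruction},
            ]
        }]
    }

req = build_video_request(
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    "Summarize the code walkthrough in this video as markdown documentation.",
)
```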
Multimodal Processing
Gemini 3.1 Pro handles text, images, audio, video, and code in a single context. With a 1M token input window, it can process entire codebases or long research documents in one pass.
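Before sending an entire codebase, it is worth a rough pre-flight check that it fits in the 1M-token window. The sketch below uses a coarse 4-characters-per-token heuristic, which is an approximation, not the model's actual tokenizer; use the API's token-counting facilities for exact figures.

```python
import os

CONTEXT_LIMIT = 1_000_000   # Gemini 3.1 Pro input window, in tokens
CHARS_PER_TOKEN = 4         # rough heuristic for English text and code

def estimate_tokens(root: str, exts: tuple = (".py", ".md")) -> int:
    """Estimate the token count of matching files under root via file sizes."""
    total_chars = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                # File size in bytes approximates character count for ASCII.
                total_chars += os.path.getsize(os.path.join(dirpath, name))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str) -> bool:
    """True if the estimated token count fits in one input window."""
    return estimate_tokens(root) <= CONTEXT_LIMIT
```

If the estimate is over budget, split the codebase by module and summarize each chunk before the final pass.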
RE-Bench: ML Research Performance
On RE-Bench, which evaluates ML research and development capabilities, Gemini 3.1 Pro scores 1.27 (human-normalized), up from Gemini 3 Pro's 1.04. The model completed optimization tasks in 47 seconds versus the 94-second human reference.
Gemini 3.1 Pro Availability
Gemini 3.1 Pro is available in the Gemini app, Google Cloud Vertex AI, Google AI Studio, and the Gemini API. Pricing varies by platform. The model is in preview; general availability is expected to follow.
