Code Velocity
AI Models

Gemini 3.1 Pro: Google's Reasoning-First Model

·6 min read·Google, Google DeepMind·Original source
Share
Gemini 3.1 Pro benchmark comparison showing ARC-AGI-2 and RE-Bench scores versus Gemini 3 Pro and other frontier models

Gemini 3.1 Pro Benchmark Results

Google DeepMind released Gemini 3.1 Pro on February 19, 2026. The model more than doubles its predecessor's reasoning performance, scoring 77.1% on ARC-AGI-2 versus Gemini 3 Pro.

Gemini 3.1 Pro targets tasks requiring multi-step reasoning: algorithm design, large-scale data synthesis, agentic workflows, and complex coding.

Gemini 3.1 Pro vs Claude Opus 4.6 vs GPT-5.2

BenchmarkGemini 3.1 ProClaude Opus 4.6GPT-5.2-Codex
ARC-AGI-277.1%
RE-Bench (ML R&D)1.27
Terminal-Bench 2.0#164.0%
Humanity's Last Exam#1
Context (input)1M200K (1M beta)400K
Context (output)64K128K128K

Each model leads in different areas. Gemini 3.1 Pro tops novel reasoning benchmarks. Claude Opus 4.6 leads agentic coding and multidisciplinary reasoning. GPT-5.2-Codex offers competitive coding performance at lower pricing.

Key Features for Developers

Configurable Thinking Depth

Gemini 3.1 Pro introduces a thinking_level parameter controlling reasoning depth. Low thinking is fast and cheap for routine tasks. High thinking applies more computation to complex problems.

This is similar to Claude Opus 4.6's effort controls, though Gemini exposes the setting as an explicit API parameter rather than adaptive model behavior.

Custom Tools Endpoint

A separate endpoint, gemini-3.1-pro-preview-customtools, is optimized for agentic applications combining shell commands with custom tools. It prioritizes correct tool selection and invocation, reducing errors when agents interact with external systems. This is relevant for developers building agents similar to GitHub Agentic Workflows, where tool selection accuracy directly affects automation reliability.

YouTube URL Input

Developers can pass YouTube URLs directly into prompts. The model analyzes video content, enabling workflows that combine video understanding with code generation or documentation.

Multimodal Processing

Gemini 3.1 Pro handles text, images, audio, video, and code in a single context. With a 1M token input window, it can process entire codebases or long research documents in one pass.

RE-Bench: ML Research Performance

On RE-Bench, which evaluates ML research and development capabilities, Gemini 3.1 Pro scores 1.27 (human-normalized), up from Gemini 3 Pro's 1.04. The model completed optimization tasks in 47 seconds versus the 94-second human reference.

Gemini 3.1 Pro Availability

Gemini 3.1 Pro is available in the Gemini app, Google Cloud Vertex AI, Google AI Studio, and the Gemini API. Pricing varies by platform. The model is in preview; general availability is expected to follow.

Frequently Asked Questions

What is Gemini 3.1 Pro?
Gemini 3.1 Pro is Google DeepMind's reasoning-optimized upgrade to the Gemini 3 series, released on February 19, 2026. It scores 77.1% on ARC-AGI-2, more than doubling the reasoning performance of Gemini 3 Pro. The model supports a 1M token input context and 64K output tokens, and introduces a thinking_level parameter that lets developers control how deeply the model reasons before responding.
How does Gemini 3.1 Pro compare to Claude Opus 4.6?
Gemini 3.1 Pro and Claude Opus 4.6 target different strengths. Gemini 3.1 Pro leads on ARC-AGI-2 (77.1%) and RE-Bench for ML R&D, while Claude Opus 4.6 holds the top position on Terminal-Bench 2.0 for agentic coding and Humanity's Last Exam for multidisciplinary reasoning. Both offer 1M token context windows. The choice depends on the workload: Gemini excels at novel reasoning tasks, Claude at sustained coding work.
What is the thinking_level parameter in Gemini 3.1 Pro?
The thinking_level parameter lets developers control the maximum depth of reasoning the model applies before producing a response. Low thinking is faster and cheaper for straightforward tasks. High thinking allocates more computation time for complex reasoning problems. This gives developers explicit control over the cost-speed-quality tradeoff, similar to the effort controls in Claude Opus 4.6.
What is the custom tools endpoint in Gemini 3.1 Pro?
Gemini 3.1 Pro includes a separate API endpoint called gemini-3.1-pro-preview-customtools, optimized for prioritizing custom developer tools. When building agentic applications with a mix of bash commands and custom tools, this endpoint ensures the model correctly selects and invokes the right tool. This is especially useful for developers building AI agents that need to interact with external systems and APIs.

Stay Updated

Get the latest AI news delivered to your inbox.

Share