Gemini 3.1 Pro Benchmarks, API Specs & Developer Guide in 2026

فبراير 20, 2026
11:27 م

Google released Gemini 3.1 Pro on February 19, 2026 — a reasoning-optimized upgrade to the Gemini 3 Pro series that tops 13 of 16 industry benchmarks and introduces a dedicated API endpoint for agentic, custom-tool workflows. This guide covers official benchmark results, full API specifications, pricing, capability comparisons, and access instructions for developers and enterprises.

What Is Gemini 3.1 Pro? (Direct Answer)

Gemini 3.1 Pro is Google's most advanced general-purpose AI model as of February 2026, built on the Gemini 3 Pro architecture and optimized for three core improvements:

Better thinking: Upgraded chain-of-thought reasoning that scores 77.1% on ARC-AGI-2 — more than double Gemini 3 Pro's prior score
Improved token efficiency: Reduced verbosity with higher-quality outputs per token
More grounded, factually consistent responses: Specifically tuned for software engineering behavior, agentic multi-step task execution, and precise custom tool use

According to Google DeepMind's official model card, Gemini 3.1 Pro is “Google's most advanced model for complex tasks” as of its publication date and is natively multimodal, processing text, images, audio, video, and entire code repositories.

Official API Specifications (Google AI for Developers)

The following technical specifications are sourced directly from the official Gemini API documentation at ai.google.dev:

Property	Details
Model code	`gemini-3.1-pro-preview`
Custom tools endpoint	`gemini-3.1-pro-preview-customtools`
Input modalities	Text, Image, Video, Audio, PDF
Output modality	Text
Input token limit	1,048,576 (~1,500 A4 pages)
Output token limit	65,536 tokens
Knowledge cutoff	January 2025
Release date	February 2026 (Preview)
Batch API	Supported
Context caching	Supported
Code execution	Supported
Function calling	Supported
Search grounding	Supported
Structured outputs	Supported
Thinking / reasoning	Supported
URL context	Supported
Live API	Not supported
Image generation	Not supported
Audio generation	Not supported

إن `gemini-3.1-pro-preview-customtools` Endpoint

For developers building agentic workflows with custom tools (such as view_file, search_code, or bash), Google provides a dedicated endpoint that prioritizes custom tool calls over general-purpose behavior. This endpoint is distinct from the standard gemini-3.1-pro-preview and is specifically optimized for:

Multi-step agentic pipelines
Bash and terminal tool use
Code-centric reasoning workflows

Note: إن customtools endpoint may show quality fluctuations in use cases that do not involve custom tool calling.

Gemini 3.1 Pro Benchmark Results

Gemini 3.1 Pro was evaluated on 16 industry-standard benchmarks. It achieved first place on 13 of 16, with results verified as of February 2026.

Head-to-Head Benchmark Comparison

Benchmark	Gemini 3.1 Pro	Gemini 3 Pro	Claude Opus 4.6	GPT-5.2
ARC-AGI-2 (abstract reasoning)	77.1%	~31.1%	68.8%	—
GPQA Diamond (expert science)	94.3%	—	91.3%	92.4%
Humanity's Last Exam (no tools)	44.4%	37.5%	—	34.5%
Humanity's Last Exam (with tools)	51.4%	—	53.1%	—
SWE-Bench Verified (agentic coding)	80.6%	—	—	—
Terminal-Bench 2.0	68.5%	—	—	77.3%*
APEX-Agents (long-horizon tasks)	33.5%	18.4%	29.8%	23.0%
SWE-Bench Pro (Public)	54.2%	—	—	56.8%*
GDPval-AA Elo (expert tasks)	1317	—	—	—

*GPT-5.3-Codex scores reported on limited benchmarks only. Claude Sonnet 4.6 leads GDPval-AA Elo with 1633.

Key takeaways from benchmarks:

Gemini 3.1 Pro's biggest leap is on ARC-AGI-2, where it more than doubled its predecessor's score and leads all known competitors.
It leads on APEX-Agents, nearly doubling Gemini 3 Pro's score — the clearest signal of improved agentic workflow performance.
Claude Opus 4.6 leads on Humanity's Last Exam (with tools) and GDPval-AA expert tasks. GPT-5.3-Codex leads on terminal and SWE-Bench Pro coding tasks.
The competitive AI landscape remains multi-winner: no single model dominates every benchmark.

Gemini 3.1 Pro API Pricing

Google maintained the same pricing as Gemini 3 Pro, making Gemini 3.1 Pro a significant value upgrade:

Token Range	Input Price	Output Price
Up to 200,000 tokens	$2.00 / 1M tokens	$12.00 / 1M tokens
200,000 – 1,000,000 tokens	$4.00 / 1M tokens	$18.00 / 1M tokens

Context: At $2/1M input tokens, Gemini 3.1 Pro is priced at less than half the cost of Claude Opus 4.6, while achieving broadly comparable benchmark scores across most evaluations. This price-performance ratio makes it particularly compelling for enterprise teams with high-volume API usage.

Gemini 3.1 Pro vs. Competitors: Decision Table

Criteria	Gemini 3.1 Pro	Claude Opus 4.6	GPT-5.2	DeepSeek V3.2
Best abstract reasoning	✅ #1 ARC-AGI-2 (77.1%)	❌	❌	❌
Best agentic workflows	✅ #1 APEX-Agents	❌	❌	❌
Best expert task ELO	❌	❌	❌	❌
Best science knowledge	✅ #1 GPQA Diamond	❌	❌	❌
Best long-context tool use	❌ (Claude wins HLE+tools)	✅	❌	❌
Lowest price (comparable tier)	✅ ~$2/1M input	❌ ($5+)	❌	✅ (cheaper)
1M token context window	✅	✅	✅	❌
Custom tools API endpoint	✅	❌	❌	❌
Multimodal input	✅ Text, Image, Audio, Video, PDF	✅	✅	❌

Choose Gemini 3.1 Pro if: You need top-tier reasoning, agentic coding, Google Cloud integration, or the best price-performance ratio among frontier reasoning models.

Consider alternatives if: You need the absolute best performance on long-horizon expert tasks with tool use (Claude Opus 4.6) or terminal-specific coding benchmarks (GPT-5.3-Codex).

Where to Access Gemini 3.1 Pro

For Developers

Google AI Studio — Free to try at aistudio.google.com; use model string gemini-3.1-pro-preview
Gemini API — Full programmatic access; supports Batch API, caching, function calling, structured outputs
Gemini CLI — Command-line access at geminicli.com
Google Antigravity — Google's agentic development platform
Android Studio — Native integration for Android app development

For Enterprise

Vertex AI — Full Google Cloud integration with enterprise SLAs
Gemini Enterprise — Workspace-integrated access for business teams

For Consumers

Gemini App — Available now with higher limits for Google AI Pro and Ultra subscribers
NotebookLM — Exclusively available to Pro and Ultra plan users

Key Capabilities: What Gemini 3.1 Pro Can Do

Gemini 3.1 Pro is engineered for tasks where a simple answer is not enough. Documented real-world applications include:

Animated SVG generation from text prompts — website-ready, code-based, infinitely scalable, small file size
Live data dashboard creation — building a real-time ISS orbital telemetry dashboard from a public API
3D interactive design — generating a starling murmuration simulation with hand-tracking and generative audio
Literary-to-UI translation — converting the themes of Wuthering Heights into a fully functional web portfolio
Long-document synthesis — processing contracts, research papers, and codebases up to 1M tokens in a single request
Multi-step agentic tasks — reliable execution of complex, real-world professional workflows via the APEX-Agents benchmark

Safety and Trust

According to Google DeepMind's model card, Gemini 3.1 Pro was evaluated under the Frontier Safety Framework across five risk areas:

CBRN (chemical, biological, radiological, and nuclear) information risks
Cybersecurity
Harmful manipulation
Machine learning research risks
Misalignment

The model remained below critical thresholds in all five categories. Automated content safety evaluations also showed marginal improvements over Gemini 3 Pro: +0.10% in text-to-text safety and +0.11% in multilingual safety.

FAQ: Gemini 3.1 Pro Common Questions

Q: What is the model code for Gemini 3.1 Pro in the API? The standard model code is gemini-3.1-pro-preview. For agentic workflows with custom tools, use gemini-3.1-pro-preview-customtools.

Q: What is Gemini 3.1 Pro's context window? The input token limit is 1,048,576 tokens (approximately 1,500 A4 pages). The output token limit is 65,536 tokens.

Q: What benchmarks does Gemini 3.1 Pro lead? It leads 13 of 16 evaluated benchmarks, including ARC-AGI-2 (77.1%), GPQA Diamond (94.3%), Humanity's Last Exam without tools (44.4%), SWE-Bench Verified (80.6%), and APEX-Agents (33.5%).

Q: Does Gemini 3.1 Pro beat Claude Opus 4.6 and GPT-5.2 on all benchmarks? No. While Gemini 3.1 Pro leads most evaluations, Claude Opus 4.6 outperforms it on Humanity's Last Exam with tools (53.1% vs 51.4%) and GDPval-AA expert tasks. GPT-5.3-Codex leads on terminal-specific coding benchmarks.

Q: What does Gemini 3.1 Pro cost? $2.00 per 1M input tokens and $12.00 per 1M output tokens for prompts under 200,000 tokens. Long-context pricing (200K–1M tokens) is $4.00 input / $18.00 output per 1M tokens.

Q: Does Gemini 3.1 Pro support image and video input? Yes. The model is natively multimodal and accepts text, image, video, audio, and PDF inputs, outputting text.

Q: Is Gemini 3.1 Pro available for free? Developers can try it for free in Google AI Studio. Consumer access via the Gemini app and NotebookLM requires a Google AI Pro or Ultra subscription for higher usage limits.

Q: What is the knowledge cutoff for Gemini 3.1 Pro? The knowledge cutoff is January 2025, per the official Google AI developer documentation.

Q: Is Gemini 3.1 Pro generally available? Not yet. As of February 2026, it is in preview. Google has stated general availability is coming soon, pending validation of agentic workflow improvements.

Q: What file types can Gemini 3.1 Pro process via the API? Through the Gemini API, it supports text, images, video, audio files, and PDFs as inputs, all within the 1M token context window limit.

TOP-Rated Vertu Products

The New Agent Q

Quantum Flip

Metavertu Curve