Google AI releases Gemini Pro 3.1 with 1,000,000 tokens and 77.1 per cent ARC AGI-2 reasoning.

Google’s Gemini has been re-launched in high gear. Gemini 3.1 ProThe first update to the Gemini 3 version. This release is not just a minor patch; it is a targeted strike at the ‘agentic’ AI market, focusing on reasoning stability, software engineering, and tool-use reliability.

This update marks a major transition for developers. We are moving from models that simply ‘chat’ to models that ‘work.’ Gemini 3.1 Pro is designed to be the core engine for autonomous agents that can navigate file systems, execute code, and reason through scientific problems with a success rate that now rivals—and in some cases exceeds—the industry’s most elite frontier models.

Massive Context, Precise Output

The handling of scale is one of the immediate improvements. Gemini 3.1 Professional Preview is a huge upgrade. 1M token Input context window. To put this in perspective for software engineers: you can now feed the model an entire medium-sized code repository, and it will have enough ‘memory’ to understand the cross-file dependencies without losing the plot.

But the truth is, it’s not the news that matters. 65k token Output limit. The 65k limit is an important step for long-form generator developers. Whether you are generating a 100-page technical manual or a complex, multi-module Python application, the model can now finish the job in a single turn without hitting an abrupt ‘max token’ wall.

Double Down on the Reasoning

If Gemini 3.0 was about introducing ‘Deep Thinking,’ Gemini 3.1 is about making that thinking efficient. Performance jumps against rigorous benchmarks can be notable.

Benchmark	Score	What It Measures
ARC-AGI-2	77.1%	Solving entirely new logic patterns
GPQA Diamond	94.1%	Science reasoning at the graduate level
SciCode	58.9%	Python Programming for Scientific Computing
Terminal-Bench Hard	53.8%	Use of terminals and agentic code
Humanity’s Last Examination (HLE),	44.7%	Justification of near-human limitations

It is important to note that the word “you” means a person. 77.1% The headline number is ARC-AGI-2. Google says this performance is twice as good as the Gemini 3 Pro. This means the model is much less likely to rely on pattern matching from its training data and is more capable of ‘figuring it out’ when faced with a novel edge case in a dataset.

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

The Agentic Toolkit: Custom Tools and ‘Antigravity‘

Google is clearly aiming for the terminal of developers. They launched an endpoint that is specialized along with their main model: gemini-3.1-pro-preview-customtools.

This endpoint was designed for developers that mix bash functions with their own custom commands. Previous versions of models struggled with prioritizing which tool to choose, and sometimes would use a false search, when simply reading a local file could have been sufficient. The Customtools This version is tuned specifically to give priority to tools such view_file You can also find out more about search_codeIt is a reliable foundation for agents that code autonomously.

The new release integrates seamlessly with Google AntigravityThe company has launched a new platform for agentic software development. The new can be used by developers. ‘medium’ thinking level. This allows you to toggle the ‘reasoning budget’—using high-depth thinking for complex debugging while dropping to medium or low for standard API calls to save on latency and cost.

API Changes and File Methods

There is an important but small breaking change for those who are already using the Gemini API. There are two ways to do this. Interactions API version 1 betaThe field total_reasoning_tokens Has been renamed total_thought_tokens. This change aligns with the ‘thought signatures’ introduced in the Gemini 3 family—encrypted representations of the model’s internal reasoning that must be passed back to the model to maintain context in multi-turn agentic workflows.

Data consumption has increased as well. File handling has been updated in several ways.

Downloads are limited to 100MB. Previous API upload limits of 20MB have been quadrupled. 100MB.
Direct YouTube Support The new abacus is now available. You Tube URL As a direct media source. The model ‘watches’ the video via the URL rather than requiring a manual upload.
Cloud Integration: Support for Cloud Storage buckets Pre-signed URLs from private databases can be used as data sources.

Intelligence and the Economics of Intelligence

Gemini 3.1 pro preview is priced aggressively. If you have fewer than 200k tokens to prompt, the input costs will be lower. $1 per one million tokensOutput is 12 dollars per million. In contexts above 200k the scales are $4 input to $18 output.

When compared to competitors like Claude Opus 4.6 or GPT-5.2, Google team is positioning Gemini 3.1 Pro as the ‘efficiency leader.’ According to data collected by Artificial AnalysisGemini 3.1 Pro is now the leader in their Intelligence Index, and costs about half as much as other frontier competitors.

What you need to know

The Context Window is Massive: 1 M/65 K This model is a 1M token Input window for data repositories and large scale datasets. Output limit is significantly increased to 65k tokens For long-form codes and document generation.
The leap in logic and reasoning: The Performance of the ARC-AGI-2 benchmark reached 77.1%The reasoning power of the new version is more than twice that of its predecessors. The software also reached a 94.1% On GPQA Diamond, for Graduate-level Science Tasks.
The Dedicated Agentic Endpoints Google has introduced a new specialized team gemini-3.1-pro-preview-customtools endpoint. Prioritization is specifically optimised for this endpoint. Take command of the bash The system tool (like view_file The following are some examples of how to get started: search_code() to make autonomous agents more reliable.
API Breaking Change As the field changes, developers must keep their codebases updated. total_reasoning_tokens Has been renamed total_thought_tokens In order to improve alignment with internal model interactions, the Interactions of v1beta API has been updated. “thought” processing.
Improved File and Media Management: File sizes can now be up to 20MB. 100MB. Developers can also now be passed. You Tube URLs The model can analyze the video without having to download files or upload them again.

Click here to find out more Technical details The following are some examples of how to get started: Try it here. Also, feel free to follow us on Twitter Join our Facebook group! 100k+ ML SubReddit Subscribe Now our Newsletter. Wait! Are you using Telegram? now you can join us on telegram as well.

Google AI releases Gemini Pro 3.1 with 1,000,000 tokens and 77.1 per cent ARC AGI-2 reasoning.

Anthropic releases Claude Opus 4.7, a major upgrade for agentic coding, high-resolution vision, and long-horizon autonomous tasks

The Coding Guide to Property Based Testing with Hypothesis and Stateful, Differential and Metamorphic Test Designs

Google AI Releases Google Auto-Diagnosis: A Large Language Model LLM Based System to Diagnose Integrity Test Failures At Scale

This is a complete guide to running OpenAI’s GPT-OSS open-weight models using advanced inference workflows.

OpenAI designed GPT-5 so that it is safer. The software still produces gay slurs

The AI wearable from ex-Apple engineers looks like an iPod Shuffle

What are the Career Options for Tech-Skilled Kids?

I Converted My Photos Into Short Videos With AI on Honor’s Latest Phones. It’s Weird

Elon Musk’s Grokipedia pushes right-wing talking points

Top Insights

NVIDIA AI Releases Canary Qwen-2.5B : A State-of the-Art ASR/LLM Hybrid model with SoTA Performance in OpenASR Leaderboard

OpenClaw users are allegedly bypassing anti-bot system

Latest News

Anthropic releases Claude Opus 4.7, a major upgrade for agentic coding, high-resolution vision, and long-horizon autonomous tasks

The Coding Guide to Property Based Testing with Hypothesis and Stateful, Differential and Metamorphic Test Designs

Google AI releases Gemini Pro 3.1 with 1,000,000 tokens and 77.1 per cent ARC AGI-2 reasoning.

Massive Context, Precise Output

Double Down on the Reasoning

The Agentic Toolkit: Custom Tools and ‘Antigravity‘

API Changes and File Methods

Intelligence and the Economics of Intelligence

What you need to know

Related Posts