Google Gemini 3 Pro turns sparse MoE and a 1M token context into an engine for multimodal workloads

How do we move from language models that only respond to prompts to systems that can reason over million-token contexts? Google has just launched the Gemini 3 family, with Gemini 3 Pro as its centerpiece, as a step towards more general AI. The research team describes Gemini 3 as its most advanced model so far, with state-of-the-art reasoning, strong multimodal comprehension, and improved agentic and vibe coding. Gemini 3 Pro is now available in preview and is already integrated into the Gemini app, AI Mode in Search, the Gemini API, Google AI Studio, Vertex AI, and Google Antigravity.

Sparse MoE transformer with a 1M token context

Gemini 3 Pro is a sparse mixture of experts (MoE) transformer with native multimodal support for text, image, audio and video inputs. A sparse MoE layer routes each token to only a small subset of experts, so the model can scale its total parameter count without a proportional increase in compute per token. The model accepts inputs of up to 1M tokens and generates up to 64k output tokens, which is useful for large code bases and long transcripts. It was trained from scratch rather than as a fine-tune of Gemini 2.5.
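To make the routing idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not Gemini's implementation, whose router design, expert count and value of k are not public. The toy experts, the router weights and k=2 are all assumptions chosen for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Return [(expert_index, gate_weight)] for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(token, experts, router_weights, k=2):
    """Apply a toy sparse MoE layer to a single token vector."""
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    out = [0.0] * len(token)
    for idx, gate in route_token(logits, k):
        expert_out = experts[idx](token)  # only the k selected experts run
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out

# Hypothetical toy setup: three "experts" that simply scale the input.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
token = [2.0, 1.0]

print(route_token([2.0, 1.0, -2.0], k=2))
print(moe_layer(token, experts, router_weights, k=2))
```

With three experts and k=2, the third expert is never evaluated for this token. That is the source of the compute saving: the parameter count grows with the number of experts, while the work per token grows only with k.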

Training data includes public web text, code, images, audio and video, together with licensed data, user interaction data, and synthetic data. Post-training uses multimodal instruction tuning and reinforcement learning from human and critic feedback to improve behaviours such as multi-step reasoning, problem solving, and theorem proving. The system runs on Google Tensor Processing Units (TPUs), with training implemented using JAX and ML Pathways.

Reasoning benchmarks and academic-style tasks

On public benchmarks, Gemini 3 Pro clearly improves on Gemini 2.5 Pro and is competitive with other frontier models such as GPT 5.1 and Claude Sonnet 4.5. On Humanity’s Last Exam, which aggregates PhD-level questions from many scientific and humanities fields, Gemini 3 Pro scores 37.5 percent without tools, compared to 21.6 percent for Gemini 2.5 Pro, 26.5 percent for GPT 5.1, and 13.7 percent for Claude Sonnet 4.5. With search and code execution enabled, Gemini 3 Pro reaches 45.8 percent.

On ARC AGI 2, Gemini 3 Pro scores 31.1 percent, up from 4.9 percent for Gemini 2.5 Pro, and ahead of GPT 5.1 at 17.6 percent and Claude Sonnet 4.5 at 13.6 percent. On GPQA Diamond, a scientific question answering benchmark, Gemini 3 Pro reaches 91.9 percent, ahead of GPT 5.1 at 88.1 percent and Claude Sonnet 4.5 at 83.4 percent. In mathematics, the model scores 95.0 percent on AIME 2025 without tools and 100.0 percent with code execution.
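The size of these gaps is easier to compare when expressed as both percentage-point differences and ratios. The short sketch below does that arithmetic using only the scores quoted above. The dictionary layout and the `margins` helper are illustrative names, not part of any benchmark tooling.

```python
# Scores (percent) as quoted in this article; no other values are assumed.
SCORES = {
    "Humanity's Last Exam (no tools)": {
        "Gemini 3 Pro": 37.5, "Gemini 2.5 Pro": 21.6,
        "GPT 5.1": 26.5, "Claude Sonnet 4.5": 13.7,
    },
    "ARC AGI 2": {
        "Gemini 3 Pro": 31.1, "Gemini 2.5 Pro": 4.9,
        "GPT 5.1": 17.6, "Claude Sonnet 4.5": 13.6,
    },
    "GPQA Diamond": {
        "Gemini 3 Pro": 91.9, "GPT 5.1": 88.1, "Claude Sonnet 4.5": 83.4,
    },
}

def margins(benchmark, model="Gemini 3 Pro"):
    """Return {other_model: (point_gap, ratio)} for one benchmark."""
    row = SCORES[benchmark]
    base = row[model]
    return {
        other: (round(base - score, 1), round(base / score, 2))
        for other, score in row.items()
        if other != model
    }

for name in SCORES:
    print(name, margins(name))
```

Read this way, the ARC AGI 2 result is the largest relative jump over Gemini 2.5 Pro (about 6.3 times the earlier score), while the GPQA Diamond lead over GPT 5.1 is only a few points on an already high baseline.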

https://blog.google/products/gemini/gemini-3/#learn-anything

Long context and multimodal comprehension

Gemini 3 Pro was designed as a natively multimodal model rather than a text model with added modalities. On MMMU Pro, which measures multimodal reasoning across a wide range of university-level subjects, it scores 81.0 percent, compared to 68.0 percent for both Gemini 2.5 Pro and Claude Sonnet 4.5, and 76.0 percent for GPT 5.1. On Video MMMU, which evaluates knowledge acquisition from videos, Gemini 3 Pro reaches 87.6 percent, ahead of Gemini 2.5 Pro at 83.6 percent and other frontier models.

User interface and document comprehension are also stronger. On ScreenSpot Pro, a benchmark that measures how accurately a model locates elements on a screen, Gemini 3 Pro scores 72.7 percent, compared to 11.4 percent for Gemini 2.5 Pro, 36.2 percent for Claude Sonnet 4.5, and 3.5 percent for GPT 5.1. On OmniDocBench 1.5, which reports overall edit distance for OCR and structured document understanding, Gemini 3 Pro achieves 0.115, lower (and therefore better) than every baseline in the comparison.
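An edit distance score such as 0.115 is easier to read with the metric in hand. The sketch below computes a Levenshtein distance and normalizes it by the longer string, so that 0.0 means a perfect transcription and 1.0 means nothing matches. Normalizing by the longer string is an assumption made here for illustration. OmniDocBench's exact normalization and aggregation are not described in this article.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1] by the longer of the two strings."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))

# Hypothetical OCR output versus ground truth: one wrong digit in 13 characters.
print(normalized_edit_distance("Total: 42 USD", "Total: 47 USD"))
```

Under this definition, a score near 0.1 corresponds to roughly one character-level error in every ten characters of reference text, averaged over the evaluated documents.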

For long context, Gemini 3 Pro is evaluated on MRCR v2 with eight-needle retrieval. It scores 77.0 percent at 128k average context and 26.3 percent at 1M tokens pointwise, the latter ahead of Gemini 2.5 Pro at 16.4 percent.

Google Antigravity, Agents and Coding

For software developers, the main story is coding and agentic behaviour. Gemini 3 Pro tops the LMArena leaderboard with an Elo of 1501 and reaches 1487 in WebDev Arena, which evaluates web development tasks. On Terminal Bench 2.0, which tests an agent’s ability to operate a computer through a terminal, it scores 54.2 percent, above GPT 5.1 at 47.6 percent, Claude Sonnet 4.5 at 42.8 percent, and Gemini 2.5 Pro at 32.6 percent. On SWE Bench Verified, which measures single-attempt code changes across GitHub issues, Gemini 3 Pro scores 76.2 percent.
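An Elo number means little on its own, but the gap between two ratings maps to an expected win rate. The sketch below uses the classic Elo expected-score formula. Whether LMArena computes its ratings exactly this way is not stated in this article, so treat the formula as an assumption, and note that the 1451 opponent rating is a made-up value for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score of player A against player B (0 to 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# 1501 is the rating quoted above; 1451 is a hypothetical opponent.
p = expected_score(1501, 1451)
print(f"Expected win rate with a 50 point lead: {p:.3f}")
```

Under this formula a 50 point lead corresponds to winning somewhat more than half of head-to-head comparisons, which is why small Elo differences at the top of a leaderboard should be read as modest preferences rather than decisive gaps.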

Gemini 3 Pro also performs well on τ² bench for tool use and on Vending Bench 2, which evaluates long-horizon planning in a business simulation. There it produces a mean net worth of 5478.16 dollars, compared to 573.64 dollars for Gemini 2.5 Pro and 1473.43 dollars for GPT 5.1.

Google Antigravity, an agent-first development environment, exposes all of these capabilities. Antigravity combines Gemini 3 Pro with Gemini 2.5 Computer Use for browser control and the Nano Banana image model, so that agents can plan, write code, run it in the terminal or browser, and verify the results within a single workflow.
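The plan, code, test and verify cycle described above can be sketched as a small control loop. This is a generic illustration, not Antigravity's API: every function name here (`run_agent_workflow`, `demo_plan`, and so on) is hypothetical, and the model calls are replaced by local stubs so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class WorkflowResult:
    succeeded: bool
    attempts: int
    log: List[str] = field(default_factory=list)

def run_agent_workflow(
    task: str,
    plan: Callable[[str], List[str]],
    write_code: Callable[[List[str], str], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> WorkflowResult:
    """Plan once, then loop: write code, test it, and verify when tests pass."""
    log = []
    steps = plan(task)
    log.append(f"plan: {len(steps)} step(s)")
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        code = write_code(steps, feedback)
        passed, feedback = run_tests(code)
        log.append(f"attempt {attempt}: tests {'passed' if passed else 'failed'}")
        if passed and verify(code):
            log.append("verified")
            return WorkflowResult(True, attempt, log)
    return WorkflowResult(False, max_attempts, log)

# Hypothetical stubs standing in for model and tool calls.
def demo_plan(task):
    return ["write add()", "test add()"]

def demo_write_code(steps, feedback):
    # The first draft has a bug; the fix appears once test feedback arrives.
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"

def demo_run_tests(code):
    scope = {}
    exec(code, scope)  # acceptable for a local toy example only
    ok = scope["add"](2, 3) == 5
    return ok, ("" if ok else "add(2, 3) should be 5")

def demo_verify(code):
    return "def add" in code

result = run_agent_workflow(
    "implement add", demo_plan, demo_write_code, demo_run_tests, demo_verify
)
print(result)
```

The design point is that test output is fed back into the next code-writing step, so the loop can recover from a failed first attempt instead of stopping at it.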

What you need to know

Gemini 3 Pro is a sparse mixture of experts transformer with native multimodal support and a context window of up to 1M tokens. It is designed for large-scale reasoning over long inputs.
The model shows significant gains over Gemini 2.5 Pro on difficult reasoning benchmarks such as Humanity’s Last Exam, ARC AGI 2 and MathArena Apex, and is competitive with other frontier models such as GPT 5.1 and Claude Sonnet 4.5.
Gemini 3 Pro delivers strong multimodal results on benchmarks such as MMMU Pro, Video MMMU, ScreenSpot Pro and OmniDocBench, which test university-level questions, video comprehension, and complex document or UI comprehension.
Coding and agentic use cases are a primary focus, with high scores on SWE Bench Verified, WebDev Arena and Terminal Bench, and on tool use and planning benchmarks such as τ² bench and Vending Bench 2.

Gemini 3 Pro is a clear escalation of Google’s AGI strategy. It combines a sparse mixture of experts architecture and a 1M token context with strong performance on ARC AGI 2, GPQA Diamond, Humanity’s Last Exam, MathArena Apex, MMMU Pro, and WebDev Arena. Its emphasis on tool use, browser and terminal control, and evaluation under the Frontier Safety Framework positions it as an API-ready workhorse for production agentic systems. Taken as a whole, Gemini 3 Pro is an agent-focused, benchmark-driven response to large-scale multimodal AI.

Max is a Silicon Valley-based AI analyst who works on shaping the future of technology. He teaches robotics at Brainvyne and combats spam through ComplyEmail, and he uses AI every day to turn complex technology advancements into clear, understandable insights.
