AI-trends.today

Google DeepMind releases Gemini Robotics -ER 1.6, bringing enhanced embodied reasoning and instrument reading to physical AI

Tech · By Gavin Wallace · 15/04/2026 · 6 Mins Read
The Google DeepMind research team has introduced Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model, designed to serve as the 'cognitive brain' of robots operating in real-world environments. The model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning, and success detection. It acts as the high-level reasoning model for a robot and can execute tasks by natively calling tools such as Google Search, vision-language-action (VLA) models, or other third-party user-defined functions.

Google DeepMind uses a dual-model approach for robotic AI. Gemini Robotics 1.5 is the vision-language-action (VLA) model: it processes visual inputs and user prompts and translates them directly into physical motor commands. Gemini Robotics-ER is the embodied reasoning model: it specializes in planning, understanding spatial environments, and making logical decisions, but it does not control the robot's limbs. Instead, it provides higher-level information to the VLA to help decide what action to take next. Think of it as the difference between a strategist and an executor: Gemini Robotics-ER 1.6 is the strategist.
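As a rough illustration, the strategist/executor split can be sketched as two stub functions. `plan_next_step` and `execute` are hypothetical stand-ins for the embodied-reasoning model and the VLA model, not the real Gemini APIs, and the scene handling is deliberately toy-sized.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the two models. The real systems consume camera
# frames and emit motor commands; both are elided here.

@dataclass
class Observation:
    scene: str   # what the cameras currently see
    task: str    # the user's high-level goal

def plan_next_step(obs: Observation) -> str:
    """Strategist (embodied reasoning): decide WHAT to do next."""
    if obs.task == "clear table" and "cup on table" in obs.scene:
        return "pick up the cup and place it in the sink"
    return "done"

def execute(instruction: str) -> bool:
    """Executor (VLA): turn a natural-language step into motor commands."""
    # A real VLA would emit joint/end-effector actions here.
    return instruction != "done"

obs = Observation(scene="cup on table, sink empty", task="clear table")
step = plan_next_step(obs)   # strategist chooses the step
moved = execute(step)        # executor carries it out
```

The point of the split is that the strategist never touches actuators: it only emits the next instruction, which the executor grounds in motor commands.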

Source: https://deepmind.google/blog/gemini-robotics-er-1-6/

Gemini Robotics-ER 1.6: What's New?

Gemini Robotics-ER 1.6 is a significant upgrade over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash. It improves spatial and physical reasoning abilities such as pointing, counting, and success detection. It also adds a key feature that earlier versions lacked entirely: it can read instruments.

Pointing Is the Foundation of Spatial Reasoning

Pointing — the model’s ability to identify precise pixel-level locations in an image — is far more powerful than it sounds. Points can be used to express spatial reasoning (precision object detection and counting), relational logic (making comparisons such as identifying the smallest item in a set, or defining from-to relationships like ‘move X to location Y’), motion reasoning (mapping trajectories and identifying optimal grasp points), and constraint compliance (reasoning through complex prompts like “point to every object small enough to fit inside the blue cup”).
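To make pointing concrete, here is a minimal parser for the kind of response the model returns. The JSON schema (a list of `{"point": [y, x], "label": ...}` objects with coordinates normalized to a 0-1000 grid) follows the published Gemini Robotics-ER documentation for version 1.5; treat it as an assumption for 1.6 and verify against the current docs before relying on it.

```python
import json

# Example pointing response in the documented ER 1.5 format: [y, x] pairs
# normalized to a 0-1000 grid, one entry per detected object.
raw = ('[{"point": [400, 250], "label": "hammer"},'
      ' {"point": [820, 910], "label": "pliers"}]')

def to_pixels(points_json: str, width: int, height: int) -> dict:
    """Convert normalized [y, x] points (0-1000) to (x, y) pixel coords."""
    out = {}
    for item in json.loads(points_json):
        y_norm, x_norm = item["point"]
        out[item["label"]] = (round(x_norm / 1000 * width),
                              round(y_norm / 1000 * height))
    return out

pixels = to_pixels(raw, width=1280, height=720)
# pixels["hammer"] -> (320, 288) on a 1280x720 frame
```

Note the [y, x] ordering in the response versus the (x, y) convention most image libraries expect; mixing these up is a common source of off-target grasps.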


Gemini Robotics-ER 1.6 shows a significant advantage over its predecessor in internal benchmarks. In one scene, Gemini Robotics-ER 1.6 correctly counts the hammers, scissors, paintbrushes, pliers, and garden tools, and does not point to requested items that are absent from the image, such as a wheelbarrow or a Ryobi drill. Gemini Robotics-ER 1.5, in contrast, miscounts the hammers and paintbrushes, fails to recognize the scissors at all, and hallucinates a 'wheelbarrow'. For AI robotics professionals this matters because hallucinated object detections in robotic pipelines can cause cascading downstream failures: a robot that 'sees' an object that isn't there will attempt to interact with empty space.

Success Detection and Multi-View Reasoning

In robotics it is important to know when an action is complete. Success detection is a critical part of the decision-making process: it allows an agent to decide intelligently whether to retry a failed attempt or move on.

It’s harder than it appears. The majority of modern robotics sets include multiple cameras, including overhead feeds and wrist mounted feeds. A system must be able to comprehend how the different views combine in order to create a cohesive picture, both at any given moment and over time. Gemini Robotics -ER 1.6 is a multi-view reasoning system that can better combine information from different camera streams even when the environment changes or becomes occluded.

A Real-World Breakthrough: Instrument Reading

The genuinely new capability in Gemini Robotics-ER 1.6 is instrument reading: the ability to interpret analog gauges, pressure meters, sight glasses, and digital readouts in industrial settings. Boston Dynamics is focusing on facility inspections, a key application of this capability: its Spot robot inspects instruments at a facility, takes pictures of the devices, and sends them to Gemini Robotics-ER 1.6 for interpretation.

Instrument reading requires complex visual reasoning: the model must precisely perceive a variety of inputs, including needles, liquid levels, container boundaries, and tick marks, and understand how they all relate to one another. This is especially important when estimating the amount of liquid in a sight glass while accounting for distortions from the camera's perspective. Gauges usually carry text describing the unit, which must also be interpreted, and some have multiple needles representing different decimal places whose readings must be combined.


Gemini Robotics-ER 1.6 reads instruments using agentic vision, a feature that combines code execution with visual reasoning, first introduced in Gemini 3.0 Flash and now extended by Gemini Robotics-ER 1.6. The model takes intermediate steps: first it zooms into the image to better resolve the small details of a gauge; then it uses code execution and pointing to calculate proportions and intervals; finally it applies world knowledge to interpret the reading.
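The "calculate proportions" step amounts to interpolating between the gauge's calibrated endpoints. A minimal sketch, with invented angles and ranges (the actual code the model writes and executes is not published):

```python
def gauge_reading(needle_deg: float, min_deg: float, max_deg: float,
                  min_val: float, max_val: float) -> float:
    """Map a needle angle to a gauge value by linear interpolation
    between the minimum and maximum tick marks."""
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A pressure gauge sweeping from -45 deg (0 bar) to 225 deg (10 bar),
# with the needle observed at 90 deg:
gauge_reading(90, -45, 225, 0.0, 10.0)  # -> 5.0 bar
```

The hard part in practice is not this arithmetic but the perception feeding it: locating the needle and tick marks precisely under perspective distortion, which is where the zoom-and-point steps come in.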

Gemini Robotics-ER 1.5 achieved a 23% success rate on instrument reading. Gemini Robotics-ER 1.6 reached 86%, and Gemini Robotics-ER 1.6 with agentic vision hit 93%. Note that 1.5 was never evaluated with agentic vision because it cannot support the capability, so the 23% baseline reflects a missing feature as much as a raw performance gap. For AI developers evaluating model generations this distinction matters: you are not comparing apples to apples across the full benchmark column.

The Key Takeaways

  • Gemini Robotics-ER 1.6 is a reasoning model, not an action model: It acts as the high-level 'brain' of a robot, handling spatial understanding, task planning, and success detection, while the separate VLA model (Gemini Robotics 1.5) handles the actual physical motor commands.
  • The power of pointing is greater than you might think: Gemini Robotics-ER 1.6's pointing capability goes far beyond simple object detection. It enables relational logic, motion trajectory mapping, grasp point identification, and constraint-based reasoning, all of which are foundational to reliable robotic manipulation.
  • Instrument reading is the headline new feature: Built in collaboration with Boston Dynamics, whose Spot robot performs industrial facility inspections, Gemini Robotics-ER 1.6 can now read analog gauges, pressure meters, and sight glasses with 93% accuracy using agentic vision, up from just 23% for Gemini Robotics-ER 1.5, which did not support agentic vision at all.
  • Success detection is what makes autonomy possible: Knowing when a task is actually complete, across multiple camera views and in occluded or dynamic environments, is what allows a robot to decide whether to retry or move to the next step without human intervention.

Check out the technical details and the Model Information page for more.


© 2026 AI-Trends.Today