What browser tasks would you delegate to an agent that can plan and execute real UI actions? Google AI has introduced Gemini 2.5 Computer Use, a specialized model that plans and executes real UI actions in a live browser through a new action API. It is now available in public preview through Google AI Studio and Vertex AI. The model targets automated web testing and UI automation, with human-validated gains on web and mobile benchmarks, plus a safety layer that requires human confirmation for risky steps.
What actually ships?
Developers get a new computer_use tool that returns function calls such as click_at, type_text_at, or drag_and_drop. Client code executes each action (e.g., with Playwright or Browserbase), captures a fresh screenshot and URL, and loops until the task completes or a safety rule blocks it. The supported action space comprises 13 predefined actions—open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, and drag_and_drop—and can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.
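The loop above can be sketched roughly as follows. This is an illustrative stand-in, not the real google-genai SDK: fake_model and execute_action are hypothetical placeholders for the Gemini 2.5 Computer Use model call and a Playwright/Browserbase executor, and the "done" terminal signal is an assumption for the sketch.

```python
def fake_model(history):
    """Stand-in for the model: returns the next UI action as a function call."""
    script = [
        {"name": "open_web_browser", "args": {}},
        {"name": "type_text_at", "args": {"x": 320, "y": 88, "text": "laptop deals"}},
        {"name": "click_at", "args": {"x": 540, "y": 88}},
        {"name": "done", "args": {}},  # hypothetical terminal signal
    ]
    return script[len(history)]

def execute_action(call):
    """Stand-in executor: would drive Playwright/Browserbase here, then
    return a fresh observation (screenshot + current URL) for the model."""
    return {"screenshot": b"<png>", "url": "https://example.com"}

def run_agent(max_steps=10):
    history = []
    for _ in range(max_steps):
        call = fake_model(history)           # model proposes the next action
        if call["name"] == "done":
            break
        observation = execute_action(call)   # client performs the UI action
        history.append((call, observation))  # feed the result back in
    return [c["name"] for c, _ in history]

print(run_agent())  # ['open_web_browser', 'type_text_at', 'click_at']
```

The essential contract is that the model never touches the browser itself; the client owns execution, which is what makes the per-step safety check enforceable.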
What are the scope and limits?
The model is optimized for web browsers; Google states it is not yet optimized for desktop OS-level control, though mobile scenarios can be driven through the same loop using custom actions. A built-in per-step safety service can block prohibited actions and requires user confirmation for “high-stakes” operations such as payments, sending messages, or accessing sensitive information.
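A client-side confirmation gate for those high-stakes steps could look like the sketch below. The action names and the requires_confirmation flag are illustrative assumptions, not the exact fields the API returns; the point is that confirmation happens in the client before execution.

```python
# Hypothetical set of high-stakes actions; the real safety service
# decides this server-side and flags the function call accordingly.
HIGH_STAKES = {"submit_payment", "send_message", "read_sensitive_data"}

def gate(call, confirm):
    """Execute only if the action is safe or the human explicitly confirms.
    `confirm` is a callable that asks the user and returns True/False."""
    if call["name"] in HIGH_STAKES or call.get("requires_confirmation"):
        if not confirm(call):
            return "blocked"
    return "executed"

# Usage: auto-deny anything high-stakes in an unattended run.
print(gate({"name": "click_at"}, confirm=lambda c: False))        # executed
print(gate({"name": "submit_payment"}, confirm=lambda c: False))  # blocked
```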
How does it perform?
- Online-Mind2Web (official): 69.0% pass@1, judged by human-majority vote and validated by the benchmark organizers.
- Browserbase matched harness: leads competing computer-use APIs in both accuracy and latency on Online-Mind2Web and WebVoyager under identical time/step/environment constraints. Google’s model card lists 65.7% (Online-Mind2Web) and 79.9% (WebVoyager) for the Browserbase runs.
- Latency/quality trade-off (Google figure): 70%+ accuracy at ~225 s median latency on the Browserbase Online-Mind2Web harness. Treat this as Google-reported with human evaluation.
- AndroidWorld (mobile): 69.7%, Google-measured via the same API with custom mobile actions, browser actions excluded.
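The AndroidWorld runs rely on custom mobile actions (open_app, long_press_at, go_home, per the action-space description above). As a sketch, such actions could be declared in the common OpenAPI-style function-calling shape; the exact wiring into the computer_use tool may differ in the real SDK, and the parameter schemas here are assumptions.

```python
# Hypothetical declarations for the custom mobile actions named in
# the article; schema shape follows generic function-calling JSON.
CUSTOM_MOBILE_ACTIONS = [
    {
        "name": "open_app",
        "description": "Open a mobile app by package/bundle id.",
        "parameters": {
            "type": "object",
            "properties": {"app_id": {"type": "string"}},
            "required": ["app_id"],
        },
    },
    {
        "name": "long_press_at",
        "description": "Long-press at screen coordinates.",
        "parameters": {
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
            "required": ["x", "y"],
        },
    },
    {
        "name": "go_home",
        "description": "Return to the device home screen.",
        "parameters": {"type": "object", "properties": {}},
    },
]

print([a["name"] for a in CUSTOM_MOBILE_ACTIONS])
```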

Early production signals
- Automated UI test repair: Google’s payments platform team reports the model rehabilitates >60% of failed automated UI test executions. This figure comes from the public report rather than the main blog post.
- Operational speed: Poke.com (an early external tester) reports workflows often run ~50% faster than the next best alternative.
Gemini 2.5 Computer Use, a constrained API that exposes 13 documented UI actions and requires a client-side executor, is available in preview through Google AI Studio and Vertex AI. Google’s materials and model card report state-of-the-art results in web and mobile control, while Browserbase’s matched harness shows 65.7% on Online-Mind2Web at the lowest latency. The scope is browser-centric, with per-step safety checks and confirmation. Together, these data points support a measured evaluation for UI testing and web ops.
Check out the technical details and the GitHub page for tutorials, code, and notebooks.


