
Google AI Introduces Gemini 2.5 Computer Use (Preview): A Browser-Control Model That Lets AI Agents Interact with User Interfaces

Tech | By Gavin Wallace | 08/10/2025 | 4 Mins Read

Which browser tasks would you delegate today if an agent could plan and execute predefined UI actions? Google AI has introduced Gemini 2.5 Computer Use, a specialized variant of Gemini 2.5 that plans and executes real UI actions in a live browser through a constrained action API. It is available in public preview through Google AI Studio and Vertex AI. The model targets web automation and UI testing, with documented, human-judged gains on web/mobile control benchmarks and a safety layer that can require human confirmation for risky steps.

What is actually shipping?

Developers call a new computer_use tool that returns function calls such as click_at, type_text_at, or drag_and_drop. Client code executes the action (e.g., with Playwright or Browserbase), captures a fresh screenshot and URL, and loops until the task ends or a safety rule blocks it. The supported action space is 13 predefined UI actions (open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, drag_and_drop) and can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.
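The propose/execute/observe loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's SDK: `FunctionCall`, `Observation`, `FakeBrowser`, `run_agent`, and `scripted_model` are hypothetical names, the model is replaced by a scripted stub, and the browser only logs calls instead of driving Playwright. Only the 13 action names come from the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# The 13 predefined actions named in the article.
PREDEFINED_ACTIONS = {
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward",
    "search", "navigate", "click_at", "hover_at", "type_text_at",
    "key_combination", "scroll_document", "scroll_at", "drag_and_drop",
}


@dataclass
class FunctionCall:
    """One action proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends back after executing an action."""
    screenshot: bytes
    url: str


class FakeBrowser:
    """Stand-in for a real executor such as Playwright; it only logs calls."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log: List[str] = []

    def execute(self, call: FunctionCall) -> Observation:
        if call.name not in PREDEFINED_ACTIONS:
            raise ValueError(f"unsupported action: {call.name}")
        if call.name == "navigate":
            self.url = call.args["url"]
        self.log.append(call.name)
        # A real executor would capture an actual screenshot here.
        return Observation(screenshot=b"<png bytes>", url=self.url)


def scripted_model(script: List[FunctionCall]) -> Callable[[Observation], Optional[FunctionCall]]:
    """Hypothetical model stub: replays a fixed script, then signals completion."""
    remaining = iter(script)

    def propose(_obs: Observation) -> Optional[FunctionCall]:
        return next(remaining, None)

    return propose


def run_agent(propose: Callable[[Observation], Optional[FunctionCall]],
              browser: FakeBrowser, max_steps: int = 10) -> int:
    """Loop: ask for an action, execute it, feed back the new observation.

    Returns the number of actions executed.
    """
    obs = Observation(screenshot=b"", url=browser.url)
    for step in range(max_steps):
        call = propose(obs)
        if call is None:  # the model considers the task finished
            return step
        obs = browser.execute(call)
    return max_steps


if __name__ == "__main__":
    browser = FakeBrowser()
    script = [
        FunctionCall("navigate", {"url": "https://example.com"}),
        FunctionCall("click_at", {"x": 100, "y": 200}),
    ]
    steps = run_agent(scripted_model(script), browser)
    print(steps, browser.log, browser.url)
```

In a real integration, `propose` would wrap a call to the model with the latest screenshot and URL, and `execute` would translate each function call into executor commands. The step cap is a design choice in this sketch to keep a misbehaving loop from running indefinitely.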

https://blog.google/technology/google-deepmind/gemini-computer-use-model/

What are the scope and constraints?

The model is optimized for web browsers. Google states that it is not yet optimized for desktop OS-level control; mobile scenarios can be handled by swapping in custom actions under the same loop. A built-in safety monitor can block prohibited actions or require user confirmation before “high-stakes” operations (payments, sending messages, or accessing sensitive information).
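On the client side, the confirmation requirement amounts to a gate that sits between the proposed action and the executor. The sketch below assumes a three-valued safety decision attached to each proposed action; the `SafetyDecision` enum, its values, and `should_execute` are hypothetical names for illustration and are not taken from Google's API.

```python
from enum import Enum
from typing import Callable


class SafetyDecision(Enum):
    """Hypothetical per-action verdicts a safety monitor might return."""
    ALLOW = "allow"
    REQUIRE_CONFIRMATION = "require_confirmation"
    BLOCK = "block"


def should_execute(decision: SafetyDecision, description: str,
                   confirm: Callable[[str], bool]) -> bool:
    """Decide whether the client may run a proposed action.

    `confirm` is asked only when the monitor requires user confirmation;
    blocked actions are never executed, whatever the user says.
    """
    if decision is SafetyDecision.BLOCK:
        return False
    if decision is SafetyDecision.REQUIRE_CONFIRMATION:
        return bool(confirm(description))
    return True


if __name__ == "__main__":
    # A cautious policy for unattended runs: decline anything that needs a human.
    def deny_all(_description: str) -> bool:
        return False

    print(should_execute(SafetyDecision.ALLOW, "scroll the page", deny_all))
    print(should_execute(SafetyDecision.REQUIRE_CONFIRMATION, "submit payment", deny_all))
```

Passing the confirmation step as a callback keeps the gate testable: an interactive tool can prompt the user, while an unattended test run can decline every high-stakes step by default.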

Measuring performance

  • Online-Mind2Web (official): 69.0% pass@1, based on majority-vote human judgments and validated by the benchmark organizers.
  • Browserbase (matched harness): Leads competing computer-use APIs on both accuracy and latency on Online-Mind2Web and WebVoyager under identical time/step/environment constraints. Google’s model card lists 65.7% (OM2W) and 79.9% (WebVoyager) in Browserbase runs.
  • Latency/quality trade-off (Google figure): 70%+ accuracy at ~225 s median latency on the Browserbase OM2W harness. Treat this as Google-reported, with human evaluation.
  • AndroidWorld (mobile): 69.7%, measured by Google; achieved through the same API loop with custom mobile actions and with browser actions excluded.
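To make "pass@1 with majority-vote human judgments" concrete, the sketch below shows one plausible way to compute such a score: each task gets a single attempt, several judges mark it pass or fail, and the task counts as passed when a strict majority says so. The function names and the toy votes are illustrative assumptions; the article does not describe the benchmark's exact scoring protocol, and the numbers here are not benchmark data.

```python
from typing import Dict, List


def majority_pass(votes: List[bool]) -> bool:
    """True when strictly more than half of the judges marked the attempt as passed."""
    return sum(votes) * 2 > len(votes)


def pass_at_1(judgments: Dict[str, List[bool]]) -> float:
    """Fraction of tasks whose single attempt passed by majority vote."""
    if not judgments:
        raise ValueError("no tasks to score")
    passed = sum(majority_pass(votes) for votes in judgments.values())
    return passed / len(judgments)


if __name__ == "__main__":
    # Toy data: four tasks, three judges each (not real benchmark results).
    toy = {
        "task-1": [True, True, False],
        "task-2": [False, False, True],
        "task-3": [True, True, True],
        "task-4": [False, True, False],
    }
    print(pass_at_1(toy))  # prints 0.5
```

With a strict-majority rule, ties on an even-sized panel count as failures, which is a conservative choice made for this sketch.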

Early Production Signals

  • Automated UI test repair: Google’s payments platform team reports that the model rehabilitates >60% of automated UI test executions that had previously failed. This figure should be attributed to public reporting rather than the main blog post.
  • Operational speed: Poke.com (an early external tester) reports that its workflows are often ~50% faster than the next-best alternative.

Gemini 2.5 Computer Use is available in public preview through Google AI Studio and Vertex AI. It exposes a constrained API with 13 documented UI actions and requires a client-side executor. Google’s materials and model card report leading results on web/mobile control benchmarks, while Browserbase’s matched harness shows 65.7% pass@1 on Online-Mind2Web with the lowest latency under identical constraints. The scope is browser-centric, with per-step safety checks and confirmation. Taken together, these data points justify a measured evaluation of the model for UI testing and web operations.




Michal Sutter is a data science professional with a Master of Science degree from the University of Padova. He has a strong foundation in statistics, machine learning, and data engineering, and excels at transforming large datasets into actionable insights.
