Google Unveils Gemini 2.5 AI Model for Enhanced Browser Interaction
Google has introduced a new AI model, Gemini 2.5 Computer Use, designed to navigate and operate web browsers so that AI agents can perform tasks typically handled by human users. The model can click, scroll, and type within a browser to reach data that is not otherwise available through an API, reports 24brussels.
The Gemini 2.5 model utilizes “visual understanding and reasoning capabilities” to interpret user requests and execute actions, such as filling out and submitting forms. This functionality can be applied in various scenarios, including user interface testing, where direct API access may not be possible.
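The flow Google describes amounts to a perceive-reason-act loop: the agent sends the model a screenshot of the page along with the user's request, the model suggests a UI action, the agent executes it, and the cycle repeats until the task is done. The sketch below illustrates that loop in Python; the helper names (`capture_screenshot`, `suggest_next_action`, `execute_action`) and the `UIAction` structure are hypothetical placeholders for illustration, not part of Google's published API.

```python
# Minimal sketch of the perceive-reason-act loop described above.
# All names here are illustrative placeholders, not Google's actual API.
from dataclasses import dataclass


@dataclass
class UIAction:
    kind: str              # e.g. "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def capture_screenshot() -> bytes:
    """Placeholder: grab the current browser viewport as an image."""
    return b""


def suggest_next_action(goal: str, screenshot: bytes) -> UIAction:
    """Placeholder: ask the model which UI action moves the task forward."""
    return UIAction(kind="done")


def execute_action(action: UIAction) -> None:
    """Placeholder: perform the suggested click, scroll, or keystroke in the browser."""
    pass


def run_browser_agent(goal: str, max_steps: int = 20) -> None:
    """Alternate screenshots and model calls until the request is fulfilled."""
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = suggest_next_action(goal, screenshot)
        if action.kind == "done":        # model signals the task is finished
            break
        execute_action(action)


if __name__ == "__main__":
    run_browser_agent("Fill out and submit the contact form")
```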
Google’s announcement comes shortly after OpenAI unveiled new apps for ChatGPT at its annual Dev Day, underscoring the ongoing competition in the AI space, particularly around features that automate complex tasks for users. Anthropic, meanwhile, has already released an AI model with similar capabilities under its Claude brand.
The tech giant has shared demo videos of the new tool, noting that the footage is sped up three times for viewing convenience. Google asserts that Gemini 2.5 Computer Use outperforms leading alternatives on several web and mobile benchmarks. Notably, it operates exclusively within a browser, unlike other tools that integrate with the broader computer environment.
Currently, Gemini 2.5 Computer Use supports 13 specific actions, including opening web browsers and performing drag-and-drop operations. Developers can access the model through Google AI Studio and Vertex AI, and a demo hosted on Browserbase shows it in action, playing a game of 2048 and browsing trending topics on Hacker News.
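How the suggested actions reach a real browser is left to the integrating developer. The sketch below, which assumes a simple action format of the kind shown earlier, dispatches click, type, scroll, and drag-and-drop actions to a page driven by Playwright. It is an illustration of the idea under those assumptions, not Google's reference integration, and the mapping of the 13 supported actions onto these calls is assumed.

```python
# Sketch of dispatching model-suggested actions to a real browser via Playwright.
# The action names mirror the hypothetical UIAction above; how Gemini's 13
# supported actions map onto these calls is an assumption for illustration.
from playwright.sync_api import sync_playwright, Page


def apply_action(page: Page, kind: str, **args) -> None:
    """Translate one suggested action into concrete browser input."""
    if kind == "click":
        page.mouse.click(args["x"], args["y"])
    elif kind == "type":
        page.keyboard.type(args["text"])
    elif kind == "scroll":
        page.mouse.wheel(0, args["delta_y"])
    elif kind == "drag_and_drop":
        page.mouse.move(args["x1"], args["y1"])
        page.mouse.down()
        page.mouse.move(args["x2"], args["y2"])
        page.mouse.up()


if __name__ == "__main__":
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://news.ycombinator.com")    # e.g. browsing Hacker News
        apply_action(page, "scroll", delta_y=600)    # scroll down the front page
        screenshot = page.screenshot()               # fed back to the model each turn
        browser.close()
```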