Google has released updated versions of its Gemini 2.5 Flash and Gemini 2.5 Flash-Lite preview models across AI Studio and Vertex AI, plus rolling aliases, gemini-flash-latest and gemini-flash-lite-latest, that always point to the newest preview in each family. For production stability, Google advises pinning the fixed strings (gemini-2.5-flash, gemini-2.5-flash-lite). Google will send an email notice two weeks before it retargets a -latest alias, and notes that rate limits, features, and cost may differ between alias updates.
What really changed?
- Flash: Improved agentic tool use and more efficient "thinking" (multi-pass reasoning). Google reports a +5 point lift on SWE-Bench Verified compared to the May preview (48.9% → 54.0%), which points to better code navigation and long-horizon planning.
- Flash-Lite: Tuned for stricter instruction following, reduced verbosity, and stronger multimodal/translation performance. Google's internal chart shows about 50% fewer output tokens for Flash-Lite and about 24% fewer for Flash, which reduces output-token spend and wall-clock time in throughput-bound services.

Artificial Analysis (the account behind the AI benchmarking site) received pre-release access. Highlights from the thread and companion pages:
- Throughput: In endpoint tests, Gemini 2.5 Flash-Lite (Preview 09-2025, reasoning) is reported as the fastest proprietary model they track, at around 887 output tokens/s on AI Studio in their setup.
- Intelligence index deltas: The September previews of Flash and Flash-Lite improve on the Artificial Analysis aggregate "intelligence" scores compared to the previous stable releases.
- Token efficiency: The thread reiterates Google’s own reduction claims (−24% Flash, −50% Flash-Lite) and frames the win as cost-per-success improvements for tight latency budgets.
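To make the "cost-per-success" framing concrete, here is a minimal sketch of how a team might compute it from its own evaluation runs. The `RunStats` fields and the example numbers are hypothetical, not figures from Google or Artificial Analysis; only the idea of dividing spend by successful requests comes from the thread's framing.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results from one evaluation run (all values are hypothetical inputs)."""
    requests: int                  # total requests sent
    successes: int                 # requests that passed your task-level check
    output_tokens: int             # total output tokens billed
    usd_per_million_output: float  # output-token price you are charged


def cost_per_success(stats: RunStats) -> float:
    """Output-token spend divided by the number of successful requests."""
    if stats.successes == 0:
        return float("inf")  # no successes: cost per success is unbounded
    spend = stats.output_tokens / 1_000_000 * stats.usd_per_million_output
    return spend / stats.successes


# Illustrative comparison: same success count, half the output tokens.
before = RunStats(requests=1000, successes=800, output_tokens=500_000,
                  usd_per_million_output=0.40)
after = RunStats(requests=1000, successes=800, output_tokens=250_000,
                 usd_per_million_output=0.40)
ratio = cost_per_success(after) / cost_per_success(before)
```

With the success count held constant, halving output tokens halves cost per success. A token reduction only counts as a win if the success rate on your own tasks does not drop with it.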
Cost surface and context budgets (for choosing deployment options)
- Flash-Lite GA list price: $0.10 / 1M input tokens and $0.40 / 1M output tokens (per Google's July GA post and DeepMind's model page). That baseline is where verbosity reductions translate into immediate savings.
- Context: Flash-Lite supports a ~1M-token context with configurable "thinking budgets" and tool connectivity (Search grounding, code execution), which is useful for agent stacks that interleave reading, planning, and multi-tool calls.
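As a rough illustration of how a verbosity reduction flows through to spend, the sketch below estimates cost from per-request token counts. It assumes list prices of $0.10 per 1M input tokens and $0.40 per 1M output tokens for Flash-Lite; the request volume and token counts are made-up inputs, and the function name is hypothetical. Check current pricing before relying on the constants.

```python
# Assumed list prices in USD per 1M tokens (verify against current pricing).
INPUT_USD_PER_M = 0.10
OUTPUT_USD_PER_M = 0.40


def estimated_cost(requests: int, in_tokens: int, out_tokens: int,
                   output_reduction: float = 0.0) -> float:
    """Estimate spend for `requests` calls with the given per-request token counts.

    `output_reduction` is the fraction of output tokens saved (0.5 means 50% fewer).
    """
    if not 0.0 <= output_reduction <= 1.0:
        raise ValueError("output_reduction must be between 0 and 1")
    effective_out = out_tokens * (1.0 - output_reduction)
    per_request = (in_tokens * INPUT_USD_PER_M
                   + effective_out * OUTPUT_USD_PER_M) / 1_000_000
    return requests * per_request


# Hypothetical workload: 10M requests, 2,000 input and 500 output tokens each.
baseline = estimated_cost(10_000_000, in_tokens=2_000, out_tokens=500)
reduced = estimated_cost(10_000_000, in_tokens=2_000, out_tokens=500,
                         output_reduction=0.5)
```

For this made-up workload the estimate drops from about $4,000 to about $3,000. A 50% cut in output tokens gives only a 25% total saving here, because input tokens account for half of the baseline bill. The benefit is largest for output-heavy endpoints.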
“Browser-agent” angle and o3 claim
A circulating claim says the "new Gemini Flash has o3-level accuracy, but is 2× faster and 4× cheaper on browser-agent tasks." This is community-reported and does not appear in Google's post. It likely traces to private or limited task suites (DOM navigation, action planning) with specific budgets and time limits. Do not treat it as established truth; use it as a hypothesis for your own evaluations.
Guidance for teams
- Pin vs. chase -latest: If you depend on strict SLAs or fixed limits, pin the stable strings. If you continuously canary for cost/latency/quality, the -latest aliases are a reasonable fit; Google provides two weeks' notice before switching the pointer.
- High-QPS or token-metered endpoints: Start with the Flash-Lite preview; the verbosity and instruction-following upgrades shrink egress tokens. Validate long-context and multimodal traces under production load.
- Agent/tool pipelines: A/B test the Flash preview; Google's SWE-Bench Verified lift and the community tokens/s figures suggest better planning under constrained thinking budgets.
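One way to act on the canary advice is a simple promotion gate that compares a pinned baseline against a candidate (a preview or a -latest alias) on quality, latency, and cost. The sketch below is illustrative only: the `Metrics` fields, the thresholds, and the `should_promote` name are assumptions, not part of any Google tooling.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    success_rate: float      # fraction of eval tasks passed, between 0 and 1
    p95_latency_s: float     # 95th-percentile latency in seconds
    cost_per_success: float  # USD per successful task


def should_promote(pinned: Metrics, candidate: Metrics,
                   max_quality_drop: float = 0.01,
                   max_latency_ratio: float = 1.10,
                   max_cost_ratio: float = 1.10) -> bool:
    """Return True only if the candidate stays within every tolerance."""
    quality_ok = candidate.success_rate >= pinned.success_rate - max_quality_drop
    latency_ok = candidate.p95_latency_s <= pinned.p95_latency_s * max_latency_ratio
    cost_ok = candidate.cost_per_success <= pinned.cost_per_success * max_cost_ratio
    return quality_ok and latency_ok and cost_ok


# Hypothetical measurements from your own evaluation harness.
pinned = Metrics(success_rate=0.80, p95_latency_s=2.0, cost_per_success=0.0010)
faster_and_cheaper = Metrics(success_rate=0.82, p95_latency_s=1.8, cost_per_success=0.0008)
cheaper_but_worse = Metrics(success_rate=0.70, p95_latency_s=1.0, cost_per_success=0.0005)
```

Here `should_promote(pinned, faster_and_cheaper)` passes, while `cheaper_but_worse` is rejected on quality despite its lower cost. Requiring all three checks to pass keeps a speed or price gain from masking a regression on your workload.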
Model strings (current)
- Previews: gemini-2.5-flash-preview-09-2025, gemini-2.5-flash-lite-preview-09-2025
- Stable: gemini-2.5-flash, gemini-2.5-flash-lite
- Rolling aliases: gemini-flash-latest, gemini-flash-lite-latest (pointer semantics; may change features/limits/pricing).
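A small lookup keeps the pin-versus-alias decision in one place instead of scattering model strings through a codebase. In this minimal sketch the model strings are the ones listed above, while the environment names ("production", "staging", "canary") and the mapping between environments and channels are assumptions chosen for illustration.

```python
# Model strings as listed in this article, grouped by family and channel.
MODEL_STRINGS = {
    "flash": {
        "preview": "gemini-2.5-flash-preview-09-2025",
        "stable": "gemini-2.5-flash",
        "latest": "gemini-flash-latest",
    },
    "flash-lite": {
        "preview": "gemini-2.5-flash-lite-preview-09-2025",
        "stable": "gemini-2.5-flash-lite",
        "latest": "gemini-flash-lite-latest",
    },
}

# Hypothetical policy: production pins stable, staging tests the dated preview,
# and canary follows the rolling alias.
CHANNEL_BY_ENVIRONMENT = {
    "production": "stable",
    "staging": "preview",
    "canary": "latest",
}


def resolve_model(family: str, environment: str) -> str:
    """Return the model string for a family ("flash" or "flash-lite") and environment."""
    if family not in MODEL_STRINGS:
        raise ValueError(f"unknown model family: {family!r}")
    if environment not in CHANNEL_BY_ENVIRONMENT:
        raise ValueError(f"unknown environment: {environment!r}")
    return MODEL_STRINGS[family][CHANNEL_BY_ENVIRONMENT[environment]]
```

For example, `resolve_model("flash-lite", "production")` returns the pinned stable string, and `resolve_model("flash", "canary")` returns the rolling alias. Only the canary path is then exposed to pointer changes.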
Summary
Google's latest release tightens tool-use competence (Flash) and token/latency efficiency (Flash-Lite), and introduces -latest aliases to speed up iteration. External benchmarks from Artificial Analysis indicate throughput and intelligence-index gains for the September 2025 previews, with Flash-Lite now testing as the fastest proprietary model in their harness. Validate on your workload, especially browser-agent stacks, before committing to the aliases in production.


