Gemini Flash-Lite is now the fastest proprietary model (external tests) with 50% fewer output tokens.

Google releases an update updated version of Gemini 2.5 Flash and Gemini 2.5 Flash-Lite preview models across AI Studio and Vertex AI, plus rolling aliases—gemini-flash-latest The following are some examples of how to get started: gemini-flash-lite-latest—that always point to the newest preview in each family. For production stability, Google advises pinning fixed strings (gemini-2.5-flash, gemini-2.5-flash-lite). Google will send a notice via email two weeks before it retargets a webpage. -latest The alias is a note that Rate limits, features and costs may differ between alias upgrades.

https://developers.googleblog.com/en/continuing-to-bring-you-our-latest-models-with-an-improved-gemini-2-5-flash-and-flash-lite-release/

What really changed?

FlashThe newest version of the bestselling ‘Improved Use of agentic tools More efficient “thinking” (multi-pass reasoning). Google reports. +5 Point “Lift on” SWE-Bench Verified Compared to the May preview48.9% → 54.0%The code navigation/long-horizon planning is improved with this.
Flash-LiteTuned to stricter Please follow the instructions below, Reduced verbosityStronger and more. multimodal/translation. Google’s internal graph shows Output tokens are reduced by 50% Flash-Lite There are 24% less people in the world. Flash is a service that reduces the amount of output tokens and time spent on wall clocks in services requiring high throughput.

https://developers.googleblog.com/en/continuing-to-bring-you-our-latest-models-with-an-improved-gemini-2-5-flash-and-flash-lite-release/

Artificial Analysis, the account that runs the AI benchmarking website was received Get pre-release access The thread and companion pages: Highlights from the thread and companion pages Highlights and links from the thread

It is a measure of the amount that can be produced.: In endpoint tests, Gemini Flash-Lite 2.5 (Preview: 09-2025, Reasoning) It is reported that the Fastest proprietary model They track around The output tokens is 887 AI Studio can be used in any setup.
Intelligence index deltasPreviews for September Flash The following are some examples of how to get started: Flash-Lite Improve on the Artificial Analysis aggregate “intelligence” Scores compared to previous stable releases.
Token efficiency: The thread reiterates Google’s own reduction claims (−24% Flash, −50% Flash-Lite) and frames the win as cost-per-success improvements for tight latency budgets.

Google shared pre-release access for the new Gemini 2.5 Flash & Flash-Lite Preview 09-2025 models. Independent benchmarking has shown that Flash-Lite, output speed, and token efficiency have improved over predecessor models.

Key takeaways from our intelligence… pic.twitter.com/ybzKvZBH5A

— Artificial Analysis (@ArtificialAnlys) September 25, 2025

Budgets for cost surface and context (for choosing deployment options)

Price list for the Flash-Lite GA You can learn more about it here. $10.00 / 1000 input tokens The following are some examples of how to get started: Output tokens: $0.40 per 1000 Google’s GA posting from July and DeepMind’s model page The baseline for verbosity is the level where immediate savings can be realized by reducing it.
ContextFlash-Lite Support ~1M-token Context with Configurable “thinking budgets” and tool connectivity (Search grounding, code execution)—useful for agent stacks that interleave reading, planning, and multi-tool calls.

“Browser-agent” angle and o3 claim

The claim is that the “new Gemini Flash has o3-level accuracy, but is 2× faster and 4× cheaper on browser-agent tasks.” The following is a list of all the languages that are spoken in this country. community-reportedNot in Google’s post. The likely cause is private/limited tasks suites (DOM Navigation, Action Planning) that have specific budgets and times. Do not treat this as the truth. Use it to make your own assessments.

What a crazy idea! Gemini Flash, the new model that was released yesterday is as accurate as O3, but 2x faster for tasks involving browser agents and 4x less expensive.

This is something I could not believe. Gemini-2.5 flash had a 71% score on this benchmark. https://t.co/KdgkuAK30W pic.twitter.com/F69BiZHiwD

— Magnus Müller (@mamagnus00) September 26, 2025

Guideline for Teams

The pin versus the chase -latestYou should not rely on SLAs that are strict or limits fixed. The pin is a great way to get a grip on the situation. The stable strings. If you continuously canary for cost/latency/quality, the -latest Google provides two-week notice for the switchover of pointers.
Endpoints with high-quality service or those that use token metersStart by Flash-Lite preview; the verbosity and instruction-following upgrades shrink egress tokens. Validate long context and multimodal traces when under heavy production loads.
Agent/tool pipelines: A/B Flash preview Google’s SWE Bench Verified Lift and Community Tokens/S figures indicate better planning when thinking within constrained budgets.

Model strings (current)

Views: gemini-2.5-flash-preview-09-2025, gemini-2.5-flash-lite-preview-09-2025
Stable: gemini-2.5-flash, gemini-2.5-flash-lite
The Rolling Aliases: gemini-flash-latest, gemini-flash-lite-latest (pointer semantics; may change features/limits/pricing).

The following is a summary of the information that you will find on this page.

Google’s latest release tightens up security tool-use competence (Flash) and token/latency efficiency (Flash-Lite) and introduces -latest Use aliases to speed up iteration. External benchmarks in Artificial Analysis The word “indicate” is used to mean: Throughput The following are some examples of how to get started: intelligence-index Flash-Lite is now being tested as part of the previews for Sept. 2025. Fastest proprietary model In their harness. Validate on your workload—especially browser-agent stacks—before committing to the aliases in production.

Michal is a professional in the field of data science with a Masters of Science degree from University of Padova. Michal Sutter excels in transforming large datasets to actionable insight. He has a strong foundation in statistics, machine learning and data engineering.

Gemini Flash-Lite is now the fastest proprietary model (external tests) with 50% fewer output tokens.

Google Cloud AI Research introduces ReasoningBank: a memory framework that distills reasoning strategies from agent successes and failures.

Equinox Detailed implementation with JAX Native Moduls, Filtered Transformations, Stateful Ladders and Workflows from End to end.

Xiaomi MiMo V2.5 Pro and MiMo V2.5 Released: Frontier Model Benchmarks with Significantly Lower Token Cost

How to Create a Multi-Agent System of Production Grade CAMEL with Tool Usage, Consistency, and Criticism-Driven Improvement

Ai Agent is Ready for Service, While on the Phone

Livestream Replay – Beginner’s Advice for Claude – a ChatGPT alternative

OpenAI’s Chief Communication Officer is Leaving the Company

It’s not for plumbers or electricians that the real AI talent war is.

The Sex I Had With AI Clive Owen

Top Insights

Anthropic’s Developer Day: AI Agents were the Stars of Anthropic Day 1

The Inside Story of the AI Summit where China presented its AI agenda to the world

Latest News

Google Cloud AI Research introduces ReasoningBank: a memory framework that distills reasoning strategies from agent successes and failures.

Equinox Detailed implementation with JAX Native Moduls, Filtered Transformations, Stateful Ladders and Workflows from End to end.

Gemini Flash-Lite is now the fastest proprietary model (external tests) with 50% fewer output tokens.

What really changed?

Budgets for cost surface and context (for choosing deployment options)

“Browser-agent” angle and o3 claim

Guideline for Teams

Model strings (current)

The following is a summary of the information that you will find on this page.

Related Posts