
Gemini Flash-Lite is now the fastest proprietary model (external tests) with 50% fewer output tokens.

Tech · By Gavin Wallace · 28/09/2025 · 5 Mins Read

Google has released updated preview versions of the Gemini 2.5 Flash and Gemini 2.5 Flash-Lite models across AI Studio and Vertex AI, along with rolling aliases (gemini-flash-latest and gemini-flash-lite-latest) that always point to the newest preview in each family. For production stability, Google advises pinning the fixed strings (gemini-2.5-flash, gemini-2.5-flash-lite). Google will send an email notice two weeks before it retargets a -latest alias, and notes that rate limits, features, and costs may differ between alias upgrades.

https://developers.googleblog.com/en/continuing-to-bring-you-our-latest-models-with-an-improved-gemini-2-5-flash-and-flash-lite-release/
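The pinning advice can be made concrete with a minimal sketch. Only the model strings below come from Google's announcement; the helper function itself is hypothetical and not part of any Google SDK:

```python
# Sketch: choosing between pinned and rolling Gemini model strings.
# The string constants come from Google's announcement; the helper is
# a hypothetical convenience, not a Google API.

PINNED = {
    "flash": "gemini-2.5-flash",
    "flash-lite": "gemini-2.5-flash-lite",
}
ROLLING = {
    "flash": "gemini-flash-latest",
    "flash-lite": "gemini-flash-lite-latest",
}

def model_string(family: str, *, production: bool) -> str:
    """Pin stable strings for production; chase -latest only when you
    continuously canary for cost, latency, and quality."""
    return (PINNED if production else ROLLING)[family]
```

With the google-genai Python SDK, the returned string would typically be passed as the `model=` argument of a generate-content call; check the current SDK reference for the exact invocation.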

What really changed?

  • Flash: improved agentic tool use and more efficient “thinking” (multi-pass reasoning). Google reports a +5-point lift on SWE-Bench Verified over the May preview (48.9% → 54.0%), indicating better code navigation and long-horizon planning.
  • Flash-Lite: tuned for stricter instruction following, reduced verbosity, and stronger multimodal/translation performance. Google’s internal chart shows output tokens cut by roughly 50% for Flash-Lite and 24% for Flash, which reduces both egress cost and wall-clock time in high-throughput services.

Artificial Analysis, the account behind the AI benchmarking site of the same name, received pre-release access. Highlights from its thread and companion pages:

  • Throughput: in endpoint tests, Gemini 2.5 Flash-Lite (Preview 09-2025, Reasoning) is reported as the fastest proprietary model they track, at around 887 output tokens per second on the AI Studio endpoint.
  • Intelligence-index deltas: the September previews of both Flash and Flash-Lite improve on the Artificial Analysis aggregate “intelligence” scores relative to the prior stable releases.
  • Token efficiency: The thread reiterates Google’s own reduction claims (−24% Flash, −50% Flash-Lite) and frames the win as cost-per-success improvements for tight latency budgets.
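Those percentages translate directly into egress spend. A back-of-the-envelope sketch, where the request volume and baseline token count are made-up illustrative numbers and only the −50% figure and the $0.40/1M output price come from the article:

```python
# Illustrative arithmetic for the reported output-token reductions
# (-24% Flash, -50% Flash-Lite). Traffic figures are invented.

def monthly_output_cost(requests: int, avg_output_tokens: float,
                        usd_per_million_tokens: float) -> float:
    """Monthly spend on output tokens alone."""
    return requests * avg_output_tokens * usd_per_million_tokens / 1_000_000

baseline = monthly_output_cost(10_000_000, 400, 0.40)        # previous Flash-Lite
updated = monthly_output_cost(10_000_000, 400 * 0.5, 0.40)   # -50% output tokens
saving = baseline - updated  # halving verbosity halves output-token cost
```

The point is not the specific dollar figures but that a verbosity reduction is a linear multiplier on the output side of the bill.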

Google shared pre-release access to the new Gemini 2.5 Flash & Flash-Lite Preview 09-2025 models. Independent benchmarking shows improvements in intelligence, output speed, and token efficiency over the predecessor models.

Key takeaways from our intelligence… pic.twitter.com/ybzKvZBH5A

— Artificial Analysis (@ArtificialAnlys) September 25, 2025

Cost surface and context budgets (for choosing deployment options)

  • Price: Flash-Lite GA pricing is listed at $0.10 per 1M input tokens and $0.40 per 1M output tokens (per Google’s July GA post and DeepMind’s model page), so any reduction in verbosity translates directly into savings.
  • Context: Flash-Lite supports a ~1M-token context with configurable “thinking budgets” and tool connectivity (Search grounding, code execution), useful for agent stacks that interleave reading, planning, and multi-tool calls.
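A sketch of what a request exercising those knobs might look like, assuming the public v1beta `generateContent` REST body shape; the field names are best-effort from public docs and should be verified against the current API reference:

```python
# Hypothetical request builder mirroring the Gemini REST API body:
# a "thinking budget" plus Search grounding and code-execution tools.

def build_request(prompt: str, thinking_budget: int) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # 0 disables thinking entirely; larger budgets allow more
            # multi-pass reasoning at the cost of latency and tokens.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
        "tools": [{"googleSearch": {}}, {"codeExecution": {}}],
    }
```

An agent stack would rebuild this body per step, tightening the budget for routine reads and loosening it for planning turns.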

“Browser-agent” angle and o3 claim

The claim that “the new Gemini Flash has o3-level accuracy, but is 2× faster and 4× cheaper on browser-agent tasks” is community-reported, not from Google’s post. It likely stems from private or limited task suites (DOM navigation, action planning) with specific budgets and timeouts. Do not treat it as ground truth; use it as a prompt for your own evaluations.

This is insane. Gemini Flash, the new model released yesterday, is as accurate as o3 while being 2× faster and 4× cheaper on browser-agent tasks.

This is something I could not believe. Gemini-2.5 flash had a 71% score on this benchmark. https://t.co/KdgkuAK30W pic.twitter.com/F69BiZHiwD

— Magnus Müller (@mamagnus00) September 26, 2025

Guidance for teams

  • Pin versus chase -latest: do not rely on -latest where you need strict SLAs or fixed limits; pin the stable strings instead. If you continuously canary for cost, latency, and quality, -latest speeds iteration, and Google provides two weeks’ notice before switching the pointers.
  • High-QPS or token-metered endpoints: start with the Flash-Lite preview; the verbosity and instruction-following upgrades shrink egress tokens. Validate long-context and multimodal traces under production-scale load.
  • Agent/tool pipelines: A/B test the Flash preview; Google’s SWE-Bench Verified lift and community tokens-per-second figures suggest better planning under constrained thinking budgets.
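One way to operationalize the pin-versus-chase tradeoff is a promotion gate in the canary. The metric names and 5% tolerance below are hypothetical choices, not from Google's guidance:

```python
# Hypothetical canary gate: promote the -latest alias only if quality
# holds and cost/latency do not regress beyond a tolerance.

def promote_latest(stable: dict, latest: dict, *,
                   max_regression: float = 0.05) -> bool:
    """Each dict carries 'quality' (higher is better), plus 'latency_s'
    and 'usd_per_1k_requests' (lower is better)."""
    if latest["quality"] < stable["quality"]:
        return False
    for metric in ("latency_s", "usd_per_1k_requests"):
        if latest[metric] > stable[metric] * (1 + max_regression):
            return False
    return True
```

Run the gate on each two-week notice window; fail closed (keep the pinned string) whenever any metric regresses.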

Model strings (current)

  • Previews: gemini-2.5-flash-preview-09-2025, gemini-2.5-flash-lite-preview-09-2025
  • Stable: gemini-2.5-flash, gemini-2.5-flash-lite
  • Rolling aliases: gemini-flash-latest, gemini-flash-lite-latest (pointer semantics; features, limits, and pricing may change).

Summary

Google’s latest release tightens tool-use competence (Flash) and token/latency efficiency (Flash-Lite), and introduces -latest aliases for faster iteration. External benchmarks from Artificial Analysis indicate throughput and intelligence-index gains, with the September 2025 Flash-Lite preview now the fastest proprietary model in their harness. Validate on your own workload, especially browser-agent stacks, before committing to the aliases in production.


