Google has launched the latest preview versions of Gemini 2.5 Flash and Flash-Lite, which not only significantly improve instruction following and multimodal capabilities but also markedly reduce cost and latency. This update aims to help developers do more with less, and the new -latest alias simplifies the development workflow.
In the pursuit of more powerful and efficient AI models, Google continues to make strides. The company has announced the latest updated versions of Gemini 2.5 Flash and 2.5 Flash-Lite, which are now available for experimentation in Google AI Studio and Vertex AI. The core goal of this update is clear: to significantly improve the model’s operational efficiency while continuously enhancing output quality.
Simply put, the move aims to make AI not only smarter but also faster and cheaper.
The data below, summarized from Google's charts, shows that the new preview models strike an excellent balance between intelligence (Artificial Analysis Intelligence Index) and end-to-end response time, a significant improvement over the current stable versions.
Table 1: Intelligence vs. End-to-End Response Time
| Model Version | Intelligence Index (higher is better) | End-to-End Response Time (seconds, lower is better) |
|---|---|---|
| Gemini 2.5 Flash-Lite STABLE (No Thinking) | ~30 | ~2.5 |
| Gemini 2.5 Flash STABLE (No Thinking) | ~40 | ~3.5 |
| Gemini 2.5 Flash-Lite 09-2025 (No Thinking) | ~47 | ~5.0 |
| Gemini 2.5 Flash 09-2025 (No Thinking) | ~42.5 | ~2.0 |
| Gemini 2.5 Flash-Lite STABLE | ~40 | ~7.5 |
| Gemini 2.5 Flash STABLE | ~50 | ~15.5 |
| Gemini 2.5 Flash 09-2025 | ~53 | ~10.0 |
Table 2: Output Token Efficiency
| Model Version | Output Tokens |
|---|---|
| Gemini 2.5 Flash (09-2025) | 71M |
| Gemini 2.5 Flash STABLE | 93M |
| Gemini 2.5 Flash-Lite (09-2025) | 70M |
| Gemini 2.5 Flash-Lite STABLE | 140M |
Flash-Lite: More Precise, More Concise, and with Stronger Multimedia Capabilities
The latest version of Gemini 2.5 Flash-Lite has been comprehensively upgraded, optimized around the three areas developers care about most:
- More Accurate Instruction Following: To address the problem of AI sometimes only partially understanding complex instructions, the new version of Flash-Lite has made great strides in understanding complex instructions and system prompts, and can execute requests more accurately.
- No Longer Verbose, but Concise: The old model sometimes provided overly lengthy answers, increasing latency and token costs. This update significantly reduces the model’s verbosity, enabling it to give more concise and precise answers, which is a great benefit for application scenarios that require high throughput. In fact, the number of output tokens (i.e., cost) has been reduced by 50%.
- Stronger Multimodal and Translation Capabilities: Flash-Lite has been enhanced in its multimodal capabilities, including more accurate speech-to-text transcription, deeper image understanding, and smoother translation quality.
Developers can start testing with the following model string:
gemini-2.5-flash-lite-preview-09-2025
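As a minimal sketch of what testing that string looks like, the snippet below uses the google-genai Python SDK (`pip install google-genai`); the prompt and helper function are illustrative, and a `GEMINI_API_KEY` environment variable is assumed to hold your API key.

```python
import os

# Preview model string from the announcement.
MODEL = "gemini-2.5-flash-lite-preview-09-2025"

def summarize(text: str) -> str:
    # Imported here so the sketch can be read without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=MODEL,
        contents=f"Summarize in one sentence: {text}",
    )
    return response.text

if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    print(summarize("Gemini 2.5 Flash-Lite reduces output tokens by 50%."))
```

Because the preview shares the stable model's API surface, swapping back to `gemini-2.5-flash-lite` later is a one-line change.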
Flash: The Comprehensive Evolution of AI Assistants
This update to the 2.5 Flash model directly responds to the two core pieces of feedback Google has received from the developer community:
- Smarter Tool Use: Google has improved the way the model uses tools, making it perform better when handling complex, multi-step agentic applications. This allows the AI to complete more complex tasks on its own. In the key SWE-Bench Verified benchmark, the new model's score improved from 48.9% to 54%, a gain of roughly five percentage points.
- Higher Efficiency, Lower Cost: The new model offers extremely high cost-effectiveness, producing higher-quality results with fewer tokens and shorter latency. The token-efficiency data above shows that output tokens for Gemini 2.5 Flash are down 24%, which translates directly into lower cost.
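To make the 24% figure concrete, here is a back-of-envelope calculation using the token counts from the efficiency table above; the per-million-token price is a placeholder of mine, not Google's actual pricing.

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost of output tokens at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million

# Placeholder price; check Google's pricing page for real numbers.
PRICE = 0.60  # dollars per million output tokens (hypothetical)

# Token counts from the efficiency table above (Gemini 2.5 Flash).
stable_tokens = 93_000_000
preview_tokens = 71_000_000

saving = 1 - preview_tokens / stable_tokens
print(f"Output tokens down {saving:.0%}")  # prints "Output tokens down 24%"
print(f"Cost: ${output_cost(stable_tokens, PRICE):.2f} -> "
      f"${output_cost(preview_tokens, PRICE):.2f}")  # $55.80 -> $42.60
```

Since output tokens are billed linearly, any percentage drop in tokens is the same percentage drop in output-side cost, whatever the actual price.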
Early testers have already given a lot of positive feedback. For example, Yichao ‘Peak’ Ji, co-founder and chief scientist of the automated AI agent company Manus, mentioned: “The new Gemini 2.5 Flash model perfectly combines speed and intelligence. Our internal benchmark tests show a 15% performance improvement when handling long-term planning agent tasks. Its excellent cost-effectiveness enables Manus to scale to an unprecedented level.”
To experience this version, you can use the following model string:
gemini-2.5-flash-preview-09-2025
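The improved tool use described above can be sketched with the google-genai SDK's function-calling support; the `get_order_status` tool and the question are toy examples of mine, and a `GEMINI_API_KEY` environment variable is assumed.

```python
import os

MODEL = "gemini-2.5-flash-preview-09-2025"

def get_order_status(order_id: str) -> str:
    """Toy tool: look up an order in a hard-coded table."""
    return {"A1": "shipped", "B2": "processing"}.get(order_id, "unknown")

def ask(question: str) -> str:
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    # Passing a plain Python function as a tool enables the SDK's
    # automatic function calling: it sends the function's signature,
    # runs it when the model requests it, and feeds the result back
    # until the model produces a final answer.
    response = client.models.generate_content(
        model=MODEL,
        contents=question,
        config=types.GenerateContentConfig(tools=[get_order_status]),
    )
    return response.text

if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    print(ask("What is the status of order A1?"))
```

Multi-step agentic tasks chain several such tool calls, which is exactly where the new model's SWE-Bench gains are claimed to show up.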
Designed for Developers: Welcome to the -latest Alias Era
Google has stated that the experience of the past year has shown that releasing preview versions of models allows developers to test the latest features and innovations at the first opportunity and provide valuable feedback, which helps to create more stable and outstanding official versions of Gemini.
To make it easier for developers to access the latest models and reduce the trouble of tracking lengthy model strings, Google has introduced a -latest alias for each model family. This alias will always point to the latest model version in that family, allowing developers to easily experiment with new features without having to modify their code for each update.
Developers can use the new preview version in the following ways:
gemini-flash-latest
gemini-flash-lite-latest
To ensure that developers can test with peace of mind, Google will notify them by email two weeks in advance before updating or deprecating the specific version behind -latest. However, it should be noted that these are just model aliases, and rate limits, costs, and available features may change with version releases.
If an application requires higher stability, Google recommends that developers continue to use models with explicitly specified versions, such as gemini-2.5-flash and gemini-2.5-flash-lite.
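One practical way to act on that recommendation is to centralize model selection behind a single switch, so an app can track -latest during experimentation and flip to a pinned version for production. The helper below is purely illustrative (the model strings come from this article, but the function and dictionaries are my own, not part of any Google SDK).

```python
# Hypothetical config helper: the model names are from the article,
# the helper itself is illustrative, not part of any Google SDK.
PINNED = {
    "flash": "gemini-2.5-flash",
    "flash-lite": "gemini-2.5-flash-lite",
}
LATEST = {
    "flash": "gemini-flash-latest",
    "flash-lite": "gemini-flash-lite-latest",
}

def model_for(family: str, *, track_latest: bool = False) -> str:
    """Pick the -latest alias for experiments, a pinned id for production."""
    table = LATEST if track_latest else PINNED
    if family not in table:
        raise ValueError(f"unknown model family: {family}")
    return table[family]

print(model_for("flash"))                     # gemini-2.5-flash
print(model_for("flash", track_latest=True))  # gemini-flash-latest
```

Keeping the choice in one place also limits the blast radius when, as Google warns, the version behind -latest changes its rate limits, costs, or feature set.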
Google will continue to explore the infinite possibilities of AI. This release is just one step on its forward path, and more news will be released in the future.


