Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Google released Gemini 3.5 Flash with a significant price hike, yet simultaneously deployed it across core products like Search and the Gemini app, revealing a shift from pure cost-effectiveness to paying for comprehensive model capabilities.

Large Language Models · AI Model Pricing · Google · Developer Tools · Industry Trends

KEY POINTS

Gemini 3.5 Flash costs substantially more than earlier Flash models, approaching Pro-level pricing
Google simultaneously deployed it across core free products such as Search and the Gemini app
The new model supports a 1-million-token input context but drops "computer use" functionality
Industry trend: major AI labs are probing how much API customers are willing to pay

ANALYSIS

The Context: An Unusual Price Hike Amidst Mass Deployment

At the recently concluded Google I/O conference, Google launched Gemini 3.5 Flash. The release is unusual in two ways. First, it skipped the preview phase and went straight to general availability (GA). Second, and more strikingly, it costs far more than previous Flash models: 3 times the price of Gemini 3 Flash Preview and 6 times that of the lighter 3.1 Flash-Lite. That puts it very close to the price of Google's own Gemini 3.1 Pro.

Yet despite the price increase, Google announced it would roll out this more expensive model across almost all of its key products: to all users via the Gemini app and AI Mode in Google Search, to developers via the Antigravity development platform and AI Studio, and to businesses via its enterprise platforms. This seems contradictory. Why deploy a more expensive model so broadly in free products?

Unpacking the Shift: Repositioning Capability Behind the Price

To understand this, look at two key changes. First, on the technical side, Gemini 3.5 Flash accepts up to 1 million input tokens and produces up to 65,000 output tokens, giving it a very large context window. It drops the "computer use" functionality that earlier versions may have offered, but its core language understanding and generation capabilities are clearly positioned as those of a more powerful "workhorse" model, not just a lightweight, low-cost option.
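As a rough illustration of what those limits mean in practice, the sketch below checks a planned request against the two figures quoted above. The function name and the idea of counting tokens ahead of time are assumptions made for this example; the limits are the rounded numbers from the text, and the exact values enforced by the API may differ.

```python
# Limits as quoted in the article; treat them as approximate.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_000


def check_request(input_tokens: int, output_tokens: int) -> list[str]:
    """Return a list of problems with a planned request; empty means it fits.

    Hypothetical helper: token counts are assumed to be known in advance.
    """
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        over = input_tokens - MAX_INPUT_TOKENS
        problems.append(f"input is {over} tokens over the limit")
    if output_tokens > MAX_OUTPUT_TOKENS:
        over = output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output is {over} tokens over the limit")
    return problems


if __name__ == "__main__":
    print(check_request(500_000, 8_000))
    print(check_request(1_200_000, 70_000))
```

The input and output limits are checked separately because the text states them as two distinct ceilings.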

Second, Google simultaneously launched the new Interactions API (in beta), widely seen as a counterpart to OpenAI's Responses API, which aims to make server-side conversation history management more convenient. Taken together, this suggests Google is repositioning the Flash series from an "economy choice" to an "all-rounder," attempting to cover everything from simple tasks to complex interactions with a single model.
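To make "server-side history management" concrete, here is a toy comparison of how much a client has to upload under each approach. It does not call the Interactions API or any real endpoint; both functions are hypothetical, the turn sizes are made-up numbers, and the count covers only what the client sends, not how a provider bills for tokens.

```python
def tokens_sent_client_side(turn_sizes: list[int]) -> int:
    """Client keeps the history and resends the whole conversation each turn."""
    total_sent = 0
    history = 0
    for size in turn_sizes:
        history += size        # the conversation grows by this turn
        total_sent += history  # and the full conversation is uploaded again
    return total_sent


def tokens_sent_server_side(turn_sizes: list[int]) -> int:
    """Server keeps the history, so the client uploads only each new turn."""
    return sum(turn_sizes)


if __name__ == "__main__":
    turns = [100, 200, 300]  # hypothetical size of each new turn, in tokens
    print(tokens_sent_client_side(turns))
    print(tokens_sent_server_side(turns))
```

With client-side history the upload grows roughly quadratically with the number of turns, while with server-side history it grows linearly, which is the convenience this kind of API is meant to offer.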

Trend Insight: The Shifting Logic of AI Model Pricing

This event reveals a deeper industry trend: the pricing logic for AI models is shifting from "cost-based pricing" to "value-based pricing." In the past, suffixes like Flash and Lite implied "cheap and good enough," suited to cost-sensitive scenarios. Now Google, OpenAI (GPT-5.5 is twice the price of 5.4), and Anthropic (Claude Opus 4.7 is pricier than 4.6) are all raising prices on their newest models.

The signal behind this is that capability gaps between models are widening, and users (including internal product teams at enterprises) are willing to pay a premium for significantly improved abilities. If Google believes 3.5 Flash is capable enough to power the core AI experiences of Search and the Gemini app, the higher cost becomes justifiable in terms of overall product value and user experience. This marks a new phase in AI model competition: labs are no longer racing only to lower the price, but to raise the capability they deliver for that price.

Practical Value and Counter-Intuitive Insights

For developers and businesses, this means reassessing model selection strategies. The rough division of "use Flash for simple tasks, Pro for complex ones" may no longer hold. You need to test more carefully: does the capability boost of the new Flash model justify a 3x to 6x cost increase for your specific tasks? Can it replace enough of your Pro model workloads to lower overall costs?
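One way to frame that last question is as a break-even calculation. The sketch below assumes a workload split between an old Flash model and a Pro model, assumes all old-Flash traffic moves to the new Flash, and asks what share of Pro traffic would also have to move for total cost to stay flat. The function name and the relative prices in the example are hypothetical: only the 3x ratio between old and new Flash comes from the text, and the Pro price of 4 units is an invented stand-in for "very close to Pro."

```python
def break_even_pro_share(
    pro_fraction: float,
    old_flash_price: float,
    new_flash_price: float,
    pro_price: float,
) -> float:
    """Share of Pro traffic that must move to the new Flash to keep cost flat.

    pro_fraction is the share of total tokens currently sent to Pro.
    Prices are per token in any consistent unit. A result above 1.0 means
    the price increase cannot be offset even by moving all Pro traffic.
    """
    if not 0 < pro_fraction <= 1:
        raise ValueError("pro_fraction must be in (0, 1]")
    if pro_price <= new_flash_price:
        raise ValueError("no saving is possible unless Pro costs more than new Flash")
    # Extra cost from repricing the traffic that was already on Flash.
    extra_cost = (1 - pro_fraction) * (new_flash_price - old_flash_price)
    # Saving available if every Pro token moved to the new Flash.
    max_saving = pro_fraction * (pro_price - new_flash_price)
    return extra_cost / max_saving


if __name__ == "__main__":
    # Hypothetical relative prices: old Flash 1, new Flash 3, Pro 4.
    print(break_even_pro_share(0.8, 1.0, 3.0, 4.0))  # Pro-heavy workload
    print(break_even_pro_share(0.5, 1.0, 3.0, 4.0))  # evenly split workload
```

Under these made-up prices, a workload that is 80% Pro would need to move about half of its Pro traffic to break even, while an evenly split workload could not break even at all. Real numbers depend on actual price sheets and on measured quality for your own tasks.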

A counter-intuitive point here is that free users might be the first to access more powerful (and more expensive) models. By deploying 3.5 Flash directly into Search and the Gemini app, Google lets billions of users benefit from the improved model without doing anything. This reflects the business logic of tech giants: using high API revenue from developers to subsidize free services for a massive consumer base, building product moats and user habits along the way. For developers, this implies that API pricing will weigh more heavily on decisions, and that the capability-to-cost ratio will become the core consideration in choosing a model.

In summary, the launch of Gemini 3.5 Flash is more than just a new model; it's a bellwether signaling that the AI industry is moving from an "arms race in model capabilities" to a new phase of "monetizing model value."

Originally from Simon Willison · Analyzed by BitByAI