What's new in Claude Sonnet 5

Claude Sonnet 5 brings Opus-level performance at Sonnet prices, but a tokenizer change effectively raises costs by 30% for English users; removed sampling params and default thinking mode add more hidden costs.

Claude API 变化成本优化 Large Language Models Developer Tools 分词器

KEY POINTS

Sampling parameters like temperature, top_p are no longer supported; model controls randomness internally, removing fine-grained developer control.
A new tokenizer increases input token count by ~30% for English text, effectively raising costs despite unchanged list prices; Chinese text sees almost no increase.
Adaptive thinking is on by default, consuming expensive output tokens — disable it for simple tasks to avoid unnecessary charges.
Performance leap is real: near-Opus 4.8 quality at lower cost, but evaluate your workload to see if hidden fees offset the gains.

ANALYSIS

Whenever a new model drops, Simon Willison heads straight for the developer docs, not the press release. With Claude Sonnet 5, he found a few subtle but wallet-pinching details buried in the ‘What’s new’ notes. On the surface, Anthropic delivered: near-Opus 4.8 performance at Sonnet-tier prices, plus an introductory discount. But the devil is in the details.

The Vanishing Sampling Parameters Sonnet 5 no longer supports temperature, top_p, or top_k. Before, you could dial up randomness for creative writing or turn it down for deterministic code generation. Now those knobs are gone; the model decides itself when to be strict and when to riff. Anthropic probably believes its internal mechanisms already optimize this better, but for developers, it means losing a set of fine-tuning levers. If your application relied on specific temperature settings, you’ll need serious regression testing before migrating to see if the auto-pilot meets your needs.

Same Sticker Price, Bigger Bill—the Tokenizer Surprise The docs quietly note that “the same input text produces approximately 30% more tokens than on Claude Sonnet 4.6.” Simon ran the numbers: the English Universal Declaration of Human Rights jumped from 2,356 to 3,341 tokens (1.42x). Spanish went from 3,572 to 4,747 (1.33x). But Chinese barely moved: 3,334 to 3,360 (1.01x). The harsh truth: if your workload is mostly English, you’re effectively paying 30% more for the same amount of text, because the token count—not the per-token price—is what drives your bill. Chinese users, however, caught a lucky break: the new tokenizer is nearly as efficient as the old one for Chinese, so they get near-Opus intelligence at no extra token overhead. This language-dependent cost shift is a powerful reminder to never trust list prices alone.

Default Thinking: Paying for Output You Don’t Need Sonnet 5 has adaptive thinking turned on by default. The model performs internal reasoning before answering, which boosts quality on complex tasks. But that reasoning also consumes output tokens—priced at $15/million vs $3/million for input. If your use case involves simple Q&A, summarization, or translation, that default thinking is just burning money. You can disable it with "thinking": {type: "disabled"}, but many developers will overlook this until the monthly bill arrives.

The Bigger Trend: Hidden Pricing in the LLM Era This episode exposes a new phase in AI competition. Vendors are no longer just chasing benchmark scores; they’re optimizing inference costs through engineering tricks, but those savings aren’t always passed to users. Tokenizer changes often improve model efficiency (faster generation, lower GPU usage), yet the extra tokens land on your invoice. Default-on premium features like thinking are a classic upsell strategy: give users a taste of better quality, then charge for the privilege. It’s not dishonest, but it demands that developers treat every model upgrade as a mini procurement project—you must measure with your own data.

What You Should Do Now If you’re planning to adopt Sonnet 5, immediately do three things: first, run your typical inputs through a token counter to benchmark real-world cost changes. Second, disable adaptive thinking if your tasks don’t need deep reasoning. Third, verify that the removal of sampling parameters hasn’t degraded your output quality—you may need to adjust prompts to compensate. For Chinese-language applications, this update is actually a steal: near-Opus capability at almost no token overhead. Just keep an eye on output-side extras.

Analysis by BitByAI · Read original

Originally from Simon Willison · Analyzed by BitByAI