ChatGPT voice mode is a weaker model

Simon Willison points out that ChatGPT's voice mode runs on an older GPT-4o-era model, an example of how AI companies deploy models of differing capability across their product lines.

Large Language Models · AI Product Strategy · Developer Tools · AI Applications

KEY POINTS

ChatGPT's voice mode reports an April 2024 knowledge cutoff, which points to an older GPT-4o-era model rather than the latest version
AI companies commonly deploy models of differing capability across product lines
Verifiable domains like code advance faster than subjective domains like writing
Users' perception of AI capabilities varies dramatically based on usage scenarios

ANALYSIS

The 'Smartest AI' You Think You're Talking to Might Just Be the Entry-Level Version

Simon Willison's blog post highlights a fact many people overlook: when you're having a voice conversation with ChatGPT, the model running behind the scenes might be an "older version." According to his testing, the voice mode's knowledge cutoff date is April 2024, meaning it's still using a GPT-4o-era model, while the latest text interactions might already be powered by more advanced versions.

This is actually a shrewd business strategy by AI companies. Just like smartphone manufacturers differentiate between entry-level and flagship models, OpenAI deploys models of varying capabilities across different product lines. Voice mode, as a "lightweight" interaction interface for the general public, has relatively lower demands for model reasoning capabilities but requires stable real-time responses. Meanwhile, professional scenarios like code generation and data analysis need the most powerful models to handle complex tasks.

Why Is Code Advancing Faster Than Writing?

Andrej Karpathy's observation hits the nail on the head: AI's progress in coding far outpaces its progress in creative fields like writing. The reason is practical: code has clear "right or wrong" standards. Whether a function works can be verified immediately with unit tests, which gives reinforcement learning an unambiguous reward signal. Judging the quality of writing is far more subjective, so a clear reward mechanism is hard to establish.
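To make the "verifiable reward" idea concrete, here is a minimal sketch of how unit tests could be turned into a numeric score. It is an illustration only, not how any lab actually trains its models: the function name unit_test_reward, the test cases, and the two candidate snippets are all hypothetical, and a real system would run candidates in a sandbox rather than calling exec directly.

```python
from typing import Sequence, Tuple


def unit_test_reward(
    candidate_source: str,
    func_name: str,
    cases: Sequence[Tuple[tuple, object]],
) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    namespace: dict = {}
    try:
        # Assumption: no sandboxing here; acceptable only for a toy example.
        exec(candidate_source, namespace)
        func = namespace[func_name]
    except Exception:
        # Code that fails to compile or define the function earns nothing.
        return 0.0

    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A crash on a test case counts as a failure for that case.
            pass
    return passed / len(cases) if cases else 0.0


# Hypothetical test cases for an `add` function: (arguments, expected result).
CASES = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

good_candidate = "def add(a, b):\n    return a + b\n"
bad_candidate = "def add(a, b):\n    return a - b\n"

if __name__ == "__main__":
    print(unit_test_reward(good_candidate, "add", CASES))
    print(unit_test_reward(bad_candidate, "add", CASES))
```

The correct candidate passes every case, while the buggy one passes only the case where both arguments are zero. No comparably cheap, objective check exists for "is this paragraph elegant?", which is the asymmetry the paragraph above describes.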

Even more crucial is the commercial value. Enterprises are willing to pay premium prices for AI tools that directly improve development efficiency, driving teams to focus their main efforts on optimizing these "high-value" domains. While making AI write more elegant prose also has value, the path to commercial monetization isn't as straightforward.

What Does This Mean for Average Users?

First, don't judge the overall capability of AI based on voice mode's performance. Just as you wouldn't judge a brand's entire strength by its entry-level phone, you should switch to text mode or use specialized developer tools when dealing with complex tasks.

Second, this phenomenon reveals a deeper trend in AI capability development: domains that are verifiable and have high commercial value will receive priority resource allocation. This means that AI's progress in "hard skills" like programming, mathematics, and data analysis may continue to outpace "soft skills" like creative writing and emotional companionship.

For developers, this is an important product design insight: when building AI applications, consider how different interaction methods limit model capabilities. Voice interaction might be better suited for simple queries and daily conversations, while complex reasoning tasks should be left to text interfaces or specialized workflows.
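As a rough sketch of that design idea, an application could decide up front which interface and model tier a request should go to. Everything here is a hypothetical illustration: the Route type, the tier names "lightweight" and "flagship", the keyword list, and the 40-word threshold are assumptions made for the example, not part of any real product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    interface: str   # "voice" or "text"
    model_tier: str  # hypothetical tier label, e.g. "lightweight" or "flagship"


# Assumed keywords that hint at a complex reasoning task.
COMPLEX_HINTS = ("debug", "refactor", "prove", "analyze", "step by step")


def choose_route(request: str, via_voice: bool) -> Route:
    """Pick an interface and model tier using a crude complexity heuristic."""
    text = request.lower()
    needs_reasoning = (
        any(hint in text for hint in COMPLEX_HINTS) or len(text.split()) > 40
    )
    if needs_reasoning:
        # Complex work is handed off to the text interface and the stronger model.
        return Route(interface="text", model_tier="flagship")
    if via_voice:
        # Simple spoken queries stay on the fast, real-time voice path.
        return Route(interface="voice", model_tier="lightweight")
    return Route(interface="text", model_tier="lightweight")


if __name__ == "__main__":
    print(choose_route("What's the weather like today?", via_voice=True))
    print(choose_route("Please debug this stack trace for me", via_voice=True))
```

A keyword heuristic like this is far too crude for production, but it shows the shape of the decision: the interaction method is chosen to match the task rather than the other way around.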

Finally, this reminds us to maintain rational expectations. AI is not an omniscient "singular intelligence" but a toolbox composed of different capability modules. Wisely choosing the right tool for the right task is the most pragmatic approach to using AI today.

Originally from Simon Willison · Analyzed by BitByAI