
ChatGPT voice mode is a weaker model

Simon Willison · Industry Opinion · Beginner · Impact: 7/10

Simon Willison reveals a counterintuitive fact: ChatGPT's voice mode runs on an older, weaker GPT-4o-era model, creating a massive gap between user expectations and reality.

Key Points

  • ChatGPT voice mode has a knowledge cutoff of April 2024, making it essentially a GPT-4o-era model and far weaker than the latest models
  • Users intuitively think 'the AI you talk to should be the smartest', but voice mode is actually the weakest link
  • Andrej Karpathy: the free Advanced Voice Mode fumbles simple questions, while the paid Codex can spend an hour restructuring an entire codebase
  • The coding domain has two advantages: verifiable reward functions (unit tests pass or fail) that make RL training easier, and high B2B value that attracts more team focus
  • AI capability differences come not just from the model itself, but from which interface and use case you access it through

Analysis

You might assume that when you talk to ChatGPT using voice, the AI "thinks" as intelligently as it does when you type. But Simon Willison is here to tell you that's an illusion.

He's uncovered something many users don't realize: ChatGPT's voice mode actually runs on a much older, weaker model. Specifically, its knowledge cutoff is April 2024, which makes it essentially a GPT-4o-era model that could lag the latest and greatest models by six months to a year or more.

Why does this matter? Because our intuition tells us that "the AI that can listen to me must be the smartest one." Voice interaction feels more direct, more real-time, as if the AI has "come alive." But the reality is often the opposite: voice mode is frequently the entry point for free users, and free users are precisely the ones getting the version in which the company invests the fewest resources.

This insight stems from a tweet by Andrej Karpathy, who highlighted a growing trend: the disparity in AI capabilities experienced by different users is widening dramatically. The free Advanced Voice Mode might stumble on the simplest common-sense questions – like failing to understand a trending meme on Instagram Reels. Meanwhile, OpenAI's top-tier paid model, Codex, can autonomously spend an hour refactoring an entire codebase or discovering and exploiting vulnerabilities in a computer system.

Why such a huge gap? Karpathy points to two key reasons. First, the coding domain has "verifiable reward functions" – unit tests either pass or fail, a binary judgment perfectly suited for reinforcement learning training. In contrast, skills like writing, judgment, and dialogue lack an objective "right or wrong" standard. Second, B2B scenarios are more commercially valuable, so teams dedicate more resources to areas like coding and scientific reasoning, leading to a "the rich get richer" effect.
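To make the "verifiable reward function" idea concrete, here is a minimal Python sketch, assuming a toy task where a model must write an `add(a, b)` function. The names `verifiable_reward` and `TESTS` are hypothetical and do not describe OpenAI's actual training pipeline; the point is only that a test suite turns "is this code correct?" into a 0-or-1 number that a reinforcement learning loop can optimize.

```python
# Hypothetical test cases for a toy task: "implement add(a, b)".
# Each entry is (arguments, expected_result).
TESTS = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]


def verifiable_reward(candidate_src: str, func_name: str, tests: list) -> float:
    """Return 1.0 if the candidate passes every test, else 0.0.

    A toy stand-in for a unit-test-based reward signal. Real RL pipelines
    sandbox untrusted model output; this sketch does not.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        fn = namespace[func_name]       # KeyError if the function is missing
        for args, expected in tests:
            if fn(*args) != expected:
                return 0.0              # any failing test means zero reward
    except Exception:
        return 0.0                      # syntax errors and crashes also score zero
    return 1.0


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

print(verifiable_reward(good, "add", TESTS))  # 1.0
print(verifiable_reward(bad, "add", TESTS))   # 0.0
```

There is no equally simple, objective check for "is this essay insightful?" or "was that a good conversational reply?", which is the asymmetry Karpathy describes.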

This reveals a larger trend: the differentiation in AI capabilities is shifting from the "model itself" to the "access point." You might have once thought, "All AIs are pretty much the same." But now, choose the wrong entry point, and you might struggle with even the simplest voice conversations. Choose the right one, and AI can refactor your entire codebase. As a user, you need to realize that the product format you're using (voice, web page, API) often dictates the model capabilities you get – and that might be completely different from what you expect.

Analysis generated by BitByAI

Originally from Simon Willison

Automatically analyzed by BitByAI AI Editor
