Where's the raccoon with the ham radio? (ChatGPT Images 2.0)
Simon Willison Toolchain Introductory Impact: 7/10
Simon Willison's 'Where's Waldo' style test reveals GPT Image 2.0's significant improvements in complex scene understanding, instruction following, and detail coherence compared to its predecessor and competitors.
Key Points
- OpenAI released GPT Image 2.0, with Sam Altman claiming its progress is equivalent to the leap from GPT-3 to GPT-5.
- The unique test method uses a 'Where's Waldo' style prompt to challenge the model's scene understanding and generation capabilities.
- GPT Image 1.0 failed to produce an identifiable target, while version 2.0 successfully generated a complex scene adhering to the prompt.
- Comparative tests show vast differences among models (such as Google's Nano Banana series) in following complex instructions and generating logical scenes.
- The test highlights the difficulty in evaluating image generation models: it's not just about 'drawing well', but also 'understanding correctly' and 'maintaining logical coherence'.
Analysis
The Catalyst: A Deceptively Simple Game of Hide-and-Seek
Analysis generated by BitByAI