Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
Simon Willison · Toolchain · Beginner · Impact: 7/10
In Simon Willison's well-known 'pelican riding a bicycle' benchmark, a smaller Alibaba Qwen3.6 model running locally outperformed the much larger, cloud-hosted Claude Opus 4.7 at creative SVG generation, showing how competitive open-source models can be on specific tasks.
Key Points
- Simon Willison's 'pelican riding a bicycle' is a popular, informal test for AI models' visual understanding and generation capabilities.
- A locally-run, 20.9GB quantized Qwen3.6-35B-A3B model on a MacBook outperformed Anthropic's latest cloud-based giant, Claude Opus 4.7, in generating an SVG of a pelican on a bicycle.
- In a follow-up 'flamingo riding a unicycle' test, the Qwen model again showed superior creativity and detail (e.g., adding sunglasses, a bowtie), while Opus's output was comparatively bland.
- This result challenges the assumption that 'bigger models and the cloud are always stronger', highlighting the competitiveness of open-source, locally-deployable models on specific creative tasks.
Analysis
The Origin: Why Does a "Silly" Test Spark Discussion Again?
Analysis generated by BitByAI