Tag: Multimodal AI (2 articles)

microsoft/VibeVoice

Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.

Simon Willison · Apr 28, 2026

Where's the raccoon with the ham radio? (ChatGPT Images 2.0)

Simon Willison's 'Where's Waldo' style test reveals GPT Image 2.0's significant improvements in complex scene understanding, instruction following, and detail coherence compared to its predecessor and competitors.

Simon Willison · Apr 22, 2026