microsoft/VibeVoice
Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.
Simon Willison · Apr 28, 2026
Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.
Simon Willison's 'Where's Waldo' style test reveals GPT Image 2.0's significant improvements in complex scene understanding, instruction following, and detail coherence compared to its predecessor and competitors.