VibeVoice: Microsoft’s 300ms TTS That Speaks Before It Finishes Thinking
Microsoft’s new open-source VibeVoice-Realtime-0.5B starts speaking in about 300 ms by streaming tokens as they arrive, but a buried language limitation has developers asking why the fine print matters more than the benchmarks.