VibeVoice: A Frontier Open-Source Text-to-Speech Model

Posted by lastdong 9/3/2025

VibeVoice: A Frontier Open-Source Text-to-Speech Model (microsoft.github.io)

448 points | 170 comments

agos 9/3/2025

seemingly supports only English, Indian and Chinese

plingamp 9/3/2025

Indian and Chinese are not languages

agos 9/4/2025

I'm very aware of this. The project does not specify more than an in- and zh- prefix.

ascorbic 9/3/2025

Voices, not languages. The "English" one is American though.

cush 9/3/2025

I tried using the demo but it just errors out

amelius 9/3/2025

I tried some TTS models a while ago, but I noticed that none of them allowed markup in the text. For example, it would be nice to do something like:

     Hey look! [enthusiastic] Should we tell the others? Maybe not ... [giggles]

etc.

In fact, I think this kind of thing is absolutely necessary if you want to use this to replace a voice actor.
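A minimal sketch of how inline tags like the ones above could be separated from the spoken text before synthesis. This is not the API of VibeVoice or any other TTS system; the bracket syntax, the `split_markup` helper, and the segment format are all assumptions made for illustration.

```python
import re

# Hypothetical tag syntax: lowercase words in square brackets, e.g. [giggles].
TAG_RE = re.compile(r"\[([a-z][a-z ]*)\]")

def split_markup(text):
    """Split text into ("text", ...) and ("tag", ...) segments, in order."""
    segments = []
    pos = 0
    for match in TAG_RE.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

example = "Hey look! [enthusiastic] Should we tell the others? Maybe not ... [giggles]"
for kind, value in split_markup(example):
    print(kind, value)
```

A model that supports such cues would still have to decide what each tag means (a style applied to the surrounding speech, or a sound event like a laugh); the sketch only covers the parsing step.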

data-ottawa 9/3/2025

ElevenLabs has some models with support for that.

https://elevenlabs.io/blog/v3-audiotags

sciencesama 9/3/2025

Need this for mac

double_one 9/3/2025

I tried it on my MacBook Pro — works great!

watsonmusic 9/3/2025

one of the best models built by Microsoft

enigma101 9/4/2025

only microsoft could come up with such a name rofl

defrost 9/4/2025

Lippy got vetoed.