Posted by lastdong 9/3/2025
Most that claim to do a British accent end up sounding like Kelsey Grammer - sort of an American accent pretending to be British.
> 2025-09-05: VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have disabled the repo until we are confident that out-of-scope use is no longer possible.
What was that about?
A 100M podcast model
Some of them have tone wobbles, which iirc were more common in early TTS models. Looks like the huge context window is really helping out here.
You'd probably want to do something similar to balance the crossfade anyway... having each speaker's input offset from center instead of straight mono.
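For anyone wanting to try the offset-from-center idea, here is a rough sketch using constant-power panning. This is post-processing on separate mono tracks, one per speaker; it is not anything VibeVoice does itself as far as I know. The function names `pan_mono` and `mix_speakers` and the pan values are made up for illustration.

```python
import math

def pan_mono(samples, pan):
    """Constant-power pan of a mono track.

    pan is in [-1, 1]: -1 is hard left, 0 is center, +1 is hard right.
    Returns a list of (left, right) frames.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in samples]

def mix_speakers(tracks):
    """Sum several panned mono tracks into one stereo track.

    tracks is a list of (mono_samples, pan) pairs (hypothetical layout).
    Shorter tracks are treated as silent once they run out.
    """
    length = max(len(samples) for samples, _ in tracks)
    mixed = [[0.0, 0.0] for _ in range(length)]
    for samples, pan in tracks:
        for i, (left, right) in enumerate(pan_mono(samples, pan)):
            mixed[i][0] += left
            mixed[i][1] += right
    return [tuple(frame) for frame in mixed]

# Two speakers, nudged slightly left and right of center.
stereo = mix_speakers([([0.2, 0.4, 0.1], -0.3), ([0.0, 0.3], 0.3)])
```

Constant-power gains (cosine and sine of the same angle) keep perceived loudness roughly steady as a voice moves off center, which a plain linear split does not. You would still need to clamp or normalize the summed output before writing it to a file.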
I generally don't like a lot of the AI-generated slop that's starting to pop up on YouTube these days... I do enjoy some of the Reddit story channels, but have completely stopped with it all now. With the AI stuff, it really becomes apparent with dates/ages and when numbers are spoken. Dates/ages/timelines are just off as far as story generation goes, and really should be human-tweaked. As for the voice gen, the way it says a year or measurement is just not how English speakers (US or otherwise) speak.
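One common workaround for the spoken-year problem is to normalize the text before it reaches the TTS model. A minimal sketch, assuming four-digit years and one particular set of conventions (other speakers would say "two thousand and five" or "twenty oh five"); `spoken_year` and its helper are hypothetical names, not part of any TTS library:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def two_digits(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def spoken_year(year):
    """Spell a four-digit year the way it is usually read aloud."""
    if not 1000 <= year <= 9999:
        raise ValueError("only four-digit years are handled in this sketch")
    high, low = divmod(year, 100)
    if low == 0:
        if high % 10 == 0:
            return ONES[high // 10] + " thousand"   # 2000
        return two_digits(high) + " hundred"        # 1900
    if low < 10:
        if high % 10 == 0:
            return ONES[high // 10] + " thousand " + ONES[low]  # 2005
        return two_digits(high) + " oh " + ONES[low]            # 1905
    return two_digits(high) + " " + two_digits(low)             # 1995, 2025
```

With this, `spoken_year(1995)` gives "nineteen ninety-five" and `spoken_year(2005)` gives "two thousand five". Deciding whether a given number in a story is actually a year, rather than a count or a measurement, is the harder part and is not attempted here.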