Posted by pain_perdu 1 day ago

Pocket TTS: A high quality TTS that gives your CPU a voice(kyutai.org)
576 points | 137 comments | page 4
indigodaddy 18 hours ago|
Perfect timing, that is exactly what I am looking for for a fun little thing I'm working on. The voices sound good!
maxglute 12 hours ago||
Would be nice if the preview supported variable speed.
grahamrr 16 hours ago||
voices sound great! i see sample rate can be adjusted, is there any way to adjust the actual speed of the voice?
Zardoz84 10 hours ago||
I miss the old days when connecting an SP0256 to the Spectrum and making it speak looked like magic.
oybng 19 hours ago||
>If you want access to the model with voice cloning, go to https://huggingface.co/kyutai/pocket-tts and accept the terms, then make sure you're logged in locally with `uvx hf auth login` lol
andhuman 12 hours ago|
I’ve tried the voice cloning and it works great. I added a 9s clip and it captured the speaker pretty well.

But don’t make the same mistake I did and use an HF token that doesn’t have read access to repos! The error message said that I had to request access to the repo, but I had already done that, so I couldn’t figure out what was wrong. Turns out my HF token only had access to inference.

snvzz 20 hours ago||
Relative to AmigaOS translator.device + narrator.device, this sure seems bloated.
fuzzer371 12 hours ago|
Haven't we had TTS for like 20+ years? Why does AI need to be shoved into it all of a sudden? Total waste of electricity.
rhdunn 9 hours ago|
Using neural nets (machine learning) to train TTS voices has been around for a long time.

[1] (2016 https://arxiv.org/abs/1609.03499) WaveNet: A Generative Model for Raw Audio

[2] (2017 https://arxiv.org/abs/1711.10433) Parallel WaveNet: Fast High-Fidelity Speech Synthesis

[3] (2021 https://arxiv.org/abs/2106.07889) UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

[4] (2022 https://arxiv.org/abs/2203.14941) Neural Vocoder is All You Need for Speech Super-resolution