Posted by rohan_joshi 6 hours ago
Today we're releasing three new models with 80M, 40M and 14M parameters.
The largest model (80M) has the highest quality. The 14M variant reaches a new SOTA in expressivity among similar-sized models, despite being under 25 MB. This release is a major upgrade over the previous one and supports English text-to-speech in eight voices: four male and four female.
Here's a short demo: https://www.youtube.com/watch?v=ge3u5qblqZA.
Most of the models are quantized to int8 + fp16 and use ONNX Runtime for inference. They are designed to run anywhere, e.g. a Raspberry Pi, low-end smartphones, wearables, or browsers. No GPU required! This release aims to bridge the gap between on-device and cloud models for TTS applications. A multilingual model release is coming soon.
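For readers unfamiliar with how int8 quantization shrinks a model this much, here is a minimal sketch of symmetric per-tensor int8 quantization. This is a generic illustration of the technique, not the authors' actual pipeline; the function names and numbers are invented for the example.

```python
# Illustrative symmetric int8 quantization: store each float weight as a
# signed byte plus one shared fp scale. This is 1 byte/weight vs 4 for
# fp32, which is roughly how an 80M-parameter model fits in tens of MB.

def quantize_int8(weights):
    """Map float weights to int8 using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights at inference time."""
    return [qi * scale for qi in q]

# Toy weight tensor (values invented for the example).
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

The quantization error per weight is at most half the scale, which is why small speech models can tolerate int8 weights with little audible quality loss; the fp16 part typically covers the activations or the layers most sensitive to precision.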
On-device AI is bottlenecked by one thing: a lack of tiny models that actually perform. Our goal is to open-source more models to run production-ready voice agents and apps entirely on-device.
We would love your feedback!
Is there any way to get these running on an iPhone? I would love for it to read articles to me like a podcast.
Is there any way to create a custom voice as a DIY project, or do we need to go through you? If the latter, would you consider adding a pricing page for purchasing a license or an alternative voice? All but one of the voices are unusable in a business context.
This is a mind-numbing task that requires workers to make hundreds of calls each day with only minor variations, sometimes navigating phone trees, and half the time leaving almost the exact same message.
Anyway, I believe almost all such businesses will be automated within months. Human labour just cannot compete on cost.
TL;DR: generate a human-like voice from animal sounds. Anyway, maybe it doesn't make sense.