Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB)


**Model introduction:** New Kitten models are out. Kitten ML has released open-source code and weights for three new tiny expressive TTS models: 80M, 40M, and 14M parameters (all Apache 2.0).

Links:

* Discord: https://discord.com/invite/VJ86W4SURW
* GitHub: https://github.com/KittenML/KittenTTS
* Hugging Face (Kitten TTS V0.8):
  * Mini 80M: https://huggingface.co/KittenML/kitten-tts-mini-0.8
  * Micro 40M: https://huggingface.co/KittenML/kitten-tts-micro-0.8
  * Nano 14M: https://huggingface.co/KittenML/kitten-tts-nano-0.8

The smallest model is less than 25 MB at around 14M parameters. All models are a major quality upgrade over previous versions and can run on just CPU.

**Key Features and Advantages**

1. **Eight expressive voices:** 4 female and 4 male voices across all three models. All are highly expressive, with the 80M model offering the best quality. This release is English-only; multilingual support is coming in future releases.
2. **Super-small in size:** The 14M model is just 25 megabytes. The 40M and 80M models are slightly bigger, with high quality and expressivity even for longer chunks.
3. **Runs literally anywhere lol:** Forget "no GPU required": this is designed for resource-constrained edge devices. Great news for GPU-poor folks like us.
4. **Open source (hell yeah!):** The models can be used for free under Apache 2.0.
5. **Unlocking on-device voice agents and applications:** Matches cloud TTS quality for most use cases, but runs entirely on-device (it can also be hosted on a cheap GPU). If you're building voice agents, assistants, or any local speech application, no API calls are needed. Free local inference. Just ship it.
6. **What changed from V0.1 to V0.8:** Higher quality, expressivity, and realism, from better training pipelines and 10x larger datasets.
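The "14M parameters in under 25 MB" claim can be sanity-checked with back-of-envelope arithmetic. A minimal sketch (assuming MB means MiB here; the post doesn't state the checkpoint's numeric format, so the fp16 inference is a guess):

```python
# Back-of-envelope check: how many bytes per parameter does a
# 25 MB file holding ~14M parameters imply?
params = 14_000_000          # ~14M parameters (from the post)
size_bytes = 25 * 1024**2    # 25 MB upper bound (from the post)

bytes_per_param = size_bytes / params
print(f"~{bytes_per_param:.2f} bytes/parameter")  # ~1.87

# ~2 bytes/param is consistent with fp16/bf16 weights plus a little
# overhead; full fp32 (4 bytes/param) would need noticeably more.
fp32_size_mb = params * 4 / 1024**2
print(f"fp32 would be ~{fp32_size_mb:.0f} MB")  # ~53 MB
```

In other words, the size is plausible for half-precision weights without any aggressive quantization, which fits the "runs on just CPU" positioning.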
