Soprano 1.1-80M released: 95% fewer hallucinations and 63% preference rate over Soprano-80M

Hello everyone! Today I am announcing Soprano 1.1! I've designed it for massively improved stability and audio quality over the original model.

While many of you were happy with the quality of Soprano, it had a tendency to start, well, *Mongolian throat singing*. Contrary to its name, Soprano is **NOT** supposed to be for singing, so I have reduced the frequency of these hallucinations by **95%**. Soprano 1.1-80M also has a **50%** lower WER (word error rate) than Soprano-80M, with clarity comparable to much larger models like Chatterbox-Turbo and VibeVoice. In addition, it now supports sentences up to **30 seconds** long, up from 15.

The outputs of Soprano could sometimes have a lot of artifacting and high-frequency noise. This was because the model was severely undertrained, so I have trained Soprano further to reduce these audio artifacts.

According to a blind study I conducted on my family (against their will), they preferred Soprano 1.1's outputs **63%** of the time, so these changes have produced a noticeably improved model.

You can check out the new Soprano here:

Model: https://huggingface.co/ekwek/Soprano-1.1-80M
Try Soprano 1.1 now: https://huggingface.co/spaces/ekwek/Soprano-TTS
GitHub: https://github.com/ekwek1/soprano

- Eugene
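For readers unfamiliar with the metric: WER is usually computed by transcribing the generated audio with an ASR model and taking the word-level edit distance (substitutions + deletions + insertions) against the input text, divided by the number of reference words. The sketch below shows that standard calculation and how a "50% lower" figure is a relative reduction. It is not the evaluation script used for Soprano; the function names and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_reduction(old: float, new: float) -> float:
    """Fractional improvement, e.g. 0.5 means "50% lower"."""
    return (old - new) / old


if __name__ == "__main__":
    # Hypothetical TTS input text and ASR transcript of the generated audio
    text = "the cat sat on the mat"
    transcript = "the cat sat on mat"  # one word dropped
    print(word_error_rate(text, transcript))  # one deletion out of six words
    # Hypothetical WERs for an old and a new model
    print(relative_reduction(0.10, 0.05))
```

Note that a relative reduction says nothing about the absolute WER; halving a small WER and halving a large one both read as "50% lower".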
