Soprano 1.1-80M released: 95% fewer hallucinations and 63% preference rate over Soprano-80M


Hello everyone! Today, I am announcing Soprano 1.1! I’ve designed it for massively improved stability and audio quality over the original model.

While many of you were happy with the quality of Soprano, it had a tendency to start, well, *Mongolian throat singing*. Contrary to its name, Soprano is **NOT** supposed to be for singing, so I have reduced the frequency of these hallucinations by **95%**. Soprano 1.1-80M also has a **50%** lower word error rate (WER) than Soprano-80M, with comparable clarity to much larger models like Chatterbox-Turbo and VibeVoice. In addition, it now supports sentences up to **30 seconds** long, up from 15.

The outputs of Soprano could sometimes have a lot of artifacting and high-frequency noise. This was because the model was severely undertrained. I have trained Soprano further to reduce these audio artifacts. According to a blind study I conducted on my family (against their will), they preferred Soprano 1.1's outputs **63%** of the time, so these changes have produced a noticeably improved model.

You can check out the new Soprano here:

Model: https://huggingface.co/ekwek/Soprano-1.1-80M

Try Soprano 1.1 Now: https://huggingface.co/spaces/ekwek/Soprano-TTS

Github: https://github.com/ekwek1/soprano

\- Eugene
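For anyone unfamiliar with WER: it is usually computed as the word-level edit distance between a reference transcript and the transcript of the generated audio, divided by the number of reference words. Below is a minimal sketch of that standard definition, not necessarily how the numbers above were measured. The function names and the example WER values are hypothetical, and the step that transcribes the TTS output (typically an ASR model) is assumed to happen elsewhere.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than old_wer."""
    return (old_wer - new_wer) / old_wer


# Hypothetical values, for illustration only: going from 10% to 5% WER
# is a 50% relative reduction.
print(relative_reduction(0.10, 0.05))
```

Note that "50% lower" is a relative figure, so it says how much the error rate dropped compared to the old model, not what the absolute WER is.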
