Posted by u/eugenekwek
Soprano 1.1-80M released: 95% fewer hallucinations and 63% preference rate over Soprano-80M
Hello everyone! Today, I am announcing Soprano 1.1! I’ve designed it for massively improved stability and audio quality over the original model.

While many of you were happy with the quality of Soprano, it had a tendency to start, well, *Mongolian throat singing*. Contrary to its name, Soprano is **NOT** supposed to be for singing, so I have reduced the frequency of these hallucinations by **95%**. Soprano 1.1-80M also has a **50%** lower WER than Soprano-80M, with comparable clarity to much larger models like Chatterbox-Turbo and VibeVoice. In addition, it now supports sentences up to **30 seconds** long, up from 15.

The outputs of Soprano could sometimes have a lot of artifacting and high-frequency noise. This was because the model was severely undertrained. I have trained Soprano further to reduce these audio artifacts.

According to a blind study I conducted on my family (against their will), they preferred Soprano 1.1's outputs **63%** of the time, so these changes have produced a noticeably improved model.

You can check out the new Soprano here:

Model: https://huggingface.co/ekwek/Soprano-1.1-80M

Try Soprano 1.1 Now: https://huggingface.co/spaces/ekwek/Soprano-TTS

Github: https://github.com/ekwek1/soprano

\- Eugene
External link:
https://v.redd.it/v0c2rda9scdg1