Posted by u/eugenekwek
Soprano 1.1-80M released: 95% fewer hallucinations and 63% preference rate over Soprano-80M
Hello everyone! Today, I am announcing Soprano 1.1! I’ve designed it for massively improved stability and audio quality over the original model.

While many of you were happy with the quality of Soprano, it had a tendency to start, well, *Mongolian throat singing*. Contrary to its name, Soprano is **NOT** supposed to be for singing, so I have reduced the frequency of these hallucinations by **95%**. Soprano 1.1-80M also has a **50%** lower WER than Soprano-80M, with comparable clarity to much larger models like Chatterbox-Turbo and VibeVoice. In addition, it now supports sentences up to **30 seconds** long, up from 15.

The outputs of Soprano could sometimes have a lot of artifacting and high-frequency noise. This was because the model was severely undertrained. I have trained Soprano further to reduce these audio artifacts. According to a blind study I conducted on my family (against their will), they preferred Soprano 1.1's outputs **63%** of the time, so these changes have produced a noticeably improved model.

You can check out the new Soprano here:

Model: https://huggingface.co/ekwek/Soprano-1.1-80M

Try Soprano 1.1 Now: https://huggingface.co/spaces/ekwek/Soprano-TTS

Github: https://github.com/ekwek1/soprano

- Eugene
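A minimal quick-start sketch for grabbing the weights locally, assuming you have the `huggingface_hub` package installed; the actual inference API is documented in the GitHub repo linked above, so this only covers the download step:

```python
# Sketch: fetch the Soprano 1.1-80M weights from the Hugging Face repo
# listed in the post. Assumes `pip install huggingface_hub`; see the
# GitHub repo for how to actually load and run the model.
from huggingface_hub import snapshot_download

# Download the full model repo to a local cache directory and print its path.
local_dir = snapshot_download(repo_id="ekwek/Soprano-1.1-80M")
print(f"Soprano 1.1-80M files downloaded to: {local_dir}")
```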
External link: https://v.redd.it/v0c2rda9scdg1