r/LocalLLaMA
Posted by u/No_Conversation9561
Forgive my ignorance but how is a 27B model better than 397B?
Tools 1.1K points
278 comments
1 month ago
Is Qwen just incredibly good at dense models and not so good at MoE? I get that dense is generally better than MoE at the same total parameter count, but a 27B beating a 397B just doesn’t sit right with me. What are those additional experts even doing, then?
External link:
https://i.redd.it/3ady5t95ntwg1.jpeg