Qwen3.5 family comparison on shared benchmarks

Tools 1.1K points 263 comments 2 months ago

Main takeaway: the 122B, 35B, and especially the 27B models retain much of the flagship’s performance, while the 2B and 0.8B models fall off much harder in the long-context and agent categories.

External link:

https://i.redd.it/krs0xrebcung1.png
