Qwen3.5 family comparison on shared benchmarks

Tools 1.1K points 263 comments 1 week ago

Main takeaway: the 122B, 35B, and especially 27B models retain much of the flagship’s performance, while the 2B and 0.8B models fall off much harder in the long-context and agent categories.
