Posted by u/danielhanchen
Final Qwen3.5 Unsloth GGUF Update!
Hey r/LocalLLaMA! This week we worked on **further improving** the best size/KLD tradeoff for Qwen3.5, and we're excited to share new GGUF benchmarks for Qwen3.5-122B-A10B and Qwen3.5-35B-A3B (99.9% KL divergence). This will likely be our final GGUF update. We're also deeply saddened by the news around the Qwen team, and incredibly grateful for everything they've done for the open-source community! For many model releases, they stayed up all night without sleep.

* All GGUFs now use our new imatrix **calibration dataset**, so you may see small improvements in chat, coding, long-context, and tool-calling use cases. We are always manually improving this dataset, so it will change often.
* This is a follow-up to https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwen3535ba3b_unsloth_dynamic_ggufs_benchmarks/
* We further enhanced our quantization method for Qwen3.5 MoEs to **reduce Maximum KLD** directly. 99.9% is what is generally used, but for massive outliers, Maximum KLD can be useful. Our new method generally pushes Maximum KLD down quite a bit versus the pre-March 5th update. **UD-Q4_K_XL is 8% bigger, but reduces maximum KLD by 51%!** (A short sketch of how mean/maximum KLD are measured follows after this list.)

|Quant|Old size (GB)|New size (GB)|Max KLD (old)|Max KLD (new)|
|:-|:-|:-|:-|:-|
|UD-Q2_K_XL|12.0|11.3 (-6%)|8.237|8.155 (-1%)|
|UD-Q3_K_XL|16.1|15.5 (-4%)|5.505|5.146 (-6.5%)|
|UD-Q4_K_XL|19.2|20.7 (+8%)|5.894|2.877 (-51%)|
|UD-Q5_K_XL|23.2|24.6 (+6%)|5.536|3.210 (-42%)|

* Re-download **Qwen3.5-35B-A3B**, **27B**, and **122B-A10B**, as they're all updated now. Re-download **397B-A17B** after today's update (still uploading!).
* **Qwen3.5-27B** and **122B-A10B** include the earlier chat-template fixes for better tool-calling/coding output. **397B-A17B** will also be updated today to include this.
* **LM Studio** now supports toggling "thinking" for our GGUFs. Read our guide or run `lms get unsloth/qwen3.5-4b`. This process will get easier very soon.
* Benchmarks were conducted using the latest versions from every GGUF provider.
* Replaced **BF16 layers** with **F16** for faster inference on devices that don't support BF16.
* **Qwen3.5-35B-A3B** now has all variants (Q4_K_M, Q8_0, BF16, etc.) uploaded.
* A reminder that KLD and perplexity benchmarks do not exactly reflect real-world use cases.
* Links to the new GGUFs: Qwen3.5-35B-A3B-GGUF, Qwen3.5-122B-A10B-GGUF, Qwen3.5-397B-A17B-GGUF (397B still uploading!)

You can also now fine-tune Qwen3.5 in Unsloth via our free notebooks (a minimal fine-tuning sketch is included below)! Thanks a lot everyone!
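To make the mean/maximum KLD numbers in the table concrete, here is a minimal sketch (not Unsloth's actual benchmark harness; the tensor shapes, vocab size, and random inputs are placeholders) of how per-token KL divergence between a full-precision reference model and a quantized GGUF could be computed from their logits on the same evaluation text:

```python
# Hedged sketch: mean and maximum token-level KL divergence between a
# full-precision reference model and a quantized model, given their logits
# over the same token positions. Random tensors stand in for real logits.
import torch
import torch.nn.functional as F

def kld_stats(ref_logits: torch.Tensor, quant_logits: torch.Tensor):
    """ref_logits, quant_logits: [num_tokens, vocab_size] from the same prompts.
    Returns (mean, max) of KL(P_ref || P_quant) across token positions."""
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logp = F.log_softmax(quant_logits.float(), dim=-1)
    # KL(P || Q) = sum_v P(v) * (log P(v) - log Q(v)) per token position
    per_token_kld = (ref_logp.exp() * (ref_logp - quant_logp)).sum(dim=-1)
    return per_token_kld.mean().item(), per_token_kld.max().item()

# Placeholder example: a "quant" whose logits track the reference closely.
ref = torch.randn(256, 32000)
quant = ref + 0.05 * torch.randn_like(ref)
mean_kld, max_kld = kld_stats(ref, quant)
print(f"mean KLD: {mean_kld:.4f}, max KLD: {max_kld:.4f}")
```

Mean KLD summarizes typical divergence from full precision, while maximum KLD captures the worst single-token outlier, which is what the new quantization method targets.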
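For the fine-tuning mention above, here is a minimal sketch of what a Qwen3.5 run might look like with Unsloth's Python API, assuming the usual `FastLanguageModel` workflow; the repo id and LoRA hyperparameters are illustrative placeholders, so the free notebooks remain the authoritative recipe:

```python
# Hedged sketch, assuming the standard Unsloth fine-tuning workflow.
# "unsloth/Qwen3.5-35B-A3B" and the hyperparameters below are illustrative,
# not values stated in the post.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-35B-A3B",  # placeholder repo id
    max_seq_length=4096,
    load_in_4bit=True,  # QLoRA-style 4-bit loading to fit on smaller GPUs
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, train with TRL's SFTTrainer as shown in the Unsloth notebooks.
```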
External link:
https://i.redd.it/9vw1iichx8ng1.png