Posted by u/gladkos
Google TurboQuant running Qwen locally on a MacBook Air
Hi everyone, we just ran an experiment. We patched llama.cpp with Google’s new TurboQuant compression method and then ran Qwen 3.5-9B on a regular MacBook Air (M4, 16 GB) with a 20,000-token context. Previously, handling large-context prompts on this device was basically impossible, but with the new algorithm it now seems feasible. Imagine running OpenClaw for free on a regular device: just a MacBook Air or Mac mini, not even a Pro model, the cheapest ones. It’s still a bit slow, but newer chips are making it faster. Link to the macOS app: atomic.chat (open source and free). Curious whether anyone else has tried something similar?
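For a rough sense of why cache compression matters at a 20,000-token context, here is a back-of-the-envelope sketch. It assumes the compression is applied to the KV cache (the usual target in long-context experiments like this one). The model shape (`N_LAYERS`, `N_KV_HEADS`, `HEAD_DIM`) is hypothetical and is not Qwen 3.5-9B’s real configuration, and the estimate ignores model weights and any per-vector metadata (norms, scales) that a quantizer stores.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Approximate KV cache size in bytes.

    Each layer stores one key and one value vector per KV head per token,
    so the cache holds 2 * n_layers * n_kv_heads * head_dim values per token.
    bits_per_value may be fractional (e.g. 3.5).
    """
    n_values = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return n_values * bits_per_value / 8


# Hypothetical model shape for illustration only, NOT Qwen 3.5-9B's config.
N_LAYERS = 36
N_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 20_000

if __name__ == "__main__":
    # Compare an uncompressed 16-bit cache with low-bit quantized caches.
    for bits in (16, 4, 3):
        gib = kv_cache_bytes(CONTEXT, N_LAYERS, N_KV_HEADS, HEAD_DIM, bits) / 2**30
        print(f"{bits:>2}-bit KV cache: {gib:.2f} GiB")
```

Plugging in a model’s actual layer count, KV-head count, and head dimension (from its config file) gives a closer estimate. The point of the sketch is only that cache size grows linearly with context length, so cutting bits per value shrinks it proportionally on a 16 GB machine that also has to hold the weights.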
External link:
https://v.redd.it/cwg1s2nmaorg1