Qwen3.5-35B-A3B is a gamechanger for agentic coding.

Tools 1.1K points 359 comments 3 weeks ago

Qwen3.5-35B-A3B with Opencode Just tested this badboy with Opencode **cause frankly I couldn't believe those benchmarks.** Running it on a single RTX 3090 on a headless Linux box. Freshly compiled Llama.cpp and those are my settings after some tweaking, still not fully tuned: ./llama.cpp/llama-server \\ \-m /models/**Qwen3.5-35B-A3B-MXFP4\_MOE.gguf** \\ \-a "DrQwen" \\ \-c 131072 \\ \-ngl all \\ \-ctk q8\_0 \\ \-ctv q8\_0 \\ \-sm none \\ \-mg 0 \\ \-np 1 \\ \-fa on Around 22 gigs of vram used. Now the fun part: 1. I'm getting over 100t/s on it 2. This is the first open weights model I was able to utilise on my home hardware to successfully complete my own "coding test" I used for years for recruitment (mid lvl mobile dev, around 5h to complete "pre AI" ;)). It did it in around 10 minutes, strong pass. First agentic tool that I was able to "crack" it with was Kodu.AI with some early sonnet roughly 14 months ago. 3. For fun I wanted to recreate this dashboard OpenAI used during Cursor demo last summer, I did a recreation of it with Claude Code back then and posted it on Reddit: https://www.reddit.com/r/ClaudeAI/comments/1mk7plb/just\_recreated\_that\_gpt5\_cursor\_demo\_in\_claude/ So... Qwen3.5 was able to do it in around 5 minutes. **I think we got something special here...**

More from r/LocalLLaMA