r/LocalLLaMA
Posted by u/alhinai_03
I fucking love this community
Tools · 498 points · 54 comments · 2 months ago
Thank you guys. Thanks to everyone who took the time to write a comment or a post explaining and teaching people how things work, to the people behind llama.cpp and vLLM, and to all the contributors who keep the open-source community thriving. I'm able to run huge models relatively fast on my weak ass PC from 10 years ago, the fastest being nemotron-3-nano-30B-a3b-iq4_nl running at 13.5-14 t/s with 65k context, while my actual GPU has only 4GB of VRAM. That's fucking ridiculous, and it blows my mind every time that I'm able to run these models. What's been key for me is having a good amount of system memory; as long as the model is a MoE architecture, it runs pretty decently.
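For anyone who wants to try the same setup, here's a sketch of the kind of llama.cpp command I mean. The exact flags depend on your build (--n-cpu-moe only exists in recent versions), and the model filename is just whatever GGUF you downloaded:

```
# Sketch: dense layers on the 4GB GPU, MoE expert tensors in system RAM.
# -ngl 99        : offload all layers to the GPU...
# --n-cpu-moe 99 : ...but keep the MoE expert weights on the CPU (this is the trick)
# -c 65536       : the 65k context
./llama-server -m nemotron-3-nano-30B-a3b-iq4_nl.gguf -ngl 99 --n-cpu-moe 99 -c 65536
```

The reason this works is that a MoE model only activates a few experts per token (3B active params here out of 30B), so the big expert tensors can sit in slow system RAM while the small always-active part stays in VRAM.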