Home | AI Reddit Digest

I think I gave it a fair shot over the past few weeks, forcing myself to use local models for non-work tech asks. I use...

Tools

977 771 3 weeks ago

r/LocalLLaMA · u/MichaelXie4645

Recent

Deepseek V4 Flash and Non-Flash Out on HuggingFace

Tools

693 287 0 months ago

r/LocalLLaMA · u/dionysio211

Recent

Qwen 3.6 27B Makes Huge Gains in Agency on Artificial Analysis - Ties with Sonnet 4.6

It is crazy that Qwen3.6 27B now matches Sonnet 4.6 on AA's Agentic Index, overtaking Gemini 3.1 Pro Preview, GPT 5.2...

Tools

623 148 0 months ago

r/LocalLLaMA · u/AverageFormal90...

Qwen 3.6 27B is a BEAST

I have a 5090 Laptop from work, 24GB VRAM. I have been testing every model that comes out, and I can confidently say...

Tools

590 309 1 month ago

r/LocalLLaMA · u/fagenorn

Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried

Heya guys and gals, Around a year ago I released and posted about Persona Engine as a fun side project, trying to get...

Tools

537 95 1 month ago

r/LocalLLaMA · u/jacek2023

unsloth Qwen3.6-27B-GGUF

finally with files inside :)

Tools

494 103 1 month ago

r/LocalLLaMA · u/ResearchCrafty1...

Qwen3.6-27B released!

Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B...

Tools

689 140 1 month ago

r/LocalLLaMA · u/Creative-Regula...

Qwen3.6-35B becomes competitive with cloud models when paired with the right agent

A short follow-up to my previous post, where I showed that changing the scaffold around the same 9B Qwen model moved...

Tools

688 169 1 month ago

r/LocalLLaMA · u/pacmanpill

Unpopular opinion: OpenClaw and all its clones are almost useless tools for those who know what they're doing. It's kind of impressive for someone who has never used a CLI, Claude Code, Codex, etc. Nor used any workflow tool like n8n or make.

It seems to me that OpenClaw and all its clones are almost useless tools for those who know what they're doing. It's...

Tools

617 251 1 month ago

r/LocalLLaMA · u/Unfounded_898

Gemma-4-E2B's safety filters make it unusable for emergencies

I’ve been testing Google’s Gemma-4-E2B-it as a local, offline resource for emergency preparedness. The idea was to have...

Tools

435 294 1 month ago

r/LocalLLaMA · u/BiggestBau5

Kimi K2.6 Released (huggingface)

Tools

875 265 1 month ago

r/LocalLLaMA · u/technaturalism

When you dial in your bot’s personality

sycophancy: deleted; efficiency per token: +1000%; friendship: just beginning. Edit: “sup” got cut off at top

Tools

685 67 1 month ago

r/LocalLLaMA · u/KillerMiller13

Why isn't ebay doing anything to stop those scams?

There's no way this is real and ebay is doing nothing to stop those scams. Why, people are actually bidding and buying...

Tools

449 134 1 month ago

r/LocalLLaMA · u/Medical_Lengthi...

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude

of course this is just a trust me bro post but I've been testing various local models (a couple gemma4s, qwen3 coder...

Tools

644 315 1 month ago

r/LocalLLaMA · u/Namra_7

KIMI K2.6 SOON !!

Tools

482 89 1 month ago

r/LocalLLaMA · u/marlang

RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part.

Spent an evening dialing in Qwen3.6-35B-A3B on consumer hardware. Fun side note: I had Claude Opus 4.7 (just the $20...

Tools

569 143 1 month ago

r/LocalLLaMA · u/onil_gova

qwen3.6 performance jump is real, just make sure you have it properly configured

I've been running workloads that I typically only trust Opus and Codex with, and I can confirm 3.6 is really capable....

Tools

754 308 1 month ago

r/LocalLLaMA · u/danielhanchen

Qwen3.6 GGUF Benchmarks

Hey guys, we ran Qwen3.6-35B-A3B GGUF KLD performance benchmarks to help you choose the best quant. Unsloth quants have...

Tools

564 120 1 month ago

r/LocalLLaMA · u/Epicguru

Qwen 3.6 is the first local model that actually feels worth the effort for me

I spent some time yesterday after work trying out the new qwen3.6-35b-a3b model, and at least for me it's the first...

Tools

434 165 1 month ago

r/LocalLLaMA · u/jacek2023

llama.cpp at 100k stars

Tools

983 47 1 month ago

r/LocalLLaMA · u/jacek2023

LocalLLaMA 2026

we are doomed

Tools

973 140 1 month ago

r/LocalLLaMA · u/gigaflops_

Throwback to my proudest impulse buy ever, which has let me enjoy this hobby 10x more

Can you believe I almost bought two of them?? (oh, and they gave me 10% cashback for Prime Day)

Tools

947 100 1 month ago

r/LocalLLaMA · u/PsychologicalSo...

Prices finally coming down? 🥺🙏

Tools

930 182 1 month ago

r/LocalLLaMA · u/PrestigiousEmu4...

Best model that can beat Claude opus that runs on 32MB of vram?

Hi everyone! I want to get in to vibe coding to make my very own ai wrapper, what are the best models that can run on...

Tools

644 183 1 month ago

r/LocalLLaMA · u/Few_Painter_558...

MiniMax M2.7 Will Be Open Weights

Composer 2-Flash has been saved! (For legal reasons that's a joke)

Tools

676 98 2 months ago

r/LocalLLaMA · u/davernow

Moonshot says Cursor Composer was authorized

Sounds like Fireworks had a partnership with Moonshot, and Cursor went through them. Kinda makes sense that Moonshot...

Tools

367 31 2 months ago

r/LocalLLaMA · u/TumbleweedNew65...

Feedback on my 256gb VRAM local setup and cluster plans. Lawyer keeping it local.

I’m a lawyer who got Claude code pilled about 90 days ago, then thought about what I wanted to do with AI tools, and...

Tools

310 186 2 months ago

r/LocalLLaMA · u/erazortt

Qwen 3.5 397B is the best local coder I have used until now

Omg, this thing is amazing. I have tried all its smaller siblings 122b/35b/27b, gpt-oss 120b, StepFun 3.5, MiniMax...

Tools

271 150 2 months ago

r/LocalLLaMA · u/Namra_7

Glm 5.1 👀

Tools

850 80 2 months ago

r/LocalLLaMA · u/SDogAlex

Running TinyLlama 1.1B locally on a PowerBook G4 from 2002. Mac OS 9, no internet, installed from a CD.

Hey everyone! I've been working on this for months and today's the day. MacinAI Local is a complete local AI inference...

Tools

294 30 2 months ago

r/LocalLLaMA · u/dinerburgeryum

Qwen3.5 is a working dog.

I saw someone say recently something to the effect of: “that man is a working dog. if you don’t give him a job, he’ll...

Tools

453 116 2 months ago

r/LocalLLaMA · u/KvAk_AKPlaysYT

So nobody's downloading this model huh?

Disappointed in the performance myself too :/ The last good Mistral model I can remember was Nemo, which led to a lot...

Tools

646 248 2 months ago

r/LocalLLaMA · u/Lightnig125

Two weeks ago, I posted here to see if people would be interested in an open-source local AI 3D model generator

I posted a question about this idea here two weeks ago, kept working on it, and now I finally have a beta to show. It’s...

Tools

248 62 2 months ago

r/LocalLLaMA · u/_camera_up

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling.

My workplace just got a server equipped with 2x Nvidia H200 GPUs (141GB HBM3e each). I've been asked to test LLMs on it...

Tools

510 180 2 months ago

r/LocalLLaMA · u/Mysterious_Fini...

MiniMax-M2.7 Announced!

Tools

731 178 2 months ago

r/LocalLLaMA · u/CrimsonShikaban...

I just realised how good GLM 5 is

This is crazy. As a heavy Claude code user, who has used over 12 billion tokens in the last few months, and never tried...

Tools

258 136 2 months ago

r/LocalLLaMA · u/clem59480

Hugging Face just released a one-liner that uses llmfit to detect your hardware and pick the best model and quant, spins up a llama.cpp server, and launches Pi (the agent behind OpenClaw 🦞)

Tools

646 78 2 months ago

r/LocalLLaMA · u/ilintar

Unsloth announces Unsloth Studio - a competitor to LMStudio?

Until now, LMStudio has basically been the "go-to" solution for more advanced LLM users in the GGUF ecosystem, but...

Tools

937 261 2 months ago

r/LocalLLaMA · u/danielhanchen

Introducing Unsloth Studio: A new open-source web UI to train and run LLMs

Hey r/LocalLlama, we're super excited to launch Unsloth Studio (Beta), a new open-source web UI to train and run LLMs...

Tools

905 145 2 months ago

r/LocalLLaMA · u/seamonn

Mistral Small 4:119B-2603

Tools

619 237 2 months ago

r/LocalLLaMA · u/TKGaming_11

Mistral 4 Family Spotted

Tools

397 147 2 months ago

r/LocalLLaMA · u/Ueberlord

OpenCode concerns (not truly local)

I know we all love using opencode, I just recently found out about it and my experience is generally positive so far....

Tools

416 172 2 months ago

r/LocalLLaMA · u/gamblingapocaly...

Qwen 3.5 122b - a10b is kind of shocking

I’m building an app with this model locally, and I’ve been genuinely surprised by how naturally it reasons through...

Tools

405 168 2 months ago

r/LocalLLaMA · u/Reddactor

Homelab has paid for itself! (at least this is how I justify it...)

Hey, I thought I'd do an update on my Homelab I posted a while back. I have it running on LLM experiments, which I...

Tools

793 115 2 months ago

r/LocalLLaMA · u/__JockY__

Nvidia updated the Nemotron Super 3 122B A12B license to remove the rug-pull clauses

tl;dr the new license doesn't include the rug pull clauses and removes restrictions on modifications, guardrails,...

Tools

298 79 2 months ago

r/LocalLLaMA · u/No-Compote-6794

You guys gotta try OpenCode + OSS LLM

as a heavy user of CC / Codex, i honestly find this interface to be better than both of them. and since it's open...

Tools

433 185 2 months ago

r/LocalLLaMA · u/xandep

I regret ever finding LocalLLaMA

It all started with using "the AI" to help me study for a big exam. Can it make some flashcards or questions? Then...

Tools

763 138 2 months ago

r/LocalLLaMA · u/pigeon57434

Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA

The creator of heretic p-e-w opened a pull request #211 with a new method called Arbitrary-Rank Ablation (ARA) the...

Tools

691 141 2 months ago

r/LocalLLaMA · u/Porespellar

Open WebUI’s New Open Terminal + “Native” Tool Calling + Qwen3.5 35b = Holy Sh!t!!!

Let me pre-apologize for this long and rambling post but I get excited by stuff like this. I think a lot of folks here...

Tools

890 202 2 months ago

r/LocalLLaMA · u/Joozio

Ran Qwen 3.5 9B on M1 Pro (16GB) as an actual agent, not just a chat demo. Honest results.

Quick context: I run a personal automation system built on Claude Code. It's model-agnostic, so switching to Ollama was...

Tools

951 249 2 months ago

r/LocalLLaMA · u/theeler222

Qwen3.5-0.8B - Who needs GPUs?

I am genuinely surprised at how good the model is and that it can run on 14 years old device: 2nd gen i5 + 4GB DDR3 RAM.

Tools

683 127 2 months ago

r/LocalLLaMA · u/ForsookComparis...

Back in my day, LocalLLaMa were the pioneers!

Tools

812 157 2 months ago

r/LocalLLaMA · u/hedgehog0

PewDiePie fine-tuned Qwen2.5-Coder-32B to beat ChatGPT 4o on coding benchmarks.

Tools

697 122 2 months ago

r/LocalLLaMA · u/__JockY__

American closed models vs Chinese open models is becoming a problem.

The work I do involves customers that are sensitive to nation state politics. We cannot and do not use cloud API...

Tools

665 582 2 months ago

r/LocalLLaMA · u/DealingWithIt20...

Anthropic is the leading contributor to open weight models

It just happens to be entirely against their will and TOS. I say: Distill Baby Distill!

Tools

695 80 2 months ago

r/LocalLLaMA · u/AaronFeng47

New Qwen3.5 models spotted on qwen chat

Tools

510 166 2 months ago

r/LocalLLaMA · u/obvithrowaway34...

Anthropic's recent distillation blog should make anyone only ever want to use local open-weight models; it's scary and dystopian

It's quite ironic that they went for the censorship and authoritarian angles here. Full blog:

Tools

557 105 2 months ago

r/LocalLLaMA · u/obvithrowaway34...

People are getting it wrong; Anthropic doesn't care about the distillation, they just want to counter the narrative about Chinese open-source models catching up with closed-source frontier models

Why would they care about distillation when they probably have done the same with OpenAI models and the Chinese labs...

Tools

648 115 2 months ago

r/LocalLLaMA · u/InternationalAs...

Fun fact: Anthropic has never open-sourced any LLMs

I’ve been working on a little side project comparing tokenizer efficiency across different companies’ models for...

Tools

702 94 2 months ago

r/LocalLLaMA · u/pmv143

Hypocrisy?

Tools

425 139 2 months ago

r/LocalLLaMA · u/jacek2023

so is OpenClaw local or not

Reading the comments, I’m guessing you didn’t bother to read this: "Safety and alignment at Meta Superintelligence."

Tools

914 275 2 months ago

r/LocalLLaMA · u/Vaddieg

Feels like magic. A local gpt-oss 20B is capable of agentic work

I gave a try to zeroclaw agent (instead of the bloated and overhyped one). After a few hours of fuckery with configs...

Tools

447 127 2 months ago

r/LocalLLaMA · u/k_means_cluster...

Qwen3's most underrated feature: Voice embeddings

Did you know that Qwen3 TTS utilizes voice embedding for voice cloning? Your voice is turned into a vector of 1024...

Tools

629 66 2 months ago

r/LocalLLaMA · u/jacek2023

Which one are you waiting for more: 9B or 35B?

Tools

930 208 3 months ago

r/LocalLLaMA · u/Figai

Favourite niche usecases?

Tools

620 299 3 months ago

r/LocalLLaMA · u/-p-e-w-

PSA: The software “Shade” is a fraudulent, plagiarized copy of Heretic

Three days ago, the following repository was published, which its “creator” has been aggressively promoting on various...

Tools

376 74 3 months ago

r/LocalLLaMA · u/keb_37

The top 3 models on openrouter this week ( Chinese models are dominating!)

the first time i see a model exceed 3 trillion tokens per week on openrouter! the first time i see more than one model...

Tools

379 93 3 months ago

r/LocalLLaMA · u/Time_Reaper

Free ASIC Llama 3.1 8B inference at 16,000 tok/s - no, not a joke

Hello everyone, A fast inference hardware startup, Taalas, has released a free chatbot interface and API endpoint...

Tools

458 251 3 months ago

r/LocalLLaMA · u/CesarOverlorde

Pack it up guys, open weight AI models running offline locally on PCs aren't real. 😞

Tools

888 273 3 months ago

r/LocalLLaMA · u/FPham

I'm 100% convinced that it's the NFT-bros pushing all the openclawd engagement on X

I'm absolutely sure of it. The same usual suspects, the same language, the same who stole from whom the next million...

Tools

485 172 3 months ago

r/LocalLLaMA · u/copingmechanism

More quantization visualization types (repost)

Inspired by this post from u/VoidAlchemy a few months back: Intrusive thoughts had me try to reproduce and extend the...

Tools

456 47 3 months ago

r/LocalLLaMA · u/anvarazizov

I plugged a $30 radio into my Mac mini and told my AI "connect to this" — now I control my smart home and send voice messages over radio with zero internet

Hey r/LocalLLaMA, So I live in Ukraine during the war. Power goes out a lot here – russia regularly attacks our power...

Tools

452 93 3 months ago

r/LocalLLaMA · u/No_Afternoon_42...

PSA: DDR5 RDIMM price passed the point where 3090s are less expensive per GB..

Hello all, Just wanted to note that RDIMM prices are so wild.. Stacking rdimms starts to be as expensive as stacking...

Tools

470 217 3 months ago

r/LocalLLaMA · u/ylankgz

KaniTTS2 — open-source 400M TTS model with voice cloning, runs in 3GB VRAM. Pretrain code included.

Hey everyone, we just open-sourced KaniTTS2 - a text-to-speech model designed for real-time conversational use cases....

Tools

426 81 3 months ago

r/LocalLLaMA · u/-p-e-w-

Heretic 1.2 released: 70% lower VRAM usage with quantization, Magnitude-Preserving Orthogonal Ablation ("derestriction"), broad VL model support, session resumption, and more

Llamas and Gentlemen, Heretic is the leading software for removing censorship from language models. In the three...

Tools

350 46 3 months ago

r/LocalLLaMA · u/abdouhlili

The gap between open-weight and proprietary model intelligence is as small as it has ever been, with Claude Opus 4.6 and GLM-5'

Tools

698 165 3 months ago

r/LocalLLaMA · u/hauhau901

GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF)

Hey everyone, made an uncensored version of GPT-OSS 120B. Quick specs: 117B total params, ~5.1B active (MoE with 128...

Tools

345 27 3 months ago

r/LocalLLaMA · u/CuriousPlatypus...

SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance

Hi all, I’m Anton from Nebius. We’ve updated the SWE-rebench leaderboard with our January runs on 48 fresh GitHub PR...

Tools

278 80 3 months ago

r/LocalLLaMA · u/rerri

MiniMaxAI/MiniMax-M2.5 · Hugging Face

You can monitor quants begin to appear with this search:

Tools

388 108 3 months ago

r/LocalLLaMA · u/gradNorm

UG student launches Dhi-5B (Trained from Scratch)

Hii everyone, I present Dhi-5B: A 5 billion parameter Multimodal Language Model trained compute optimally with just...

Tools

269 52 3 months ago

r/LocalLLaMA · u/Zyj

MiniMaxAI MiniMax-M2.5 has 230b parameters and 10b active parameters

OpenHands reveals the model size in their announcement. Still waiting for the model to appear on HF.

Tools

350 88 3 months ago

r/LocalLLaMA · u/JacketHistorica...

Why do we allow "un-local" content

Title somewhat says it all. I get that it's related but if links to new models are being discussed shouldn't it be a...

Tools

324 110 3 months ago

r/LocalLLaMA · u/Which_Slice1600

Minimax M2.5 Officially Out

Only official webpages released now. But the bench looks very promising: SWE-Bench Verified 80.2% Multi-SWE-Bench 51.3%...

Tools

507 130 3 months ago

r/LocalLLaMA · u/RickyRickC137

Unsloth just unleashed Glm 5! GGUF NOW!

Tools

297 80 3 months ago

r/LocalLLaMA · u/ForsookComparis...

#SaveLocalLLaMA

Tools

852 129 3 months ago

r/LocalLLaMA · u/abdouhlili

GLM-5 scores 50 on the Intelligence Index and is the new open weights leader!

Tools

640 145 3 months ago

r/LocalLLaMA · u/ResearchCrafty1...

GLM-5 Officially Released

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of...

Tools

767 157 3 months ago

r/LocalLLaMA · u/External_Mood47...

MiniMax M2.5 Released

Tools

265 79 3 months ago

r/LocalLLaMA · u/External_Mood47...

GLM 5 Released

Tools

612 175 3 months ago

r/LocalLLaMA · u/danielhanchen

Train MoE models 12x faster with 30% less memory! (<15GB VRAM)

Hey r/LocalLlama! We’re excited to introduce ~12x faster Mixture of Experts (MoE) training with >35% less VRAM and...

Tools

421 56 3 months ago

r/LocalLLaMA · u/Bernice_working...

Kimi is so smart

Kimi > ChatGPT = Claude

Tools

309 158 3 months ago

r/LocalLLaMA · u/RIPT1D3_Z

Qwen-Image-2.0 is out - 7B unified gen+edit model with native 2K and actual text rendering

Qwen team just released Qwen-Image-2.0. Before anyone asks - no open weights yet, it's API-only on Alibaba Cloud...

Tools

503 105 3 months ago

r/LocalLLaMA · u/ortegaalfredo

MechaEpstein-8000

I know it has already been done but this is my AI trained on Epstein Emails. Surprisingly hard to do, as most LLMs will...

Tools

592 132 3 months ago

r/LocalLLaMA · u/Iory1998

Do not Let the "Coder" in Qwen3-Coder-Next Fool You! It's the Smartest, General Purpose Model of its Size

Like many of you, I like to use LLM as tools to help improve my daily life, from editing my emails, to online search....

Tools

430 145 3 months ago

r/LocalLLaMA · u/FireGuy324

Bad news for local bros

Tools

471 225 3 months ago

r/LocalLLaMA · u/TKGaming_11

Qwen3.5 Support Merged in llama.cpp

Tools

236 14 3 months ago

r/LocalLLaMA · u/sultan_papagani

I built a rough .gguf LLM visualizer

I hacked together a small tool that lets you upload a .gguf file and visualize its internals in a 3D-ish way (layers /...

Tools

668 40 3 months ago

r/LocalLLaMA · u/Chromix_

Qwen3 Coder Next as first "usable" coding model < 60 GB for me

I've tried lots of "small" models < 60 GB in the past. GLM 4.5 Air, GLM 4.7 Flash, GPT OSS 20B and 120B, Magistral,...

Tools

351 177 3 months ago

r/LocalLLaMA · u/Mysterious_Fini...

PR opened for Qwen3.5!!

Looking at the code at src/transformers/models/qwen3_5/modeling_qwen3_5.py, it looks like Qwen3.5 series will have VLMs...

Tools

613 73 3 months ago

r/LocalLLaMA · u/SrijSriv211

I trained a 1.8M params model from scratch on a total of ~40M tokens.

Ok so I've been working & experimenting with my own simple architecture. I call it Strawberry Here's the repo for...

Tools

519 103 3 months ago

r/LocalLLaMA · u/mike34113

Prompt injection is killing our self-hosted LLM deployment

We moved to self-hosted models specifically to avoid sending customer data to external APIs. Everything was working...

Tools

312 231 3 months ago

r/LocalLLaMA · u/Dismal-Effect-1...

Nemo 30B is insane. 1M+ token CTX on one 3090

Been playing around with llama.cpp and some 30-80B parameter models with CPU offloading. Currently have one 3090 and 32...

Tools

386 105 3 months ago

r/LocalLLaMA · u/FPham

A top-downloaded OpenClaw skill is actually a staged malware delivery chain

Here we go! As expected by most of us here. Jason Meller from 1password argues that OpenClaw’s agent “skills” ecosystem...

Tools

229 54 3 months ago

r/LocalLLaMA · u/Few_Painter_558...

GLM 5 Is Being Tested On OpenRouter

Tools

286 84 3 months ago

r/LocalLLaMA · u/Sad-Size2723

[Release] Experimental Model with Subquadratic Attention: 100 tok/s @ 1M context, 76 tok/s @ 10M context (30B model, single GPU)

Hey everyone, Last week I shared preliminary results on a new subquadratic attention mechanism. Following up with the...

Tools

345 46 3 months ago

r/LocalLLaMA · u/JackStrawWitchi...

CPU-only, no GPU computers can run all kinds of AI tools locally

While it’s great that so many people on LocalLLaMA are pushing the envelope with what can be done locally with...

Tools

546 132 3 months ago

r/LocalLLaMA · u/TwistedDiesel53

I am absolutely loving qwen3-235b

I installed qwen3-235b on my desktop system, and I had to join here to brag about it. It's such a careful model, the...

Tools

236 146 3 months ago

r/LocalLLaMA · u/S1M0N38

BalatroBench - Benchmark LLMs' strategic performance in Balatro

If you own a copy of Balatro, you can make your local LLM play it. I built tools to let LLMs play Balatro autonomously....

Tools

522 57 3 months ago

r/LocalLLaMA · u/Fear_ltself

Google Research announces Sequential Attention: Making AI models leaner and faster without sacrificing accuracy

Tools

600 45 3 months ago

r/LocalLLaMA · u/bobaburger

Qwen3-Coder-Next on RTX 5060 Ti 16 GB - Some numbers

About 2 weeks ago, I posted about running GLM-4.7-Flash on 16 GB of VRAM here...

Tools

250 121 3 months ago

r/LocalLLaMA · u/jacek2023

mistralai/Voxtral-Mini-4B-Realtime-2602 · Hugging Face

Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source...

Tools

250 30 3 months ago

r/LocalLLaMA · u/jacek2023

Bashing Ollama isn’t just a pleasure, it’s a duty

Tools

715 144 3 months ago

r/LocalLLaMA · u/iGermanProd

ACE-Step-1.5 has just been released. It’s an MIT-licensed open source audio generative model with performance close to commercial platforms like Suno

It’s already supported in Comfy. MIT license. HuggingFace Demo is also available! Pretty much the whole package - LoRAs...

Tools

508 114 3 months ago

r/LocalLLaMA · u/AppropriateGuav...

The open-source version of Suno is finally here: ACE-Step 1.5

ACE-Step 1.5 is an open-source music model that can generate a full song in about 2 seconds on an A100, runs locally on...

Tools

337 71 3 months ago

r/LocalLLaMA · u/danielhanchen

Qwen3-Coder-Next

Qwen3-Coder-Next is out!

Tools

319 98 3 months ago

r/LocalLLaMA · u/coder543

Qwen/Qwen3-Coder-Next · Hugging Face

Tools

678 234 3 months ago

r/LocalLLaMA · u/Difficult-Cap-7...

GLM-5 Coming in February! It's confirmed.

Twitter Link:

Tools

831 144 3 months ago

r/LocalLLaMA · u/ForsookComparis...

How close are open-weight models to "SOTA"? My honest take as of today, benchmarks be damned.

Tools

629 216 3 months ago

r/LocalLLaMA · u/demon_bhaiya

Cline team got absorbed by OpenAI. Kilo is going full source available in response.

For those who used Cline with local models, heads up that the core team appears to have joined OpenAI's Codex group...

Tools

417 54 3 months ago

r/LocalLLaMA · u/Electrical-Shap...

LingBot-World outperforms Genie 3 in dynamic simulation and is fully Open Source

The newly released LingBot-World framework offers the first high capability world model that is fully open source,...

Tools

599 74 3 months ago

r/LocalLLaMA · u/Wonderful-Excus...

Mistral CEO Arthur Mensch: “If you treat intelligence as electricity, then you just want to make sure that your access to intelligence cannot be throttled.”

Tools

586 68 3 months ago

r/LocalLLaMA · u/Distinct-Expres...

GitHub trending this week: half the repos are agent frameworks. 90% will be dead in 1 week.

Is this the JS framework hell moment of AI?

Tools

471 101 3 months ago

r/LocalLLaMA · u/SweetHomeAbalam...

768Gb Fully Enclosed 10x GPU Mobile AI Build

I haven't seen a system with this format before but with how successful the result was I figured I might as well share...

Tools

575 169 4 months ago

r/LocalLLaMA · u/Recoil42

It's been one year since the release of Deepseek-R1

Tools

266 48 4 months ago

r/LocalLLaMA · u/Wooden-Deer-127...

Unsloth GLM 4.7-Flash GGUF

Tools

226 44 4 months ago

r/LocalLLaMA · u/ayylmaonade

GLM 4.7 Flash official support merged in llama.cpp

Tools

352 61 4 months ago

r/LocalLLaMA · u/__Maximum__

My gpu poor comrades, GLM 4.7 Flash is your local agent

I tried many MoE models at 30B or under and all of them failed sooner or later in an agentic framework. If z.ai is not...

Tools

442 150 4 months ago

r/LocalLLaMA · u/Dark_Fire_12

zai-org/GLM-4.7-Flash · Hugging Face

Tools

719 225 4 months ago

r/LocalLLaMA · u/NunzeCs

4x AMD R9700 (128GB VRAM) + Threadripper 9955WX Build

Disclaimer: I am from Germany and my English is not perfect, so I used an LLM to help me structure and write this post....

Tools

340 91 4 months ago

r/LocalLLaMA · u/Difficult-Cap-7...

Qwen 4 might be a long way off !? Lead Dev says they are "slowing down" to focus on quality.

Tools

446 71 4 months ago

r/LocalLLaMA · u/Ulterior-Motive...

128GB VRAM quad R9700 server

This is a sequel to my previous thread from 2024. I originally planned to pick up another pair of MI100s and an...

Tools

529 111 4 months ago

r/LocalLLaMA · u/Fun-Situation-4...

The Search for Uncensored AI (That Isn’t Adult-Oriented)

I’ve been trying to find an AI that’s genuinely unfiltered and technically advanced, uncensored something that can...

Tools

274 214 4 months ago

r/LocalLLaMA · u/gggghhhhiiiijkl...

Best "End of world" model that will run on 24gb VRAM

Hey peeps, I'm feeling in a bit of an "omg the world is ending" mood and have been amusing myself by downloading and...

Tools

331 176 4 months ago

r/LocalLLaMA · u/Technical-Love-...

DeepSeek Engram : A static memory unit for LLMs

DeepSeek AI released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large...

Tools

320 47 4 months ago

r/LocalLLaMA · u/CuriousPlatypus...

GPT-5.2 xhigh, GLM-4.7, Kimi K2 Thinking, DeepSeek v3.2 on Fresh SWE-rebench (December 2025)

Hi all, I’m Anton from Nebius. We’ve updated the SWE-rebench leaderboard with our December runs on 48 fresh GitHub PR...

Tools

376 89 4 months ago

r/LocalLLaMA · u/alhinai_03

I fucking love this community

Thank you guys, thanks to everyone who took the time to write a comment or a post explaining, teaching people how...

Tools

498 54 4 months ago

r/LocalLLaMA · u/Porespellar

Dang, M2 drives are the new DDR5 apparently.

Tools

211 97 4 months ago

r/LocalLLaMA · u/inserterikhere

Latest upgrade…A100 40 GB

Originally this was my gaming rig but I went ITX and basically bought a new computer. So I had the case, fans, AIO, 64...

Tools

405 54 4 months ago

r/LocalLLaMA · u/DrewGrgich

Nemotron-3-nano:30b is a spectacular general purpose local LLM

Just want to sing the praises of this model. I am stunned at how intelligent it is for a 30b model. Comparing it to...

Tools

213 125 4 months ago

r/LocalLLaMA · u/danielhanchen

7x Longer Context Reinforcement Learning in Unsloth

Hey r/LocalLlama! We're excited to show how Unsloth now enables 7x longer context lengths (up to 12x) for Reinforcement...

Tools

251 28 4 months ago

r/LocalLLaMA · u/Paramecium_caud...

RTX 5070 Ti and RTX 5060 Ti 16 GB no longer manufactured

Nvidia has essentially killed off supply for the RTX 5070 Ti. Also supply of RTX 5060 Ti 16 GB has been significantly...

Tools

232 95 4 months ago

r/LocalLLaMA · u/N8Karma

I trained a model to 'unslop' AI prose

I ran passages from Project Gutenberg through GPT-4o-mini 10 times over, each time telling it to "make it read far...

Tools

206 71 4 months ago

r/LocalLLaMA · u/fallingdowndizz...

Zhipu AI breaks US chip reliance with first major model trained on Huawei stack (GLM-Image)

Tools

421 45 4 months ago

r/LocalLLaMA · u/TeamNeuphonic

NeuTTS Nano: 120M Parameter On-Device TTS based on Llama3

Hey everyone, The team at Neuphonic is back with a new open-source release: NeuTTS Nano. After NeuTTS Air trended #1 on...

Tools

213 44 4 months ago

r/LocalLLaMA · u/eugenekwek

Soprano 1.1-80M released: 95% fewer hallucinations and 63% preference rate over Soprano-80M

Hello everyone! Today, I am announcing Soprano 1.1! I’ve designed it for massively improved stability and audio quality...

Tools

318 54 4 months ago

r/LocalLLaMA · u/Fear_ltself

NVIDIA's new 8B model is Orchestrator-8B, a specialized 8-billion-parameter AI designed not to answer everything itself, but to intelligently manage and route complex tasks to different tools (like web search, code execution, other LLMs) for greater efficiency

I’ve seen some arguments we’ve reached AGI, it’s just about putting the separate pieces together in the right context....

Tools

705 129 4 months ago

AI Reddit Digest

I'm done with using local LLMs for coding

GGML.AI has got acquired by Huggingface

Deepseek and Gemma ??

Kimi has context window expansion ambitions
