LTX-2 is amazing: LTX-2 in ComfyUI on RTX 3060 12GB


My setup: RTX 3060 12GB VRAM + 48GB system RAM.

I spent the last couple of days messing around with **LTX-2** inside ComfyUI and had an absolute blast. I created short sample scenes for a loose **spy story set in a neon-soaked, rainy Dhaka** (cyberpunk/Bangla vibes with rainy streets, umbrellas, dramatic reflections, and a mysterious female lead).

Workflow: https://drive.google.com/file/d/1VYrKf7jq52BIi43mZpsP8QCypr9oHtCO/view

I forgot the username of the person who shared it under a post. This workflow worked really well!

Each 8-second scene took about **12 minutes** to generate (with synced audio). I queued up **70+ scenes** total, often trying 3-4 prompt variations per scene to get the mood right. Some scenes were pure text-to-video, others were image-to-video starting from Midjourney stills I generated for consistency.

Here's a compilation of some of my favorite clips (rainy window reflections, coffee steam morphing into faces, walking through crowded neon markets, intense close-ups in the downpour). I cleaned up the audio because it had some squeaky sounds.

**Strengths that blew me away:**

1. **Speed**: Seriously fast for what it delivers, especially compared to other local video models.
2. **Audio sync** is legitimately impressive. I tested illustration styles, anime-ish looks, realistic characters, and even puppet/weird abstract shapes. Lip sync, ambient rain, and subtle SFX/music all line up way better than I expected. Achieving this level of quality on just **12GB VRAM** is wild.
3. **Handles non-realistic/abstract content extremely well**: illustrations, stylized/puppet-like figures, and surreal elements (like steam forming faces or exaggerated rain effects) come out coherent and beautiful.

**Weaknesses / Things to avoid:**

1. Weird random zoom-in effects pop up sometimes. I'm not sure if that's prompt-related or a model quirk.
2. **Actions/motion-heavy scenes** just don't work reliably yet. Keep it to subtle movements, expressions, atmosphere, rain, steam, walking slowly, etc. Anything dynamic tends to break coherence.

Overall verdict: I literally couldn't believe how two full days disappeared. I was having way too much fun iterating prompts and watching the queue. LTX-2 feels like a huge step forward for local audio-video gen, especially if you lean into atmospheric/illustrative styles rather than high-action.
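For anyone planning a similar queue on a 12GB card, here's a rough back-of-the-envelope sketch of the timing, using only the numbers from the post (8-second clips, about 12 minutes each, 70 runs). It assumes "70+" means 70 total generations, counting prompt variations, which is a guess. The function names are made up for illustration and have nothing to do with ComfyUI or LTX-2.

```python
# Rough queue-time estimate from the figures in the post.
# Assumption: 70 is the total number of generations, variations included.

CLIP_SECONDS = 8        # length of each generated scene
MINUTES_PER_CLIP = 12   # approximate generation time per scene
TOTAL_RUNS = 70         # lower bound, the post says "70+"


def queue_time_hours(num_runs: int, minutes_per_run: float) -> float:
    """Total wall-clock time for the whole queue, in hours."""
    return num_runs * minutes_per_run / 60.0


def slowdown_vs_realtime(clip_seconds: float, minutes_per_run: float) -> float:
    """How many seconds of compute each second of output video costs."""
    return minutes_per_run * 60.0 / clip_seconds


def footage_minutes(num_runs: int, clip_seconds: float) -> float:
    """Total length of generated footage, in minutes."""
    return num_runs * clip_seconds / 60.0


if __name__ == "__main__":
    print(f"Queue time: {queue_time_hours(TOTAL_RUNS, MINUTES_PER_CLIP):.1f} h")
    print(f"Slowdown:   {slowdown_vs_realtime(CLIP_SECONDS, MINUTES_PER_CLIP):.0f}x realtime")
    print(f"Footage:    {footage_minutes(TOTAL_RUNS, CLIP_SECONDS):.1f} min")
```

Under those assumptions the queue comes to about 14 hours of pure generation time for a bit over 9 minutes of raw footage, which lines up with two days disappearing once you add prompt tweaking in between runs.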
