Optimised LTX 2.3 for my RTX 3070 8GB - 900x1600 20 sec Video in 21 min (T2V)

Tools · 352 points · 65 comments · 2 months ago

Workflow: https://civitai.com/models/2477099?modelVersionId=2785007
Video with full resolution: https://files.catbox.moe/00xlcm.mp4

After four days of intensive optimization, I finally got LTX 2.3 running efficiently on my laptop (RTX 3070 8GB, 32 GB RAM). I'm now able to generate a 20-second video at 900×1600 in just 21 minutes, which is a huge breakthrough considering the hardware limits. What's even more impressive is that the video and audio quality remain exceptionally high, despite using the distilled version of LTX 2.3 (Q4_K_M GGUF) from Unsloth.

The WF is built around Gemma 12B (IT FB4 mix) as the text encoder, paired with the dev versions of the video and audio VAEs. Key optimizations:

- Sage Attention (fp16_Triton).
- Torch patching to reduce memory overhead and improve throughput.
- Standard VAE decode instead of tiled decode. Interestingly, the standard VAE decode node actually outperformed tiled decoding; tiled VAE introduced significant slowdowns.
- KJ's improved VAE handling from the last two days, which made a noticeable difference in VRAM efficiency and let the run stay within 8 GB.

Otherwise the WF is the same as the official Comfy one, with the modifications mentioned above. Use Euler_a and Euler with GGUF; don't use the CFG_PP samplers.

Keep in mind that 900x1600 at 20 sec used about 98% of VRAM, so this is the limit for an 8GB card. If you have more VRAM, go ahead and increase it. If I have time, I will clean up my WF and upload it.
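Rough sketch for anyone planning to scale resolution or duration on a bigger card: it estimates how many latent tokens a given resolution and clip length produce, since that is what grows with these settings and fills VRAM. The numbers are assumptions, not taken from the WF: 24 fps, a VAE that compresses 32x spatially and 8x temporally, frame counts of the form 8n+1, and pixel sizes floored to multiples of 32 (so 900 becomes 896 here; the real nodes may handle this differently). `snap_frames`, `latent_shape` and `latent_tokens` are hypothetical helpers for illustration, not ComfyUI nodes.

```python
import math

# ASSUMPTIONS (not from the post): LTX-style VAE with 32x spatial and 8x temporal
# compression, clip lengths of 8*n + 1 frames, and 24 fps playback.
SPATIAL_FACTOR = 32
TEMPORAL_FACTOR = 8
FPS = 24


def snap_frames(seconds: float, fps: int = FPS) -> int:
    """Smallest 8*n + 1 frame count that covers the requested duration."""
    raw = math.ceil(seconds * fps)
    n = math.ceil((raw - 1) / TEMPORAL_FACTOR)
    return TEMPORAL_FACTOR * n + 1


def latent_shape(width: int, height: int, seconds: float) -> tuple[int, int, int]:
    """(latent_frames, latent_height, latent_width); pixel sizes are floored to multiples of 32."""
    frames = snap_frames(seconds)
    lat_w = width // SPATIAL_FACTOR
    lat_h = height // SPATIAL_FACTOR
    lat_t = (frames - 1) // TEMPORAL_FACTOR + 1
    return lat_t, lat_h, lat_w


def latent_tokens(width: int, height: int, seconds: float) -> int:
    """Total number of video latent positions the model has to process."""
    t, h, w = latent_shape(width, height, seconds)
    return t * h * w


if __name__ == "__main__":
    # Compare the 8GB-limit setting from the post with a couple of alternatives.
    for w, h, s in [(900, 1600, 20), (900, 1600, 10), (720, 1280, 20)]:
        print(f"{w}x{h}, {s}s -> latent {latent_shape(w, h, s)} = {latent_tokens(w, h, s):,} tokens")
```

Under these assumptions the 20 sec clip comes out to roughly twice the tokens of a 10 sec clip at the same resolution. Attention compute grows roughly quadratically with token count, so treat this as a relative comparison between settings, not a VRAM prediction.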

Posted in r/StableDiffusion