Posted by u/Turbulent_Corner9895
ID-LoRA with LTX-2.3 and ComfyUI custom node🎉
**ID-LoRA** (Identity-Driven In-Context LoRA) jointly generates a subject's appearance and voice in a single model, letting a text prompt, a reference image, and a short audio clip govern both modalities together. Built on top of LTX-2, it is the first method to personalize visual appearance and voice within a single generative pass. Unlike cascaded pipelines that treat audio and video separately, ID-LoRA operates in a unified latent space where a single text prompt can simultaneously dictate the scene's visual content, environmental acoustics, and speaking style -- while preserving the subject's vocal identity and visual likeness.

Key features:

* 🎵 **Unified audio-video generation** -- voice and appearance synthesized jointly, not cascaded
* 🗣️ **Audio identity transfer** -- the generated speaker sounds like the reference
* 🌍 **Prompt-driven environment control** -- text prompts govern speaking style, environment sounds, and scene content
* 🖼️ **First-frame conditioning** -- provide an image to control the face and scene
* ⚡ **Zero-shot at inference** -- just load the LoRA weights, no per-speaker fine-tuning needed
* 🔬 **Two-stage pipeline** -- high-quality output with 2x spatial upsampling

LoRA link: ID-LoRA
External link:
https://i.redd.it/74qvsr4u9jqg1.png