Welcome to my Friendly LTX-2 T2V+I2V+Lipsync
LTX-2.3, better at everything, is coming soon...
✨ Less mess, more magic
UniVibe, the all-in-one lipsync version with the high-quality VibeVoice TTS model, has been released.
New v1.2 with simplified model loading, plus quality and performance improvements.
LTX-2 is a new video generation model with 19B parameters under the hood. It is the first DiT-based (Diffusion Transformer) foundation model that generates synchronized audio and video simultaneously in a single pass. It supports native 4K resolution at up to 50 FPS, delivers cinematic-grade fidelity suitable for professional VFX and film production, and can generate clips of 10–20 seconds with consistent style and motion.
💻 System requirements:
Minimum system requirements for 540p i2v and 720p t2v:
RTX 3000 series, 8GB+ VRAM, 45GB+ RAM, 8-core CPU, SSD, latest ComfyUI
Low VRAM optional optimization:
For systems with low VRAM, add the --reserve-vram parameter to the ComfyUI launch command in run_nvidia_gpu.bat:
--reserve-vram 4 (or another amount in GB).
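On a ComfyUI portable install, the launch line in run_nvidia_gpu.bat would look roughly like this. The exact paths depend on your install and are assumptions here; the extra flag at the end is the only change:

```shell
REM run_nvidia_gpu.bat (sketch of a ComfyUI portable launcher; paths are assumptions)
REM --reserve-vram 4 keeps ~4 GB of VRAM free for the OS and other apps,
REM so ComfyUI offloads model weights earlier instead of running out of memory.
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --reserve-vram 4
pause
```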
Detailed tips and links to the models are inside the workflow
✨ Workflow features:
Extremely user-friendly interface
Maximum performance and optimization starting from 8GB of VRAM: GGUF or the 8-step distilled model with an fp4 or fp8 text encoder, plus MultiGPU memory optimization
All-in-one: i2v, t2v, and interpolation
Convenient one-click mode switching
Generation time setting in seconds
LoRA support (up to 3)
Detailed tips and links to all necessary models
Manual random seed for complete control over generations
Thanks to the Lightricks Team
Original repo: GitHub
Description
Even greater VRAM & RAM optimization
Links to the new LTX-2 fp4, the best model for balancing quality and performance
VAE fixed and optimized
Bugs fixed
Tips updated
FAQ
Comments (11)
It doesn't seem to lipsync properly. The audio and video are generated, but they don't line up (no mouth movement at all). I prompted things like "she is saying:" with the text used in the audio section. Perhaps I missed something?
@vokar28 This is a known issue on LTX-2, especially with vertical video; it is less common with horizontal videos. You can use the LTX-2-Image2Vid-Adapter LoRA, which often helps. There is a link in the workflow. The Lightricks team promises to fix this in future model updates.
@RusselX Cool, I'll give that a shot. Thanks!
Does this work for video 2 video?
@hamajor Not for now, only image-to-video and text-to-video.
Thanks for your work, but why didn't you include the model links and file locations in the description?
@waltuh_07 Hi! You can find them in the workflow, in the links section.
Where does one find ltx-av-step-1751000_vocoder_24K.safetensors? I tried Google-fu and Claude, but they couldn't locate it on the open web. They said it was part of LTX-2's official release, but there is no trace of it. It looks like a VAE, but I can't tell if it's for audio or video.
@33251215a613 You don't need that file separately. Load the audio VAE (LTX2_audio_vae_bf16.safetensors) and the video VAE (LTX2_video_vae_bf16.safetensors) and select them in the appropriate fields in the LTX-2 Module.
hi
When I tested this structure in the prompt, with the main speaker node (female sampler) and the sampler2 node (male sampler):
[1]: female text....
[2]: male text....
[1]: female text....
The problem is that in the last [1], the male speaker says the text with a female voice. If I delete that last [1], there is no problem.
Shouldn't [1] correspond to the main speaker node and [2] to the speaker2 node?
@jv12802224 Hello! Yes, you are right: [1] is for the main speaker and [2] is for the second speaker.
Write it without the ":" and with each entry strictly on its own line, like this:
[1] female
[2] male
[1] female
or you can also use this format:
Speaker 1: female
Speaker 2: male
Speaker 1: female
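To sanity-check a multi-speaker script before running it, a small hypothetical helper (not part of the workflow; the function name and regex are my own) could parse both accepted formats into (speaker, text) pairs:

```python
import re

# Matches either "[1] text" or "Speaker 1: text" — a hypothetical validator
# that simply mirrors the two script formats described above.
_LINE = re.compile(r"^(?:\[(\d+)\]|Speaker\s+(\d+):)\s*(.+)$")

def parse_script(script: str) -> list[tuple[int, str]]:
    """Return (speaker_number, text) pairs; raise ValueError on malformed lines."""
    pairs = []
    for raw in script.strip().splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between entries
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"Unrecognized speaker line: {line!r}")
        speaker = int(m.group(1) or m.group(2))
        pairs.append((speaker, m.group(3).strip()))
    return pairs
```

For example, `parse_script("[1] Hello\n[2] Hi\n[1] Bye")` returns `[(1, "Hello"), (2, "Hi"), (1, "Bye")]`, so you can verify which sampler each line will be routed to before generating.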