This workflow takes an image and an audio track as input and generates a video.
Important Notice
Update ComfyUI and KJ Nodes. A lot of the code has been updated in the last few days.
Add --reserve-vram 1 to your launch options to avoid out-of-memory (OOM) errors.
If you get no lip sync, make sure your audio track is in stereo format (fix suggested by @thomasdimitri563).
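For anyone who wants to script the stereo fix, here is a minimal sketch using only Python's standard `wave` module. It assumes the input is an uncompressed mono PCM WAV file. The function name `mono_to_stereo` and the file paths are hypothetical and are not part of the workflow.

```python
import wave


def mono_to_stereo(src_path: str, dst_path: str) -> None:
    """Write a 2-channel copy of a mono PCM WAV file (hypothetical helper)."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1:
            raise ValueError("expected a mono WAV file")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()    # sample rate is kept unchanged
        mono = src.readframes(src.getnframes())

    # Duplicate every sample so the left and right channels carry the same signal.
    stereo = b"".join(mono[i:i + width] * 2 for i in range(0, len(mono), width))

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(2)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(stereo)
```

Usage would look like `mono_to_stereo("voice_mono.wav", "voice_stereo.wav")`. If you prefer ffmpeg, its `-ac 2` option sets the output channel count to two.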
Models to download (LTX2.3)
Place in models/diffusion_models
Place in models/loras
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors
Place in models/text_encoders
Place in models/vae
https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors
https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors
Models to download (V3)
Place in models/diffusion_models
https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-19b-distilled-fp8.safetensors
Place in models/text_encoders
Place in models/loras
Description
Updated to LTX2.3
Comments (15)
I was getting this error at the SamplerCustomAdvanced node.
The size of tensor a (63240) must match the size of tensor b (8126848) at non-singleton dimension 2
Remove "comfyui_smznodes" from custom nodes to fix it.
Will this run with a 4090 and 32 GB of RAM?
it should run. you can try with a lower frame count, like a 1 sec audio sample. then watch your ram usage on task manager.
for reference, i am on a 4060Ti with 64GB DDR4 and it took 12mins to generate a 7sec video @ 24fps, 1920 x 1088 resolution.
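To put the numbers above in perspective, here is a small sketch of the arithmetic, assuming the frame count is simply clip length times frame rate. The helper names are hypothetical, and whether the workflow rounds or constrains the frame count further is not stated here.

```python
import math


def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Frames needed to cover a clip of the given length (assumed: seconds * fps)."""
    return math.ceil(seconds * fps)


def seconds_per_frame(render_seconds: float, clip_seconds: float, fps: int = 24) -> float:
    """Average wall-clock render time per generated frame."""
    return render_seconds / frames_for_duration(clip_seconds, fps)


# The 7 s clip at 24 fps mentioned above:
print(frames_for_duration(7))                       # 168
# 12 minutes (720 s) of render time for that clip:
print(round(seconds_per_frame(12 * 60, 7), 2))      # 4.29
# A 1 s test clip only needs 24 frames:
print(frames_for_duration(1))                       # 24
```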
Works great thank you! Averaging 560 seconds per 20 second vid on 3090/64.
"I add the image and the audio, but when I generate the video there is no lip sync. The video is generated and the voice plays in the background, but the character is not speaking."
try changing the audio file to stereo, as suggested by thomasdimitri563
this happened for me when the voice starts right at the start of the clip. try to give about 0.2s of silence before the speech.
does your audio file have a lot of background noise? if yes, you can try to isolate the voice by using https://github.com/kijai/ComfyUI-MelBandRoFormer
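The silence suggestion above can be scripted. This is a minimal sketch using Python's standard `wave` module, assuming an uncompressed PCM WAV input. The function name `prepend_silence` and the 0.2 s default (taken from the reply above) are illustrative, not part of the workflow.

```python
import wave


def prepend_silence(src_path: str, dst_path: str, seconds: float = 0.2) -> None:
    """Copy a PCM WAV file with `seconds` of silence added at the start."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())

    pad_frames = int(round(seconds * params.framerate))
    # 8-bit WAV samples are unsigned (silence is 128); wider samples are signed (silence is 0).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (pad_frames * params.nchannels * params.sampwidth)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)            # same channels, sample width, and rate
        dst.writeframes(silence + audio)  # frame count in the header is updated on write
```

Remember that the padded file is slightly longer, so the video length may need to grow by the same amount.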
There is nothing in the output video. It's just black. I can however hear the audio. I have all the correct files downloaded and I'm running this on a 4090.
please ensure that comfyui and your custom nodes are updated.
I also had no lip sync and I fixed it by changing my audio file from mono to stereo (2 channel). I used ffmpeg to convert my mono audio into stereo audio.
I can't seem to make the lipsync work. I even changed the audio to stereo, prompted the exact words the character should say, changed the video length to match the audio length, etc. Any help, please?
Managed to solve it. ElevenLabs audio comes out too clean; for whatever reason, adding some background noise makes lip sync work.
@zexeor thanks for your suggestions. I know people are having trouble with getting the lip sync to work. But no one is telling me what their source sound files are. I've been using audio from videos so I haven't experienced the issues users are experiencing. Let me test with some local TTS.
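For anyone who wants to try the background-noise idea on clean TTS output, here is a rough sketch using only the standard library. It assumes 16-bit PCM WAV input and adds low-level uniform white noise. The noise type, the `amplitude` default, and the function name `add_noise` are all assumptions for illustration, since the comment above does not say what kind of noise or how much was used.

```python
import array
import random
import sys
import wave


def add_noise(src_path: str, dst_path: str, amplitude: int = 200, seed=None) -> None:
    """Add uniform white noise of at most +/- `amplitude` to a 16-bit PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(src.getnframes()))

    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    rng = random.Random(seed)
    # Add noise to every sample and clip to the signed 16-bit range.
    noisy = array.array(
        "h",
        (max(-32768, min(32767, s + rng.randint(-amplitude, amplitude))) for s in samples),
    )

    if sys.byteorder == "big":
        noisy.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(noisy.tobytes())
```

Start with a small `amplitude` and raise it only if lip sync still fails, since too much noise will be audible in the final video.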
Beautiful, right out of the box. Hardly had to change a thing. Well done.
This is blazing fast. And it made me realize I don't need an upscale stage with LTX. 10 seconds of 24fps 720 is nothing, absolutely nothing...done in 1:30 and with resources to spare. Paired with TTS suite and/or Ace-Step, possibilities are endless. I really need to finish a comp to post. I keep getting distracted discovering everything this model can do.
Yes, lip sync is working fine. Just don't try to upload MP3; WAV works well (for example, 40k Hz). Mono does not work properly, so try stereo. Vertical video seems fine. I'm using a rental 5090; my 5070 Ti would die. But thanks for sharing it, amazing job.