CivArchive
    LTX-2 Image Audio to Video - LTX2.3

    This workflow takes an image and an audio track as input and generates a video.
    Important Notice

    Update ComfyUI and KJ Nodes. A lot of the code has been updated in the last few days.

    Add --reserve-vram 1 to your ComfyUI launch options to avoid out-of-memory (OOM) errors.
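
As a minimal sketch of the launch option above, the snippet below builds the command line for starting ComfyUI with 1 GB of VRAM reserved. It assumes a source install that is started with main.py from the ComfyUI folder; the helper names are illustrative only, and portable or desktop builds set their launch arguments elsewhere.

```python
import subprocess
import sys


def comfyui_launch_command(main_script="main.py", reserve_vram_gb=1):
    """Return the argv list that starts ComfyUI with some VRAM held back.

    Assumption: ComfyUI is started with `python main.py` from its own folder.
    """
    return [sys.executable, main_script, "--reserve-vram", str(reserve_vram_gb)]


def launch_comfyui(comfy_dir="."):
    """Start ComfyUI inside comfy_dir and block until the server exits."""
    return subprocess.run(comfyui_launch_command(), cwd=comfy_dir, check=True)
```

Calling launch_comfyui("path/to/ComfyUI") is equivalent to typing the same command in a terminal opened in that folder.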

    If you get no lip sync, make sure your audio track is in stereo format. Fix suggested by @thomasdimitri563.
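
One way to apply the stereo fix without extra tools is sketched below, using only Python's standard wave module. It assumes an uncompressed PCM WAV input; the function names and file names are illustrative and not part of the workflow. ffmpeg's -ac 2 option does the same job from the command line.

```python
import wave


def duplicate_channel(frames, sample_width):
    """Turn mono PCM bytes into stereo by writing every sample twice (L, R)."""
    stereo = bytearray()
    for i in range(0, len(frames), sample_width):
        sample = frames[i:i + sample_width]
        stereo += sample + sample
    return bytes(stereo)


def mono_to_stereo(src_path, dst_path):
    """Write a 2-channel copy of a mono WAV file; refuse anything else."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1:
            raise ValueError("input is not mono")
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(2)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(duplicate_channel(frames, sample_width))
```

Example: mono_to_stereo("voice_mono.wav", "voice_stereo.wav"), then load the stereo file in the workflow.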

    Models to download (LTX2.3)

    Place in models/diffusion_models

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/diffusion_models/ltx-2.3-22b-dev_transformer_only_fp8_scaled.safetensors

    Place in models/loras

    https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors

    Place in models/text_encoders

    https://huggingface.co/Comfy-Org/ltx-2/resolve/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/text_encoders/ltx-2.3_text_projection_bf16.safetensors

    Place in models/vae

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors
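
The LTX2.3 list above can be turned into a small download plan, sketched here under a few assumptions: the ComfyUI root folder is passed in by the caller, the helper names are made up for illustration, and the repositories are publicly downloadable. Hugging Face /blob/ links open a web page, so the sketch rewrites them to /resolve/ links, which serve the file itself.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Target folder (relative to the ComfyUI root) -> files listed on this page.
LTX23_FILES = {
    "models/diffusion_models": [
        "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/diffusion_models/ltx-2.3-22b-dev_transformer_only_fp8_scaled.safetensors",
    ],
    "models/loras": [
        "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors",
    ],
    "models/text_encoders": [
        "https://huggingface.co/Comfy-Org/ltx-2/resolve/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors",
        "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/text_encoders/ltx-2.3_text_projection_bf16.safetensors",
    ],
    "models/vae": [
        "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors",
        "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors",
    ],
}


def direct_url(url):
    """Rewrite a Hugging Face web-page link into a direct file link."""
    return url.replace("/blob/", "/resolve/", 1)


def download_plan(comfy_root, files=LTX23_FILES):
    """Return (direct_url, destination_path) pairs for every listed file."""
    plan = []
    for folder, urls in files.items():
        for url in urls:
            file_name = Path(urlparse(url).path).name
            plan.append((direct_url(url), Path(comfy_root) / folder / file_name))
    return plan


def fetch_all(comfy_root):
    """Download every file in the plan, skipping files that already exist."""
    for url, destination in download_plan(comfy_root):
        if destination.exists():
            continue
        destination.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(destination))
```

Printing download_plan("ComfyUI") first is a cheap way to check the destinations before starting fetch_all, since the files are large.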

    Models to download (V3)

    Place in models/diffusion_models

    https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-19b-distilled-fp8.safetensors

    Place in models/text_encoders

    https://huggingface.co/Comfy-Org/ltx-2/resolve/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors

    Place in models/loras

    https://huggingface.co/Lightricks/LTX-2-19b-IC-LoRA-Detailer/resolve/main/ltx-2-19b-ic-lora-detailer.safetensors

    https://huggingface.co/Lightricks/LTX-2-19b-LoRA-Camera-Control-Static/resolve/main/ltx-2-19b-lora-camera-control-static.safetensors

    Description

    Updated to LTX2.3

    Comments (15)

    eilers333 · Mar 6, 2026 · 2 reactions

    I was getting this error at the SamplerCustomAdvanced node:

    The size of tensor a (63240) must match the size of tensor b (8126848) at non-singleton dimension 2

    Remove "comfyui_smznodes" from custom nodes to fix it.

    HorridWitch · Mar 6, 2026

    Will this run on a 4090 with 32 GB of RAM?

    PixelMuseAI (Author) · Mar 6, 2026 · 2 reactions

    It should run. Try a lower frame count first, such as a 1-second audio sample, and watch your RAM usage in Task Manager.

    For reference, I am on a 4060 Ti with 64 GB of DDR4, and it took 12 minutes to generate a 7-second video at 24 fps and 1920 x 1088 resolution.

    Rokit8 · Mar 6, 2026 · 1 reaction

    Works great, thank you! Averaging 560 seconds per 20-second video on 3090/64.

    luigibarb173 · Mar 10, 2026

    I add the image and the audio, but when I generate the video there is no lip sync. The video is generated and the voice plays in the background, but the character is not speaking.

    PixelMuseAI (Author) · Mar 13, 2026

    Try changing the audio file to stereo, as suggested by thomasdimitri563.

    This also happened to me when the voice started right at the beginning of the clip. Try leaving about 0.2 s of silence before the speech.

    Does your audio file have a lot of background noise? If so, you can try isolating the voice with https://github.com/kijai/ComfyUI-MelBandRoFormer
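
The leading-silence suggestion in this reply can be tried with a short script. This is only a sketch under the assumption that the audio is an uncompressed PCM WAV file; the function names are illustrative, and the 0.2 s default simply mirrors the value mentioned above.

```python
import wave


def silence_bytes(seconds, frame_rate, channels, sample_width):
    """Return PCM bytes of digital silence for the given audio format."""
    frame_count = round(seconds * frame_rate)
    # 8-bit WAV is unsigned (midpoint 0x80); wider samples are signed (zero).
    zero = b"\x80" if sample_width == 1 else b"\x00"
    return zero * (frame_count * channels * sample_width)


def prepend_silence(src_path, dst_path, seconds=0.2):
    """Copy a WAV file, inserting `seconds` of silence before the audio."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    pad = silence_bytes(seconds, params.framerate, params.nchannels, params.sampwidth)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count is corrected when the file closes
        dst.writeframes(pad + frames)
```

Example: prepend_silence("speech.wav", "speech_padded.wav"). The padded clip is 0.2 s longer, so the video length may need to grow to match.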

    Novellus · Mar 11, 2026

    There is nothing in the output video. It's just black. I can however hear the audio. I have all the correct files downloaded and I'm running this on a 4090.

    PixelMuseAI (Author) · Mar 13, 2026 · 1 reaction

    Please make sure that ComfyUI and your custom nodes are up to date.

    thomasdimitri563 · Mar 12, 2026 · 2 reactions

    I also had no lip sync, and I fixed it by changing my audio file from mono to stereo (2 channels). I used ffmpeg to convert my mono audio to stereo.

    zexeor · Mar 20, 2026

    I can't seem to make the lipsync work. I even changed the audio to stereo, prompted the exact words the character should say, changed the video length to match the audio length, etc. Any help, please?

    zexeor · Mar 20, 2026

    Managed to solve it. ElevenLabs audio comes out too clean; for whatever reason, adding some background noise makes lip sync work.
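
For anyone who wants to test the observation above, here is a rough sketch that mixes low-level random noise into a clip. It assumes 16-bit PCM WAV input; the function names and the default amplitude of 200 (about 0.6% of full scale) are arbitrary illustrative choices, not values reported in this thread.

```python
import array
import random
import sys
import wave


def add_noise_16bit(frames, amplitude=200, seed=None):
    """Add uniform random noise to 16-bit little-endian PCM bytes."""
    rng = random.Random(seed)
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    for i, value in enumerate(samples):
        noisy = value + rng.randint(-amplitude, amplitude)
        samples[i] = max(-32768, min(32767, noisy))  # clip to the int16 range
    if sys.byteorder == "big":
        samples.byteswap()
    return samples.tobytes()


def add_noise_to_wav(src_path, dst_path, amplitude=200, seed=None):
    """Write a copy of a 16-bit WAV file with background noise mixed in."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("only 16-bit PCM WAV is handled in this sketch")
        frames = src.readframes(params.nframes)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(add_noise_16bit(frames, amplitude, seed))
```

Example: add_noise_to_wav("tts_clean.wav", "tts_noisy.wav"). Raising or lowering amplitude changes how audible the noise is.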

    PixelMuseAI (Author) · Mar 21, 2026

    @zexeor thanks for your suggestions. I know people are having trouble with getting the lip sync to work. But no one is telling me what their source sound files are. I've been using audio from videos so I haven't experienced the issues users are experiencing. Let me test with some local TTS.

    Ponder_Stibbons · Mar 28, 2026

    Beautiful, right out of the box. Hardly had to change a thing. Well done.

    Ponder_Stibbons · Mar 29, 2026

    This is blazing fast. And it made me realize I don't need an upscale stage with LTX. 10 seconds of 24fps 720 is nothing, absolutely nothing...done in 1:30 and with resources to spare. Paired with TTS suite and/or Ace-Step, possibilities are endless. I really need to finish a comp to post. I keep getting distracted discovering everything this model can do.

    scotttybread · Apr 12, 2026 · 1 reaction

    Yes, lip sync is working fine. Just don't upload MP3; WAV works well, for example at 40 kHz. Mono does not work properly, so try stereo. Vertical video seems fine. I'm running it on a rented 5090; my 5070 Ti would die. Thanks for sharing it, amazing job.
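
The checks mentioned in this comment (WAV rather than MP3, stereo rather than mono) can be run on a file before loading it. The sketch below uses Python's standard wave module, which only reads uncompressed PCM WAV; the helper names are illustrative, and the 24 fps default matches the frame rate quoted earlier in the thread.

```python
import wave


def describe_wav(path):
    """Read channel count, sample rate and duration from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def frames_for(duration_s, fps=24):
    """Number of video frames that span the given audio duration."""
    return round(duration_s * fps)


def input_warnings(file_name, channels):
    """List the issues that commenters linked to missing lip sync."""
    warnings = []
    if not file_name.lower().endswith(".wav"):
        warnings.append("not a WAV file")
    if channels != 2:
        warnings.append("not stereo")
    return warnings
```

Example: info = describe_wav("voice.wav"), then input_warnings("voice.wav", info["channels"]) and frames_for(info["duration_s"]).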

    Workflows
    LTXV2

    Details

    Downloads: 2,808
    Platform: CivitAI
    Platform Status: Available
    Created: 3/6/2026
    Updated: 6/24/2026
    Deleted: -

    Files

    ltx2ImageAudioTo_ltx23.zip

    Mirrors

    HuggingFace (1 mirror)
    CivitAI (1 mirror)