CivArchive
    LTX-2.3 Dev Audio+Image To Video (GGUF) - Audio+Image to video

    Base workflow for Audio+Image to video with the Dev model, set up to use as little VRAM as possible.

    It can also generate text to video with an audio reference (switch the red boolean node to TRUE).

    I suggest leaving the prompt alone unless you want to prompt for a specific motion or action.

    prompt:

    " Transform this static image into a high-quality video with realistic facial expressions and realistic motion.

    Perfect lip-sync to the attached audio. "

    FILES:

    OPTIONAL: Kijai's fp8 scaled model (requires a Load Diffusion Model node instead of the UNet loader node, and replaces the GGUF entirely)

    https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models

    DEV gguf (distilled ggufs are in the repo as well)

    https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main

    Gemma 3_12B FP4 text encoder

    https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors

    Audio VAE

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors

    Video VAE

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors

    Text Projection text encoder

    https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders

    Distill Lora

    https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors

    Upscaler

    https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors
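A minimal sketch for anyone who prefers to script the downloads above instead of clicking through each link. It only splits the listed Hugging Face links into a repo id and a path inside the repo, and rewrites single-file `blob` links into direct-download `resolve` links. The helper names `split_hf_url` and `to_download_url` are hypothetical, and which ComfyUI model folder each file belongs in is not stated on this page, so that part is left out.

```python
from urllib.parse import urlparse


def split_hf_url(url):
    """Split a Hugging Face link into (repo_id, path_in_repo).

    path_in_repo is "" when the link points at the root of the repo.
    (Hypothetical helper, not part of the workflow.)
    """
    parts = urlparse(url).path.strip("/").split("/")
    # Expected shape: <org>/<repo>/<blob|tree|resolve>/<revision>/<path...>
    if len(parts) < 4 or parts[2] not in ("blob", "tree", "resolve"):
        raise ValueError(f"not a Hugging Face file or folder link: {url}")
    repo_id = "/".join(parts[:2])
    path_in_repo = "/".join(parts[4:])
    return repo_id, path_in_repo


def to_download_url(url):
    """Turn a single-file 'blob' page link into a raw-file 'resolve' link."""
    return url.replace("/blob/", "/resolve/", 1)


if __name__ == "__main__":
    link = ("https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/"
            "vae/LTX23_audio_vae_bf16.safetensors")
    print(split_hf_url(link))
    print(to_download_url(link))
```

The `tree` links in the list (the GGUF repo, the diffusion_models and text_encoders folders) point at folders rather than single files, so you still have to pick a specific file there. If the `huggingface_hub` package is installed, the repo id and path returned here can be passed to its `hf_hub_download(repo_id=..., filename=...)` function.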

    Description

    A+I2V

    Comments (6)

    ArtificialOtaku, Mar 22, 2026
    CivitAI

    Very cool wf, had to modify it so it would take my tensor file and added more lora nodes, but other than that, quite simple and clear to work with, thanks!

    gambikules858, Mar 22, 2026
    CivitAI

    And what about first + last frame?

    creatorjulie743, Mar 25, 2026
    CivitAI

    I don't know. It's not working right. See the posted video. It should have the workflow in it. The only difference is that I used Q8_0 gguf and gemma_3_12b_it text encoder. Oh, and I used resolution 720x1024. Everything else is the same as in the sample workflow.
    Funny part is that I tried the sample image (the guy in a baseball hat) and sound clip and it worked. Was using the same Q8_0 gguf and gemma_3_12b_it text encoder and changed the resolution to 768x768. But my own audio and images do not work even when using the same lowered resolution. What gives?

    IDK, I am also having huge problems, even with the official workflow. It either throws errors, or it speaks an alien language, or everything looks bloomed, or there are subtitles everywhere, or the movement is generally bad, and i2v is a joke.

    creatorjulie743, Mar 26, 2026

    @chrisbraeuer41172035 Well, with straight i2v, I managed to get some decent clips with various workflows, including the default ComfyUI one. Lots of duds, but some clips are pretty decent. But, I've tried several ai2v workflows and none of them works halfway decent.

    @creatorjulie743 I am really not sure. I also got some decent clips. A woman speaking in portrait mode works great, as do speaking portraits in general. But as soon as I try to do something different, it falls off a cliff. It's driving me nuts. Just trying to have someone walk up some stairs is not possible at all.

    Workflows
    LTXV 2.3

    Details

    Downloads: 599
    Platform: CivitAI
    Platform Status: Available
    Created: 3/22/2026
    Updated: 5/13/2026
    Deleted: -

    Files

    ltx23DevAudioImageTo_audioImageToVideo.zip