CivArchive
    LTX-2.3 Dev Audio+Image To Video (GGUF) - Audio+Image to video

    Base workflow for audio+image to video with the Dev model, set up to use as little VRAM as possible.

    It can also generate text-to-video with an audio reference (switch the red boolean node to TRUE).

    I suggest leaving the prompt alone unless you want to prompt for a specific motion or action.

    Prompt:

    " Transform this static image into a high-quality video with realistic facial expressions and realistic motion.

    Perfect lip-sync to the attached audio. "

    FILES:

    OPTIONAL: Kijai's fp8 scaled model (requires the Load Diffusion Model node instead of the UNet loader node, and replaces the GGUF entirely)

    https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models

    Dev GGUF (distilled GGUFs are in the same repo as well)

    https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main

    Gemma 3 12B FP4 text encoder

    https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors

    Audio VAE

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors

    Video VAE

    https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors

    Text Projection text encoder

    https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders

    Distill LoRA

    https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors

    Upscaler

    https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors
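
    Several links above point at Hugging Face file pages (/blob/) rather than at the files themselves. As a rough sketch, the script below rewrites those page links into direct-download (/resolve/) links and prints the filename for each. The labels and helper names (FILE_LINKS, to_direct_download, filename_of) are made up for illustration and are not part of the workflow. The folder links (/tree/) are left out on purpose, since you still have to pick a file from those repos yourself.

```python
from urllib.parse import urlparse

# Single-file links copied from the FILES list above.
# The dictionary keys are illustrative labels, not official names.
FILE_LINKS = {
    "text encoder": "https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors",
    "audio VAE": "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors",
    "video VAE": "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors",
    "distill LoRA": "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors",
    "upscaler": "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
}


def to_direct_download(url: str) -> str:
    """Turn a Hugging Face /blob/ page link into a /resolve/ download link."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<revision>/<path inside the repo>
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a single-file link: {url}")
    owner, repo, _, revision, *file_path = parts
    return (
        f"https://huggingface.co/{owner}/{repo}"
        f"/resolve/{revision}/{'/'.join(file_path)}"
    )


def filename_of(url: str) -> str:
    """Return the last path segment, i.e. the file name to save as."""
    return urlparse(url).path.rsplit("/", 1)[-1]


if __name__ == "__main__":
    for label, link in FILE_LINKS.items():
        print(f"{label}: {filename_of(link)}")
        print(f"  {to_direct_download(link)}")
```

    Where each file goes inside your ComfyUI models folder depends on your install and on the loader nodes the workflow uses, so the script does not try to guess a destination.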

    Description

    A+I2V

    Workflows
    LTXV 2.3

    Details

    Downloads: 177
    Platform: CivitAI
    Platform Status: Available
    Created: 3/22/2026
    Updated: 3/25/2026
    Deleted: -

    Files

    ltx23DevAudioImageTo_audioImageToVideo.zip