CivArchive
    LTX2 Img and Audio to Video - v1.0

    A production-grade pipeline for the LTX-2 model. It generates high-fidelity video from a source image and an audio track, producing synchronized content such as music videos with lip-syncing and dance.
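
    Because the video is meant to stay in sync with the audio track, the clip usually needs enough frames to cover the full audio duration. The sketch below shows that arithmetic only. The function name `frames_for_audio` and the 24 fps value are illustrative assumptions, not settings taken from the workflow, and any extra constraints the model may place on frame counts are not handled here.

```python
import math


def frames_for_audio(duration_s: float, fps: float) -> int:
    """Return the smallest whole number of frames that covers the audio.

    Hypothetical helper for illustration; the workflow's own nodes may
    compute this differently.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    # Round up so the video never ends before the audio does.
    return math.ceil(duration_s * fps)


# Assumed example values: a 10-second audio clip at 24 frames per second.
example_frames = frames_for_audio(10.0, 24.0)
print(example_frames)
```

    Rounding up rather than down is a design choice: a slightly longer video can be trimmed, while a shorter one would cut off the end of the audio.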

    Key Features & Architecture

    The workflow is organized into distinct logical stages using subgraphs to manage complexity and optimize hardware resources:

    • Multimodal Input Processing:

      • Image Handling: Uses ImageResizeKJv2 to prepare a source image, which acts as the visual foundation for the video.

      • Audio Integration: Employs a VHS_LoadAudioUpload node to bring in external audio files, which guide the timing and motion of the generation.

    • Dual-Stage Sampling Pipeline:

      • Stage 1 (Initial Generation): Focuses on establishing the core motion and structure.

      • Stage 2 (Refinement): A secondary pass that refines the video and audio latents for higher quality.

    • VRAM Optimization:

      • Gemma API Text Encode: Instead of loading the large Gemma-3 12B model locally, this workflow uses an API-based text encoder. This significantly reduces local VRAM requirements, allowing the workflow to run on GPUs with as little as 12 GB to 16 GB of VRAM.

    • Creative Controls:

      • Camera LoRAs: Includes dedicated slots for LTX-2 Camera Control LoRAs (e.g., Dolly Left), allowing for precise cinematic movement.

      • Latent Upscaling: Incorporates a spatial upscaler to enhance the resolution of the final output.
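
    To see why moving the Gemma-3 12B text encoder off the local GPU matters, here is a rough back-of-the-envelope estimate of the memory needed just to hold 12 billion weights at a few common precisions. This is a sketch under stated assumptions: the parameter count is taken as exactly 12e9, the precisions are generic examples rather than what the workflow would actually load, and activations and other overhead are ignored.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold model weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# "12B" comes from the description; the exact parameter count may differ.
GEMMA_PARAMS = 12e9

# Generic precisions chosen for illustration (assumptions, not workflow settings).
estimates = {
    "16-bit (2 bytes per weight)": weight_memory_gib(GEMMA_PARAMS, 2),
    "8-bit (1 byte per weight)": weight_memory_gib(GEMMA_PARAMS, 1),
    "4-bit (0.5 bytes per weight)": weight_memory_gib(GEMMA_PARAMS, 0.5),
}

for label, gib in estimates.items():
    print(f"{label}: ~{gib:.1f} GiB")
```

    At 16-bit precision the weights alone come to roughly 22 GiB, which already exceeds a 12 GB to 16 GB card before the video model itself is loaded.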

    Description

    Workflows
    LTXV2

    Details

    Downloads: 135
    Platform: CivitAI
    Platform Status: Available
    Created: 3/4/2026
    Updated: 3/9/2026
    Deleted: -

    Files

    ltx2ImgAndAudioTo_v10.zip

    Mirrors

    CivitAI (1 mirror)