LTX2 Img and Audio to Video - CivArchive (CivitAI Archive)

A workflow pipeline built for the LTX-2 model. It generates video from a source image and an audio track, producing synchronized content such as music videos with lip-syncing and dance.

Key Features & Architecture

The workflow is organized into distinct logical stages using subgraphs to manage complexity and optimize hardware resources:

Multimodal Input Processing:
- Image Handling: Uses ImageResizeKJv2 to prepare a source image, which acts as the visual foundation for the video.
- Audio Integration: Employs a VHS_LoadAudioUpload node to bring in external audio files, which guide the timing and motion of the generation.
Dual-Stage Sampling Pipeline:
- Stage 1 (Initial Generation): Focuses on establishing the core motion and structure.
- Stage 2 (Refinement): A secondary pass that refines the video and audio latents for higher quality.
VRAM Optimization:
- Gemma API Text Encode: Instead of loading the large Gemma-3 12B model locally, this workflow uses an API-based text encoder. This significantly reduces local VRAM requirements, allowing the workflow to run on GPUs with 12 GB to 16 GB of VRAM.
Creative Controls:
- Camera LoRAs: Includes dedicated slots for LTX-2 Camera Control LoRAs (e.g., Dolly Left), allowing for precise cinematic movement.
- Latent Upscaling: Incorporates a spatial upscaler to enhance the resolution of the final output.
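
The input-preparation stage can be pictured with a small sketch. The helper names below (`fit_dimensions`, `frames_for_audio`) are hypothetical and are not nodes from this workflow. The constraints they apply are also assumptions made for illustration, not values taken from the workflow: sides snapped to a multiple of 32, frame counts of the form 8k + 1, and a default of 24 fps. The sketch only shows how a source image size and an audio duration might be turned into generation dimensions and a frame count that covers the whole track.

```python
import math


def fit_dimensions(src_w, src_h, target_long_side=1024, multiple=32):
    """Scale a source size so its long side is close to target_long_side,
    keeping the aspect ratio, then snap each side to the nearest multiple.

    The multiple of 32 is an assumed model constraint, not a documented one.
    """
    scale = target_long_side / max(src_w, src_h)

    def snap(value):
        # Never return less than one full multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(src_w), snap(src_h)


def frames_for_audio(duration_s, fps=24.0, step=8):
    """Return the smallest frame count of the form step * k + 1 (k >= 1)
    that is long enough to cover duration_s seconds of audio.

    The step * k + 1 pattern and the fps default are assumptions.
    """
    needed = math.ceil(duration_s * fps)
    k = max(1, math.ceil((needed - 1) / step))
    return step * k + 1


if __name__ == "__main__":
    width, height = fit_dimensions(1920, 1080)
    frames = frames_for_audio(10.0)
    print(f"resize to {width}x{height}, generate {frames} frames")
```

In the actual workflow these values are set through the ImageResizeKJv2 node and the sampler settings. The sketch is only meant to make the relationship between audio length and video length concrete.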

Files

ltx2ImgAndAudioTo_v10.zip

Mirrors
