Welcome to my 💫🎦 Friendly LTX-2 T2V+I2V+Lipsync
✨ Less mess, more magic
UniVibe - Lipsync all-in-one version with the HQ TTS VibeVoice model has been released.
New v1.2 with simplified model loading plus quality and performance improvements.
LTX-2 is a new video generation model with 19B parameters under the hood. This is the first DiT-based (Diffusion Transformer) foundation model that generates synchronized audio and video simultaneously in a single pass! It supports native 4K resolution at up to 50 FPS, providing cinematic-grade fidelity suitable for professional VFX and film production. It can generate clips of up to 10–20 seconds with consistent style and motion.
💻 System requirements:
Minimum system requirements for 540p i2v and 720p t2v:
RTX 30-series GPU, 8GB+ VRAM, 45GB+ RAM, 8-core processor, SSD, latest ComfyUI
🚀 Low VRAM optional optimization:
For systems with low VRAM, add the --reserve-vram ComfyUI parameter in run_nvidia_gpu.bat:
--reserve-vram 4 (or another number, in GB).
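As a rough sketch, assuming the stock ComfyUI Windows portable build, the edited run_nvidia_gpu.bat could look like the example below. The launch line itself is an assumption based on the default portable package and may differ in your install; only the --reserve-vram 4 part comes from the tip above.

```bat
rem Assumed default launch line of the ComfyUI portable build.
rem Keep your own line as-is and only append the --reserve-vram part.
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --reserve-vram 4
pause
```

The number is the amount of VRAM, in GB, that ComfyUI leaves free for the OS and other software. If generations still run out of memory, trying a slightly higher value may help.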
📌 Detailed tips and links to models in the workflow
✨ Workflow features:
Extremely user-friendly interface
Maximum performance and optimization starting from 8GB of VRAM: GGUF or 8-step distilled model with an fp4 or fp8 text encoder, plus MultiGPU memory optimization
All-in-one: i2v, t2v, and interpolation
Convenient one-click mode switching
Generation time setting in seconds
LoRA support (up to 3)
Detailed tips and links to all necessary models
Manual random seed for complete control over generations
🤗🙏🏼 Thanks to Lightricks Team
Original repo — GitHub
Description
· Ultimate LTX-2 version: t2v, i2v, lipsync, speech generation, interpolation
Some recent fixes:
- Fixed a bug and adjusted the logic for setting the generation duration
- Fixed a VAE bug that caused artifacts in the final generation
- Added new Kijai nodes for VRAM unloading for extra performance
- Added an image strength parameter, which controls how closely the result follows the original image
· Choose 1 of 3 models in one click: Dev, Distilled, GGUF
· Powered by the VibeVoice HQ TTS model for voice generation with up to 2 speakers
· Step-by-step generation control: set up the audio first, then move on to video generation
· Modular workflow: generate lipsync from an audio sample, lipsync with the TTS voice model, or just regular LTX-2 generations without audio samples
· Performance optimization
· Updated detailed tips in the workflow