This workflow uses the LTX IC-LoRA, which acts as a ControlNet for LTX 2.3.
Load an image and an audio file (either your own or the original audio from the source video), or use LTX Audio instead. The audio is used for lip synchronization. Then load the target video whose movements you want to track and transfer.
Info:
The length of the output video is determined by the number of frames in the input video, not by the duration of the audio file.
For upscaling, I use RTX Video Super Resolution.
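To make the note on output length concrete, here is a small sketch of the arithmetic. It is not part of the workflow. The function names are hypothetical, and the frame rate is an assumed input that you would read from your own video or workflow settings.

```python
def output_duration_seconds(frame_count: int, fps: float) -> float:
    """Length of the output clip, derived only from the input video's frames.

    frame_count: number of frames in the input (target) video.
    fps: frame rate of the output (assumed value, not taken from the workflow).
    """
    if frame_count < 0:
        raise ValueError("frame_count must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def audio_surplus_seconds(frame_count: int, fps: float, audio_seconds: float) -> float:
    """Audio duration minus output duration.

    Positive: the audio runs longer than the output video.
    Negative: the audio ends before the output video does.
    """
    return audio_seconds - output_duration_seconds(frame_count, fps)


if __name__ == "__main__":
    # Example values only: a 120-frame input at an assumed 24 fps, with 8 s of audio.
    print(output_duration_seconds(120, 24.0))
    print(audio_surplus_seconds(120, 24.0, 8.0))
```

If the surplus is far from zero, trimming the audio or choosing an input video with a matching frame count keeps the lip sync source and the output the same length.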
Tips:
If you experience issues with lip sync, try lowering the IC-LoRA Strength and IC-LoRA Guidance Strength values. A value of around 0.7 is a good starting point.
If you notice issues with output quality, try lowering the IC-LoRA Strength as well.