This workflow is a modular and flexible text/image/audio-to-video generation system built in ComfyUI, designed to give full control over video creation using LTX-based models. It allows you to easily switch between multiple generation modes—such as text-to-video, image-to-video, lipsync, and fully guided animation—by enabling or disabling grouped nodes.
The pipeline supports advanced features including LoRA-based character and style conditioning, voice identity transfer (ID LoRA), custom or generated audio, and ControlNet-guided animation using reference videos. Users can also incorporate keyframe images for structured motion control or rely on a single reference image for consistent character appearance.
Performance and quality can be balanced through options like half-resolution sampling with 2× upscaling, as well as post-processing tools like the LTX detailer.
Main features
GGUF support
Prompt relay for segmented prompts
Modular, toggle-based workflow (quickly switch modes)
Text, image, audio, and ControlNet-driven video generation
LoRA support (character, style, and voice via ID LoRA)
Custom or AI-generated audio with automatic syncing
Reference image + up to 7 keyframes (FFLF animation control)
ControlNet video guidance with hybrid reference support
Half-res sampling + 2× upscaling for faster high-quality results
LTX detailer for enhanced final output
Common Setups
Text to video: All bypassers disabled + Prompt + Default audio
Image to video: Prompt + Reference image + Default audio
Lipsync: Prompt + Reference image + Custom audio
Audio to video: Prompt + Custom audio only
Character LoRA + voice cloning: Prompt + Character LoRA + ID LoRA + Default audio
Voice reference to video: Prompt + ID LoRA + Default audio, OR Prompt + ID LoRA + Reference image + Default audio
Character animation: Prompt + ControlNet + Reference image + (Custom or Default audio)
First frame → last frame: Prompt + Keyframe 1 + Keyframe 2 + (Custom or Default audio)
First → middle → last frame: Prompt + Keyframe 1 + Keyframe 2 + Keyframe 3 + (Custom or Default audio)
Character animation with custom voice: Prompt + Reference image + ID LoRA + ControlNet + Default audio
Detailed instructions are contained in the workflow itself:
Red nodes are instructions and useful notes.
Yellow nodes are configurable elements you can adjust to your needs.

Description
- Added LTX2.3 1.1 support.
- Added Prompt relay support.
- Added extra keyframes (now 8 in total).
- Enhanced the upscaling process; now all keyframes are used as reference for upscaling, not just the first one.
Comments (18)
V1 is my fav LTX workflow, by far. Now that V2 is out, I'm excited to try it! TY!
Thanks! I don't remember if it was you who suggested using the keyframes to upscale without losing quality, but if it was, thanks again hehe
@LatentHeart Yes but I didn't want to trouble you so I deleted the comment, lol. Thanks a ton :)
Is it possible to save ControlNet "movements" and load them after they get generated? Or do I need to regenerate them every time, even if it is the same video input?
Yes, you can. You will need to modify this workflow to achieve that, but any workflow that takes a preprocessed ControlNet input can work the way you describe. In this workflow, for example, in the ControlNet group you can see the preprocessed input is connected to a "Resize Image/Mask" node, right after the ControlNet type selector switch; you can bypass all the nodes between the "Load video" node and that node if you are directly loading a preprocessed ControlNet video.
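Outside of ComfyUI, the same idea (preprocess once, reuse the result) can be sketched as a small caching helper. This is a minimal illustration, not part of the workflow: `preprocess` is a hypothetical callable standing in for the depth/pose extraction step, and the `.npy` cache layout is an assumption.

```python
import numpy as np
from pathlib import Path

def load_control_frames(video_id: str, preprocess, cache_dir: str = "controlnet_cache"):
    """Return preprocessed ControlNet frames, reusing a cached copy when present.

    `preprocess(video_id)` is a hypothetical callable that turns the raw video
    into a frame array; on the first call its output is saved to disk, and
    later calls with the same video_id skip the expensive preprocessing.
    """
    cache = Path(cache_dir) / f"{video_id}.npy"
    if cache.exists():
        return np.load(cache)          # reuse the saved control frames
    frames = preprocess(video_id)      # e.g. depth or pose extraction
    cache.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache, frames)
    return frames
```

In the workflow itself, the equivalent is saving the preprocessed video once and then feeding it in via the "Load video" node with the intermediate preprocessing nodes bypassed.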
Hello! Maybe my comment isn't really related to this WF, but I need someone to help me find a WF for Video to Video, please! I'd really appreciate it if someone could help me! 😊✨
@FlowSpecial Wow! I'll give it a try as soon as I can. Thank you so much! 😍✨
Kind of a Noob question, but for the life of me, I can't find the config file mentioned
The workflow is a JSON file; CivitAI auto-detects that file type as a "Configuration file". That doesn't matter, though: you download the JSON file and drag and drop it into ComfyUI. Now, not trying to be mean or anything here, ok? But if you are just starting out with ComfyUI, this workflow could be too advanced for you. For starters, you will need to download the model files, and that requires knowing what's best for your specific setup. You will also need to clone the prompt relay repository from GitHub, and possibly troubleshoot things here and there when you install some custom nodes.
@LatentHeart I totally understand now. Thank you for the well thought out explanation. I appreciate it!
Please help, frame relay alone doesn't work
You mean prompt relay? Did you clone the GitHub repo?
I just started using this workflow instead of my incredibly jank hodgepodge of an LTX 2.3 workflow.
(mind you it works fine, just uh... spaghetti lmao)
Man, I never thought about using Mel-Band RoFormer to split the audio from the music and then simply combining the original audio back into the video... I was manually adding the audio back to the already completed video via a dedicated workflow afterwards XD
I've had Mel-Band for quite a while but never used it much aside from SunoAI
hehe You can also combine the split voice audio with sony whoosh, for higher fx audio quality ;)
What is supposed to go in the REFERENCE IMAGE SIZE node? It shows 1920 by default.
Thank you for the WF!
You can leave it as is. It is the resolution of all the keyframes, including the reference image, and it serves as a "safe limit" in case you are loading huge images: they get automatically resized (by the longest edge). You can lower it to save some VRAM if you want; 1280 (720p) should yield good results too. The lower you go, the less detail the model has to work with.
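The "resize by the longest edge" behaviour described above can be sketched as a few lines of Python. This is an illustration of the idea, not the node's actual code; the rounding choice is an assumption.

```python
def resize_by_longest_edge(width: int, height: int, limit: int = 1920) -> tuple[int, int]:
    """Scale (width, height) so the longest edge is at most `limit`.

    Mirrors the described behaviour of the REFERENCE IMAGE SIZE node:
    images already within the limit pass through unchanged, larger ones
    are scaled down proportionally.
    """
    longest = max(width, height)
    if longest <= limit:
        return width, height           # small enough, keep as-is
    scale = limit / longest            # shrink factor for the longest edge
    return round(width * scale), round(height * scale)
```

For example, a 3840×2160 (4K) reference image with the default limit of 1920 would come out as 1920×1080, while a 1280×720 image would be left untouched.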
@LatentHeart Awesome. Thank you for taking the time to explain!