This workflow is a modular and flexible text/image/audio-to-video generation system built in ComfyUI, designed to give full control over video creation using LTX-based models. It allows you to easily switch between multiple generation modes—such as text-to-video, image-to-video, lipsync, and fully guided animation—by enabling or disabling grouped nodes.
The pipeline supports advanced features including LoRA-based character and style conditioning, voice identity transfer (ID LoRA), custom or generated audio, and ControlNet-guided animation using reference videos. Users can also incorporate keyframe images for structured motion control or rely on a single reference image for consistent character appearance.
Performance and quality can be balanced through options like half-resolution sampling with 2× upscaling, as well as post-processing tools like the LTX detailer.
Main features
GGUF support
Prompt relay for segmented prompts
Modular, toggle-based workflow (quickly switch modes)
Text, image, audio, and ControlNet-driven video generation
LoRA support (character, style, and voice via ID LoRA)
Custom or AI-generated audio with automatic syncing
Reference image + up to 7 keyframes (FFLF animation control)
ControlNet video guidance with hybrid reference support
Half-res sampling + 2× upscaling for faster high-quality results
LTX detailer for enhanced final output
Common Setups
Text to video: All bypassers disabled + Prompt + Default audio
Image to video: Prompt + Reference image + Default audio
Lipsync: Prompt + Reference image + Custom audio
Audio to video: Prompt + Custom audio only
Character LoRA + voice cloning: Prompt + Character LoRA + ID LoRA + Default audio
Voice reference to video: Prompt + ID LoRA + Default audio, OR Prompt + ID LoRA + Reference image + Default audio
Character animation: Prompt + ControlNet + Reference image + (Custom or Default audio)
First frame → last frame: Prompt + Keyframe 1 + Keyframe 2 + (Custom or Default audio)
First → middle → last frame: Prompt + Keyframe 1 + Keyframe 2 + Keyframe 3 + (Custom or Default audio)
Character animation with custom voice: Prompt + Reference image + ID LoRA + ControlNet + Default audio
Detailed instructions are contained in the workflow itself:
Red nodes are instructions and useful notes.
Yellow nodes are configurable elements you can adjust to your needs.

Description
- Added LTX2.3 1.1 support.
- Added Prompt relay support.
- Added extra keyframes (now 8 in total).
- Enhanced the upscaling process; now all keyframes are used as reference for upscaling, not just the first one.
Comments (18)
V1 is my fav LTX workflow, by far. Now that V2 is out, I'm excited to try it! TY!
Thanks! I don't remember if it was you who suggested using the keyframes to upscale without losing quality, but if it was, thanks again hehe
@LatentHeart Yes but I didn't want to trouble you so I deleted the comment, lol. Thanks a ton :)
Is it possible to save ControlNet "movements" and load them after they get generated? Or do I need to regenerate them every time, even if it is the same video input?
Yes, you can. You will need to modify this workflow to achieve that, but any workflow that takes a preprocessed ControlNet input can work the way you describe. In this workflow, for example, in the ControlNet group you can see the preprocessed input is connected to a "Resize Image/Mask" node, right after the ControlNet type selector switch; you can bypass all the nodes between the "Load video" node and that node if you are directly loading a preprocessed ControlNet video.
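Outside of ComfyUI, the same idea (preprocess once, reuse the result) can be sketched as a small caching helper. This is a minimal illustration, not part of the workflow: `preprocess` is a hypothetical callable standing in for the depth/pose extraction step, and the `.npy` cache layout is an assumption.

```python
import numpy as np
from pathlib import Path

def load_control_frames(video_id: str, preprocess, cache_dir: str = "controlnet_cache"):
    """Return preprocessed ControlNet frames, reusing a cached copy when present.

    `preprocess(video_id)` is a hypothetical callable that turns the raw video
    into a frame array; on the first call its output is saved to disk, and
    later calls with the same video_id skip the expensive preprocessing.
    """
    cache = Path(cache_dir) / f"{video_id}.npy"
    if cache.exists():
        return np.load(cache)          # reuse the saved control frames
    frames = preprocess(video_id)      # e.g. depth or pose extraction
    cache.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache, frames)
    return frames
```

In the workflow itself, the equivalent is saving the preprocessed video once and then feeding it in via the "Load video" node with the intermediate preprocessing nodes bypassed.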
Hello! Maybe my comment isn't really related to this WF, but I need someone to help me find a WF for Video to Video, please! I'd really appreciate it if someone could help me! 😊✨
@FlowSpecial Wow! I'll give it a try as soon as I can. Thank you so much! 😍✨
Kind of a Noob question, but for the life of me, I can't find the config file mentioned
The workflow is a JSON file; CivitAI auto-detects that file type as a "Configuration file". That doesn't matter, though: you download the JSON file and drag and drop it into ComfyUI. Now, not trying to be mean or anything here, ok? But if you are just starting out with ComfyUI, this workflow could be too advanced for you. For starters, you will need to download the model files, and that requires knowing what's best for your specific setup. You will also need to clone the prompt relay repository from GitHub, and possibly troubleshoot things here and there when you install some custom nodes.
@LatentHeart I totally understand now. Thank you for the well thought out explanation. I appreciate it!
Please help, frame relay alone doesn't work
You mean prompt relay? Did you clone the GitHub repo?
I just started using this workflow instead of my incredibly jank hodgepodge of an LTX 2.3 workflow.
(mind you it works fine, just uh... spaghetti lmao)
Man, I never thought about using Mel-Band RoFormer to split the audio from the music and then simply combining the original audio back into the video... I was manually adding the audio back to the already completed video via a dedicated workflow afterwards XD
I've had Mel-Band for quite a while but never used it much aside from SunoAI
hehe You can also combine the split voice audio with sony whoosh, for higher fx audio quality ;)
What is supposed to go in the REFERENCE IMAGE SIZE node? It shows 1920 by default.
Thank you for the WF!
You can leave it as is. It is the resolution of all the keyframes, including the reference image, and it serves as a "safe limit" in case you are loading huge images: they get automatically resized (by the longest edge). You can lower it to save some VRAM if you want; 1280 (720p) should yield good results too. The lower you go, the less detail the model has to work with.
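The "resize by the longest edge" behaviour described above can be sketched as a few lines of Python. This is an illustration of the idea, not the node's actual code; the rounding choice is an assumption.

```python
def resize_by_longest_edge(width: int, height: int, limit: int = 1920) -> tuple[int, int]:
    """Scale (width, height) so the longest edge is at most `limit`.

    Mirrors the described behaviour of the REFERENCE IMAGE SIZE node:
    images already within the limit pass through unchanged, larger ones
    are scaled down proportionally.
    """
    longest = max(width, height)
    if longest <= limit:
        return width, height           # small enough, keep as-is
    scale = limit / longest            # shrink factor for the longest edge
    return round(width * scale), round(height * scale)
```

For example, a 3840×2160 (4K) reference image with the default limit of 1920 would come out as 1920×1080, while a 1280×720 image would be left untouched.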
@LatentHeart Awesome. Thank you for taking the time to explain!