LTX-2.3 MSR Multi-Image Reference Three-Stage Enhanced Video Workflow

Watch the full video first to see how this LTX-2.3 MSR multi-image reference workflow behaves in practice. It shows how multiple reference images guide the generation, and how the three-stage enhancement pipeline improves identity consistency, visual stability, and final rendering quality.

This ComfyUI workflow is designed for LTX-2.3 MSR multi-image reference video generation with three-stage rendering enhancement. Its main purpose is to create a longer and more stable reference-guided video by combining image identity guidance, IC LoRA consistency control, staged sampling, latent upscaling, tiled VAE decoding, and final video/audio reconstruction.

The workflow is built around the LTX-2.3 video generation route and the MSR identity consistency module. The key identity module is loaded through LTXICLoRALoaderModelOnly using LTX-2.3-Licon-MSR-V1.safetensors. This module helps the model maintain the visual identity of the reference subject across the generated sequence. The workflow also uses LTXAddVideoICLoRAGuide in multiple stages, which injects the reference image guidance into the video latent process.

The strongest part of this workflow is its three-stage rendering structure. Stage 1 creates the base video latent. It uses an EmptyLTXVLatentVideo canvas, CFGGuider, RandomNoise, KSamplerSelect, ManualSigmas, and SamplerCustomAdvanced. This first stage establishes the main motion, composition, reference relationship, and temporal direction of the video.
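
Since every stage has its own ManualSigmas schedule, it can help to sanity-check a schedule before pasting it into the node. The following is a minimal sketch, assuming the schedule is written as a comma-separated list of numbers; the two example schedules are hypothetical values for illustration and are not taken from the workflow file.

```python
def parse_sigmas(text: str) -> list[float]:
    """Parse a comma-separated sigma string into a validated, non-increasing list."""
    sigmas = [float(tok) for tok in text.split(",") if tok.strip()]
    if len(sigmas) < 2:
        raise ValueError("need at least two sigma values (one sampling step)")
    for prev, nxt in zip(sigmas, sigmas[1:]):
        if nxt > prev:
            raise ValueError(f"sigmas must not increase: {prev} -> {nxt}")
    return sigmas


# Hypothetical schedules, NOT read from the workflow JSON.
STAGE1 = "1.0, 0.95, 0.85, 0.7, 0.5, 0.3, 0.1, 0.0"  # full denoise from noise
STAGE2 = "0.6, 0.4, 0.2, 0.0"                        # partial denoise for refinement

for name, text in [("stage 1", STAGE1), ("stage 2", STAGE2)]:
    sigmas = parse_sigmas(text)
    # N sigma values describe N - 1 sampler steps.
    print(f"{name}: {len(sigmas) - 1} steps, start {sigmas[0]}, end {sigmas[-1]}")
```

A base stage usually starts near the maximum sigma so it can build motion and composition from noise, while a refinement stage starts lower so it keeps the existing structure and only rebuilds detail.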

After Stage 1, the workflow separates the generated audio-video latent through LTXVSeparateAVLatent. The video latent and audio latent are handled separately, allowing the video side to be refined while the audio latent is preserved for later recombination. LTXVCropGuides is used to crop or normalize the guide area before the next refinement step.

Stage 2 uses the LTX-2.3 spatial upscaler model, ltx-2.3-spatial-upscaler-x2-1.1.safetensors, through LatentUpscaleModelLoader and LTXVLatentUpsampler. This doubles the latent resolution and gives the second sampling stage more room to rebuild detail, sharpen structure, and improve frame quality. The Stage 2 sampler refines the upscaled latent with a different sigma schedule and euler_cfg_pp sampling.

Stage 3 applies the same enhancement pattern once more. The workflow separates and recombines the audio and video latents, applies another latent upsample route, and then runs a final refinement sampler with its own sigma schedule. This third stage is designed to polish the final output, improve high-resolution detail, and reduce the roughness that often appears in single-pass video generation.
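
The effect of two x2 latent upsample stages on output size is simple arithmetic, sketched below. The base resolution of 384x224 is a hypothetical example, not the workflow default, and the helper name is made up for illustration.

```python
def stage_resolutions(base_w: int, base_h: int, factors=(2, 2)) -> list[tuple[int, int]]:
    """Return the frame size after the base stage and after each upsample factor."""
    sizes = [(base_w, base_h)]
    for factor in factors:
        w, h = sizes[-1]
        sizes.append((w * factor, h * factor))
    return sizes


# Hypothetical base size; replace with the size set on the empty latent node.
sizes = stage_resolutions(384, 224)
base_pixels = sizes[0][0] * sizes[0][1]
for stage, (w, h) in enumerate(sizes, start=1):
    ratio = (w * h) // base_pixels
    print(f"stage {stage}: {w}x{h} ({ratio}x the base pixel count)")
```

Each x2 step quadruples the pixel count per frame, so the third stage works on sixteen times as many pixels as the first. This is why it pays to get motion and composition right in Stage 1 before spending time on the later stages.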

The final section uses VAEDecodeTiled for memory-friendly video decoding. Decoding in tiles matters here because the latent has grown through the enhancement passes, so decoding it in one piece would need much more memory. LTXVAudioVAEDecode restores the audio from the audio latent. CreateVideo then combines the decoded frames and audio into the final 24 fps video.
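
To illustrate why tiled decoding keeps memory bounded, here is a minimal sketch of how one axis of a frame can be split into overlapping tiles. It is a generic illustration of the idea, not ComfyUI's actual VAEDecodeTiled implementation, and the tile size, overlap, and frame size are hypothetical values.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile is aligned to the far edge
    return starts


# Hypothetical final frame size and tile settings.
width, height = 1536, 896
tile, overlap = 512, 64

xs = tile_starts(width, tile, overlap)
ys = tile_starts(height, tile, overlap)
print(f"{len(xs)} x {len(ys)} = {len(xs) * len(ys)} tiles per frame")
```

Only one tile has to be decoded at a time, so peak memory depends on the tile size rather than on the full frame, and the overlap gives neighbouring tiles a shared band to blend across so seams are less visible.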

Compared with a simple LTX image-to-video workflow, this version is more suitable for reference-heavy production. It is useful when you want stronger identity consistency, better detail, cleaner final rendering, and a more controlled multi-image reference video generation process.

Main features:

LTX-2.3 MSR multi-image reference workflow
Three-stage rendering enhancement pipeline
MSR IC LoRA identity consistency control
LTX-2.3-Licon-MSR-V1.safetensors support
Multi-stage LTXAddVideoICLoRAGuide injection
Stage 1 base video latent generation
Stage 2 latent x2 upsample refinement
Stage 3 second latent x2 enhancement
LTX spatial upscaler x2 1.1 support
ManualSigmas control for each stage
euler and euler_cfg_pp sampler routes
Audio-video latent separation and recombination
LTXVSeparateAVLatent processing
LTXVConcatAVLatent reconstruction
LTXVCropGuides guide-area handling
VAEDecodeTiled final decoding
LTXVAudioVAEDecode audio restoration
CreateVideo 24fps final output
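
Before running the workflow locally, you may want to confirm that the downloaded JSON actually contains the nodes listed above. The sketch below assumes the file is a standard ComfyUI export, either the UI format (a top-level "nodes" list whose entries have a "type") or the API format (a mapping of node ids to entries with a "class_type"). Nodes nested inside subgraphs are not inspected, and the file path is an assumption about where you saved the download.

```python
import json
from pathlib import Path

# Node names mentioned in this description.
EXPECTED = {
    "LTXICLoRALoaderModelOnly", "LTXAddVideoICLoRAGuide", "EmptyLTXVLatentVideo",
    "CFGGuider", "RandomNoise", "KSamplerSelect", "ManualSigmas",
    "SamplerCustomAdvanced", "LTXVSeparateAVLatent", "LTXVConcatAVLatent",
    "LTXVCropGuides", "LatentUpscaleModelLoader", "LTXVLatentUpsampler",
    "VAEDecodeTiled", "LTXVAudioVAEDecode", "CreateVideo",
}


def node_types(workflow: dict) -> set[str]:
    """Collect node type names from a UI-format or API-format workflow dict."""
    nodes = workflow.get("nodes")
    if isinstance(nodes, list):  # UI export
        return {n["type"] for n in nodes if isinstance(n, dict) and "type" in n}
    return {  # API export
        n["class_type"]
        for n in workflow.values()
        if isinstance(n, dict) and "class_type" in n
    }


def missing_nodes(workflow: dict, expected: set[str] = EXPECTED) -> list[str]:
    """Return the expected node types that do not appear in the workflow."""
    return sorted(expected - node_types(workflow))


path = Path("ltx23MSRMultiImage_v10.json")  # assumed local download location
if path.exists():
    workflow = json.loads(path.read_text(encoding="utf-8"))
    print("missing:", missing_nodes(workflow) or "none")
else:
    print(f"{path} not found; download the workflow file first")
```

A node type reported as missing may simply live inside a subgraph or go by a different name in your export, so treat the result as a hint rather than a verdict.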

Suggested workflow:

Prepare several clear reference images first. The best references should show the target subject, identity, outfit, style, and visual direction clearly. Avoid using references with conflicting identities or dramatically different costumes unless you intentionally want a mixed result.

Start with the default settings and test the Stage 1 output first. If the base motion or composition is wrong, adjust the prompt and references before relying on the upscaling stages. Use the second and third stages when the base video direction is already correct and you want more detail, cleaner structure, and better final rendering.

If the identity drifts, simplify the reference set and strengthen the MSR identity direction. If the final result becomes too sharp or unstable, reduce aggressive prompt wording and focus on subject consistency, smooth motion, and coherent lighting.

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2065743626793209857?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB of memory!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1xsJw6YEg6/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep the model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, for creation and learning.

Files

ltx23MSRMultiImage_v10.json
