Watch the full video first if you want to understand how this LTX2.3 3.5 MSR multi-image reference workflow works in practice. The video shows how multiple reference images can be locked into a single video generation pipeline, then refined through a three-stage rendering structure for stronger identity, composition, and visual consistency.
This ComfyUI workflow is designed for LTX2.3 3.5 MSR multi-image reference video generation. Its main purpose is to solve the common problem of reference drift. In many AI video workflows, a character, outfit, object, or scene reference may change after a few frames. This workflow uses MSR-style multi-reference guidance to keep the visual identity more stable across the video while still allowing motion, camera movement, and cinematic rendering.
The workflow is organized around a five-image input structure. These reference images can be used to describe identity, clothing, pose, scene, object details, or style direction. The MSR preprocessing section unifies the reference images before they enter the main generation chain. This makes the workflow more suitable for creators who want to combine several visual references instead of relying on one image or one prompt only.
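The five-slot layout can be sketched as a small validation step before loading anything into the graph. This is only an illustration in plain Python, not part of the workflow or any ComfyUI API: the role names come from the description above, while the `ref_N` slot names, the file names, and the `plan_references` helper are hypothetical.

```python
# Roles named in the description above; the slot naming is a hypothetical convention.
ALLOWED_ROLES = {"identity", "clothing", "pose", "scene", "object", "style"}
SLOT_COUNT = 5  # the workflow is built around five reference images


def plan_references(refs):
    """Validate (role, path) pairs and return them as numbered slots."""
    if len(refs) != SLOT_COUNT:
        raise ValueError(f"expected {SLOT_COUNT} reference images, got {len(refs)}")
    for role, path in refs:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role {role!r} for {path}")
    return {
        f"ref_{i}": {"role": role, "path": path}
        for i, (role, path) in enumerate(refs, start=1)
    }


# Example file names are placeholders.
example_refs = [
    ("identity", "face.png"),
    ("clothing", "outfit.png"),
    ("pose", "pose.png"),
    ("scene", "street.png"),
    ("style", "film_look.png"),
]
plan = plan_references(example_refs)
```

Giving each slot one explicit role makes it easier to spot references that compete with each other before any sampling time is spent.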
The graph uses PromptRelay-style prompt handling together with LTX2.3 video generation. The prompt route defines the subject, action, camera logic, lighting, atmosphere, and final visual direction, while the MSR image route anchors the visual references. This separation is important: the prompt controls what should happen, while the reference images help define what should remain stable.
The rendering structure is divided into three stages. Stage 1 is the strong reference-locking stage. It uses MSR IC guidance at full strength to establish identity, subject structure, initial composition, and basic motion. This stage is designed to lock the references before the model starts refining details.
Stage 2 is the continuation and transition stage. It uses the LTX2.3 spatial upscaler x2 1.1 and a lower MSR guide strength. This lets the video improve resolution and structure without over-constraining the motion. The goal is to keep the reference relationship while giving the model enough freedom to build a smoother video.
Stage 3 is the final refinement stage. It uses a lighter MSR guide strength and high-definition sampling to preserve the reference without making the image feel frozen. This stage focuses on final texture, visual polish, temporal stability, and cleaner output.
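The three stages can be summarized as a decreasing guide-strength schedule. The sketch below is a plain-Python illustration, not the workflow's node settings: only Stage 1's full strength (1.0) and Stage 2's x2 spatial upscale come from the description above. The Stage 2 and Stage 3 strengths (0.6 and 0.3), the assumption that Stage 3 adds no further upscale, and the 640x384 base size are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    guide_strength: float  # MSR IC guide strength; 1.0 means full strength
    spatial_scale: int     # resolution multiplier applied in this stage


# 0.6 and 0.3 are placeholder values, not taken from the workflow.
STAGES = [
    Stage("stage1_lock", 1.0, 1),
    Stage("stage2_continue", 0.6, 2),  # x2 spatial upscaler
    Stage("stage3_refine", 0.3, 1),    # assumed: no additional upscale
]


def check_schedule(stages):
    """Return True if guide strength strictly decreases from stage to stage."""
    for prev, cur in zip(stages, stages[1:]):
        if not cur.guide_strength < prev.guide_strength:
            raise ValueError(
                f"{cur.name} should use a lower strength than {prev.name}"
            )
    return True


def output_size(width, height, stages):
    """Apply each stage's spatial multiplier to a base resolution."""
    for stage in stages:
        width *= stage.spatial_scale
        height *= stage.spatial_scale
    return width, height


check_schedule(STAGES)
final_size = output_size(640, 384, STAGES)  # hypothetical base resolution
```

The point of the decreasing schedule is the one made above: lock the references hard first, then relax the constraint so later stages can refine motion and texture.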
The workflow also uses LTXVSeparateAVLatent and LTXVConcatAVLatent to split and recombine video and audio latent components between stages. VAEDecodeTiled is used for memory-friendly final decoding, while LTXVAudioVAEDecode and VHS output nodes assemble the final video.
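The split-and-recombine pattern between stages can be illustrated with a toy example. This is a conceptual sketch in plain Python, not the actual LTXVSeparateAVLatent or LTXVConcatAVLatent implementation: latents are modeled as nested lists, the nearest-neighbour upsample stands in for the real latent upsampler, and the assumption that audio passes through unchanged while only the video part is upscaled is mine.

```python
def separate_av(av_latent):
    """Split a combined latent into its video and audio parts."""
    return av_latent["video"], av_latent["audio"]


def concat_av(video, audio):
    """Recombine video and audio parts into one latent container."""
    return {"video": video, "audio": audio}


def upsample_x2(frame):
    """Nearest-neighbour x2 upsample of one 2D grid (a list of rows)."""
    out = []
    for row in frame:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def between_stages(av_latent):
    """Upscale only the video part, then rejoin it with the untouched audio."""
    video, audio = separate_av(av_latent)
    video = [upsample_x2(frame) for frame in video]
    return concat_av(video, audio)


# One 2x2 video frame and a short audio latent, both toy values.
toy = {"video": [[[1, 2], [3, 4]]], "audio": [0.1, 0.2]}
result = between_stages(toy)
```

Separating the two parts keeps spatial operations from touching audio data, which has no height or width to upscale.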
This version is built as a clean three-stage MSR pipeline. It does not include a 2.5 subtitle or watermark cleanup stage. The focus is purely on multi-image reference stability, staged rendering, and final video quality.
Main features:
LTX2.3 3.5 MSR multi-image reference workflow
Five-image reference input structure
Multi-reference identity and composition locking
PromptRelay-style prompt control
Three-stage MSR rendering pipeline
Stage 1 strong MSR reference anchoring
Stage 2 latent x2 upscale continuation
Stage 3 high-definition final refinement
MSR IC Guide strength strategy across stages
LTXAddVideoICLoRAGuide reference injection
LTX2.3 spatial upscaler x2 1.1 support
LTXVLatentUpsampler latent refinement
SamplerCustomAdvanced staged sampling
ManualSigmas stage control
LTXVSeparateAVLatent audio-video split
LTXVConcatAVLatent audio-video reconstruction
VAEDecodeTiled memory-friendly decoding
LTXVAudioVAEDecode audio route
VHS final MP4 output
No 2.5 subtitle or watermark cleanup stage
Suitable for character consistency and multi-reference video creation
Suggested workflow:
Prepare five clear reference images first. Each image should have a specific purpose: character identity, outfit, face, scene, object, pose, or style. Avoid using five unrelated images, because that will make the model fight between references. Write a prompt that describes the final video action, camera movement, lighting, and atmosphere, but do not overload it with too many conflicting subjects. Use Stage 1 to check whether the identity and composition are locked correctly. Use Stage 2 for smoother continuation and higher structure quality. Use Stage 3 for final polish. If the output becomes too stiff, reduce reference pressure in the later stages. If the subject drifts, strengthen the reference description and simplify the motion.
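The two troubleshooting rules at the end of the suggested workflow can be written down as a small helper. This is a hypothetical sketch, not something the workflow ships with: the `tune` function, the 0.8 reduction factor, and the example strengths are assumptions used only to make the rules concrete.

```python
def tune(strengths, symptom, factor=0.8):
    """Return (new_strengths, notes) for a per-stage guide-strength list.

    "stiff": lower the reference pressure in the later stages only.
    "drift": keep strengths; the advice is about the prompt and the motion.
    """
    if symptom == "stiff":
        later = [round(s * factor, 3) for s in strengths[1:]]
        return [strengths[0]] + later, []
    if symptom == "drift":
        notes = [
            "strengthen the reference description",
            "simplify the motion",
        ]
        return list(strengths), notes
    raise ValueError(f"unknown symptom: {symptom!r}")


# Placeholder strengths for Stage 1, Stage 2, Stage 3.
softer, _ = tune([1.0, 0.6, 0.3], "stiff")
same, advice = tune([1.0, 0.6, 0.3], "drift")
```

Stage 1 is left untouched in the "stiff" case because it is the stage responsible for locking the references in the first place.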
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2068718418597081089?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1,000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB.
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1xw7F6XE7K/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
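I keep model resources updated on Quark Netdisk (夸克网盘):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.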
