LTX2.3 3.5 RetalkPro Video Re-Speaking Workflow

Watch the full video first if you want to understand how this LTX2.3 3.5 RetalkPro video re-speaking workflow works in practice. The video shows how an existing talking video can be driven by a new audio track while preserving the original person, face, identity, camera framing, scene structure, and overall visual continuity.

This ComfyUI workflow is designed for RetalkPro-style video re-speaking with LTX2.3 3.5. Its main purpose is to replace or update the spoken performance of an existing video without changing the person. In other words, it is a “change the voice and speech, keep the person” workflow. It is useful for dubbing, multilingual re-speaking, AI presenter updates, video localization, creator avatar re-recording, and replacing dialogue while keeping the original visual identity stable.

The workflow is built around ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors as the main LTX2.3 video checkpoint. The text encoding route uses gemma_3_12B_it_fp8_e4m3fn.safetensors through the LTX AV text encoder loader. The graph also uses the LTX audio VAE route, allowing the new audio input to be encoded into an audio latent and used as a timing and performance reference during generation.

The key module is the LipDub IC LoRA route. The workflow loads ltx-2.3-22b-ic-lora-lipdub-0.9.safetensors as the main lip-sync and re-speaking control module. This gives the model stronger guidance for mouth movement, lip shape, speech timing, and facial performance. The goal is not to redesign the entire video, but to adjust the face and mouth area so the subject appears to speak the new audio naturally.
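For a local install, a quick check that the three model files named above are in place can save a failed first run. The sketch below is a minimal example: the file names come from this description, but the folder names (`checkpoints`, `text_encoders`, `loras`) and the `missing_models` helper are assumptions based on a common ComfyUI `models/` layout, so adjust them to match your own setup.

```python
from pathlib import Path

# File names are taken from the description above. The folder names are an
# assumption about a typical ComfyUI models/ layout and may differ locally.
REQUIRED_MODELS = {
    "checkpoints": ["ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors"],
    "text_encoders": ["gemma_3_12B_it_fp8_e4m3fn.safetensors"],
    "loras": ["ltx-2.3-22b-ic-lora-lipdub-0.9.safetensors"],
}


def missing_models(models_root, required=REQUIRED_MODELS):
    """Return 'folder/file' entries that are not present under models_root."""
    root = Path(models_root)
    missing = []
    for folder, names in required.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `missing_models("ComfyUI/models")` returns an empty list when every file is found, and otherwise lists what still needs to be downloaded.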

The source video is used as the visual guide. Frames from the original video are resized and prepared, then injected through LTXAddVideoICLoRAGuide. This helps preserve the original person, head position, face structure, clothing, lighting, background, camera angle, and motion continuity. The new audio is encoded through LTXVAudioVAEEncode, then passed into LTXVSetAudioRefTokens so the generation stages can follow the replacement speech.

The generation route is divided into three stages. Stage A builds the base re-speaking result with the source video reference and the new audio token guidance. Stage B continues from the first output, applies additional guide injection, and refines the audio-video alignment. Stage C performs the final lip-sync and high-definition polish pass, again using guided video conditioning and audio latent reconstruction.
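The three stages can be summarized as plain data, which is handy when taking notes on what each stage consumes while exploring the graph. This is only an illustrative sketch: the `Stage` class and `describe` helper are hypothetical, and the inputs are paraphrased from the paragraph above rather than read from the workflow file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    role: str
    inputs: tuple  # what the stage is described as using


# Paraphrased from the stage description; not extracted from the graph itself.
STAGES = (
    Stage("A", "base re-speaking result",
          ("source video reference", "new audio token guidance")),
    Stage("B", "audio-video alignment refinement",
          ("Stage A output", "additional guide injection")),
    Stage("C", "final lip-sync and high-definition polish",
          ("guided video conditioning", "audio latent reconstruction")),
)


def describe(stages=STAGES):
    """Return one summary line per stage."""
    return [f"Stage {s.name}: {s.role} <- {', '.join(s.inputs)}" for s in stages]
```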

The workflow uses RandomNoise, CFGGuider, ManualSigmas, KSamplerSelect, and SamplerCustomAdvanced across the stages. It also uses LTXVSeparateAVLatent and LTXVConcatAVLatent to split and recombine audio and video latent components between stages. The LTX2.3 spatial upscaler x2 1.1 is included for latent-level refinement, while VAEDecodeTiled and LTXVAudioVAEDecode handle final video and audio decoding.
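Before loading the workflow locally, it can be useful to confirm that the exported JSON actually references the node types listed above, since a missing custom node pack usually shows up as an unknown node type. The sketch below is a minimal example that assumes the two common ComfyUI export shapes: the UI export with a top-level `nodes` list whose entries carry a `type` field, and the API export that maps node ids to objects with a `class_type` field. The helper names are hypothetical, and nodes nested inside subgraphs or group nodes are not inspected.

```python
import json

# Node class names listed in the description above.
EXPECTED_NODES = {
    "RandomNoise", "CFGGuider", "ManualSigmas", "KSamplerSelect",
    "SamplerCustomAdvanced", "LTXVSeparateAVLatent", "LTXVConcatAVLatent",
    "VAEDecodeTiled", "LTXVAudioVAEDecode",
}


def load_workflow(path):
    """Read a workflow JSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def node_types(workflow):
    """Collect top-level node class names from either export shape."""
    if isinstance(workflow.get("nodes"), list):  # UI export
        return {n.get("type") for n in workflow["nodes"] if isinstance(n, dict)}
    # API export: {"<id>": {"class_type": ..., "inputs": {...}}, ...}
    return {v.get("class_type") for v in workflow.values() if isinstance(v, dict)}


def missing_nodes(workflow, expected=EXPECTED_NODES):
    """Return the expected node names that the workflow does not contain."""
    return sorted(expected - node_types(workflow))
```

For example, `missing_nodes(load_workflow("ltx2335RetalkproVideo_v10.json"))` lists any of the expected node types that were not found at the top level of the file.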

The negative prompt is designed to suppress unstable lip sync, bad mouth shapes, broken lips, bad teeth, identity drift, face distortion, extra people, inconsistent clothing, subtitles, watermarks, readable text, robotic audio artifacts, off-sync dialogue, added dialogue, and unwanted music. This makes the workflow better suited for controlled re-speaking rather than unrestricted video stylization.
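If you want to adapt the negative prompt for your own clips, keeping the terms in a list makes it easier to add or remove items without introducing duplicates. The sketch below assembles a comma-separated string from the terms named above. The exact wording inside the workflow's own prompt node may differ, and `build_negative_prompt` is a hypothetical helper, not part of ComfyUI.

```python
# Terms as listed in the description; the workflow's prompt text may differ.
NEGATIVE_TERMS = [
    "unstable lip sync", "bad mouth shapes", "broken lips", "bad teeth",
    "identity drift", "face distortion", "extra people",
    "inconsistent clothing", "subtitles", "watermarks", "readable text",
    "robotic audio artifacts", "off-sync dialogue", "added dialogue",
    "unwanted music",
]


def build_negative_prompt(terms=NEGATIVE_TERMS, extra=()):
    """Join terms with commas, skipping blanks and case-insensitive repeats."""
    seen, out = set(), []
    for term in [*terms, *extra]:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            out.append(cleaned)
    return ", ".join(out)
```

For instance, `build_negative_prompt(extra=["hands covering mouth"])` appends one clip-specific term to the base list.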

Main features:

LTX2.3 3.5 RetalkPro video re-speaking workflow
Change speech or voice without changing the person
Source video as visual identity reference
New audio as speech timing and performance driver
LipDub IC LoRA for mouth movement control
ltx-2.3-22b-ic-lora-lipdub-0.9 support
ltx-2.3-22b-dev-dare-ties-distilled-1.1 support
Gemma3 FP8 text encoder support
LTXVAudioVAEEncode audio latent encoding
LTXVSetAudioRefTokens audio reference injection
LTXAddVideoICLoRAGuide video guide control
Three-stage re-speaking generation route
Stage A base lip-sync generation
Stage B audio-video refinement
Stage C final lip-preserving polish
LTXVSeparateAVLatent audio-video split
LTXVConcatAVLatent audio-video reconstruction
LTX2.3 spatial upscaler x2 1.1 support
VVR temporal stability LoRA support
OmniNFT consistency enhancement support
MotionTrack control support
VAEDecodeTiled memory-friendly decoding
LTXVAudioVAEDecode final audio decode

Suggested workflow:

Prepare a clean source video first. The face should be visible, the mouth should not be heavily blocked, and the camera should not move too aggressively. Then prepare the new audio track. Short, clean speech works best for testing.

Use the workflow to encode the audio, inject the source video as the visual guide, and let the LipDub IC LoRA control the mouth movement.

If the lips are weak or off-sync, use a clearer audio track and avoid noisy background music. If the face changes too much, strengthen identity preservation and reduce style-changing prompt terms.

This workflow is best used for re-speaking, dubbing, localization, AI presenter updates, and creator-avatar dialogue replacement.
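When testing with a short clip, it helps to know roughly how many video frames the new audio spans, so the source video and the audio cover the same duration. The sketch below reads a WAV file's length with Python's standard `wave` module and converts it to a frame count. The default of 25 fps and the rounding to a length of the form 8n + 1 are assumptions for illustration, so check the frame rate and length settings in the workflow itself before relying on them.

```python
import math
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def frames_for_audio(duration_s, fps=25.0, step=8):
    """Frames needed to cover the audio, rounded up to step * n + 1.

    Both fps and the step * n + 1 rounding are assumptions; match them to
    the values used in your copy of the workflow.
    """
    raw = max(1, math.ceil(duration_s * fps))
    n = math.ceil((raw - 1) / step)
    return step * n + 1
```

A 4-second track at 25 fps needs 100 frames, which this helper rounds up to 105.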

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2068718437811179522?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB of memory!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1xw7F6XE7K/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

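📦 Model Resources (Quark Drive)

I keep the model resources updated on Quark Drive:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creating and learning easier.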

Files

ltx2335RetalkproVideo_v10.json
