CivArchive
    LTX 2.3 Audio-Driven Dual-Character Dialogue Workflow - v1.0
    NSFW

    This workflow is designed for LTX 2.3 audio-driven dual-character dialogue video generation, built as a more production-ready solution for two-person talking scenes. Its main purpose is to take a reference image and an audio track, then generate a cinematic dialogue video where two characters can appear to speak or perform in a more natural, continuous, and visually stable way.

    Compared with a standard image-to-video workflow, this setup is focused on dialogue production. A plain I2V graph can animate a still image, but it often struggles with two-character scenes: the characters may drift, swap positions, merge together, lose facial consistency, or move out of step with the intended conversation rhythm. This workflow reduces those problems by combining image conditioning, audio latent routing, staged sampling, latent upscaling, and final audio-video export in one pipeline.

    The workflow uses an LTX 2.3 video generation route with audio VAE support, LTXVImgToVideoConditionOnly, SetLatentNoiseMask, LTXVConcatAVLatent, LTXVSeparateAVLatent, LTXVLatentUpsampler, SamplerCustomAdvanced, VAEDecodeTiled-style output logic, LTXVAudioVAEDecode, CreateVideo, and SaveVideo. In practice, the input image acts as the visual anchor, while the audio route gives the video a stronger performance structure for speaking, dialogue rhythm, and presentation timing.
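The routing described above can be sketched as a simple ordered graph. The comments name the ComfyUI nodes from the workflow, but the Python functions themselves are hypothetical stand-ins that only record execution order; they are not real APIs.

```python
# Illustrative sketch of the audio-video latent routing. Each function is a
# hypothetical stub for a ComfyUI node named in the workflow; it only logs
# the order in which the graph runs.

trace = []

def condition_on_image(video_latent):        # LTXVImgToVideoConditionOnly
    trace.append("image_condition")
    return video_latent

def concat_av(video_latent, audio_latent):   # LTXVConcatAVLatent
    trace.append("concat_av")
    return {"video": video_latent, "audio": audio_latent}

def sample(av_latent):                       # SamplerCustomAdvanced pass
    trace.append("sample")
    return av_latent

def separate_av(av_latent):                  # LTXVSeparateAVLatent
    trace.append("separate_av")
    return av_latent["video"], av_latent["audio"]

def decode(video_latent, audio_latent):      # tiled VAE decode + LTXVAudioVAEDecode
    trace.append("decode")
    return "frames", "waveform"

# Run the graph end to end: image anchors the video latent, audio is
# concatenated in, sampled jointly, separated, and decoded for export.
v = condition_on_image("video_latent")
av = concat_av(v, "audio_latent")
av = sample(av)
video, audio = separate_av(av)
frames, waveform = decode(video, audio)

print(trace)
```

The point of the sketch is the ordering: the audio latent travels through sampling alongside the video latent, so the performance timing is baked in before decoding.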

    The key advantage is that the workflow is not only generating motion; it is organizing the generation around a two-person dialogue scene. The reference frame helps preserve the original character layout, while the prompt describes the conversation, camera behavior, expressions, gestures, and scene atmosphere. This makes it useful for AI short dramas, virtual presenter content, anime character dialogue, product explainers, two-person storytelling, roleplay clips, and social media video production.

    The workflow also includes a multi-stage refinement structure. The first stage builds the base video motion from the image and audio-conditioned latent. Later stages use latent upscaling and additional sampler passes to improve detail, texture, motion smoothness, and final image quality. This is important because talking-character videos often fail not only in lip or face motion, but also in the overall polish of the frame. The staged route helps the result feel less like a rough preview and more like a publishable output.
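The staged route can be illustrated as three steps: a base pass, a latent upscale, and a short refinement pass. The resolutions, step counts, and the 2x factor below are illustrative assumptions, not values read from the graph.

```python
# Hedged sketch of the multi-stage refinement route. LTXVLatentUpsampler in
# the workflow corresponds to latent_upsample here; all numbers are assumed.

def base_pass(width, height, steps=20):
    # Stage 1: build coarse motion from the image/audio-conditioned latent.
    return {"w": width, "h": height, "steps": steps}

def latent_upsample(latent, factor=2):
    # Stage 2: upscale in latent space rather than pixel space, so the
    # refinement pass can add detail instead of just resizing.
    return {"w": latent["w"] * factor, "h": latent["h"] * factor,
            "steps": latent["steps"]}

def refine_pass(latent, steps=8):
    # Stage 3: a low-step sampler pass to recover texture and smoothness.
    latent["steps"] += steps
    return latent

latent = base_pass(768, 512)
latent = latent_upsample(latent)
latent = refine_pass(latent)
print(latent)  # {'w': 1536, 'h': 1024, 'steps': 28}
```

Running the expensive steps at low resolution first, then refining after the upscale, is what keeps the final frame polished without paying full-resolution cost for the whole schedule.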

    Another practical point is the export pipeline. The workflow decodes both the video latent and audio latent, combines them through CreateVideo, and saves the final result as a usable video file. This makes it suitable for RunningHub demos, Civitai workflow publishing, YouTube tutorials, Bilibili examples, and repeatable production tests.
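One sanity check worth doing at this stage: before the frames and the decoded audio are muxed (CreateVideo in this workflow), their durations should agree. The frame rate and sample rate below (24 fps, 48 kHz) are assumed values for illustration, not settings read from the workflow.

```python
# Minimal A/V sync check before muxing decoded video frames with decoded
# audio. fps and sample_rate are illustrative assumptions.

def samples_per_frame(sample_rate, fps):
    # At 48 kHz and 24 fps, each frame spans 2000 audio samples.
    return sample_rate // fps

def av_in_sync(n_frames, n_samples, sample_rate=48000, fps=24):
    expected = n_frames * samples_per_frame(sample_rate, fps)
    return n_samples == expected

# 121 frames at 24 fps should pair with 121 * 2000 = 242000 audio samples.
print(av_in_sync(121, 242000))  # True
print(av_in_sync(121, 240000))  # False
```

A mismatch here usually means the audio latent and video latent were trimmed to different lengths upstream, which shows up as drift in the final file.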

    This workflow is ideal for creators who want a stronger LTX 2.3 solution for two-character audio-driven dialogue, especially when character positioning, speaking rhythm, and final visual quality matter. To see how the reference image, audio input, LTX 2.3 sampling stages, latent refinement, and final video export connect, watch the full tutorial video linked below.

    ⚙️ Try the Workflow Online

    👉 Workflow: https://www.runninghub.ai/post/2048727975096487938/?inviteCode=rh-v1111

    Open the link above to run the workflow directly online and view the generation results in real time.

    If the results meet your expectations, you can also deploy it locally for further customization.

    🎁 Fan Benefits: Register now to get 1,000 points, plus 100 daily login points, and enjoy RTX 4090-level performance with 48 GB of VRAM!

    📺 Bilibili Updates (Mainland China & Asia-Pacific)

    If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.

    📺 Bilibili Video: https://www.bilibili.com/video/BV1DT9zBbEZu/

    I will continue updating model resources on Quark Drive:

    👉 https://pan.quark.cn/s/20c6f6f8d87b

    These resources are mainly prepared for local users, making creation and learning more convenient.



    Workflows
    LTXV 2.3

    Details

    Downloads: 26
    Platform: CivitAI
    Platform Status: Available
    Created: 5/12/2026
    Updated: 5/14/2026
    Deleted: -

    Files

    ltx23AudioDrivenDual_v10.zip

    Mirrors

    CivitAI (1 mirror)