This workflow is designed for LTX 2.3 audio-driven digital human video generation, upgraded with a three-stage cinematic refinement pipeline. Its main purpose is to take a character image and an audio track and generate a more natural talking-person video in which the subject maintains a stable identity while gaining smoother motion, better facial performance, and stronger final image quality.
Compared with a basic audio-to-video or image-to-video workflow, this version is built as a more complete production pipeline. It does not simply animate a still image once and export the result. Instead, it uses image conditioning, audio latent encoding, LTX video generation, motion-control LoRA guidance, multiple sampler stages, latent upscaling, tiled decoding, audio decoding, and final video export. The goal is to make the character speak or perform more naturally while keeping the original visual design stable.
The workflow starts from an input image, which is resized and prepared through a longer-side image resize route, then processed with LTXVPreprocess. This helps normalize the image before it enters the LTX video conditioning stage. The image then becomes the visual anchor for the generated video, preserving the character’s face, body position, outfit, background atmosphere, and overall composition as much as possible.
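The longer-side resize step boils down to simple aspect-ratio math: scale so the longer edge hits a target size, then snap both sides to a latent-friendly multiple. A minimal sketch of that math, assuming a 1024 px target and multiple-of-32 rounding (illustrative defaults, not values read from the actual graph):

```python
def resize_longer_side(width, height, target_long=1024, multiple=32):
    """Scale so the longer side reaches target_long, then round both
    sides down to the nearest multiple so the latent grid divides evenly."""
    scale = target_long / max(width, height)
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h

# A 1920x1080 input lands on 1024x576, preserving the 16:9 framing.
print(resize_longer_side(1920, 1080))  # (1024, 576)
```

Rounding to a multiple matters because the VAE downsamples spatially; odd sizes would otherwise be truncated inside the model.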
The audio side is also important. The workflow uses LTXVAudioVAEEncode to convert the input audio into an audio latent, then combines the audio latent with the video latent through LTXVConcatAVLatent. This makes the pipeline suitable for digital human talking videos, voice-driven character clips, AI presenter videos, virtual influencer content, dialogue scenes, and social media short videos where both image and audio need to work together.
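Conceptually, concatenating an audio latent with a video latent means aligning the audio features per frame and stacking them onto the channel axis so the sampler sees both modalities together. The real LTXVConcatAVLatent node's layout is internal to the LTX implementation; the sketch below only illustrates the idea with assumed shapes:

```python
import numpy as np

def concat_av_latent(video_lat, audio_lat):
    """Illustrative only. video_lat: (Cv, T, H, W); audio_lat: (Ca, T).
    Broadcast each frame's audio vector over H x W, then stack on the
    channel axis so every spatial location carries its frame's audio."""
    Cv, T, H, W = video_lat.shape
    Ca, Ta = audio_lat.shape
    assert T == Ta, "audio latent must align with the video frame count"
    audio_maps = np.broadcast_to(audio_lat[:, :, None, None], (Ca, T, H, W))
    return np.concatenate([video_lat, audio_maps], axis=0)

out = concat_av_latent(np.zeros((16, 8, 4, 4)), np.ones((4, 8)))
print(out.shape)  # (20, 8, 4, 4)
```

The key point is the frame-count alignment check: if the encoded audio does not match the number of video latent frames, lip sync drifts.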
A key feature of this workflow is the LTX 2.3 IC LoRA motion-track control route. The LoRA helps guide motion behavior and supports more controlled character movement. This is useful for reducing random motion drift and making the generated performance feel more connected to the source image and the audio rhythm.
The three-stage refinement structure is the main upgrade. The first generation stage builds the base talking video. Later stages use LTXVLatentUpsampler, repeated LTXVImgToVideoConditionOnly guidance, noise-mask handling, and additional sampler passes to refine motion and improve visual texture. This makes the final result less like a rough preview and more like a polished output suitable for publishing.
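The refine stages rest on two small operations: enlarging the latent so the next sampler pass adds detail at higher resolution, and masking where fresh noise is injected so only selected regions get re-resolved. A toy sketch of both, assuming a (C, T, H, W) latent layout and nearest-neighbor upscaling (the actual LTXVLatentUpsampler is learned, not nearest-neighbor):

```python
import numpy as np

def upsample_latent_2x(lat):
    """Nearest-neighbor 2x spatial upsample of a (C, T, H, W) latent.
    Stands in for a learned upsampler purely to show the shape change."""
    return lat.repeat(2, axis=2).repeat(2, axis=3)

def apply_noise_mask(lat, mask, noise, strength=0.6):
    """Blend fresh noise into masked regions so a later sampler pass
    re-resolves only where mask is high; mask 0 keeps the latent as-is."""
    return lat * (1 - mask * strength) + noise * (mask * strength)
```

With a zero mask the latent passes through untouched, which is why earlier-stage identity and composition survive the refinement passes.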
The workflow also uses manual sigma schedules, CFG guider nodes, custom advanced samplers, VAEDecodeTiled, LTXVAudioVAEDecode, CreateVideo, and SaveVideo. These modules make the graph closer to a complete video production system: generate, refine, decode, combine audio, and export.
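A manual sigma schedule is just a descending list of noise levels handed to the sampler instead of a preset. One common shape is log-linear spacing between a maximum and minimum sigma; the values below are illustrative defaults, not the ones used in this graph:

```python
import math

def log_linear_sigmas(sigma_max=14.6, sigma_min=0.03, steps=8):
    """Descending noise schedule, evenly spaced in log space, with the
    terminal 0.0 most sampler nodes expect at the end of the list."""
    lo, hi = math.log(sigma_min), math.log(sigma_max)
    sigmas = [math.exp(hi + (lo - hi) * i / (steps - 1)) for i in range(steps)]
    return sigmas + [0.0]
```

Writing the schedule out manually lets each refinement stage spend its steps in a different noise band, e.g. the base stage covering high sigmas and the refine stages only the low ones.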
This workflow is ideal for AI digital humans, character voice videos, talking-avatar demos, virtual presenter clips, anime character dialogue, product narration videos, and Civitai / RunningHub showcase examples. If you want to see how the input image, audio encoding, LTX 2.3 motion control, and three-stage refinement pipeline work together, watch the full tutorial from the YouTube link above.
⚙️ Try the Workflow Online
👉 Workflow: https://www.runninghub.ai/post/2050906015968837633/?inviteCode=rh-v1111
Open the link above to run the workflow directly online and view the generation results in real time.
If the results meet your expectations, you can also deploy it locally for further customization.
🎁 Fan Benefits: Register now to get 1000 points, plus 100 daily login points — enjoy RTX 4090-class performance with 48 GB of VRAM!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1uhRyBGEFi/
I will continue updating model resources on Quark Drive:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly prepared for local users, making creation and learning more convenient.
