This workflow is designed for LTX 2.3 + VBVR digital human video generation. Its main purpose is to take a still character image and an audio input, then generate a stable talking-avatar video in which the subject keeps the same identity, camera framing, facial structure, clothing, and overall scene composition while speaking naturally.
The workflow is built around an LTX 2.3 video generation route with VBVR I2V enhancement, LTX audio/video latent routing, Gemma-style text encoding, the LTX video VAE, the LTX audio VAE, NAG enhancement, IC LoRA motion-track control, spatial latent upscaling, manual sigma sampling, tiled decoding, and final video export. This makes it better suited to digital human production than a basic image-to-video workflow, because it does not just animate a still image: it also preserves the person's appearance while giving the face and body controlled speaking motion.
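As a rough mental model, the stages run in the order below. This is a plain-data outline written from the description above; the node names are paraphrases, not the exact node titles used in the workflow.

```python
# Runnable outline of the pipeline order (plain data, no ComfyUI imports).
# Node names are paraphrased from the workflow description, not exact titles.

PIPELINE = [
    ("text_encode",    "Gemma-style text encoder",        "prompt -> text conditioning"),
    ("image_encode",   "LTX video VAE (encode)",          "still image -> identity latent"),
    ("audio_encode",   "LTX audio VAE (encode)",          "speech audio -> audio latent"),
    ("routing",        "LTX audio/video latent routing",  "merge latents + conditioning"),
    ("loras",          "VBVR I2V + IC motion-track LoRA", "consistency + motion control"),
    ("guidance",       "NAG enhancement",                 "steadier sampling guidance"),
    ("sample_stage_1", "manual-sigma sampler",            "base speaking-video latent"),
    ("upscale",        "spatial latent upscale",          "higher-resolution latent"),
    ("sample_stage_2", "manual-sigma sampler (refine)",   "detail refinement"),
    ("decode",         "LTX video VAE (tiled decode)",    "latent -> frames -> video export"),
]

for key, node, role in PIPELINE:
    print(f"{key:15s} {node:35s} {role}")
```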
The core prompt in the uploaded workflow is very direct: keep the camera unchanged and make the woman speak. This simple instruction reflects the workflow's design philosophy. Digital human generation often fails when the model adds unnecessary movement, changes the camera angle, zooms out, alters the face, modifies the clothes, or produces unstable body motion. This workflow is optimized around a more conservative, production-oriented goal: hold the shot steady, keep the identity stable, and focus the motion on a natural speaking performance.
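If you adapt the workflow to your own character image, a prompt in the same conservative spirit might look like the following. The wording is illustrative; the uploaded workflow's actual prompt is shorter and more direct.

```text
The camera stays completely still. The woman keeps the same pose, outfit,
and framing, and speaks naturally to the camera with subtle head movement
and natural facial expressions.
```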
The VBVR I2V LoRA route helps strengthen image-to-video consistency. For talking-avatar workflows, this is important because the first image is the identity anchor. The output should not become a different person after a few seconds. The face, hair, outfit, background, and body layout need to remain consistent across the generated frames. VBVR guidance helps reduce visual drift and makes the result more usable for repeatable digital human creation.
The workflow also includes an audio latent path. Audio is encoded through the LTX audio VAE and combined with the video latent structure, allowing the pipeline to handle audio-aware video generation instead of silent image animation. This makes the workflow suitable for AI presenters, talking portraits, virtual hosts, product explainers, education clips, short-form narration videos, character dialogue tests, and social media avatar content.
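Conceptually, the audio VAE compresses the waveform into a latent sequence whose timeline lines up with the video latent's frame axis, so the sampler can condition facial motion on the speech at every step. A minimal shape-level sketch follows, using assumed tensor layouts rather than LTX's actual ones:

```python
import torch

# Assumed layouts for illustration only:
#   video latent: (batch, channels, latent_frames, height, width)
#   audio latent: (batch, channels, latent_frames)
video_latent = torch.randn(1, 128, 24, 32, 32)
audio_latent = torch.randn(1, 64, 24)

# The key invariant: both latents cover the same latent-frame timeline,
# so each denoising step sees the audio that belongs to that moment.
assert audio_latent.shape[-1] == video_latent.shape[2]
```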
NAG enhancement and IC LoRA motion-track control are also key parts of this setup. NAG helps guide the model more steadily during sampling, while the motion-control LoRA helps reduce random animation and make the character movement more controlled. This matters because digital human videos need restrained motion: subtle head movement, natural facial expression, soft mouth movement, stable eyes, and minimal body drift.
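For intuition, NAG (Normalized Attention Guidance) applies the positive/negative extrapolation in attention-feature space and then bounds how far the guided features may drift, which is what keeps sampling steady. The sketch below is a simplified paraphrase of that idea; the clamp scheme and the tau value are assumptions, not the published formulation.

```python
import torch

def nag_sketch(attn_pos, attn_neg, guidance_scale=2.0, tau=2.5):
    # Extrapolate away from the negative branch (CFG-style), but on
    # attention features rather than the final noise prediction.
    guided = attn_pos + guidance_scale * (attn_pos - attn_neg)
    # Cap how far the guided features may drift from the positive branch.
    # The ratio clamp and tau=2.5 are illustrative assumptions.
    ratio = guided.norm(dim=-1, keepdim=True) / (attn_pos.norm(dim=-1, keepdim=True) + 1e-6)
    return guided * (tau / ratio).clamp(max=1.0)
```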
The workflow uses multiple sampler stages and latent upscaling to improve the final output. The first stage builds the base speaking video, while later stages refine the latent result and improve image quality before export. This helps the final video look less like a rough test and more like a usable result for Civitai previews, RunningHub demos, YouTube tutorials, Bilibili showcases, and practical AI avatar production.
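The manual sigma schedules are what split the work between stages: the first pass runs the full denoising range to build the speaking motion, while the refinement pass after the latent upscale only re-runs the low-sigma tail, so it sharpens detail without re-inventing the motion. The values below are illustrative, not the workflow's actual schedules:

```python
# Illustrative manual sigma schedules (values are assumptions):
base_sigmas   = [14.6, 7.0, 3.0, 1.5, 0.7, 0.35, 0.15, 0.0]  # stage 1: full range
refine_sigmas = [0.7, 0.45, 0.25, 0.1, 0.0]                   # stage 2: low-sigma tail

# Because the refine pass starts at a low sigma, the second stage polishes
# texture and faces on the upscaled latent instead of changing the motion.
```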
This workflow is ideal for creators who want a simple but stronger LTX 2.3 digital human pipeline with stable framing, audio-driven speaking behavior, and improved visual consistency through VBVR. If you want to see how the image input, audio route, VBVR guidance, NAG, motion-control LoRA, and final video export work together, watch the full tutorial from the YouTube link above.
⚙️ Try the Workflow Online
👉 Workflow: https://www.runninghub.ai/post/2045068946071621633/?inviteCode=rh-v1111
Open the link above to run the workflow directly online and view the generation results in real time.
If the results meet your expectations, you can also deploy it locally for further customization.
🎁 Fan Benefits: Register now to get 1000 points, plus 100 daily login points — enjoy 4090-level performance and 48 GB of powerful compute!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV17Td5BgETn/
I will continue updating model resources on Quark Drive:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly prepared for local users, making creation and learning more convenient.
