This workflow is designed for VBVR digital human video generation, focusing on stable talking-avatar animation from a single reference image and an audio input. Its main purpose is to help creators generate a more controlled digital human result where the character keeps the same face, framing, camera angle, clothing, and background while performing natural speaking motion.
The workflow is built around an LTX 2.3 video generation pipeline with VBVR I2V LoRA enhancement, LTX audio/video latent routing, Gemma-style text encoding, LTX video VAE, LTX audio VAE, NAG enhancement, IC LoRA motion-track control, spatial latent upscaling, multi-stage sampling, tiled decoding, and final video export. Compared with a basic image-to-video workflow, this setup is more suitable for talking-avatar production because it combines visual anchoring, audio routing, and controlled motion guidance in one graph.
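To make the graph ordering easier to follow, here is a minimal plain-Python sketch of the stage sequence described above. All stage names are illustrative placeholders for ComfyUI nodes, not real node APIs; the real workflow is a node graph, not a script:

```python
# Illustrative sketch of the stage ordering described above.
# Every name here is a placeholder; the real workflow is a ComfyUI node graph.

def build_pipeline_stages():
    """Return the stages of the graph in execution order."""
    return [
        "load_reference_image",        # single image that anchors identity
        "encode_prompt_gemma",         # Gemma-style text encoding
        "encode_audio_latent",         # LTX audio VAE encode
        "apply_vbvr_i2v_lora",         # VBVR consistency / motion booster
        "apply_ic_lora_motion_track",  # restrained motion guidance
        "sample_base_stage",           # first LTX 2.3 sampler pass
        "upscale_latent_spatial",      # spatial latent upscaling
        "sample_refine_stage",         # additional sampler passes
        "decode_video_tiled",          # tiled VAE decode
        "decode_audio",                # audio VAE decode
        "mux_and_export_video",        # final audio + video export
    ]

if __name__ == "__main__":
    for step in build_pipeline_stages():
        print(step)
```

The key ordering constraint is that the spatial latent upscale sits between the base sampling pass and the refinement passes, so detail is added at the higher resolution rather than invented from scratch.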
The core idea is simple but important: keep the shot stable and make the woman speak. In digital human generation, the biggest problem is often not whether the image can move, but whether it moves too much. A weak workflow may change the face, zoom the camera out, alter the clothing, deform the mouth, create extra hands, or shift the background. This workflow is designed to reduce those problems by keeping the camera and scene steady while concentrating motion on the face, mouth, head, and subtle body performance.
VBVR is used here as an image-to-video consistency and motion-control booster. It helps the model follow the source image more closely and reduces random drift during generation. This is especially important for digital human videos because the first frame usually defines the person’s identity. If the generated video loses that identity after a few seconds, the result becomes unusable for avatar content, product narration, AI presenters, or character dialogue.
The workflow also includes an audio latent route. The audio is encoded through LTXVAudioVAEEncode, connected into the audio/video latent structure, and later separated and decoded for final output. This makes the workflow more than a silent animation setup. It is designed for speaking-person videos where the final result needs both visual motion and usable audio-video export.
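The audio route can be summarized as encode, attach, sample, separate, decode. The following hedged sketch uses plain dicts as stand-ins for latent tensors; the function names are illustrative and do not match the actual ComfyUI node signatures (only LTXVAudioVAEEncode is named in the workflow itself):

```python
# Hedged sketch of the audio latent route. Dicts stand in for real tensors;
# function names are illustrative, not actual ComfyUI node signatures.

def encode_audio(waveform):
    # Stand-in for the audio VAE encode step: waveform -> audio latent
    return {"kind": "audio_latent", "data": waveform}

def attach_audio(video_latent, audio_latent):
    # Route the audio latent into the combined audio/video latent structure
    return {"video": video_latent, "audio": audio_latent}

def separate_and_decode(av_latent):
    # After sampling, the audio track is separated and decoded so the
    # exported file carries both the visual motion and the speech.
    video = av_latent["video"]          # goes to the video VAE decoder
    audio = av_latent["audio"]["data"]  # goes to the audio VAE decoder
    return video, audio

waveform = [0.0, 0.1, -0.1]            # placeholder audio samples
av = attach_audio({"kind": "video_latent"}, encode_audio(waveform))
video, audio = separate_and_decode(av)
```

The design point is simply that audio travels through the graph as a latent alongside the video, rather than being muxed in afterward from a separate file.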
Another important part is the use of NAG and IC LoRA motion-track control. NAG helps stabilize generation guidance, while the motion-control LoRA helps reduce uncontrolled movement. Together, they make the video more suitable for restrained digital human performance: stable eyes, soft head movement, natural mouth motion, minimal body drift, and consistent framing.
The pipeline uses several sampling and refinement stages. The first stage builds the base talking video from the image, prompt, and audio latent. Later stages use latent upscaling and additional sampler passes to improve texture, detail, and final quality. This helps the output look more polished for Civitai previews, RunningHub demos, YouTube tutorials, Bilibili showcases, and social media publishing.
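The staged refinement idea can be sketched as a base pass at a lower resolution, a spatial latent upscale, then a refinement pass. The resolutions and the 2x factor below are assumptions for illustration, not values taken from the graph:

```python
# Minimal sketch of staged sampling: base pass -> spatial latent upscale ->
# refinement pass. Resolutions and the 2x factor are illustrative assumptions.

def sample(latent, stage):
    # Placeholder for an LTX sampler pass; tags the latent with the stage name.
    return {**latent, "last_stage": stage}

def upscale_latent(latent, factor=2):
    # Placeholder for the spatial latent upscale node.
    return {**latent,
            "width": latent["width"] * factor,
            "height": latent["height"] * factor}

base = sample({"width": 384, "height": 640}, "base")   # draft talking video
up = upscale_latent(base)                              # spatial latent upscale
final = sample(up, "refine")                           # texture / detail pass

print(final["width"], final["height"], final["last_stage"])  # 768 1280 refine
```

Refining at the upscaled resolution is what lets the later passes add texture and detail instead of redoing the motion from scratch.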
This workflow is ideal for AI presenters, talking avatars, virtual hosts, product explainers, character narration, short-form dialogue videos, and digital human testing with LTX / VBVR. If you want to see how VBVR, audio conditioning, LTX 2.3 staged sampling, NAG guidance, and motion-control LoRA work together, watch the full tutorial from the YouTube link above.
⚙️ Try the Workflow Online
👉 Workflow: https://www.runninghub.ai/post/2043983796604768258/?inviteCode=rh-v1111
Open the link above to run the workflow directly online and view the generation results in real time.
If the results meet your expectations, you can also deploy it locally for further customization.
🎁 Fan Benefits: Register now to get 1000 points, plus 100 daily login points — enjoy 4090-level performance and 48 GB of powerful compute!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1PQQuBcEd5/
I will continue updating model resources on Quark Drive:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly prepared for local users, making creation and learning more convenient.
