LTX 2.3 Four-Image Reference Audio-Driven Video Workflow

This workflow is designed for LTX 2.3 four-image reference audio-driven video generation. It combines multiple visual references with audio-aware video latent routing, making it suitable for creators who want a controlled cinematic video rather than an unpredictable text-only result. Its main purpose is to use four reference images as visual anchors, then guide generation with prompt structure, temporal motion planning, and audio-related conditioning so that the final output is more coherent, more rhythmic, and closer to production-ready.

The workflow is built around the LTX 2.3 video generation system, using LTX video and audio latent components, LTX VAE decoding, Gemma-style text conditioning, custom sampler routes, manual sigma control, and final video export. Compared with a simple image-to-video workflow, this setup is more advanced because it does not depend on only one image. It allows the user to provide multiple reference images that can define different parts of the final video: character identity, product appearance, clothing or pose, background atmosphere, color tone, camera style, and visual direction.

The four-image reference structure is the most important visual control layer. In practical use, Image 1 can define the main subject, Image 2 can provide the product or object, Image 3 can guide the scene or environment, and Image 4 can provide the final mood, style, or lighting reference. This gives LTX 2.3 more visual information to work with, reducing the chance of identity drift, unstable product appearance, or inconsistent scene design. For product videos, AI influencer clips, fashion showcases, beauty ads, music-video style shots, and short-form commercial content, this kind of multi-reference structure is much more useful than single-image generation.
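As a rough illustration of the slot assignment described above, the sketch below pairs four image paths with the suggested roles. The role names, the `ReferenceImage` class, and `build_reference_set` are hypothetical helpers invented for this example; they are not part of the workflow graph or of any LTX node.

```python
from dataclasses import dataclass

# Hypothetical role labels, taken from the description above.
# The workflow itself does not define these identifiers.
ROLES = ("subject", "product", "scene", "mood")


@dataclass(frozen=True)
class ReferenceImage:
    slot: int   # 1-based slot number (Image 1 .. Image 4)
    role: str   # what this reference is meant to control
    path: str   # placeholder file path supplied by the user


def build_reference_set(paths):
    """Pair exactly four image paths with the suggested roles, in slot order."""
    if len(paths) != len(ROLES):
        raise ValueError(
            f"expected {len(ROLES)} reference images, got {len(paths)}"
        )
    return [
        ReferenceImage(slot=index + 1, role=role, path=path)
        for index, (role, path) in enumerate(zip(ROLES, paths))
    ]


if __name__ == "__main__":
    refs = build_reference_set(
        ["model.png", "bottle.png", "studio.png", "lighting.png"]
    )
    for ref in refs:
        print(f"Image {ref.slot}: {ref.role} <- {ref.path}")
```

The roles are only a suggested convention; the same four slots could be assigned differently depending on the shot.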

The audio-driven part makes this version different from the normal four-image reference workflow. The graph includes audio VAE routing, audio latent connection, LTXVConcatAVLatent, LTXVSeparateAVLatent, and LTXVAudioVAEDecode-style processing, allowing the video pipeline to carry audio information through the generation and export process. This makes the workflow suitable for videos where rhythm, performance, presentation timing, music atmosphere, or spoken content matters. It is not just a silent image animation pipeline; it is structured for video output with audio-aware handling.
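The routing idea can be pictured as a round trip: the video and audio latents are bundled so they travel through sampling together, then split again so each can go to its own decoder. The sketch below is purely conceptual. `AVLatent`, `concat_av`, and `separate_av` are hypothetical stand-ins and do not reproduce how the LTXVConcatAVLatent or LTXVSeparateAVLatent nodes are actually implemented.

```python
from dataclasses import dataclass


@dataclass
class AVLatent:
    """Hypothetical container that carries both latents side by side."""
    video: list  # stand-in for the video latent tensor
    audio: list  # stand-in for the audio latent tensor


def concat_av(video, audio):
    """Bundle the two latents so later stages receive them as one object."""
    return AVLatent(video=video, audio=audio)


def separate_av(av):
    """Split the bundle so each latent can be decoded by its own VAE."""
    return av.video, av.audio


if __name__ == "__main__":
    combined = concat_av(video=[0.1, 0.2, 0.3], audio=[0.9, 0.8])
    video_latent, audio_latent = separate_av(combined)
    print(len(video_latent), len(audio_latent))
```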

The workflow also includes NAG enhancement, multiple sampling stages, latent upscaling, tiled VAE decoding, and final video creation. These stages help improve visual stability, reduce drifting, refine detail, and make the final output more suitable for publishing. The workflow can generate a first controlled video pass, refine it through later sampling stages, upscale or enhance the latent result, decode the frames, and combine them into a final video output.
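The stage order described above can be sketched as a simple sequential pipeline. The stage names and the `run_pipeline` helper are assumptions made for illustration; each stage here only records that it ran, whereas the real graph transforms latents and frames at each step.

```python
# Hypothetical stage names, following the order given in the description above.
STAGE_ORDER = [
    "first_pass",      # first controlled video pass
    "refine",          # later sampling stages
    "latent_upscale",  # upscale or enhance the latent result
    "tiled_decode",    # tiled VAE decoding into frames
    "video_export",    # combine frames into the final video
]


def make_stage(name):
    """Return a placeholder stage that logs its name into the state."""
    def stage(state):
        # A real stage would modify latents or frames; this one only logs.
        return {**state, "history": state["history"] + [name]}
    return stage


def run_pipeline(stage_names=STAGE_ORDER):
    """Run the placeholder stages in order and return the final state."""
    state = {"history": []}
    for name in stage_names:
        state = make_stage(name)(state)
    return state


if __name__ == "__main__":
    print(" -> ".join(run_pipeline()["history"]))
```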

This workflow is especially useful for AI product advertising, beauty product showcases, character-driven video ads, music-driven AI clips, cinematic image-to-video demonstrations, Civitai workflow previews, RunningHub online demos, YouTube tutorials, and Bilibili content production. If you want to see how the four reference images are connected, how the audio route is handled, and how LTX 2.3 produces the final audio-driven cinematic result, watch the full tutorial from the YouTube link above.

⚙️ Try the Workflow Online

👉 Workflow: https://www.runninghub.ai/post/2052700211776110593/?inviteCode=rh-v1111

Open the link above to run the workflow directly online and view the generation results in real time.

If the results meet your expectations, you can also deploy it locally for further customization.

🎁 Fan Benefits: Register now to get 1000 points, plus 100 daily login points, and enjoy 4090-level performance and 48 GB of powerful compute!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.

📺 Bilibili Video: https://www.bilibili.com/video/BV1DERQBeEm1/

I will continue updating model resources on Quark Drive:

👉 https://pan.quark.cn/s/20c6f6f8d87b

These resources are mainly prepared for local users, making creation and learning more convenient.

⚙️ Try the Workflow Online