LTX 2.3 Text-to-Video 3.1 Three-Stage HD Refinement Workflow

Watch the full video first if you want to understand how this LTX 2.3 text-to-video 3.1 workflow works in practice. The video explains the three-stage rendering logic, the high-resolution refinement route, the universal negative prompt system, and how to run the workflow online without rebuilding a complex local ComfyUI environment.

This ComfyUI workflow is designed for LTX 2.3 text-to-video generation with a three-stage high-definition refinement pipeline. Its main purpose is to turn a text prompt into a cleaner, more stable, and more detailed video result by separating generation into clear stages: initial composition, latent-space upscaling, and final HD refinement.

The workflow is built around ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors as the main video checkpoint. It also uses the Gemma3 fp8 text encoder, LTX Audio VAE, LTXVConditioning, LTX2_NAG, Seed Everywhere, ManualSigmas, CFGGuider, SamplerCustomAdvanced, LTXVLatentUpsampler, LTXVImgToVideoConditionOnly, VAEDecodeTiled, CreateVideo, and SaveVideo. The structure is designed for practical video production rather than a simple one-pass test graph.
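Before running the graph locally, it can help to confirm that the nodes named above are actually present in the exported workflow file. The following is a minimal sketch, not part of the published workflow. It assumes the file is a standard ComfyUI export (either the editor format with a `nodes` list, or the API format keyed by node id). The names in `REQUIRED_NODE_TYPES` are copied from the description above, so the exact type strings stored in the JSON may differ; "Seed Everywhere" is left out because its internal type name is not given here.

```python
import json

# Node names taken from the description; the identifiers stored in the
# JSON file may differ (assumption).
REQUIRED_NODE_TYPES = {
    "LTXVConditioning", "LTX2_NAG", "ManualSigmas", "CFGGuider",
    "SamplerCustomAdvanced", "LTXVLatentUpsampler",
    "LTXVImgToVideoConditionOnly", "VAEDecodeTiled",
    "CreateVideo", "SaveVideo",
}


def load_workflow(path):
    """Read a workflow JSON file exported from ComfyUI."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def node_types_in(workflow):
    """Collect node type names from either export format."""
    nodes = workflow.get("nodes")
    if isinstance(nodes, list):
        # Editor export: {"nodes": [{"type": "..."}, ...]}
        return {n["type"] for n in nodes if isinstance(n, dict) and "type" in n}
    # API export: {"<id>": {"class_type": "..."}, ...}
    return {
        n["class_type"]
        for n in workflow.values()
        if isinstance(n, dict) and "class_type" in n
    }


def missing_node_types(workflow, required=REQUIRED_NODE_TYPES):
    """Return the required node types that the workflow does not contain."""
    return sorted(set(required) - node_types_in(workflow))
```

A call such as `missing_node_types(load_workflow("workflow.json"))` (the file name is hypothetical) returns an empty list when every listed node is found.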

The first stage focuses on initial composition. It uses an empty LTX video latent, empty audio latent, frame-rate conditioning, random noise, manual sigma control, and a dedicated sampler route to establish the main scene, motion foundation, camera behavior, lighting, and subject direction. This stage is where the video gains its basic identity.

The second stage performs latent-space upscaling. After the first stage is generated, the workflow separates the video and audio latents, sends the video latent through the LTX 2.3 spatial upscaler, and then recombines it with the audio latent. This gives the workflow a stronger intermediate structure before the final polish stage. Compared with generating everything at full quality from the beginning, this staged route is more controllable and more efficient.
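The separate, upscale, recombine route can be pictured with a shape-only sketch. It does not call any ComfyUI or LTX code: the `VideoLatent` and `AVLatent` classes, the `audio_steps` field, and the 2x factor are illustrative assumptions, not values read from the workflow.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VideoLatent:
    frames: int
    height: int
    width: int


@dataclass(frozen=True)
class AVLatent:
    video: VideoLatent
    audio_steps: int  # hypothetical stand-in for the audio latent


def separate(av):
    """Split the combined latent into its video and audio parts."""
    return av.video, av.audio_steps


def upscale_spatial(video, factor=2):
    """Scale height and width only; the frame count is left unchanged."""
    return replace(video, height=video.height * factor, width=video.width * factor)


def recombine(video, audio_steps):
    """Pair the upscaled video latent with the untouched audio latent."""
    return AVLatent(video=video, audio_steps=audio_steps)


def stage_two(av, factor=2):
    video, audio = separate(av)
    return recombine(upscale_spatial(video, factor), audio)
```

The point of the sketch is that only the video branch changes size. The audio latent passes through untouched, which is why the two are split before the upscaler and joined again afterwards.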

The third stage performs HD refinement. It uses another controlled sampling pass with its own sampler, sigma schedule, noise seed, guidance route, and conditioning logic. This helps improve sharpness, texture, visual coherence, and final image quality. The workflow also includes tiled VAE decoding for staged previews and final output, reducing pressure during high-resolution decoding.
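Because each sampling stage carries its own sigma schedule, a small validator can catch typos before a long render. This is a hedged sketch: it assumes a schedule is written as a comma-separated string of descending values, and the example numbers in the usage note are made up for illustration, not taken from the workflow.

```python
def parse_sigmas(text):
    """Parse a comma-separated sigma string and check it is descending."""
    values = [float(part) for part in text.split(",") if part.strip()]
    if len(values) < 2:
        raise ValueError("a schedule needs at least two sigma values")
    if any(value < 0 for value in values):
        raise ValueError("sigma values must not be negative")
    if any(later >= earlier for earlier, later in zip(values, values[1:])):
        raise ValueError("sigma values must be strictly decreasing")
    return values


def step_count(sigmas):
    """Each adjacent pair of sigmas corresponds to one sampling step."""
    return len(sigmas) - 1
```

For example, `parse_sigmas("1.0, 0.75, 0.5, 0.25, 0.0")` describes a hypothetical four-step pass from full noise. A refinement pass would usually begin at a lower sigma so that the structure built in earlier stages is preserved rather than regenerated.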

A major strength of this workflow is its stability system. The graph includes LTX2_NAG for universal negative guidance and a KSK-style universal negative prompt designed to suppress flicker, frame jitter, identity drift, broken anatomy, subtitles, captions, logos, watermarks, bad lip movement, unwanted audio artifacts, and random text. It also includes optional 10-second likeness and anchor modules, which can help preserve visual consistency when a reference is used.
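When the negative prompt needs to be extended, keeping it as a list of terms makes it easier to add entries without duplicating them. The sketch below is an assumption about how one might manage such a list. The base terms are the artifact types named above, not the actual KSK negative prompt text, and `build_negative_prompt` is a hypothetical helper.

```python
# Artifact types named in the description; not the workflow's real prompt.
BASE_NEGATIVE_TERMS = [
    "flicker", "frame jitter", "identity drift", "broken anatomy",
    "subtitles", "captions", "logos", "watermarks",
    "bad lip movement", "unwanted audio artifacts", "random text",
]


def build_negative_prompt(base=BASE_NEGATIVE_TERMS, extra=()):
    """Join terms into one comma-separated prompt, dropping duplicates.

    Comparison is case-insensitive and the first spelling seen is kept.
    """
    seen = set()
    ordered = []
    for term in [*base, *extra]:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            ordered.append(cleaned)
    return ", ".join(ordered)
```

Calling `build_negative_prompt(extra=["motion blur"])` appends the new term to the base list, while repeating a term that is already present leaves the prompt unchanged.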

Compared with ordinary LTX text-to-video workflows, this 3.1 version is more production-oriented. A basic T2V graph may generate motion quickly, but it often struggles with detail, consistency, and final polish. This workflow uses staged sampling, latent upscaling, NAG guidance, universal negative control, preview outputs, and final HD refinement to make the result easier to publish, compare, and reuse.

Main features:

LTX 2.3 text-to-video 3.1 workflow
Three-stage rendering structure
Initial composition, latent upscaling, and HD refinement
LTX 2.3 distilled 1.1 checkpoint route
Gemma3 fp8 text encoder
LTX Audio VAE support
LTXVConditioning at controlled frame rate
LTX2_NAG universal negative guidance
KSK universal negative prompt system
ManualSigmas and SamplerCustomAdvanced control
LTXVLatentUpsampler high-resolution transition
Optional likeness / anchor consistency modules
VAEDecodeTiled staged previews
CreateVideo and SaveVideo output for each stage

Suggested workflow:

Start with a clear text prompt. Define the subject, action, environment, camera movement, lighting, mood, and final visual style.
Run the first stage on its own and check whether the composition and motion direction are correct. If the first stage is weak, adjust the prompt before moving forward.
After the base motion is stable, continue into the second-stage latent upscaling route. Use the second preview to check whether the structure improves without drifting.
Then run the third HD refinement stage for final polish.
If the video shows flicker, unwanted text, watermark-like artifacts, or unstable identity, strengthen the negative prompt and simplify the positive prompt.
Use this workflow when you want a cleaner LTX 2.3 text-to-video result instead of a quick draft.
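The review loop described above can be written down as a small decision helper. This is only a sketch of the advice in this section: `next_action` and its return strings are hypothetical. The fallback for a weak second or third stage is an assumption, since the text only gives an explicit remedy for the first stage.

```python
# Symptoms for which the text recommends changing the prompts.
STABILITY_SYMPTOMS = {
    "flicker", "unwanted text", "watermark-like artifacts", "unstable identity",
}
LAST_STAGE = 3


def next_action(stage, looks_good, symptoms=()):
    """Suggest what to do after reviewing the preview of a stage (1 to 3)."""
    if STABILITY_SYMPTOMS.intersection(symptoms):
        return "strengthen the negative prompt and simplify the positive prompt"
    if not looks_good:
        if stage == 1:
            return "adjust the prompt and rerun stage 1"
        # Assumption: no explicit fallback is given for later stages.
        return "go back and review the previous stage"
    if stage < LAST_STAGE:
        return f"continue to stage {stage + 1}"
    return "publish or compare the final render"
```

For instance, `next_action(1, looks_good=False)` suggests fixing the prompt before any time is spent on upscaling or refinement.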

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2061480014171955202?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1,000 points, plus 100 points for each daily login, and enjoy 4090-class performance with 48 GB of memory.

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1nVVr6QEd8/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep the model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.

Files

ltx23TextToVideo31Three_v10.zip
