LTX2.3 3.5 Three-Stage HD Text-to-Video Workflow

Watch the full video first if you want to understand how this LTX2.3 3.5 three-stage text-to-video workflow works in practice. The video shows how a text prompt can be expanded into a cleaner and higher-definition video through staged generation, latent upscaling, internal frame conditioning, and final tiled decoding.

This ComfyUI workflow is designed for LTX2.3 3.5 text-to-video generation with a three-stage high-definition refinement structure. Its main purpose is to turn a text prompt into a more stable video output than a simple one-pass generation route. Instead of generating once and stopping, the workflow builds a base video first, then uses later stages to extend, reconnect, upscale, and refine the result.

The workflow is built around ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors as the main LTX2.3 video checkpoint. The text encoding route uses gemma_3_12B_it_fp8_e4m3fn.safetensors through the LTX AV text encoder loader. The workflow also loads the LTX audio VAE from the same distilled checkpoint, allowing the graph to handle audio-video latent structure even when the main creative task is text-to-video generation.

The prompt system uses LTXVConditioning to attach the video frame rate (24 fps) to both the positive and the negative prompt conditioning. A universal negative prompt is included to suppress common video problems such as low quality, blur, temporal flicker, frame jitter, ghosting, unstable geometry, identity drift, bad hands, unwanted subtitles, watermarks, text artifacts, audio distortion, and other video-specific failures. LTX2_NAG (Normalized Attention Guidance) adds a further negative guidance layer for structure and motion control.
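If you drive ComfyUI through its HTTP API, the conditioning step can be sketched as an API-format graph fragment. This is a minimal sketch and not an export of this workflow. The node IDs, the encoder node reference, and the use of the stock CLIPTextEncode node are assumptions, and the real graph may wire its Gemma encoder differently. LTX2_NAG is left out because its inputs are not described here.

```python
import json
from typing import Dict, List

# Abbreviated version of the universal negative prompt described above.
NEGATIVE_TERMS: List[str] = [
    "low quality", "blur", "temporal flicker", "frame jitter", "ghosting",
    "unstable geometry", "identity drift", "bad hands", "subtitles",
    "watermark", "text artifacts", "audio distortion",
]


def conditioning_fragment(positive_text: str, encoder_node: str = "1",
                          fps: float = 24.0) -> Dict[str, dict]:
    """Build an API-format fragment: two text encodes feeding LTXVConditioning.

    Node IDs "10"-"12" and the encoder node ID are hypothetical placeholders.
    Links are written as [source_node_id, output_index].
    """
    return {
        "10": {"class_type": "CLIPTextEncode",
               "inputs": {"text": positive_text, "clip": [encoder_node, 0]}},
        "11": {"class_type": "CLIPTextEncode",
               "inputs": {"text": ", ".join(NEGATIVE_TERMS), "clip": [encoder_node, 0]}},
        # Attaches the frame rate to both conditionings; keep it equal to the CreateVideo fps.
        "12": {"class_type": "LTXVConditioning",
               "inputs": {"positive": ["10", 0], "negative": ["11", 0], "frame_rate": fps}},
    }


fragment = conditioning_fragment("A red fox trots through fresh snow at dawn, slow tracking shot")
print(json.dumps(fragment, indent=2))
```

Keeping the frame rate as a single parameter makes it harder for the conditioning fps and the CreateVideo fps to drift apart.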

The generation stage is divided into three major passes. Stage 1 creates the base video structure. It uses RandomNoise, CFGGuider, KSamplerSelect, ManualSigmas, and SamplerCustomAdvanced to establish subject, camera direction, motion, scene logic, and temporal rhythm. This stage is the foundation of the entire output.
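Each stage takes a hand-written ManualSigmas schedule. The sketch below assumes the schedule is entered as a comma-separated list of sigma values. It parses such a string and checks that the values are strictly decreasing and non-negative. Both schedules are hypothetical illustrations, not the values used in this workflow. They show the typical pattern where a refinement stage starts at a lower sigma, so it only partially re-noises the incoming latent and refines it instead of regenerating it.

```python
from typing import List


def parse_manual_sigmas(text: str) -> List[float]:
    """Parse a comma-separated sigma string and check that it is a usable schedule."""
    sigmas = [float(s) for s in text.split(",") if s.strip()]
    if len(sigmas) < 2:
        raise ValueError("need at least two sigmas (one sampling step)")
    if any(b >= a for a, b in zip(sigmas, sigmas[1:])):
        raise ValueError("sigmas must be strictly decreasing")
    if sigmas[-1] < 0:
        raise ValueError("sigmas must be non-negative")
    return sigmas


# Hypothetical schedules: a full-noise base pass and a partial-noise refinement pass.
STAGE1 = "1.0, 0.99, 0.98, 0.97, 0.9, 0.72, 0.42, 0.0"
STAGE2 = "0.9, 0.72, 0.42, 0.0"

for name, text in [("stage 1", STAGE1), ("stage 2", STAGE2)]:
    sigmas = parse_manual_sigmas(text)
    print(f"{name}: {len(sigmas) - 1} steps, starting at sigma {sigmas[0]}")
```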

After Stage 1, LTXVSeparateAVLatent separates the audio and video latent components. The video latent can then be handled independently, while the audio latent is preserved and later recombined through LTXVConcatAVLatent. This makes the workflow more modular and helps the later refinement stages focus on visual quality.
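The separate, refine, and recombine pattern can be sketched with plain Python lists. Everything here is hypothetical. The AVLatent container does not mirror the real tensor layout, and the nearest-neighbour upscale only stands in for LTXVLatentUpsampler, which is a learned model. The point is that the video part can change shape while the audio part passes through untouched.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[float]]  # height x width grid; channels omitted for brevity


@dataclass
class AVLatent:
    """Hypothetical container for a joint audio-video latent."""
    video: List[Frame]
    audio: List[float]


def separate(av: AVLatent) -> Tuple[List[Frame], List[float]]:
    """Split the joint latent (the role LTXVSeparateAVLatent plays)."""
    return av.video, av.audio


def concat(video: List[Frame], audio: List[float]) -> AVLatent:
    """Recombine the two parts (the role LTXVConcatAVLatent plays)."""
    return AVLatent(video=video, audio=audio)


def upscale_nearest_2x(frame: Frame) -> Frame:
    """Stand-in for the learned latent upsampler: plain nearest-neighbour 2x."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


base = AVLatent(video=[[[0.1, 0.2], [0.3, 0.4]]], audio=[0.5, -0.5, 0.25])
video, audio = separate(base)
refined = concat([upscale_nearest_2x(f) for f in video], audio)
print(len(refined.video[0]), len(refined.video[0][0]))
```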

Stage 2 uses latent upscaling and internal image conditioning. The decoded Stage 1 result is used internally as an image-to-video condition, not as a user-uploaded image. This allows the workflow to continue from its own first-stage result and refine the video with better detail and structure. The second sampling stage uses its own ManualSigmas schedule and the euler_cfg_pp sampler.
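Latent upscaling changes the working resolution, so it helps to plan sizes per stage before rendering. This sketch rests on several assumptions. It takes the LTX-Video VAE convention of 32x spatial compression, 8x temporal compression, and frame counts of the form 8n + 1. It also assumes, as the feature list suggests, that only Stage 2 upscales and that the latent upsampler doubles width and height. The stage scales and example sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

SPATIAL_FACTOR = 32   # assumed VAE spatial compression (pixels per latent cell)
TEMPORAL_FACTOR = 8   # assumed VAE temporal compression


@dataclass
class StagePlan:
    name: str
    width: int
    height: int
    latent_width: int
    latent_height: int


def snap_frames(frames: int) -> int:
    """Round a frame count down to the nearest 8n + 1 value."""
    if frames < 1:
        raise ValueError("need at least one frame")
    return (frames - 1) // TEMPORAL_FACTOR * TEMPORAL_FACTOR + 1


def latent_frames(frames: int) -> int:
    """Number of latent frames for a (snapped) pixel frame count."""
    return (snap_frames(frames) - 1) // TEMPORAL_FACTOR + 1


def plan_stages(width: int, height: int,
                scales: Sequence[int] = (1, 2, 1)) -> List[StagePlan]:
    """Per-stage output size, assuming stage i multiplies the latent grid by scales[i]."""
    if width % SPATIAL_FACTOR or height % SPATIAL_FACTOR:
        raise ValueError(f"width and height must be multiples of {SPATIAL_FACTOR}")
    lw, lh = width // SPATIAL_FACTOR, height // SPATIAL_FACTOR
    plans = []
    for i, scale in enumerate(scales, start=1):
        lw, lh = lw * scale, lh * scale
        plans.append(StagePlan(f"Stage {i}", lw * SPATIAL_FACTOR, lh * SPATIAL_FACTOR, lw, lh))
    return plans


frames = snap_frames(121)
for p in plan_stages(640, 352):
    print(p.name, f"{p.width}x{p.height}",
          f"latent {p.latent_width}x{p.latent_height}x{latent_frames(frames)}")
print(f"{frames} frames at 24 fps = {frames / 24:.2f} s")
```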

Stage 3 repeats the same logic at a higher refinement level. It takes the decoded Stage 2 frames, re-encodes them as internal conditioning for the third pass, and runs another sampler pass to refine high-definition detail. This gives the final video a cleaner look than a single-stage LTX render.
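The stage-to-stage handoff can be sketched as a loop in which each stage's decoded output becomes the conditioning for the next one. The stage functions are hypothetical stand-ins that only label frames. In the real graph each step is a sampler pass followed by decoding and re-encoding. Keeping every stage's output also mirrors the per-stage preview videos.

```python
from typing import Callable, List, Optional

Frames = List[str]  # stand-in for decoded frames: one label per frame
Stage = Callable[[str, Optional[Frames]], Frames]


def run_stages(prompt: str, stages: List[Stage]) -> List[Frames]:
    """Run stages in order; each stage's decoded frames condition the next one."""
    previews: List[Frames] = []
    condition: Optional[Frames] = None  # Stage 1 starts from the prompt alone
    for stage in stages:
        frames = stage(prompt, condition)
        previews.append(frames)  # kept for per-stage preview videos
        condition = frames
    return previews


def make_stage(name: str, num_frames: int = 3) -> Stage:
    """Hypothetical stage that just tags frames so the data flow is visible."""
    def stage(prompt: str, condition: Optional[Frames]) -> Frames:
        if condition is None:
            return [f"{name}:{prompt}#{i}" for i in range(num_frames)]
        return [f"{name}<{frame}>" for frame in condition]
    return stage


previews = run_stages("fox", [make_stage("s1"), make_stage("s2"), make_stage("s3")])
print(previews[-1][0])
```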

The output section uses VAEDecodeTiled for memory-friendly decoding, LTXVAudioVAEDecode for audio latent decoding, CreateVideo for video assembly, and SaveVideo for final export. The workflow also includes stage-preview video outputs, which makes it easier to compare the first, second, and third render stages during testing.
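Tiled decoding keeps memory bounded by decoding overlapping patches instead of whole frames at once. The sketch below shows only how overlapping tile boxes can be laid out so that they cover a frame. The tile_size and overlap settings on VAEDecodeTiled play a similar role, but this is not its actual algorithm, and it leaves out temporal tiling and seam blending. The 512 and 64 defaults are hypothetical.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that cover [0, length) along one axis."""
    if tile <= overlap:
        raise ValueError("tile size must be larger than the overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the edge
    return starts


def tiles_2d(width: int, height: int, tile: int = 512,
             overlap: int = 64) -> List[Tuple[int, int, int, int]]:
    """(x0, y0, x1, y1) boxes for every tile of a width x height frame."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = tiles_2d(1280, 704)
print(len(boxes), boxes[:3])
```

Because the last tile is pushed flush against the edge, neighbouring tiles always overlap by at least the requested amount, which gives a blending step room to hide seams.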

Main features:

LTX2.3 3.5 text-to-video workflow
Three-stage high-definition rendering structure
ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors support
Gemma3 FP8 text encoder support
LTXVConditioning video-aware prompt conditioning
Universal negative prompt for video generation
LTX2_NAG negative guidance control
Stage 1 base video generation
Stage 2 latent upscale refinement
Stage 3 high-definition detail refinement
ManualSigmas schedules for each refinement stage
euler_ancestral_cfg_pp and euler_cfg_pp sampler routes
Audio-video latent separation and recombination
LTXVSeparateAVLatent processing
LTXVConcatAVLatent reconstruction
Internal image-to-video conditioning between stages
LTXVLatentUpsampler refinement route
VAEDecodeTiled memory-friendly decoding
LTXVAudioVAEDecode audio restoration
CreateVideo 24fps output
SaveVideo final export

Suggested workflow:

Write a clear text prompt first. Describe the subject, environment, camera movement, lighting, motion, atmosphere, and video style. Keep the prompt direct and avoid overloading it with conflicting actions. Use Stage 1 to check whether the scene direction, subject, and motion are correct. If Stage 1 is wrong, fix the prompt before relying on Stage 2 or Stage 3. Use the later stages when the base video is already acceptable and you want cleaner detail, stronger structure, and a more polished final render. If the final result becomes too sharp or unstable, simplify the prompt, reduce conflicting style terms, and keep the motion description more controlled.
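One way to follow this advice is to fill in each element separately and check what is missing before running Stage 1. The ShotPrompt class is a hypothetical helper, not part of the workflow. It simply joins the filled-in elements into one prompt string and lists the empty ones.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ShotPrompt:
    """Hypothetical helper: one field per element the prompt should describe."""
    subject: str = ""
    environment: str = ""
    camera: str = ""
    lighting: str = ""
    motion: str = ""
    atmosphere: str = ""
    style: str = ""

    def missing(self) -> List[str]:
        """Names of the elements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty elements into a one-sentence-per-element prompt."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        sentences = [p[0].upper() + p[1:] for p in parts if p]
        return ". ".join(sentences) + "." if sentences else ""


shot = ShotPrompt(
    subject="a red fox",
    environment="a snowy pine forest at dawn",
    camera="slow lateral tracking shot",
    motion="the fox trots steadily from left to right",
)
print(shot.missing())
print(shot.render())
```

Keeping one motion description per shot also follows the advice to avoid conflicting actions.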

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2067477095336464386?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1,000 points, plus 100 points for every daily login, and enjoy RTX 4090 performance with 48 GB of VRAM!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1xw7F6XE7K/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

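I will keep updating model resources on Quark Drive:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.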

Files

ltx2335ThreeStageHDText_v10.json