    LTX-2 I2V Undistilled 60-Frame 1080p Consistency Workflow - v1.0
    NSFW

    This ComfyUI workflow is designed for LTX-2 image-to-video generation with stronger visual consistency, higher-resolution output, and a more production-ready 60-frame video structure. The workflow focuses on turning a single input image into a stable short video while preserving character identity, scene layout, lighting direction, and visual style across the generated frames.

    The main purpose of this workflow is consistency. Many image-to-video workflows can create motion, but the result may suffer from identity drift, unstable faces, changing clothing, broken hands, background flicker, or inconsistent camera behavior. This workflow is designed to reduce those problems by using the LTX-2 Dev model route, controlled prompt conditioning, image-to-video latent initialization, staged sampling, manual sigma refinement, and latent spatial upscaling.

    The workflow is built around LTX-2 19B Dev FP8, using ltx-2-19b-dev-fp8.safetensors as the main checkpoint and gemma_3_12B_it.safetensors as the text encoder. It also includes LTX-2 spatial latent upscaling through ltx-2-spatial-upscaler-x2-1.0.safetensors. This means the workflow is not only generating a video from an image, but also refining and enlarging the latent result before final decoding.

    The input stage starts from a source image. The image is resized and prepared through ImageResizeKJv2 and ResizeImagesByLongerEdge. In the included setup, the workflow is prepared for high-resolution processing, with a target output route that can reach 1920 x 1088 when the GPU is strong enough. The workflow notes also mention that width and height need to follow the LTX-2 size rules, and that the frame count must follow the 8 × n + 1 rule. This matters because LTX-2 video generation is sensitive to invalid dimensions and frame counts.

    The workflow uses a frame-count calculation structure to produce valid video length. The calculator logic follows the “1 + 8 × n” style frame rule, which is important for LTX-2 compatibility. This makes the workflow suitable for around 60-frame class generation while still respecting the internal frame-count requirements. In practice, this kind of setup is useful for short cinematic clips, social media video previews, character motion tests, and polished image-to-video demonstrations.
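    The "1 + 8 × n" rule above can be sketched as a small helper that snaps a requested length to the nearest valid LTX-2 frame count. The function name and the rounding choice are illustrative, not taken from the workflow itself:

    ```python
    def nearest_valid_frame_count(requested: int) -> int:
        """Snap a requested length to the LTX-2 '1 + 8 * n' frame rule (n >= 1)."""
        n = max(1, round((requested - 1) / 8))
        return 1 + 8 * n

    # A "60-frame class" request lands on a nearby valid length:
    print(nearest_valid_frame_count(60))  # 57 (the next valid step up is 65)
    ```

    This is why "around 60 frames" in practice means 57 or 65 actual frames rather than exactly 60.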

    The source image is passed through LTXVPreprocess before entering the video latent stage. This preprocessing helps prepare the image for LTX-2 generation and compression behavior. Then EmptyLTXVLatentVideo creates the video latent with the selected width, height, length, and batch size. LTXVImgToVideoInplace injects the source image into the video latent, allowing the model to use the image as the starting visual reference. This is the key step that gives the workflow its image-to-video identity preservation behavior.
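    The size rules fall out of how the video VAE compresses pixels into the latent grid that EmptyLTXVLatentVideo allocates. The 32x spatial and 8x temporal compression factors below are an assumption about LTX-2 (LTX-style video VAEs typically behave this way, with the first frame kept separately), shown only to illustrate why width, height, and frame count are constrained:

    ```python
    def latent_shape(width: int, height: int, frames: int):
        """Illustrative latent grid for an LTX-style video VAE.

        Assumes 32x spatial and 8x temporal compression with the first
        frame kept separately -- hence the 8 * n + 1 frame rule.
        """
        assert width % 32 == 0 and height % 32 == 0, "dims must be multiples of 32"
        assert (frames - 1) % 8 == 0, "frames must satisfy 8 * n + 1"
        return (frames - 1) // 8 + 1, height // 32, width // 32

    print(latent_shape(1920, 1088, 57))  # (8, 34, 60)
    ```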

    A major part of this workflow is the audio-video latent structure. The workflow includes LTXVEmptyLatentAudio, LTXVConcatAVLatent, and LTXVSeparateAVLatent. Even when the focus is video consistency, this structure allows the workflow to work inside LTX-2’s broader audio-video latent system. The audio latent and video latent can be combined during sampling, then separated again before final decoding. This gives the workflow a more complete LTX-2 pipeline rather than a simplified video-only route.

    The prompt section uses CLIPTextEncode with LTXVConditioning. The positive prompt describes the core scene, action, visual behavior, camera, dialogue, and temporal motion. The negative prompt is detailed and suppresses many common video-generation problems, including blur, overexposure, underexposure, low contrast, flicker, motion blur, distorted proportions, face deformation, hand problems, incorrect text, missing objects, inconsistent perspective, camera shake, mismatched lip sync, robotic audio, off-sync timing, unnatural transitions, and general AI artifacts.

    This negative prompt design is important for consistency workflows. LTX-2 can generate strong motion, but high-quality video output depends heavily on suppressing drift and instability. For a 60-frame video, even small inconsistencies can become visible over time. A strong negative prompt helps keep the output cleaner and more stable.

    The first sampling stage uses LTXVScheduler, CFGGuider, KSamplerSelect, RandomNoise, and SamplerCustomAdvanced. This stage establishes the main video motion and structure. In the included setup, the workflow uses Euler sampling and a scheduler designed for LTX video generation. CFG is set to guide the model toward the prompt without over-forcing it. The goal is to let the source image remain visually stable while adding believable motion.
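    The guidance that CFGGuider applies is the standard classifier-free guidance combination. A minimal numeric sketch, with made-up values purely for illustration:

    ```python
    def cfg_combine(uncond, cond, cfg_scale):
        """Standard classifier-free guidance: push the model's prediction
        from the unconditional output toward the conditional one."""
        return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

    # cfg_scale = 1.0 reproduces the conditional prediction exactly;
    # larger values push harder toward the prompt.
    print(cfg_combine([0.0, 1.0], [1.0, 2.0], 1.0))  # [1.0, 2.0]
    print(cfg_combine([0.0, 1.0], [1.0, 2.0], 3.0))  # [3.0, 4.0]
    ```

    Keeping the scale moderate is what "guide without over-forcing" means in practice: too high a scale amplifies the prompt at the cost of the source image's stability.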

    The workflow also includes a LoRA stage. The uploaded graph shows a model-only LoRA loader using an LTX-2 distilled LoRA variant at controlled strength. This can help adjust motion behavior, generation speed, or model response depending on the route being tested. In practical use, creators can enable, disable, or tune LoRA strength depending on whether they want more speed, stronger motion, or closer Dev-style consistency.

    After the first video generation pass, the workflow uses LTXVLatentUpsampler. This is one of the most important parts of the graph. Instead of simply decoding the first latent output and resizing the final frames afterward, the workflow performs spatial upscaling in latent space. This can produce a cleaner high-resolution result than normal pixel-level enlargement because the model can refine the video representation before final decoding.

    The second sampling stage uses ManualSigmas with a gradient-estimation sampler route. ManualSigmas gives the workflow a more controlled final refinement pass. In the uploaded setup, the sigma list is manually defined, allowing a compact but targeted refinement stage after latent upscaling. This is useful for improving clarity, detail, and stability without fully regenerating the entire video from scratch.
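    In practice, a ManualSigmas refinement pass comes down to handing the sampler a short, strictly decreasing noise schedule that ends at 0. The values below are illustrative only, not the ones stored in the uploaded graph:

    ```python
    # Illustrative short refinement schedule: start from moderate noise
    # (the upscaled latent is already mostly formed) and step down to 0,
    # so the pass polishes detail instead of regenerating the video.
    refine_sigmas = [0.35, 0.25, 0.15, 0.05, 0.0]

    # A valid sigma list for a refinement pass is strictly decreasing
    # and terminates at exactly 0.0:
    assert all(a > b for a, b in zip(refine_sigmas, refine_sigmas[1:]))
    assert refine_sigmas[-1] == 0.0
    ```

    Starting the schedule lower (closer to 0) preserves more of the first-stage result; starting it higher gives the refinement pass more freedom to change the video.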

    After refinement, LTXVSeparateAVLatent separates the video latent and audio latent. VAEDecode converts the video latent back into image frames, and LTXVAudioVAEDecode can decode the audio latent if needed. The final video can then be assembled for output. This full structure makes the workflow more advanced than a simple one-pass image-to-video graph.

    The workflow is especially suitable for creators who want to generate a short, stable, high-resolution video from a still image. It can be used for character presentation, product animation, cinematic shot generation, AI short clips, Civitai demo videos, YouTube previews, Bilibili workflow showcases, and RunningHub online workflow publishing.

    Main features:

    - LTX-2 image-to-video workflow

    - Uses LTX-2 19B Dev FP8 model route

    - Designed for strong frame-to-frame consistency

    - Suitable for around 60-frame video generation

    - 1080p-class output route with 1920 x 1088 support

    - Gemma 3 12B text encoder support

    - Source image preprocessing with LTXVPreprocess

    - Image-to-video latent injection with LTXVImgToVideoInplace

    - Valid frame-count logic based on LTX-2 requirements

    - LTXVConditioning for video prompt conditioning

    - Detailed negative prompt for artifact suppression

    - LTXVScheduler and SamplerCustomAdvanced generation

    - ManualSigmas second-stage refinement

    - LTXVLatentUpsampler spatial latent upscaling

    - Audio-video latent concat and separation structure

    - Suitable for high-quality I2V consistency testing

    Recommended use cases:

    Image-to-video generation, LTX-2 consistency testing, 60-frame short video creation, 1080p AI video output, cinematic character animation, product video motion tests, portrait-to-video generation, social media video covers, YouTube AI video previews, Bilibili workflow demonstrations, Civitai video examples, RunningHub online workflow publishing, and high-resolution LTX-2 I2V experiments.

    Suggested workflow:

    Start by preparing a clean source image. The input image should have a clear subject, stable lighting, and strong composition. If the source image is blurry, low contrast, heavily compressed, or visually chaotic, the generated video will usually be less stable. For character videos, make sure the face, hands, clothing, and body silhouette are clear.

    Choose the correct resolution. The workflow notes mention that width and height should follow LTX-2 valid size rules. The default route is safer for 720p-style testing, while 1920 x 1088 can be used on stronger GPUs. For first tests, use a smaller resolution to validate the prompt and seed. After the motion is stable, move to the 1080p route.
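    Assuming LTX-2's valid sizes are multiples of 32 (this divisor is an assumption; check the workflow notes for the exact rule), a small helper can snap an arbitrary target to the nearest valid resolution. This is also why the 1080p route outputs 1920 x 1088 rather than 1920 x 1080:

    ```python
    def snap_to_multiple(value: int, divisor: int = 32) -> int:
        """Round to the nearest multiple of `divisor` (assumed LTX-2 size rule)."""
        return max(divisor, round(value / divisor) * divisor)

    # 1080 is not a multiple of 32, so the nearest valid height is 1088:
    print(snap_to_multiple(1080), snap_to_multiple(1920))  # 1088 1920
    ```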

    Set the frame count correctly. The LTX-2 frame count should follow the 8 × n + 1 rule. If the frame count is invalid, the workflow may silently choose the closest valid value. For predictable results, keep the frame count aligned with this rule. This is important when trying to produce 60-frame-class video.

    Write a prompt that describes motion over time. For LTX-2, do not write only a static image prompt. Describe what happens in the scene. Include subject motion, camera motion, facial expression, object interaction, lighting behavior, and atmosphere. The workflow notes specifically recommend describing core actions, visual details, and audio or dialogue when needed.

    Keep motion controlled for consistency. If the goal is a stable 60-frame I2V result, avoid asking for extreme camera movement, fast spinning, rapid body action, or large scene changes. Use subtle camera push-in, slight head movement, natural hand movement, soft environmental motion, or gentle cinematic motion when preserving identity matters.

    Use the negative prompt aggressively for stability. Suppress flicker, distorted faces, bad hands, inconsistent perspective, camera shake, wrong gaze direction, text artifacts, wrong clothing text, mismatched lip sync, robotic voice, off-sync timing, and unnatural transitions. These issues become more visible in longer videos.

    Use the first generation stage to establish motion. Check whether the subject identity, composition, and camera behavior remain stable. If the first stage already drifts too much, do not rely on upscaling to fix it. Adjust the prompt, seed, input image, or motion strength first.

    Use the latent upscaler after the base video motion is acceptable. The LTXVLatentUpsampler is best used when the generated motion already works and the result needs more resolution and clarity. If the base generation is unstable, upscaling will only make the instability more visible.

    Use the second refinement stage for final polish. ManualSigmas and gradient-estimation sampling can help refine the upscaled latent without fully rebuilding the video. This is useful for improving detail, edges, and final sharpness while preserving the established motion.

    When evaluating the result, check more than sharpness. Look for character identity, frame consistency, facial stability, hand stability, background continuity, lighting consistency, camera smoothness, and whether the motion feels natural. A good LTX-2 I2V output should feel like one continuous shot, not a sequence of unrelated frames.

    For 1080p output, monitor VRAM carefully. High-resolution video generation with latent upscaling is significantly heavier than normal image generation. If the workflow is slow or unstable, reduce resolution, lower frame count, or test shorter clips first.

    This workflow is designed for creators who want a stronger LTX-2 image-to-video consistency pipeline rather than a simple one-click I2V test. It combines source-image conditioning, LTX-2 Dev model generation, valid frame-count handling, advanced scheduler control, audio-video latent structure, latent spatial upscaling, and second-stage refinement into one practical workflow for high-quality short video production.

    🎥 YouTube Video Tutorial

    Want to know what this workflow actually does and how to start fast?

    This video explains what the tool is, how to launch the workflow instantly, and shares my core design logic — no local setup, no complicated environment.

    Everything starts directly on RunningHub, so you can experience it in action first.

    👉 YouTube Tutorial: https://youtu.be/VYBoOk7pCJA

    Before you begin, I recommend watching the video thoroughly — getting the full context helps you understand the tool faster and avoid common detours.

    ⚙️ RunningHub Workflow

    Try the workflow online right now — no installation required.

    👉 Workflow: https://www.runninghub.ai/post/2019396485409939457/?inviteCode=rh-v1111

    If the results meet your expectations, you can later deploy it locally for customization.

    🎁 Fan Benefits: Register to get 1000 points, plus 100 points per daily login, and enjoy RTX 4090-class performance with 48 GB of VRAM!

    📺 Bilibili Updates (Mainland China & Asia-Pacific)

    If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.

    📺 Bilibili Video: https://www.bilibili.com/video/BV1wiFzzwEoR/

    ☕ Support Me on Ko-fi

    If you find my content helpful and want to support future creations, you can buy me a coffee ☕.

    Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.

    👉 Ko-fi: https://ko-fi.com/aiksk

    💼 Business Contact

    For collaboration or inquiries, please contact aiksk95 on WeChat.

    🎁 Model Resources (Quark Drive)

    I will keep updating model resources on Quark Drive:

    👉 https://pan.quark.cn/s/20c6f6f8d87b

    These resources are mainly intended for local users, for convenient creation and learning.


    Workflows
    LTXV2

    Details

    Downloads: 30
    Platform: CivitAI
    Platform Status: Available
    Created: 5/9/2026
    Updated: 5/14/2026
    Deleted: -

    Files

    ltx2I2VUndistilled60_v10.zip

    Mirrors

    HuggingFace (1 mirror)
    CivitAI (1 mirror)