LTX2.3 3.5 First-Frame High-Similarity Image-to-Video Workflow

Watch the full video first if you want to understand how this LTX2.3 3.5 first-frame high-similarity video workflow works in practice. The video shows how one uploaded image can be used as the first-frame reference, then expanded into a more stable video through LTX2.3, image-to-video conditioning, three-stage rendering, latent upscaling, and high-definition refinement.

This ComfyUI workflow is designed for LTX2.3 3.5 image-to-video generation with first-frame similarity preservation. Its main purpose is to turn a reference image into a video while keeping the opening frame, subject identity, visual style, composition, and scene structure close to the original image. Compared with a pure text-to-video workflow, this version gives the model a stronger visual anchor, making it more suitable for character animation, poster-to-video conversion, product visuals, AI influencer videos, cinematic image extension, and short-form video production.

The workflow is built around ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors as the main LTX2.3 video checkpoint. The text encoding route uses gemma_3_12B_it_fp8_e4m3fn.safetensors through the LTX AV text encoder loader. It also loads the LTX audio VAE from the same distilled checkpoint, allowing the workflow to preserve the LTX audio-video latent structure during generation and preview.
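Before loading the workflow locally, it can help to confirm that the two model files named above are in place. The sketch below assumes a standard ComfyUI `models` folder with `checkpoints` and `text_encoders` subfolders. That layout and the helper name `missing_models` are assumptions for illustration. The workflow also needs files that are not checked here, such as the spatial upscaler.

```python
from pathlib import Path

# Assumed layout: a standard ComfyUI "models" folder. The subfolder names are
# an assumption and may differ in your install; only the two file names come
# from the workflow description.
REQUIRED_FILES = {
    "checkpoints": ["ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors"],
    "text_encoders": ["gemma_3_12B_it_fp8_e4m3fn.safetensors"],
}


def missing_models(models_root: str) -> list[str]:
    """Return 'subfolder/file' entries for required files that are not present."""
    root = Path(models_root)
    missing = []
    for subdir, names in REQUIRED_FILES.items():
        for name in names:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing
```

For example, `missing_models("ComfyUI/models")` returns an empty list when both files are found.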

The input image route is the key difference in this workflow. The uploaded image is resized through Image_Resize_longsize (long side 1536), then processed through LTXVPreprocess before it enters the LTX image-to-video conditioning path. This gives the workflow a cleaner first-frame reference and helps the generated video stay closer to the source image. The image condition is not used only at the beginning. Stage 2 reuses the original first frame as a light condition, and Stage 3 reuses the decoded Stage 2 frames as internal image conditions. This lets the workflow refine the video while maintaining visual continuity.
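The long-side resize can be sketched as plain arithmetic. This is a minimal illustration, not the Image_Resize_longsize node's actual code. It scales the image so that its longer edge is 1536 px and keeps the aspect ratio. It then snaps both edges down to a multiple of 32. The multiple-of-32 rule is an assumption based on how LTX-Video models are commonly run, so check it against your node settings.

```python
def long_side_size(width: int, height: int, long_side: int = 1536,
                   multiple: int = 32) -> tuple[int, int]:
    """Scale (width, height) so the longer edge equals `long_side`, keeping the
    aspect ratio, then snap both edges down to a multiple of `multiple`
    (the snapping rule is an assumption, not taken from the node)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    # Integer arithmetic avoids float rounding surprises such as 1535.999...
    new_w = width * long_side // longest
    new_h = height * long_side // longest
    new_w = max(multiple, new_w // multiple * multiple)
    new_h = max(multiple, new_h // multiple * multiple)
    return new_w, new_h
```

For a 1920x1080 source this gives 1536x864, and a portrait 1080x1920 source gives 864x1536.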

The prompt route uses LTXVConditioning at 24fps. A universal negative prompt is included to suppress low quality, blur, temporal flicker, frame jitter, geometry warping, identity drift, bad hands, unwanted subtitles, watermark artifacts, readable text, garbled letters, audio distortion, and other common video-generation failures. LTX2_NAG is used as an additional negative guidance layer for stronger similarity control and motion stability.
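Because LTXVConditioning sets the frame rate to 24 fps, clip length and frame count are tied together. The sketch below converts a target duration into a frame count. It assumes the 8k + 1 frame-count rule that LTX-Video models use, which is one anchor frame plus groups of 8. The function name is hypothetical, and the rule should be confirmed for LTX2.3.

```python
import math


def ltx_frame_count(seconds: float, fps: int = 24) -> int:
    """Smallest frame count of the form 8*k + 1 that covers `seconds` at `fps`
    (the 8k + 1 rule is an assumption carried over from LTX-Video)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    needed = math.ceil(seconds * fps)
    k = math.ceil((needed - 1) / 8)
    return 8 * k + 1
```

At 24 fps, a 5-second target becomes 121 frames, which is about 5.04 s of video.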

The generation route is divided into three stages. Stage 1 creates the initial image-to-video structure. It uses RandomNoise, CFGGuider, ManualSigmas, euler_ancestral_cfg_pp, and SamplerCustomAdvanced to establish the base motion, camera rhythm, and subject continuity from the first frame.
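ManualSigmas lets each stage use an explicit noise schedule instead of a scheduler preset. The helper below is a hypothetical sanity check for such a schedule, assuming it is entered as comma-separated text. It parses the values and verifies that they are non-negative and strictly decreasing. The number of sampling steps is one less than the number of sigmas. The example values are made up and are not the ones used in this workflow.

```python
def parse_sigmas(text: str) -> list[float]:
    """Parse a comma-separated sigma schedule and check that it is usable:
    at least two values, all non-negative, strictly decreasing."""
    sigmas = [float(part) for part in text.split(",") if part.strip()]
    if len(sigmas) < 2:
        raise ValueError("need at least two sigmas (one sampling step)")
    if any(s < 0 for s in sigmas):
        raise ValueError("sigmas must be non-negative")
    if any(b >= a for a, b in zip(sigmas, sigmas[1:])):
        raise ValueError("sigmas must be strictly decreasing")
    return sigmas
```

A schedule such as `1.0, 0.75, 0.5, 0.0` therefore runs 3 sampling steps.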

Stage 2 separates the audio and video latents, applies the LTX 2.3 spatial upscaler, and uses the original first frame as a light condition for latent upscaled refinement. This stage improves resolution and structure while keeping the source image relationship.
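To see what the x2 spatial upscaler changes, it helps to track latent shapes. This sketch assumes the compression factors published for the original LTX-Video VAE: 32x spatial, 8x temporal, and 128 latent channels. LTX2.3's video VAE may differ, so treat the numbers as illustrative. The x2 upscale doubles the latent height and width but leaves the frame axis unchanged.

```python
def ltx_latent_shape(width: int, height: int, frames: int,
                     spatial: int = 32, temporal: int = 8,
                     channels: int = 128) -> tuple[int, int, int, int]:
    """(channels, latent_frames, latent_height, latent_width) for a video,
    under assumed LTX-Video VAE compression factors."""
    if width % spatial or height % spatial:
        raise ValueError(f"width and height must be multiples of {spatial}")
    if frames < 1 or (frames - 1) % temporal:
        raise ValueError(f"frame count must be {temporal}*k + 1")
    return (channels, (frames - 1) // temporal + 1,
            height // spatial, width // spatial)


def spatial_upscale_x2(shape: tuple[int, int, int, int]) -> tuple[int, int, int, int]:
    """A x2 spatial latent upscaler doubles height and width, not frames."""
    c, t, h, w = shape
    return (c, t, h * 2, w * 2)
```

Under these assumptions, a 768x512 clip with 121 frames has a latent of shape (128, 16, 16, 24). After the x2 upscale it becomes (128, 16, 32, 48), which decodes to 1536x1024.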

Stage 3 repeats the refinement logic. It uses the second-stage frames as internal image conditioning, then runs another euler_cfg_pp sampling pass with its own sigma schedule for high-definition detail cleanup. The final result is decoded through tiled VAE decoding, assembled through CreateVideo, and exported through SaveVideo.
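Tiled VAE decoding lowers peak memory by decoding the latent in overlapping pieces and blending the seams. The function below is a generic sketch of how overlapping tile offsets can be laid out along one axis. It is not the VAEDecodeTiled implementation, and the tile and overlap sizes in the example are arbitrary.

```python
def tile_starts(size: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping tiles that together cover [0, size)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if size <= tile:
        return [0]  # a single tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the far edge
    return starts
```

For a 1536 px axis with 512 px tiles and 64 px overlap, this gives offsets 0, 448, 896, and 1024. Applying the same function to width and height produces a 2D grid. Larger overlaps hide seams better but repeat more work.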

Main features:

LTX2.3 3.5 first-frame image-to-video workflow
High-similarity first-frame preservation
Uploaded image as visual anchor
Image_Resize_longsize 1536 preprocessing
LTXVPreprocess image preparation
ltx-2.3-22b-dev-dare-ties-distilled-1.1.safetensors support
Gemma3 FP8 text encoder support
LTXVConditioning video prompt route
Universal negative prompt for video stability
LTX2_NAG similarity and motion constraint
Three-stage high-definition rendering
Stage 1 base image-to-video generation
Stage 2 latent x2 upscale refinement
Stage 3 high-definition detail polish
ManualSigmas schedules for each stage
euler_ancestral_cfg_pp and euler_cfg_pp samplers
LTXVSeparateAVLatent audio-video split
LTXVConcatAVLatent audio-video reconstruction
LTXVImgToVideoConditionOnly internal conditioning
LTX 2.3 spatial upscaler x2 1.1 support
VAEDecodeTiled memory-friendly decoding
CreateVideo and SaveVideo final output

Suggested workflow:

Prepare a clear reference image first. The image should have a readable subject, stable composition, clean lighting, and enough visual detail for the model to preserve. Then write a prompt that describes the motion, camera movement, atmosphere, and video style without contradicting the original image. Use Stage 1 to check whether the subject and motion direction are correct. If the first stage already drifts away from the reference, simplify the prompt and strengthen identity or composition preservation. Use Stage 2 and Stage 3 only after the first-stage direction is acceptable. If the final result becomes too sharp, unstable, or over-animated, reduce aggressive motion wording and keep the prompt closer to the source image.
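The staged check described above can be expressed as a small control-flow sketch. The stage callables and the `accept_stage1` review function are hypothetical placeholders. They stand in for running the workflow with Stage 2 and Stage 3 disabled or enabled. The point is only that the costly refinement stages run after the Stage 1 result has been approved.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]  # hypothetical placeholder for one render stage


def run_with_review(stages: list[Stage], source: Any,
                    accept_stage1: Callable[[Any], bool]) -> list[tuple[str, Any]]:
    """Run Stage 1, stop if its preview is rejected, else run the remaining stages."""
    first, *rest = stages
    result = first.run(source)
    outputs = [(first.name, result)]
    if not accept_stage1(result):
        # Stage 1 drifted: adjust the prompt and rerun instead of refining a bad base.
        return outputs
    for stage in rest:
        result = stage.run(result)  # each stage refines the previous output
        outputs.append((stage.name, result))
    return outputs
```

This simplification omits the fact that the real Stage 2 also reconditions on the original first frame.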

⚙️ RunningHub Workflow

Try the workflow online right now; no installation is required.
👉 Workflow: https://www.runninghub.ai/post/2067652016737968129?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1,000 points, plus 100 points for every daily login, and enjoy 4090 performance with 48 GB of VRAM!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1xw7F6XE7K/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating, like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.

Files

ltx2335FirstFrameHigh_v10.json