CivArchive
    WAN 2.2 IMAGE to VIDEO with Caption and Postprocessing - Experimental
    NSFW

    Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale and Frame Interpolation and Video Extension

    • Creates video clips at 480p-720p resolution.


    Wan 2.2 14B Image to Video MultiClip version, reworked with LongLook

    • Create clips in 4-6 steps and extend up to 3 times; see the posted examples, 15-20 sec in length.

    • using LongLook nodes for improved processing: https://github.com/shootthesound/comfyUI-LongLook

      • Increases overall quality for fast-paced or complex-motion clips

      • Uses a chunk of the last frames for better continuity when extending

      • Can pack more motion into a clip via a single parameter, reducing slow motion

    • Processes with improved Wan 2.2 model merges: Smoothmix, with baked-in LightX and other (including NSFW) Loras: https://huggingface.co/Bedovyy/smoothMixWan22-I2V-GGUF/tree/main

    • Removed some custom nodes, replacing them with Comfy core nodes where possible

    Normal Version with your own prompts, ideal for NSFW or specific clips with Loras.

    Ollama Version, using an uncensored Qwen LLM to auto-create prompts for each clip sequence.


    About the versions below: there is a Florence caption version and an LTX Prompt Enhancer (LTXPE) version. LTXPE is heavier on VRAM.

    Version use cases:

    • Create longer NSFW or specific clips with Loras and your own prompts => MultiClip (14B) Normal

    • Create longer clips with autoprompts => MultiClip (14B) LTXPE or MultiClip_LTXPE+*

    • Generate short 5 sec clips with your own prompts or autoprompts => V1.0 (14B model) Florence or LTXPE*

    *The LTX Prompt Enhancer (LTXPE) may have issues with the latest Comfy and Lightricks updates:

    https://civarchive.com/models/1823416?commentId=1017869&dialog=commentThread


    MultiClip LTXPE PLUS: Wan 2.2 14B I2V version based on the MultiClip workflow below, with improved LTX Prompt Enhancer (LTXPE) features (see notes in the workflow). You may want to try the plain MultiClip workflow first.

    This workflow enhances the LTXPE features to give more control over prompt generation and uses an uncensored language model; the video generation part is identical to the version below. More info: https://civarchive.com/models/1823416?modelVersionId=2303138&dialog=commentThread&commentId=972440


    MultiClip: Wan 2.2 14B I2V version supporting LightX2V Wan 2.2 Loras to create clips in 4-6 steps and extend up to 3 times; see the posted examples, 15-20 sec in length.

    There is a normal version which lets you use your own prompts, and a version using LTXPE for autoprompting. The normal version works well for specific or NSFW clips with Loras; the LTXPE version is made so you just drop in an image, set width/height, and hit run. The clips are combined into one full video at the end.
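    As a rough sanity check on those clip lengths: assuming Wan's common defaults of 81 frames per segment at 16 fps and a small frame overlap reused when extending (my assumptions, not stated in this post), three extensions land in the quoted 15-20 sec range:

    ```python
    # Rough clip-length estimate for the MultiClip workflow.
    # frames_per_clip, fps and overlap are assumed defaults; tune to your settings.
    def total_seconds(frames_per_clip=81, fps=16, extensions=3, overlap=8):
        """Duration after extending, reusing `overlap` frames per extension for continuity."""
        total_frames = frames_per_clip + extensions * (frames_per_clip - overlap)
        return total_frames / fps

    print(total_seconds())  # → 18.75
    ```
    
    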

    • Supports the new Wan 2.2 LightX2v Loras for low step counts

    • Single-clip versions included, corresponding to the V1.0 workflow below with an additional Lora loader for the "old" Wan 2.1 LightX2v Lora.

    Since Wan 2.2 uses two models, the workflow gets complex. I still recommend checking the Wan 2.1 MultiClip version, which is much leaner and has a rich selection of Loras. It can be found here: https://civarchive.com/models/1309065?modelVersionId=1998473


    V1.0: WAN 2.2 14B Image to Video workflow with LightX2v I2V Wan 2.2 Lora support for low step counts (4-8 steps)

    • Wan 2.2 uses two models to process a clip: a High Noise and a Low Noise model, run in sequence.

    • Compatible with LightX2v Loras to process clips quickly with few steps.


    Models can be downloaded here:

    Vanilla Wan 2.2 models (Low & High Noise both required; pick the ones matching your VRAM): https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF/tree/main

    Original LightX2v Loras for Wan 2.2 (I2V, Hi and Lo): https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22-Lightning/old

    Oct 14th '25: two new LightX High Noise Loras (MoE and 1030) are out. Try with strength > 1.5, 7 steps, SD3 shift = 5.0; they replace the High Noise Lora:

    https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v

    Oct 22nd '25: another LightX Lora has just been released (named 1022); recommended:

    https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main

    VAE (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae

    Text encoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders


    Alternative / newer Wan 2.2 14B model merges:

    https://civarchive.com/models/1823416/wan-22-image-to-video-with-caption-and-postprocessing?dialog=commentThread&commentId=1060392


    WAN 2.2 I2V 5B model (GGUF) workflow with Florence or LTXPE auto-caption

    • Lower quality than the 14B model

    • 720p @ 24 fps

    • With the FastWan Lora, use a CFG of 1 and 4-5 steps; place a LoraLoader node after the Unet Loader to inject the Lora

    FastWan Lora: https://huggingface.co/Kijai/WanVideo_comfy/tree/main/FastWan

    Model (GGUF; pick the model matching your VRAM): https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main

    VAE : https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/vae

    Text encoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders


    Locations to save these files within your ComfyUI folder:

    Wan GGUF Model -> models/unet

    Textencoder -> models/clip

    Vae -> models/vae
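    A minimal shell sketch of that layout. The ComfyUI install path and the example filenames in the comments are assumptions; adjust them to your install and your actual downloads:

    ```shell
    # Create the expected ComfyUI model folders; COMFY is an assumed install path.
    COMFY="${COMFY:-$HOME/ComfyUI}"
    mkdir -p "$COMFY/models/unet" "$COMFY/models/clip" "$COMFY/models/vae"

    # Then move the downloaded files into place (hypothetical filenames):
    # mv Wan2.2-I2V-A14B-*.gguf     "$COMFY/models/unet/"
    # mv umt5-xxl-*.safetensors     "$COMFY/models/clip/"
    # mv wan_2.1_vae.safetensors    "$COMFY/models/vae/"
    ```
    
    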


    Tips (for 14B model):

    Description

    (Outdated) Experimental workflow for WAN 2.2 14B. MultiClip generates video up to 20 sec in length. (No Lora support yet.)

    Normal Version with your own prompt, and LTXPE version for autoprompting.

    FAQ

    Comments (22)

    noojiez969Aug 1, 2025· 1 reaction
    CivitAI

    Sorry friend I'm not an expert, which of the 14B low and high noise models do I need? Is it all of them?

    tremolo28
    Author
    Aug 1, 2025· 1 reaction

    You just need 1 High Noise and 1 Low Noise model, ideally the one that fits your VRAM. If you have, say, 16 GB of VRAM, I'd go for the ...Q4_K_M.gguf model file.

    noojiez969Aug 1, 2025· 1 reaction

    tremolo28 Thanks for the help!

    poondoggleAug 3, 2025· 1 reaction

    Your outputs look fantastic @tremolo28. I am having an issue with the LTXV version of your workflow.

    I get the following error:

    !!! Exception during processing !!! 'Florence2ForConditionalGeneration' object has no attribute '_supports_sdpa'

    I had the same error with the Florence version of your workflow but was able to workaround the issue by replacing two files, configuration_florence2.py and modeling_florence2.py. They fix the issue here: https://github.com/kijai/ComfyUI-Florence2/issues/175.

    I figured that my replacement of the files would fix the LTXV version as well, but it didn't. Any ideas on what to do?

    tremolo28
    Author
    Aug 3, 2025· 1 reaction

    Hi, regarding LTX Prompt Enhancer, I am having no issue, however I saw some people reporting about an issue when having less than 16g of Vram. Someone found a solution, you can follow this thread, maybe it helps with your issue: https://civitai.com/models/1309065?modelVersionId=1998473&dialog=commentThread&commentId=730386

    poondoggleAug 3, 2025· 1 reaction

    tremolo28 Thanks for the reply. I will check the thread, but I have 24gb.

    tremolo28
    Author
    Aug 3, 2025· 1 reaction

    poondoggle Just checked the error message with ChatGPT; it reports the issue might be related to transformers versions, and it offers some solutions. Maybe you want to check with ChatGPT.

    9mmheater919Aug 3, 2025· 1 reaction

    I am getting the same error when trying to use the prompt enhancer with LTXV. Using a 5090

    poondoggleAug 4, 2025· 1 reaction

    9mmheater919 If you figure it out, please let me know. I don't want to downgrade transformer as it might break something else. I wish Python didn't have so many dependency issues. It makes comfy a real pain in the ass to use.

    9mmheater919Aug 15, 2025· 1 reaction

    poondoggle Yea i feel like I am in the minority these days, but i still much prefer my a1111 forge workflow to comfy for image generations. For video it makes sense to use comfy, but i get all sorts of issues generating images in comfy with teacache and sageattn dependencies, i dont want to turn them off since they are useful for video generation so that puts me back to a1111 for image generation and then comfy for video. I still havent found a solution unfortunately. For now I am just using my own prompts which has been fine.

    juliusmartinAug 3, 2025· 1 reaction

    Is it me or is the provided multiclip workflow not combining the video by default?

    tremolo28
    Author
    Aug 3, 2025· 1 reaction

    Hi, the "Image Batch Multi" node on the right combines the clips; make sure one of those nodes is active. If you can't see the combined clip in the "Video Combine" node below, right-click and select "show preview", in case the preview is hidden.

    Thanks for the buzz btw :)

    juliusmartinAug 3, 2025· 1 reaction

    you are a king bro, in my case RealESRGAN_x2.pth was missing because I had a clean Comfy. ty and keep it coming, sending some buzzzz

    TheNoobUserAug 4, 2025· 1 reaction

    First time using this. How can I put in my own prompt? I can't see anywhere I can prompt on my own. There is a note saying to switch to your own prompt, but I can't see any node that lets me do that.

    tremolo28
    Author
    Aug 4, 2025· 2 reactions

    There is a cyan text field named "Your own prompt"; below it is a red switch to switch to your own prompt. You can see it in the GUI screenshot preview above.

    tremolo28
    Author
    Aug 4, 2025· 2 reactions

    Here are some links sharing interesting infos related to WAN2.2. and Loras:

    https://www.reddit.com/r/StableDiffusion/comments/1mge29t/just_some_things_i_noticed_with_wan_22_loras/

    https://www.reddit.com/r/comfyui/comments/1mgo0z3/wan_22_doesnt_load_certain_loras_i_got_it_working/

    They give more insight into how Loras work with the two models (Hi and Lo noise), how this differs from Wan 2.1, and what the error message "Lora Key not loaded" means.

    tremolo28
    Author
    Aug 4, 2025· 3 reactions

    I2V Wan 2.2 Prompting tips:

    "Image-to-Video Formula: the source image already establishes the subject, scene, and style. Therefore, your prompt should focus on describing the desired motion and camera movement.

    Prompt = Motion Description + Camera Movement

    Motion Description: Describe the motion of elements in your image (e.g., people, animals), such as "running" or "waving hello." You can use adverbs like "quickly" or "slowly" to control the pace and intensity of the action.

    Camera Movement: If you have specific requirements for camera motion, you can control it using prompts like "dolly in" or "pan left." If you wish for the camera to remain still, you can emphasize this with the prompt "static shot" or "fixed shot."

    https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y
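    The formula above is simple enough to sketch as string composition; the function name and the "static shot" default are my own, only the Prompt = Motion + Camera structure comes from the guide:

    ```python
    # Prompt = Motion Description + Camera Movement (per the I2V formula above).
    def i2v_prompt(motion, camera="static shot"):
        # Default to "static shot" when no camera movement is wanted.
        return f"{motion}, {camera}"

    print(i2v_prompt("the woman slowly waves hello", "dolly in"))
    # → the woman slowly waves hello, dolly in
    ```
    
    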

    tremolo28
    Author
    Aug 5, 2025· 2 reactions

    Wan 2.2 LightX2V Loras for TEXT to Video have been released. Those work with the Image to Video workflow & model, but with "mixed" results. We may need to wait for proper LightX2v IMAGE to Video Wan 2.2 Loras, which are being worked on.

    Text 2 Video LightX2v Wan2.2 Lora: https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Wan22-Lightning

    tremolo28
    Author
    Aug 7, 2025· 1 reaction

    Update Aug 7th: LightX2V I2V Loras for Wan 2.2 released: https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Wan22-Lightning

    sera2084351Aug 5, 2025· 1 reaction

    Is the CFG slider meant to determine how closely the AI follows the prompt? Values above 1 double the processing time.

    tremolo28
    Author
    Aug 6, 2025· 1 reaction

    The LightX Lora requires a CFG of 1.0 to work fast with low steps.

    traintoy11901Aug 7, 2025· 3 reactions
    It's awesome!!

    Workflows
    Wan Video 2.2 I2V-A14B

    Details

    Downloads
    258
    Platform
    CivitAI
    Platform Status
    Available
    Created
    7/31/2025
    Updated
    5/13/2026
    Deleted
    -

    Files

    wan22IMAGEToVIDEOWith_experimental.zip