Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale, Frame Interpolation, and Video Extension
Creates video clips at 480p-720p resolution.
Wan2.2 14B Image to Video MultiClip Version, reworked with LongLook
Creates clips in 4-6 steps and extends up to 3 times; see the posted examples, 15-20 s in length.
Uses LongLook nodes for improved processing: https://github.com/shootthesound/comfyUI-LongLook
Increases overall quality for fast-paced or complex-motion clips
Uses a chunk of the last frames for better continuity when extending
Can pack more motion into a clip with a single parameter, reducing slow motion
Processes with improved Wan 2.2 models: SmoothMix with baked-in LightX and other (including NSFW) LoRAs: https://huggingface.co/Bedovyy/smoothMixWan22-I2V-GGUF/tree/main
Removed some custom nodes and replaced them with ComfyUI core nodes where possible
Normal Version with your own prompts, ideal for NSFW or specific clips with LoRAs.
Ollama Version, using an uncensored Qwen LLM to auto-create prompts for each clip sequence.
Ollama model (reads prompt only, fast): https://ollama.com/goonsai/josiefied-qwen2.5-7b-abliterated-v2
Alternative model with vision (reads input image + prompt, slower; it can reason if you enable "think" in the Ollama Generate node): https://ollama.com/huihui_ai/qwen3-vl-abliterated
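As a point of reference, here is a minimal sketch of querying one of these models through the official ollama Python client outside of ComfyUI. The prompt and image path are placeholders, and the "think" flag assumes a recent Ollama and client version:

```python
# Hedged sketch: query the vision model directly via the ollama Python
# client (pip install ollama). Assumes the model was pulled with
# `ollama pull huihui_ai/qwen3-vl-abliterated` and that your Ollama and
# client versions support the "think" option.
import ollama

response = ollama.generate(
    model="huihui_ai/qwen3-vl-abliterated",  # vision model linked above
    prompt="Describe the motion for a short video clip of this image.",
    images=["input.png"],  # placeholder path; omit for the text-only model
    think=True,            # enable reasoning, as mentioned above
)
print(response.response)
```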
About the versions below: there is a Florence Caption version and an LTX Prompt Enhancer (LTXPE) version. LTXPE is heavier on VRAM.
Version use cases:
Create longer NSFW or specific clips with LoRAs and your own prompts => MultiClip (14B) Normal
Create longer clips with autoprompts => MultiClip (14B) LTXPE or MultiClip_LTXPE+*
Generate short 5 s clips with your own prompts or autoprompts => V1.0 (14B model) Florence or LTXPE*
*LTX Prompt Enhancer (LTXPE) may have issues with the latest ComfyUI and Lightricks updates
https://civarchive.com/models/1823416?commentId=1017869&dialog=commentThread
MultiClip LTXPE PLUS: Wan 2.2 14B I2V version based on the MultiClip workflow below, with improved LTX Prompt Enhancer (LTXPE) features (see the notes in the workflow). You may want to try the MultiClip workflow below first.
The workflow extends the LTXPE features to give more control over prompt generation and uses an uncensored language model; the video-generation part is identical to the version below. More info: https://civarchive.com/models/1823416?modelVersionId=2303138&dialog=commentThread&commentId=972440
MultiClip: Wan 2.2 14B I2V version supporting LightX2V Wan 2.2 LoRAs to create clips in 4-6 steps and extend them up to 3 times; see the posted examples, 15-20 s in length.
There is a Normal version that lets you use your own prompts and a version that uses LTXPE for autoprompting. The Normal version works well for specific or NSFW clips with LoRAs, while the LTXPE version is made to simply drop in an image, set width/height, and hit Run. The clips are combined into one full video at the end.
Supports the new Wan 2.2 LightX2v LoRAs for low step counts
Single Clip versions are included; they correspond to the V1.0 workflow below, with an additional LoRA loader for the "old" Wan 2.1 LightX2v LoRA.
Since Wan 2.2 uses two models, the workflow gets complex. I still recommend checking the Wan 2.1 MultiClip version, which is much leaner and has a rich selection of LoRAs. It can be found here: https://civarchive.com/models/1309065?modelVersionId=1998473
V1.0 WAN 2.2 14B Image-to-Video workflow with LightX2v I2V Wan 2.2 LoRA support for low step counts (4-8 steps)
Wan 2.2 uses two models to process a clip: a High Noise and a Low Noise model, run in sequence.
Compatible with LightX2v LoRAs to process clips quickly at low step counts.
Models can be downloaded here:
Vanilla Wan2.2 models (Low & High Noise both required, pick the ones matching your VRAM): https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF/tree/main
Original LightX2v LoRAs for Wan 2.2 (I2V, High and Low): https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22-Lightning/old
Oct 14th '25: two new LightX High Noise LoRAs (MoE and 1030) are out; try strength > 1.5, 7 steps, SD3 shift = 5.0. They replace the High Noise LoRA:
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
Oct 22nd '25: another LightX LoRA has just been released (named 1022), recommended:
https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main
VAE (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Text encoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
Alternative/newer Wan 2.2 14B model merges:
WAN 2.2 I2V 5B model (GGUF) workflow with Florence or LTXPE autocaption
Lower quality than the 14B model
720p @ 24 fps
With the FastWan LoRA, use a CFG of 1 and 4-5 steps; place a LoraLoader node after the Unet Loader to inject the LoRA
FastWan Lora: https://huggingface.co/Kijai/WanVideo_comfy/tree/main/FastWan
Model (GGUF, pick the model matching your VRAM): https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main
VAE: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/vae
Text encoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
Where to save these files within your ComfyUI folder:
Wan GGUF Model -> models/unet
Textencoder -> models/clip
Vae -> models/vae
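If it helps, here is a small sketch to double-check that the files landed in the expected folders. The filenames are placeholders for whichever quantization and encoder you actually downloaded:

```python
# Hedged sketch: verify the downloads sit in the right ComfyUI folders.
# All filenames below are placeholders; substitute your own files.
from pathlib import Path

comfy = Path("ComfyUI")  # adjust to your install location
expected = {
    "models/unet": "Wan2.2-TI2V-5B-Q4_K_M.gguf",              # Wan GGUF model
    "models/clip": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",  # text encoder
    "models/vae":  "wan2.2_vae.safetensors",                  # VAE
}
for folder, name in expected.items():
    path = comfy / folder / name
    print(("OK     " if path.exists() else "MISSING"), path)
```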
Tips (for the 14B model):
Wan 2.2. I2V Prompting Tips: https://civarchive.com/models/1823416?modelVersionId=2063446&dialog=commentThread&commentId=890880
Which GGUF model to download? I usually go for a model around 10 GB in size with my 16 GB VRAM / 64 GB RAM (e.g. a "...Q4_K_M.gguf" model).
Play with the LightX LoRA strength (ca. 1.5) to increase motion / reduce slow motion
If you face issues with LTXPE, see this thread: https://civarchive.com/models/1823416?dialog=commentThread&commentId=955337
Last Frame node: if you have trouble finding the node pack for it: https://github.com/DoctorDiffusion/ComfyUI-MediaMixer
Description
Wan2.2 Image to Video MultiClip Version with LongLook
Normal Version for using your own prompts
Ollama Version for generated prompts
FAQ
Comments
tremolo28, nice working workflow --- Ollama integration works well. Will be swapping out the interpolation with TensorRT to increase speed. Decent results with LongLook. Some redundancy in sizing for width and height on each sequence -- is there a reason to recalc this on every one? Also, you're a bit of a psychopath on your workflow size - what kind of display are you using??? Subgraphs....
Thanks :) Well, I use the mouse wheel and pan/zoom through the WF, no issue for me, but yeah, it could be more compact. The good thing with Comfy is you can change the layout as you need it.
The size calculation is a workaround that saves me from worrying about width/height settings for each input image. I mostly do landscape clips and just set height to 512 and leave width at 2222 (just a high value) to get 512p, as the node calculates and crops it without losing anything or ending up with an odd aspect ratio. It's a bit complicated to explain: the node places the landscape input image into a square and then crops the square to calculate the correct width, so I don't have to do it :) It works with portrait mode as well; then you set width to 512 and height to a high value. Sorry for the long text... (see the sketch below for the effective math)
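In other words, the effective computation is roughly the following. This is a sketch of the behavior described in the comment above, not the node's actual code:

```python
# Hedged sketch of the width/height workaround: set one dimension to the
# size you want and the other to a large bound, and the missing dimension
# is derived from the image's aspect ratio within those bounds.
def effective_size(img_w: int, img_h: int, max_w: int, max_h: int) -> tuple[int, int]:
    aspect = img_w / img_h
    # Fit the image inside the (max_w, max_h) box, preserving aspect ratio.
    out_h = min(max_h, int(max_w / aspect))
    out_w = min(max_w, int(out_h * aspect))
    return out_w, out_h

# Landscape 1920x1080 input, height pinned to 512, width left at a high value:
print(effective_size(1920, 1080, max_w=2222, max_h=512))  # -> (910, 512)
```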
@tremolo28 no worries on length. props for putting your work out there.
Another update! Thank you for the work. It seems to be working as intended so far! And happy new year!
Happy new year to you too, mate. And thank you for the buzz :)
Some tips for the Ollama MultiClip version:
You can directly refer to each sequence prompt within the green "Tell Ollama what shall happen" node. Example prompt:
"a woman picks a flower from the field. The camera follows the woman.
Here is the order for the woman's actions per prompt:
Prompt1: The woman squats and looks at the flower
Prompt2: The woman picks a flower from the field
Prompt3: The woman gets up and admires the flower.
etc...".
Other example: "An astronaut and a tiny robot are watching the waves when a giant tsunami wave builds up and flushes them away. The wave shall hit them with Prompt3."
This gives more control over what happens in each sequence, as an alternative to switching to your own prompts per sequence.
I am still working to improve the Ollama system prompt. I will update the latest prompt here:
(Jan 4th, '26)
"You are an AI prompt artist specialized in cinematic video generation.
Using one input image and one user prompt with instructions, generate 4 fully independent but logically connected prompts that together form a short, dynamic video sequence.
First, analyze the image and the user prompt to identify all key visual elements, style cues, mood, environment, subjects, and actions.
Then divide the scene into 4 consecutive sequences, each representing a clear moment in time.
Rules for each of the 4 prompts:
Each prompt must be self-contained, usable on its own.
Maintain visual, stylistic, and narrative consistency across all prompts.
Predict and describe natural motion progression from the previous sequence.
Include both motion description and camera movement.
Motion Description: Clearly describe how subjects or elements move, including pace and intensity when relevant.
Camera Movement: Follow the camera instructions implied or explicitly stated in the user prompt.
Output must be exactly formatted as follows: "***1***Prompt1***2***Prompt2***3***Prompt3***4***Prompt4"
In one single continuous line, with no line breaks."
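Since the output is one delimited line, splitting it back into the four sequence prompts is straightforward. A minimal illustrative sketch (the workflow's own nodes handle this internally, so this is only for reference):

```python
# Split the single-line output in the
# "***1***Prompt1***2***Prompt2***3***Prompt3***4***Prompt4" format
# into the four individual sequence prompts.
import re

def split_sequence_prompts(text: str) -> list[str]:
    # Split on the ***N*** markers; drop anything before the first marker.
    parts = re.split(r"\*\*\*\d+\*\*\*", text)
    return [p.strip() for p in parts[1:] if p.strip()]

output = "***1***The woman squats...***2***She picks a flower...***3***She gets up...***4***She admires it..."
for i, prompt in enumerate(split_sequence_prompts(output), start=1):
    print(f"Prompt {i}: {prompt}")
```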
If you are only using sequence 1, should you be changing the Ollama prompt so it does not split things up into 4 sequences, like your workflow does by default?
If you are not extending the video, I think that means I should be setting it to only use sequence 1, unless I am wrong.
Should I also be bypassing clips 2-4? I am pretty new to all of this, so this might be an obvious question.
Also, when using LoRAs, would it be better to put the LoRA in the sequence 1 part or sequence 4 for the times I use all sequences? Asking this as Ollama seems to put most of the information from my prompt about the LoRA in the 4th clip. (I know I can use my own prompts but would like to try out the autocaption method)
Hi, yes, for a single clip you can switch off the other sequences. You can tell Ollama to put the main action in Prompt1. It will still create 4 prompts, but you can specify what shall happen in each prompt. There is another thread here with more info & tips for the Ollama version.
So for a single clip, I would load LoRAs in Sequence 1, switch off sequences 2-4, and prompt something like: "...ensure action xyz is described within Prompt1...". If you use one of the proposed Ollama models (Qwen >= 7B), it should work in most cases; smaller models can fail there.
You can also copy an Ollama-generated prompt from a sequence you like and paste it into any sequence as your own prompt.
Almost gave up on ComfyUI and everything... but this thing restored my belief!!! Outstanding. Finally working.
smoothMixWan22I2VT2V low and high suck at NSFW, any alternate GGUFs you recommend? I tried some but had too many noise issues.
@dmahadomahadn201 Basic NSFW stuff works with SmoothMix, but you might need a dedicated LoRA for more delicate stuff :) Here is a thread with LoRA links; some might be unavailable in the meantime, but you can browse Civitai for LoRAs: https://civitai.com/models/1823416?commentId=911748&dialog=commentThread
@tremolo28 Tried several LoRAs but still nothing near what I want. Also, thank you for such amazing work; I've tried dozens of workflows, models, and such, but only this one is working so far. Now trying wan22RemixI2VGGUFV20_lowQ6K and wan22RemixI2VGGUFV20_highQ6K.
Can this be used with 8 GB of VRAM?
Hey. I can't seem to find how to disable the frame interpolation. Any help is appreciated. (MultiClip LongLook)
Hi, every sequence has a switch node named "Upscale / Frame Interpolation" where you can toggle upscaling or frame interpolation. The node (cyan) is above the Video Combine node of each sequence.
