Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale and Frame Interpolation and Video Extension
Creates Video Clips with 480p - 720p resolution.
Wan2.2 14B Image to Video MultiClip Version re-work with LongLook
Create clips with 4-6 steps and extend up to 3 times; see the posted examples, 15-20 sec in length.
Uses LongLook nodes for improved processing: https://github.com/shootthesound/comfyUI-LongLook
Increases overall quality for fast-paced or complex-motion clips
Uses a chunk of the last frames for better continuity when extending
Can pack more motion into a clip with a single parameter, reducing slow motion
Processes with improved Wan 2.2 model merges, e.g. Smoothmix with baked-in LightX and other (including NSFW) Loras: https://huggingface.co/Bedovyy/smoothMixWan22-I2V-GGUF/tree/main
Removed some custom nodes and replaced them with Comfy core nodes where possible
Normal version with your own prompts, ideal for NSFW or specific clips with Loras.
Ollama version, using an uncensored Qwen LLM to auto-create prompts for each clip sequence.
Ollama model (reads prompt only, fast): https://ollama.com/goonsai/josiefied-qwen2.5-7b-abliterated-v2
Alternative model with vision (reads input image + prompt, slower; it can do reasoning when "think" is enabled in the Ollama Generate node): https://ollama.com/huihui_ai/qwen3-vl-abliterated
About the versions below: there is a Florence Caption version and an LTX Prompt Enhancer (LTXPE) version. LTXPE is heavier on VRAM.
Version use cases:
Create longer NSFW or specific clips with Loras and your own prompts => MultiClip (14B) Normal
Create longer clips with auto-prompts => MultiClip (14B) LTXPE or MultiClip_LTXPE+*
Generate short 5-sec clips with your own prompts or auto-prompts => V1.0 (14B model) Florence or LTXPE*
*The LTX Prompt Enhancer (LTXPE) might have issues with the latest Comfy and Lightricks updates:
https://civarchive.com/models/1823416?commentId=1017869&dialog=commentThread
MultiClip LTXPE PLUS: Wan 2.2 14B I2V version based on the MultiClip workflow below, with improved LTX Prompt Enhancer (LTXPE) features (see notes in the workflow). You may want to try the MultiClip workflow below first.
The workflow enhances the LTXPE features to give more control over prompt generation and uses an uncensored language model; the video generation part is identical to the version below. More info: https://civarchive.com/models/1823416?modelVersionId=2303138&dialog=commentThread&commentId=972440
MultiClip: Wan 2.2 14B I2V version supporting LightX2V Wan 2.2 Loras to create clips with 4-6 steps and extend up to 3 times; see the posted examples, 15-20 sec in length.
There is a normal version that lets you use your own prompts, and a version using LTXPE for auto-prompting. The normal version works well for specific or NSFW clips with Loras; the LTXPE version is made so you can just drop in an image, set width/height, and hit Run. The clips are combined into one full video at the end.
Supports the new Wan 2.2 LightX2v Loras for low step counts
Single-clip versions are included, which correspond to the V1.0 workflow below with an additional Lora loader for the "old" Wan 2.1 LightX2v Lora.
Since Wan 2.2 uses 2 models, the workflow gets complex. I still recommend checking the Wan 2.1 MultiClip version, which is much leaner and has a rich selection of Loras. It can be found here: https://civarchive.com/models/1309065?modelVersionId=1998473
V1.0 WAN 2.2 14B Image-to-Video workflow with LightX2v I2V Wan 2.2 Lora support for low step counts (4-8 steps)
Wan 2.2 uses 2 models to process a clip: a High Noise and a Low Noise model, run in sequence.
Compatible with LightX2v Loras to process clips fast with few steps.
Models can be downloaded here:
Vanilla Wan 2.2 models (Low & High Noise required, pick the ones matching your VRAM): https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF/tree/main
Original LightX2v Loras for Wan 2.2 (I2V, Hi and Lo): https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22-Lightning/old
Oct. 14th '25: 2 new LightX High Noise Loras (MoE and 1030) are out; try with strength > 1.5, 7 steps, SD3 shift = 5.0, replacing the High Noise Lora:
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
Oct. 22nd '25: another LightX Lora has just been released (named 1022), recommended:
https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main
VAE (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Text encoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
Alternative / newer Wan 2.2 14B model merges:
WAN 2.2 I2V 5B Model (GGUF) workflow with Florence or LTXPE auto-caption
Lower quality than the 14B model
720p @ 24 frames
With the FastWan Lora use a CFG of 1 and 4-5 steps; place a LoraLoader node after the Unet Loader to inject the Lora
FastWan Lora: https://huggingface.co/Kijai/WanVideo_comfy/tree/main/FastWan
Model (GGUF, pick the model matching your VRAM): https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main
VAE: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/vae
Text encoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
Locations to save these files within your ComfyUI folder:
Wan GGUF model -> models/unet
Text encoder -> models/clip
VAE -> models/vae
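If you want to double-check the layout, a small Python sketch like this can verify that everything landed in the right folder. The filenames in EXPECTED are placeholder examples (assumptions), not required names; substitute whichever quant and encoder files you actually downloaded.

```python
from pathlib import Path

# Expected destination folders inside a ComfyUI install, per the list above.
# The filenames here are illustrative placeholders, not the required names.
EXPECTED = {
    "models/unet": "Wan2.2-I2V-A14B-LowNoise-Q4_K_M.gguf",
    "models/clip": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae": "wan_2.1_vae.safetensors",
}

def check_model_locations(comfy_root):
    """Return the expected files that are missing from their folder."""
    root = Path(comfy_root)
    return [
        f"{folder}/{filename}"
        for folder, filename in EXPECTED.items()
        if not (root / folder / filename).is_file()
    ]
```

Run it against your ComfyUI root; an empty result means all listed files are in place.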
Tips (for the 14B model):
Wan 2.2 I2V prompting tips: https://civarchive.com/models/1823416?modelVersionId=2063446&dialog=commentThread&commentId=890880
Which GGUF model to download? I usually go for a model of around 10 GB with my 16 GB VRAM / 64 GB RAM (i.e. a "...Q4_K_M.gguf" model).
Play with the LightX Lora strength (ca. 1.5) to increase motion / reduce slow motion.
If you face issues with LTXPE, see this thread: https://civarchive.com/models/1823416?dialog=commentThread&commentId=955337
Last Frame: if you have trouble finding the pack for that node: https://github.com/DoctorDiffusion/ComfyUI-MediaMixer
Description
Wan2.2 MultiClip Image2Video with LTX Prompt Enhancer PLUS
Workflow to create and extend auto-prompted video clips with LTXPE Plus
Workflow to create prompts only, for testing LTXPE Plus
FAQ
LTXPE MultiClip PLUS Workflow:
Why was it created?
I found that LTXPE prompts can create good extended clips from an input image; however, I had to alter the system prompt each time to steer it toward the desired output.
Instead of fiddling with a wall of text, I added placeholders that can be maintained as free text in the GUI to better control the output. The video clip generation itself remains unchanged.
Keep in mind it is all managed by a language model; there is no total control and results are not always perfect. It is more about increasing the chances of good results.
The workflow corresponds to the MultiClip (14B) workflow with LTXPE to create video prompts per video sequence, only the LTXPE features have been enhanced:
- Uncensored language model (Llama-3.2-3B-Instruct, downloaded on first run, NSFW-ready)
- LTXPE creates the individual prompts for each sequence based on the input image, all in one shot and in a logical order; it predicts motion and sets a view focus on a subject, as defined by a system prompt.
- The user can enhance the system prompt by setting 3 parameters in the corresponding nodes: CONTENT, TYPE and SUBJECT.
- CONTENT describes what shall happen, e.g. "a woman dancing", "a dog and cat playing", "a car driving", etc.
- TYPE describes the type of video, e.g. "cinematic", "erotic", "action", "dramatic", etc.
- SUBJECT sets the camera focus on a certain subject in the input image, e.g. "woman", "man", "dog", "vehicle", etc.
- Toggle between a "static" and a "follow" camera view on the subject; the follow type shows more camera motion.
- Switch between 2 types of system prompts: a default one, which is more general, and a strong one, which focuses on the Content parameter by placing it at the start of each prompt.
- LTXPE sometimes ignores or mangles the defined format used to split the prompts and throws a TextSplit error. This happens rarely, but more often if you put too much info into it. Shorten or tweak the wording slightly and try again. Edit: place this at the start of the LTXPE system prompts (blue nodes): Avoid messages like "I cannot create explicit content"
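As a rough illustration of the placeholder idea (not the workflow's actual system prompt), substituting CONTENT, TYPE, SUBJECT and the camera toggle into a template could look like this; the template wording is a made-up stand-in:

```python
# Hypothetical stand-in for the LTXPE system prompt; the real workflow's
# wording differs. It only illustrates how the GUI placeholders slot in.
TEMPLATE = (
    "You write {n} sequential video prompts for a {vtype} video showing "
    "{content}. Keep the camera focus {cam} on the {subject}. "
    "Separate the prompts with the marker '---'."
)

def build_system_prompt(content="a scene", vtype="cinematic",
                        subject="subject", static_cam=True, n=4):
    """Fill the CONTENT / TYPE / SUBJECT placeholders and the camera toggle."""
    cam = "static" if static_cam else "following"
    return TEMPLATE.format(n=n, vtype=vtype, content=content,
                           cam=cam, subject=subject)
```

The defaults match the "just drop an image" settings described below (Content = "a scene", Type = "cinematic", Subject = "subject").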
Example use cases:
Clip of a car in motion: want to keep the car always in frame?
-> Set the Camtype switch to "static" (true) and enter "car" in the "Subject" node
Clip of a nude woman with camera motion, where the woman shall remain nude even after coming back from out of frame?
-> Switch Camtype to "follow", set the LTXPE system prompt switch to "strong", and enter "nude woman" in "Content". (Low success rate though...)
Clip using a Lora with a trigger phrase?
-> Switch the LTXPE system prompt to "strong" and enter the trigger in "Content" (keep it short!)
Clip of a dog and cat playing together, with view focus on the dog?
-> Set Content to "a cat and dog playing", set Subject to "dog", set Camtype to "follow"
Other tips:
The individual prompts per sequence are too short or too long for your taste?
-> Set the maximum number of words defined in the blue LTXPE system prompt to a higher or lower count (default = 45 words)
You have changed the LTXPE system prompt and now often get a "TextSplit Error"?
-> There are backup nodes in the upper left corner containing the default prompts, so you can copy & paste them back.
You have used the Content, Type and Subject nodes and always get a "TextSplit Error"?
-> The text might have been too long and confused LTXPE into outputting the wrong format; try shortening the text.
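For intuition, the TextSplit step can be sketched like this: the model's output gets split on a fixed marker and must yield exactly one prompt per sequence, otherwise the downstream nodes fail. The '---' marker and the error wording here are assumptions, not the node's exact internals.

```python
# Illustrative sketch of a TextSplit-style node: the marker and error text
# are assumptions, not the workflow's actual implementation.
def split_prompts(llm_output, expected, marker="---"):
    """Split LLM output into per-sequence prompts; fail on a count mismatch."""
    parts = [p.strip() for p in llm_output.split(marker) if p.strip()]
    if len(parts) != expected:
        raise ValueError(
            f"TextSplit Error: expected {expected} prompts, got {len(parts)}"
        )
    return parts
```

This is why overly long Content/Type/Subject text raises the error rate: the model is more likely to drift from the marker format, so the split no longer yields the expected count.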
You just want to drop in the image and not bother with all of the above?
-> Use the default settings: Content = "a scene", Type = "cinematic", Subject = "subject"; those are very general.
Use the "prompt only" version of the workflow to test and get familiar with the prompt generation without generating clips each time. I recommend starting with this workflow.
Here is a selection of NSFW Loras; the first one, by Mystic, can be used for general NSFW purposes:
https://civitai.com/models/1823416?modelVersionId=2303138&dialog=commentThread&commentId=911748
Hey man, I've been struggling trying to run this workflow but I'm stuck on some nasty error: LTXVPromptEnhancerLoader 'Florence2ForConditionalGeneration' object has no attribute '_supports_sdpa'. I looked it up and needed to downgrade transformers to version 4.49.0. I managed to get past that and another nasty error came up:
LTXVPromptEnhancer Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select).
This one has been a challenge no matter what I do. Do you have a list of errors and fixes that might solve these problems?
@adrianolimaengenharia90 Hi, I recall the LTXPE error from another workflow; a user had the same issue and found a fix, hope it helps:
https://civitai.com/models/995093?modelVersionId=1791896&dialog=commentThread&commentId=727932
@tremolo28 I've managed to fix this issue but now I'm facing a TextSplit error like you've documented. I'm running on a low-end setup (8GB RAM) and I can't work with more than 180 tokens. Do you have any suggestions on how I could adapt your workflow to work with only 180 tokens?
@adrianolimaengenharia90 180 tokens only breaks the format more often and triggers the TextSplit error. You could try the previous non-LTXPE (normal) workflow with your own prompts; it is more precise anyway, as LTXPE is more for the lazy ones, like me :)
To be honest you might not have a lot of fun with the LTXPE WF and 8 GB VRAM. One suggestion would be to reduce the output to only 2-3 prompts with a lower word count, but you would need to update the TextSplit nodes as well; maybe not worth the hassle.
To reduce TextSplit errors you can add this to the start of the LTXPE system prompts (blue nodes): Avoid messages like "I cannot create explicit content".
Although the model is uncensored, it sometimes spits out that message; the entry above mostly avoids it.
The new Comfy version (>0.3.68) breaks the LTX Prompt Enhancer.
Quick fix: find the file "ltx_model.py" in ComfyUI\custom_nodes\ComfyUI-LTXVideo\tricks\modules
Edit line 9 by adding a # in front (# apply_rotary_emb).
As in the screenshot in this thread: https://github.com/Lightricks/ComfyUI-LTXVideo/issues/283
You might also need to change the transformers version, which works like this:
From your folder ComfyUI_windows_portable\python_embeded:
python.exe -m pip uninstall transformers
python.exe -m pip install transformers==4.49.0
To check the installed version, try this:
python.exe -m pip show transformers
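If you prefer checking from Python instead of pip, a small sketch (assuming Python 3.8+, where importlib.metadata is in the standard library) can read the installed version and compare it against the 4.49.0 the fix calls for:

```python
# Programmatic equivalent of "pip show transformers": look up the installed
# version and decide whether it matches the target the fix above names.
from importlib import metadata

def package_version(name):
    """Installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def matches_target(installed, target="4.49.0"):
    """Naive numeric comparison of the first three version fields."""
    def key(v):
        return tuple(int(x) for x in v.split(".")[:3])
    return key(installed) == key(target)
```

Run it from the same python_embeded interpreter that ComfyUI uses; checking with a different Python would report the wrong environment.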
With the latest ComfyUI and Lightricks (LTX 2) updates, LTXPE might no longer work
I have an issue where no nodes are marked as missing in the manager, but a lot still show as red. These nodes are:
When I try to run it, it keeps saying:
Cannot execute because a node is missing the class_type property.: Node ID '#96'
Node ID 96 is FinalFrameSelector
Hi, those missing nodes are from this custom node repo: https://github.com/giriss/comfy-image-saver
In ComfyUI Manager the pack is named "Save Image with Generation Metadata"
@tremolo28 Thanks a lot, that solved all issues except the one with node 196:
The FinalFrameSelector node is apparently missing.
I googled this node and even installed
https://github.com/DoctorDiffusion/ComfyUI-MediaMixer
along with what you suggested, the
https://github.com/giriss/comfy-image-saver/blob/main/nodes.py
But it is still getting stuck on node 196/FinalFrameSelector
The workflow file is
Wan2.2_14B_I2V_Lora_Florence_LightX-2.json
@db9s The MediaMixer link you provided is the right one; FinalFrameSelector seems to be installed but not working properly on your setup. I remember I had a similar issue after a Comfy update but don't recall how I resolved it. I think I had a mismatched transformers version installed.
Can you go to your comfy/python_embeded folder and check the version by entering: python.exe -m pip show transformers
You can change to transformers version 4.49 with this, from comfy/python_embeded:
python.exe -m pip uninstall transformers
python.exe -m pip install transformers==4.49.0
(Note your currently installed version, so you can go back if needed.)
Hope this helps.
@tremolo28 I downloaded "Save Image with Generation Metadata" but it's still throwing the error referring to Node ID #96
@suille1 Node #196 (final frame selector) is from this repo: https://github.com/DoctorDiffusion/ComfyUI-MediaMixer
Maybe try uninstalling and re-installing; see the install instructions. Hope it helps
I got the same issue. Please help
@tremolo28 This doesn't work. For whatever reason ComfyUI refuses to install this node. I have tried multiple times, selected both the latest version and nightly, and it refuses to install.
@sexytimesalycat2 If you chose the LongLook workflow, you can delete those "final frame" nodes; they are view-only. The last frame will be injected anyway.
I have that node installed on multiple Comfy versions with no issue; some people report problems installing it though. Comfy can be a pain sometimes...
Is there any version that accepts high and low safetensors? This workflow uses GGUF.
Hi, you can just swap the two "Unet Loader" nodes (GGUF) with "Load Checkpoint" nodes (safetensors)
Here are some alternative Wan 2.2 14B GGUF models with improved animation. Bypass the LightX Lora loaders, as those Loras are already baked in. NSFW-ready:
1. Smoothmix:
https://huggingface.co/Bedovyy/smoothMixWan22-I2V-GGUF/tree/main
2. DaSiWa-WAN 2.2 I2V 14B TastySin:
https://civitai.com/models/2190659?modelVersionId=2467097
Smoothmix is great if you are tired of slow-motion clips; it better reproduces the natural speed of motion. Not perfect, but far better than the vanilla Wan 2.2 LightX model/Lora.
Preferring TastySin over Smoothmix by far... much more natural.
