This uses many of the same nodes as my image to video (I2V) workflow, and it is also designed for lower VRAM GPUs (I use a 12 GB card, though I know it also works on an 8 GB card with the Q3 GGUF files for Wan 2.2).
General performance is around 4-5 minutes for 5-6 seconds of video at 480p (480 x 832). This includes the 2x upscale to 960 x 1664. Also of note, I tend to save the "raw" 16 frames per second video more often than the final RIFE-interpolated 32 frames per second save. This is because I like to merge videos - non-interpolated videos make frame management easier, and interpolation can be added back in after the fact.
One thing I've found is that, likely due to the lower quants, using lighting descriptions is touchy - it can cause pretty substantial lighting fluctuation, especially when running more than 81 (5s) or 97 (6s) frames. One day I'll have enough extra cash to upgrade my PC to support larger, faster RAM (not even just VRAM) and I'll get the bigger non-quant models. A guy can dream ...
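For anyone setting the frame count by hand, here is a rough sketch of the arithmetic behind the numbers above. It assumes the frame count follows 16 x seconds + 1, which matches the 81 (5s) and 97 (6s) values quoted here; the helper names are made up for illustration and are not nodes or settings in the workflow.

```python
from typing import Tuple

BASE_FPS = 16  # the "raw" frame rate mentioned above


def frame_count(seconds: int, fps: int = BASE_FPS) -> int:
    """Frames for a clip, assuming the fps * seconds + 1 pattern (81 for 5s, 97 for 6s)."""
    return fps * seconds + 1


def upscaled_size(width: int, height: int, factor: int = 2) -> Tuple[int, int]:
    """Output resolution after an integer upscale (2x in this workflow)."""
    return width * factor, height * factor


if __name__ == "__main__":
    print(frame_count(5))           # 81
    print(frame_count(6))           # 97
    print(upscaled_size(480, 832))  # (960, 1664)
```

Anything above the 6 second value (97 frames) is where the lighting fluctuation mentioned above is more likely to show up.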
Description
Minor Changes:
Update to nodes (remove obscure int to string node)
Update and refresh existing nodes to newer versions (better color matching from Wan nodes, yay!)
Comments (7)
Can you please fix the workflow with the last update of ComfyUI?
I'll have it done in the next few days - I've been meaning to update this and my other workflow
@logos011 Thanks bro
Apologies for the delay, but 2.0 is up with the latest ComfyUI updates (though I did not use the beta UI as that was ... definitely a beta).
@logos011 Thank you SO MUCH man. Love your work, please keep it up !
Florence-2-base-PromptGen-v2.0: is there an extra step to get Florence to work? Where do I find the model, and where do I place it?
The Florence URL is in the notes on the left next to the prompt (on Hugging Face). Alternatively, you can skip this if you want to describe your own image in the prompt - the idea is that you let Florence describe the image, so you only need to provide the "action" for the prompt. Florence gets it done well ... most of the time - just remember that whatever it describes, Wan will try to "keep" in your video.

