Transform Low-Res Videos into HD Masterpieces — The Intelligent Way
Introduction: Beyond Traditional Upscaling
Traditional AI upscalers like RealESRGAN are great for images, but they often struggle with videos. They can introduce artifacts, fail to add meaningful detail, and leave footage looking blurry and unconvincing.
This workflow, "Wan 2.2 5B - Latent Video Upscaler," offers a paradigm shift. Instead of just guessing pixels, it uses the immense power of the Wan 2.2 5B Text-to-Video model to intelligently reinterpret and reconstruct your video in high definition. It doesn't just scale up; it dreams up the missing details, resulting in a cleaner, more detailed, and more coherent HD video than any conventional upscaler can achieve.
TL;DR: Stop using image upscalers on video. Use a diffusion model to truly enhance and upscale your footage with intelligent detail.
Key Features & Highlights
🤖 Intelligent Enhancement: Leverages the Wan 2.2 5B model to add semantically correct details, textures, and coherence, far surpassing the capabilities of traditional upscalers.
⚡ Fast & Efficient: Built on the lightweight 5B parameter model, this workflow performs latent upscaling and denoising significantly faster than generating from scratch.
🎨 Quality Preservation: Applies a light touch (denoise=0.2) to enhance and upscale without drastically altering the original motion or content of the video.
📈 2x Resolution Boost: Doubles the resolution of your input video directly in the latent space before decoding.
🎬 Smooth Final Output: Includes an optional RIFE frame interpolation pass to double the frame rate (from 16fps to 32fps) for buttery-smooth motion in the final render.
🔊 Audio Passthrough: Automatically carries over the original audio track from your source video to the final enhanced output.
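The RIFE pass above can be pictured with a naive midpoint blend. RIFE itself uses learned optical flow; this toy stand-in only illustrates how inserting an in-between frame per pair roughly doubles the frame count:

```python
import numpy as np

def midpoint_interpolate(frames):
    # Insert a blended frame between each consecutive pair: N frames
    # become 2N - 1, roughly doubling the effective frame rate.
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append(((a.astype(np.float32) + b.astype(np.float32)) / 2).astype(a.dtype))
    out.append(frames[-1])
    return out

clip = [np.full((8, 8, 3), i * 10, dtype=np.uint8) for i in range(16)]  # 16 dummy frames
smooth = midpoint_interpolate(clip)
print(len(clip), "->", len(smooth))  # 16 -> 31
```

A real interpolator predicts motion between frames instead of averaging, which is why RIFE output stays sharp where this blend would ghost.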
Workflow Overview & Strategy
This workflow is a sophisticated video processing chain:
Input: Load your low-resolution source video using VHS_LoadVideo.
Initial Upscale: The video is immediately 2x upscaled using a Lanczos filter to get to the target size. This provides a better starting point for the model.
Latent Processing: The upscaled frames are encoded into the latent space.
Intelligent Enhancement: The core of the workflow. The Wan 2.2 5B model, guided by quality-positive and detail-negative prompts, gently denoises (denoise=0.2) the latents over just 8 steps with UniPC. This step is where the "magic" happens: the model fills in plausible, high-quality details.
Decoding: The enhanced latents are decoded back into a high-resolution image sequence.
Final Output:
Option A: Save the enhanced video directly at 16fps.
Option B (Recommended): Pass the sequence through RIFE VFI to interpolate frames to 32fps, creating a final video that is both high-resolution and super smooth.
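The initial upscale step is plain pixel-space resampling before any diffusion happens. A minimal sketch with Pillow (the Lanczos filter matches the workflow; the 640x360 input size is just an example):

```python
from PIL import Image

def lanczos_2x(frame: Image.Image) -> Image.Image:
    # 2x Lanczos upscale: gives the diffusion model a clean, full-size
    # starting point before the frames are VAE-encoded into latents.
    w, h = frame.size
    return frame.resize((w * 2, h * 2), Image.LANCZOS)

frame = Image.new("RGB", (640, 360))
print(lanczos_2x(frame).size)  # (1280, 720)
```

Lanczos alone cannot invent detail; it just hands the model a smoothly enlarged canvas, and the low-denoise sampling pass then paints the missing texture in.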
Technical Details & Requirements
🧰 Models Required:
Base Model (GGUF format): Wan2.2-TI2V-5B-Q8_0.gguf
Source: Likely from HuggingFace or other model repositories.
LoRA: Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors (applied at strength 0.5)
VAE: Wan2.2_VAE.safetensors
Text Encoder (GGUF CLIP loader): umt5-xxl-encoder-q4_k_m.gguf
Interpolation Model: rife47.pth (for the RIFE VFI node)
⚙️ Recommended Hardware:
A GPU with a good amount of VRAM (e.g., 12GB+) is recommended for comfortable operation, especially when processing longer videos.
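A quick back-of-the-envelope check of why longer or larger clips need more memory: just holding the decoded fp16 frame stack grows linearly with frame count and resolution, before any VAE activation overhead. The numbers below are illustrative, not a measured profile:

```python
def decoded_frames_mb(frames, width, height, bytes_per_value=2):
    # fp16 RGB frame stack held after VAE decode; real peak usage is
    # higher because of VAE activations and intermediate buffers.
    return frames * 3 * width * height * bytes_per_value / 1024 ** 2

# Illustrative numbers: a ~5 s, 16 fps clip decoded at 2560x1440.
print(round(decoded_frames_mb(81, 2560, 1440)))  # 1709 (MB), before overhead
```

This is one reason the comments below recommend a tiled VAE decode for large outputs: it bounds the per-step working set instead of materializing everything at once.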
🔌 Custom Nodes:
This workflow uses:
comfyui-videohelpersuite (VHS) - for video loading/combining
comfyui-frame-interpolation - for RIFE VFI
comfyui-gguf (gguf) - for model loading
comfyui-easy-use - for memory management
comfyui-kjnodes - for performance patches (Sage Attention)
Usage Instructions
Load the JSON: Import the provided .json file into your ComfyUI.
Load the Models: Ensure all required models are in their correct folders. Check the paths in the LoaderGGUF, VAELoader, and LoraLoaderModelOnly nodes.
Select Your Video: In the VHS_LoadVideo node, click the video icon to select your low-resolution input video.
Queue Prompt: Run the workflow!
Retrieve Output: Find your two enhanced videos in the output directory:
.../Wan 2.2 5B Upscales/Denoise 0.2_xxxxx.mp4 (16fps)
.../Wan 2.2 5B Upscales/Denoise 0.2_32fps_xxxxx.mp4 (32fps - smoother)
Tips & Tricks
Denoise Strength: The denoise parameter in the KSampler (default 0.2) is key.
~0.1-0.3: Best for upscaling/enhancement. Preserves the original content while improving quality.
>0.5: Will start to significantly alter the content and style, moving towards a new generation based on your video.
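The way denoise trades preservation against reinvention can be sketched with a toy linear schedule. ComfyUI's real sigma schedules differ; this only illustrates the idea that denoise < 1 starts the sampler partway down the noise ladder, so only a fraction of the noise is ever added and removed:

```python
def partial_schedule(steps, denoise):
    # Build a toy linear noise schedule for steps/denoise total steps,
    # then keep only the low-noise tail the sampler actually runs.
    total = round(steps / denoise)
    sigmas = [1.0 - i / total for i in range(total + 1)]
    return sigmas[-(steps + 1):]

tail = partial_schedule(8, 0.2)
print(round(tail[0], 2), "->", tail[-1])  # 0.2 -> 0.0: only the last 20% of the noise range
```

At denoise=0.2 the model never sees heavily noised latents, so it can only refine; at 0.5+ it starts from much noisier latents and has room to repaint content.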
Source Quality: This workflow excels at breathing new life into low-quality, pixelated, or noisy source videos from older generators.
Prompt Engineering: The positive prompt (high detail, high quality...) is generic to encourage enhancement. For stylistic changes, you can modify this prompt (e.g., "cinematic, film grain, photorealism").
Conclusion: The Future of Video Upscaling
This workflow demonstrates a powerful new application for diffusion models: not just as generators, but as intelligent enhancement tools. By leveraging the knowledge within the Wan 2.2 model, we can upscale videos with a level of coherence and detail that traditional methods simply cannot match. It’s faster than full generation and smarter than simple scaling.
Upload your low-res clips and witness the intelligent upscaling revolution.
Credit: Crafted by the ComfyUI community. Special thanks to the creators of the Wan 2.2 models and the FastWanFullAttn LoRA.
Comments (20)
This method is very important, and many of us have been using it for ages as the only decent local temporal solution. I prefer the better low-noise 14B Wan2.2 model; the 5B model has no speed advantage if you know how to launch Comfy properly so the model uses RAM, not VRAM.
AND, it is not true that image upscaling can't be used successfully for some videos: SD Ultimate Upscale workflows allow 1080p videos to become 4K without hitting VRAM restrictions by processing the video as a series of images that can later be recombined from a folder. The lack of a temporal element strangely doesn't matter.
Speed? This WF sucks up every available byte of memory on my box. Sadly, I've only got 64 GB RAM and a 4090. But I was all the way down to Q3_K umt5 and wan2.2 i2v low Q2 GGUF and it still maxed both kinds of RAM then OOMed. Yet for regular i2v I'm running the fp16 t5 and Wan2.2 i2v fp16, plus an accelerator LoRA and drone LoRA... and I tacked an upscale on the end, but just ESRGAN and FILM VFI. No OOMs.
Now I'm no stranger to SD Ultimate Upscale (just new to Comfy). Hell, I'll take Forge-style Hires Fix. Just 4x-UltraSharp is enough to give me the extra fine details. Latent upscaling ends up changing the image too much. I guess separating frames and upscaling each individually is a last resort I might have to take. The original images were made in Flux and Wan 2.2 is just taking a wild guess, but doing a surprisingly good job overall. It's just missing the fine details. Overlookable at 1408 x 800... but not at twice that. Maybe you're right... it's time to bite the bullet. You've got frame numbers for a "temporal element". ;-> That's all I've ever had doing 3D animation for many years. I'll only render straight to an mp4 for a quickie test.
I'm new to ComfyUI and so far I haven't found any video scaling models that don't destroy my 16GB of VRAM. Do you have a workflow that works for scaling?
It would be nice if you added links to the necessary models in the description.
LORA:
MODEL: https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/blob/main/Wan2.2-TI2V-5B-Q8_0.gguf
VAE: https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/blob/main/VAE/Wan2.2_VAE.safetensors
CLIP: https://huggingface.co/city96/umt5-xxl-encoder-gguf/blob/0e9a7657447c3a2215edf3a7c5a081633102d19c/umt5-xxl-encoder-Q4_K_M.gguf
It keeps getting stuck at the VAE decode stage - res is 1216w x 480h (trying for an ultrawide look) and I'm feeding it 16 fps clips with 102 frames per clip. I'm also using all your model files, text encoder and VAE files, so idk -- I also have a 4090, lots of RAM and CPU, so I'm kinda stuck as to how to make it work. BTW your motion enhancements worked great for me - very cool. But now I'm trying to upscale them and yeah, no dice.
Did you try using the VAE tile node (encode/decode)?
For low VRAM, it should work since you are using an RTX 4090, which only has 24 GB of VRAM, if I am correct.
@zardozai ok, yeah gpt actually suggested that I do tiling also -- so i should run tiling then -- ok I'll try it -- yes a 4090.
@zardozai ok, I'm gonna use a tiled vae decode at 512 just to see if it works. I'll let you know how it goes.
@zardozai I got it to work -- batch VAE decode worked well at 512. I was also feeding in 24 fps clips before, which is probably why it was failing; I thought the clips were 16 fps, but they were 24 fps. Now that the clip feeds come in at 16 fps, it's working.
@Robopsycho To feed in at 24 frames per second, simply adjust the output settings accordingly.
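The tiled decode suggested in this thread helps because peak memory scales with one tile rather than the whole frame. A shape-level numpy stand-in (a real tiled VAE decode also overlaps and blends tiles to hide seams, which is omitted here):

```python
import numpy as np

def process_tiled(img, tile, fn):
    # Split a (H, W) array into tile x tile blocks, apply fn to each,
    # and stitch the results; only one block is "in flight" at a time.
    h, w = img.shape
    out = np.empty_like(img)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y:y + tile, x:x + tile] = fn(img[y:y + tile, x:x + tile])
    return out

img = np.arange(1024 * 1024, dtype=np.float32).reshape(1024, 1024)
res = process_tiled(img, 512, lambda t: t * 2)
print(np.array_equal(res, img * 2))  # True
```

The 512 tile size used in the comment above is a reasonable default; smaller tiles reduce memory further at the cost of more seams for the blending pass to hide.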
any way to make the upscale less saturated? I notice that it turned a warm light pretty orange. I tried putting denoise to 1.0 - we'll see what happens. TY
@Robopsycho ComfyUI Node: Color Match
Give it a try.
@zardozai so it worked - made the light bright -- but not too saturated - Thanks :)
Not sure what is going wrong here, but trying to use the workflow unedited I am for some reason getting "torch.linalg.solve: The solver failed because the input matrix is singular."
I suspect that uni_pc as the sampler is the problem, what other samplers would you say would work best?
I tried changing to SA_Solver and Beta but then I got "contracted dimensions need to match, but first has size 4 in dim 1 and second has size 0 in dim 0"
This is when using all the default values of the workflow.
Tried swapping ComfyUI to launching with Quad attention instead of Sage Attention, and I don't get any errors, but the workflow finishes far faster than expected and the result is just pure noise.
I have removed the Lora that can cause this issue in the last version.
On a 4090 with 64 GB system RAM, I couldn't get this to work with even the smallest quants. It maxed out both kinds of RAM completely. I was admittedly not trying to upscale tiny videos; low-res in my world is 16:9-ish at anywhere from 1280 to 1536 wide. I really WANT this to work though, and preferably with the low-noise side of the 14B. Guess I need to look into BlockSwap, etc.
do not touch anything and re download the workflow in its original state and run it again
There is ZERO chance of temporal upscaling at decent speed on a 4090 if your input and output have so large a resolution, you exhaust latent storage in your VRAM. ZERO! Temporal processing requires access to the entire latent data set. What you are doing is catastrophic thrashing of memory from VRAM to system RAM, and more likely to an SSD swap file- absolutely catastrophic.
Too many people here know nothing about computer science, data flow, memory systems, and memory management. Worse, they think that because they overspent on a 4090 or 5090, everything must be possible.
What you can do is use a non-temporal upscale method. Recently, surprisingly, it was discovered that SD ultimate upscale, with a fixed seed, is good for doubling the linear rez. Any amount of VRAM is good for a pretty high-rez input video, providing you use a workflow that treats each frame as an image, and saves the upscaled images one by one to a folder. Then you can create a new upscaled video from that folder.
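A sketch of the frame-folder approach described above. The paths, filename pattern, and the plain Lanczos resize are placeholders; in practice an SD Ultimate Upscale pass would take the place of resize(), but the memory property is the same: only one frame is ever in flight.

```python
from pathlib import Path
from PIL import Image

def upscale_folder(src: Path, dst: Path, scale: int = 2) -> None:
    # Treat each frame as an independent image and save the result to a
    # folder, so memory use never exceeds a single frame.
    dst.mkdir(parents=True, exist_ok=True)
    for f in sorted(src.glob("*.png")):
        im = Image.open(f)
        up = im.resize((im.width * scale, im.height * scale), Image.LANCZOS)
        up.save(dst / f.name)
```

The upscaled folder can then be reassembled into a video, e.g. with ffmpeg: `ffmpeg -framerate 16 -i upscaled/frame_%04d.png out.mp4`.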