Workflow for Hunyuan Video that first generates a small-resolution video very quickly, then upscales it with Hunyuan v2v once you find one you like. There is a third step for upscaling and video interpolation.
Version 1.5 uses the FastVideo lora to generate the first video in 7 steps, significantly speeding up the first generation without compromising the second.
Version 1.6 uses a TeaCache sampler to increase generation speed by about 1.6x, or optionally by about 2.1x at the cost of some quality.
Version 1.7 adds WaveSpeed, which has increased speed for me by about 15%. To use it you will need to clone the WaveSpeed repo into your custom_nodes folder. Some WaveSpeed functionality requires installing Triton, but if you only use the "Apply First Block Cache" node you may not need it.
If you already have a video you simply want to upscale, you can connect the muted load video node to the top-left connection in the "Upscale and Interpolation" group and mute the two previous groups.
This is just the application of some tips from this article, combined with already available workflows.
This is not intended as a tutorial on Hunyuan video; please check out the links above.
Description
Added TeaCache sampler to increase generation speed.
Comments (32)
Why am I only getting this error:
UNETLoader: - Value not in list: unet_name: 'hunyuan_video_t2v_720p_bf16.safetensors' not in ['flux1-dev-Q8_0.gguf']
For some reason, your ComfyUI is only detecting your flux model. Check your folders to see if you have the Hunyuan model where it's supposed to go.
@bonetrousers I do... but I worked out the problem. It seems that many people have this issue; the solution is to delete the ComfyUI-Flow-Control folder.
Not sure why, but whenever I use the fast lora I get garbage results... putting literally any other lora in there gives perfectly crystal-clear results, but adding the fast one doesn't speed anything up and just makes a blocky mess. Is my fast thingie nerfed, or is this the weirdest placebo ever? Even tossing a flux lora in gives clear results compared to the outcome of the fast lora. Anyone know why?
Are you also using the fast version of Hunyuan? That would basically be doubling the lora, which would mess up your outputs.
The fast lora just means you will need fewer steps to get a video: 20 steps for the full model, 7 steps for the fast model or the normal model + fast lora.
@Flexability That makes sense, but oddly enough, on the fast version with no loras it's garbage, while putting any lora in there, even an unused one, makes it turn out pretty high quality. Thanks for the reply, I'll test things further.
@saturngfx If you're using the fast lora with the fast model, drop the fast lora strength to -0.3 to -0.5.
@Melty1989 I've learned to simply not use the fast lora when using the fast model, tbh. But thanks, I'll try that.
Thanks! I discovered teacache yesterday and I must say it does speed it up a lot on my rig.
Is there any reason you aren't using the GGUF versions of the model and CLIP, or is it just that you didn't go for a low-VRAM workflow?
No particular reason, I simply haven't tested the quantized versions of the model yet. Feel free to give it a go.
@bonetrousers I tried it and for some reason it took up more RAM lol. Did not check into it any further though.
@funscripter627 Your comment made me curious, and I tried it as well. The sampling node in the V2V step took me 105 seconds with the full bf16 model and 114 with the quantized one. Unless there's something I'm missing, it doesn't seem to help at all.
@bonetrousers I might be wrong, but I think GGUF mostly takes up less memory, and processing can actually be slower because of the heavier compression.
Very cool. Does anyone know how to let this thing loop and run over and over again with the same prompt (with different seeds)? Simply queuing it up a bunch of times in Comfy doesn't work for some reason.
In the random noise node, change "control_after_generate" to increment or randomize to change every run.
@bonetrousers Thx, for some reason I missed that!
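If you want to queue a batch of runs from outside the UI instead, here's a minimal sketch using ComfyUI's HTTP API. It assumes ComfyUI is listening on 127.0.0.1:8188, that you exported the workflow with "Save (API Format)" as workflow_api.json, and that the node id "25" for the RandomNoise node is just a placeholder (look up the real id in your own export):

import json
import random
import urllib.request

# Load the workflow exported in API format.
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

RANDOM_NOISE_NODE = "25"  # placeholder id; find the RandomNoise node in your export

for _ in range(5):  # queue 5 runs with the same prompt but a fresh seed each time
    workflow[RANDOM_NOISE_NODE]["inputs"]["noise_seed"] = random.randint(0, 2**32 - 1)
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))  # the server replies with the queued prompt_id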
I tried it and it works, but even when the preview gen is photorealistic, the 2nd-pass gen comes out anime-styled. How do I avoid this?
It will depend on what you're trying to make, but usually adding "realistic", "photorealistic" and things like that to your prompt should solve it in most cases. If it persists, you can also add photorealistic loras, like secret sauce: https://civitai.com/models/889205/secret-sauce
Try reducing the denoise of the BasicScheduler in the v2v step; start with 0.6 or lower. This workflow's node is currently set to 0.85, which can sometimes change the style of the video. Enhancing your prompt or using realistic loras should also help.
I had this same thing happen. I had Secret Sauce added, but the strength was too low. I put the strength back to 1.0 and it solved the problem. As others have noted, adding "realistic" or "photorealistic" will also help.
Absolutely love the workflow! By far my favorite as someone who's very new to comfyui.
I loaded a specific lora that applies properly during the first pass, but I lose it by the second pass. Any suggestions to fix it? Thanks!
Place it before the "Set Model" SetNode; that way it'll apply to both passes. It's separated like that so the fast lora is only applied on the first pass.
@bonetrousers Thanks so much for the help. I feel stupid for not noticing the note that pretty much answered my question lol. I appreciate the help, and thanks again for the workflow.
I'm having trouble finding the "fastvideo" lora. Is this under a different name on civitAI?
Hey, I looked up the name and found this: https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hyvideo_FastVideo_LoRA-fp8.safetensors
That's the one!
Having trouble finding the llava llama model.
I think I got the scaled version from here: https://huggingface.co/calcuis/hunyuan-gguf/blob/main/llava_llama3_fp8_scaled.safetensors
You can also get the full version from civitai: https://civitai.com/models/1018217?modelVersionId=1141685
Nice workflow for making incremental changes/fine-tunes. Thanks for posting.
Are you gonna make a workflow for WaveSpeed + TeaCache? I heard you can get like 3x more speed for generation times.
I've been messing around with it, but I was only able to get about a 15-20% reduction in generation time. I think TeaCache and the WaveSpeed "Apply First Block Cache" node do pretty similar things. The model compiler seemed promising, but I haven't managed to increase speed with the video sizes I usually do.
I'll post it here because 15% extra speed is still pretty neat, even if installing Triton on Windows is a pain in the ass.
