**Don't forget to Like the model. ;)
!!!This workflow is Obsolete!!! Some better options:
Wan2.1 (Best quality but also the slowest; high VRAM usage for great results, but it has GGUF options)
https://civarchive.com/models/1300201/wan-ai-img2vid-video-extend
Skyreels (Hunyuan variant) (Good quality, mid VRAM usage)
https://civarchive.com/models/1278247/skyreels-hunyuan-img2vid
Hunyuan WF (Fastest one. I don't like the quality so much, but I'm still testing. Lowest VRAM usage and a FAST lora!)
https://civarchive.com/models/1328592/hunyuan-wf-img2vid-fast
*Just added a version without auto image resize due to the high number of people having errors with it. The manual one will work 100%. Sorry about that :)
**Error "unsupported operand type(s) for //: 'int' and 'NoneType'" fix: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
Straightforward: this is an Image-to-Video workflow using the resources we have today (January 2025) with Hunyuan models. Using the I2V LeapFusion LoRA plus IP2V encoding, it can be very consistent and, in my opinion, as good as an older Kling version in terms of consistency. It's not perfect, but it delivers solid results if used well, especially with videos of humans.
I kept it as simple as possible and didn't include the faceswap node this time, but it's a great addition if you're planning to generate videos with human subjects. The VRAM usage depends heavily on the length and dimensions of the video you want to generate, but 12GB of VRAM is ideal to get good results.
As always, instructions and links are included in the workflow. Don't forget to update Comfy and the HunyuanVideoWrapper nodes!
That's it. Leave a like and have fun!
Description
Added a version with manual image resize for the ones having errors with the automatic one.
Comments (193)
ImageScale
unsupported operand type(s) for /: 'NoneType' and 'int'
Fixed. Sorry about that. :)
@Sam_A I just DL'd the workflow and I still get this error? Is there something we need to change in the workflow? Thanks!
I get the same error, any ideas?
@shawnkaron295 I just saw this post and it fixed it for me.
This is a known issue: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
Downgrade the transformers library to fix it.
Run "python.exe -m pip install transformers==4.47.0" in the python_embeded folder.
Just reuploaded with the "unsupported operand type(s) for /: 'NoneType' and 'int'" problem solved. Sorry about that :)
Thanks. I was just about to post that I am also getting the error and was trying to figure it out.
I am guessing it's not updated yet on Civit? I redownloaded and am still getting:
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
@zengrath On the HyVideoTextImageEncode node?
@Sam_A Yep, I tried downloading it and am getting the same issue too, got the recently uploaded version too.
@Aicush Now I think it's fixed. At least I tried with many different images o_o
I got the error or a very similar one too, but it was because of my transformers library not being the right version. Had to downgrade to 4.47.0.
Same Problem.
Updated 8 min ago, same error
Well, I just added a version without the auto image resize. I'm unable to identify the problem, so the alternative might work. It's boring to calculate the image sizes, but it's good enough I think.
@Sam_A sry, same problem ^^" unsupported operand type(s) for //: 'int' and 'NoneType'
@SysDeep Using the one with manual image size????
@Sam_A Yes, tested 1.1, won't work. Image2Video-By-Sam-ManualSize
@Sam_A - Worked now, thanks to funscripter627
Transformers 4.47.0 works fine! I had to downgrade to it.
How to get it to work:
All things inside Python (I use Windows 11):
- Downgrade Transformers to 4.47.0
- Install sageattention
- Install triton
Then it worked fine.
Can someone tell me if this one or https://civitai.com/models/1180764/hunyuan-img2vid-leapfusion-lora?modelVersionId=1328798 is better? I am just wondering, since that version is working for me. It's not super good, but it does pretty okay, and if this one is better I am willing to put some effort into all the workarounds y'all are talking about. As of now, everything I would need to change just sounds like a lot of work, and with the 2 examples provided I am not really up for doing all that and risking breaking something in the process, because we all know how one tiny thing can just mess up a whole lot of other workflows. Anyway, I appreciate the upload regardless. Just can't use it LOL
My version uses the LoRA plus IP2V to reinforce the result. It's just one small detail I added for better results. Also easy image resize...
Once I found the correct instructions, it only took minutes to fix if you're on portable ComfyUI.
Go to your python_embeded folder. Inside the folder, type cmd in the address bar. In the cmd window, run:
python -m pip uninstall transformers -y
python -m pip install --upgrade transformers==4.47.0
That's it. This fixed it for me.
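If you're not sure the downgrade landed in the right environment, you can check from the same python_embeded folder (transformers exposes its version string, so this one-liner is safe to run):
python.exe -c "import transformers; print(transformers.__version__)"
It should print 4.47.0 after the fix.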
What I can say is, so far it's more consistent than other workflows I tried. In other workflows the face of the person, for example, would change too much, whereas with this one it doesn't. However, as far as actions go, it likely isn't going to do a whole lot of what you want right now. Hopefully official img2vid support will work better. It certainly is worth giving it a go though, at the very least to be able to see some movement in still images, which is pretty neat without paying the huge price of premium AIs to do it right now.
@Sam_A thank you :)
@zengrath Well, that sounds way easier than whatever I found. I looked into it during the day to see if I need to do anything else while I was at work, away from my PC, and I got to say THANK YOU SO MUCH for this super easy explanation. I'm probably gonna do it either later today or tomorrow. I guess hopefully I won't wreck anything in the meantime LMAO
@liquidhead440 Hope it works for you. After playing around with this workflow a while trying different source images, I've been getting good results, and finally, with the right prompting and LoRAs, getting pretty cool results. So it's worth trying. I have a feeling even official img2video won't be perfect and will require trial and error, though I hope official support is more efficient and consistent.
Hi, are there specific sizes and resolutions that work best? What is the largest picture we can start with? Is there a limit to the number of frames/video length?
It all depends on your GPU VRAM. With a 4070 I usually try 75 frames at 768x432 latent size. You can play around with it and see what your GPU can handle.
@Sam_A I have been able to do 560x1024 portrait, 72+1 frames; it uses 20-23GB on my 4090. I tried making longer videos and got multiple errors.
@yajukun On my 4090 I tried to make the longest video I could with decent quality. I got around a 9 sec video at 768x432. More than this and I get the low memory error. Maybe on a 5090? Hehe.
Can I2V be done with the native nodes? I can't make kijai nodes work.
Do I have to use Sage Attention? With sdpa, the sampler used to throw errors; now after the update it just samples forever, stuck at 0%.
P.S. OMG it moved! 233s/it, that's a bit much.
I'm not in front of my machine right now, but I've had luck switching from "sdpa" to "comfy". Does that do anything for you?
I think I need help. I get the following message when I run:
HyVideoModelLoader
Can't import SageAttention: No module named 'sageattention'
Just install it using pip and you're good to go.
I installed it using Git, and it still doesn't work. Does it have to be with pip?
@auroch22934 You install sageattention using pip in Python. Open a terminal and run pip install sageattention.
@Sam_A Thank you very much, I think I will try this method
@wange999 If nothing works, you can ask a free GPT how to install it and it will give you the command and everything you need with more details than I'm able to! :D
I got it to work using this tutorial https://old.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/
There's also a video that makes it even easier to follow. It's a LOT
https://www.youtube.com/watch?v=DigvHsn_Qrw
I get this error :c
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
This is a known error from kijai nodes. Solution: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
Now my question is: where is python_embeded in the Desktop version?
@Santaonholidays It's your local Python installation. If you didn't create a venv for ComfyUI, you can just install it in your local Python.
@Santaonholidays Run python.exe -m pip install transformers==4.47.0 in the python_embeded folder
@Sam_A The ComfyUI Desktop app has a .venv folder in it
I get "Only vision_languague models support image input"?
Did you remove the <image> tag from the prompt? Or did you change the TextEncoder model?
I didn't remove <image>. What TextEncoder should I use?
@dscvff In the (Down)Load HunyuanVideo TextEncoder node, use xtuner/llava-llama-3-8b... If you didn't change it, please tell me which node is returning this error.
@Sam_A DL'd the model and got the int error, ran python.exe -m pip install transformers==4.47.0 in the python_embeded folder, still get the error.
@dscvff Did you change any config before running the workflow?
@Sam_A No, tried both workflows. I see on https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269 that others have the same issue even after downgrading. I'm on portable.
@dscvff To try to help you, I'll need to know which node is returning the error, so maybe I can figure out what's going on...
@Sam_A hello, same problem as above. I didn't remove <image> from the prompt and neither of the two text encoders listed works; both give me this error:
got prompt
Loading text encoder model (clipL) from: C:\pinokio\api\comfy.git\app\models\clip\clip-vit-large-patch14
Text encoder to dtype: torch.float16
Loading tokenizer (clipL) from: C:\pinokio\api\comfy.git\app\models\clip\clip-vit-large-patch14
Loading text encoder model (llm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-text-encoder-tokenizer
Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00, 1.40s/it]
Text encoder to dtype: torch.bfloat16
Loading tokenizer (llm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-text-encoder-tokenizer
!!! Exception during processing !!! Only vision_languague models support image input
Traceback (most recent call last):
File "C:\pinokio\api\comfy.git\app\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\pinokio\api\comfy.git\app\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 881, in process
prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = encode_prompt(self,
File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 806, in encode_prompt
text_inputs = text_encoder.text2tokens(prompt,
File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 214, in text2tokens
raise ValueError("Only vision_languague models support image input")
ValueError: Only vision_languague models support image input
Res: 796x448, 53 frames, 9 steps -- rendering time 1 hour on 12 GB VRAM. Is that normal?
I would try to reduce the resolution a little bit and use the upscaler later. 768x432 at 75 frames usually takes around 6~7 minutes on a 4070, maybe less, I'm not sure. You can go even lower on resolution.
@Sam_A I changed the resolution to 768x432, but it didn't affect the rendering time in any way. I assumed that this was because the large "hunyuan_video_t2v_720p_bf16" model was used. Then I downloaded, as you advised, "hunyuan_video_fastvideo_720_fp8" -- it didn't help. By the way, with the text2video Hunyuan, the video is generated in no more than 10 minutes. I don't know, maybe the 3060 (12 GB) is not friendly with the image2video method(
@Nikmago You will need to reduce resolution/length until you see it's not using 100% of your VRAM. I think it divides the memory when you reach 100% and the process goes a lot slower.
@Nikmago Check your VRAM usage in the resource monitor. I found that if I am using 95% or more of my VRAM, generation time goes up considerably. Even on my 4090 at a 720p resolution I am usually at 80-90% of my VRAM if my frames are around 81 or so. So you'll likely need to go down to like 336x576 or lower, until you're not hitting over 95% VRAM, and that will likely result in your generations only taking minutes. There is another workflow here that specializes in the fastest video generation possible and runs the fast lora at low resolutions and such. I personally find, on a 4090, the faster speed not worth the loss in quality, but for you it may be your best option on a 12GB VRAM card. There are workflows that say they're specifically for 12GB VRAM on Civit too. But you can use this one if you just lower the resolution enough. Try starting at 45 frames and the resolution I provided to see if it runs in just minutes. If not, go even lower on the resolution or try another workflow designed for 12GB.
@Sam_A @zengrath Guys, I found something interesting! I tried to set the resolution to 576x336 and 45 frames. It still takes a long time - 30 minutes. But if you set the wrong resolution 3 times and go back again, for example, to 576x336 (45 frames), rendering takes 2 minutes at 9 steps and 4 minutes at 24 steps. As it should be, I suppose.
That is, it turns out to be some kind of bug. I entered the wrong values 3 times, and the system gave me an error like "The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 4". And on the 4th time, the generation was very fast.
@Nikmago Where should I set the wrong value, and what is the wrong value?
@MugenMan It is better not to bother with these values and add blockswap and teacache to the workflow. This solves the problem. There is a normal workflow on the site for 12 GB.
I keep getting:
shape mismatch: value tensor of shape [16, 1, 61, 34] cannot be broadcast to indexing result of shape [1, 16, 1, 62, 34]
I know it's got something to do with resolution: when I generate at the default resolution it works, but any other resolution gives me this error.
If you're using manual resize, switch the dimensions to multiples of 16.
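For anyone doing the manual resize by hand, here's a minimal Python sketch of that rounding (snap16 is just an illustrative helper, not part of the workflow):
def snap16(x: int) -> int:
    # Snap a dimension to the nearest multiple of 16, minimum 16.
    return max(16, round(x / 16) * 16)

print(snap16(796), snap16(448))  # -> 800 448, so use 800x448 instead of 796x448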
It works extremely well. Best i2vid workflow I tried, although I did spend more time trying to get the right settings for this one. Thank you so much!
What settings did you end up with? I'm really struggling to get good quality out of this, but I'm pretty sure it's something I've got misconfigured.
@goresj2932 You have to play around with the CFG scale and flow shift values a bit sometimes, although I usually keep them as they are by default in the workflow. I also make sure the frame count is a multiple of 24, otherwise the movement seems to get messed up. Not 100% sure though.
Make your prompts really simple and make sure that you don't change too much, otherwise it will generate a whole new image instead. Look at the examples posted here and just change the picture if you want to see it work.
@goresj2932 I will suggest what worked for me... Check what Hunyuan can generate, make images of the thing you want but with a similar visual composition, and then it will understand your input. Also, a LoRA of the thing you want to animate will help a lot.
Where can I find the img2vid.safetensors (LoRA)?
The link is in the workflow instructions.
Can't find the sexy dance lora!
@kanghua151613 Not really necessary, but here it is...
https://civitai.com/models/1110311/sexy-dance
RTX 4070 12GB taking too much time for 432x768 - it has been running for more than 2 hrs.. why?
Probably too many frames; it's reaching your VRAM limit. Try to run it in a way that it reaches max 95% of your VRAM and it will run in 5~7 minutes. You can reduce the number of frames or the frame size.
The video seems to undergo significant deformation after three seconds or more. Did I make a mistake?
It's probably not recognizing the thing you want to change. Try to describe it more clearly, or generalize it more depending on the picture. For example, instead of "a woman in a black dress and dark brown hair" just say "a woman". If there are multiple women in the picture, then it helps to distinguish them.
In addition, play around with the sampler parameters: guided cfg scale, flow_shift and denoise. Lowering denoise should make it deviate less from the picture.
Also, sometimes a picture just doesn't work. If that's the case, waiting for the official i2v model is probably best.
Like funscripter627 said, sometimes it does not understand what you're trying to do. My suggestion is: try to create what you want in a T2V workflow, just to check how much of your prompt Hunyuan understands, and how it understands it. Generate your image with a similar visual composition, and it might work. It's like old Kling, which used to deform what it didn't understand. I believe this problem might be gone with better models in the future.
I keep getting this:
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
It's probably the issue linked at the top: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269 Basically you need to downgrade the transformers package. Also a good idea to update your custom nodes if you haven't already.
@funscripter627 Thanks for your reply. How do I downgrade the package?
@essseekay476 It's in the link. It depends on your Comfy installation. See https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269#issuecomment-2585504240
@funscripter627 I'm running Comfy within Pinokio, still don't know what to do (sorry for being annoying)
@essseekay476 It's okay man. I don't know Pinokio specifically, but generally you want to execute the downgrade and any other pip commands in your virtual environment (venv). I think Pinokio comes with conda, so maybe try to look for a venv there.
According to GPT:
I'm running comfyui in pinokio. How do I install a python package in it?
To install a Python package in ComfyUI running on Pinokio, you can follow these general steps. Since ComfyUI is typically running in a Python environment (likely in a virtual environment), you'll want to install your package into that environment.
Here's how you can install a Python package:
Access the terminal/command line: If you're running Pinokio with a graphical interface, you should be able to access a terminal from within the environment.
Activate your virtual environment (if applicable): If ComfyUI is using a virtual environment, activate it. Typically, you'd activate the virtual environment like this (assuming it's named env):
For Linux/macOS:
source env/bin/activate
For Windows:
.\env\Scripts\activate
Install the Python package: Once your virtual environment is activated, you can install the package using pip. For example, if you want to install requests, you'd run:
pip install requests
Verify the installation: After installation, you can check that the package has been installed by running:
pip list
This should show the installed packages, including the one you just added.
Restart ComfyUI: After installing the package, you may need to restart ComfyUI to ensure it picks up the new package.
Let me know if you encounter any issues along the way!
If you need further instructions, GPT can help you with more details than I can, 100%! lol
@Sam_A Thanks for this reply man, you went above and beyond lol
How do I know if I am using others that do require the version of the transformers package I am currently running? I wouldn't want to downgrade just for this I2V and break everything else in the process. Thoughts?
UPDATE: I looked, and I was only at 4.47.1 anyway. I seriously doubt anything is going to care about the downgrade.
@DroneMeOut Haven't run into any issues myself after downgrading, although I've mostly been using this workflow lol
Every time I get this error, even with other Hunyuan workflows:
RuntimeError: CUDA error: the launch timed out and was terminated CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
What GPU do you have?
@Sam_A I'm currently using an NVIDIA A16-16Q
@felipesscaff925 Well... this is kind of a "server" GPU. I'm sorry, but I'm not sure how to fix this, bro. Maybe GPT has a solution?
Yeah, I got the same error.
This is the first i2v hunyuan workflow I've tried that's actually worked for me. Thanks for putting this together!
"Where can I find Sexy Dance E15 lora?"
Thx Bro!
This is the error I encountered. Could everyone please take a look?
got prompt
encoded latents shape torch.Size([1, 16, 1, 96, 54])
Loading text encoder model (clipL) from: C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
Text encoder to dtype: torch.float16
Loading tokenizer (clipL) from: C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading text encoder model (vlm) from: C:\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
Loading checkpoint shards: 100%|██████████| 4/4 [00:42<00:00, 10.64s/it]
Text encoder to dtype: torch.bfloat16
Loading tokenizer (vlm) from: C:\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
!!! Exception during processing !!! unsupported operand type(s) for //: 'int' and 'NoneType'
Traceback (most recent call last):
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 884, in process
prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = encode_prompt(self,
^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 809, in encode_prompt
text_inputs = text_encoder.text2tokens(prompt,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 253, in text2tokens
text_tokens = self.processor(
^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llava\processing_llava.py", line 160, in __call__
num_image_tokens = (height // self.patch_size) * (
~~~~~~~^^~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'
Prompt executed in 69.41 seconds
Dumb question, but I noticed you had the fastvideo model selected without the lora. Is the lora required?
Also, is there a reason you did not use the 544x960 resolution specified in the i2v lora's repo?
The fastvideo model already has the lora included; it's like an "LCM" SD model. About the resolution: it works at different resolutions. Anything that's a multiple of 16 pixels will work. Adjust according to what your GPU can handle. Don't go too low, because the model will start to not understand what is in the picture. Don't go too high, or your GPU will not be able to process it.
@Sam_A Ok, thank you. Another question: running 432x768 at 37 frames easily gives me an OOM on a 3090.. is this normal?
@tangentplum598 No, it's not normal. With 24GB you should be able to run maybe 7~8 seconds (98% VRAM) at this resolution, I think. I suggest you maybe update the nodes and Comfy and check that everything else is in order.
@Sam_A Alright, yeah, I cannot figure it out. I've updated all nodes, even used a combination of --disable-smart-memory --disable-cuda-malloc. I've noticed models aren't being offloaded, which is of course not a problem with this workflow. I'm just so confused about what is happening :(
UPDATE:
I just made a fresh comfyui installation and it works now
@tangentplum598 I wish I could help, but I really don't know what the problem could be.
Update: Oh! Great!
I like the work that you've put into this - thank you :)
Do you have a recommendation for any Comfy settings to avoid memory issues?
-GPU: 4080 Super
-Dedicated VRAM for Windows - 15384MB
-Shared system memory - 49022MB
The workflow runs my GPU and VRAM at 100% without changing any settings from the base workflow, with some memory errors that have stopped generation. I am 100% sure I'm doing something wrong :)
Of course! This is how I adjust this workflow according to the GPU I'm using. I have a 4090 and a 4070 (12GB VRAM).
1. I like to input images with a 16:9 aspect ratio (1920x1080 or 1280x720, etc). Then I define the longer side of the image. I usually start with 768.
2. In num_frames on the sampler you need to put a number that is a multiple of 4, plus 1. E.g.: (4*10)+1 = 41.
Since it works on a base of 24 frames/sec, you can use 24 times the length in seconds you wish, plus 1. I believe with 16GB VRAM and the size I said in item 1, you can generate at least 5 seconds of video, but you need to test. Start small: test with 2 seconds, 3, etc., until you find the sweet spot for your GPU. If the generation starts to go crazy, like 100 sec/it, the setting is wrong and you need to reduce the length or image size.
And finally you fine-tune your config. I tend to reduce the longer side of the image by 16 pixels at a time, so you can test whether Hunyuan understands your image at the size you're inputting. Atm I'm using 672 as the longer side, trying to create the video as long as possible. For what I'm generating, below this size things start to get weird. But it's really trial and error.
This is the general way I think about using this workflow. I hope it helps!
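For anyone who wants the arithmetic from step 2 spelled out, here is a small Python sketch of the frame-count rule described above (assuming the 24 frames/sec base and the multiple-of-4-plus-1 constraint; the helper name is illustrative):
def num_frames_for(seconds: float, fps: int = 24) -> int:
    # Snap down to a multiple of 4, then add 1, per the sampler's requirement.
    raw = round(seconds * fps)
    return (raw // 4) * 4 + 1

for s in (2, 3, 5):
    print(s, "sec ->", num_frames_for(s), "frames")
# 2 sec -> 49 frames, 3 sec -> 73 frames, 5 sec -> 121 frames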
@Sam_A Thank you - this helped. It's still a bit slow, but I can actually get some generations going.
@Sam_A I'm getting quite bad VRAM usage on Windows (not the portable; installed in a venv). I have a full 3090 (actually I have two in this machine, but it seems bad if I am getting excessive usage), which is only barely able to run a 512x768x73 workflow before OOM. I installed this all yesterday, so it's a fresh installation. Any idea of some optimisation options I'm missing?
P.S: Thanks for the workflow! Though I find it can be a bit reluctant with certain loras (with anime style images) - do you find the start image can have a big effect on the resulting action?
@cgimw In another comment someone was also having problems on a 3090, and a fresh Comfy reinstallation/update solved the problem. But since you said yours is fresh, I don't really know. I wish I could help.
About the image, yes. It has a strong impact on the movement. It feels like if the Hunyuan model doesn't have enough data for your image, the result can be deformed or have small movements. The ideal is to play with Hunyuan a little before starting with I2V, just to "feel" what kind of images the model generates.
@Sam_A Hmm. I will keep an eye on it - maybe I will try reinstalling.
Do you know if there's a simple way of offloading the text encoder onto my other GPU? As that does seem like it would be a simple way of getting a big improvement in throughput.
Hi guys, any idea how to fix this? I have 16GB VRAM, but every time I get an error when loading llama in HyVideoTextImageEncode:
Allocation on device
Seems like not enough VRAM? Any idea how to load it?
Hey, does anyone have tips for encouraging motion? Frequently the I2V is barely moving.
My suggestion is to use pictures the model understands, and use a LoRA. But to be honest, most of the time it's not needed. What are you trying to move? Maybe I can try some examples to help you.
@Sam_A I'm thinking my custom lora may just not be trained enough, thanks. I will try to experiment more and update. Is there a group on Discord or something for people to discuss related things?
Sadly it doesn't work. First it was throwing the text encoder issue; downgraded to 4.47.0. Now it still won't work, even though I am not getting any error lol. Comfy is a pain in the ass
It's sad. I would try a fresh Comfy installation and see if the problem gets solved.
I was able to get it to work by following all of the instructions exactly -- granted, on a MimicPC instance :-) I think everything has to do with a clean install and available VRAM!
That's the error I get: 'Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback): No module named 'transformers.models.timm_wrapper.configuration_timm_wrapper'
same here
Anyone get this error:
HyVideoVAELoader
Error(s) in loading state_dict for AutoencoderKLCausal3D: Missing key(s) in state_dict: "encoder.down_blocks.0.resnets.0.norm1.weight", "encoder.down_blocks.0.resnets.0.norm1.bias", … Unexpected key(s) in state_dict: "encoder.down.0.block.0.conv1.conv.bias", "encoder.down.0.block.0.conv1.conv.weight", … (hundreds of mismatched VAE keys omitted)
@PetePablo Download the VAE from the instructions in the workflow.
@Sam_A Worked for me, thank you
@Sam_A Where do I put it, and which VAE? Please give direct links
@Ash51 The links for the VAE and all other models are in the workflow! The BIG red note at the start of the workflow.
I keep getting this error:
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Which node returns this error? Are you trying to run it for the first time offline?
I had this error. I found my llava-llama-3-8b-text-encoder-tokenizer safetensors files were corrupted somehow, as their SHA256 did not match those on Hugging Face. You can verify a SHA256 on Windows through the PowerShell command Get-FileHash. Redownloading the files fixed the problem.
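If you'd rather script that check, here is a minimal Python equivalent of the Get-FileHash approach (the shard path below is illustrative; compare the output against the SHA256 listed on the file's Hugging Face page):
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    # Hash in 1 MiB chunks so multi-GB shards don't have to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

print(sha256_of("models/LLM/llava-llama-3-8b-text-encoder-tokenizer/model-00001-of-00004.safetensors"))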
I got this error after a fresh install of ComfyUI. How is it possible to fit an almost 16GB LLM model on the GPU? I have 16GB VRAM but got a not-enough-memory error. So how is it possible to run it even on 12GB? Can someone help me please?
...explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading text encoder model (vlm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-v1_1-transformers
Loading checkpoint shards: 100%|██████████| 4/4 [02:50<00:00, 42.65s/it]
Text encoder to dtype: torch.bfloat16
!!! Exception during processing !!! Allocation on device
Traceback (most recent call last):
File "C:\pinokio\api\comfy.git\app\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\pinokio\api\comfy.git\app\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 684, in loadmodel
text_encoder = TextEncoder(
File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 167, in __init__
self.model, self.model_path = load_text_encoder(
File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 64, in load_text_encoder
text_encoder = text_encoder.to(device)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\transformers\modeling_utils.py", line 3110, in to
return super().to(*args, **kwargs)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
return self._apply(convert)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
param_applied = fn(param)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
return t.to(
torch.OutOfMemoryError: Allocation on device
Got an OOM, unloading all loaded models.
Prompt executed in 1870.39 seconds
Same error, no idea how to fix it, sadly. I'll keep looking and post back here if I find anything.
same error :(
I get this error on the text encoder download:
DownloadAndLoadHyVideoTextEncoder
Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback): cannot import name 'ImageNetInfo' from 'timm.data' (C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\timm\data\__init__.py)
Are your Comfy and nodes updated?
@Sam_A Yes, I just updated everything to make sure; still the same error
Are you running the workflow for the first time with internet? Because it will download some models. I'm not sure if that's the problem.
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/332
This worked for me
@asd231734624 Thank you! I did that (changed directory so the embedded Python gets updated) and got this error when I tried the workflow again:
No such file or directory: "C:\\ComfyUI_windows_portable\\ComfyUI\\models\\LLM\\llava-llama-3-8b-v1_1-transformers\\model-00001-of-00004.safetensors"
I then followed this advice:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/81
and deleted the folders in ComfyUI_windows_portable\ComfyUI\models\LLM,
because it seems something went wrong the first time the download started.
I tried again and the download starts again now. Don't be irritated if it stays on "fetching files" a while - it's 15GB, for anyone who reads this.
Then I got this error on a new run:
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
and used this link to downgrade the transformers version:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
after that, new error:
HyVideoModelLoader
Can't import SageAttention: No module named 'sageattention'
This says we need Triton, which doesn't support Windows:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/108
So this workflow doesn't work on Windows, I guess?
@DUDE33 Yeah, it works on Windows. You just need to install the module the error says is missing (sageattention) in your venv.
@Sam_A Ok, thanks for answering! I'm just irritated because the forum post says it needs Triton, which is not available for Windows. I now found this vid https://www.youtube.com/watch?v=DigvHsn_Qrw
Do I have to do all that, or is there an easier way?
@DUDE33 If the only error now is the missing "sageattention", you just need to install it in your environment using pip and it will work. It's not normal to have so many problems trying to use this workflow. I don't know what's happening in your case specifically. lol
@Sam_A I'm just new to this haha, like a lot of other people here. I installed sageattention via pip command.
PS:
C:\ComfyUI_windows_portable\python_embeded> python.exe -m pip install sageattention
Collecting sageattention
Downloading sageattention-1.0.6-py3-none-any.whl.metadata (5.6 kB)
Downloading sageattention-1.0.6-py3-none-any.whl (20 kB)
Installing collected packages: sageattention
Successfully installed sageattention-1.0.6
PS:
C:\ComfyUI_windows_portable\python_embeded> python.exe -m pip install --upgrade sageattention
Requirement already satisfied: sageattention in c:\users\X\appdata\local\programs\python\python312\lib\site-packages (1.0.6)
Restarted comfyui but still get the error, did I do something wrong?
Also tried via the git url
@DUDE33 I had the same problem and changed sdpa to comfy in the sampler; then it worked.
@asd231734624 Thank you very much =). The workflow works now; sage just won't work, I guess. If anyone has an idea what to check, let me know.
What does the "Can't import SageAttention: No module named 'sageattention'" error mean?
You need to install SageAttention in the venv of your ComfyUI installation.
pip install sageattention
I think that you might also need to install triton:
https://www.youtube.com/watch?v=DigvHsn_Qrw
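One recurring gotcha, visible in the pip log above (where "Requirement already satisfied" pointed at a user-level Python312 install, not python_embeded): pip can resolve to a system Python instead of ComfyUI's embedded one, so the install "succeeds" somewhere ComfyUI never looks. A minimal sketch to verify, run with the same python.exe that launches ComfyUI:

import sys

print("interpreter:", sys.executable)  # should point into python_embeded, not a system Python
try:
    import sageattention
    print("sageattention found at:", sageattention.__file__)
except ImportError as e:
    print("sageattention is NOT visible to this interpreter:", e)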
I did pip install sageattention as genural wrote; that didn't fix the problem. I didn't go through the YouTube video tutorial, as people in the comments are still complaining that it doesn't work. Then I saw in the workflow that it stops at the "HunyuanVideo Model Loader" node, and attention_mode references this sageattention. So I simply changed it to the comfy option. That worked. Try the other options too, as I will. I'm not sure how quality or speed is affected. If it's just speed, I'm not going to whine too much about it. If it affects quality without sageattention, that's unfortunate. If someone has sageattention working, please comment on whether there are quality differences, or if it's just speed.
@civitai7_Β good
I'm getting the following error:
HyVideoTextImageEncode
text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
Everything was left as default aside from the uploaded image, which is similar to the test ones. I did have to manually download the xtuner LLM, as the workflow was unable to locate the source on Hugging Face.
Kijai's LLM DID download from source but gives the following error:
HyVideoTextImageEncode
Only vision_languague models support image input
I've run into an error I can't work out. Every time the compile gets to this point, ComfyUI crashes out and disconnects. Does anyone have any idea what's going on here?
It can't seem to get past the HunyuanVideo Sampler. I'm not a coder, so any help would be appreciated.
My GPU is a 4090 with 24GB of VRAM, so I figure it should be able to handle it.
Here's the process:
got prompt
encoded latents shape torch.Size([1, 16, 1, 96, 54])
Loading text encoder model (clipL) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
Text encoder to dtype: torch.float16
Loading tokenizer (clipL) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
2025-02-13 19:44:31,692 WARNING: Warn!: You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading text encoder model (vlm) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.58s/it]
Text encoder to dtype: torch.bfloat16
Loading tokenizer (vlm) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
2025-02-13 19:44:48,341 WARNING: Warn!: Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
2025-02-13 19:44:48,546 WARNING: Warn!: Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
vlm prompt attention_mask shape: torch.Size([1, 208]), masked tokens: 208
clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 17
model_type FLOW
The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
Scheduler config: FrozenDict({'num_train_timesteps': 1000, 'flow_shift': 9.0, 'reverse': True, 'solver': 'euler', 'n_tokens': None, '_use_default_values': ['n_tokens', 'num_train_timesteps']})
Using accelerate to load and assign model weights to device...
Loading LoRA: img2vid with strength: 1.0
Requested to load HyVideoModel
loaded completely 21443.013157653808 12555.953247070312 True
Input (height, width, video_length) = (768, 432, 73)
The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
Scheduler config: FrozenDict({'num_train_timesteps': 1000, 'flow_shift': 9.0, 'reverse': True, 'solver': 'euler', 'n_tokens': None, '_use_default_values': ['n_tokens', 'num_train_timesteps']})
Single input latent frame detected, LeapFusion img2vid enabled
Sampling 73 frames in 19 latents at 432x768 with 9 inference steps
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [01:20<00:00, 9.00s/it]
Allocated memory: memory=12.301 GB
Max allocated memory: max_memory=16.878 GB
Max reserved memory: max_reserved=18.969 GB
C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>pause
Press any key to continue . . .
Did you fix it?
Did you try restarting your PC?
I'm not sure about your particular problem, but whenever Comfy gives this "Press any key to continue . . ." without any apparent reason, and I'm sure memory isn't the problem, I close everything and restart the PC. That has always worked for me.
@nymical Thanks for the tip! I'll set up a clean install of Comfy and see if I can get it working. If I run into that problem again, I'll give your suggestion a shot.
@miraishounen Not yet, I'm afraid. I'll let you know of my progress!
Hi, could someone offer a little advice for getting better results? I always get something faded with crosshatch lines. I posted this to demonstrate: https://civitai.com/posts/12909266 That's one of the better results I got. I kept the settings almost identical to those in the v1.1 workflow, only replacing the sexy dance lora and the source image. This was the source image in case anyone wants to replicate it: https://civitai.com/images/54695445
Getting this after applying the fix for the "//: 'int' and 'NoneType'" error:
"Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
I don't know where to write that, if that's even the solution.
I had the "//: 'int' and 'NoneType'" error: rolling back transformers from v4.48.0 with "python.exe -m pip install transformers==4.47.0" fixed it for me.
Installing 4.47.0 doesn't help me.
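If the downgrade seems to change nothing, it may have landed in a different Python than the one ComfyUI actually runs (the same gotcha as with sageattention above). A quick check, run with ComfyUI's own python.exe:

import transformers

print(transformers.__version__)  # the fix above expects 4.47.0
print(transformers.__file__)     # should live under ComfyUI's python_embeded site-packages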
Can you make a video on how to set up the workflow? I don't understand "Select the 'Long Side of Image' you wish (before upscale)".
It's simpler than you think. You add an input image to the workflow; you just need to define what the longer side of the image will be, and the workflow will auto-calculate the shorter side for you. See the sketch below.
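For anyone who wants to sanity-check that math by hand, here is a hedged sketch of what the auto-resize presumably computes (the exact node logic may differ): scale the image so its longer side matches your chosen value, then snap both sides to multiples of 16, which the sampler needs (see the 408/16 discussion further down).

def target_dims(width, height, long_side=768):
    # Scale so the longer side == long_side, then snap both sides to multiples of 16.
    scale = long_side / max(width, height)
    snap = lambda x: max(16, round(x * scale / 16) * 16)
    return snap(width), snap(height)

# Example: a 576x1024 portrait source with long side 768 -> (432, 768),
# matching the 432x768 input seen in the sampler log above.
print(target_dims(576, 1024, 768))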
I'm having a problem with:
HyVideoSampler
Failed to find C compiler. Please specify via CC environment variable
I can't get through this.
running into the same issue after installing triton + sageattention
I did it following this guide:
https://github.com/woct0rdho/triton-windows?tab=readme-ov-file
I have a problem with this one:
"HyVideoTextImageEncode
text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples)."
Do you have a solution? Thanks
Somehow, I get the error:
"Only vision_languague models support image input"
It seems the text-image-encode node won't take the image as a prompt?
Did you change the original encoder?
Oh, okay. I used the text-encoder-tokenizer instead of the transformers one. Is that the problem?
@tetrarrow842 Probably, yes. You need the vision-language (transformers) model to use an image as a prompt.
Finally got it working, but how did you get the animated images to look the same as the original? Mine turn into a completely different image.
I've fixed the other errors I encountered but keep receiving this one and can't get past it:
"HyVideoModelLoader
Error while deserializing header: HeaderTooLarge File path: /workspace/ComfyUI/models/diffusion_models/hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors The safetensors file is corrupt or invalid. Make sure this is actually a safetensors file and not a ckpt or pt or other filetype."
I also get the same type of error for the VAE file.
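A hedged way to check what was actually downloaded: a safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so an HTML error page saved under the wrong filename fails this check immediately. Minimal sketch (the filename is the one from the error above; adjust the path to yours):

import json, struct

def check_safetensors(path):
    # HeaderTooLarge usually means a bad/incomplete download, not a real model file.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # first 8 bytes: JSON header size
        if header_len > 100 * 1024 * 1024:  # absurdly large -> not a valid safetensors file
            raise ValueError(f"header claims {header_len} bytes; file is likely corrupt")
        header = json.loads(f.read(header_len))
    print(f"looks OK: {len(header)} entries in header")

check_safetensors("hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors")

If this raises, redownload the file (and the VAE) rather than debugging the node.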
I'm afraid the triton dependencies make this one a no go for me. Wish I could make it work but I don't want to spend all day debugging that crap.
Truth be told, the new workflow I posted is better: newer tech, and it doesn't use Triton.
@Sam_A That would be awesome. But the one I got from the download link required Triton. Do you have a link to the non-Triton one?
@Clocksmith Of course, bro! This one: https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
It's newer and the results are better. It's slower because they didn't release the fast LoRA (which returns results with fewer steps) for this version yet. But it's amazing!
I have this error now:
'img_in.proj.weight' in HunyuanVideo Model Loader node
Did you know there is a newer I2V model?
https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
Easier to install and better results.
Calculated padded input size per channel: (0 x 16 x 16). Kernel size: (1 x 1 x 1). Kernel size can't be greater than actual input size
When I use the decoding node, it throws this error. I don't get an error with Python 3.11.6, torch 2.3.0, and cu121, but I do with Python 3.12.8, torch 2.6.0, and cu126. The resolution I set is 408*496.
408 / 16 = 25.5, so it's not a multiple of 16. Hence I made nodes to auto-calculate dimensions, so this error doesn't happen. Change the image size to something that's a multiple of 16.
@Sam_A Thank you for your reply. I will try modifying the size to see if it works. However, I don't know why torch 2.3 is able to run with a size of 408, even though it is not a multiple of 16.
@voidyear That is interesting, to be honest. I didn't know about it.
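A small, hedged helper for this situation: given a resolution, it reports whether each side is valid and the nearest multiples of 16, so 408x496 would become 400x496 or 416x496.

def nearest_valid(x, multiple=16):
    # Closest multiples of `multiple` at or below and above x.
    lo = (x // multiple) * multiple
    hi = lo if lo == x else lo + multiple
    return lo, hi

for side in (408, 496):
    lo, hi = nearest_valid(side)
    status = "OK" if side % 16 == 0 else f"invalid -> use {lo} or {hi}"
    print(f"{side}: {status}")
# 408: invalid -> use 400 or 416
# 496: OK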
Great work! Love it, works great in my first tests. Am curious about output length -- I have tried 97 frames and can only seem to get 3 seconds. No errors, no black output, but it will not create any videos longer than the default 73 frames. I have tried with a smaller resolution image to no avail.
Any tips?
I'm using an L40S | 48GB VRAM | 32GB RAM, so I think VRAM is not the issue...
tested it today but got:
* HyVideoSampler 6:
- Return type mismatch between linked nodes: context_options, received_type(FETAARGS) mismatch input_type(HYVIDCONTEXT)
Output will be ignored
Wanted to mention my troubleshooting process for this (spoiler alert: it did not work):
So I found out ComfyUI circles any erroring connections in red: the context_options input under the HunyuanVideo Sampler node is connected to the HunyuanVideo Enhance node's feta_args output.
I disconnected these and things started to load after queueing.
Then it tried downloading the text encoders (llava-llama-3-8b-text-encoder-tokenizer or llava-llama-3-8b-v1_1-transformers). There was no telling how long the download would take, so I manually downloaded the files myself so I could track the ETA. Then I got this error:
DownloadAndLoadHyVideoTextEncoder
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
To try and fix this, I verified the SHA256 hashes of the model safetensors files in the LLM folder for the llava-llama-3-8b-v1_1-transformers text encoder. Two of the files were correct, so I redownloaded, and the hashes matched Hugging Face, but no dice. Then I tried the llava-llama-3-8b-text-encoder-tokenizer text encoder; that would not work either, giving the same error.
I feel like disconnecting the context_options link caused this, so I'm back to wondering how we fix our issue of:
DownloadAndLoadHyVideoTextEncoder
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
It would seem like feta_args on the Enhance node should be connected to feta_args on the Sampler, but trying that gives me:
DownloadAndLoadHyVideoTextEncoder
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
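For anyone repeating the hash check described above, a minimal sketch; the expected hashes come from each file's page on Hugging Face, and the path is the portable-install one from earlier in the thread (adjust to yours):

import hashlib

def sha256_of(path, chunk=1 << 20):
    # Stream the file so multi-GB safetensors shards don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Compare against the SHA256 shown on the file's Hugging Face page.
print(sha256_of(r"C:\ComfyUI_windows_portable\ComfyUI\models\LLM"
                r"\llava-llama-3-8b-v1_1-transformers\model-00001-of-00004.safetensors"))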
@redoctober2 @zoom83 Being 100% honest, this model is already obsolete. It uses a LoRA to become an I2V workflow. It was good when we had no native workflows, but now we have them:
Wan2.1 (Best Quality, high VRAM usage for great results)
https://civitai.com/models/1300201/wan-ai-img2vid-video-extend
Skyreels (Hunyuan Variant) (Good Quality, Mid VRAM usage)
https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
Hunyuan WF (I don't like the quality so much, but I'm still testing. Lowest VRAM usage and FAST LoRA!)
I don't know what I'm doing wrong. I use the provided workflow, but it takes 1.5 hrs to generate a 5-second clip. And that's with sageattention and an RTX 5080.
Try FramePack I2V Hunyuan or GGUF with a 5080.
16GB of VRAM is not really enough for Hunyuan or Wan.
I rent a 5090 32GB on https://runpod.io?ref=gnspz552
Hi, I'm getting this error. Any fix?
HyVideoSampler
cannot access local variable 'original_latents' where it is not associated with a value