**Don't forget to Like the model. ;)
!!!This workflow is Obsolete!!! Some better options:
Wan2.1 (Best quality but also the slowest; high VRAM usage for great results, but it has GGUF options)
https://civarchive.com/models/1300201/wan-ai-img2vid-video-extend
Skyreels (Hunyuan variant) (Good quality, mid VRAM usage)
https://civarchive.com/models/1278247/skyreels-hunyuan-img2vid
Hunyuan WF (Fastest one. I don't like the quality so much, but I'm still testing. Lowest VRAM usage and a FAST lora!)
https://civarchive.com/models/1328592/hunyuan-wf-img2vid-fast
*Just added a version without auto image resize due to the high number of people having errors with it. The manual one will work 100%. Sorry about that :)
**Error "unsupported operand type(s) for //: 'int' and 'NoneType'" fix: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
Straightforward: this is an Image-to-Video workflow using the resources we have today (January 2025) with Hunyuan models. Using the I2V LeapFusion LoRA plus IP2V encoding, it can be very consistent and, in my opinion, as good as an older Kling version in terms of consistency. It's not perfect, but it delivers solid results if used well, especially with videos of humans.
I kept it as simple as possible and didn't include the faceswap node this time, but it's a great addition if you're planning to generate videos with human subjects. The VRAM usage depends heavily on the length and dimensions of the video you want to generate, but 12GB of VRAM is ideal to get good results.
As always, instructions and links are included in the workflow. Don't forget to update Comfy and the HunyuanVideoWrapper nodes!
That's it. Leave a like and have fun!
Description
Added a version with manual image resize for the ones having errors with the automatic one.
Comments (193)
ImageScale
unsupported operand type(s) for /: 'NoneType' and 'int'
Fixed. Sorry about that. :)
@Sam_A I just DL'd the workflow and I still get this error? Is there something we need to change in the workflow? Thanks!
I get the same error, any ideas?
@shawnkaron295 I just saw this post and it fixed it for me.
This is a known issue: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
Downgrade the transformers library to fix it.
Run "python.exe -m pip install transformers==4.47.0" in the python_embeded folder.
Just reuploaded with the "unsupported operand type(s) for /: 'NoneType' and 'int'" problem solved. Sorry about that :)
Thanks. I was just about to post that I am also getting the error and was trying to figure it out.
I am guessing it's not updated yet on Civit? I redownloaded and am still getting:
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
@zengrath On the HyVideoTextImageEncode node?
@Sam_A Yep, I tried downloading it and am getting the same issue too, got the recently uploaded version too.
@Aicush Now I think it's fixed. At least I tried with many different images o_o
I got the error or a very similar one too, but it was because of my transformers library not being the right version. Had to downgrade to 4.47.0.
Same Problem.
Updated 8 min ago, same error
Well, I just added a version without the auto image resize. I'm unable to identify the problem, so the alternative might work. It's boring to calculate the image sizes, but it's good enough I think.
@Sam_A sry, same problem ^^" unsupported operand type(s) for //: 'int' and 'NoneType'
@SysDeep Using the one with manual image size????
@Sam_A Yes, tested 1.1, won't work. Image2Video-By-Sam-ManualSize
@Sam_A - Worked now, thanks to funscripter627
Transformers 4.47.0 works fine! I had to downgrade to it.
How to get it to work:
All things inside Python (I use Windows 11):
- Downgrade Transformers to 4.47.0
- Install sageattention
- Install triton
Then it worked fine.
Can someone tell me if this one or https://civitai.com/models/1180764/hunyuan-img2vid-leapfusion-lora?modelVersionId=1328798 is better? I am just wondering, since that version is working for me. It's not super good, but it does pretty okay, and if this one is better I am willing to put some effort into all the workarounds y'all are talking about. As of now, everything I would need to change just sounds like a lot of work, and with the 2 examples provided I am not really up for doing all that and risking breaking something in the process, because we all know how one tiny thing can just mess up a whole lot of other workflows. Anyway, I appreciate the upload regardless. Just can't use it LOL
My version uses the LoRA plus IP2V to reinforce the result. It's just one small detail I added for better results. Also easy image resize...
Once I found the correct instructions, it only took minutes to fix if you're on portable ComfyUI.
Go to your python_embeded folder. Inside the folder, type cmd in the address bar. In the cmd window, run:
python -m pip uninstall transformers -y
python -m pip install --upgrade transformers==4.47.0
That's it. This fixed it for me.
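If you're not sure the downgrade landed in the right environment, you can check from the same python_embeded folder (transformers exposes its version string, so this one-liner is safe to run):
python.exe -c "import transformers; print(transformers.__version__)"
It should print 4.47.0 after the fix.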
What I can say is, so far it's more consistent than other workflows I tried. In other workflows the face of the person, for example, would change too much, whereas with this one it doesn't. However, as far as actions go, it likely isn't going to do a whole lot of what you want right now. Hopefully official img2vid support will work better. It certainly is worth giving it a go though, at the very least to be able to see some movement in still images, which is pretty neat without paying the huge price of premium AIs to do it right now.
@Sam_A thank you :)
@zengrath Well, that sounds way easier than whatever I found. I looked into it during the day to see if I need to do anything else while I was at work, away from my PC, and I got to say THANK YOU SO MUCH for this super easy explanation. I'm probably gonna do it either later today or tomorrow. I guess hopefully I won't wreck anything in the meantime LMAO
@liquidhead440 Hope it works for you. After playing around with this workflow a while trying different source images, I've been getting good results, and finally, with the right prompting and LoRAs, getting pretty cool results. So it's worth trying. I have a feeling even official img2video won't be perfect and will require trial and error, though I hope official support is more efficient and consistent.
Hi, are there specific sizes and resolutions that work best? What is the largest picture we can start with? Is there a limit to the number of frames/video length?
It all depends on your GPU VRAM. With a 4070 I usually try 75 frames at 768x432 latent size. You can play around with it and see what your GPU can handle.
@Sam_A I have been able to do 560x1024 portrait, 72+1 frames; it uses 20-23GB on my 4090. I tried making longer videos and got multiple errors.
@yajukun On my 4090 I tried to make the longest video I could with decent quality. I got around a 9 sec video at 768x432. More than this and I get the low memory error. Maybe on a 5090? Hehe.
Can I2V be done with the native nodes? I can't make kijai nodes work.
Do I have to use Sage Attention? With sdpa, the sampler used to throw errors; now after the update it just samples forever, stuck at 0%.
P.S. OMG it moved! 233s/it, that's a bit much.
I'm not in front of my machine right now, but I've had luck switching from "sdpa" to "comfy". Does that do anything for you?
I think I need help. I get the following message when I run:
HyVideoModelLoader
Can't import SageAttention: No module named 'sageattention'
Just install it using pip and you're good to go.
I installed it using Git, and it still doesn't work. Does it have to be with pip?
@auroch22934 You install sageattention using pip in Python. Open a terminal and run pip install sageattention.
@Sam_A Thank you very much, I think I will try this method
@wange999 If nothing works, you can ask a free GPT how to install it and it will give you the command and everything you need with more details than I'm able to! :D
I got it to work using this tutorial https://old.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/
There's also a video that makes it even easier to follow. It's a LOT
https://www.youtube.com/watch?v=DigvHsn_Qrw
I get this error :c
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
This is a known error from kijai nodes. Solution: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
Now my question is: where is python_embeded in the Desktop version?
@Santaonholidays It's your local Python installation. If you didn't create a venv for ComfyUI, you can just install it in your local Python.
@Santaonholidays Run python.exe -m pip install transformers==4.47.0 in the python_embeded folder
@Sam_A The ComfyUI Desktop app has a .venv folder in it
I get "Only vision_languague models support image input"?
Did you remove the <image> tag from the prompt? Or did you change the TextEncoder model?
I didn't remove <image>. What TextEncoder should I use?
@dscvff In the (Down)Load HunyuanVideo TextEncoder node, use xtuner/llava-llama-3-8b... If you didn't change it, please tell me which node is returning this error.
@Sam_A DL'd the model and got the int error, ran python.exe -m pip install transformers==4.47.0 in the python_embeded folder, still get the error.
@dscvff Did you change any config before running the workflow?
@Sam_A No, tried both workflows. I see on https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269 that others have the same issue even after downgrading. I'm on portable.
@dscvff To try to help you, I'll need to know which node is returning the error, so maybe I can figure out what's going on...
@Sam_A hello, same problem as above. I didn't remove <image> from the prompt and neither of the two text encoders listed works; both give me this error:
got prompt
Loading text encoder model (clipL) from: C:\pinokio\api\comfy.git\app\models\clip\clip-vit-large-patch14
Text encoder to dtype: torch.float16
Loading tokenizer (clipL) from: C:\pinokio\api\comfy.git\app\models\clip\clip-vit-large-patch14
Loading text encoder model (llm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-text-encoder-tokenizer
Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00, 1.40s/it]
Text encoder to dtype: torch.bfloat16
Loading tokenizer (llm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-text-encoder-tokenizer
!!! Exception during processing !!! Only vision_languague models support image input
Traceback (most recent call last):
File "C:\pinokio\api\comfy.git\app\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\pinokio\api\comfy.git\app\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 881, in process
prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = encode_prompt(self,
File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 806, in encode_prompt
text_inputs = text_encoder.text2tokens(prompt,
File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 214, in text2tokens
raise ValueError("Only vision_languague models support image input")
ValueError: Only vision_languague models support image input
Res: 796x448, 53 frames, 9 steps -- rendering time 1 hour on 12 GB VRAM. Is that normal?
I would try to reduce the resolution a little bit and use the upscaler later. 768x432 at 75 frames usually takes around 6~7 minutes on a 4070, maybe less, I'm not sure. You can go even lower on resolution.
@Sam_A I changed the resolution to 768x432, but it didn't affect the rendering time in any way. I assumed that this was because the large "hunyuan_video_t2v_720p_bf16" model was used. Then I downloaded, as you advised, "hunyuan_video_fastvideo_720_fp8" -- it didn't help. By the way, with the text2video Hunyuan, the video is generated in no more than 10 minutes. I don't know, maybe the 3060 (12 GB) is not friendly with the image2video method(
@Nikmago You will need to reduce resolution/length until you see it's not using 100% of your VRAM. I think it divides the memory when you reach 100% and the process goes a lot slower.
@Nikmago Check your VRAM usage in the resource monitor. I found that if I am using 95% or more of my VRAM, generation time goes up considerably. Even on my 4090 at a 720p resolution I am usually at 80-90% of my VRAM if my frames are around 81 or so. So you'll likely need to go down to like 336x576 or lower, until you're not hitting over 95% VRAM, and that will likely result in your generations only taking minutes. There is another workflow here that specializes in the fastest video generation possible and runs the fast lora at low resolutions and such. I personally find, on a 4090, the faster speed not worth the loss in quality, but for you it may be your best option on a 12GB VRAM card. There are workflows that say they're specifically for 12GB VRAM on Civit too. But you can use this one if you just lower the resolution enough. Try starting at 45 frames and the resolution I provided to see if it runs in just minutes. If not, go even lower on the resolution or try another workflow designed for 12GB.
@Sam_A @zengrath Guys, I found something interesting! I tried to set the resolution to 576x336 and 45 frames. It still takes a long time - 30 minutes. But if you set the wrong resolution 3 times and go back again, for example, to 576x336 (45 frames), rendering takes 2 minutes at 9 steps and 4 minutes at 24 steps. As it should be, I suppose.
That is, it turns out to be some kind of bug. I entered the wrong values 3 times, and the system gave me an error like "The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 4". And on the 4th time, the generation was very fast.
@Nikmago Where should I set the wrong value, and what is the wrong value?
@MugenMan It is better not to bother with these values and add blockswap and teacache to the workflow. This solves the problem. There is a normal workflow on the site for 12 GB.
I keep getting:
shape mismatch: value tensor of shape [16, 1, 61, 34] cannot be broadcast to indexing result of shape [1, 16, 1, 62, 34]
I know it's got something to do with resolution: when I generate at the default resolution it works, but any other resolution gives me this error.
If you're using manual resize, switch the dimensions to multiples of 16.
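For anyone doing the manual resize by hand, here's a minimal Python sketch of that rounding (snap16 is just an illustrative helper, not part of the workflow):
def snap16(x: int) -> int:
    # Snap a dimension to the nearest multiple of 16, minimum 16.
    return max(16, round(x / 16) * 16)

print(snap16(796), snap16(448))  # -> 800 448, so use 800x448 instead of 796x448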
It works extremely well. Best i2vid workflow I tried, although I did spend more time trying to get the right settings for this one. Thank you so much!
What settings did you end up with? I'm really struggling to get good quality out of this, but I'm pretty sure it's something I've got misconfigured.
@goresj2932 You have to play around with the CFG scale and flow shift values a bit sometimes, although I usually keep them as they are by default in the workflow. I also make sure the frame count is a multiple of 24, otherwise the movement seems to get messed up. Not 100% sure though.
Make your prompts really simple and make sure that you don't change too much, otherwise it will generate a whole new image instead. Look at the examples posted here and just change the picture if you want to see it work.
@goresj2932 I will suggest what worked for me... Check what Hunyuan can generate, make images of the thing you want but with a similar visual composition, and then it will understand your input. Also, a LoRA of the thing you want to animate will help a lot.
Where can I find the img2vid.safetensors (LoRA)?
The link is in the workflow instructions.
Can't find the sexy dance lora!
@kanghua151613 Not really necessary, but here it is...
https://civitai.com/models/1110311/sexy-dance
RTX 4070 12GB taking too much time for 432x768 - it has been running for more than 2 hrs.. why?
Probably too many frames; it's reaching your VRAM limit. Try to run it in a way that it reaches max 95% of your VRAM and it will run in 5~7 minutes. You can reduce the number of frames or the frame size.
The video seems to undergo significant deformation after three seconds or more. Did I make a mistake?
It's probably not recognizing the thing you want to change. Try to describe it more clearly, or generalize it more depending on the picture. For example, instead of "a woman in a black dress and dark brown hair" just say "a woman". If there are multiple women in the picture, then it helps to distinguish them.
In addition, play around with the sampler parameters: guided cfg scale, flow_shift and denoise. Lowering denoise should make it deviate less from the picture.
Also, sometimes a picture just doesn't work. If that's the case, waiting for the official i2v model is probably best.
Like funscripter627 said, sometimes it does not understand what you're trying to do. My suggestion is: try to create what you want in a T2V workflow, just to check how much of your prompt Hunyuan understands, and how it understands it. Generate your image with a similar visual composition, and it might work. It's like old Kling, which used to deform what it didn't understand. I believe this problem might be gone with better models in the future.
I keep getting this:
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
It's probably the issue linked at the top: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269 Basically you need to downgrade the transformers package. Also a good idea to update your custom nodes if you haven't already.
@funscripter627 Thanks for your reply. How do I downgrade the package?
@essseekay476 It's in the link. It depends on your Comfy installation. See https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269#issuecomment-2585504240
@funscripter627 I'm running Comfy within Pinokio, still don't know what to do (sorry for being annoying)
@essseekay476 It's okay man. I don't know Pinokio specifically, but generally you want to execute the downgrade and any other pip commands in your virtual environment (venv). I think Pinokio comes with conda, so maybe try to look for a venv there.
According to GPT:
I'm running comfyui in pinokio. How do I install a python package in it?
To install a Python package in ComfyUI running on Pinokio, you can follow these general steps. Since ComfyUI is typically running in a Python environment (likely in a virtual environment), you'll want to install your package into that environment.
Here's how you can install a Python package:
Access the terminal/command line: If you're running Pinokio with a graphical interface, you should be able to access a terminal from within the environment.
Activate your virtual environment (if applicable): If ComfyUI is using a virtual environment, activate it. Typically, you'd activate the virtual environment like this (assuming it's named env):
For Linux/macOS:
source env/bin/activate
For Windows:
.\env\Scripts\activate
Install the Python package: Once your virtual environment is activated, you can install the package using pip. For example, if you want to install requests, you'd run:
pip install requests
Verify the installation: After installation, you can check that the package has been installed by running:
pip list
This should show the installed packages, including the one you just added.
Restart ComfyUI: After installing the package, you may need to restart ComfyUI to ensure it picks up the new package.
Let me know if you encounter any issues along the way!
If you need further instructions, GPT can help you with more details than I can, 100%! lol
@Sam_A Thanks for this reply man, you went above and beyond lol
How do I know if I am using others that do require the version of the transformers package I am currently running? I wouldn't want to downgrade just for this I2V and break everything else in the process. Thoughts?
UPDATE: I looked, and I was only at 4.47.1 anyway. I seriously doubt anything is going to care about the downgrade.
@DroneMeOut Haven't run into any issues myself after downgrading, although I've mostly been using this workflow lol
Every time I get this error, even with other Hunyuan workflows:
RuntimeError: CUDA error: the launch timed out and was terminated CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
What GPU do you have?
@Sam_A I'm currently using an NVIDIA A16-16Q
@felipesscaff925 Well... this is kind of a "server" GPU. I'm sorry, but I'm not sure how to fix this, bro. Maybe GPT has a solution?
Yeah, I got the same error.
This is the first i2v hunyuan workflow I've tried that's actually worked for me. Thanks for putting this together!
"Where can I find Sexy Dance E15 lora?"
Thx Bro!
This is the error I encountered. Could everyone please take a look?
got prompt
encoded latents shape torch.Size([1, 16, 1, 96, 54])
Loading text encoder model (clipL) from: C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
Text encoder to dtype: torch.float16
Loading tokenizer (clipL) from: C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading text encoder model (vlm) from: C:\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
Loading checkpoint shards: 100%|██████████| 4/4 [00:42<00:00, 10.64s/it]
Text encoder to dtype: torch.bfloat16
Loading tokenizer (vlm) from: C:\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
!!! Exception during processing !!! unsupported operand type(s) for //: 'int' and 'NoneType'
Traceback (most recent call last):
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 884, in process
prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = encode_prompt(self,
^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 809, in encode_prompt
text_inputs = text_encoder.text2tokens(prompt,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 253, in text2tokens
text_tokens = self.processor(
^^^^^^^^^^^^^^^
File "C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llava\processing_llava.py", line 160, in __call__
num_image_tokens = (height // self.patch_size) * (
~~~~~~~^^~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'
Prompt executed in 69.41 seconds
Dumb question, but I noticed you had the fastvideo model selected without the lora. Is the lora required?
Also, is there a reason you did not use the 544x960 resolution specified in the i2v lora's repo?
The fastvideo model already has the lora included; it's like an "LCM" SD model. About the resolution: it works at different resolutions. Anything that's a multiple of 16 pixels will work. Adjust according to what your GPU can handle. Don't go too low, because the model will start to not understand what is in the picture. Don't go too high, or your GPU will not be able to process it.
@Sam_A Ok, thank you. Another question: running 432x768 at 37 frames easily gives me an OOM on a 3090.. is this normal?
@tangentplum598 No, it's not normal. With 24GB you should be able to run maybe 7~8 seconds (98% VRAM) at this resolution, I think. I suggest you maybe update the nodes and Comfy and check that everything else is in order.
@Sam_A Alright, yeah, I cannot figure it out. I've updated all nodes, even used a combination of --disable-smart-memory --disable-cuda-malloc. I've noticed models aren't being offloaded, which is of course not a problem with this workflow. I'm just so confused about what is happening :(
UPDATE:
I just made a fresh comfyui installation and it works now
@tangentplum598 I wish I could help, but I really don't know what the problem could be.
Update: Oh! Great!
I like the work that you've put into this - thank you :)
Do you have a recommendation for any Comfy settings to avoid memory issues?
-GPU: 4080 Super
-Dedicated VRAM for Windows - 15384MB
-Shared system memory - 49022MB
The workflow runs my GPU and VRAM at 100% without changing any settings from the base workflow, with some memory errors that have stopped generation. I am 100% sure I'm doing something wrong :)
Of course! This is how I adjust this workflow according to the GPU I'm using. I have a 4090 and a 4070 (12GB VRAM).
1. I like to input images with a 16:9 aspect ratio (1920x1080 or 1280x720, etc). Then I define the longer side of the image. I usually start with 768.
2. In num_frames on the sampler you need to put a number that is a multiple of 4, plus 1. E.g.: (4*10)+1 = 41.
Since it works on a base of 24 frames/sec, you can use 24 times the length in seconds you wish, plus 1. I believe with 16GB VRAM and the size I said in item 1, you can generate at least 5 seconds of video, but you need to test. Start small: test with 2 seconds, 3, etc., until you find the sweet spot for your GPU. If the generation starts to go crazy, like 100 sec/it, the setting is wrong and you need to reduce the length or image size.
And finally you fine-tune your config. I tend to reduce the longer side of the image by 16 pixels at a time, so you can test whether Hunyuan understands your image at the size you're inputting. Atm I'm using 672 as the longer side, trying to create the video as long as possible. For what I'm generating, below this size things start to get weird. But it's really trial and error.
This is the general way I think about using this workflow. I hope it helps!
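For anyone who wants the arithmetic from step 2 spelled out, here is a small Python sketch of the frame-count rule described above (assuming the 24 frames/sec base and the multiple-of-4-plus-1 constraint; the helper name is illustrative):
def num_frames_for(seconds: float, fps: int = 24) -> int:
    # Snap down to a multiple of 4, then add 1, per the sampler's requirement.
    raw = round(seconds * fps)
    return (raw // 4) * 4 + 1

for s in (2, 3, 5):
    print(s, "sec ->", num_frames_for(s), "frames")
# 2 sec -> 49 frames, 3 sec -> 73 frames, 5 sec -> 121 frames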
@Sam_A Thank you - this helped. It's still a bit slow, but I can actually get some generations going.
@Sam_A I'm getting quite bad VRAM usage on Windows (not the portable; installed in a venv). I have a full 3090 (actually I have two in this machine, but it seems bad if I am getting excessive usage), which is only barely able to run a 512x768x73 workflow before OOM. I installed this all yesterday, so it's a fresh installation. Any idea of some optimisation options I'm missing?
P.S: Thanks for the workflow! Though I find it can be a bit reluctant with certain loras (with anime style images) - do you find the start image can have a big effect on the resulting action?
@cgimw In another comment someone was also having problems on a 3090, and a fresh Comfy reinstallation/update solved the problem. But since you said yours is fresh, I don't really know. I wish I could help.
About the image, yes. It has a strong impact on the movement. It feels like if the Hunyuan model doesn't have enough data for your image, the result can be deformed or have small movements. The ideal is to play with Hunyuan a little before starting with I2V, just to "feel" what kind of images the model generates.
@Sam_A Hmm. I will keep an eye on it - maybe I will try reinstalling.
Do you know if there's a simple way of offloading the text encoder onto my other GPU? As that does seem like it would be a simple way of getting a big improvement in throughput.
Hi guys, any idea how to fix this? I have 16GB VRAM, but every time I get an error when loading llama in HyVideoTextImageEncode:
Allocation on device
Seems like not enough VRAM? Any idea how to load it?
Hey, does anyone have tips for encouraging motion? Frequently the I2V is barely moving.
My suggestion is to use pictures the model understands, and use a LoRA. But to be honest, most of the time it's not needed. What are you trying to move? Maybe I can try some examples to help you.
@Sam_A I'm thinking my custom lora may just not be trained enough, thanks. I will try to experiment more and update. Is there a group on Discord or something for people to discuss related things?
Sadly it doesn't work. First it was throwing the text encoder issue; downgraded to 4.47.0. Now it still won't work, even though I am not getting any error lol. Comfy is a pain in the ass
It's sad. I would try a fresh Comfy installation and see if the problem gets solved.
I was able to get it to work by following all of the instructions exactly -- granted, on a MimicPC instance :-) I think everything has to do with a clean install and available VRAM!
That's the error I get: 'Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback): No module named 'transformers.models.timm_wrapper.configuration_timm_wrapper'
same here
Anyone get this error:
HyVideoVAELoader
Error(s) in loading state_dict for AutoencoderKLCausal3D: Missing key(s) in state_dict: "encoder.down_blocks.0.resnets.0.norm1.weight", "encoder.down_blocks.0.resnets.0.norm1.bias", … Unexpected key(s) in state_dict: "encoder.down.0.block.0.conv1.conv.bias", "encoder.down.0.block.0.conv1.conv.weight", … (hundreds of mismatched VAE keys omitted)
@PetePablo Download the VAE from the instructions in the workflow.
@Sam_A Worked for me, thank you
@Sam_A Where do I put it, and which VAE? Please give direct links
@Ash51 The links for the VAE and all other models are in the workflow! The BIG red note at the start of the workflow.
I keep getting this error:
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Which node returns this error? Are you trying to run it for the first time offline?
I had this error. I found my llava-llama-3-8b-text-encoder-tokenizer safetensors files were corrupted somehow, as their SHA256 did not match those on Hugging Face. You can verify a SHA256 on Windows through the PowerShell command Get-FileHash. Redownloading the files fixed the problem.
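If you'd rather script that check, here is a minimal Python equivalent of the Get-FileHash approach (the shard path below is illustrative; compare the output against the SHA256 listed on the file's Hugging Face page):
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    # Hash in 1 MiB chunks so multi-GB shards don't have to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

print(sha256_of("models/LLM/llava-llama-3-8b-text-encoder-tokenizer/model-00001-of-00004.safetensors"))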
I got this error after a fresh install of ComfyUI. How is it possible to fit an almost 16GB LLM model on the GPU? I have 16GB VRAM but got a not-enough-memory error. So how is it possible to run it even on 12GB? Can someone help me please?
...explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading text encoder model (vlm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-v1_1-transformers
Loading checkpoint shards: 100%|██████████| 4/4 [02:50<00:00, 42.65s/it]
Text encoder to dtype: torch.bfloat16
!!! Exception during processing !!! Allocation on device
Traceback (most recent call last):
File "C:\pinokio\api\comfy.git\app\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "C:\pinokio\api\comfy.git\app\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\pinokio\api\comfy.git\app\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 684, in loadmodel
text_encoder = TextEncoder(
File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 167, in __init__
self.model, self.model_path = load_text_encoder(
File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 64, in load_text_encoder
text_encoder = text_encoder.to(device)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\transformers\modeling_utils.py", line 3110, in to
return super().to(*args, **kwargs)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
return self._apply(convert)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
param_applied = fn(param)
File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
return t.to(
torch.OutOfMemoryError: Allocation on device
Got an OOM, unloading all loaded models.
Prompt executed in 1870.39 seconds
Same error, no idea how to fix it, sadly. I'll keep looking and post back here if I find anything.
same error :(
I get this error on the text encoder download:
DownloadAndLoadHyVideoTextEncoder
Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback): cannot import name 'ImageNetInfo' from 'timm.data' (C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\timm\data\__init__.py)
Are your Comfy and nodes updated?
@Sam_A Yes, I just updated everything to make sure; still the same error
Are you running the workflow for the first time with internet? Because it will download some models. I'm not sure if that's the problem.
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/332
This worked for me
@asd231734624 Thank you! I did that (changed directory so the embedded Python gets updated) and got this error when I tried the workflow again:
No such file or directory: "C:\\ComfyUI_windows_portable\\ComfyUI\\models\\LLM\\llava-llama-3-8b-v1_1-transformers\\model-00001-of-00004.safetensors"
I then followed this advice:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/81
and deleted the folders in ComfyUI_windows_portable\ComfyUI\models\LLM,
because it seems something went wrong the first time the download started.
I tried again and the download starts again now. Don't be irritated if it stays on "fetching files" a while - it's 15GB, for anyone who reads this.
Then I got this error on a new run:
HyVideoTextImageEncode
unsupported operand type(s) for //: 'int' and 'NoneType'
and used this link to downgrade the transformers version:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269
after that, new error:
HyVideoModelLoader
Can't import SageAttention: No module named 'sageattention'
This says we need Triton, which doesn't support Windows:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/108
So this workflow doesn't work on Windows, I guess?
@DUDE33 Yeah, it works on Windows. You just need to install the module the error says is missing (sageattention) in your venv.
@Sam_A Ok, thanks for answering! I'm just irritated because the forum post says it needs Triton, which is not available for Windows. I now found this vid https://www.youtube.com/watch?v=DigvHsn_Qrw
Do I have to do all that, or is there an easier way?
@DUDE33 If the only error now is the missing "sageattention", you just need to install it in your environment using pip and it will work. It's not normal to have so many problems trying to use this workflow. I don't know what's happening in your case specifically. lol
@Sam_A I'm just new to this haha, like a lot of other people here. I installed sageattention via pip command.
PS:
C:\ComfyUI_windows_portable\python_embeded> python.exe -m pip install sageattention
Collecting sageattention
Downloading sageattention-1.0.6-py3-none-any.whl.metadata (5.6 kB)
Downloading sageattention-1.0.6-py3-none-any.whl (20 kB)
Installing collected packages: sageattention
Successfully installed sageattention-1.0.6
PS:
C:\ComfyUI_windows_portable\python_embeded> python.exe -m pip install --upgrade sageattention
Requirement already satisfied: sageattention in c:\users\X\appdata\local\programs\python\python312\lib\site-packages (1.0.6)
Restarted comfyui but still get the error, did I do something wrong?
Also tried via the git url
@DUDE33 I had the same problem and changed sdpa to comfy in the sampler; then it worked.
@asd231734624 Thank you very much =). The workflow works now; sage just won't work, I guess. If anyone has an idea what to check, let me know.
What does the "Can't import SageAttention: No module named 'sageattention'" error mean?
You need to install SageAttention in the venv of your ComfyUI installation.
pip install sageattention
I think that you might also need to install triton:
https://www.youtube.com/watch?v=DigvHsn_Qrw
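One recurring gotcha, visible in the pip log above (where "Requirement already satisfied" pointed at a user-level Python312 install, not python_embeded): pip can resolve to a system Python instead of ComfyUI's embedded one, so the install "succeeds" somewhere ComfyUI never looks. A minimal sketch to verify, run with the same python.exe that launches ComfyUI:

import sys

print("interpreter:", sys.executable)  # should point into python_embeded, not a system Python
try:
    import sageattention
    print("sageattention found at:", sageattention.__file__)
except ImportError as e:
    print("sageattention is NOT visible to this interpreter:", e)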
I did pip install sageattention as genural wrote; that didn't fix the problem. I didn't go through the YouTube video tutorial, as people in the comments are still complaining that it doesn't work. Then I saw in the workflow that it stops at the "HunyuanVideo Model Loader" node, and attention_mode references this sageattention. So I simply changed it to the comfy option. That worked. Try the other options too, as I will. I'm not sure how quality or speed is affected. If it's just speed, I'm not going to whine too much about it. If it affects quality without sageattention, that's unfortunate. If someone has sageattention working, please comment on whether there are quality differences, or if it's just speed.
@civitai7_Β good
I'm getting the following error:
HyVideoTextImageEncode
text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
Everything was left as default aside from the uploaded image, which is similar to the test ones. I did have to manually download the xtuner LLM, as the workflow was unable to locate the source on Hugging Face.
Kijai's LLM DID download from source but gives the following error:
HyVideoTextImageEncode
Only vision_languague models support image input
I've run into an error I can't work out. Every time the compile gets to this point, ComfyUI crashes out and disconnects. Does anyone have any idea what's going on here?
It can't seem to get past the HunyuanVideo Sampler. I'm not a coder, so any help would be appreciated.
My GPU is a 4090 with 24GB of VRAM, so I figure it should be able to handle it.
Here's the process:
got prompt
encoded latents shape torch.Size([1, 16, 1, 96, 54])
Loading text encoder model (clipL) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
Text encoder to dtype: torch.float16
Loading tokenizer (clipL) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14
2025-02-13 19:44:31,692 WARNING: Warn!: You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading text encoder model (vlm) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.58s/it]
Text encoder to dtype: torch.bfloat16
Loading tokenizer (vlm) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers
2025-02-13 19:44:48,341 WARNING: Warn!: Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
2025-02-13 19:44:48,546 WARNING: Warn!: Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
vlm prompt attention_mask shape: torch.Size([1, 208]), masked tokens: 208
clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 17
model_type FLOW
The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
Scheduler config: FrozenDict({'num_train_timesteps': 1000, 'flow_shift': 9.0, 'reverse': True, 'solver': 'euler', 'n_tokens': None, '_use_default_values': ['n_tokens', 'num_train_timesteps']})
Using accelerate to load and assign model weights to device...
Loading LoRA: img2vid with strength: 1.0
Requested to load HyVideoModel
loaded completely 21443.013157653808 12555.953247070312 True
Input (height, width, video_length) = (768, 432, 73)
The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
Scheduler config: FrozenDict({'num_train_timesteps': 1000, 'flow_shift': 9.0, 'reverse': True, 'solver': 'euler', 'n_tokens': None, '_use_default_values': ['n_tokens', 'num_train_timesteps']})
Single input latent frame detected, LeapFusion img2vid enabled
Sampling 73 frames in 19 latents at 432x768 with 9 inference steps
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [01:20<00:00, 9.00s/it]
Allocated memory: memory=12.301 GB
Max allocated memory: max_memory=16.878 GB
Max reserved memory: max_reserved=18.969 GB
C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>pause
Press any key to continue . . .
Did you fix it?
Did you try restarting your PC?
I'm not sure about your particular problem, but whenever Comfy gives this "Press any key to continue . . ." without any apparent reason, and I'm sure memory isn't the problem, I close everything and restart the PC. That has always worked for me.
@nymical Thanks for the tip! I'll set up a clean install of Comfy and see if I can get it working. If I run into that problem again, I'll give your suggestion a shot.
@miraishounen Not yet, I'm afraid. I'll let you know of my progress!
Hi, could someone offer a little advice for getting better results? I always get something faded with crosshatch lines. I posted this to demonstrate: https://civitai.com/posts/12909266 That's one of the better results I got. I kept the settings almost identical to those in the v1.1 workflow, only replacing the sexy dance lora and the source image. This was the source image in case anyone wants to replicate it: https://civitai.com/images/54695445
Getting this after applying the fix for the "//: 'int' and 'NoneType'" error:
"Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
I don't know where to write that, if that's even the solution.
I had the "//: 'int' and 'NoneType'" error: rolling back transformers from v4.48.0 with "python.exe -m pip install transformers==4.47.0" fixed it for me.
Installing 4.47.0 doesn't help me.
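If the downgrade seems to change nothing, it may have landed in a different Python than the one ComfyUI actually runs (the same gotcha as with sageattention above). A quick check, run with ComfyUI's own python.exe:

import transformers

print(transformers.__version__)  # the fix above expects 4.47.0
print(transformers.__file__)     # should live under ComfyUI's python_embeded site-packages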
Can you make a video on how to set up the workflow? I don't understand "Select the 'Long Side of Image' you wish (before upscale)".
It's simpler than you think. You add an input image to the workflow; you just need to define what the longer side of the image will be, and the workflow will auto-calculate the shorter side for you. See the sketch below.
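For anyone who wants to sanity-check that math by hand, here is a hedged sketch of what the auto-resize presumably computes (the exact node logic may differ): scale the image so its longer side matches your chosen value, then snap both sides to multiples of 16, which the sampler needs (see the 408/16 discussion further down).

def target_dims(width, height, long_side=768):
    # Scale so the longer side == long_side, then snap both sides to multiples of 16.
    scale = long_side / max(width, height)
    snap = lambda x: max(16, round(x * scale / 16) * 16)
    return snap(width), snap(height)

# Example: a 576x1024 portrait source with long side 768 -> (432, 768),
# matching the 432x768 input seen in the sampler log above.
print(target_dims(576, 1024, 768))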
I'm having a problem with:
HyVideoSampler
Failed to find C compiler. Please specify via CC environment variable
I can't get through this.
running into the same issue after installing triton + sageattention
I did it following this guide:
https://github.com/woct0rdho/triton-windows?tab=readme-ov-file
I have a problem with this one:
"HyVideoTextImageEncode
text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples)."
Do you have a solution? Thanks
Somehow, I get the error:
"Only vision_languague models support image input"
It seems the text-image-encode node won't take the image as a prompt?
Did you change the original encoder?
Oh, okay. I used the text-encoder-tokenizer instead of the transformers one. Is that the problem?
@tetrarrow842 Probably, yes. You need the vision-language (transformers) model to use an image as a prompt.
Finally got it working, but how did you get the animated images to look the same as the original? Mine turn into a completely different image.
I've fixed the other errors I encountered but keep receiving this one and can't get past it:
"HyVideoModelLoader
Error while deserializing header: HeaderTooLarge File path: /workspace/ComfyUI/models/diffusion_models/hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors The safetensors file is corrupt or invalid. Make sure this is actually a safetensors file and not a ckpt or pt or other filetype."
I also get the same type of error for the VAE file.
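A hedged way to check what was actually downloaded: a safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so an HTML error page saved under the wrong filename fails this check immediately. Minimal sketch (the filename is the one from the error above; adjust the path to yours):

import json, struct

def check_safetensors(path):
    # HeaderTooLarge usually means a bad/incomplete download, not a real model file.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # first 8 bytes: JSON header size
        if header_len > 100 * 1024 * 1024:  # absurdly large -> not a valid safetensors file
            raise ValueError(f"header claims {header_len} bytes; file is likely corrupt")
        header = json.loads(f.read(header_len))
    print(f"looks OK: {len(header)} entries in header")

check_safetensors("hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors")

If this raises, redownload the file (and the VAE) rather than debugging the node.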
I'm afraid the triton dependencies make this one a no go for me. Wish I could make it work but I don't want to spend all day debugging that crap.
Truth be told, the new workflow I posted is better: newer tech, and it doesn't use Triton.
@Sam_A That would be awesome. But the one I got from the download link required Triton. Do you have a link to the non-Triton one?
@Clocksmith Of course, bro! This one: https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
It's newer and the results are better. It's slower because they didn't release the fast LoRA (which returns results with fewer steps) for this version yet. But it's amazing!
I have this error now:
'img_in.proj.weight' in HunyuanVideo Model Loader node
Did you know there is a newer I2V model?
https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
Easier to install and better results.
Calculated padded input size per channel: (0 x 16 x 16). Kernel size: (1 x 1 x 1). Kernel size can't be greater than actual input size
When I use the decoding node, it throws this error. I don't get an error with Python 3.11.6, torch 2.3.0, and cu121, but I do with Python 3.12.8, torch 2.6.0, and cu126. The resolution I set is 408*496.
408 / 16 = 25.5, so it's not a multiple of 16. Hence I made nodes to auto-calculate dimensions, so this error doesn't happen. Change the image size to something that's a multiple of 16.
@Sam_A Thank you for your reply. I will try modifying the size to see if it works. However, I don't know why torch 2.3 is able to run with a size of 408, even though it is not a multiple of 16.
@voidyear That is interesting, to be honest. I didn't know about it.
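A small, hedged helper for this situation: given a resolution, it reports whether each side is valid and the nearest multiples of 16, so 408x496 would become 400x496 or 416x496.

def nearest_valid(x, multiple=16):
    # Closest multiples of `multiple` at or below and above x.
    lo = (x // multiple) * multiple
    hi = lo if lo == x else lo + multiple
    return lo, hi

for side in (408, 496):
    lo, hi = nearest_valid(side)
    status = "OK" if side % 16 == 0 else f"invalid -> use {lo} or {hi}"
    print(f"{side}: {status}")
# 408: invalid -> use 400 or 416
# 496: OK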
Great work! Love it, works great in my first tests. Am curious about output length -- I have tried 97 frames and can only seem to get 3 seconds. No errors, no black output, but it will not create any videos longer than the default 73 frames. I have tried with a smaller resolution image to no avail.
Any tips?
I'm using an L40S | 48GB VRAM | 32GB RAM, so I think VRAM is not the issue...
tested it today but got:
* HyVideoSampler 6:
- Return type mismatch between linked nodes: context_options, received_type(FETAARGS) mismatch input_type(HYVIDCONTEXT)
Output will be ignored
Wanted to mention my troubleshooting process for this (spoiler alert: it did not work):
So I found out ComfyUI circles any erroring connections in red: the context_options input under the HunyuanVideo Sampler node is connected to the HunyuanVideo Enhance node's feta_args output.
I disconnected these and things started to load after queueing.
Then it tried downloading the text encoders (llava-llama-3-8b-text-encoder-tokenizer or llava-llama-3-8b-v1_1-transformers). There was no telling how long the download would take, so I manually downloaded the files myself so I could track the ETA. Then I got this error:
DownloadAndLoadHyVideoTextEncoder
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
To try and fix this, I verified the SHA256 hashes of the model safetensors files in the LLM folder for the llava-llama-3-8b-v1_1-transformers text encoder. Two of the files were correct, so I redownloaded, and the hashes matched Hugging Face, but no dice. Then I tried the llava-llama-3-8b-text-encoder-tokenizer text encoder; that would not work either, giving the same error.
I feel like disconnecting the context_options link caused this, so I'm back to wondering how we fix our issue of:
DownloadAndLoadHyVideoTextEncoder
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
It would seem like feta_args on the Enhance node should be connected to feta_args on the Sampler, but trying that gives me:
DownloadAndLoadHyVideoTextEncoder
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
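For anyone repeating the hash check described above, a minimal sketch; the expected hashes come from each file's page on Hugging Face, and the path is the portable-install one from earlier in the thread (adjust to yours):

import hashlib

def sha256_of(path, chunk=1 << 20):
    # Stream the file so multi-GB safetensors shards don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Compare against the SHA256 shown on the file's Hugging Face page.
print(sha256_of(r"C:\ComfyUI_windows_portable\ComfyUI\models\LLM"
                r"\llava-llama-3-8b-v1_1-transformers\model-00001-of-00004.safetensors"))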
@redoctober2 @zoom83 Being 100% honest, this model is already obsolete. It uses a LoRA to become an I2V workflow. It was good when we had no native workflows, but now we have them:
Wan2.1 (Best Quality, high VRAM usage for great results)
https://civitai.com/models/1300201/wan-ai-img2vid-video-extend
Skyreels (Hunyuan Variant) (Good Quality, Mid VRAM usage)
https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
Hunyuan WF (I don't like the quality so much, but I'm still testing. Lowest VRAM usage and FAST LoRA!)
I don't know what I'm doing wrong. I use the provided workflow, but it takes 1.5 hrs to generate a 5-second clip. And that's with sageattention and an RTX 5080.
Try FramePack I2V Hunyuan or GGUF with a 5080.
16GB of VRAM is not really enough for Hunyuan or Wan.
I rent a 5090 32GB on https://runpod.io?ref=gnspz552
Hi, I'm getting this error. Any fix?
HyVideoSampler
cannot access local variable 'original_latents' where it is not associated with a value