CivArchive
    Hunyuan - Image2Video (Jan/2025) - obsolete - v1.1
    NSFW

    Don't forget to Like 👍 the model. ;)


    !!!This workflow is Obsolete!!! Some better options:

    Wan2.1 (Best quality but slowest; high VRAM usage for great results, though it has GGUF options)

    https://civarchive.com/models/1300201/wan-ai-img2vid-video-extend

    Skyreels (Hunyuan variant) (Good quality, mid VRAM usage)

    https://civarchive.com/models/1278247/skyreels-hunyuan-img2vid

    Hunyuan WF (Fastest one. I don't like the quality as much, but I'm still testing. Lowest VRAM usage and a FAST lora!)

    https://civarchive.com/models/1328592/hunyuan-wf-img2vid-fast


    *Just added a version without auto image resize due to the number of people having errors with it. The manual one will work 100%. Sorry about that :)

    **Error "unsupported operand type(s) for //: 'int' and 'NoneType'" fix: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269

    Straightforward, this is an Image-to-Video workflow using the resources we have today (January 2025) with Hunyuan models. Using I2V LeapFusion Lora plus IP2V encoding, it can be very consistent and, in my opinion, as good as an older Kling version in terms of consistency. It’s not perfect, but it delivers solid results if used well, especially with videos of humans.

    I kept it as simple as possible and didn’t include the faceswap node this time, but it’s a great addition if you’re planning to generate videos with human subjects. The VRAM usage depends heavily on the length and dimensions of the video you want to generate, but 12GB of VRAM is ideal to get good results.

    As always, instructions and links are included in the workflow. Don’t forget to update Comfy and HunyuanVideoWrapper nodes!

    That’s it. Leave a like and have fun!

    Description

    • Added a version with manual image resize for those getting errors with the automatic one.

    FAQ

    Comments (193)

    dominic1336756Jan 27, 2025Β· 9 reactions
    CivitAI

    ImageScale

    unsupported operand type(s) for /: 'NoneType' and 'int'

    Sam_A
    Author
    Jan 27, 2025Β· 1 reaction

    Fixed. Sorry about that. :)

    yajukunJan 28, 2025

    @Sam_AΒ I just DL'd the workflow and I still get this error? Is there something we need to change in the workflow? Thanks!

    shawnkaron295Jan 28, 2025

    I get the same error, any ideas?

    yajukunJan 28, 2025Β· 1 reaction

    @shawnkaron295Β I just saw this post and it fixed it for me.

    This is a known issue: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269

    Downgrade the transformers library to fix it.
    Run "python.exe -m pip install transformers==4.47.0" in the python_embedded folder.

    Sam_A
    Author
    Jan 27, 2025Β· 3 reactions

    Just reuploaded with the "unsupported operand type(s) for /: 'NoneType' and 'int'" problem solved. Sorry about that :)

    zengrathJan 27, 2025

    Thanks. i was just about to post that i am also getting error and was trying to figure it out.

    zengrathJan 27, 2025Β· 1 reaction

    I am guessing it's not updated yet on Civit? I redownloaded and am still getting:
    HyVideoTextImageEncode

    unsupported operand type(s) for //: 'int' and 'NoneType'

    Sam_A
    Author
    Jan 27, 2025

    @zengrathΒ On HyVideoTextImageEncode node?

    AicushJan 27, 2025

    @Sam_AΒ Yep, I tried downloading it and am getting the same issue too, got the recent uploaded version too.

    Sam_A
    Author
    Jan 27, 2025

    @AicushΒ Now I think it's fixed. At least I tried with many different images o_o

    funscripter627Jan 27, 2025Β· 2 reactions

    I got the error or a very similar one too, but it was because of my transformers library not being the right version. Had to downgrade to 4.47.0.

    SysDeepJan 27, 2025Β· 1 reaction

    Same Problem.

    Update from 8min ago same Error

    Sam_A
    Author
    Jan 27, 2025

    Well, I just added a version without the auto image resize. I'm unable to identify the problem, so the alternative might work. It's tedious to calculate the image sizes, but it's good enough, I think.

    SysDeepJan 27, 2025

    @Sam_AΒ sry same problem ^^" unsupported operand type(s) for //: 'int' and 'NoneType'

    Sam_A
    Author
    Jan 28, 2025

    @SysDeepΒ Using the one with manual image size????

    SysDeepJan 28, 2025

    @Sam_AΒ Yes 1.1 tested wont work. Image2Video-By-Sam-ManualSize

    SysDeepJan 28, 2025Β· 2 reactions

    @Sam_AΒ - Worked now thanks to funscripter627

    Transformers 4.47.0 works fine! I had to downgrade to it.

    Sam_A
    Author
    Jan 28, 2025Β· 1 reaction

    Just added a version without auto image resize due to the high amount of people having errors with it. The manual one will work 100%. Sorry about that :)

    SysDeepJan 28, 2025Β· 4 reactions

    How to get it work:

    All inside Python (I use Windows 11):

    - Downgrade Transformers to 4.47.0

    - Install sageattention

    - Install triton

    Then worked fine.

    liquidhead440Jan 28, 2025

    Can someone tell me if this one or https://civitai.com/models/1180764/hunyuan-img2vid-leapfusion-lora?modelVersionId=1328798 is better? I'm just wondering, since that version is working for me. It's not super good, but it does pretty okay; if this one is better, I'm willing to put some effort into all the workarounds y'all are talking about. As of now, everything I would need to change just sounds like a lot of work, and with the two examples provided I'm not really up for doing all that and risking breaking something in the process, because we all know how one tiny thing can mess up a whole lot of other workflows. Anyway, I appreciate the upload regardless. Just can't use it LOL

    Sam_A
    Author
    Jan 28, 2025

    My version uses a lora plus IP2V to reinforce the result. It's just one small detail I added for better results. Also easy image resize...

    zengrathJan 28, 2025

    Once I found the correct instructions, it only took minutes to fix, if you're on portable ComfyUI.

    Go to your python_embeded folder. Inside the folder, type cmd in the address bar. In the cmd window, run:
    python -m pip uninstall transformers -y
    python -m pip install --upgrade transformers==4.47.0


    That's it. This fixed it for me.

    zengrathJan 28, 2025

    What I can say is, so far it's more consistent than other workflows I tried. In other workflows the face of the person, for example, would change too much, whereas with this one it doesn't. As far as actions go, though, it likely isn't going to do a whole lot of what you want right now. Hopefully official img2vid support will work better. It's certainly worth giving it a go, though. At the very least, seeing some movement from still images is pretty neat without paying the huge price of premium AIs to do it right now.

    liquidhead440Jan 28, 2025

    @Sam_AΒ thank you :)

    liquidhead440Jan 28, 2025

    @zengrath Well, that sounds way easier than whatever I found. I looked into it during the day, in case I needed to do anything else, while I was at work away from my PC, and I've got to say THANK YOU SO MUCH for this super easy explanation. I'm probably gonna do it either later today or tomorrow. I guess hopefully I won't wreck anything in the meantime LMAO

    zengrathJan 28, 2025

    @liquidhead440 Hope it works for you. After playing around with this workflow for a while, trying different source images, I've been getting good results, and finally, with the right prompting and loras, pretty cool results. So it's worth trying. I have a feeling even official img2video won't be perfect and will require trial and error, though I hope official support is more efficient and consistent.

    yajukunJan 28, 2025Β· 2 reactions

    Hi, are there specific sizes and resolutions that work best? What is the largest picture we can start with. Is there a limit to the number of frames/video length?

    Sam_A
    Author
    Jan 28, 2025Β· 1 reaction

    It all depends on your GPU VRAM. With a 4070 I usually try 75 frames at a 768x432 latent size. You can play around with it and see what your GPU can handle.

    yajukunJan 28, 2025

    @Sam_AΒ I have been able to do 560x1024 portrait, 72+1 frames, uses 20-23GB on my 4090. I tried making longer videos and got multiple errors.

    Sam_A
    Author
    Jan 28, 2025Β· 1 reaction

    @yajukun On my 4090 I tried to make the longest video I could with decent quality. I got around a 9 sec video at 768x432. Beyond that I get the low memory error. Maybe on a 5090? Hehe.

    TroublesomeAJan 28, 2025Β· 1 reaction

    Can I2V be done with the native nodes? I can't make kijai nodes work.

    illinarJan 28, 2025Β· 2 reactions

    Do I have to use Sage Attention? With sdpa the sampler used to throw errors; now, after updating, it just samples forever, stuck at 0%.

    P.S. OMG it moved! 233s/it that's a bit much.

    goresj2932Jan 28, 2025Β· 1 reaction

    I'm not in front of my machine right now, but I've had luck switching from "sdpa" to "comfy". Does that do anything for you?

    wqn999Jan 28, 2025

    I think I need help. I get the following message when I run

    HyVideoModelLoader

    Can't import SageAttention: No module named 'sageattention'

    Sam_A
    Author
    Jan 28, 2025Β· 1 reaction

    Just install it using pip and you're good to go.

    auroch22934Jan 28, 2025

    I installed it using GIT, and it still doesn't work, does it have to be with PIP?

    Sam_A
    Author
    Jan 28, 2025

    @auroch22934 You install sageattention using pip. Open a terminal and run pip install sageattention.

    wqn999Jan 29, 2025Β· 1 reaction

    @Sam_AΒ Thank you very much, I think I will try this method

    Sam_A
    Author
    Jan 29, 2025

    @wqn999 If nothing works, you can ask a free GPT how to install it, and it will give you the command and everything you need in more detail than I'm able to! :D

    auroch22934Jan 29, 2025Β· 1 reaction

    I got it to work using this tutorial https://old.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/

    There's also a video that makes it even easier to follow. It's a LOT

    https://www.youtube.com/watch?v=DigvHsn_Qrw

    SantaonholidaysJan 28, 2025Β· 2 reactions

    i get this error :c

    HyVideoTextImageEncode

    unsupported operand type(s) for //: 'int' and 'NoneType'

    Sam_A
    Author
    Jan 28, 2025Β· 1 reaction

    This is a known error from kijai nodes. Solution: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269

    SantaonholidaysJan 28, 2025

    Now my question is: where is the python_embed folder in the Desktop version?

    Sam_A
    Author
    Jan 28, 2025Β· 1 reaction

    @Santaonholidays It's your local Python installation. If you didn't create a venv for ComfyUI, you can just install it in your local Python.

    royal96sero519Jan 29, 2025

    @SantaonholidaysΒ run: python.exe -m pip install transformers==4.47.0 in python_embed folder

    SantaonholidaysJan 29, 2025

    @Sam_A The ComfyUI Desktop app has a .venv folder in it

    dscvffJan 28, 2025

    I get "Only vision_languague models support image input"?

    Sam_A
    Author
    Jan 28, 2025

    Did you remove the <image> tag from the prompt? Or did you change the TextEncoder model?

    dscvffJan 28, 2025

    I didn't remove <image>; what TextEncoder should I use?

    Sam_A
    Author
    Jan 28, 2025

    @dscvff In the (Down)Load HunyuanVideo TextEncoder node, use xtune/llava-llama-3-8b... If you didn't change it, please tell me which node is returning this error.

    dscvffJan 29, 2025

    @Sam_A Downloaded the model and got the int error; ran python.exe -m pip install transformers==4.47.0 in the python_embed folder, and I still get the error.

    Sam_A
    Author
    Jan 29, 2025

    @dscvffΒ Did you change any config before running the workflow?

    dscvffJan 29, 2025

    @Sam_A No. I tried both workflows. I see on https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269 that others have the same issue even after downgrading. I'm on portable.

    Sam_A
    Author
    Jan 29, 2025

    @dscvff To try to help you I'll need to know which node is returning the error, so maybe I can figure out what's going on...

    ObsidianDreamsJan 29, 2025

    @Sam_AΒ hello, same problem as above, I didn't remove <image> from the prompt and neither of the two text encoders listed work, both give me this error:

    got prompt

    Loading text encoder model (clipL) from: C:\pinokio\api\comfy.git\app\models\clip\clip-vit-large-patch14

    Text encoder to dtype: torch.float16

    Loading tokenizer (clipL) from: C:\pinokio\api\comfy.git\app\models\clip\clip-vit-large-patch14

    Loading text encoder model (llm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-text-encoder-tokenizer

    Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00, 1.40s/it]

    Text encoder to dtype: torch.bfloat16

    Loading tokenizer (llm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-text-encoder-tokenizer

    !!! Exception during processing !!! Only vision_languague models support image input

    Traceback (most recent call last):

    File "C:\pinokio\api\comfy.git\app\execution.py", line 327, in execute

    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

    File "C:\pinokio\api\comfy.git\app\execution.py", line 202, in get_output_data

    return_values = mapnode_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

    File "C:\pinokio\api\comfy.git\app\execution.py", line 174, in mapnode_over_list

    process_inputs(input_dict, i)

    File "C:\pinokio\api\comfy.git\app\execution.py", line 163, in process_inputs

    results.append(getattr(obj, func)(**inputs))

    File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 881, in process

    prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = encode_prompt(self,

    File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 806, in encode_prompt

    text_inputs = text_encoder.text2tokens(prompt,

    File "C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 214, in text2tokens

    raise ValueError("Only vision_languague models support image input")

    ValueError: Only vision_languague models support image input

    NikmagoJan 28, 2025Β· 1 reaction

    Res: 796x448, frames 53, steps 9: rendering time 1 hour on 12 GB VRAM. Is that normal?

    Sam_A
    Author
    Jan 28, 2025

    I would try to reduce the resolution a little bit and use the upscaler later. 768x432 at 75 frames usually takes around 6~7 minutes on a 4070, maybe less; I'm not sure. You can go even lower on resolution.

    NikmagoJan 28, 2025

    @Sam_AΒ I changed the resolution to 768x432, but it didn't affect the rendering time in any way. I assumed that this was due to the fact that the large - "hunyus_video_t2v_720p_bf16" model was used. Then I downloaded it as you advised -- "hunyus_video_fastvideo_720_fp8" -- it didn't help. By the way, with the text2video Hunyuan, the video is generated for no more than 10 minutes. I don't know, maybe 3060 (12 GB) is not friendly with image2video method(

    Sam_A
    Author
    Jan 28, 2025

    @Nikmago You will need to reduce resolution/length until you see it's not using 100% of your VRAM. I think it spills over into system memory when you reach 100%, and the process goes a lot slower.

    zengrathJan 28, 2025

    @Nikmago Check your VRAM usage in the resource monitor. I found that if I'm using 95% or more of my VRAM, generation time goes up considerably. Even on my 4090 at a 720p resolution I'm usually at 80-90% of my VRAM if my frames are around 81 or so. So you'll likely need to go down to 336x576 or lower, until you're not hitting over 95% VRAM, and that will likely result in your generations taking only minutes. There is another workflow here that specializes in the fastest video generation possible and runs the fast lora at low resolutions and such. I personally find, for a 4090, that the faster speed is not worth the loss in quality, but for you it may be your best option on a 12GB VRAM card. There are also workflows on Civit that say they're specifically for 12GB VRAM. But you can use this one if you just lower the resolution enough. Try starting at 45 frames and the resolution I provided to see if it finishes in just minutes. If not, go even lower on the resolution or try another workflow designed for 12GB.

    NikmagoJan 29, 2025Β· 2 reactions

    @Sam_AΒ @zengrathΒ Guys, I found something interesting! I tried to set the resolution to 576x336 and 45 frames. It still takes a long time - 30 minutes. But if you set the wrong resolution 3 times and go back again, for example, to 576x336 (45 frames), rendering takes 2 minutes at 9 steps and 4 minutes at 24 steps. As it should be, I suppose.
    That is, it turns out to be some kind of bug. I entered the wrong values 3 times, and the system gave me an error like "The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 4". And on the 4th time, the generation was very fast.

    MugenManFeb 1, 2025

    @NikmagoΒ where should I set the wrong value and what is the wrong value?

    NikmagoFeb 1, 2025Β· 1 reaction

    @MugenManΒ It is better not to bother with these values and add blockswap and teacache to the workflow. This solves the problem. There is a normal workflow on the site for 12 GB

    tomazxzas143Jan 28, 2025

    I keep getting:

    shape mismatch: value tensor of shape [16, 1, 61, 34] cannot be broadcast to indexing result of shape [1, 16, 1, 62, 34]

    I know it's got something to do with resolution, when I generate at the resolution that was there by default, it works, but any other resolution gives me this error

    Sam_A
    Author
    Jan 29, 2025Β· 1 reaction

    If you're using manual resize, switch the dimensions to multiples of 16.
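The resize rule above can be sketched in Python. `manual_resize` is a hypothetical helper (not a node from the workflow), assuming a default long side of 768 as used elsewhere in the thread:

```python
def manual_resize(width: int, height: int, long_side: int = 768) -> tuple[int, int]:
    """Scale an image so its long side equals `long_side`, then snap
    both dimensions to the nearest multiple of 16 (minimum 16)."""
    scale = long_side / max(width, height)

    def snap(value: float) -> int:
        return max(16, round(value * scale / 16) * 16)

    return snap(width), snap(height)

print(manual_resize(1920, 1080))  # (768, 432)
```

A 16:9 source lands exactly on the 768x432 latent size mentioned in the replies; odd aspect ratios get nudged to the nearest /16 size.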

    funscripter627Jan 29, 2025Β· 1 reaction

    It works extremely well. Best i2vid workflow I tried, although I did spend more time trying to get the right settings for this one. Thank you so much!

    goresj2932Jan 29, 2025

    What settings did you end up with? I'm really struggling to get good quality out of this, but I'm pretty sure it's something I've got misconfigured.

    funscripter627Jan 29, 2025Β· 2 reactions

    @goresj2932 You have to play around with the CFG scale and flow shift values a bit sometimes, although I usually keep them as they are by default in the workflow. I also make sure the frame count totals 24, otherwise the movement seems to get messed up. Not 100% sure though.

    Make your prompts really simple and make sure that you don't change too much, otherwise it will generate a whole new image instead. Look at the examples posted here and just change the picture if you want to see it work.

    Sam_A
    Author
    Jan 30, 2025Β· 2 reactions

    @goresj2932 I'll suggest what worked for me... Check what Hunyuan can generate, make images of the thing you want but with a similar visual composition, and then it will understand your input. Also, a lora of the thing you want to animate will help a lot.

    Kate_Wett770Jan 29, 2025Β· 1 reaction

    Where I can find the img2vid.safetensors (LoRA)

    Sam_A
    Author
    Jan 30, 2025

    The link is in the workflow instructions.

    kanghua151613Jan 30, 2025

    Can't find the sexy dance lora!

    Sam_A
    Author
    Jan 30, 2025

    @kanghua151613 Not really necessary, but here it is...
    https://civitai.com/models/1110311/sexy-dance

    myprivacy27091991221Jan 30, 2025Β· 1 reaction

    RTX 4070 12GB taking too much time for 432x768, still running more than 2hrs.. why?

    Sam_A
    Author
    Jan 30, 2025

    Probably too many frames; it's reaching your VRAM limit. Try to run it so it uses at most ~95% of your VRAM, and it will run in 5~7 minutes. You can reduce the number of frames or the frame size.

    kanghua151613Jan 30, 2025

    The video seems to undergo significant deformation after three seconds or more. Did I make a mistake?

    funscripter627Jan 30, 2025Β· 1 reaction

    It's probably not recognizing the thing you want to change. Try to describe it more clearly, or generalize it more, depending on the picture. For example, instead of "a woman in a black dress and dark brown hair," just say "a woman." If there are multiple women in the picture, then it helps to distinguish them.

    In addition play around with the sampler parameters, guided cfg scale, flow_shift and denoise. Lowering denoise should make it deviate less from the picture.

    Also, sometimes a picture just doesn't work. If that's the case, waiting for the official i2v model is probably best.

    Sam_A
    Author
    Jan 30, 2025

    Like funscripter627 said, sometimes it does not understand what you're trying to do. My suggestion: try to create what you want in a T2V workflow, just to check how much of your prompt Hunyuan understands, and how it understands it. Then generate your image with a similar visual composition, and it might work. It's like old Kling, which used to deform what it didn't understand. I believe this problem might be gone with better models in the future.

    essseekay476Jan 30, 2025

    I keep getting this:

    HyVideoTextImageEncode

    unsupported operand type(s) for //: 'int' and 'NoneType'

    funscripter627Jan 30, 2025Β· 1 reaction

    It's probably the issue linked at the top: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269 Basically you need to downgrade the transformers package. Also a good idea to update your custom nodes if you haven't already.

    essseekay476Jan 30, 2025

    @funscripter627Β thanks for your reply. how do i downgrade the package?

    funscripter627Jan 30, 2025

    @essseekay476Β It's in the link. It depends on your comfy installation. See https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269#issuecomment-2585504240

    essseekay476Jan 30, 2025

    @funscripter627 I'm running Comfy within Pinokio and still don't know what to do (sorry for being annoying)

    funscripter627Jan 30, 2025

    @essseekay476Β it's okay man. I don't know pinokio specifically, but generally you want to execute the downgrade and any other pip commands on your virtual environment (venv). I think pinokio comes with conda so maybe try to look for a venv there.

    Sam_A
    Author
    Jan 30, 2025Β· 1 reaction

    According to GPT:

    I'm running comfyui in pinokio. How do I install a python package in it?

    To install a Python package in ComfyUI running on Pinokio, you can follow these general steps. Since ComfyUI is typically running in a Python environment (likely in a virtual environment), you'll want to install your package into that environment.

    Here’s how you can install a Python package:

    Access the terminal/command line: If you're running Pinokio with a graphical interface, you should be able to access a terminal from within the environment.

    Activate your virtual environment (if applicable): If ComfyUI is using a virtual environment, activate it. Typically, you'd activate the virtual environment like this (assuming it's named env):

    For Linux/macOS:

    bash

    source env/bin/activate

    For Windows:

    bash

    .\env\Scripts\activate

    Install the Python package: Once your virtual environment is activated, you can install the package using pip. For example, if you want to install requests, you'd run:

    bash

    pip install requests

    Verify the installation: After installation, you can check that the package has been installed by running:

    bash

    pip list

    This should show the installed packages, including the one you just added.

    Restart ComfyUI: After installing the package, you may need to restart ComfyUI to ensure it picks up the new package.

    Let me know if you encounter any issues along the way!


    If you need further instructions, GPT can help you with more details than I can 100%! lol

    essseekay476Jan 31, 2025Β· 2 reactions

    @Sam_A Thanks for this reply man, you went above and beyond lol

    DroneMeOutJan 31, 2025Β· 1 reaction

    How do I know whether anything else I run requires the transformers version I'm currently on? I wouldn't want to downgrade just for this I2V and break everything else in the process. Thoughts?
    UPDATE: I looked, and I was only on 4.47.1 anyway. I seriously doubt anything is going to care about the downgrade.

    funscripter627Jan 31, 2025

    @DroneMeOut Haven't run into any issues myself after downgrading, although I've mostly been using this workflow lol

    felipesscaff925Jan 30, 2025

    Everytime I get this error, even with other Hunyuan workflows

    RuntimeError: CUDA error: the launch timed out and was terminated CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

    Sam_A
    Author
    Jan 30, 2025

    What GPU do you have?

    felipesscaff925Feb 3, 2025

    @Sam_A I'm currently using an NVIDIA A16-16Q

    Sam_A
    Author
    Feb 3, 2025

    @felipesscaff925 Well... this is kind of a "server" GPU. I'm sorry, but I'm not sure how to fix this, bro. Maybe GPT has a solution?

    LexiBarberFeb 12, 2025

    Yeah, I got the same error.

    vim_brigantJan 31, 2025Β· 1 reaction

    This is the first i2v hunyuan workflow I've tried that's actually worked for me. Thanks for putting this together!

    LucasYaoJan 31, 2025Β· 1 reaction

    "Where can I find Sexy Dance E15 lora?"

    LucasYaoJan 31, 2025Β· 1 reaction

    Thx Bro!

    LucasYaoJan 31, 2025Β· 1 reaction

    This is the error I encountered. Could everyone please take a look?

    got prompt

    encoded latents shape torch.Size([1, 16, 1, 96, 54])

    Loading text encoder model (clipL) from: C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14

    Text encoder to dtype: torch.float16

    Loading tokenizer (clipL) from: C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14

    Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.

    You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.

    Loading text encoder model (vlm) from: C:\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers

    Loading checkpoint shards: 100%|██████████| 4/4 [00:42<00:00, 10.64s/it]

    Text encoder to dtype: torch.bfloat16

    Loading tokenizer (vlm) from: C:\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers

    !!! Exception during processing !!! unsupported operand type(s) for //: 'int' and 'NoneType'

    Traceback (most recent call last):

    File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute

    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data

    return_values = mapnode_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in mapnode_over_list

    process_inputs(input_dict, i)

    File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs

    results.append(getattr(obj, func)(**inputs))

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 884, in process

    prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = encode_prompt(self,

    ^^^^^^^^^^^^^^^^^^^

    File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 809, in encode_prompt

    text_inputs = text_encoder.text2tokens(prompt,

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 253, in text2tokens

    text_tokens = self.processor(

    ^^^^^^^^^^^^^^^

    File "C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llava\processing_llava.py", line 160, in call

    num_image_tokens = (height // self.patch_size) * (

    ~~~~~~~^^~~~~~~~~~~~~~~~~

    TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'

    Prompt executed in 69.41 seconds
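The tail of that traceback shows the root cause: in newer transformers releases the LLaVA processor's `patch_size` ends up `None`, so `height // self.patch_size` fails. A minimal reproduction of just the failing operation (hypothetical stand-in values, nothing Hunyuan-specific):

```python
patch_size = None   # what the newer LlavaProcessor effectively has here
height = 768        # example image height

try:
    num_image_tokens = height // patch_size
except TypeError as err:
    print(err)  # unsupported operand type(s) for //: 'int' and 'NoneType'
```

Pinning transformers==4.47.0, as several comments above suggest, restores a processor where patch_size is populated.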

    tangentplum598Jan 31, 2025Β· 1 reaction

    Dumb question, but I noticed you had the fastvideo model selected without the lora. Is the lora required?

    Also, is there a reason you did not use the 544x960 resolution specified in the i2v lora's repo?

    Sam_A
    Author
    Jan 31, 2025Β· 1 reaction

    The fastvideo model already has the lora included; it's like an "LCM" SD model. About the resolution: it works at different resolutions. Any multiple of 16 pixels will work. Adjust according to what your GPU can handle. Don't go too low, because the model will start to not understand what is in the picture. Don't go too high, or your GPU will not be able to process it.

    tangentplum598Jan 31, 2025Β· 1 reaction

    @Sam_A OK, thank you. Another question: running at 432x768 with 37 frames easily gives me an OOM on a 3090. Is this normal?

    Sam_A
    Author
    Jan 31, 2025Β· 1 reaction

    @tangentplum598 No, it's not normal. With 24GB you should be able to run maybe 7~8 seconds (98% VRAM) at this resolution, I think. I suggest you update the nodes and Comfy and check that everything else is in order.

    tangentplum598Jan 31, 2025Β· 1 reaction

    @Sam_A Alright. Yeah, I cannot figure it out. I've updated all the nodes and even used a combination of --disable-smart-memory --disable-cuda-malloc. I've noticed models aren't being offloaded, which of course isn't a problem with this workflow. I'm just so confused about what is happening :(


    UPDATE:
    I just made a fresh ComfyUI installation and it works now.

    Sam_A
    Author
    Jan 31, 2025Β· 1 reaction

    @tangentplum598 I wish I could help, but I really don't know what the problem could be.

    Update: Oh! Great!

    Tom_AttoJan 31, 2025Β· 1 reaction
    CivitAI

    I like the work that you've put into this - thank you :)

    Do you have a recommendation for any Comfy settings to avoid memory issues?

    -GPU: 4080 Super

    -Dedicated VRAM for windows - 15384MB

    -Shared system memory - 49022MB

    The workflow is running my GPU and VRAM at 100% without changing any settings from the base workflow with some memory errors that have stopped generation. I am 100% sure I'm doing something wrong :)

    Sam_A
    Author
    Feb 1, 2025Β· 2 reactions

    Of course! This is how I adjust this workflow according to the GPU I'm using. I have a 4090 and a 4070 (12GB VRAM).
    1. I like to input images with a 16:9 aspect ratio (1920x1080, 1280x720, etc.). Then I define the long side of the image; I usually start with 768.
    2. In num_frames on the sampler you need to put a number that is a multiple of 4, plus 1. E.g.: (4*10)+1 = 41.
    Since it works at a base of 24 frames/sec, you can use 24 times the length in seconds you want, plus 1. I believe with 16GB VRAM and the size I mentioned in item 1 you can generate at least 5 seconds of video, but you need to test. Start small: test with 2 seconds, then 3, etc., until you find the sweet spot for your GPU. If the generation starts to go crazy, like 100 sec/it, the setting is wrong and you need to reduce the length or the image size.

    And finally, fine-tune your config. I usually reduce the long side of the image 16 pixels at a time, testing whether Hunyuan still understands your image at the size you're inputting. Atm I'm using 672 as the long side, trying to make the video as long as possible; for what I'm generating, below this size things start to get weird. But it's really trial and error.

    This is the general way I use this workflow. I hope it helps!
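The frame-count rule above (base 24 fps, num_frames must be a multiple of 4 plus 1) can be sketched as a small Python helper (hypothetical, for illustration only):

```python
def num_frames_for(seconds: float, fps: int = 24) -> int:
    """Return a valid Hunyuan frame count (4*k + 1) close to seconds * fps."""
    frames = round(seconds * fps)
    k = max(1, frames // 4)  # at least one 4-frame block
    return 4 * k + 1

# 2 seconds at 24 fps -> 49 frames; the author's example (4*10)+1 = 41
# corresponds to roughly 1.7 seconds.
```

So for a 3-second clip you would put 73 in num_frames, then back off if you hit an OOM.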

    Tom_AttoFeb 1, 2025Β· 1 reaction

    @Sam_AΒ Thank you - this helped. It's still a bit slow, but I can actually get some generations going.

    cgimwFeb 2, 2025

    @Sam_A I'm getting quite bad VRAM usage on Windows (not the portable build; installed in a venv). I have a full 3090 (actually I have two in this machine, but it seems bad that I'm getting excessive usage) which is only barely able to run a 512x768x73 workflow before OOM. I installed this all yesterday, so it's a fresh installation. Any idea of some optimization options I'm missing?

    P.S.: Thanks for the workflow! Though I find it can be a bit reluctant with certain loras (with anime-style images) - do you find the start image can have a big effect on the resulting action?

    Sam_A
    Author
    Feb 2, 2025

    @cgimw In another comment someone was also having problems on a 3090, and a fresh Comfy reinstallation/update solved the problem. But since you said yours is fresh, I really don't know. I wish I could help.
    About the image, yes, it has a strong impact on the movement. It seems that if the Hunyuan model doesn't have enough data for your image, the result can be deformed or have only small movements. Ideally, play with Hunyuan a little before starting with I2V, just to get a "feel" for what kind of images the model generates.

    cgimwFeb 2, 2025

    @Sam_A Hmm. I will keep an eye on it - maybe I'll try reinstalling.

    Do you know if there's a simple way of offloading the text encoder onto my other GPU? That seems like it would be a simple way of getting a big improvement in throughput.

    JankolonkoFeb 3, 2025

    Hi guys, any idea how to fix this? I have 16GB VRAM, but every time I get an error when loading llama in HyVideoTextImageEncode:

    Allocation on device

    Seems like not enough VRAM? Any idea how to load it?

    tangentplum598Feb 1, 2025
    CivitAI

    Hey, does anyone have tips for encouraging motion? Frequently the I2V output is barely moving.

    Sam_A
    Author
    Feb 1, 2025

    My suggestion is to use pictures the model understands and to use a lora. But to be honest, most of the time it's not needed. What are you trying to move? Maybe I can try some examples to help you.

    tangentplum598Feb 1, 2025

    @Sam_A I'm thinking my custom lora may just not be trained enough. Thanks. I will experiment more and post an update. Is there a group on Discord or somewhere for people to discuss related things?

    seyik83688983Feb 3, 2025Β· 1 reaction
    CivitAI

    Sadly it doesn't work. First it was throwing the text encoder issue; I downgraded to 4.47.0. Now it still won't work, even though I'm not getting any error lol. Comfy is a pain in the ass.

    Sam_A
    Author
    Feb 3, 2025Β· 1 reaction

    That's sad. I would try a fresh Comfy installation and see if that solves the problem.

    GigaThiccMar 1, 2025Β· 1 reaction

    I was able to get it to work by following all of the instructions exactly -- granted, on a MimicPC instance :-) I think it all comes down to a clean install and available VRAM!

    MeletheiaFeb 3, 2025Β· 1 reaction
    CivitAI

    That's the error I get: 'Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback): No module named 'transformers.models.timm_wrapper.configuration_timm_wrapper''

    PetePabloFeb 4, 2025
    CivitAI

    Anyone get this error:
    HyVideoVAELoader

    Error(s) in loading state_dict for AutoencoderKLCausal3D: Missing key(s) in state_dict: "encoder.................

    PetePabloFeb 4, 2025

    HyVideoVAELoader

    Error(s) in loading state_dict for AutoencoderKLCausal3D: Missing key(s) in state_dict: "encoder.down_blocks.0.resnets.0.norm1.weight", "encoder.down_blocks.0.resnets.0.norm1.bias", "encoder.down_blocks.0.resnets.0.conv1.conv.weight", "encoder.down_blocks.0.resnets.0.conv1.conv.bias", "encoder.down_blocks.0.resnets.0.norm2.weight", "encoder.down_blocks.0.resnets.0.norm2.bias", "encoder.down_blocks.0.resnets.0.conv2.conv.weight", "encoder.down_blocks.0.resnets.0.conv2.conv.bias", "encoder.down_blocks.0.resnets.1.norm1.weight", "encoder.down_blocks.0.resnets.1.norm1.bias", "encoder.down_blocks.0.resnets.1.conv1.conv.weight", "encoder.down_blocks.0.resnets.1.conv1.conv.bias", "encoder.down_blocks.0.resnets.1.norm2.weight", "encoder.down_blocks.0.resnets.1.norm2.bias", "encoder.down_blocks.0.resnets.1.conv2.conv.weight", "encoder.down_blocks.0.resnets.1.conv2.conv.bias", "encoder.down_blocks.0.downsamplers.0.conv.conv.weight", "encoder.down_blocks.0.downsamplers.0.conv.conv.bias", "encoder.down_blocks.1.resnets.0.norm1.weight", "encoder.down_blocks.1.resnets.0.norm1.bias", "encoder.down_blocks.1.resnets.0.conv1.conv.weight", "encoder.down_blocks.1.resnets.0.conv1.conv.bias", "encoder.down_blocks.1.resnets.0.norm2.weight", "encoder.down_blocks.1.resnets.0.norm2.bias", "encoder.down_blocks.1.resnets.0.conv2.conv.weight", "encoder.down_blocks.1.resnets.0.conv2.conv.bias", "encoder.down_blocks.1.resnets.0.conv_shortcut.conv.weight", "encoder.down_blocks.1.resnets.0.conv_shortcut.conv.bias", "encoder.down_blocks.1.resnets.1.norm1.weight", "encoder.down_blocks.1.resnets.1.norm1.bias", "encoder.down_blocks.1.resnets.1.conv1.conv.weight", "encoder.down_blocks.1.resnets.1.conv1.conv.bias", "encoder.down_blocks.1.resnets.1.norm2.weight", "encoder.down_blocks.1.resnets.1.norm2.bias", "encoder.down_blocks.1.resnets.1.conv2.conv.weight", "encoder.down_blocks.1.resnets.1.conv2.conv.bias", "encoder.down_blocks.1.downsamplers.0.conv.conv.weight", 
"encoder.down_blocks.1.downsamplers.0.conv.conv.bias", "encoder.down_blocks.2.resnets.0.norm1.weight", "encoder.down_blocks.2.resnets.0.norm1.bias", "encoder.down_blocks.2.resnets.0.conv1.conv.weight", "encoder.down_blocks.2.resnets.0.conv1.conv.bias", "encoder.down_blocks.2.resnets.0.norm2.weight", "encoder.down_blocks.2.resnets.0.norm2.bias", "encoder.down_blocks.2.resnets.0.conv2.conv.weight", "encoder.down_blocks.2.resnets.0.conv2.conv.bias", "encoder.down_blocks.2.resnets.0.conv_shortcut.conv.weight", "encoder.down_blocks.2.resnets.0.conv_shortcut.conv.bias", "encoder.down_blocks.2.resnets.1.norm1.weight", "encoder.down_blocks.2.resnets.1.norm1.bias", "encoder.down_blocks.2.resnets.1.conv1.conv.weight", "encoder.down_blocks.2.resnets.1.conv1.conv.bias", "encoder.down_blocks.2.resnets.1.norm2.weight", "encoder.down_blocks.2.resnets.1.norm2.bias", "encoder.down_blocks.2.resnets.1.conv2.conv.weight", "encoder.down_blocks.2.resnets.1.conv2.conv.bias", "encoder.down_blocks.2.downsamplers.0.conv.conv.weight", "encoder.down_blocks.2.downsamplers.0.conv.conv.bias", "encoder.down_blocks.3.resnets.0.norm1.weight", "encoder.down_blocks.3.resnets.0.norm1.bias", "encoder.down_blocks.3.resnets.0.conv1.conv.weight", "encoder.down_blocks.3.resnets.0.conv1.conv.bias", "encoder.down_blocks.3.resnets.0.norm2.weight", "encoder.down_blocks.3.resnets.0.norm2.bias", "encoder.down_blocks.3.resnets.0.conv2.conv.weight", "encoder.down_blocks.3.resnets.0.conv2.conv.bias", "encoder.down_blocks.3.resnets.1.norm1.weight", "encoder.down_blocks.3.resnets.1.norm1.bias", "encoder.down_blocks.3.resnets.1.conv1.conv.weight", "encoder.down_blocks.3.resnets.1.conv1.conv.bias", "encoder.down_blocks.3.resnets.1.norm2.weight", "encoder.down_blocks.3.resnets.1.norm2.bias", "encoder.down_blocks.3.resnets.1.conv2.conv.weight", "encoder.down_blocks.3.resnets.1.conv2.conv.bias", "encoder.mid_block.attentions.0.group_norm.weight", "encoder.mid_block.attentions.0.group_norm.bias", 
"encoder.mid_block.attentions.0.to_q.weight", "encoder.mid_block.attentions.0.to_q.bias", "encoder.mid_block.attentions.0.to_k.weight", "encoder.mid_block.attentions.0.to_k.bias", "encoder.mid_block.attentions.0.to_v.weight", "encoder.mid_block.attentions.0.to_v.bias", "encoder.mid_block.attentions.0.to_out.0.weight", "encoder.mid_block.attentions.0.to_out.0.bias", "encoder.mid_block.resnets.0.norm1.weight", "encoder.mid_block.resnets.0.norm1.bias", "encoder.mid_block.resnets.0.conv1.conv.weight", "encoder.mid_block.resnets.0.conv1.conv.bias", "encoder.mid_block.resnets.0.norm2.weight", "encoder.mid_block.resnets.0.norm2.bias", "encoder.mid_block.resnets.0.conv2.conv.weight", "encoder.mid_block.resnets.0.conv2.conv.bias", "encoder.mid_block.resnets.1.norm1.weight", "encoder.mid_block.resnets.1.norm1.bias", "encoder.mid_block.resnets.1.conv1.conv.weight", "encoder.mid_block.resnets.1.conv1.conv.bias", "encoder.mid_block.resnets.1.norm2.weight", "encoder.mid_block.resnets.1.norm2.bias", "encoder.mid_block.resnets.1.conv2.conv.weight", "encoder.mid_block.resnets.1.conv2.conv.bias", "encoder.conv_norm_out.weight", "encoder.conv_norm_out.bias", "decoder.up_blocks.0.resnets.0.norm1.weight", "decoder.up_blocks.0.resnets.0.norm1.bias", "decoder.up_blocks.0.resnets.0.conv1.conv.weight", "decoder.up_blocks.0.resnets.0.conv1.conv.bias", "decoder.up_blocks.0.resnets.0.norm2.weight", "decoder.up_blocks.0.resnets.0.norm2.bias", "decoder.up_blocks.0.resnets.0.conv2.conv.weight", "decoder.up_blocks.0.resnets.0.conv2.conv.bias", "decoder.up_blocks.0.resnets.1.norm1.weight", "decoder.up_blocks.0.resnets.1.norm1.bias", "decoder.up_blocks.0.resnets.1.conv1.conv.weight", "decoder.up_blocks.0.resnets.1.conv1.conv.bias", "decoder.up_blocks.0.resnets.1.norm2.weight", "decoder.up_blocks.0.resnets.1.norm2.bias", "decoder.up_blocks.0.resnets.1.conv2.conv.weight", "decoder.up_blocks.0.resnets.1.conv2.conv.bias", "decoder.up_blocks.0.resnets.2.norm1.weight", 
"decoder.up_blocks.0.resnets.2.norm1.bias", "decoder.up_blocks.0.resnets.2.conv1.conv.weight", "decoder.up_blocks.0.resnets.2.conv1.conv.bias", "decoder.up_blocks.0.resnets.2.norm2.weight", "decoder.up_blocks.0.resnets.2.norm2.bias", "decoder.up_blocks.0.resnets.2.conv2.conv.weight", "decoder.up_blocks.0.resnets.2.conv2.conv.bias", "decoder.up_blocks.0.upsamplers.0.conv.conv.weight", "decoder.up_blocks.0.upsamplers.0.conv.conv.bias", "decoder.up_blocks.1.resnets.0.norm1.weight", "decoder.up_blocks.1.resnets.0.norm1.bias", "decoder.up_blocks.1.resnets.0.conv1.conv.weight", "decoder.up_blocks.1.resnets.0.conv1.conv.bias", "decoder.up_blocks.1.resnets.0.norm2.weight", "decoder.up_blocks.1.resnets.0.norm2.bias", "decoder.up_blocks.1.resnets.0.conv2.conv.weight", "decoder.up_blocks.1.resnets.0.conv2.conv.bias", "decoder.up_blocks.1.resnets.1.norm1.weight", "decoder.up_blocks.1.resnets.1.norm1.bias", "decoder.up_blocks.1.resnets.1.conv1.conv.weight", "decoder.up_blocks.1.resnets.1.conv1.conv.bias", "decoder.up_blocks.1.resnets.1.norm2.weight", "decoder.up_blocks.1.resnets.1.norm2.bias", "decoder.up_blocks.1.resnets.1.conv2.conv.weight", "decoder.up_blocks.1.resnets.1.conv2.conv.bias", "decoder.up_blocks.1.resnets.2.norm1.weight", "decoder.up_blocks.1.resnets.2.norm1.bias", "decoder.up_blocks.1.resnets.2.conv1.conv.weight", "decoder.up_blocks.1.resnets.2.conv1.conv.bias", "decoder.up_blocks.1.resnets.2.norm2.weight", "decoder.up_blocks.1.resnets.2.norm2.bias", "decoder.up_blocks.1.resnets.2.conv2.conv.weight", "decoder.up_blocks.1.resnets.2.conv2.conv.bias", "decoder.up_blocks.1.upsamplers.0.conv.conv.weight", "decoder.up_blocks.1.upsamplers.0.conv.conv.bias", "decoder.up_blocks.2.resnets.0.norm1.weight", "decoder.up_blocks.2.resnets.0.norm1.bias", "decoder.up_blocks.2.resnets.0.conv1.conv.weight", "decoder.up_blocks.2.resnets.0.conv1.conv.bias", "decoder.up_blocks.2.resnets.0.norm2.weight", "decoder.up_blocks.2.resnets.0.norm2.bias", 
"decoder.up_blocks.2.resnets.0.conv2.conv.weight", "decoder.up_blocks.2.resnets.0.conv2.conv.bias", "decoder.up_blocks.2.resnets.0.conv_shortcut.conv.weight", "decoder.up_blocks.2.resnets.0.conv_shortcut.conv.bias", "decoder.up_blocks.2.resnets.1.norm1.weight", "decoder.up_blocks.2.resnets.1.norm1.bias", "decoder.up_blocks.2.resnets.1.conv1.conv.weight", "decoder.up_blocks.2.resnets.1.conv1.conv.bias", "decoder.up_blocks.2.resnets.1.norm2.weight", "decoder.up_blocks.2.resnets.1.norm2.bias", "decoder.up_blocks.2.resnets.1.conv2.conv.weight", "decoder.up_blocks.2.resnets.1.conv2.conv.bias", "decoder.up_blocks.2.resnets.2.norm1.weight", "decoder.up_blocks.2.resnets.2.norm1.bias", "decoder.up_blocks.2.resnets.2.conv1.conv.weight", "decoder.up_blocks.2.resnets.2.conv1.conv.bias", "decoder.up_blocks.2.resnets.2.norm2.weight", "decoder.up_blocks.2.resnets.2.norm2.bias", "decoder.up_blocks.2.resnets.2.conv2.conv.weight", "decoder.up_blocks.2.resnets.2.conv2.conv.bias", "decoder.up_blocks.2.upsamplers.0.conv.conv.weight", "decoder.up_blocks.2.upsamplers.0.conv.conv.bias", "decoder.up_blocks.3.resnets.0.norm1.weight", "decoder.up_blocks.3.resnets.0.norm1.bias", "decoder.up_blocks.3.resnets.0.conv1.conv.weight", "decoder.up_blocks.3.resnets.0.conv1.conv.bias", "decoder.up_blocks.3.resnets.0.norm2.weight", "decoder.up_blocks.3.resnets.0.norm2.bias", "decoder.up_blocks.3.resnets.0.conv2.conv.weight", "decoder.up_blocks.3.resnets.0.conv2.conv.bias", "decoder.up_blocks.3.resnets.0.conv_shortcut.conv.weight", "decoder.up_blocks.3.resnets.0.conv_shortcut.conv.bias", "decoder.up_blocks.3.resnets.1.norm1.weight", "decoder.up_blocks.3.resnets.1.norm1.bias", "decoder.up_blocks.3.resnets.1.conv1.conv.weight", "decoder.up_blocks.3.resnets.1.conv1.conv.bias", "decoder.up_blocks.3.resnets.1.norm2.weight", "decoder.up_blocks.3.resnets.1.norm2.bias", "decoder.up_blocks.3.resnets.1.conv2.conv.weight", "decoder.up_blocks.3.resnets.1.conv2.conv.bias", 
"decoder.up_blocks.3.resnets.2.norm1.weight", "decoder.up_blocks.3.resnets.2.norm1.bias", "decoder.up_blocks.3.resnets.2.conv1.conv.weight", "decoder.up_blocks.3.resnets.2.conv1.conv.bias", "decoder.up_blocks.3.resnets.2.norm2.weight", "decoder.up_blocks.3.resnets.2.norm2.bias", "decoder.up_blocks.3.resnets.2.conv2.conv.weight", "decoder.up_blocks.3.resnets.2.conv2.conv.bias", "decoder.mid_block.attentions.0.group_norm.weight", "decoder.mid_block.attentions.0.group_norm.bias", "decoder.mid_block.attentions.0.to_q.weight", "decoder.mid_block.attentions.0.to_q.bias", "decoder.mid_block.attentions.0.to_k.weight", "decoder.mid_block.attentions.0.to_k.bias", "decoder.mid_block.attentions.0.to_v.weight", "decoder.mid_block.attentions.0.to_v.bias", "decoder.mid_block.attentions.0.to_out.0.weight", "decoder.mid_block.attentions.0.to_out.0.bias", "decoder.mid_block.resnets.0.norm1.weight", "decoder.mid_block.resnets.0.norm1.bias", "decoder.mid_block.resnets.0.conv1.conv.weight", "decoder.mid_block.resnets.0.conv1.conv.bias", "decoder.mid_block.resnets.0.norm2.weight", "decoder.mid_block.resnets.0.norm2.bias", "decoder.mid_block.resnets.0.conv2.conv.weight", "decoder.mid_block.resnets.0.conv2.conv.bias", "decoder.mid_block.resnets.1.norm1.weight", "decoder.mid_block.resnets.1.norm1.bias", "decoder.mid_block.resnets.1.conv1.conv.weight", "decoder.mid_block.resnets.1.conv1.conv.bias", "decoder.mid_block.resnets.1.norm2.weight", "decoder.mid_block.resnets.1.norm2.bias", "decoder.mid_block.resnets.1.conv2.conv.weight", "decoder.mid_block.resnets.1.conv2.conv.bias", "decoder.conv_norm_out.weight", "decoder.conv_norm_out.bias". 
Unexpected key(s) in state_dict: "encoder.down.0.block.0.conv1.conv.bias", "encoder.down.0.block.0.conv1.conv.weight", "encoder.down.0.block.0.conv2.conv.bias", "encoder.down.0.block.0.conv2.conv.weight", "encoder.down.0.block.0.norm1.bias", "encoder.down.0.block.0.norm1.weight", "encoder.down.0.block.0.norm2.bias", "encoder.down.0.block.0.norm2.weight", "encoder.down.0.block.1.conv1.conv.bias", "encoder.down.0.block.1.conv1.conv.weight", "encoder.down.0.block.1.conv2.conv.bias", "encoder.down.0.block.1.conv2.conv.weight", "encoder.down.0.block.1.norm1.bias", "encoder.down.0.block.1.norm1.weight", "encoder.down.0.block.1.norm2.bias", "encoder.down.0.block.1.norm2.weight", "encoder.down.0.downsample.conv.conv.bias", "encoder.down.0.downsample.conv.conv.weight", "encoder.down.1.block.0.conv1.conv.bias", "encoder.down.1.block.0.conv1.conv.weight", "encoder.down.1.block.0.conv2.conv.bias", "encoder.down.1.block.0.conv2.conv.weight", "encoder.down.1.block.0.nin_shortcut.conv.bias", "encoder.down.1.block.0.nin_shortcut.conv.weight", "encoder.down.1.block.0.norm1.bias", "encoder.down.1.block.0.norm1.weight", "encoder.down.1.block.0.norm2.bias", "encoder.down.1.block.0.norm2.weight", "encoder.down.1.block.1.conv1.conv.bias", "encoder.down.1.block.1.conv1.conv.weight", "encoder.down.1.block.1.conv2.conv.bias", "encoder.down.1.block.1.conv2.conv.weight", "encoder.down.1.block.1.norm1.bias", "encoder.down.1.block.1.norm1.weight", "encoder.down.1.block.1.norm2.bias", "encoder.down.1.block.1.norm2.weight", "encoder.down.1.downsample.conv.conv.bias", "encoder.down.1.downsample.conv.conv.weight", "encoder.down.2.block.0.conv1.conv.bias", "encoder.down.2.block.0.conv1.conv.weight", "encoder.down.2.block.0.conv2.conv.bias", "encoder.down.2.block.0.conv2.conv.weight", "encoder.down.2.block.0.nin_shortcut.conv.bias", "encoder.down.2.block.0.nin_shortcut.conv.weight", "encoder.down.2.block.0.norm1.bias", "encoder.down.2.block.0.norm1.weight", "encoder.down.2.block.0.norm2.bias", 
"encoder.down.2.block.0.norm2.weight", "encoder.down.2.block.1.conv1.conv.bias", "encoder.down.2.block.1.conv1.conv.weight", "encoder.down.2.block.1.conv2.conv.bias", "encoder.down.2.block.1.conv2.conv.weight", "encoder.down.2.block.1.norm1.bias", "encoder.down.2.block.1.norm1.weight", "encoder.down.2.block.1.norm2.bias", "encoder.down.2.block.1.norm2.weight", "encoder.down.2.downsample.conv.conv.bias", "encoder.down.2.downsample.conv.conv.weight", "encoder.down.3.block.0.conv1.conv.bias", "encoder.down.3.block.0.conv1.conv.weight", "encoder.down.3.block.0.conv2.conv.bias", "encoder.down.3.block.0.conv2.conv.weight", "encoder.down.3.block.0.norm1.bias", "encoder.down.3.block.0.norm1.weight", "encoder.down.3.block.0.norm2.bias", "encoder.down.3.block.0.norm2.weight", "encoder.down.3.block.1.conv1.conv.bias", "encoder.down.3.block.1.conv1.conv.weight", "encoder.down.3.block.1.conv2.conv.bias", "encoder.down.3.block.1.conv2.conv.weight", "encoder.down.3.block.1.norm1.bias", "encoder.down.3.block.1.norm1.weight", "encoder.down.3.block.1.norm2.bias", "encoder.down.3.block.1.norm2.weight", "encoder.mid.attn_1.k.bias", "encoder.mid.attn_1.k.weight", "encoder.mid.attn_1.norm.bias", "encoder.mid.attn_1.norm.weight", "encoder.mid.attn_1.proj_out.bias", "encoder.mid.attn_1.proj_out.weight", "encoder.mid.attn_1.q.bias", "encoder.mid.attn_1.q.weight", "encoder.mid.attn_1.v.bias", "encoder.mid.attn_1.v.weight", "encoder.mid.block_1.conv1.conv.bias", "encoder.mid.block_1.conv1.conv.weight", "encoder.mid.block_1.conv2.conv.bias", "encoder.mid.block_1.conv2.conv.weight", "encoder.mid.block_1.norm1.bias", "encoder.mid.block_1.norm1.weight", "encoder.mid.block_1.norm2.bias", "encoder.mid.block_1.norm2.weight", "encoder.mid.block_2.conv1.conv.bias", "encoder.mid.block_2.conv1.conv.weight", "encoder.mid.block_2.conv2.conv.bias", "encoder.mid.block_2.conv2.conv.weight", "encoder.mid.block_2.norm1.bias", "encoder.mid.block_2.norm1.weight", "encoder.mid.block_2.norm2.bias", 
"encoder.mid.block_2.norm2.weight", "encoder.norm_out.bias", "encoder.norm_out.weight", "decoder.mid.attn_1.k.bias", "decoder.mid.attn_1.k.weight", "decoder.mid.attn_1.norm.bias", "decoder.mid.attn_1.norm.weight", "decoder.mid.attn_1.proj_out.bias", "decoder.mid.attn_1.proj_out.weight", "decoder.mid.attn_1.q.bias", "decoder.mid.attn_1.q.weight", "decoder.mid.attn_1.v.bias", "decoder.mid.attn_1.v.weight", "decoder.mid.block_1.conv1.conv.bias", "decoder.mid.block_1.conv1.conv.weight", "decoder.mid.block_1.conv2.conv.bias", "decoder.mid.block_1.conv2.conv.weight", "decoder.mid.block_1.norm1.bias", "decoder.mid.block_1.norm1.weight", "decoder.mid.block_1.norm2.bias", "decoder.mid.block_1.norm2.weight", "decoder.mid.block_2.conv1.conv.bias", "decoder.mid.block_2.conv1.conv.weight", "decoder.mid.block_2.conv2.conv.bias", "decoder.mid.block_2.conv2.conv.weight", "decoder.mid.block_2.norm1.bias", "decoder.mid.block_2.norm1.weight", "decoder.mid.block_2.norm2.bias", "decoder.mid.block_2.norm2.weight", "decoder.norm_out.bias", "decoder.norm_out.weight", "decoder.up.0.block.0.conv1.conv.bias", "decoder.up.0.block.0.conv1.conv.weight", "decoder.up.0.block.0.conv2.conv.bias", "decoder.up.0.block.0.conv2.conv.weight", "decoder.up.0.block.0.nin_shortcut.conv.bias", "decoder.up.0.block.0.nin_shortcut.conv.weight", "decoder.up.0.block.0.norm1.bias", "decoder.up.0.block.0.norm1.weight", "decoder.up.0.block.0.norm2.bias", "decoder.up.0.block.0.norm2.weight", "decoder.up.0.block.1.conv1.conv.bias", "decoder.up.0.block.1.conv1.conv.weight", "decoder.up.0.block.1.conv2.conv.bias", "decoder.up.0.block.1.conv2.conv.weight", "decoder.up.0.block.1.norm1.bias", "decoder.up.0.block.1.norm1.weight", "decoder.up.0.block.1.norm2.bias", "decoder.up.0.block.1.norm2.weight", "decoder.up.0.block.2.conv1.conv.bias", "decoder.up.0.block.2.conv1.conv.weight", "decoder.up.0.block.2.conv2.conv.bias", "decoder.up.0.block.2.conv2.conv.weight", "decoder.up.0.block.2.norm1.bias", 
"decoder.up.0.block.2.norm1.weight", "decoder.up.0.block.2.norm2.bias", "decoder.up.0.block.2.norm2.weight", "decoder.up.1.block.0.conv1.conv.bias", "decoder.up.1.block.0.conv1.conv.weight", "decoder.up.1.block.0.conv2.conv.bias", "decoder.up.1.block.0.conv2.conv.weight", "decoder.up.1.block.0.nin_shortcut.conv.bias", "decoder.up.1.block.0.nin_shortcut.conv.weight", "decoder.up.1.block.0.norm1.bias", "decoder.up.1.block.0.norm1.weight", "decoder.up.1.block.0.norm2.bias", "decoder.up.1.block.0.norm2.weight", "decoder.up.1.block.1.conv1.conv.bias", "decoder.up.1.block.1.conv1.conv.weight", "decoder.up.1.block.1.conv2.conv.bias", "decoder.up.1.block.1.conv2.conv.weight", "decoder.up.1.block.1.norm1.bias", "decoder.up.1.block.1.norm1.weight", "decoder.up.1.block.1.norm2.bias", "decoder.up.1.block.1.norm2.weight", "decoder.up.1.block.2.conv1.conv.bias", "decoder.up.1.block.2.conv1.conv.weight", "decoder.up.1.block.2.conv2.conv.bias", "decoder.up.1.block.2.conv2.conv.weight", "decoder.up.1.block.2.norm1.bias", "decoder.up.1.block.2.norm1.weight", "decoder.up.1.block.2.norm2.bias", "decoder.up.1.block.2.norm2.weight", "decoder.up.1.upsample.conv.conv.bias", "decoder.up.1.upsample.conv.conv.weight", "decoder.up.2.block.0.conv1.conv.bias", "decoder.up.2.block.0.conv1.conv.weight", "decoder.up.2.block.0.conv2.conv.bias", "decoder.up.2.block.0.conv2.conv.weight", "decoder.up.2.block.0.norm1.bias", "decoder.up.2.block.0.norm1.weight", "decoder.up.2.block.0.norm2.bias", "decoder.up.2.block.0.norm2.weight", "decoder.up.2.block.1.conv1.conv.bias", "decoder.up.2.block.1.conv1.conv.weight", "decoder.up.2.block.1.conv2.conv.bias", "decoder.up.2.block.1.conv2.conv.weight", "decoder.up.2.block.1.norm1.bias", "decoder.up.2.block.1.norm1.weight", "decoder.up.2.block.1.norm2.bias", "decoder.up.2.block.1.norm2.weight", "decoder.up.2.block.2.conv1.conv.bias", "decoder.up.2.block.2.conv1.conv.weight", "decoder.up.2.block.2.conv2.conv.bias", "decoder.up.2.block.2.conv2.conv.weight", 
"decoder.up.2.block.2.norm1.bias", "decoder.up.2.block.2.norm1.weight", "decoder.up.2.block.2.norm2.bias", "decoder.up.2.block.2.norm2.weight", "decoder.up.2.upsample.conv.conv.bias", "decoder.up.2.upsample.conv.conv.weight", "decoder.up.3.block.0.conv1.conv.bias", "decoder.up.3.block.0.conv1.conv.weight", "decoder.up.3.block.0.conv2.conv.bias", "decoder.up.3.block.0.conv2.conv.weight", "decoder.up.3.block.0.norm1.bias", "decoder.up.3.block.0.norm1.weight", "decoder.up.3.block.0.norm2.bias", "decoder.up.3.block.0.norm2.weight", "decoder.up.3.block.1.conv1.conv.bias", "decoder.up.3.block.1.conv1.conv.weight", "decoder.up.3.block.1.conv2.conv.bias", "decoder.up.3.block.1.conv2.conv.weight", "decoder.up.3.block.1.norm1.bias", "decoder.up.3.block.1.norm1.weight", "decoder.up.3.block.1.norm2.bias", "decoder.up.3.block.1.norm2.weight", "decoder.up.3.block.2.conv1.conv.bias", "decoder.up.3.block.2.conv1.conv.weight", "decoder.up.3.block.2.conv2.conv.bias", "decoder.up.3.block.2.conv2.conv.weight", "decoder.up.3.block.2.norm1.bias", "decoder.up.3.block.2.norm1.weight", "decoder.up.3.block.2.norm2.bias", "decoder.up.3.block.2.norm2.weight", "decoder.up.3.upsample.conv.conv.bias", "decoder.up.3.upsample.conv.conv.weight".

    Sam_A
    Author
    Feb 4, 2025

    @PetePabloΒ Download the VAE from the instructions in the workflow.

    RetroEvanFeb 14, 2025

    @Sam_AΒ worked for me, thank you

    Ash51Feb 25, 2025

    @Sam_A Where do I put it, and which VAE? Direct links, please.

    Sam_A
    Author
    Feb 25, 2025

    @Ash51 The links for the VAE and all the other models are in the workflow! Check the BIG red note at the start of the workflow.

    WellFormedMonkeyFeb 5, 2025Β· 2 reactions
    CivitAI

    I keep getting this error:

    Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

    Sam_A
    Author
    Feb 5, 2025

    Which node returns this error? Are you trying to run it for the first time offline?

    jovannskyler405Feb 8, 2025Β· 1 reaction

    I had this error. I found my llava-llama-3-8b-text-encoder-tokenizer safetensors files were somehow corrupted: their SHA256 did not match those on Hugging Face. You can verify SHA256 on Windows through the PowerShell command Get-FileHash. Re-downloading the files fixed the problem.
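The same check can be done cross-platform with a short Python snippet (hashlib is in the standard library); compare the printed digest against the SHA256 shown on the Hugging Face file page:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream-hash a file in 1 MiB chunks (safetensors shards can be multi-GB)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

# Example (adjust the path to your install):
# print(sha256_of(r"models\LLM\llava-llama-3-8b-v1_1-transformers\model-00001-of-00004.safetensors"))
```

Streaming in chunks avoids loading a multi-gigabyte shard into RAM at once.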

    JankolonkoFeb 5, 2025Β· 2 reactions
    CivitAI

    I got this error after a fresh install of ComfyUI. How is it possible to fit an almost 16GB LLM in the GPU? I have 16GB VRAM but got a not-enough-memory error. So how is it even possible to run on 12GB? Can someone help me, please?
    sformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.

    Loading text encoder model (vlm) from: C:\pinokio\api\comfy.git\app\models\LLM\llava-llama-3-8b-v1_1-transformers

    Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [02:50<00:00, 42.65s/it]

    Text encoder to dtype: torch.bfloat16

    !!! Exception during processing !!! Allocation on device

    Traceback (most recent call last):

    File "C:\pinokio\api\comfy.git\app\execution.py", line 327, in execute

    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

    File "C:\pinokio\api\comfy.git\app\execution.py", line 202, in get_output_data

    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

    File "C:\pinokio\api\comfy.git\app\execution.py", line 174, in map_node_over_list

    process_inputs(input_dict, i)

    File "C:\pinokio\api\comfy.git\app\execution.py", line 163, in process_inputs

    results.append(getattr(obj, func)(**inputs))

    File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\nodes.py", line 684, in loadmodel

    text_encoder = TextEncoder(

    File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 167, in __init__

    self.model, self.model_path = load_text_encoder(

    File "C:\pinokio\api\comfy.git\app\custom_nodes\comfyui-hunyuanvideowrapper\hyvideo\text_encoder\__init__.py", line 64, in load_text_encoder

    text_encoder = text_encoder.to(device)

    File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\transformers\modeling_utils.py", line 3110, in to

    return super().to(*args, **kwargs)

    File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 1340, in to

    return self._apply(convert)

    File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply

    module._apply(fn)

    File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply

    module._apply(fn)

    File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply

    param_applied = fn(param)

    File "C:\pinokio\api\comfy.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert

    return t.to(

    torch.OutOfMemoryError: Allocation on device

    Got an OOM, unloading all loaded models.

    Prompt executed in 1870.39 seconds

    Same error, no idea how to fix it, sadly. I'll keep looking and post back here if I find anything.

    stylobcnFeb 18, 2025

    same error :(

    Machine_SpiritFeb 7, 2025
    CivitAI

    I get this error on the text encoder download:
    DownloadAndLoadHyVideoTextEncoder

    Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback): cannot import name 'ImageNetInfo' from 'timm.data' (C:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\timm\data\__init__.py)

    Sam_A
    Author
    Feb 7, 2025

    Are your Comfy and nodes updated?

    Machine_SpiritFeb 7, 2025

    @Sam_A Yes, I just did "Update All" to make sure; still the same error.

    Sam_A
    Author
    Feb 7, 2025

    Are you running the workflow for the first time with internet? Because it will download some models. I'm not sure if that's the problem.

    Machine_SpiritFeb 7, 2025

    @asd231734624 Thank you! I did that (changed directory so the embedded Python gets updated) and got this error when I tried the workflow again:

    No such file or directory: "C:\\ComfyUI_windows_portable\\ComfyUI\\models\\LLM\\llava-llama-3-8b-v1_1-transformers\\model-00001-of-00004.safetensors"

    I then followed this advice:
    https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/81
    and deleted the folders in ComfyUI_windows_portable\ComfyUI\models\LLM
    because it seems something went wrong the first time the download started.

    I tried again and the download starts again now. Don't be irritated if it stays on "fetching files" for a while; it's 15GB, for anyone who reads this.

    Then I got this error on a new run:
    HyVideoTextImageEncode

    unsupported operand type(s) for //: 'int' and 'NoneType'

    and used this link to downgrade transformer version:
    https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/269

    after that, new error:
    HyVideoModelLoader

    Can't import SageAttention: No module named 'sageattention'

    This says we need Triton, which doesn't support Windows:
    https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/108

    So this workflow does not work on Windows, I guess?

    Sam_A
    Author
    Feb 7, 2025

    @DUDE33 Yeah, it works on Windows. You just need to install the module the error says is missing in your venv: sageattention.

    Machine_SpiritFeb 7, 2025

    @Sam_A Ok, thanks for answering! I'm just irritated because the forum post says it needs Triton, which is not available for Windows. I now found this vid: https://www.youtube.com/watch?v=DigvHsn_Qrw
    Do I have to do all that, or is there an easier way?

    Sam_A
    Author
    Feb 8, 2025

    @DUDE33 If the only error now is the missing "sageattention", you just need to install it in your environment using pip and it will work. It's not normal to have so many problems trying to use this workflow. Idk what's special about your setup. lol

    Machine_SpiritFeb 8, 2025

    @Sam_A I'm just new to this haha, like a lot of other people here. I installed sageattention via pip command.

    PS:
    C:\ComfyUI_windows_portable\python_embeded> python.exe -m pip install sageattention

    Collecting sageattention

    Downloading sageattention-1.0.6-py3-none-any.whl.metadata (5.6 kB)

    Downloading sageattention-1.0.6-py3-none-any.whl (20 kB)

    Installing collected packages: sageattention

    Successfully installed sageattention-1.0.6

    PS:

    C:\ComfyUI_windows_portable\python_embeded> python.exe -m pip install --upgrade sageattention

    Requirement already satisfied: sageattention in c:\users\X\appdata\local\programs\python\python312\lib\site-packages (1.0.6)


    Restarted ComfyUI but still get the error. Did I do something wrong?
    Also tried via the git URL.
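Editor's note: a diagnostic sketch (not part of the workflow or the wrapper's code). The pip transcript above reports "Requirement already satisfied ... python312\lib\site-packages", which is the system Python's site-packages, not the embedded one, so ComfyUI may never see the package. Running something like this with the exact python.exe ComfyUI uses (e.g. the one in python_embeded) can confirm where the import resolves:

```python
# Diagnostic sketch: check whether a package is importable from THIS interpreter.
# Run it with the same python.exe that launches ComfyUI, e.g.
# C:\ComfyUI_windows_portable\python_embeded\python.exe check_pkg.py
import importlib.util
import sys

def has_package(name: str) -> bool:
    """True if `name` is importable from the interpreter running this script."""
    return importlib.util.find_spec(name) is not None

print("Interpreter:", sys.executable)
print("sageattention importable:", has_package("sageattention"))
```

If the interpreter path printed is not the embedded one, the install went to the wrong Python.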

    asd231734624Feb 8, 2025· 1 reaction

    @DUDE33 I had the same problem, and changing the Sampler from sdpa to comfy worked.

    Machine_SpiritFeb 9, 2025

    @asd231734624 Thank you very much =). The workflow works now; sage just won't work, I guess. If anyone has an idea what to check, let me know.

    mkDanielFeb 7, 2025· 6 reactions
    CivitAI

    What does Can't import SageAttention: No module named 'sageattention' error mean?

    genuralFeb 8, 2025

    You need to install SageAttention in the venv of your ComfyUI installation.

    pip install sageattention

    I think that you might also need to install triton:

    https://www.youtube.com/watch?v=DigvHsn_Qrw

    civitai7_Feb 11, 2025

    I did pip install sageattention as genural wrote; that didn't fix the problem. I didn't go through the YouTube video tutorial, as people in the comments are still complaining that it doesn't work. Then I saw in the workflow that it stops at the "HunyuanVideo Model Loader" node, and its attention_mode references this sageattention. So I simply changed it to the comfy option. That worked. Try the other options too, as I will. Not sure how quality or speed is affected. If it's just speed, I'm not going to whine too much about it. If it affects quality without sageattention, then that's unfortunate. If someone has sageattention working, please comment on whether there are quality differences, or if it's just speed.

    NeverWasFeb 11, 2025

    @civitai7_ good

    IdelacioFeb 12, 2025
    CivitAI

    I'm getting the following error-

    HyVideoTextImageEncode

    text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).

    Everything left as default aside from the uploaded image, which is similar to the test ones. I did have to manually download the xtuner LLM, as the workflow was unable to locate the source on Hugging Face.

    Kijai LLM DID download from source but gives the following error-

    HyVideoTextImageEncode

    Only vision_languague models support image input

    LexiBarberFeb 13, 2025· 3 reactions
    CivitAI

    I've run into an error I can't work out. Every time the compile gets to this point, ComfyUI crashes out and disconnects. Does anyone have any idea what's going on here?

    It can't seem to get past the HunyuanVideo Sampler. I'm not a coder, so any help would be appreciated.

    My GPU is a 4090 with 24GB of VRAM, so I figure it should be able to handle it.

    Here's the process:

    got prompt

    encoded latents shape torch.Size([1, 16, 1, 96, 54])

    Loading text encoder model (clipL) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14

    Text encoder to dtype: torch.float16

    Loading tokenizer (clipL) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\clip\clip-vit-large-patch14

    2025-02-13 19:44:31,692 WARNING: Warn!: You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.

    Loading text encoder model (vlm) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers

    Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [00:10<00:00, 2.58s/it]

    Text encoder to dtype: torch.bfloat16

    Loading tokenizer (vlm) from: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\models\LLM\llava-llama-3-8b-v1_1-transformers

    2025-02-13 19:44:48,341 WARNING: Warn!: Expanding inputs for image tokens in LLaVa should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.

    2025-02-13 19:44:48,546 WARNING: Warn!: Expanding inputs for image tokens in LLaVa should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.

    vlm prompt attention_mask shape: torch.Size([1, 208]), masked tokens: 208

    clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 17

    model_type FLOW

    The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.

    Scheduler config: FrozenDict({'num_train_timesteps': 1000, 'flow_shift': 9.0, 'reverse': True, 'solver': 'euler', 'n_tokens': None, '_use_default_values': ['n_tokens', 'num_train_timesteps']})

    Using accelerate to load and assign model weights to device...

    Loading LoRA: img2vid with strength: 1.0

    Requested to load HyVideoModel

    loaded completely 21443.013157653808 12555.953247070312 True

    Input (height, width, video_length) = (768, 432, 73)

    The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.

    Scheduler config: FrozenDict({'num_train_timesteps': 1000, 'flow_shift': 9.0, 'reverse': True, 'solver': 'euler', 'n_tokens': None, '_use_default_values': ['n_tokens', 'num_train_timesteps']})

    Single input latent frame detected, LeapFusion img2vid enabled

    Sampling 73 frames in 19 latents at 432x768 with 9 inference steps

    100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9/9 [01:20<00:00, 9.00s/it]

    Allocated memory: memory=12.301 GB

    Max allocated memory: max_memory=16.878 GB

    Max reserved memory: max_reserved=18.969 GB

    C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>pause

    Press any key to continue . . .

    miraishounenFeb 15, 2025

    did you fix it

    nymicalFeb 17, 2025

    Did you try restarting your PC?
    I'm not sure about your particular problem, but whenever Comfy gives this "Press any key to continue . . ." for no particular reason, and I'm sure that memory isn't a problem, I close everything and restart the PC. That has always worked for me.

    LexiBarberFeb 18, 2025

    @nymical thanks for the tip! I'll set up a clean install of comfy and see if I can get it working - if I run into that problem again, I'll give your suggestion a shot.

    LexiBarberFeb 18, 2025

    @miraishounen not yet I'm afraid. I'll let you know of my progress!

    vim_brigantFeb 14, 2025
    CivitAI

    Hi, could someone offer a little advice for getting better results? I always get something faded with crosshatch lines. I posted this to demonstrate: https://civitai.com/posts/12909266 That's one of the better results I got. I kept the settings almost identical to those in the v1.1 workflow, only replacing the sexy dance lora and the source image. This was the source image in case anyone wants to replicate it: https://civitai.com/images/54695445

    VIRTUALISFeb 14, 2025· 1 reaction
    CivitAI

    Getting this after applying the fix for the //: 'int' and 'NoneType' error:

    "Expanding inputs for image tokens in LLaVa should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50."

    I don't know where to write that, if that's even the solution

    bgbg001516Feb 22, 2025

    I had the //: 'int' and 'NoneType' error: rolling back transformers from v4.48.0 with python.exe -m pip install transformers==4.47.0 fixed it for me.
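Editor's sketch (a hypothetical helper, not from the wrapper's code): per the comment above and kijai's issue #269, the int // NoneType failure was reported with transformers 4.48.x, and rolling back to 4.47.0 fixed it for at least this user. A minimal version gate for that rule of thumb:

```python
# Hypothetical version gate based on the rollback advice above:
# transformers >= 4.48 was reported broken with HyVideoTextImageEncode,
# 4.47.0 reported working. (Not universal -- one commenter below says
# 4.47.0 did not help them.)
def needs_rollback(version: str) -> bool:
    """True if this transformers version matches the known-bad range."""
    major, minor, *_ = (int(part) for part in version.split("."))
    return (major, minor) >= (4, 48)

print(needs_rollback("4.48.0"))  # True  -> python.exe -m pip install transformers==4.47.0
print(needs_rollback("4.47.0"))  # False -> nothing to do
```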

    gambikules858Feb 23, 2025

    installing 4.47.0 doesn't help me

    FireRaidenFeb 14, 2025
    CivitAI

    Can you make a video on how to set up the workflow? I don't understand the "Select the 'Long Side of Image' you wish (before upscale)" step.

    Sam_A
    Author
    Feb 14, 2025

    It's simpler than you think. You add an input image to the workflow, define what the longer side of the image should be, and the workflow auto-calculates the shorter side for you.
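Editor's sketch of the "long side" logic described above (my reconstruction, not the workflow's actual node code): scale so the longer side matches the requested value, keep the aspect ratio, and snap both sides to multiples of 16, which the video model's latent space requires.

```python
# Sketch of a long-side auto-resize: keep aspect ratio, snap to multiples of 16.
# `auto_resize` is a hypothetical name; the real workflow does this with nodes.
def auto_resize(width: int, height: int, long_side: int, multiple: int = 16) -> tuple[int, int]:
    scale = long_side / max(width, height)

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)

# A 1024x576 input with long side 768 becomes 768x432 -- the same
# (768, 432) dimensions that appear in the sampler log earlier in this thread.
print(auto_resize(1024, 576, 768))  # -> (768, 432)
```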

    zorgavorkFeb 15, 2025· 3 reactions
    CivitAI

    I'm having a problem with
    HyVideoSampler
    Failed to find C compiler. Please specify via CC environment variable

    I can't get through this

    iksatgoFeb 18, 2025

    running into the same issue after installing triton + sageattention

    stex7722Feb 21, 2025
    CivitAI

    I have a problem with this one:

    "HyVideoTextImageEncode

    text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples)."

    Do you have a solution? Thanks

    Sam_A
    Author
    Feb 21, 2025

    Can you show me your prompt?

    stex7722Feb 22, 2025

    The one you left as an example:

    "masterpiece, best quality of <image> girl dancing moving her hips softly."

    stex7722Mar 1, 2025· 1 reaction

    I solved it, thanks anyway!

    tetrarrow842Feb 22, 2025
    CivitAI

    Somehow, I get the error:

    "Only vision_languague models support image input"
    It seems the text-image-encode node won't take the image as a prompt?

    Sam_A
    Author
    Feb 22, 2025

    Did you change the original encoder?

    tetrarrow842Feb 22, 2025

    Oh, okay, I used the text-encoder-tokenizer instead of the transformers version. Is that the problem?

    Sam_A
    Author
    Feb 22, 2025

    @tetrarrow842 Probably yes, to use an image as the prompt.

    bgbg001516Feb 22, 2025Β· 1 reaction
    CivitAI

    Finally got it working, but how did you get the animated images to look the same as the original? Mine turn into a completely different image.

    Sono1050Feb 22, 2025
    CivitAI

    I've fixed the other errors I encountered, but I keep receiving this error and cannot get past it:

    "HyVideoModelLoader

    Error while deserializing header: HeaderTooLarge File path: /workspace/ComfyUI/models/diffusion_models/hunyuan_video_FastVideo_720_fp8_e4m3fn.safetensors The safetensors file is corrupt or invalid. Make sure this is actually a safetensors file and not a ckpt or pt or other filetype."

    Also get the same type of error for the vae file as well.

    ClocksmithFeb 22, 2025
    CivitAI

    I'm afraid the Triton dependencies make this one a no-go for me. Wish I could make it work, but I don't want to spend all day debugging that crap.

    Sam_A
    Author
    Feb 22, 2025

    To be honest, the new workflow I posted is better. It's newer tech and doesn't use Triton.

    ClocksmithFeb 23, 2025

    @Sam_AΒ That would be awesome. But the one I got from the download link required triton. Do you have a link to the non-triton one?

    Sam_A
    Author
    Feb 23, 2025· 1 reaction

    @Clocksmith Of course bro! This one: https://civitai.com/models/1278247/skyreels-hunyuan-img2vid
    It's newer and the results are better. It's slower because they didn't release the fast LoRA (which returns results with fewer steps) for this version yet. But it's amazing!

    gambikules858Feb 23, 2025
    CivitAI

    i have this error now

    'img_in.proj.weight' in HunyuanVideo Model Loader node

    Sam_A
    Author
    Feb 23, 2025· 1 reaction

    You know there is a newer I2V model?

    https://civitai.com/models/1278247/skyreels-hunyuan-img2vid

    Easier to install and better results.

    voidyearFeb 26, 2025
    CivitAI

    Calculated padded input size per channel: (0 x 16 x 16). Kernel size: (1 x 1 x 1). Kernel size can't be greater than actual input size
    When I use the decoding node, it gives this error. I don't get an error when I use Python 3.11.6 with torch 2.3.0 and cu121, but I do get an error with Python 3.12.8 and torch 2.6.0 cu126. The resolution I adjusted is 408*496.

    Sam_A
    Author
    Feb 26, 2025

    408 / 16 = 25.5, so 408 is not a multiple of 16. That's why I made nodes to auto-calculate dimensions, so this error doesn't happen. Change the image size to something that is a multiple of 16.
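A quick check of the rule above (editor's sketch, not the workflow's node code): each spatial dimension must be divisible by 16, so 408 is invalid and needs snapping to a neighbouring multiple such as 416.

```python
# Snap a dimension to the nearest multiple of 16, as the error above requires.
# `snap16` is a hypothetical helper name for illustration.
def snap16(value: int) -> int:
    return round(value / 16) * 16

for dim in (408, 496):
    status = "ok" if dim % 16 == 0 else f"invalid -> use {snap16(dim)}"
    print(dim, status)
```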

    voidyearFeb 27, 2025

    @Sam_A Thank you for your reply. I will try modifying the size to see if it works. However, I don't know why torch 2.3 was able to run with a size of 408, even though it is not a multiple of 16.

    Sam_A
    Author
    Feb 27, 2025

    @voidyear That's interesting, honestly. I didn't know about that.

    GigaThiccMar 1, 2025· 1 reaction
    CivitAI

    Great work! Love it, works great in my first tests. Am curious about output length -- I have tried 97 frames and can only seem to get 3 seconds. No errors, no black output, but it will not create any videos longer than the default 73 frames. I have tried with a smaller resolution image to no avail.

    Any tips?

    I'm using an L40S | 48GB VRAM | 32GB RAM, so I think VRAM is not the issue...

    Sam_A
    Author
    Mar 1, 2025

    Hmmm. It's supposed to work when you change num_frames in the Sampler. Btw, with 48GB VRAM you can try to run 201 frames. It will loop the video!

    GigaThiccMar 1, 2025· 1 reaction

    I still haven't gotten it to do more than 73 frames! I will try with 201 later. Thanks!

    zoom83Mar 2, 2025· 5 reactions
    CivitAI

    tested it today but got:
    * HyVideoSampler 6:

    - Return type mismatch between linked nodes: context_options, received_type(FETAARGS) mismatch input_type(HYVIDCONTEXT)

    Output will be ignored

    EldrinMar 6, 2025

    Wanted to mention my troubleshooting process for this (spoiler alert: it did not work):

    So I found out that ComfyUI circles any erroring link in red: the context_options input under the HunyuanVideo Sampler node connects to the HunyuanVideo Enhance node's feta_args.
    I disconnected these and things started to load after queueing.

    Then it tried downloading the text encoders (llava-llama-3-8b-text-encoder-tokenizer or llava-llama-3-8b-v1_1-transformers). There was no telling how long it would take, so I manually downloaded the files myself so I could track the ETA. Then I got this error:

    DownloadAndLoadHyVideoTextEncoder

    Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

    To try and fix this, I verified the SHA256 hashes of the model safetensors files in the LLM folder for the llava-llama-3-8b-v1_1-transformers text encoder. Two of the files were correct, so I redownloaded and the hashes matched Hugging Face, but no dice. Then I tried the llava-llama-3-8b-text-encoder-tokenizer text encoder; that would not work either, giving the same error.

    I feel like disconnecting the context_options node caused this, so I'm back to wondering how we fix our issue of

    DownloadAndLoadHyVideoTextEncoder

    Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

    It would seem like feta_args on the Enhance node should be connected to feta_args on the Sampler, but trying that gives me
    DownloadAndLoadHyVideoTextEncoder

    Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

    Sam_A
    Author
    Mar 6, 2025· 1 reaction

    @redoctober2 @zoom83 Being 100% honest, this model is already obsolete. It uses a LoRA to become an I2V workflow. It was good when we had no native workflows, but now we have them:

    Wan2.1 (Best Quality, high Vram usage for great results)

    https://civitai.com/models/1300201/wan-ai-img2vid-video-extend

    Skyreels (Hunyuan Variant) (Good Quality, Mid Vram usage)

    https://civitai.com/models/1278247/skyreels-hunyuan-img2vid

    Hunyuan WF (I don't like the quality so much but I'm still testing. Lowest Vram usage and FAST lora!)

    https://civitai.com/models/1328592/hunyuan-wf-img2vid-fast

    rhasan1903783May 22, 2025· 1 reaction
    CivitAI

    I don't know what I'm doing wrong. I use the provided workflow, but it takes 1.5 hrs to generate a 5-second clip, and that's with sageattention and an RTX 5080.

    Try Framepack I2V Hunyuan or GGUF with a 5080.
    16GB VRAM is not really enough to use Hunyuan or Wan.
    I rent a 5090 32GB on https://runpod.io?ref=gnspz552

    birlanesan983May 29, 2025
    CivitAI

    Hi,

    I'm getting this error, any fix?
    HyVideoSampler

    cannot access local variable 'original_latents' where it is not associated with a value

    Workflows
    Hunyuan Video

    Details

    Downloads
    7,110
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/27/2025
    Updated
    4/30/2026
    Deleted
    -

    Files

    hunyuanImage2videoJan_v11.zip

    Mirrors

    Huggingface (1 mirror)