Originally shared on GitHub by guoyww
Learn how to run this model to create animated images on GitHub.
Choose the motion-module version that matches the base model your checkpoint was trained on.
Comments (87)
Hello, the SDXL version just gulps up VRAM. I can't get an animation above 512x512 because I run out of VRAM. I have an RTX 4090. I use Auto1111. Any ideas?
Get a 5090
@bigTiddyLover @TheP3NGU1N @guoyww The official SDXL GitHub repo branch states the following: "Inference at recommended resolution of 16 frames usually requires ~13GB VRAM." But I have not been able to get it down to this. Memory usage blows up during inference and goes (B)OOM. I'm wondering if anyone here has been able to achieve inference at SDXL standard resolution with only about 13GB of VRAM yet? Maybe there are some important aspects, like specific versions of libraries such as xformers or accelerate.
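One route that may fit tighter budgets is scripting it with diffusers instead of a webui, since its memory-savers stack. A minimal sketch, assuming the experimental AnimateDiffSDXLPipeline and the beta SDXL motion adapter (the model IDs and prompt are illustrative assumptions, not a confirmed 13GB recipe):

```python
import torch
from diffusers import AnimateDiffSDXLPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

# Assumption: the beta SDXL motion adapter repo; swap in your own SDXL checkpoint
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear", clip_sample=False,
    timestep_spacing="linspace", steps_offset=1,
)

# The three big memory levers: offload idle submodules to system RAM, and
# decode the VAE in slices/tiles instead of all 16 frames at once.
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

out = pipe(prompt="a panda surfing, best quality", num_frames=16,
           num_inference_steps=25, guidance_scale=8.0)
export_to_gif(out.frames[0], "panda.gif")
```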
How did you configure it on A1111?
With --api --xformers --no-half-vae --disable-safe-unpickle I'm able to get 1024x640 (3060 @ 12GB, Copax Timeless, no LoRAs, Automatic1111).
@schielo Interesting. I do have a lot of other extensions, etc. plugged in. I will do a fresh repo clone to see if that fixes it. On another note, is ComfyUI better in terms of VRAM usage/speed?
Using the repo directly, 768x768 pixels for 16 frames requires about 16GB of VRAM.
@bigTiddyLover Haven't tried it (hooked on Deforum and other Automatic1111 extensions), but most people say Comfy uses fewer resources/is a little faster. My resolution is for 32 frames btw; any more and I start getting CUDA errors.
@schielo Number of frames is the length of the animation. Resolution is like 512x512 or 1024x1024, i.e. number of pixels. Do you generate in SDXL standard resolution?
@Hevok Finished the clip: https://youtu.be/IUcT96rLHzs This was 32 frames, context batch size 16, looped (each scene played x2), interpolated 2x FILM + 3x RIFE, upscaled and blended together with Windows Movie Maker. Original resolution was 1024x640px (so I cut the borders/distorted the image a little for 4K 16:9).
@schielo This is actually very impressive! Well done. You basically made a little short movie already with it and it looks extremely decent. Perhaps using the native resolution SDXL was trained on would make it even better: https://www.reddit.com/r/StableDiffusion/comments/15c3rf6/sdxl_resolution_cheat_sheet
Unfortunately there is no 16:9 aspect ratio; 1344x768 seems the closest to it. Why are you applying both FILM and RIFE? Is there any special benefit gained from combining those two interpolators compared to just using either one alone?
@Hevok I've tried different 16:9 XL resolutions with my Deforum clips, normally I prefer 1280x704 (1344px does not look better, just uses more resources imho). FILM + RIFE is due to the AnimateDiff options/my batch creation workflow. It does allow automatic FILM interpolation, but I think RIFE (Flowframes) is a little bit faster, so I normally use 4x RIFE after upscaling. The clip might have looked better with cherrypicking (I took almost all the scenes I got on the first try) and with longer, non looped scenes, I don't think the resolution changed much.
@schielo Oh, you even used all generated clips without selection? Wow. Copax TimeLessXL seems good for realistic animations. I noticed some models perform better than others with the motion module. How about adding just a few pixels (+16) to the height? That would give 1280x720, which is a 16:9 resolution, i.e. 720p (HD). That way you would not have to trim any pixels, and 2x upscaling reaches 1440p (2K, 2560x1440) while 3x upscaling reaches 2160p (4K, 3840x2160).
Time to shift to ComfyUI
@Hevok That's not a supported resolution for most of the stuff I do (Deforum - and I intend to combine/merge clips with AnimateDiff sequences some time), but might be an option for others. ;)
Use ComfyUI. Automatic1111 is sooooo 2023/Q2 :D
System Fallback on CUDA, worst case scenario...
I've written a tutorial on prompt travel, introducing how it works with AnimateDiff.
Need this online
Has anyone found a way to make AnimateDiff work with TensorRT? It generates for me, but all frames are different
For some reason yet unclear to me, it always does 2 animations in 1 batch, which is kind of annoying since the finished animation includes both, one after another. In the regular sampling settings, batch size and batch count are both set to 1, and I cannot find the responsible setting in AnimateDiff either. Any advice?
You may check if your prompt is too long. I encountered a similar problem and reducing the prompt length helped. I suspect it may be caused by the CLIP model's input length limit. Although A1111 solved that problem long ago, the extension may have some conflicts.
@stormriver Yeah, what this person said. If I recall, your prompt needs to stay around 50 tokens to get one long coherent animation. LoRAs bump that up in a way that is not transparent when looking at the webui. I had a prompt that was only 45 tokens, but because I used a character LoRA, it pushed AnimateDiff past its comfort zone.
Thank you guys. :) This solved my problem! Would've never thought of it myself.
@StableFocus Awesome, glad it worked. I'm still working on it, because the character lora was the whole point of that particular animatediff experiment.
Enable "Pad prompt/negative prompt to be same length" in Settings/Optimization and click Apply settings. You must do this to prevent generating two separate unrelated GIFs. Checking "Batch cond/uncond" is optional; it can improve speed but increases VRAM usage. (From the "How to Use" section of continue-revolution/sd-webui-animatediff: AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI on github.com)
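If you want to verify the token count before generating, the 75-token CLIP window is easy to measure with the tokenizer SD 1.5 uses. A small sketch (the tokenizer repo is the standard CLIP ViT-L/14 one; the prompt is just an example):

```python
from transformers import CLIPTokenizer

# SD 1.5's text encoder is CLIP ViT-L/14; its context window is 77 tokens,
# 75 of which are usable after the start/end special tokens.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def count_prompt_tokens(prompt: str) -> int:
    # add_special_tokens=False counts only the prompt's own tokens
    return len(tokenizer(prompt, add_special_tokens=False).input_ids)

prompt = "masterpiece, best quality, 1girl, walking on a beach at sunset"
n = count_prompt_tokens(prompt)
print(f"{n} tokens -- {'OK' if n <= 75 else 'over the 75-token window'}")
```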
Why is all I get just a grid of images?
How are you using it? What did you try?
You need to pick a motion model for AnimateDiff.
turn on "MP4" or "GIF", If in comfyUI, Use VideoCombine instead of SaveImage
I know this is a month later, but for anyone else having this issue, first double-check the AnimateDiff model you picked. Make sure you didn't choose the JSON file instead of the checkpoint. Whoops.
Anyone got any good results with the mm_sdxl_v10_beta.ckpt ? What settings do you use?
I certainly have not. The results are pretty awful. I've tried it with Juggernaut XL and I wonder if a different model would work better, but haven't downloaded another one to try.
Does this still work with 8GB VRAM?
Does anyone have a proper workflow with this? I cannot seem to get a good quality video out of it
You need two custom nodes: AnimateDiff Evolved and Video Helper Suite.
In the AnimateDiff Evolved folder, put the Animation model you downloaded here into its model folder. Then you'll be able to select it in the AnimateDiff Loader node.
Load the AnimateDiff Loader node right before a normal KSampler. All inputs are optional except for the model: just plug the Checkpoint Loader into that.
In the Empty Latent Upscale node, DON'T FORGET to raise the batch size (that's the number of frames that will be generated). Load the output with VHS Video Combine node.
That's it. Basically just AnimateDiff Loader plugged into a normal KSampler plugged into Video Combine, along with the necessary inputs (a model, prompts, an empty latent image).
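If you'd rather sanity-check the pieces in a script than in the node graph, the same checkpoint + motion module + sampler + frame batch wiring exists in diffusers. A rough sketch (the model IDs are assumptions, swap in your own):

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

# Motion module = AnimateDiff Loader; base checkpoint = Checkpoint Loader
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",    # assumption: any SD 1.5 model works
    motion_adapter=adapter, torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear",
    clip_sample=False, timestep_spacing="linspace",
)

# num_frames plays the role of the latent batch size in the ComfyUI graph
out = pipe(prompt="a corgi running on a beach", num_frames=16,
           num_inference_steps=25, guidance_scale=7.5)
export_to_gif(out.frames[0], "corgi.gif")  # stand-in for VHS Video Combine
```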
@dante012d29328229 "In the Empty Latent Upscale node, DON'T FORGET to raise the batch size (that's the number of frames that will be generated). Load the output with VHS Video Combine node."
Is this post-AnimateDiff, or pre-AnimateDiff?
I used a workflow that used an "Empty Latent Image", not Upscale, with a large batch size (32). While that works, it was resource-heavy. I found that using the "RepeatImageBatch" node after a single image generation worked wonders with less generation time, since you're not generating the other 31 images from the empty latent.
It just wasn't an upscale workflow, so would you plug the upscale in before or after the VAE Decode that goes to the VHS Video Combine?
@AlArt84 Do you think you could share a workflow for the RepeatImageBatch part? Since I don't really understand how you're getting generations that aren't just noise with a single latent batch.
@Genie123 Sure, give me a few days. I'll make a post on it, as a screenshot can't be uploaded in the comments.
@Genie123 I have uploaded a small, hopefully simple workflow for ComfyUI that uses RepeatImageBatch. I was typically getting basically idle animations using a generated image (still kind of do if I use IPAdapter), but introducing PowerNoise and latent blending helps. Check out the workflow here: https://civitai.com/models/277339/animated-diff-from-generated-base-image-optional-ip-adaptor
@AlArt84 Alright, I'll check it out when I'm able to. Thanks.
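For anyone wondering what RepeatImageBatch amounts to under the hood: it just tiles one image along the batch dimension so downstream nodes see N identical frames. A toy torch sketch of the idea:

```python
import torch

# One decoded image, batched to shape [1, C, H, W]
image = torch.rand(1, 3, 512, 512)

# Equivalent of ComfyUI's RepeatImageBatch: copy the single image 16 times
# along the batch axis, so downstream nodes receive 16 "frames" without
# sampling 16 separate latents from scratch
frames = image.repeat(16, 1, 1, 1)
print(frames.shape)  # torch.Size([16, 3, 512, 512])
```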
v3 when? Pls, the motion module is amazing but needs work
Please help me:
Isn't 'animatediff Motion' from SDXL? Why is it wrong for me to use it as the base model and 'aaa' as the AnimateDiff model?
The error is as follows:
File "/root/sd-webui-aki-v4.5/extensions/sd-webui-animatediff/scripts/animatediff.py", line 51, in before_process
motion_module.inject(p.sd_model, params.model)
File "/root/sd-webui-aki-v4.5/extensions/sd-webui-animatediff/scripts/animatediff_mm.py", line 65, in inject
assert sd_model.is_sdxl == self.mm.is_xl, f"Motion module incompatible with SD. You are using {sd_ver} with {self.mm.mm_type}."
AssertionError: Motion module incompatible with SD. You are using SD1.5 with MotionModuleType.AnimateDiffXL.
It's probably trained differently for the motion.
@brianzivkovich02575 Thank you! I have solved it; it was because the model didn't work properly.
Can you offer a safetensors file so we don't have to convert or enable the skip-safe-unpickle check?
When can we use it onsite? 💀
Not anytime soon. They are having problems with image generation, and video generation is a whole 'nother beast.
I keep getting this:
NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
I tried both but nothing; disabling the NaN check just produces flat black images.
That means the checkpoint has an error; try redownloading it.
I had the same issue, I solved it by ticking the
"Automatically revert VAE to 32-bit floats (triggers when a tensor with NaNs is produced in VAE; disabling the option in this case will result in a black square image)"
in the VAE settings.
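The underlying fix is the same in any frontend: keep the VAE decode in fp32 even when sampling runs in fp16. A minimal sketch with diffusers (the model ID is an assumption, and the random latents just stand in for a sampler's fp16 output):

```python
import torch
from diffusers import AutoencoderKL

# Load only the VAE, in full fp32 precision (assumption: an SD 1.5 repo)
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae",
    torch_dtype=torch.float32,
).to("cuda")

# Toy latents standing in for a sampler's fp16 output
latents = torch.randn(1, 4, 64, 64, dtype=torch.float16, device="cuda")

# Upcast before decoding -- this is what --no-half-vae achieves: the decode
# runs in fp32, so activations can't overflow fp16 into NaNs (black frames)
with torch.no_grad():
    image = vae.decode(latents.to(vae.dtype) / vae.config.scaling_factor).sample
```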
Doesn't work: "The file may be malicious, so the program is not going to read it."
Omg... why???
--disable-safe-unpickle for anyone wondering
Why is the SDXL quality so blurry? Even the OP's examples look very blurry/hazy. Is this just a limit of the motion model?
More of a computer limit. You need XL models to run at a high resolution, and AnimateDiff takes a lot of resources for XL. Aside from that, a typical XL model is three times the size of an SD model, and you have to batch at least 16-20 frames at the same time, so yeah, it sadly looks blurred. It's hard to do better without a top-of-the-line GPU (A100 series or a 4090) to make something at high resolution. Most SD 1.5 models just look better if you mix in hires.fix, but they can still look a little crappy with it.
@Subtra I have a 4090, and generated 1024x1024 at 20 steps. It takes forever, but still the results have the hazy/blurry quality to them that's very unbefitting of SDXL. So, I don't think this quality issue is related to specs.
@Foxbite Strange, what kind of sampler do you use? I don't have blurry images, but I used 1280x1280 as a base resolution, DPM3 SDE and the SDXL VAE together with the XL motion module. Less blurry, but that motion module is far from finished either way.
@Subtra I used Euler a which seems to have the most stability, but I've tried them all, including DPM3 SDE. All of them showed signs of degraded quality. Do you have any examples of the results you're getting? I would love to compare.
I can't get this working for the life of me.
All of the GIFs I try to make end up looking like absolute chaos, turning into this weird, out-of-focus mosaic pattern, almost like a really low-resolution version of what a Google DeepDream from back in the day would look like, except you can't make out a single object or creature.
It's hard to say without a workflow or any information to go on, but if I had to take a wild stab at it I'd say try increasing your latent batch size to the number of frames you need, and make sure your AnimateDiff settings reflect that. You may have tried to process an animation on one single frame. Hope you can get it figured out!
I'm assuming you are using Automatic1111; that happens to me too after some time. The only solution was to reboot the whole PC and start again. This of course is really annoying, so I switched to ComfyUI and the problem doesn't happen there. I'm assuming it's a VRAM issue (I have 12GB).
same problem here
Same issue; tried a bunch of checkpoints, CFG levels, beta schedulers, step counts, with/without LoRAs. I can't get a good SDXL gen with this model.
This issue is a result of some bug where the CUDA resource gets reserved and not released. There shouldn't be a need to reset your computer. However, you will need to restart A1111 and then generate a single image with AnimateDiff and other features OFF. This should release the memory and allow you to try again.
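If you have a Python console into the process and would rather not restart, the standard way to hand cached VRAM back is below; nothing here is A1111-specific:

```python
import gc
import torch

# Drop your own references to pipelines/tensors first, then:
gc.collect()              # free unreferenced Python objects
torch.cuda.empty_cache()  # return cached allocator blocks to the driver
torch.cuda.ipc_collect()  # clean up any CUDA IPC handles

print(torch.cuda.memory_allocated() / 1e9, "GB still allocated")
print(torch.cuda.memory_reserved() / 1e9, "GB still reserved")
```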
It almost always occurs when I pick a specific combination of settings. Just avoid those settings. Kind of trial and error.
I get the same thing, using a ComfyUI graph that Krita set up. Looks like psychedelic tiling noise.
I just want to say thank you to Civit for hosting these models for free for the world to download. You are doing the world a service and you are a good company. Thank you :)
And thanks to the animatediff team as well!
I downloaded the model and placed it in the correct folder, but the model does not appear in the list of models available for generation, although other models are visible. Has anyone encountered this?
works very fast
AnimateDiff-Lightning is a lightning-fast text-to-video generation model.
comfyui workflow
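For reference, Lightning's distilled motion module can be driven from a short diffusers script as well as from ComfyUI. A sketch following the layout of the ByteDance/AnimateDiff-Lightning repo (the checkpoint file name pattern and base model are assumptions):

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerDiscreteScheduler
from diffusers.utils import export_to_gif
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

step = 4  # Lightning ships 1/2/4/8-step distilled modules
repo = "ByteDance/AnimateDiff-Lightning"
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"

# Load the distilled motion-module weights into an empty MotionAdapter
adapter = MotionAdapter().to("cuda", torch.float16)
adapter.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))

pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism",  # assumption: any SD 1.5 base should work
    motion_adapter=adapter, torch_dtype=torch.float16,
).to("cuda")
# trailing timestep spacing is needed for few-step sampling
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear"
)

out = pipe(prompt="a girl smiling", guidance_scale=1.0, num_inference_steps=step)
export_to_gif(out.frames[0], "lightning.gif")
```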
I'm using ComfyUI integration in Krita. How would I go ahead and make these integrations work?
Getting the following error when using AnimateDiff
RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float
Anyone know how to fix?
I did. The only fix I found was to close Comfy and restart it without changing any settings. This seems to get the program to decide on either using CPU or GPU, preventing it from getting confused.
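That error usually means half-precision weights ended up running on the CPU, where most fp16 kernels don't exist. If you're scripting it yourself, picking the dtype by device up front avoids the mix; a sketch (model IDs are assumptions):

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter

# fp16 is only safe on the GPU; on CPU, stay in fp32 so no module ever
# runs half-precision kernels there ("mixed dtype (CPU)" error)
use_cuda = torch.cuda.is_available()
dtype = torch.float16 if use_cuda else torch.float32

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=dtype
)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter, torch_dtype=dtype
).to("cuda" if use_cuda else "cpu")
```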
https://media.discordapp.net/attachments/525706501390598158/1246567851310055585/image.png?ex=665cdc47&is=665b8ac7&hm=f331fa2dbc5381e0961da55e0d4f63e03672f4020fec521934b0db5ec5dbecbb&=&format=webp&quality=lossless
Hey, I have an issue. Every output I get is just a complete mess, and I have no idea how to fix it. Any help? My workflow is linked above.
Has anyone been able to run this with Pony Diffusion? It gives me a .gif with a still object/person and animated noise around it. I suppose the quality can also vary from model to model.
I tried with 4 different pony models and all of them yielded random noise results. Nothing coherent. Very sad. I hope we have a motion model that's compatible with PDXL at some point.
@Foxbite unfortunately, same :(
For Pony animations your best bet is HotshotXL, and even then it's not that great.
Is it possible to make this run with AMD DirectML? I keep getting errors no matter what motion module I install.
Comfy says "This motion module is intended for SDXL models, but the provided model is type SD1.5." But in the description on this page, SD 1.5 is listed as the base model.
Hi. Hope someone can help. I have added the extension in A1111, downloaded motion modules, and put them in WebUI\stable-diffusion-webui\extensions\sd-webui-animatediff\model. I have clicked the AnimateDiff dropdown, loaded a motion module and enabled AnimateDiff, but even at a very low frame count and FPS, all I am getting is a PNG image. What am I missing? Big thanks to anyone who can help.
Hi!
I'm getting errors no matter what I try! I saw one YouTube tutorial using them, followed the instructions, and nothing.
I was trying to find the answer for the last 2 days but no luck.
So I'm trying here; hopefully someone can help!
So this is the error message I'm getting:
...
loading network C:\Users\XY\stable-diffusion-webui\models\Lora\v2_lora_ZoomIn.ckpt: AssertionError
Traceback (most recent call last):
File "C:\Users\XY\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 321, in load_networks
net = load_network(name, network_on_disk)
File "C:\Users\XY\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 254, in load_network
raise AssertionError(f"Could not find a module type (out of {', '.join([x.__class__.__name__ for x in module_types])}) that would accept those keys: {', '.join(weights.w)}")
AssertionError: Could not find a module type (out of ModuleTypeLora, ModuleTypeHada, ModuleTypeIa3, ModuleTypeLokr, ModuleTypeFull, ModuleTypeNorm, ModuleTypeGLora, ModuleTypeOFT) that would accept those keys: 0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.0.processor.to_q_lora.down.weight, 0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.0.processor.to_q_lora.up.weight, 0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.0.processor.to_k_lora.down.weight,
...
Thank you!!
Try to keep your positive and negative prompts below 75 tokens. That usually solves it.
This is probably not helpful, but what helped me figure all of it out was a lot of time and determination. My pc was crashing, the workflows were not working, but after days and weeks of experimenting, it finally works. Every workflow is different, every pc is different. Keep going.
How do people animate stuff like this? It's really cool and another way to do artwork if you're annoyed with several characters in a prompt.