What does this workflow do?
This workflow takes your input image, crops/resizes it if needed to the ideal Cosmos render size, then automatically creates an appropriate prompt for Cosmos to work its magic! The result will be a hopefully-amazing video that your family can cherish for generations.
This process depends on both Florence (to automatically describe the image) and an LLM (to create a video prompt from that description).
Further instructions and links are included in the workflow.
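The data flow described above can be sketched roughly like this. Every function here is a hypothetical stand-in for illustration only, not a real ComfyUI node API:

```python
# Illustrative sketch of the workflow's data flow.
# All functions below are hypothetical placeholders, not ComfyUI nodes.

def load_and_resize(path):
    """Crop/resize the input to a Cosmos-friendly render size."""
    return f"image({path})"

def florence_describe(image):
    """Florence: image -> text description."""
    return f"description of {image}"

def llm_enhance(caption):
    """LLM: description -> polished Cosmos video prompt."""
    return f"video prompt from: {caption}"

def cosmos_generate(image, prompt):
    """Cosmos: image + prompt -> video."""
    return f"video[{image} | {prompt}]"

def image_to_video(path):
    image = load_and_resize(path)
    caption = florence_describe(image)
    prompt = llm_enhance(caption)
    return cosmos_generate(image, prompt)

print(image_to_video("input.png"))
```

In the actual workflow each of these steps is a node (or node group), so after the initial setup the only thing you touch is the input image.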
Extremely simple operation after initial setup (model load/LLM configuration):
1. Load an input image.
2. Queue prompt. Really, that's it. Every other setting should be good to go.
Have fun, and I look forward to seeing your creations!
Setting expectations: this model is HEAVY. Using the 7B model with the included (optional) optimizations, I am running at about 15 minutes and 20GB of VRAM on a 4090 to generate a 121-frame video at 1280x704.
Description
Initial version. Instructions included in workflow.
Comments (17)
The CosmosImageToVideoLatent node is showing as missing for some reason.
Hi, please update your ComfyUI and other nodes. Many parts of this workflow rely on very recent updates (from the past few days).
@EnragedAntelope I updated to the latest version and still can't get the node.
@dkain76 I just checked and confirmed again: it definitely is part of the ComfyUI core nodes. If you are definitely on the latest ComfyUI (check your commit), just search for "Cosmos" and you should see it as an option. If it's not showing up in search, your ComfyUI may not be updating successfully.
@EnragedAntelope Got it. I had only updated ComfyUI, so I went back and updated the nodes too (made sure to back up ReActor since they changed it).
Thanks for sharing the workflow. Could you add some information about where to download the CLIP and VAE models, please?
@galaxytimemachine Sorry for not including that, I will do in future versions of the workflow. Please let me know of any other suggestions for changes/improvement once you give it a go. And share your creations please, would love to see what people are doing with it!
I've been unable to get Torch or Sage Attention to work. I set it to SDPA and disabled Sage Attention, but it still bombs out with a long, spurious error when it gets to the KSampler.
SDPA should work pretty universally for Florence so that's good.
Try bypassing the Torch Compile module in addition to Sage Attention. It'll be slower, but it should work on most systems. If you look in the notes of the workflow, I included a link to a guide to help you get Triton, Sage Attention, etc. installed. But it really is a PITA for most users, Windows users in particular. So it's up to you whether pursuing the optimizations is worth your time versus just running the workflow unoptimized.
@EnragedAntelope Additional notes: I managed to get Triton installed, and Sage Attention seems to be working (I think, since it hasn't given an error when I select Triton from the dropdown menu). I had to replace Advanced Prompt Enhancer with Ollama Generate Advanced and a text concatenator, since Enhancer would not communicate with my local Ollama server for some reason (no text output in the UI window).
Also, I had to manually set the resolution to match the aspect ratio of the input image using an aspect ratio calculator. I tried using the Comfy Laterals aspect node to control the output resolution going into the Image Resize node, but the KSampler complained about the dimensions not being divisible correctly. If you could figure out how to do the math from the input image to keep the correct aspect ratio and convert to a valid output resolution, that would save a lot of headaches.
It would be nice to be able to control the amount of motion too but it seems fairly random. CFG/step count?
I'm playing around with the workflow and Cosmos to see what else I can break at the moment ;)
@Mopantsu Well done, getting that stuff installed! I know it's a pain. And yep any LLM solution should work. I've had great success with Advanced Prompt Enhancer/Plush Nodes and use them in almost every workflow, but you found a good solution for you so that's great.
I was trying to think through the math to stay within the tight allowed boundaries of Cosmos. The default resolution I selected really does seem to give the best output, but there are a few acceptable aspect-ratio variants that stay within the max of 1280 and min of 704.
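A minimal sketch of that aspect-ratio math, assuming a 1280 max side, 704 min side, and dimensions rounded to a multiple of 16 (these constraints are my assumptions based on the defaults discussed here, not confirmed Cosmos limits):

```python
def snap_resolution(in_w, in_h, max_side=1280, min_side=704, multiple=16):
    """Scale (in_w, in_h) to a Cosmos-friendly size, keeping the aspect
    ratio and rounding both sides to a multiple of 16.
    NOTE: the bounds and the multiple are assumptions, not verified limits."""
    aspect = in_w / in_h
    if aspect >= 1:            # landscape (or square): cap the width
        out_w = max_side
        out_h = out_w / aspect
    else:                      # portrait: cap the height
        out_h = max_side
        out_w = out_h * aspect
    # Round to the nearest multiple, then clamp to the minimum side.
    out_w = max(min_side, round(out_w / multiple) * multiple)
    out_h = max(min_side, round(out_h / multiple) * multiple)
    return int(out_w), int(out_h)

print(snap_resolution(1920, 1080))  # 16:9 input -> (1280, 720)
print(snap_resolution(1080, 1920))  # 9:16 input -> (720, 1280)
```

Something like this could sit between a "get image size" node and the Image Resize node; the rounding step slightly distorts the aspect ratio but keeps the KSampler happy about divisibility.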
I think motion is primarily due to how we prompt and the sampler but if you discover something else, would love to hear it.
Thanks for the update!
@Mopantsu Can you share the tutorial you used to get Sage Attention to work? Also, did you do it strictly on Windows? I'm pretty sure I have the Triton wheel for Windows installed correctly, but I could never get Sage Attention to work; it'd just break my ComfyUI when I tried lol
@bhopping It's over on Reddit. Search for 'How to run Hunyuan on a single 25GB VRAM GPU' or something like that. I could not get the Sage Attention node in that guide to work, but an alternate Sage Attention node seems to work fine (the Patch Sage Attention KJ BETA node).
@Mopantsu Thanks bro, I used that reddit post combined with a new install to install sage. There is also a YT video where a guy goes step by step using the post but most will probably not need that.
Does this workflow work with Windows 10? I've been thinking about getting a whole new drive just for Ubuntu with how many issues I'm having with ComfyUI in general.
It should but I am on Windows 11 so can't comment first-hand on how Windows 10 will handle things. I'd imagine yes, as long as you have python/torch/etc installed and up to date.
You can also try Windows Subsystem for Linux (WSL) if you don't want to install an entirely separate OS. That will give you full Linux functionality under Windows.