🌀 Wan2.1_14B_FusionX — Merged models for Faster, Richer Motion & Detail in as little as 8 steps!
📢 7/1/2025 Update!
New: FusionX Lightning Workflows
Looking for faster video generations with WAN2.1? Check out the new FusionX_Lightning_Workflows — optimized with LightX LoRA to render videos in as little as 70 seconds (4 steps, 1024x576)!
🧩 Available in:
• Native • Native GGUF • Wrapper
(VACE & Phantom coming soon)
🎞️ Image-to-Video just got a major upgrade!
Better prompt adherence, more motion, and smoother dynamics.
⚖️ FusionX vs Lightning?
Original = max realism.
Lightning = speed + low VRAM, with similar quality using smart prompts.
☕ Like what I do? Support me here: Buy Me A Coffee 💜
Every coffee helps fuel more free LoRAs & workflows!
📢 Did you know you can now use FusionX as a LoRA instead of a full base model?
Perfect if you want more control while sticking with your own WAN2.1 + SkyReels setup.
🔗 Grab the FusionX LoRAs HERE
🔗 Or Check out the Lightning Workflows HERE for a huge speed boost.
📌 Important Details: Please read the full description below, because small changes to settings can produce totally different results, in a bad way! I have been testing and have already found better settings, so please read on. Thank you :)
💡Workflows can be found HERE (This is a work in progress; more will be added soon.)
🛠️Updates section has been moved to the end of the description.
A high-performance text-to-video model built on top of the base WAN 2.1 14B T2V model — carefully merged with multiple research-grade models to enhance motion quality, scene consistency, and visual detail, comparable to some closed-source models.
## 📢 Join The Community!
A friendly space to chat, share creations, and get support.
👉 Click here to join the Discord!
Come say hi in #welcome, check out the rules, and show off your creations! 🎨🧠
💡 What’s Inside this base model:
🧠 CausVid – Causal motion modeling for better scene flow and a dramatic speed boost
🎞️ AccVideo – Improves temporal alignment and realism, along with another speed boost
🎨 MoviiGen1.1 – Brings cinematic smoothness and lighting
🧬 MPS Reward LoRA – Tuned for motion dynamics and detail
✨ Custom LoRAs (by me) – Focused on texture, clarity, and fine details. (Both were merged at very low strengths and have only a small impact.)
🔥 Highlights:
📝 Accepts standard prompt + negative prompt setup
🌀 Tuned for high temporal coherence and expressive, cinematic scenes
🔁 Drop-in replacement for WAN 2.1 T2V — just better
🚀 Renders up to 50% faster than the base model (especially with SageAttn enabled)
🧩 Fully compatible with VACE
🧠 Optimized for use in ComfyUI, with both the Kijai Wan Wrapper and native nodes.
📌 Important Details for text to video:
🔧 CFG must be set to 1 — anything higher will not produce acceptable results.
🔧 Shift - Results vary with resolution: at 1024x576 start at 1; at 1080x720 start at 2. Note: for more realism, use lower shift values. If you're looking for a more stylized look, test higher shift values between 3 and 9.
Scheduler: Most of my examples used uni_pc, but you can get different results with others; it really is all about experimenting. Depending on the prompt, flowmatch_causvid works well too and helps with small details. (See the settings sketch below.)
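💡 For quick reference, here is a minimal sketch of the T2V settings above as plain Python. The function and key names are illustrative, not a real API; map them onto whichever sampler node (native KSampler or the Kijai WanVideo sampler) you actually use.

```python
# Minimal sketch: recommended FusionX T2V settings as a plain Python dict.
# Key names are illustrative/hypothetical -- adapt to your sampler node.

def t2v_settings(width: int, height: int, stylized: bool = False) -> dict:
    """Return suggested sampler settings for FusionX text-to-video."""
    # Lower shift = more realism; higher shift (3-9) = more stylized output.
    if stylized:
        shift = 5  # experiment in the 3-9 range
    else:
        shift = 1 if (width, height) == (1024, 576) else 2

    return {
        "cfg": 1,               # must stay at 1 -- CausVid is baked in
        "steps": 8,             # 6 for fast drafts, 8-10 for best quality
        "shift": shift,
        "scheduler": "uni_pc",  # flowmatch_causvid also helps small details
    }

print(t2v_settings(1024, 576))         # realism preset
print(t2v_settings(1024, 576, True))   # stylized preset
```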
📌 Important Details for image to video:
🔧 CFG must be set to 1 — anything higher will not produce acceptable results.
🔧 Shift - For image to video I found that 2 works best, but you can experiment.
Scheduler: Most of my examples used dpm++_sde/beta, which seems to work best, but you can experiment.
After testing: to get more motion and reduce the slow-mo look, set your frame count to 121 and frames per second to 24. This can provide up to a 50% motion speed boost (the arithmetic sketch below shows why).
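💡 If you're wondering where the ~50% figure comes from, here is the arithmetic as a tiny Python sketch, assuming the usual WAN default of 81 frames at 16 fps: the clip length barely changes, but playback runs 1.5x faster, so motion looks much less slow-mo.

```python
# Frame-count / fps arithmetic behind the "50% motion speed boost" tip.
# Assumes WAN's default pacing of 81 frames played back at 16 fps.

default_frames, default_fps = 81, 16
boosted_frames, boosted_fps = 121, 24

default_duration = default_frames / default_fps   # ~5.06 s
boosted_duration = boosted_frames / boosted_fps   # ~5.04 s

# Playback duration is nearly unchanged, but playback rate is 1.5x,
# so on-screen motion appears roughly 50% faster.
print(f"default: {default_duration:.2f}s at {default_fps} fps")
print(f"boosted: {boosted_duration:.2f}s at {boosted_fps} fps")
print(f"playback rate: {boosted_fps / default_fps:.1f}x the default")
```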
📌Other Important Details:
⚡ Video generation works with as few as 6 steps, but 8–10 steps yield the best quality. Lower steps are great for fast drafts with huge speed gains.
🧩 Best results using the Kijai Wan Wrapper custom node:
https://github.com/kijai/ComfyUI-WanVideoWrapper
🧪 Also tested with the native WAN workflow; generation time is a bit longer, but results match the wrapper.
❗ Do not re-add CausVid, AccVideo, or MPS LoRAs — they’re already baked into the model and may cause unwanted results.
🎨 You can use other LoRAs for additional styling — feel free to experiment.
📽️ All demo videos were generated at 1024x576, 81 frames, using only this model — no upscaling, interpolation, or extra LoRAs.
🖥️ Rendered on an RTX 5090 — each video takes around 138 seconds with the listed settings.
🧠 If you run out of VRAM, enable block swapping — start at 5 blocks and adjust as needed.
🚀 SageAttn was enabled, providing up to a 30% speed boost. (Wrapper only)
Workflows for each model can be found HERE.
🚫 Do not use teacache — it’s unnecessary due to the low step count.
🔍 “Enhance a video” and “SLG” features were not tested at first — feel free to explore on your own. Edit: I did test “Enhance a video”, and you can get more vibrant results with it turned on. Try settings between 2 and 4. Experiment! SLG has not been tested much. (A small settings-sweep sketch follows below.)
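💡 Since several of the tips above come down to "experiment", here is a minimal sweep sketch in Python. The `render` function is a hypothetical placeholder, not a real API; wire it to however you queue ComfyUI jobs (e.g. its HTTP API or a batch script).

```python
# Minimal sketch of a settings sweep over steps / shift / scheduler.
# `render` is a hypothetical stand-in -- replace with your own pipeline call.

from itertools import product

def render(prompt: str, *, steps: int, shift: int, scheduler: str) -> None:
    # Hypothetical placeholder: wire this to your actual ComfyUI job queue.
    print(f"queued: steps={steps} shift={shift} scheduler={scheduler}")

prompt = "a cinematic drone shot over a misty forest at dawn"
for steps, shift, scheduler in product(
    (6, 8, 10),               # 6 = fast draft, 8-10 = best quality
    (1, 2, 3),                # lower = realism, higher = stylized
    ("uni_pc", "dpm++_sde"),  # the two schedulers that tested best
):
    render(prompt, steps=steps, shift=shift, scheduler=scheduler)
```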
💬 Have questions? You’re welcome to leave a message or join the community:
👉 Click here to join the Discord!
📝 Want better prompts? All my example video prompts were created using this custom GPT:
🎬 WAN Cinematic Video Prompt Generator
Try asking it to add extra visual and cinematic details — it makes a noticeable difference.
⚠️ Disclaimer:
Videos generated using this model are intended for personal, educational, or experimental use only, unless you’ve completed your own legal due diligence.
This model is a merge of multiple research-grade sources, and is not guaranteed to be free of copyrighted or proprietary data.
You are solely responsible for any content you generate and how it is used.
If you choose to use outputs commercially, you assume all legal liability for copyright infringement, misuse, or violation of third-party rights.
When in doubt, consult a qualified legal advisor before monetizing or distributing any generated content.
### 🧠 More GGUF Variants
- 🖼️ [FusionX Image-to-Video (GGUF)]
- 🎥 [FusionX Text-to-Video (GGUF)]
- 🎞️ [FusionX T2V VACE GGUF (for native)]
- 👻 [FusionX Phantom GGUF (for native)]
### 🧠 fp16 Versions can be found here:
- 🖼️ fp16 FusionX Models
📌 GGUF comparisons!
I'm slowly adding to this list, but here you can see how the models compare against the main model.
Text to video:
--------
🛠️Update 6/8/2025 - Image to video model is published! Settings I use in the example videos: steps = 10 / cfg = 1 / shift = 2 / scheduler = dpm++_sde. I'll post a workflow soon.
🛠️Update 6/7/2025 - Published an i2v Phantom model that can take up to 4 reference images and combine them into a video. The Phantom workflow is getting uploaded soon.
🛠️Update 6/6/2025 - Added a new GGUF model! If you want the highest quality and have enough VRAM, get the V1.0 model; otherwise GGUF is the next best thing! Note that the GGUFs take longer to generate, even on an RTX 5090.
Description
This is the GGUF Q3 version, optimized to just 6.51 GB — perfect for lower VRAM systems. While performance remains solid, you may notice slightly reduced quality. Check out the comparison video at the end of the description.
Comments
Thank you so much! It runs amazingly, and I can run it on an RTX 2060 with 6GB at low resolution, and the results are gorgeous. A 636x360 video of 5 seconds at 8 steps takes me 7 minutes; with the original Wan model it takes around 15 minutes. With this model, how can I get better prompt adherence? I think raising the CFG scale helps a little, but it makes generation take longer. Do you have any advice? Please.
Hey there! Super glad it’s working for you! Just a heads-up — since this model has CausVid built in, you'll need to keep the CFG set to 1, or things can get pretty wild (you’ll see what I mean if you try it 😄). What kind of prompts are you using? Have you tried my GPT linked in the description yet? Feel free to DM me on Discord if you need help — I’m at vrgamedevgirl. Happy generating!
You may be able to create larger-res videos now with the GGUF model! Quality does decrease a bit, but not a lot.
Incredible model! Do you recommend Uni_PC over the causflow specific scheduler that Kijai included in their nodes?
I tested a bunch of samplers, and so far UniPC and DPM++ SDE/Beta have worked best. UniPC gives a slightly softer look, while DPM tends to be more vibrant and contrasty. You’ll definitely want to try both and see what fits your style. As for FlowMatch, the results were pretty similar to UniPC for me — but give it a shot and see how it performs for you too!
@vrgamedevgirl I will give it a shot, thank you!
This model is the beginning of WAN fine-tuning. Well done, OP; hoping for more progress.
Aww thank you so much!! It was a fun project :)
How much vram is required for 1.0?
If you use the GGUF version, I think 8GB. Otherwise, the full version does work with 12GB at lower res.
Works well with my RTX 3060 12GB at 832x480 with the "WanVideo VRAM Management" node; takes around 25 minutes.
@Elliryk2 That is great to hear!! Was this using the gguf or the main one?
@vrgamedevgirl main ;)
Amazing simply amazing. Thanks so much for taking the time to put this together and share it with us!
I know it might seem a bit pointless in some ways, as you've explained in one of your replies, but a baked-in version of i2v would be great too, if you've got the time? Having these great LoRAs baked in makes things so smooth, and I'm actually finding this runs smoother and quicker, with visually better results, than trying to put it all together with the individual LoRAs on my end.
Great job! :D
PS. I'm managing to pull out stunning clips at 1280x720 16:9 on a 4060 Ti 16GB. I'm so impressed. (I was probably doing things wrong on my end that you've got all set within the model itself here.) Bravo.
@TheFunk That is so awesome! And about the I2V - I'll see what I can do :)
Great work. The best merge I have tested so far for realism content. Worked fine at 10 steps following your setting recommendations (CRUCIAL). The WAN LoRAs I tested worked flawlessly, and character LoRAs retained their appearance. Thumbs up! I paired it in my testing with my Detailz-WAN LoRA and they meshed perfectly. Thank you for sharing.
So glad you like it and it worked for you!! :) I'm working on a Phantom version now!
@vrgamedevgirl I haven't even tried Phantom yet. Too many models and not enough time!
Wow! Some examples are mind blowing. Tested this model with VACE and loras and everything works great. Thank you!
So glad you like it!!! :)
Really? I tried to load it with the VACE native workflow, with a control video and ref image, but the result was completely different from the ref image. The movement of the control video also wasn't properly transferred.
@kaytransg196 I should mention that I used the WanWrapper and didn't use a control video, but rather a single image or the first frames of a video I wanted to extend. This model extends video using VACE quite well, since the LoRAs work properly, unlike with the CausVid LoRA.
Has anyone tested the FusionX model with a VACE workflow?
I tried 3 times with the VACE native workflow, using a control video (composite of DensePose and OpenPose) and a ref image.
Steps 8, CFG 1, unipc/simple, 81 frames length.
However, the result is totally different, just like a normal T2V workflow: the information from the control video and ref image is not kept consistent.
I got it working with no issues; I'll share the workflow tomorrow.
@vrgamedevgirl please! I need it very much
@kaytransg196 There is a new model for this use case that I'll be posting today!
I don't know about VACE specifically, but when using Phantom you have to make sure the reference images are the same size, or at least the same aspect ratio, as your output video; otherwise they won't get picked up at all. Maybe it's the same with VACE?
Kijai's workflows do this already by using the ImagePad KJ node, but the native workflows I've seen don't seem to do it by default. (The small Pillow sketch below shows the idea.)
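A minimal sketch of that padding idea using Pillow, assuming a 1280x720 output and a hypothetical reference.png. It letterboxes the reference to the output aspect ratio instead of distorting it, which is roughly what the ImagePad KJ node reportedly does.

```python
# Sketch: letterbox a reference image to the output video's size/aspect ratio.
# Target size and file names are assumptions -- match them to your own setup.

from PIL import Image, ImageOps

def pad_to_output(path: str, size=(1280, 720)) -> Image.Image:
    """Scale `path` to fit `size`, then center it on a black canvas."""
    img = Image.open(path).convert("RGB")
    # ImageOps.pad resizes to fit within `size`, padding the remainder.
    return ImageOps.pad(img, size, color=(0, 0, 0))

ref = pad_to_output("reference.png")   # hypothetical input file
ref.save("reference_padded.png")
```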
@funscripter627 Sure! All my input and output latents were already aligned to the same resolution (in my case 480x832).
@kaytransg196 did you solve it? Let me know! Thanks!
@kaytransg196 Aight that wasn't it then. Hope you get it working!
@vrgamedevgirl Finally, I found the WanFusionX VACE GGUF file in the link (from QuantStack), and it works like a charm; it's even faster than SkyReels v2 VACE.
81 frames, 480x832, 2 samplers with CFG 3 and CFG 1, unipc/simple, shift 2. It takes 520-530s on a 4060 Ti 16GB. I will test more.
Will there be a I2V version?
I am working on this, stay tuned. Phantom is the next best thing, though.
Doesn't VACE already let you use T2V directly for image-to-video?
@vrgamedevgirl Thanks looking forward to it!
Honestly if you use VACE with a strength of 1, the start frame for most outputs looks exactly like the input image. If your prompt is wildly different from the ref image though it can create a conflict. Like if you start with a person sitting in a chair and say "The person is dancing" then it may discard the reference pose, but if you say "The person stands up and starts to dance" then it'll probably use the ref image as the first frame.
Anyone tried running this on HF?
What do you mean?
@vrgamedevgirl Running on Hugging Face via code, not a workflow.
@kamkaaj2015843 Sorry, I have not tried that. How do I set that up?
is 4 ref images a hard limit, or can you add more?
I think it's a limit; if not, I'm not sure how you'd add more. A workaround, though, would be to combine 2 images into 1.
Thank you for the phantom model. I couldn't get it to work reliably with Causvid myself so I used it without. I went from around 10 minutes per generation to around 90 seconds and using a higher resolution now too. Can't thank you enough!
You're very welcome!! Happy it works well for you :)