This is the GGUF conversion of SmoothMix Wan 2.2 v2: https://civarchive.com/models/1995784/smooth-mix-wan-22-i2vt2v-14b
The files are also available at https://huggingface.co/BigDannyPt/WAN-2.2-SmoothMix-GGUF
All of them used the umt5-xxl-encoder-Q8_0 GGUF text encoder, the Wan 2.1 FP32 VAE, and the lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16 LoRA.
Description
Asked by a user, it really has an impact on the quality
Comments (58)
Can you add some comparison with the model that is not GGUF?
I can try to set up a test environment for both and provide some examples.
thank you. gonna try this later
Is there a difference between this version and the one from huggingface? They have different file sizes.
No, they are the same. If you start downloading, they show the same size.
So is the gguf q8 higher quality than the normal v2 smoothmix?
Chatgpt says gguf q8 variants for wan 2.2 are generally better quality than fp8/fp8scaled. Is this true?
fp16>Q8>fp8>Q6
It is a hard question to answer, since the gguf was created from the fp8 version: the weights had to be upcast to fp16 and then converted to gguf.
@BigDannyPt Is it possible to extract a lora from the v2 fp8 models, merge that lora into the fp16 model, and then convert that to a q8 gguf at full precision? We couldn't do this with version 1 since digitalpastel had merged the speed loras.
@AnonBlah I never tried, but you are free to do it
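To illustrate the point about converting from the fp8 file, here is a minimal sketch of the order of operations (low-precision source, upcast, then 8-bit quantization). It is not the conversion script used for these files. The standard library has no fp8 type, so fp16 stands in for the low-precision checkpoint, and `quantize_q8_style` is a simplified, hypothetical stand-in for Q8_0 (one shared scale per block, no fp16 scale storage).

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_q8_style(block):
    """Quantize a block to signed 8-bit integers sharing one scale (simplified)."""
    amax = max(abs(v) for v in block)
    scale = amax / 127.0 if amax else 1.0
    return scale, [round(v / scale) for v in block]

def dequantize(scale, ints):
    """Map the integers back to floats using the shared scale."""
    return [scale * q for q in ints]

# Hypothetical weights, only for illustration.
original = [0.1234567, -0.7654321, 0.3333333, 0.0009876]

# Stand-in for the fp8 checkpoint: precision is lost at this step.
low_precision = [to_fp16(v) for v in original]

# Upcasting widens the storage type but cannot restore the lost digits.
upcast = [float(v) for v in low_precision]

scale, ints = quantize_q8_style(upcast)
restored = dequantize(scale, ints)

err_source = max(abs(a - b) for a, b in zip(original, low_precision))
err_total = max(abs(a - b) for a, b in zip(original, restored))
print("error already present in the low-precision source:", err_source)
print("error after upcast + 8-bit quantization:", err_total)
```

The upcast values are identical to the low-precision ones, so the quantized file is an approximation of the fp8 weights, not of the original fp16 weights.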
this model give me very blurred outputs
ok cause no SpeedUp lora
FYI: remember, guys, this gguf doesn't have the lightx2v loras integrated into the model, so you have to add your own 1030 high and 1022 low lightx2v loras to your workflow. Great quantization!
Yeah, I always expect people to go to the original model page to check the real description of the model.
H 1030 3.0, L 1022 1.5?
What settings should I use for these LoRAs? There is also a 1217 low LoRA that came out just last month.
@stormstrike52382 Yes, these are the settings in @Santodan workflow.
@R3G4L 1217 is only for text-to-video. As for settings, I play around with them; I find that keeping low at 1.50 and high at 2.00 is consistent. The more you increase high, the greater the motion. Low improves quality in some way, but I'm not entirely sure how, to be honest.
Does anyone have a tutorial video on how to use this? I have a rough understanding of ComfyUI, but I couldn't load the gguf, for example.
Go to the templates in ComfyUI and get the Image-to-Video one for Wan 2.2.
Change the model loader nodes for the GGUF Unet nodes.
GGUFs need to be in the UNET folder inside ComfyUI\models
Just a reminder that you will also need the text encoder (I used the UMT5-XXL GGUF, so you will need to swap the clip loader for the gguf clip loader and set it to wan) and the Wan 2.1 VAE.
@Santodan can you share your workflow please?
@Beyond_Imagination I've already specified in the reply what is needed.
1 - Get the official I2V template from ComfyUI
2 - Replace the model loader with Unet Loader (GGUF)
3 - Since I was using the UMT5-XXL GGUF, I also had to replace the Clip Loader with the Clip Loader (GGUF)
that's all, simple as that
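For anyone who prefers to patch an exported workflow instead of clicking through the graph, the three steps above can be sketched roughly like this. It assumes the workflow is in ComfyUI's API export format (a dict of node id to `class_type` and `inputs`), that the core loaders are named `UNETLoader` and `CLIPLoader`, and that the ComfyUI-GGUF nodes are registered as `UnetLoaderGGUF` and `CLIPLoaderGGUF`; check these names against your own install. All file names are hypothetical.

```python
import copy

# Assumed class names (core ComfyUI -> ComfyUI-GGUF custom nodes).
LOADER_SWAPS = {"UNETLoader": "UnetLoaderGGUF", "CLIPLoader": "CLIPLoaderGGUF"}

def swap_to_gguf_loaders(workflow: dict, file_map: dict) -> dict:
    """Return a copy of the workflow with loaders swapped for GGUF loaders.

    file_map maps each original model file name to its .gguf replacement.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        new_class = LOADER_SWAPS.get(node.get("class_type"))
        if new_class is None:
            continue  # leave samplers, VAE loaders, etc. untouched
        node["class_type"] = new_class
        inputs = node["inputs"]
        # Assumption: only the core UNETLoader takes a weight_dtype input.
        inputs.pop("weight_dtype", None)
        for key in ("unet_name", "clip_name"):
            if key in inputs:
                inputs[key] = file_map.get(inputs[key], inputs[key])
    return patched

# Hypothetical miniature workflow with the high and low model loaders.
example = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "high.safetensors", "weight_dtype": "default"}},
    "2": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "low.safetensors", "weight_dtype": "default"}},
    "3": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "umt5.safetensors", "type": "wan"}},
    "4": {"class_type": "KSamplerAdvanced", "inputs": {}},
}
file_map = {
    "high.safetensors": "high-Q6_K.gguf",
    "low.safetensors": "low-Q6_K.gguf",
    "umt5.safetensors": "umt5-xxl-encoder-Q8_0.gguf",
}
patched = swap_to_gguf_loaders(example, file_map)
```

The original dict is left unchanged, so the patched copy can be compared against it before queueing anything.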
@Santodan but doesn't the img2vid template from comfy require tokens to use?
@puddiamz no, it is the Wan 2.2 image-to-video template.
Wan has always been local, from what I recall.
This is the official template https://docs.comfy.org/tutorials/video/wan/wan2_2
Even without using prompts like smoke or moisture, the model still tends to generate these effects very easily……
I keep getting terrible outputs, this is my third time trying smooth mix. I use Wan2.1 FusionX and it works amazing compared to this. I'm confused what I'm doing wrong. Is Q8 way better than Q6K? I've only tried Q6K. I have tried using nsfw encoder/clip vision and without, I have tried adding back the wan2.1 loras I use and adding new wan2.2 loras even though smooth mix is supposed to already be nsfw capable. Outputs are still terrible and act as if the model has no idea what NSFW is (cum looks like white sludge for some reason???) no matter what I try.
It's also far slower than Wan2.1 FusionX, which could finish in 120s for me; this finishes in 250s at best. How are people getting such amazing results? I can't seem to even come close to what other people can prompt and make. My inputs are the same as with FusionX, so it's not bad input images.
@sircringealot note that this is Wan 2.2, so it is normal for it to be slower.
Not sure if you are already using a Wan 2.2 workflow, but you need both the high and low models.
I tried this but I want to revert to v1. Unfortunately I deleted it, and it seems impossible to find now that v2 is out. Does anyone have a link to the gguf version of v1?
https://huggingface.co/Bedovyy/smoothMixWan22-I2V-GGUF
What vae and clip files should I use?
If I use the vae and clip files as instructed, the image turns into a mosaic and the results are terrible.
I used the ones in the description. What is the workflow that you are using?
@Santodan I have the same problem, and I used the ones in the description
@2372004790954 Here is the workflow, just change the models to the ones you want
https://rentry.co/hsvdwuzt
@Santodan I also have mosaic issues, no matter which clip, VAE, or model I use. I use Smooth Workflow Wan 2.2(I2V/T2V/first2last
@2372004790954 and what is that mosaic? Have you tried the workflow that I posted above?
@Santodan There's a lot of noise in my videos, and I followed your workflow, but it didn't work
@2372004790954 I think I didn't raise the steps in that one, so it still has the default from the template. Raise the steps to 6. In the high sampler, set the start step to 0 and the end step to 3; in the low sampler, set the start step to 4 and the end step to 999.
I'm using WAN 2.2 SMOOTH WORKFLOW v3.0. The version of Q4KM I downloaded is the one you provided. For CLIP, I'm using umt5-xxl-encoder-Q5KM (GGUF), and for VAE, I'm using Wan2.1-VAE-BF16. I'm not sure what the problem is. In KSampler, the workflow stops and reports an error, telling me that the feature dimensions don't match:
KSamplerAdvanced
Given groups=1, weight of size [5120, 36, 1, 2, 2], expected input[1, 64, 21, 84, 60] to have 36 channels, but got 64 channels instead
Have you ever had such a problem? If so, how was it solved? I sought help from Gemini, but it failed to offer good advice.
Well, the problem is solved. After I replaced VAE with FP32, there were no more issues. But I don't quite understand. Gemini told me that the encoding method should not affect the number of channels. Just like Q8 and Q4, the number of channels should be the same. Logically speaking, the VAE using BF16 should run the same as FP32...
@Yaoyu92fox one thing that I learned: don't ask AI ComfyUI-related questions... they always mess something up
@Santodan Yes, I completely agree. Gemini's delusion about ComfyUI is too severe. Also, I would like to ask you another question. Can 12GB of video memory be used for the Q8 model?
@Yaoyu92fox I don't think so. I have 16GB of VRAM and I normally go with Q6. Keep in mind that something is usually already consuming VRAM, whether the OS or even the browser you use to view ComfyUI, so you will not have the full 12GB available.
@Santodan Got it. So it seems that Q5 might be the limit of my hardware. I might even have to continue using Q4. Thank you for the information you provided!
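As a rough sanity check on the VRAM discussion above, the weight size of a quantized model can be estimated from the bits each GGUF type spends per weight. This is only a sketch: it assumes about 14 billion parameters per model file (taken from the "14B" in the model name), counts weights only (no activations, text encoder, or VAE), and uses a hypothetical 2 GB reserve for the OS and browser. Real files differ somewhat because some tensors are kept at higher precision, and ComfyUI can offload part of a model to system RAM.

```python
# Bits per weight implied by each GGUF block layout
# (for example, Q8_0 stores 32 int8 values plus a 2-byte scale: 34 bytes per 32 weights).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625, "Q5_K": 5.5, "Q4_K": 4.5}

def estimated_size_gb(params: float, quant: str) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

def fits_in_vram(params: float, quant: str, vram_gb: float, reserved_gb: float) -> bool:
    """True if the weights alone fit after setting aside reserved_gb."""
    return estimated_size_gb(params, quant) <= vram_gb - reserved_gb

PARAMS = 14e9        # assumption: roughly 14B parameters per model file
RESERVED_GB = 2.0    # hypothetical allowance for the OS, browser, etc.

for vram in (12, 16):
    ok = [q for q in BITS_PER_WEIGHT if fits_in_vram(PARAMS, q, vram, RESERVED_GB)]
    print(f"{vram} GB VRAM, weights-only fit: {ok}")
```

Under these assumptions the estimate lines up with the experience described above: Q6 is about the ceiling for a 16GB card and Q5 for a 12GB card.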
is umt5-xxl-encoder-Q8_0 GGUF necessarry? can I use umt5_xxl_fp8_e4m3fn_scaled.safetensors?
You can use any umt5-XXL. I only used that one to stay as close as possible to the original while using gguf quantization.
Hi, I'm using the SmoothMix WAN 2.2 I2V GGUF workflow but my outputs are consistently fuzzy/ghost-like (see attached image).
My Setup & Steps:
Models: SmoothMix I2V GGUF, umt5_xxl_fp8, and clip_vision_h.
LoRAs: Both Lightx2v High/Low LoRAs are enabled (Strength 3.0/1.5) and chained correctly (MODEL & CLIP).
Settings: CFG 1.0, Steps tried from 6 up to 30. Sampler: euler_ancestral / simple.
VAE: Using the dedicated wan_2.1_vae.safetensors node.
The Issue: Even with 30 steps, the video remains a blurry "ghost image" of the source. I've confirmed all nodes are enabled and the signal path is connected.
Is this a known GGUF compatibility issue, or am I missing a specific node connection? Any advice would be appreciated!
The description of the workflow is incomplete. This is Wan 2.2, so you have the high and the low models; most likely you have set up the steps incorrectly in the samplers.
The high sampler should start at 0 and end at half the total steps; the low sampler should start at half the total steps plus 1 and end at the total steps. Both samplers must be set to the same total number of steps.
So, high ksampler:
- steps 6
- start at step 0
- end at step 3
Low ksampler:
- steps 6
- start at step 4
- end at step 999
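The split described above can be written as a small helper, sketched here under a few assumptions: the key names match the `steps`, `start_at_step`, and `end_at_step` inputs of the KSampler (Advanced) node, 999 is simply a "run to the end" value as in the settings above, and `low_offset` is a hypothetical parameter added for experimentation.

```python
def split_steps(total_steps: int, low_offset: int = 1, run_to_end: int = 999) -> dict:
    """Compute high/low sampler settings for a two-model Wan 2.2 workflow.

    Follows the rule from the comment above: the high sampler covers the
    first half, the low sampler starts at half plus low_offset.
    """
    if total_steps < 2:
        raise ValueError("need at least 2 steps to split between two samplers")
    half = total_steps // 2
    return {
        "high": {"steps": total_steps, "start_at_step": 0, "end_at_step": half},
        "low": {
            "steps": total_steps,  # both samplers must share the same total
            "start_at_step": half + low_offset,
            "end_at_step": run_to_end,
        },
    }

settings = split_steps(6)
print(settings["high"])
print(settings["low"])
```

With `total_steps=6` this reproduces the values listed above. Setting `low_offset=0` makes the low pass start exactly where the high pass ended, which may be worth comparing if outputs stay noisy; that variant is an assumption, not something stated in this thread.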
I ran the workflow, but I only got a PNG file. There is no video in the output.
I don't know what workflow you are using, but for sure it isn't this one since this is not a workflow but a gguf conversion of a model.
great
Nice work
Whenever I use smooth mix models the characters start to get stretched and sloppy. Is there something I can do to fix that?
Not sure, I've never seen anything like that. What are you using (model, vae, clip, loras, etc.)?
@Santodan
Model: SmoothmixWanV20Q8.gguf
vae: Wan 2.1
clip: umt5_xxl_fp8_e4m3fn_scaled
loras: any, but DR34ML4Y is my most common one.
Lightning loras: lightx2v_12v_14b_480p
SVI: SVI_v2_PRO_wan2.2
I've tried using the fp32 VAE and clip that are recommended, but I get errors when I do. It's very odd since using other models prompts come out fine, but if I try any of the smooth mixes the characters start to stretch, the breasts grow dramatically, the faces get saturated, and the mouths open very wide. I'm normally using the triplek sampler if that's relevant?
@penah69331561 have you tried without SVI? I don't know if it could be that, I've never used SVI.
also, try with a simple workflow and then add things one by one to see if anything changes it drastically.
@Santodan I've tried it without svi, but I'll mess around with simpler workflows to see.
@penah69331561 I've never seen that behaviour. The only things I can think of are the lora or the resolution; I recall that landscape resolutions have issues with wan.
Can't for the life of me get the NAG node to work with KSampler Advanced