This is an SVDQuant int4 conversion of my CreArt-Ultimate Hyper Flux.1_Dev model for Nunchaku.
It was converted with Deepcompressor at Runpod using an A40.
It increases rendering speed by about 3x.
You can use it at 10 steps without the Turbo LoRA,
but 12 steps plus the Turbo LoRA at strength 0.2 gives the best results.
Works only in ComfyUI with the Nunchaku nodes.
This int4 version does not work on RTX 5000-series cards.
Description
SVDQuant int4 version of CreArt_Ultimate
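For readers curious what SVDQuant actually does, here is a toy NumPy sketch of the core idea (an illustration, not Nunchaku's implementation): a low-rank SVD branch absorbs the outlier structure in a weight matrix, and only the smaller residual is quantized to int4, which keeps the quantization scale small.

```python
import numpy as np

def svdquant_sketch(W, rank=4, bits=4):
    """Toy SVDQuant: low-rank branch + symmetric int4 residual quantization."""
    # The low-rank branch absorbs the dominant (outlier) structure.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    R = W - low_rank
    # Symmetric quantization of the residual to the int4 range [-8, 7].
    qmax = 2 ** (bits - 1) - 1  # 7
    scale = np.abs(R).max() / qmax
    q = np.clip(np.round(R / scale), -qmax - 1, qmax).astype(np.int8)
    return low_rank, q, scale

def dequant(low_rank, q, scale):
    return low_rank + q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Weight matrix = small Gaussian noise + a strong rank-1 "outlier" direction.
u, v = rng.standard_normal(64), rng.standard_normal(64)
W = (0.1 * rng.standard_normal((64, 64)) + 2.0 * np.outer(u, v)).astype(np.float32)

low_rank, q, scale = svdquant_sketch(W)
err_svdq = np.abs(W - dequant(low_rank, q, scale)).mean()

# Naive int4 quantization of W directly, for comparison.
s0 = np.abs(W).max() / 7
err_naive = np.abs(W - np.clip(np.round(W / s0), -8, 7) * s0).mean()
```

Because the SVD branch removes the large-magnitude direction, the residual's quantization scale is far smaller than naive int4's, so the reconstruction error drops sharply.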
Comments (35)
Amazing model. Thank you thank you thank you
You are welcome
I'm so happy this format is starting to gain traction!
Me too, it's worth it
DITTOOOOOO way better than GGUF and nf4
@pychobj2001741 Yes, more speed!
Can I ask a question? How did you solve the metadata-saving problem? I'm using the workflow from your sample image and I'm stuck at the final save; too many things are missing: ckpt_name, ckpt_hash, sampler scheduling, etc. Thanks!
For runpod, do you have your workflow on how you used DeepCompressor? I would like to try to use Google Colab but have a feeling it might not work.
I also think it won't work, because the conversion, even with the lowest-quality configuration, takes at least 20 hours. Otherwise it's simply an Ubuntu template: clone the deepcompressor GitHub repo, install Poetry, and use Poetry to install the requirements. But the first thing to do is to convert the checkpoint into a sharded Diffusers version (for Hugging Face). Look in the Diffusers GitHub repo for the script named convert_flux_to_diffusers.py
@jice For the starting checkpoint, I'm guessing you should always shard an fp16 or bf16 into the Diffusers version, not an fp8 or fp32?
@pychobj2001741 It works with an fp8; it gets automatically converted to fp16. fp32 is not tested.
example:
python convert_flux_to_diffusers.py \
--checkpoint_path name-of-your-checkpoint.safetensors \
--output_path my-checkpoint_diffusers \
--transformer \
--dtype bf16
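The setup described above can be sketched roughly as follows (hedged: the deepcompressor quantization entry point and config files change between versions, so follow the repo's own diffusion quantization instructions rather than this outline):

```shell
# Rough sketch of the Runpod (Ubuntu template) setup described above.
# Step 1: convert the checkpoint to a sharded Diffusers layout
#         (see the convert_flux_to_diffusers.py command above).
# Step 2: set up deepcompressor with Poetry:
git clone https://github.com/mit-han-lab/deepcompressor.git
cd deepcompressor
pip install poetry
poetry install
# Step 3: run the quantization. The exact entry point and YAML config live
# in the repo's examples and vary by version, so check the repo's README
# for the diffusion quantization recipe instead of relying on this sketch.
```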
Which sampler + scheduler is recommended for realism?
I use euler/beta
Thank you so much for making this conversion, using svdquant makes flux practically as fast as SDXL, you are awesome!
You are welcome
Thanks for the Nunchaku version, I know it's pretty hard work.
Yes, it's not easy; it's impossible to do locally, and it takes a lot of time.
Can you point us towards the "Turbo" LoRA mentioned in the description? I'd like to give it a shot, but there were multiple results when I searched. Thanks!
When I invoke it on a 5070 Ti, I get this error:
DeprecationWarning: verify_ssl is deprecated, use ssl=False instead
I don't have a 5000-series card, so I can't reproduce your error. My guess is that for the 5000 series you need fp4, not int4.
@jice OK. Thank you for your reply.
@516142474832 You can try the Fp4 version from here:
https://huggingface.co/mit-han-lab/svdq-fp4-flux.1-dev/tree/main
I'm getting the following error when attempting to generate an image:
E:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\nunchaku\utils.py:91: UserWarning: The model may be quantized to int4, but you are loading it with fp4 precision.
warnings.warn("The model may be quantized to int4, but you are loading it with fp4 precision.")
[2025-05-22 08:41:46.352] [info] Initializing QuantizedFluxModel on device 0
[2025-05-22 08:41:46.392] [info] Loading weights from E:\AI\ComfyUI_windows_portable\ComfyUI\models\diffusion_models\svdq-int4-CreArt_Ultimate\transformer_blocks.safetensors
Assertion failed: this->shape.dataExtent == other.shape.dataExtent, file C:\Users\muyang\Desktop\nunchaku-dev\src\Tensor.h, line 372
I'm not sure what to do, as I haven't found this on the Nunchaku GitHub page, and ChatGPT is recommending that I start editing the Python files, which obviously shouldn't be necessary. I've tried using both jice's workflows from the images they posted as well as another person's, and the result is the same. The main thing I had to do to get Nunchaku to load was manually pip install the precompiled wheel matching my Python and PyTorch versions, since otherwise the nodes wouldn't load.
Any ideas?
4000 or 5000 series? This model doesn't work with RTX 5000. On Windows, Triton must be properly installed: install CUDA Toolkit 12.8 and MS Visual Studio Build Tools 2022, and set the correct system PATH.
@jice Oh wow, I did not know that. That makes sense then. I have a 5090.
@_degenerativeai_ For the 5000 series only the fp4 version works; int4 does not. You can try the fp4 version from here:
https://huggingface.co/mit-han-lab/svdq-fp4-flux.1-dev/tree/main but it's the original Flux.1 Dev version, not CreArt_Ultimate
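If you're not sure whether a checkpoint you downloaded is the int4 or fp4 variant, you can peek at its safetensors header without loading any weights: the format stores an 8-byte little-endian header length followed by a JSON index of tensor names, dtypes, and shapes. The tensor name and dtype tag below are made up for the demo; real Nunchaku checkpoints use their own conventions, so look for quantization-related key names in your actual file.

```python
import json
import struct

def read_safetensors_header(path):
    """Read only the JSON header of a .safetensors file (no tensor data)."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # 8-byte LE length
        return json.loads(f.read(header_len))

# Build a tiny fake .safetensors file to demonstrate the header layout.
# (Key and dtype names are illustrative, not Nunchaku's real ones.)
header = {
    "__metadata__": {"format": "pt"},
    "blocks.0.qweight": {
        "dtype": "I8",
        "shape": [64, 32],
        "data_offsets": [0, 2048],
    },
}
blob = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)))
    f.write(blob)
    f.write(b"\x00" * 2048)  # dummy tensor data

h = read_safetensors_header("demo.safetensors")
```

Running this on a real checkpoint shows every tensor's name and dtype, which is usually enough to tell the int4 and fp4 packings apart before ComfyUI tries to load the wrong one.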
@jice That worked, thanks. I wish it was your version, but I understand that the work wasn't cheap and you want to reach the most people with your efforts.
@_degenerativeai_ I will soon try to make an fp4 version of CreArt_Ultimate, but I don't know if I will succeed; maybe you need a 5000-series card to make an fp4 version.
Do I understand correctly that it can't be used with other regular LoRAs, even if the workflow converts them for SVDQuant normally? Also, if the KSampler config has refiner steps, is there any opinion on the settings, or should a refiner not be used? Thanks.
It can be used with any LoRA. I don't use a refiner/upscaler, so I can't advise on those parameters.