Flux.1-Dev Hyper NF4:
Source: https://huggingface.co/ZhenyaYang/flux_1_dev_hyper_8steps_nf4/tree/main from ZhenyaYang (Hyper-SD converted to NF4, 8 steps)
Flux.1-Dev BNB NF4 (v1 & v2):
Source: https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main from lllyasviel
Flux.1-Schnell BNB NF4:
Source: https://huggingface.co/silveroxides/flux1-nf4-weights/tree/main from silveroxides
ComfyUI: https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981
💪Train your own model: https://runpod.io?ref=gased9mt
🍺 Join my discord: https://discord.com/invite/pAz4Bt3rqb
Description
V2 is 0.5 GB larger than the previous version because the chunk 64 norm is now stored in full-precision float32, making it much more precise than the previous version. Also, since V2 does not have a second compression stage, it has less computational overhead for on-the-fly decompression, making inference a bit faster!
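For readers who want to map that description onto code: in the bitsandbytes integration shipped with transformers/diffusers, "no second compression stage" corresponds to double quantization being off. A minimal sketch, assuming you were quantizing a Flux-style transformer yourself through that integration (the exact loading call depends on your pipeline and library version):

import torch
from transformers import BitsAndBytesConfig

# v2-style NF4: the per-block (chunk 64) scales stay in full precision
# instead of being quantized again in a second stage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # 4-bit NormalFloat weights
    bnb_4bit_use_double_quant=False,  # skip the second compression stage
    bnb_4bit_compute_dtype=torch.bfloat16,
)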
Comments (132)
Update:
Always use V2 by default.
V2 is quantized in a better way: the second stage of double quantization is turned off.
The only drawback of V2 is being 0.5 GB larger.
Can we have a Schnell version 2?
@raidmachine132017712 I saw someone request it, so it may come soon, and it will surely be posted here too.
How do I run this on SwarmUI? I keep getting backend errors.
Swarm should have asked you whether you wanted to install the BNB/NF4 code when you first tried to run the model. If it did not, try restarting and generating again to see if it pops up. If you did do the install and it's still generating errors, make sure you aren't using a LoRA and try again. Some LoRAs are erroring out in NF4 models; most notably, anything specifically converted to run with ComfyUI will throw an error (whereas the original, unconverted versions seem to run normally).
I keep getting errors too. I updated SwarmUI and installed the additional items and I'm still seeing "All available backends failed to load the model." when I try to generate
v2 is running a lot slower than v1 on my 3070ti in Comfy - getting 160s on v2 compared to 70s on v1. Am I missing something?
Nope, you are not. It consumes more VRAM/RAM, which means slower performance on most cards.
On my 4070 the generation time is nearly identical; v2 may even be a tad faster.
You're missing VRAM, I guess
I think we can inherit something like GGUF terminology.
You probably don't have enough VRAM. On my RTX 3060 (12 GB) the speed is about the same as v1. When you run out of VRAM it spills over into system RAM instead, which makes it a lot slower. You should probably stick with v1 instead of v2, or find a way to free up VRAM (I think the difference between v1 and v2 is only about 500 MB of VRAM).
Also I think that Forge might use a bit less VRAM compared to ComfyUI so you could try that.
I thought so too, but that was my mistake; on my 4070 12GB, v2 was even a few seconds faster with the same prompt and seed.
Regular Flux.1 Dev works in my ComfyUI. Downloading this model and swapping it into the "Load Diffusion Model" node yields errors:
Error(s) in loading state_dict for Flux: size mismatch for img_in.weight: copying a param with shape torch.Size([98304, 1]) from checkpoint, the shape in current model is torch.Size([3072, 64]). (lots of similar lines...)
File "D:\work\ai\ComfyUI\execution.py", line 152, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "D:\work\ai\ComfyUI\execution.py", line 82, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "D:\work\ai\ComfyUI\execution.py", line 75, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i))) File "D:\work\ai\ComfyUI\nodes.py", line 863, in load_unet model = comfy.sd.load_diffusion_model(unet_path, model_options=model_options) File "D:\work\ai\ComfyUI\comfy\sd.py", line 648, in load_diffusion_model model = load_diffusion_model_state_dict(sd, model_options=model_options) File "D:\work\ai\ComfyUI\comfy\sd.py", line 639, in load_diffusion_model_state_dict model.load_model_weights(new_sd, "") File "D:\work\ai\ComfyUI\comfy\model_base.py", line 225, in load_model_weights m, u = self.diffusion_model.load_state_dict(to_load, strict=False) File "C:\Users\Chris\miniconda3\envs\sd\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(NF4 require "load nf4 checkpoint" node. It not work with regular load checkpoint.
Missed opportunity to put it behind a 10k buzz Early Access. /s
Do we have a Flux.1-Schnell BNB NF4 v2 model?
I compared them with a lot of prompts at a fixed seed; sometimes v1 is even better. Edit: v2 is not slower, it was a different prompt. I'll just discard v1 and stay with v2.
Hi, is there a v2 BNB for the turbo model, and is there any 4-bit version of the merged model?
need vae???
Nope, you don't. The VAE is baked into the model.
could you please explain or include in the description the meaning of BNB NF4?
BNB stands for bitsandbytes, the library that does the quantization (think compression with some quality loss). NF4 stands for 4-bit NormalFloat, the data type used: a "normal" model is released as a 16-bit model, while NF4 stores most weights in 4 bits, with some of the most important data remaining in 16/32-bit.
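To illustrate the block-wise idea behind NF4, here is a minimal, purely illustrative Python sketch. The 16-level codebook below is a uniform stand-in (the real NF4 codebook is a fixed set of values derived from normal-distribution quantiles, and real kernels pack two 4-bit codes per byte):

import torch

def nf4_like_quantize(w: torch.Tensor, block: int = 64):
    # Split the flattened weights into blocks of 64 values
    # (assumes the total size is a multiple of 64).
    blocks = w.flatten().reshape(-1, block)
    # One scale ("absmax norm") per block: this is the "chunk 64 norm"
    # that v2 keeps in float32 instead of quantizing a second time.
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    normed = blocks / scale                      # values now in [-1, 1]
    levels = torch.linspace(-1.0, 1.0, 16)       # illustrative codebook
    codes = (normed.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    return codes.to(torch.uint8), scale          # 4-bit codes + per-block scales

def nf4_like_dequantize(codes, scale):
    levels = torch.linspace(-1.0, 1.0, 16)
    return levels[codes.long()] * scale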
How do people share images with the metadata from Comfy? My images don't show the checkpoint name in the metadata.
Dude, name it Flux NF4V2 and leave the tech specs out of the name! What's next, adding the training spec sheet to the filename?
You can just rename it yourself
Soooo, if I download it and use it inside A1111, will I be able to generate images with text effects now?
You need Forge for it, right now. (https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981)
Nope, no auto1111 support yet
Quick question, if anyone would like to share their opinion.
Will a lower-tier card with 16 GB VRAM perform faster than an upper-tier card with 12 GB?
For example, a 4060 Ti with 16 GB VRAM versus a 4070 with 12 GB?
In other applications I know speed is on the 4070's side, as it is the more powerful card, but for Flux, is there a greater benefit from a faster core or from more VRAM?
If you use NF4, the 4070 will be faster, since 12 GB is enough to hold the model.
However, if you use fp8 the model cannot fit in 12 GB entirely, so the 4070 will be much slower. (I use a 4070 myself.)
I cannot speak to whether 16 GB is enough for fp8, though.
More VRAM helps more in Flux.
@snap2887 Thank you!
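For anyone weighing the trade-off above, a rough back-of-envelope in Python for the weights alone (assuming the commonly cited ~12B-parameter Flux transformer; text encoders, VAE and activations add more on top):

# Rough weight-memory estimate for a ~12B-parameter transformer.
params = 12e9
for name, bits in [("nf4", 4), ("fp8", 8), ("fp16", 16)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")  # nf4 ~5.6, fp8 ~11.2, fp16 ~22.4

This is why NF4 fits comfortably on a 12 GB card while fp8 is a squeeze.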
We need Fooocus Flux! lllyasviel!
He might have already started working on it... :)
badly want flux for fooocus 😫
Update already released, guys. Forge is working great with every Flux model, plus LoRAs and upscalers.
@EKKIVOK That's true, it works in Forge without any problems. But Fooocus is so much sleeker when it comes to the overall application feel. I use Fooocus only because its interface is so much less cluttered compared to the alternatives.
Upvote! Upvote!!
Do I need a VAE for this? If so, which one? I tried to download ae.safetensors and got a 404 error.
No, you don't need anything; just the model and go!
@RalFinger What app do you recommend? I have A1111, WebUI Forge, ComfyUI and Fooocus.
@jaydoubleu_ I am running it with Forge (always update!)
@RalFinger Thank you for the info. I'll try it in a few.
@RalFinger It works... but wow. So slow! Time taken: 1 min. 42.5 sec. for one photo. :(
@jaydoubleu_ With Forge it now only loads the model at the initial generation, so subsequent images will come out faster once it is loaded.
You need the model and a Stable Diffusion platform like Forge that supports Flux; it's a whole other category.
It's a good model, but can it generate a female face without a square jaw and a dimpled chin? I want to see a round face with a smooth chin. I tried prompts like "round face", "without dimple on chin" and a bunch of similar ones, but without success.
Also, it looks like this model is incapable of generating a female with small breasts. Every picture of a female came out with medium to huge breasts. My wife has small breasts, and I wanted to generate a picture with a body as close as possible to hers, but I couldn't. Looks like the Pony model is still the best option for me, at least for now, even though Pony is not as accurate at prompt coherence as Flux.
Yeah, the cleft chin and strong jawline do seem to be a bit of an issue with the main FLUX models. Hopefully as the community finetunes it and adds in more image sets it will be ironed out; however, I'm kind of surprised that Black Forest Labs weren't aware of the bias before releasing the model.
@Mirabilis Yeah, it's a little bit strange. I thought the model should be truly unbiased in every aspect, no matter what that aspect is - face shape, breast size or anything else.
Well, let's see how it goes. Anyway, progress is inevitable.
Also the faces always seem to have some weird coloring in the eye corners, or is that just me?
@JayNL You're not alone. I hadn't noticed it until you mentioned it, but the inner eye corners really do have the same weird pale color in every female face. Looks like the image set Flux was trained on is flawed.
Came here to comment this. I've tried prompting different facial structures (e.g.: thin lips, round smooth chin, etc), given them names (e.g.: Anna, Jessica, etc. also actresses names), age (18yo, mid 30s, etc), ethnicities (Scandinavian, Latina), long/detailed vs short/keyworded prompts, but they all result in women having the same bone structure. It seems men also suffer from the "samefaceitis syndrome".
Perhaps it's a side effect of quantization (tested with Dev NF4) rather than a strong bias? It's quite inconvenient to load another model just to work around it with inpainting/detailer.
@Nexdoor Looks like some "square-jaw lover" forgot to add other female face types to the dataset. I doubt it's a side effect of quantization.
@muggzzzzz Try using some celebrity LoRAs at low weight, e.g. 0.5 ;)
@sevenof9247 "FLUX is the best model ever, but you should use LoRAs to get what you really want." Does that sound funny? I'm pretty sure it does, LOL.
I think Flux is over-hyped, at least for now.
@muggzzzzz Which LoRA is better to use for women's faces? Nothing helps. Women in Flux are equally ugly, and actors' names and styles do not apply to them.
@kiryanton930 I stopped using Flux because of all of the issues you mentioned, and went back to Pony based models.
OK, today I noticed something strange when using the Schnell model in Forge.
Before generation starts, there is a message in the cmd window that says "Distilled CFG Scale will be ignored for Schnell".
Funny, because changing the DCFG does have an impact on the picture, so what is this about? If Flux doesn't use negative prompting or CFG, and it ignores its own DCFG... what is it basing the images on?
I appreciate all the reactions, thank you!
Except the fellow who pressed thumbs down. What exactly do you disagree with? A random observation expressed as is?
This model doesn't support LoRA?
It doesn't seem to be working. I tried using different LoRA files for Flux, but they don't seem to be compatible. There might be something that needs tweaking.
@Kiratinix Current LoRAs are meant for the original Flux.1.
@bananaisgod Flux D or Flux S? Or the original Flux base model?
@Kiratinix I've tried different checkpoints and it seems to work really well; try mklanfluxdev (Mklan-Flux-Dev-V1-FP8, CLIP & VAE included - Mklan-FluxV1 | Stable Diffusion Checkpoint | Civitai).
@pazuzu666 I tried it with Flux D, and it works with the LoRAs, but the strange thing is that some LoRA files work and others don't. Maybe it's something to do with the sampler method.
@pazuzu666 Well, that is not an NF4 model anymore, which means it will run much slower on weak video cards.
@Kiratinix Which LoRA were you trying to use?
I noticed that when installing ComfyUI_bitsandbytes_NF4 [EXPERIMENTAL] via the manager, there is an alert that NF4 doesn't support LoRA.
More likely, the LoRA doesn't support this checkpoint.
LoRA works nicely in this model. Perfect results.
@AIStudio80 They don't work for me; I'm using Forge.
I tried v1, v2 and qq in ComfyUI (Dev, 8 GB VRAM 4060 Ti / 32 GB RAM, 1024x1024) and got 40 s/it. /// UPDATE: Forge, 3.37 s/it on Dev.
I get a steady 1 second, maybe 2 due to internet lag; near flawless, with Flux.1-Schnell at only 4 steps.
@jconn2602399 in Comfyui or Forge?
I don't know about other frontends, but in FORGE 2.0, a LoRA made for the regular DEV version can now be applied to this model; at least it has started to work now, whereas before it gave an error. But there is a small nuance... apparently, because the LoRA was made for the DEV version, generation is slow: not at the speed of the NF4 model, but exactly like regular DEV.
It seems that flux1-dev-bnb-nf4-v2, which I have in Forge through Pinokio, doesn't work with Flux LoRAs yet. I tried with a Chloe Grace Moretz LoRA and it just didn't generate anything and got stuck.
@magmirant0011 That means something is not being updated. The person currently maintaining FORGE updates various things there several times a day.
--disable-cuda-malloc (not sure of the syntax for turbo, but guessing it is the same problem) is the only way I can get LoRAs working in ComfyUI.
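For reference, that flag is passed on the command line when launching ComfyUI (assuming a standard checkout, started from the repo directory):

python main.py --disable-cuda-malloc

It tells ComfyUI not to use CUDA's async malloc allocator; I can't promise it fixes every LoRA error, but it is a commonly tried workaround.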
Every single female looks like Ana de Armas. She's good looking, right... but not every woman looks like Ana de Armas. (Seems to be the case with most models...)
Who is she? Link?
I noticed that when you use a LoRA for the first time it has to convert, which can take an astronomical amount of time; for me it took a few restarts of the UI itself before it finally started cooking.
In WebUI Forge, set GPU Weights (MB) to "7000".
@sevenof9247 I have it at the default of 23000; is that wrong?
I read that NF4 will probably be deprecated in favor of GGUF, which gives better results, so if you're new to this, don't bother. I've never looked back at NF4 since switching to GGUF.
So GGUF is better than NF4? Can you tell me a GGUF model to try? :)
@Frankcis RalFinger posted almost all of them, from Q2 to Q8
NF4 is still the fastest quant and gives comparable results to Q4_0, but Q4_K_S is already noticeably better.
@Nelathan Yeah, they're not my words. I wanted to use NF4 again, so I had to download bitsandbytes, and the GitHub page said as much; so I thought, well then, let's just go for GGUF. Link here:
https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
Question...
Is there a real trigger for NON-bokeh???
Everything I tried doesn't work (16mm, iphone, nonprofessional, etc.)
Try describing the background before the foreground/subject. If that fails, add more useless details like "the sky is blue". No idea why that works, but it does. The same flaw exists in every version, AFAIK.
@randomlettershdhtyxod Yes, I wrote an article ;)
DEV 2.90 it/sec, ForgeUI, 8 GB RTX 4060 Ti / 32 GB RAM, image --> https://ibb.co/7SLsnvm, 20 steps, one step takes 2.9 sec.
That's 2.9 s/it, not 2.9 it/s. They're completely different, and one is way faster than the other: 28 s/it does not equal 28 it/s on my GTX 1060, because that would mean I could generate 28 steps in 1 second instead of taking 28 seconds to generate 1 step.
@Gemini__ one step 2.9 sec
I feel like we have a similar setup, but I haven't tried 1 step. 30 steps take a little over 2 minutes per image for me.
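To make the units above concrete, a quick sanity check in Python, with numbers taken from the comment above:

steps = 20
s_per_it = 2.9              # seconds per iteration, as reported
print(steps * s_per_it)     # ~58 s per 20-step image
print(1 / s_per_it)         # ~0.34 it/s: the reciprocal, not 2.9 it/s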
Version 2 is really very good on an 8 GB GPU.
Please always share the recommended settings for the model! Sampler, scheduler, steps!?
We can't even start using a model without that basic info.
I am also getting many out-of-focus and blurry images!
You share no images and no prompts, so no one can help you...
Use Euler / Simple, 20 steps.
You did not follow Kanto yet?
DEV 2.90 it/sec, ForgeUI, 8 GB RTX 4060 Ti / 32 GB RAM, image --> https://ibb.co/7SLsnvm
@sevenof9247 It's customary to provide information about your checkpoint and how to generate with it in the release description, rather than the other way around. Without doing this, one could be considered lazy or careless about the things they release.
Again, the question is "What are the generation settings?", and replying with "you share no images" is not a reply to that question.
This is the best out there for someone like me with 8 GB VRAM. Realism is top-notch. Running in Forge without extra CLIP/T5. Great work, thanks!
Why doesn't LoRA work in ComfyUI?
I tried the NF4 model with an XLabs LoRA, but I get an error after queueing.
Is it possible to use T5 with this? Or does it already do so? I'm using this in Forge, and while it runs without turning on CLIP, T5, etc., I can't seem to get any T5XXL to work. The fact that it works without one makes me assume it already has a text encoder in it or something.
Fairly sure that the NF4 from lllyasviel has the T5, CLIP-L and AE already included, hence the larger size compared to, say, Q4.
See here: https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4 :
- Main model in bnb-nf4 (v1 with chunk 64 norm in nf4, v2 with chunk 64 norm in float32)
- T5xxl in fp8e4m3fn
- CLIP-L in fp16
- VAE in bf16
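One way to check what a checkpoint bundles is to list its tensor names. A minimal sketch using the safetensors library (the filename and the exact key prefixes are illustrative; naming varies between checkpoints):

from safetensors import safe_open

# List tensor-name prefixes to see which components the file contains.
with safe_open("flux1-dev-bnb-nf4-v2.safetensors", framework="pt") as f:
    prefixes = {key.split(".")[0] for key in f.keys()}
print(prefixes)
# An all-in-one checkpoint shows transformer keys alongside text-encoder
# (CLIP-L, T5) and VAE keys; a bare "unet only" file shows just the transformer.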
I tested both models, Dev and Schnell, and the latter took 20 s more than the former 🤔 Is that possible? Tested in Forge with an RTX 3060 mobile, 6 GB.
I'm at a loss here. I have tried the model downloaded from here and from the Forge repo, and I can't get the resources/models baked into the PNG, so my images are essentially invisible because people sort by Flux, obviously. HOW can I make it work?
You can try ComfyUI. Some pictures uploaded here have node and prompt info; just download one and drag it onto the ComfyUI web page and you'll get everything you need. Install the missing nodes by clicking the buttons in the ComfyUI Manager and check the model files configured in the workflow; then you can make it work.
Upload them to the model you used and they will be tagged with it.
@dc802 Ah that's actually brilliant, I'll give that a try thanks!
@Unicore I didn't mention, but I already tried comfy and it wouldn't show either x) thanks though!
@Jhaku looks like it worked
@dc802 It has, thanks
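For anyone debugging this, PNG generation metadata lives in the file's text chunks, which are easy to inspect. A minimal sketch with Pillow ("output.png" is a placeholder; Forge/A1111 typically write a "parameters" chunk, while ComfyUI writes "prompt" and "workflow" chunks containing the node graph as JSON):

from PIL import Image

img = Image.open("output.png")
# Print each embedded text chunk, truncated for readability.
for key, value in img.info.items():
    print(key, "->", str(value)[:80])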
Great work on this, it was a game changer for a ton of things.
It's hard to get something out of this model. Mostly black or dark and unsharp images.
The speed is horribly slow (15...30 s/it) compared to XL (1.5 to 3 it/s).
I used some of the workflows from the images below, but not with good results, so I wonder how they did it...
Today, a few months later, I'm using it often in ComfyUI. Compared to Forge it's much faster.
Importantly, I can use almost any sampler.
The speed is about 4...5 it/s for 1 Mpixel, 3 times faster than Forge on my system.
Flux has 12 billion parameters; SDXL was around 3 billion, if memory serves. It's a larger model, so it requires more resources and is naturally slower on the same hardware, but as far as results go, SDXL can't come close to the stuff I generate. I've all but left SDXL behind at this point. I run Forge SD on a 3090 Ti and it generates 3-4 images in about 1 min 20 s.
@cainezen Thanks for the comparison. On my system with a 3090 Ti 8GB it takes 5 min per image with this model (24-core P7, 128 GB RAM).
If I use the regular full model, which is much larger, it takes 1.5 min per picture, or 8 s/it.
With this model the pictures are really dark, unclear and unsharp; nothing is really recognizable (5 min to render a black picture...). With the regular full model, everything is really clear and crisp.
I prefer ComfyUI for rendering, because all the FLUX outputs are black on my Forge installation. The speed is the same in ComfyUI, but with good pictures on the regular models (FP8, Schnell and Full).
I found my problem in Forge: the sampler recorded in the generated images is often "DPM++ 2M SDE", which is wrong; FLUX only works with "Euler", "Simple". Also, the VAE "ae.safetensors" has to be specified.
The image data also contains a workflow, generated by Forge, which mostly does not work!
@dschonich Euler BETA is better.
@dschonich That is a good point. It's very important to follow the configuration guidance for Flux on the Forge SD Github page. I set mine up that way from the beginning. Also, I have 64GB of system RAM so I use "Shared" as my main method instead of CPU and it cuts about 15 seconds off generation time.
@dschonich sorry for responding so late but thanks for this, solved my issue I was having with it
Is it just me...?
Today this model seems to be running faster: 3.19 s/it. Now an image only takes 1:03 min at 896x1152 on an RTX 3060 with 12 GB VRAM. Did something get updated in Forge last night?
Files
flux1DevHyperNF4Flux1DevBNB_flux1DevBNBNF4V2.safetensors
Mirrors
pilgrimFlux_pilgrimFluxV10.safetensors
flux1DevHyperNF4Flux1DevBNB_flux1DevBNBNF4V2.safetensors
jedpoint_v10.safetensors
flux1-Dev-NF4V2.safetensors
pilgrimFlux_pilgrimFluxV20.safetensors
flux1-dev-bnb-nf4-v2.safetensors
pixelforge-v2.safetensors
