Flux.1-Dev Hyper NF4:
Source: https://huggingface.co/ZhenyaYang/flux_1_dev_hyper_8steps_nf4/tree/main from ZhenyaYang (Hyper-SD 8-step model converted to NF4)
Flux.1-Dev BNB NF4 (v1 & v2):
Source: https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main from lllyasviel
Flux.1-Schnell BNB NF4:
Source: https://huggingface.co/silveroxides/flux1-nf4-weights/tree/main from silveroxides
ComfyUI: https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981
💪Train your own model: https://runpod.io?ref=gased9mt
🍺 Join my discord: https://discord.com/invite/pAz4Bt3rqb
Comments
For me, 12 steps works better with the sampler -> [Forge] Flux Realistic
What's the difference between Hyper and BNB?
Fewer steps needed, faster, less quality
@luchetes oh. Got it now. Thank you.
which one is which?
I think the greatest difference is in rendering time. Otherwise, the difference in definition/quality is marginal.
See This: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2F4gb-vram-using-hyper-flux1-dev-nf4-checkpoint-for-8-steps-v0-kt6wqtke2oqd1.png%3Fwidth%3D1591%26format%3Dpng%26auto%3Dwebp%26s%3D6f1b716940c916c33e853c0fdf83f7560d2b22ab
I'm currently using: flux-hyp8-Q8_0.gguf.
Could these give better quality?
No, this is for weak graphics cards
Guys, this model (`Flux.1-Dev Hyper NF4`) already has a VAE. So, you don't need to use ae.safetensors, for example. Good creations to everyone! s2
How do I use this in ComfyUI? I can't load the model.
@blo01
The flux models need to be placed inside the UNET folder: /models/Unet, unlike the others, which are stored in the checkpoint and/or stable diffusion folders.
@SpawnBTC I moved it, but how do you use it in ComfyUI?
"prompt_id": "f69ce5c2-6885-4824-900d-2006ce9a4e7b",
"node_id": "CheckpointLoader_Base",
"node_type": "CheckpointLoaderSimple",
"executed": [],
"exception_message": "Error(s) in loading state_dict for Flux:\n\tsize mismatch for img_in.weight: copying a param with shape torch.Size([98304, 1]) from checkpoint, the shape in current model is torch.Size([3072, 64]).\n\tsize mismatch for time_in.in_layer.weight: copying a param with shape torch.Size([393216, 1])
Did you find solution to this?? I'm facing the same issue
@dhillon_karan03405 Unfortunately not. The problem is still there.
@dhillon_karan03405 @DashaLuniowa hey guys. Can you explain in more details what UI you are using, what you are trying to create? Is it just the base model giving you this error (never seen before). More context would help to see if we can figure out the error :)
@RalFinger Solved. In my Stability Matrix setup the error appeared because the standard loader was being used; switching to the NF4 loader fixed it.
For others who do not have a choice of loader: make sure the model is in the correct folder (that will help some of you, I hope).
UPD: Unlike FluxDev, the NF4 model only partially loads and then generation never starts. It seems 6 GB of VRAM is not enough for it, which is strange.
@DashaLuniowa thank you for the clarification and the update!
I really wanted to like it but I'm getting an extreme level of sameface even when prompting heavily for differences.
Welcome to Flux
Use a lora.
@5c0f4n0 What lora have you found that helps with the issue?
Is there already a LoRA stack for Hyper-NF4 in ComfyUI?
Does this go in the checkpoint folder or the Unet folder?
Flux Models (comfyui) go into models/unet.
It would be pretty cool if you could add a line or two to the description of the models (NF4, l8a, GGUF) saying where they should be saved and whether everything has already been baked in. I've just added NF4 support back to my workflow and it was a bit confusing, tbh :P
anyway, thanks for your work <3
Good idea, I can rework that!
@RalFinger that would be awesome, thanks a lot! 👌
Someone help me. I want to use Flux. I have a 12 GB 3060 and an AMD Ryzen 5 PRO 4650G processor. Is it possible to run Flux with that? If so, I'd appreciate some guidance. Thanks for the help, I'm just starting out in this world.
I use a Ryzen 5 3600 and a 3060; it works well. The Hyper version even works with 4 GB of VRAM.
@so_ha_ What else do I need to download, friend? Which models, and what settings? Could you send me that info, friend? :D
@knfel You must download clip_l.safetensors and t5xxl_fp16.safetensors and put them in the models/clip folder, and ae.safetensors and put it in the models/vae folder. You can't run Flux in Automatic1111, so you should use ComfyUI or WebUI Forge. In a ComfyUI workflow, use the DualCLIPLoader node for clip_l.safetensors and t5xxl_fp16.safetensors, and a Load VAE node for ae.safetensors. In WebUI Forge, at the top you should see the VAE/Text Encoder option; you need to select all three files there, and on the right side you should see the "Diffusion in low bits" option, where you choose "Automatic (fp16 LoRA)", otherwise LoRAs won't work. Use Euler as the sampler and Simple as the scheduler (you can try different samplers and schedulers too, but be aware that most of them will produce only noise). Sampling steps is 20 (or 8 with this model), Distilled CFG scale should be 3.5, CFG scale 1. Also, reduce the GPU Weights option if you have memory problems.
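If it helps to double-check that layout, here is a minimal sketch that verifies the files mentioned above are where ComfyUI expects them. It assumes a standard ComfyUI portable/desktop folder structure, and the checkpoint filename in it is only a placeholder:

```python
# Minimal sketch: check that the Flux support files described above are in the
# folders ComfyUI expects. Adjust COMFY_ROOT to your install; the checkpoint
# filename below is a placeholder, not a required name.
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")
expected_files = [
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
    "models/vae/ae.safetensors",
    "models/unet/flux1-dev-hyper-nf4.safetensors",  # placeholder filename
]
for rel in expected_files:
    status = "OK     " if (COMFY_ROOT / rel).is_file() else "MISSING"
    print(f"{status} {rel}")
```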
@tenej Thank you very much for the detailed explanation, friend!
@tenej amazing explanation, thank you so much!
Hi. I found this: https://civitai.com/models/768836/pilgrimflux - it has the same hash as your model X-flux1-dev-bnb-nf4-v2.
I assume this does not support Flux-dev LoRAs?
I get a ".to() does not accept copy argument" error when trying to load Flux LoRA with the KSampler.
SD LoRA go in the Sampler without an error, but -not very surprising- seem not to have an effect on the result.
Am I doing something wrong? None of the example images seem to feature a detailer or character LoRA for me to try.
I use ComfyUI with silveroxides UNET loader.
Thanks in advance! Great model btw!
Trying to figure out the same thing right now lols :l
Has anyone figured this error out? I'd love to use it in SwarmUI without having to worry about wrangling Comfy node workflows, but not being able to use LoRAs is a dealbreaker.
What's the difference between Hyper and the previous BNB NF4 version? (I'm new to all this.)
+1
I love this checkpoint. It allows me to run flux on my 1080ti without much problem. At 10 steps it generates amazing images and the speed is manageable.
Hi - I love the images you create here and wondered if you wouldn't mind answering some questions to help a noob? I've been reading lots of articles and watching videos to figure this out. I use ComfyUI and have things almost working - but some things are different. For example, this page: https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4 says the loader should be named 'CheckpointLoaderNF4'; however, I only see 'Load NF4 Flux Checkpoint' in ComfyUI. I know, probably a noob question, but there are other small differences as well - for example, I'm trying to base my workflow on the Flux examples here: https://github.com/comfyanonymous/ComfyUI_examples/tree/master/flux - but I don't see an option for 'Guidance', which is something you specify in the images you've uploaded. It also isn't totally clear whether the whole thing is now completely deprecated, given that the GitHub link above says this should move to GGUF. If you have any suggestions - thank you in advance~!
@apprewired No idea. I use Forge (based on Automatic1111).
So what is the difference between these models and the other Flux checkpoints? I see so many and don't know which to use. Do I have to use this one since I have 12 GB VRAM, or can I get away with using the normal basic fp8 model? Or will the outputs be the same anyway? I'm overwhelmed with so many Flux checkpoints.
Me too, I don't understand anything
These Flux.1 model variants utilize different quantization techniques, such as FP16 and NF4, to optimize performance, catering to various deployment scenarios and hardware limitations.
FP16 refers to 16-bit floating-point precision, which reduces the memory footprint and computational requirements compared to the standard 32-bit floating-point (FP32) precision. This reduction is beneficial for deploying models on hardware with limited resources. In most scenarios, FP16 precision leads to faster rendering times compared to FP32 (32-bit precision) because it reduces memory usage and computational load. This is particularly true on GPUs optimized for FP16 arithmetic, such as modern NVIDIA GPUs with Tensor Cores. The trade-off is that reducing precision can lead to minor quality loss in the model's calculations. However, this precision loss is negligible for many use cases in image generation, and the speed gain is worth it.
NF4 stands for 4-bit NormalFloat quantization, a technique that further compresses model parameters to 4 bits. This compression significantly decreases model size and enhances inference speed, making it advantageous for deployment on devices with constrained memory and processing capabilities. NF4 quantization is more aggressive than FP16, reducing weights to 4-bit representations. This further reduces memory requirements and increases inference speed by minimizing the data that the model needs to handle. The downside is that compressing to 4 bits can potentially lead to loss of detail or lower accuracy in some model outputs. You might notice that results aren't as sharp or precise for highly detailed or complex image generation compared to higher-precision weights. It is generally used when speed and efficiency are more critical than having the highest possible quality. This can be particularly useful for rapid prototyping or running models on hardware with minimal resources (e.g., older GPUs).
Checkpoint-trained models are pre-trained models saved at specific points during the training process. They serve as starting points for further fine-tuning or can be used directly for inference tasks. These models are not base models; instead, they are derivatives fine-tuned for particular tasks or optimized for specific hardware configurations.
I hope this helps!
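To put rough numbers on the FP16 vs. NF4 trade-off, here is a back-of-the-envelope estimate only; it ignores activations, the text encoders, the VAE, and the small per-block constants that NF4 quantization stores alongside the weights:

```python
# Approximate VRAM needed just for the Flux transformer weights at different
# precisions. Flux.1-dev has roughly 12 billion transformer parameters.
params = 12e9
for name, bytes_per_param in [("FP32", 4.0), ("FP16/BF16", 2.0), ("FP8", 1.0), ("NF4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name:10s} ~{gib:5.1f} GiB")
# FP16 comes out around 22 GiB, which is why the full-precision checkpoint spills
# out of a 12 GB card, while NF4 at roughly 5.6 GiB leaves room for CLIP/T5, the
# VAE, and LoRAs.
```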
@Hyokkuda What is the difference between Dev and Schnell?
What is Hyper?
What are BNB and GGUF?
Your model is the only one that fits into my RTX 3060 video card with 12 GB of VRAM (and there is still plenty of room left for LoRAs). Generating at 8 steps takes 40-60 seconds.
The original models do not fit into memory, which is why generation takes up to 7 minutes.
With all the other models I can't go below 20 steps, otherwise quality drops sharply (even though you wrote 8). Only with the FLUX.1-Turbo-Alpha LoRA was I able to get down to 8 steps.
Your models are not cut down, and everything works. I don't understand why other people strip them: they end up at an attractive-looking 10 GB or less, but in practice those models don't work without the additional CLIP, T5, and VAE files (by the way, what are those for?). So after adding back all those removed parts, the total size is large again.
I generated the last images I posted with your Flux.1-Dev Hyper NF4 checkpoint, but the site loses it when adding an image and there is no way to change it.
@kiryanton930, I want to apologize in advance for that wall of text.
Also, I am unfortunately not the creator of this base model.
As for the Dev versions of models, they are typically aimed at ongoing development, which means they may include new experimental features, hyperparameter settings, optimizations, or weights that haven't been fully tested or tuned. They might also allow the generation to be more flexible but could be less stable.
Schnell is German for "fast" (Black Forest Labs, the company behind Flux, is a German lab, hence the name). The Schnell versions of models are optimized for speed and efficiency, with reduced computational requirements and far fewer sampling steps for quicker inference. They achieve faster generation times, but this sometimes comes at the cost of slightly lower quality or a reduction in certain details.
Hyper refers to Hyper-SD, an acceleration (distillation) method that has been merged into this checkpoint. It lets the model converge in far fewer sampling steps (8 here instead of the usual 20+) while keeping the core model architecture intact, so you trade a little fidelity for much faster generation.
BNB is an abbreviation of BitsAndBytes. It’s a library used for optimizing GPU memory usage when loading models. It can help to quantize the models to smaller bit representations, such as 8-bit or even 4-bit, reducing VRAM requirements while retaining as much quality as possible.
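For what "BNB NF4" looks like in code, here is a rough sketch using the diffusers library. It is an illustration of the quantization idea, not how Forge or the ComfyUI NF4 loader work internally, and it assumes a recent diffusers release with bitsandbytes quantization support plus enough disk/VRAM to pull the official weights:

```python
# Hedged sketch: load the Flux.1-dev transformer with bitsandbytes NF4
# quantization via diffusers, then run an 8-step generation.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # math still runs in bf16
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM manageable on 8-12 GB cards

image = pipe(
    "a photo of a red fox in the snow",
    num_inference_steps=8,   # the Hyper merge targets ~8 steps
    guidance_scale=3.5,      # Flux "distilled CFG"
).images[0]
image.save("fox.png")
```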
GGUF is a binary file format (originating in the llama.cpp ecosystem) used to store quantized model weights. GGUF Flux checkpoints come in several quantization levels (Q8_0, Q4_K, and so on, like the flux-hyp8-Q8_0.gguf mentioned above) that trade quality against VRAM use, further reducing the memory footprint by compressing the weights.
When a model is "cut," it’s essentially stripped down to reduce the size, often by removing auxiliary components or dependencies like CLIP, T5, or VAE.
CLIP is used to better understand or guide what should be generated based on input text. If missing, the model might struggle to correctly interpret prompts. I could better explain what CLIP is for you and how it works if you want, but that's gonna be a long topic.
T5 is a text-to-text transfer transformer used in some models for handling more advanced text understanding or prompt generation. Without it, models can lose context understanding.
VAE is used to encode and decode the latent space representation into images, which helps in final output quality. If missing, the quality can suffer significantly.
When models are trimmed to reduce size, the aforementioned components are often removed, resulting in a smaller file, but users then need to provide these components separately to use the model properly, which can be frustrating because it negates the space-saving benefits.
Your GPU benefits from models that have been carefully tuned to fit within the VRAM without compromising core functionality. FLUX.1-Turbo-Alpha model's efficiency allows it to generate with only 8 steps, resulting in relatively fast performance. This kind of setup benefits from optimization techniques and a more straightforward architecture, enabling lower step counts. Other models, which require 20+ steps to produce good quality results, are likely not optimized the same way. However, reducing step counts can drastically affect quality if the model isn't designed for fast convergence.
Also, I am not sure I understand your last sentence about the website losing it when adding an image? If you are having trouble generating images via Civitai website, you can always try Stable Diffusion Forge. Stable Diffusion Forge is specifically designed to enhance resource management and speed up inference, making it well-suited for GPUs like the RTX 3060. Therefore, integrating that into your workflow can provide a more efficient and streamlined experience, especially when working with hardware that has limited VRAM. If you're using an image as a reference and encounter issues, try opening it in an image editing program and save it in a different format. This can help resolve potential color profile or metadata issues that might interfere with the rendering process.
Again, I hope this answers most of your question. And I am sorry again for this large wall of text. u_u;
@Hyokkuda "Also, I am not sure I understand your last sentence about the website losing it when adding an image?"
I generate an image on my PC and upload it to the site. Usually, all the LoRAs and checkpoints used are displayed on the right in the description, but with Flux the entries for some LoRAs and the checkpoint disappear. I used to think it was because I downloaded the checkpoint from another site, but now I downloaded the checkpoint from this site and nothing has changed; the checkpoint is still missing from the description. It's some kind of bug.
Forge is what I use
Ah, I understand now, @kiryanton930. I believe this issue may be related to Stable Diffusion Forge. When revisiting some older prompts, I often find that certain key settings are missing, which I need to determine manually—something that can be quite frustrating at times. :(
I recommend trying a different software, using the exact same prompts and settings, and then comparing the outputs using PNG Info or the FileOptimizer software to read (or edit) the metadata.
@Hyokkuda
Well, all the information is present in the metadata.
Maybe the problem is in the formatting, but I don't create the metadata, it is created automatically.
Maybe we can somehow call the authors of the site so that they can determine what the problem is.
An example of metadata in which information about the model disappeared after uploading to the site:
Steps: 8, Sampler: [Forge] Flux Realistic, Schedule type: Beta, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 502053333, Size: 896x1152, Model hash: 6e3e5990e9, Model: flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4, Lora hashes: "flux.1_lora_flyway_Epic-detail_v2: 2DBB61AC85E1, FLUX.1-Turbo-Alpha: e5e0c5d5201b, Comic book V2: 9a710c809fdb", Discard penultimate sigma: True, Beta schedule alpha: 0.6, Beta schedule beta: 0.6, NGMS: 4, Version: f2.0.1v1.10.1-previous-545-gf5190349, Diffusion in Low Bits: bnb-nf4 (fp16 LoRA)
Sorry for the delay, @kiryanton930. Is it happening for every generated image or just one image? Or is it for every image from this model in particular? You could record a short video to upload and share your experience. Or share your picture with someone else who can try to upload it and see if the issue occurs for them, too.
@Hyokkuda For every image. I created a ticket with the developers; their answer was "yes, we know it doesn't work, we can't do anything."
Oh... well, that is pretty underwhelming, @kiryanton930. :/ Sorry you have to go through that.
@Hyokkuda Can LoRAs that were trained on Flux.1 D use Flux.1 D Hyper as the base model?
To my knowledge, @funtimequest363, yes, LoRAs trained on Flux.1D can be used with Flux.1D Hyper as the base model. Since the two models are related, using Flux.1D Hyper might result in enhanced detail or better fine-tuning control compared to the original Flux.1D, depending on the modifications in the Hyper version.
seems like it actually lets me run flux. nice thanks
Very natural imitation of photos from SLR cameras. Simply amazing!
I love this model as a base: it generates from the Flux Dev base with less censorship, fast performance, and fp16 LoRA support.
Now I want to create a character lora from this checkpoint, could you guys share your best FluxGym settings?
How can I run Flux.1-Dev Hyper NF4 in Comfyui?
You need the custom loader instead of the core one. See Cluster's comment.
If you have Manager installed or something, you can just search for "NF4", see if any are the new loaders.
Try CheckpointLoaderNF4 (bitsandbytes_NF4)
thanks guys, NF4 loader helped!
@stavros247 It's also in the description of the model; you just need to read it.
Works perfectly for me on an RTX 3060 Ti with 8 GB VRAM. Maybe I'll abandon all the SD1.5 and SDXL models I have and just stick with Flux; I'm happy with the results. I have more than 190 GB of models on my PC. I'm using Forge.
which version is it? please?
Any tutorial on how to install and use this, please?
https://www.youtube.com/watch?v=HTBCDdn_GSY - good tutorial
With Flux.1-Dev Hyper NF4 I'm looking at 13.3/it at 936 x 1856, and that's with an upscaler... does this make sense? It is quite slow... RTX 3060 12 GB video card, thanks.
Yes, it makes sense. Use one of the default resolutions (832x1248, 896x1152, 1024x1024). It doesn't really make sense to use a higher resolution than that. Upscale the image if you want to have it bigger, but don't generate it at such a high resolution.
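As a quick sanity check on why that big canvas is slow, here is some rough arithmetic; Flux attends over image tokens, so cost grows at least linearly, and in the attention layers roughly quadratically, with the pixel count:

```python
# Compare the requested canvas with a default Flux resolution.
hi_res = 936 * 1856    # ~1.74 MP
base_res = 896 * 1152  # ~1.03 MP
print(f"{hi_res / base_res:.2f}x more pixels per step")  # ~1.68x
# On top of that, an upscaler pass adds its own time, so a much longer
# per-iteration figure on a 12 GB card is expected rather than a bug.
```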
Maybe the generation settings are not correct? I get 3.8 s/it on my 2060 Super; a picture averages 1 minute 40 seconds.
My favorite model! 2060 Super, very good results, average image generation time 1 min 35 sec.
great work! nice checkpoint. I love it....
ZLUDA is not supported.
can someone please share his/her workflow for comfyui because I cannot make this work no matter what I've tried? Thank you in advance
Hello, has anyone using Flux NF4 All in One models in Comfyui been getting errors on their workflows since last update? Error: mat1 and mat2 shapes cannot be multiplied (1x1 and 768x3072)? Is there a fix or just wait for an update?
Yes broken
Hey guys, this sounds like a Comfy or Node problem, the model is fine
@RalFinger Hello, yes, I meant ComfyUI, not the model. I had to move some things around in Forge to be able to use them with an RTX 5080.
Comfy sucks. All they do is advertise themselves as supporting the latest and greatest models, but at its core they're shallow and don't give a damn about foundational improvements. I was having the same issue while using the bits_n_bytes loader, which was created by the devs. Not a single person on their Discord responded to my issue. If you wanna use Flux on Comfy, stick to GGUF.
I'm not able to run ComfyUI through Colab Pro with an L4 or A100 GPU. Can someone please help me figure out which workflow I should use?
Try installing it locally with the desktop or portable version, and use lower-end models if your graphics card is feeble. Use SD1.5 models with a baked-in VAE, or go for SDXL models if your graphics card can take it. SDXL is somewhat comparable to Flux Schnell; use the Juggernaut or DreamShaper XL models.
loras don't work for some reason
working fine for me
Hello, I'm getting the error "mat1 and mat2 shapes cannot be multiplied (1x1 and 768x3072)" with every workflow I try.
Do you know if there's a fix?
Or is the Comfy NF4 loader broken?
I've tried the AIO models and the UNET ones; nothing works.
Check that you selected the correct model type, "flux", in the CLIP loader.
Can you tell me which is better to use in Forge?
Sampling method -- ?
Schedule type --?
thank you =)
My favorite grey square generator. Only 10+ GB to draw all the variations of your favorite grey square.
Skill issue
This sort of thing happens when you have a misconfiguration somewhere. A good place to start is to load an image from here that looks good and modify it from there.