This model was converted to bnb NF4 by combining Flux dev fp8 with the VAE, the T5-XXL fp8 text encoder, and CLIP-L. All included!
For those who don't mind the size, there is also a build blended from the full dev fp16 with the VAE, the T5-XXL fp16 text encoder, and CLIP-L, likewise converted to NF4. All included!
Many observations suggest that the FP8 base is a bit more accurate at understanding the prompt, and having assembled and tested this build, I came to the same conclusion.
This model is a bit more accurate than the familiar NF4 v2. I've done a lot of generation with complex prompts to confirm this, and I am completely satisfied with how well this model understands me. With simple prompts I didn't notice any difference.
The model works well with LoRAs in Forge, which is what I create in. Don't forget to set Diffusion in Low Bits to Automatic (fp16 LoRA).
In Forge I use the Euler or Flux Realistic sampler, Schedule type: Simple, CFG: 1, and 20-30 steps.
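For anyone scripting outside Forge, here is a minimal sketch of the same idea using diffusers with on-the-fly bnb NF4 quantization. This is an assumption-laden example, not this exact checkpoint: it quantizes the base Flux dev repo at load time, and the model id, prompt, and filename are placeholders.

```python
# Minimal sketch, not the author's Forge setup: load Flux dev in diffusers with
# on-the-fly bitsandbytes NF4 quantization. Assumes diffusers >= 0.31 with
# bitsandbytes installed; model id, prompt, and filename are placeholders.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # same NF4 format as this checkpoint
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()              # helps on 16 GB cards

# guidance_scale is Flux's distilled guidance (Forge's "Distilled CFG"),
# not the CFG: 1 setting above, which disables true classifier-free guidance.
image = pipe(
    "a detailed portrait, natural light",    # placeholder prompt
    num_inference_steps=25,                  # the 20-30 range recommended above
    guidance_scale=3.5,
).images[0]
image.save("flux_nf4_test.png")
```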
Generated outputs can be used for personal, scientific, and commercial purposes as described in the flux-1-dev-non-commercial-license.
Description
This model was converted to bnb NF4 by combining Flux dev fp16 with the VAE, the T5-XXL fp16 text encoder, and CLIP-L. All included!
Comments (36)
Hi, fantastic models! One question: based on your personal use, which model do you use, fp8 or fp16? Which is the best?
Hi. Thanks for the review! I mostly use FP8 as it is much lighter and faster, and it understands prompts perfectly. FP16 I use occasionally, mostly to compare results. FP8 is completely satisfactory for me)
@Abzaloff You say it understands prompts better; I'm curious, how much VRAM have you got? If you've got 16 GB like I do, then chances are FP16 barely scrapes by on VRAM and doesn't leave as much room for the computations that decipher the prompt. Even at 24 GB this may be the case, depending on the settings you use. When I was still on my 8 GB laptop, I ultimately found that I preferred SD1.5 over SDXL for generation, because it left so much more VRAM free that I could bust out 8k x 8k images without even upscaling.
@Abzaloff thanks a lot, mate :)
@Lazman I have 16 GB of video memory and 64 GB of RAM. The model works great in both Forge and Comfy.
@Abzaloff Ah, but that's why you're getting better results with FP8 over FP16: your entire VRAM isn't being taken up just loading the model. Someone with 24 GB of VRAM would probably get better results with FP16. Sadly, Nvidia/AMD conveniently removed SLI/Crossfire support just as AI went mainstream, so it's more difficult for us brokies to build high-performance AI rigs.
PS: 64 GB of system RAM is total overkill. I'm tempted to get 32 more myself when I get the chance, but frankly, it's more to complete the ARGB effect than because I need the RAM itself. I don't even think I've maxed out my 32 GB yet...
Very good model. I run it on an RTX 3060 GPU with 6 GB VRAM, with good results.
Does it work with ComfyUI and LoRAs?
Hi! I don't know; I only work in Forge. I would be grateful if you would test it and let me know.
ComfyUI yes, LoRA no.
@Mescalamba I've gotta know, how did you get it working in ComfyUI? I've been beating my head against a wall with this thing. I even loaded one of the images from the OP for the workflow, but every time I try to run it, I get...
Error(s) in loading state_dict for Flux:
size mismatch for img_in.weight: copying a param with shape torch.Size([98304, 1]) from checkpoint, the shape in current model is torch.Size([3072, 64]).
size mismatch for time_in.in_layer.weight: copying a param with shape torch.Size([393216, 1]) from checkpoint, the shape in current model is torch.Size([3072, 256]).
size mismatch for time_in.out_layer.weight: copying a param with shape torch.Size([4718592, 1]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
size mismatch for vector_in.in_layer.weight: copying a param with shape torch.Size([1179648, 1]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for vector_in.out_layer.weight: copying a param with shape torch.Size([4718592, 1]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
size mismatch for guidance_in.in_layer.weight: copying a param with shape torch.Size([393216, 1]) from checkpoint, the shape in current model is torch.Size([3072, 256]).
size mismatch for guidance_in.out_layer.weight: copying a param with shape torch.Size([4718592, 1]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
size mismatch for txt_in.weight: copying a param with shape torch.Size([6291456, 1]) from checkpoint, the shape in current model is torch.Size([3072, 4096]).
size mismatch for double_blocks.0.img_mod.lin.weight: copying a param with shape torch.Size([28311552, 1]) from checkpoint, the shape in current model is torch.Size([18432, 3072]).
size mismatch for double_blocks.0.img_attn.qkv.weight: copying a param with shape torch.Size([14155776, 1]) from checkpoint, the shape in current model
@Lazman, I launched it in Comfy. Here are the workflows:
https://drive.google.com/drive/folders/1WDrdE3rJfUzjH1mJ52VhWAUfV98oKCtj?usp=drive_link
@Abzaloff Lol... I actually got it working just before I came on here. I had an epiphany when I woke up: try the NF4 loader node (which I only found out about last night, after writing the post you're responding to) in the normal SDXL workflow, and voilà, it worked. You just have to set the CFG to 1; at CFG 8 the image is blurry. IDK why the info on how to use these things is so obscure. I only stumbled onto the NF4 loader node by chance while running searches in the ComfyUI manager for other things; I don't think I've seen a single person mention it otherwise.
Using the NF4 loader node in the Flux workflow, however, does result in an error. Not as long as the one above, but not a success either.
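For what it's worth, the numbers in the error above are consistent with bnb NF4 packing, which is why a plain Flux loader rejects the checkpoint while the NF4 loader node accepts it. A small sketch, assuming bitsandbytes is installed and a CUDA device is available:

```python
# Why a vanilla Flux loader rejects this checkpoint: bnb NF4 packs two 4-bit
# values per uint8 byte, flattening an fp16 [3072, 64] weight (196,608 values)
# into a packed [98304, 1] tensor, exactly the img_in.weight mismatch above.
import torch
import bitsandbytes.functional as bnbf

w = torch.randn(3072, 64, dtype=torch.float16, device="cuda")
packed, quant_state = bnbf.quantize_4bit(w, quant_type="nf4")
print(packed.shape)        # torch.Size([98304, 1])

w_restored = bnbf.dequantize_4bit(packed, quant_state, quant_type="nf4")
print(w_restored.shape)    # torch.Size([3072, 64])
```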
I'm confused... how do you use CFG 1 and negative prompting?
If you use distilled CFG, the negative prompt is disabled, and if you use normal CFG at 1, negative prompting is also disabled, so what's the catch?
This is how the Flux model works. You can increase the CFG, but then you will also have to increase the number of steps significantly. In most cases a negative prompt is not needed with this model.
@Abzaloff I know how it works; I'm simply asking how you use negative prompting with those settings on your images. Is it simply a leftover from trying old prompts with the new model, or?
Thanks!
@DD_Ai_art, I see your point) Negative prompts are enabled for me by default, or get filled in when I use styles. But they are not taken into account during generation; they are simply carried over into the output info.
I usually use CFG 1.1.
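The arithmetic behind this thread, for anyone wondering: with classic classifier-free guidance, the prediction is blended as negative + cfg * (positive - negative), so at CFG 1 the negative branch cancels exactly, and at 1.1 it only just starts to matter. A toy sketch with plain tensors, no real model involved:

```python
# Toy illustration of classic classifier-free guidance with plain tensors:
# at cfg_scale == 1.0 the negative (unconditional) prediction cancels out,
# so the negative prompt has no effect; at 1.1 it barely starts to matter.
import torch

def cfg_combine(pred_negative: torch.Tensor, pred_positive: torch.Tensor,
                cfg_scale: float) -> torch.Tensor:
    return pred_negative + cfg_scale * (pred_positive - pred_negative)

neg, pos = torch.randn(8), torch.randn(8)
assert torch.allclose(cfg_combine(neg, pos, 1.0), pos)  # negative term vanishes
print(cfg_combine(neg, pos, 1.1) - pos)                 # small nonzero nudge
```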
Hello, I use ComfyUI. The NF4 does not work with LoRAs; could you upload a normal fp8 version?
A very nice Flux model for making gorgeous artwork! Great job! Thanks for the model!
Thanks, bro!))
A good model that can create both realistic and fantasy pictures, thank you very much!
Thank you for your kind feedback!!!
Small but noticeable improvement on standard NF4
Hi, what model works with it as a refiner?
Precise Dev 8 and 16 are so good! Thanks!
Flux "Realistic" Sampler? Where would I find that..?
This sampler is built into Forge.
The model is great! The detailing is very accurate, it is a pleasure to work with such a model!
Any ideas on how to use this model in SwarmUI? It keeps failing there; no problems in Forge.
Wow, the performance, particularly of the "fp16", is amazing. This is why every fine-tune should have at least one NF4 port; it is simply much better than any GGUF or other low-bit quant.
And this is not about raw GPU power. I have a 4070 Ti Super; it has excellent rendering speed but is limited to 16 GB of VRAM. When I use full fp16 or even an fp8 GGUF, the speed gets absolutely destroyed by the memory management; it takes ages just to reload the model after small adjustments. But this one just works, without the absurd offloading issues, and the quality is totally fine, if not excellent once the memory savings are considered.
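Back-of-the-envelope numbers support this. Assuming the roughly 12-billion-parameter Flux transformer from the public release and ignoring activations, VAE, and text encoders, a quick sketch of the weight footprint per precision:

```python
# Back-of-the-envelope weight footprint for the ~12B-parameter Flux transformer
# (assumed parameter count; activations, VAE and text encoders are ignored).
# NF4 stores 0.5 bytes per value plus ~2 bytes of absmax per 64-value block.
params = 12e9
for name, bytes_per_param in [("fp16", 2.0), ("fp8", 1.0), ("nf4", 0.5 + 2 / 64)]:
    print(f"{name}: ~{params * bytes_per_param / 2**30:.1f} GiB")
# fp16: ~22.4 GiB  (cannot fit in 16 GB, so constant offloading)
# fp8:  ~11.2 GiB  (tight once everything else loads)
# nf4:   ~5.9 GiB  (comfortable headroom)
```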
I am not sure what fp8 or fp16 means in this particular case, as I can't see much difference in memory usage. I would guess this refers to the source, not the output?
Yes, those are the source models that were used for the conversion to NF4.