FLUX Monte Carlo (Full FP32) - CivArchive (CivitAI Archive)

FLUX Monte Carlo (Full FP32) - UNET-GGUF-Q8

NSFW

FLUX Monte Carlo



Note: This model has undergone deterministic bias correction + stochastic regularization. 



Even this took around 2GW of power.



Training a FLUX model at FP32 would take a supercomputer and a power plant.

This model underwent 1 trillion iterations to restore FP32 precision.
Due to the complexity of the tensor shape, this was done at 10,000 steps per element, with noise guidance based on the tensor.
FP32 T5 & FP32 CLIP

Comments (46)

A_Friendly_Spider

Apr 20, 2025· 4 reactions

CivitAI

The hilarity to ensue if they let you run this on site.

Felldude

Author

Apr 20, 2025· 1 reaction

My understanding is that the commercial license does not allow them to run unofficial FLUX models.

5310116

Apr 20, 2025· 4 reactions

CivitAI

You're out of your mind...and I love you for it. I can't wait to try this.

Felldude

Author

Apr 20, 2025

I do not recommend forcing the FP32 UNET in most cases, although it does render a slightly different image, even with the same seed.

5310116

Apr 20, 2025· 1 reaction

@Felldude I was just messing with you. I get what you're doing here. Even if downcast or quantized, having that extra precision will make a difference.

Felldude

Author

Apr 20, 2025

@AUsername111 Aesthetically, I think FP8 fast was giving the best-looking images. The noise makes them look less AI-generated and less processed.

SencneS

Apr 20, 2025· 2 reactions

@Felldude This is what I've noticed as well. Literally adding "Film Grain" to images makes them look so much more 'real'. The super smooth, textureless skin makes it look so fake!
If you're going for that effect it's fine, but in most cases it looks VERY fake and uncanny, much more so than with even the slightest noise added.

SencneS

Apr 20, 2025· 4 reactions

CivitAI

Downloading just to say I did, and to have this baby ready for the day I can use it LOL

But I'm thinking Q8 GGUFing just to see. Since it's FP32.

Felldude

Author

Apr 20, 2025

I did an NF4 test, but not a Q8 or Q4 GGUF.

5310116

Apr 20, 2025

Via comfyui I tried casting it to fp8 but it crashed (RTX 4090 and 64gb RAM 😥)

SencneS

Apr 20, 2025· 4 reactions

Q8 Conversion completed in 0 hour(s) 9 minute(s) 48.49 second(s).
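
For readers wondering what a Q8 conversion does to FP32 weights, here is a minimal sketch. It assumes "Q8" refers to GGUF's Q8_0 type, which stores weights in blocks of 32 as 8-bit integers plus one scale per block. The function names are hypothetical, and the scale is kept as a plain Python float here, whereas the real format stores it as FP16. This is an illustration of the idea, not the converter's actual code.

```python
import random

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_block(block):
    """Quantize one block to int8 values plus a single scale (Q8_0-style)."""
    amax = max(abs(v) for v in block)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * len(block)
    # Clamp guards against floating-point edge cases at the block maximum.
    q = [max(-127, min(127, round(v / scale))) for v in block]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate weights from the stored scale and int8 values."""
    return [scale * v for v in q]


random.seed(0)
# Stand-in weights; real tensors would come from the model file.
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6g}, max abs error={max_err:.3g}")
```

The rounding error per weight is bounded by half the block's scale, so a block with one large outlier loses more precision on its small values.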
I'll give it some tests tomorrow, headed to bed at the moment - Have a good one Felldude!

Agimax

Apr 20, 2025· 1 reaction

CivitAI

Awesome model, probably my favorite now! The only issue I found is the nipples; they seem somewhat deformed on most creations, though that is easily fixed with tweaking.

Felldude

Author

Apr 20, 2025

Thank you

5310116

Apr 20, 2025

CivitAI

Just wondering how some of you have gotten this to run? I have an RTX 4090 and 64gb of RAM and it always crashes on loading, even when data type is set to FP8. Any tips/tricks?

Felldude

Author

Apr 20, 2025· 2 reactions

My recommendation of a 120GB virtual memory allotment was not a joke. Beyond that, make sure you do not have the --highvram or --gpu-only flags set.

5310116

Apr 20, 2025

@Felldude Thank you.

Agimax

Apr 20, 2025· 2 reactions

@AUsername111 I have an RTX 4090 laptop with 64GB RAM, running Linux. I can run it with or without FP8, though I use FP8 fast to render quickly. It uses 32.5 GB of chip RAM, 31.1 GB of cache RAM, and 11.3 GB of VRAM. I use:
--listen --use-sage-attention --bf16-vae --cuda-malloc --bf16-text-enc --fast --normalvram

Felldude

Author

Apr 20, 2025· 2 reactions

@Agimax Thanks for sharing. You may want to use --fp32-text-enc instead. My understanding is that Comfy would be taking the FP32 TE and CLIP, downcasting them to BF16 per your argument, then converting to FP16 for use on your CPU, and, depending on how your PyTorch is set up, possibly upcasting from FP16 back to FP32.
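
To illustrate why that cast chain matters, here is a minimal sketch using only the standard library. It emulates BF16 by rounding an FP32 bit pattern to its top 16 bits, and FP16 via `struct`'s half-precision format. The helper names are hypothetical, the sketch handles ordinary finite values only, and it makes no claim about what ComfyUI or PyTorch do internally. The point is only that upcasting afterwards does not bring back the bits lost in the downcast.

```python
import struct


def fp32_round(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bf16_round(x: float) -> float:
    """Round an FP32 value to BF16 (round-to-nearest-even) and widen it back."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias for nearest-even
    bits &= 0xFFFF0000                   # BF16 keeps only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fp16_round(x: float) -> float:
    """Round a value to FP16 and widen it back (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


w = fp32_round(0.1)                # stand-in for one FP32 weight
chain = fp16_round(bf16_round(w))  # FP32 -> BF16 -> FP16 -> upcast to FP32
print(w, chain, abs(w - chain))
```

In this example the BF16 step is where the precision is lost; the later FP16 and FP32 steps simply carry the already-rounded value forward.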

5310116

Apr 20, 2025· 3 reactions

Thanks guys. Already did the virtual memory trick. Also created F16 Quant and a FP8 safetensor :) Even loaded at full strength and generated with it. The quality difference is definitely there. Thanks for making this.

Felldude

Author

Apr 20, 2025· 2 reactions

@AUsername111 Thanks, glad it's working for you

0l1v1aR0551

Apr 20, 2025· 1 reaction

CivitAI

I know - this is stupid, but maybe, just maybe ... GGUF?

Felldude

Author

Apr 20, 2025· 1 reaction

Possibly, if the quantization shows the same improvement as the full model across hundreds of images.

0l1v1aR0551

Apr 20, 2025

@Felldude yes, but still - for all of us, simply to test and play around with it ;)

dpbenton

Apr 20, 2025

CivitAI

Is there a special FP32 clip for this? The highest I have is FP16.

Felldude

Author

Apr 21, 2025

Yeah I will link them

dpbenton

Apr 21, 2025· 1 reaction

@Felldude Thank you. Running on a RTX 4090 with 24GB VRAM and 124GB system RAM and have about 30 second run times for a 1024x1024 image. Not bad at all. Still looking at my results. I will post and give more feedback tomorrow.

DarkShiva

Apr 24, 2025· 3 reactions

CivitAI

Ty for the gguf, works perfect! A++++

Felldude

Author

Apr 24, 2025· 1 reaction

Thanks

mmdd2543

Apr 24, 2025

CivitAI

Which is higher quality and closer to Flux Pro, this or the De-distilled Flux by Nyanko7? Also, does it make sense to run this as a daily model or is it more for experimentation?

Felldude

Author

Apr 24, 2025

The de-distilled model is a completely different approach, and if your goal is NSFW it might work better with some TE-trained LoRAs.

As for daily use: if you have the system RAM to load all the models (about 70GB; virtual memory can be used), the render time is the same, unless you use the full FP32 command.
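
A rough back-of-the-envelope check of that 70GB figure can be sketched as follows. The parameter counts are approximate public figures used as assumptions for illustration (about 12B for the FLUX.1 dev transformer, about 4.7B for the T5-XXL encoder, about 0.12B for the CLIP-L text encoder); the VAE and runtime overhead are left out, and the names are hypothetical.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}

# Approximate parameter counts; assumptions, not measured from the files.
PARAMS = {
    "flux_unet": 12e9,
    "t5_xxl_encoder": 4.7e9,
    "clip_l_text": 0.12e9,
}


def footprint_gb(n_params: float, dtype: str) -> float:
    """Storage needed for n_params weights in the given dtype, in GB (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in PARAMS.items():
    print(f"{name}: {footprint_gb(n, 'fp32'):.1f} GB at FP32")

total_fp32 = sum(footprint_gb(n, "fp32") for n in PARAMS.values())
print(f"total: {total_fp32:.1f} GB at FP32")
```

Under these assumptions the FP32 weights alone land in the high 60s of GB, which is in line with the roughly 70GB quoted above.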

mmdd2543

Apr 24, 2025

I appreciate the quick reply. :) I don't need it for NSFW. I'm doing landscapes and architectural imagery with LoRAs and a good amount of img2img transformations with controlnets. I'm mostly concerned about which can bring out more detail, follows prompts better, and if it's feasible to use your Monte Carlo version for the purposes I outlined above. I do have 96GB DDR5 RAM and RTX 5090 graphics card, so I guess this should be possible? Again, I'm more concerned with quality than speed. I can wait a bit longer if there's an appreciable difference in image quality and prompt comprehension. :)

Felldude

Author

Apr 24, 2025

@mmdd2543 With that setup you could try --force-fp32. In my testing the images sometimes vary drastically with the FP32 UNET, but they did take 3x-4x longer. If you use the FP32 FlanT5xxl that is linked, the prompt adherence should be quite high. Personally, I only use --fp32-text-enc and then either the BF16 UNET or FP8 fast.

mmdd2543

Apr 24, 2025

@Felldude Got it! I will try it out when I find some time for experimentation. Thanks a bunch!! 👍

drak0n

May 31, 2025

CivitAI

I am impressed with your work on this model. At first it didn't work; it always crashed. Strangely, without any changes to ComfyUI or Windows, it started working for a while, and then just as suddenly stopped. Now I get the error "The paging file is too small for this operation to complete. (os error 1455)". I have an RTX 4090 and 64GB of DDR5 on Windows 11.

Felldude

Author

May 31, 2025· 1 reaction

The paging file error suggests that Windows is managing virtual memory and running out. I would suggest setting it to 64GB or larger.

drak0n

May 31, 2025

@Felldude Thanks for your answer. It is strange that it went for a while without making changes to either comfyui or virtual memory. I currently have 28 439 MB in Virtual Memory configured.
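
For context on the numbers in this exchange: on Windows, the commit limit is roughly physical RAM plus the total page file size, and error 1455 is raised when an allocation would exceed it. A minimal sketch of that arithmetic, using the figures quoted above and a hypothetical helper name, and treating "64GB" as 64 GiB:

```python
def commit_limit_gb(ram_gb: float, pagefile_mb: float) -> float:
    """Approximate Windows commit limit: physical RAM plus page file, in GB."""
    return ram_gb + pagefile_mb / 1024


current = commit_limit_gb(64, 28439)        # 64GB RAM + 28,439 MB page file
suggested = commit_limit_gb(64, 64 * 1024)  # 64GB RAM + 64GB page file

print(f"current limit:   {current:.1f} GB")
print(f"suggested limit: {suggested:.1f} GB")
```

With the current setting the limit sits a little under 92GB, shared with everything else running on the machine. That would be consistent with loading succeeding on some runs and failing on others, though this is only a plausible reading of the symptoms, not a diagnosis.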

r600

Jun 30, 2025

CivitAI

@Felldude Cannot find flux_vae_fp32.safetensors anywhere on the internet or civitai. Tried original flux VAE ae.safetensors but images do not match the prompt.

Please upload

Felldude

Author

Jun 30, 2025

I renamed the ae file to what you see in the workflow. Did you launch with the FP32 text encoder flag?

r600

Jun 30, 2025

@Felldude I used your workflow from the 1st image here:
https://civitai.com/images/71295200
I selected the 2 FP32 clip files you mention to download, and ae.safetensors for vae.
GGUF load checkpoint fluxMonteCarloFull_unetGGUFQ8.gguf.
Does not follow the prompt at all

Felldude

Author

Jun 30, 2025

@r600 Using --fp32-text-enc

r600

Jul 1, 2025

@Felldude added to run_nvidia_gpu.bat:
--fp32-vae --fp32-text-enc --force-fp32
It works! Although the difference is very minor compared to fp16
fluxMonteCarloFull_unetGGUFQ8.gguf is FP8 8 bit?
What good is it to have 32bit text/clip/vae Encoders?

Felldude

Author

Jul 1, 2025

@r600 --force-fp32 will force the FP32 UNET, which slows down generation, and I would not recommend it for most users. Forcing CLIP to FP32 is not an issue for most users, unless you have a 24GB video card, run the NF4 or Q4, and use the --highvram flag.

r600

Jul 1, 2025· 1 reaction

@Felldude after much testing, this blows the original flux_dev_q8.gguf out of the water!
recommend this lora to fix flux chin:
https://civitai.com/models/775002/chin-fixer-2000

Project_SERA

Aug 12, 2025

CivitAI

Brilliant model if you can get it to work.

Eschelon

Nov 21, 2025

CivitAI

Simply an amazing model! Many thanks to the author! You guys are the best!

Felldude

Author

Nov 21, 2025

👍

Checkpoint

Flux.1 D