CivArchive
    FLUX Monte Carlo (Full FP32) - UNET-GGUF-Q8
    NSFW

    FLUX Monte Carlo

    
    
    Note: This model has undergone deterministic bias correction + stochastic regularization.

    Even this took around 2GW of power.

    Training FLUX at FP32 would take a supercomputer and a power plant.

    • This model underwent 1 trillion iterations to restore FP32 precision.

    • Due to the complexity of the tensor shapes, this was done at 10,000 steps per element, with noise guidance based on the tensor.

    • FP32 T5 & FP32 CLIP


    Comments (46)

    A_Friendly_Spider
    Apr 20, 2025· 4 reactions
    CivitAI

    The hilarity that would ensue if they let you run this on site.

    Felldude
    Author
    Apr 20, 2025· 1 reaction

    My understanding is that the commercial license does not allow them to run unofficial FLUX models.

    5310116
    Apr 20, 2025· 4 reactions
    CivitAI

    You're out of your mind...and I love you for it. I can't wait to try this.

    Felldude
    Author
    Apr 20, 2025

    I do not recommend forcing the FP32 UNET in most cases, although it does render a slightly different image, even with the same seed.

    5310116
    Apr 20, 2025· 1 reaction

    @Felldude I was just messing with you. I get what you're doing here. Even if downcast or quantized, having that extra precision will make a difference.

    Felldude
    Author
    Apr 20, 2025

    @AUsername111 Aesthetically, I think FP8 fast was giving the best-looking images; the noise makes them look less AI and less processed.

    SencneS
    Apr 20, 2025· 2 reactions

    @Felldude This is what I've noticed as well, literally adding "Film Grain" to images makes them look so much more 'real'. The super smooth textureless skin makes it look so fake!
    Like if you're trying to go with that effect it's fine, but in most cases it looks VERY fake and uncanny much more than even the slightest noise added.

    SencneS
    Apr 20, 2025· 4 reactions
    CivitAI

    Downloading just to say I did, and to have this baby ready for the day I can use it LOL

    But I'm thinking Q8 GGUFing just to see. Since it's FP32.

    Felldude
    Author
    Apr 20, 2025

    I did an NF4 test, but not a Q8 or Q4 GGUF.

    5310116
    Apr 20, 2025

    I tried casting it to FP8 in ComfyUI, but it crashed (RTX 4090 and 64GB RAM 😥)

    SencneS
    Apr 20, 2025· 4 reactions

    Q8 Conversion completed in 0 hour(s) 9 minute(s) 48.49 second(s).
    I'll give it some tests tomorrow, headed to bed at the moment - Have a good one Felldude!

    Agimax
    Apr 20, 2025· 1 reaction
    CivitAI

    Awesome model, probably my favorite now! The only issue I found is the nipples; they seem somewhat deformed in most creations, though that is easily fixed with tweaking.

    Felldude
    Author
    Apr 20, 2025

    Thank you

    5310116
    Apr 20, 2025
    CivitAI

    Just wondering how some of you have gotten this to run? I have an RTX 4090 and 64gb of RAM and it always crashes on loading, even when data type is set to FP8. Any tips/tricks?

    Felldude
    Author
    Apr 20, 2025· 2 reactions

    My recommendation of a 120GB virtual memory allotment was not a joke. Beyond that, make sure you do not have the --highvram or --gpu-only flags set.

    5310116
    Apr 20, 2025

    @Felldude Thank you.

    Agimax
    Apr 20, 2025· 2 reactions

    @AUsername111 I have an RTX 4090 laptop with 64GB RAM, running Linux. I can run with or without FP8, though I use FP8 fast to render quickly. It uses 32.5 GB chip RAM, 31.1 GB cache RAM, and 11.3 GB of VRAM. I use:
    --listen --use-sage-attention --bf16-vae --cuda-malloc --bf16-text-enc --fast --normalvram

    Felldude
    Author
    Apr 20, 2025· 2 reactions

    @Agimax Thanks for sharing. You may want to use --fp32-text-enc, as my understanding is that Comfy would otherwise take the FP32 TE and CLIP, downcast them to BF16 per your argument, then convert them to FP16 for use on your CPU, and, depending on how your PyTorch is set up, possibly upcast them from FP16 back to FP32.
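
    The casting chain described above can be illustrated with a minimal Python sketch. It only emulates the rounding steps with the standard library (the helper names are hypothetical and this is not ComfyUI or PyTorch code), and it assumes round-to-nearest-even for the BF16 step. The point it shows: once a weight has been rounded to BF16, later conversions to FP16 or FP32 do not bring the lost mantissa bits back.

```python
import struct

def round_to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_to_fp16(x: float) -> float:
    """Round to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_bf16(x: float) -> float:
    """Emulate BF16: keep the top 16 bits of the FP32 pattern,
    rounding to nearest even (NaN/inf handling is omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # round-to-nearest-even bias
    bits &= 0xFFFF0000                    # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# An arbitrary example weight, stored at FP32 precision.
weight = round_to_fp32(0.1234567)

# FP32 -> BF16 -> FP16 -> FP32, the chain described in the comment above.
chained = round_to_fp32(round_to_fp16(round_to_bf16(weight)))

print("original FP32 :", weight)
print("after chain   :", chained)
print("abs. error    :", abs(weight - chained))
```

    For values in FP16's normal range, the FP16 and final FP32 steps are exact here, so all of the error comes from the BF16 step.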

    5310116
    Apr 20, 2025· 3 reactions

    Thanks guys. Already did the virtual memory trick. Also created an F16 quant and an FP8 safetensors file :) Even loaded it at full strength and generated with it. The quality difference is definitely there. Thanks for making this.

    Felldude
    Author
    Apr 20, 2025· 2 reactions

    @AUsername111 Thanks, glad it's working for you

    0l1v1aR0551
    Apr 20, 2025· 1 reaction
    CivitAI

    I know - this is stupid, but maybe, just maybe ... GGUF?

    Felldude
    Author
    Apr 20, 2025· 1 reaction

    If the quantization shows the same improvement as the full model across hundreds of images, then yes.

    0l1v1aR0551
    Apr 20, 2025

    @Felldude yes, but still - for all of us, simply to test and play around with it ;)

    dpbenton
    Apr 20, 2025
    CivitAI

    Is there a special FP32 clip for this? The highest I have is FP16.

    Felldude
    Author
    Apr 21, 2025

    Yeah I will link them

    dpbenton
    Apr 21, 2025· 1 reaction

    @Felldude Thank you. I'm running on an RTX 4090 with 24GB VRAM and 124GB system RAM and getting about 30-second run times for a 1024x1024 image. Not bad at all. Still looking at my results. I will post and give more feedback tomorrow.

    DarkShiva
    Apr 24, 2025· 3 reactions
    CivitAI

    Ty for the gguf, works perfect! A++++

    Felldude
    Author
    Apr 24, 2025· 1 reaction

    Thanks

    mmdd2543
    Apr 24, 2025
    CivitAI

    Which is higher quality and closer to Flux Pro, this or the De-distilled Flux by Nyanko7? Also, does it make sense to run this as a daily model or is it more for experimentation?

    Felldude
    Author
    Apr 24, 2025

    The de-distilled model is a completely different approach, and if your goal is NSFW it might work better with some TE-trained LoRAs.

    As for daily use: if you have the system RAM to load all the models (70GB; virtual memory can be used), then the time to render is the same, unless you use the full FP32 command.

    mmdd2543
    Apr 24, 2025

    I appreciate the quick reply. :) I don't need it for NSFW. I'm doing landscapes and architectural imagery with LoRAs and a good amount of img2img transformations with controlnets. I'm mostly concerned about which can bring out more detail, follows prompts better, and if it's feasible to use your Monte Carlo version for the purposes I outlined above. I do have 96GB DDR5 RAM and RTX 5090 graphics card, so I guess this should be possible? Again, I'm more concerned with quality than speed. I can wait a bit longer if there's an appreciable difference in image quality and prompt comprehension. :)

    Felldude
    Author
    Apr 24, 2025

    @mmdd2543 With that setup you could try --force-fp32. In my testing the images sometimes vary drastically with the FP32 UNET; however, they did take 3x-4x longer. If you use the FP32 FlanT5xxl that is linked, the prompt adherence should be quite high. Personally I only use --fp32-text-enc and then either the BF16 UNET or FP8 fast.

    mmdd2543
    Apr 24, 2025

    @Felldude Got it! I will try it out when I find some time for experimentation. Thanks a bunch!! 👍

    drak0n
    May 31, 2025
    CivitAI

    I am impressed with your work on this model. At first it didn't work; it always crashed. Strangely, without my making any changes to ComfyUI, the model started working. Unfortunately, it has now stopped working again, and I get the error "The paging file is too small for this operation to complete. (os error 1455)". I have an RTX 4090 and 64GB DDR5 on Windows 11. It worked for a while without any changes to ComfyUI or Windows, and then it suddenly stopped.

    Felldude
    Author
    May 31, 2025· 1 reaction

    The paging file error suggests that Windows is managing virtual memory and running out. I would suggest setting it to 64GB or larger.

    drak0n
    May 31, 2025

    @Felldude Thanks for your answer. It is strange that it worked for a while without any changes to either ComfyUI or virtual memory. I currently have 28,439 MB of virtual memory configured.
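
    The numbers in this exchange can be put side by side with a small Python sketch. It assumes that the memory Windows can commit is roughly physical RAM plus the paging file, and that the 120GB recommendation earlier in the thread refers to that combined total; both are assumptions, and the function name is hypothetical. Other running programs also draw on the same budget, so the real headroom is smaller.

```python
def commit_limit_gb(ram_gb: float, pagefile_mb: float) -> float:
    """Approximate commit limit: physical RAM plus paging file, in GB."""
    return ram_gb + pagefile_mb / 1024

RECOMMENDED_TOTAL_GB = 120  # assumed to mean RAM + paging file combined

# 64GB RAM with the 28,439 MB paging file reported above.
current = commit_limit_gb(64, 28439)

# Same RAM with the paging file raised to the suggested 64GB.
suggested = commit_limit_gb(64, 64 * 1024)

for label, total in (("current", current), ("suggested", suggested)):
    status = "meets" if total >= RECOMMENDED_TOTAL_GB else "is below"
    print(f"{label}: {total:.1f} GB, {status} the {RECOMMENDED_TOTAL_GB} GB recommendation")
```

    Under these assumptions the reported configuration falls short of the recommendation, while a 64GB paging file would clear it.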

    r600
    Jun 30, 2025
    CivitAI

    @Felldude Cannot find flux_vae_fp32.safetensors anywhere on the internet or civitai. Tried original flux VAE ae.safetensors but images do not match the prompt.

    Please upload

    Felldude
    Author
    Jun 30, 2025

    I renamed the ae file to what you see in the workflow. Did you launch with the FP32 text encoder?

    r600
    Jun 30, 2025

    @Felldude I used your workflow from the 1st image here:
    https://civitai.com/images/71295200
    I selected the 2 FP32 clip files you mention to download, and ae.safetensors for vae.
    GGUF load checkpoint fluxMonteCarloFull_unetGGUFQ8.gguf.
    Does not follow the prompt at all

    Felldude
    Author
    Jun 30, 2025

    @r600 Using --fp32-text-enc

    r600
    Jul 1, 2025

    @Felldude added to run_nvidia_gpu.bat:
    --fp32-vae --fp32-text-enc --force-fp32
    It works! Although the difference is very minor compared to fp16
    Is fluxMonteCarloFull_unetGGUFQ8.gguf FP8 (8-bit)?
    What good is it to have 32-bit text/CLIP/VAE encoders?

    Felldude
    Author
    Jul 1, 2025

    @r600 --force-fp32 will force the FP32 UNET, which slows down generation, and I would not recommend it for most users. Forcing the CLIP in FP32 is not an issue for most users unless you have a 24GB video card, run the NF4 or Q4, and use the --highvram flag.
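
    On the "FP8 8 bit?" question above: assuming the file uses the common GGUF Q8_0 type, it is not an FP8 float format but block-wise 8-bit integer quantization, where each block of 32 weights stores 32 signed 8-bit integers plus one FP16 scale. The Python sketch below is a simplified illustration of that idea, not the actual ggml code; the function names are hypothetical and the rounding mode differs slightly from ggml's.

```python
import struct

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32

def quantize_block(block):
    """Quantize 32 floats to (FP16-rounded scale, list of int8 values)."""
    assert len(block) == BLOCK_SIZE
    amax = max(abs(v) for v in block)
    scale = amax / 127 if amax else 0.0
    # The scale is stored in half precision, so round it the same way.
    scale = struct.unpack("<e", struct.pack("<e", scale))[0]
    inv = 1.0 / scale if scale else 0.0
    quants = [max(-127, min(127, round(v * inv))) for v in block]
    return scale, quants

def dequantize_block(scale, quants):
    """Reconstruct approximate floats from one quantized block."""
    return [scale * q for q in quants]

# 32 int8 values (8 bits each) plus one 16-bit scale per block.
bits_per_weight = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE

weights = [i / 31 - 0.5 for i in range(BLOCK_SIZE)]  # arbitrary example data
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("bits per weight:", bits_per_weight)
print("scale:", scale, "max abs error:", max_error)
```

    Because every block carries its own scale, the reconstruction error stays within about half a quantization step of that block, which is one reason the precision of the source weights can still matter after quantization.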

    r600
    Jul 1, 2025· 1 reaction

    @Felldude after much testing, this blows the original flux_dev_q8.gguf out of the water!
    recommend this lora to fix flux chin:
    https://civitai.com/models/775002/chin-fixer-2000

    Project_SERA
    Aug 12, 2025
    CivitAI

    Brilliant model if you can get it to work.

    Eschelon
    Nov 21, 2025
    CivitAI

    Simply an amazing model! Many thanks to the author! You guys are the best!

    Felldude
    Author
    Nov 21, 2025

    👍

    Checkpoint
    Flux.1 D

    Details

    Downloads
    161
    Platform
    CivitAI
    Platform Status
    Available
    Created
    4/20/2025
    Updated
    6/23/2026
    Deleted
    -
