SDXL 4 Step (FP32 with Improved UNET)
Note: I used the 32-bit CLIP-G
The refiner used in the workflow is also a 32-bit GGUF
Updated CLIP-L
For 4-step use, run at CFG 1.0 - load an image to pull in the workflow (a minimal diffusers sketch also follows this section)
For NSFW images, the refiner should not be used
BRSGAN 2x can be found on Google Drive
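For reference, here is a minimal sketch of the 4-step / CFG 1.0 usage via diffusers rather than the ComfyUI workflow. The checkpoint filename and prompt are placeholders, not the author's exact setup, and a 4-step model will generally also want the sampler/scheduler the posted workflow specifies.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder filename for the 4GB FP8 single-file checkpoint
pipe = StableDiffusionXLPipeline.from_single_file(
    "sdxl_4step_fp8.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "photo of a lighthouse at sunset",  # example prompt
    num_inference_steps=4,              # 4-step model
    guidance_scale=1.0,                 # CFG 1.0 as noted above
).images[0]
image.save("out.png")
```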
2GB GGUF from FP32
Note: the 2GB version requires a separate CLIP and GGUF support; the 4GB FP8 is ready to use in any SD GUI (a GGUF inspection sketch follows this list)
Refined with baked-in FP32 LoRAs
Quantized from the FP32 SDXL model for less loss
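To confirm what is actually inside the 2GB file (quantized UNET tensors only, hence the separate CLIP), the gguf-py package from llama.cpp can read the headers. The filename below is a placeholder, not the actual download name.

```python
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("sdxl_4step_unet.gguf")  # placeholder path
for t in reader.tensors[:8]:
    # tensor name, quantization type, and shape
    print(t.name, t.tensor_type.name, list(t.shape))
```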
4GB SDXL (Full Checkpoint)
The custom CLIP is not quantized
The custom UNET is quantized to FP8, allowing a balance of size and quality (see the sketch after this list)
Works in Forge, ComfyUI, and Automatic1111
Works with LoRAs
Beta/DEIS is a good choice for img2img upscaling
Both models have improved (uncensored) female anatomy; however, the GGUF version does not do well with males.
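To make the FP8 trade-off concrete, here is a small stand-alone PyTorch sketch (not this checkpoint's code, PyTorch 2.1+ assumed): weights are stored as float8_e4m3fn at a quarter of the FP32 bytes, then upcast for the matmul - which matches the comment below that current PyTorch paths still compute in FP16/BF16.

```python
import torch

w = torch.randn(1024, 1024)                    # FP32 "weights" for illustration
w_fp8 = w.to(torch.float8_e4m3fn)              # stored form: 1 byte per value
print(w.element_size(), w_fp8.element_size())  # 4 bytes vs 1 byte per element

x = torch.randn(1, 1024, dtype=torch.bfloat16)
y = x @ w_fp8.to(torch.bfloat16)               # upcast to BF16 at compute time
print(y.shape)
```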
Comments
Interesting work, first Pony and now SDXL at 4 GB. I can't wait to try your 4-step versions of these models and draw various results. Good work man, appreciate the hard work and effort
The Step Models rely on timestep schedulers; given the LoRA is less than 400MB and is intended to work with LCM, I am not sure it would merge in well.
Can someone help me understand the goal/value/benefit here? I truly don't understand the significance of the 4GB or the added written detail. Thanks!
@moocoop The benefit would apply to 6GB 2060 and 3050 users, and to users who keep multiple models loaded into VRAM, such as PONY + XL or PONY + FLUX, and have to watch VRAM usage
What is this sorcery?
The type that uses 2 bits of mantissa precision compared to 23 bits in the original FP32 model. The sorcery is that they can predict what the number would have been with any measure of accuracy, thanks to graphing.
@Felldude What else did you do... Did you take the original SDXL model and tweak it, use a finetuned or merged model, or train your own images on top of something? Because the results are impressive.
@punkbuzter340 The CLIP and UNET have both been modified; it has multiple FP8 trainings baked in
I don't understand why I'd use the FP8 model: with the --medvram argument it is not loaded into memory, and without that argument it runs about 6 times slower for me than the classic FP16 with --medvram. Was the original idea to work with insufficient video memory?
It was to enable those with 6GB RTX cards to fit the model into VRAM, and possibly IPEX on integrated Intel GPUs - or those feeding into FLUX: you might be able to fit the 4GB model alongside an NF4 version of FLUX on a 16GB card
Only a 4090 has accelerated FP8 attention, and PyTorch doesn't support it yet, so we still upcast to FP16 or BF16
@Felldude I understand, thanks for the explanation
As far as I know, FP8 isn't really that good on some hardware, maybe somewhat older hardware



