Goddess Raw (Schnell)
Goddess Raw is a unique FLUX model that has realistic skin at very low steps. I even put my results up against the ULTRA model
NF4 FULL CHECKPOINT - DO NOT LOAD ADDITIONAL TE, CLIP, VAE
FP8 and GGUF require an additional TE, CLIP-L & VAE
FP8 model: load as Automatic or Default, not FP8, as it is a mixed-precision UNET (BF16/FP8)
No more identical faces
Focused on creating realistic looking images at 4-6 Steps
Some NSFW training, nothing harder than full female nudity. (No sexually explicit training)
Forge Users I highly recommend adding this line to your USER.bat
set COMMANDLINE_ARGS= --unet-in-bf16 --vae-in-fp32 --cuda-malloc --clip-in-fp32
*--cuda-malloc only if using an Nvidia RTX card
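For anyone unsure where that line goes: a minimal webui-user.bat sketch, assuming a stock Forge install (everything other than the COMMANDLINE_ARGS line is the default template):

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS= --unet-in-bf16 --vae-in-fp32 --cuda-malloc --clip-in-fp32

call webui.bat
```

Drop --cuda-malloc if you are not on an Nvidia RTX card.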
I have used prompts for testing based on onsite remixes; in some cases I feel this model outperformed the DEV Pro in prompt adherence and realism
Comments (17)
So I was testing the FP8 on a few prompts I would normally do just to 'test'. I noticed something weird: it wasn't holding the sword right. Like, it literally was holding it vertically in all the prompts. So I cracked out my "I'm a lazy ass" workflow and had it pull together 60 images. I explain the prompt generation in the 3 groups of images.
I've never had a model generate someone holding a sword like this. So I thought I'd see if it was a prompt issue with the model or just pure luck.
A couple of examples of what I was seeing pretty consistently.
https://civitai.com/images/47940135
https://civitai.com/images/47937749
https://civitai.com/images/47940112
But it just seems to be seed drift or something; either way, it's a good model :)
That is quite interesting
@Felldude Just a small update and I'll refrain from pumping another 40 images to the model (unless you really want them)
But I did another 20 in which I skipped the "LLM prompting", so "They hold a sword in one hand and a shield in the other hand." was always at the top of the prompt, then all the random prompt generation and stylizing (which I fully randomized this time).
A total of 6 of 20 were holding the sword incorrectly or not in a natural way. 1 was missing the sword altogether.
With the LLM taking a pass at the prompt, it was just 2 out of 20, but 2 were missing the sword, one of which looked nothing like the prompt at all; it was more of an abstract pattern.
I thought: why would the LLM re-wording the prompt have an impact? So I tested something.
I wrote one sentence as it would appear in a book: "As they held their sword in one hand, they swung down; they held up their shield with the other hand to protect against the coming attack." I removed all the prompt generation and styling and just pushed it through 20 seeds. It was similar to the LLM taking a pass: three had "weird"-looking sword holding.
I wonder if the Schnell base needs more 'context' to remove randomness. It very clearly has a higher 'weirdness' factor when you just throw a few elements you want at it and it has to work out the rest. But when there is the additional "fluff" that the LLM adds, it, for lack of a better word, understands the intent, rather than just making sure the 'objects' are in the image.
Sorry for the long post, I'm going to test raw schnell probably tomorrow to see if that's the case.
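Putting rough error bars on those counts shows why "seed drift" is a fair read. This is just a sketch of the arithmetic: the 6/20, 2/20, and 3/20 counts come from the posts above, but the choice of a Wilson score interval is mine.

```python
from math import sqrt

def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a failure proportion."""
    p = failures / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)

# Counts from the three test runs described above
runs = {
    "raw prompt, no LLM pass": (6, 20),  # 6 of 20 held the sword wrong
    "LLM-rewritten prompt":    (2, 20),
    "hand-written sentence":   (3, 20),
}

for name, (bad, n) in runs.items():
    lo, hi = wilson_interval(bad, n)
    print(f"{name}: {bad}/{n} = {bad/n:.0%}  (95% CI {lo:.0%}-{hi:.0%})")
```

With only 20 samples per condition the intervals overlap heavily, so the difference between the raw and LLM-passed prompts is suggestive but not conclusive.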
@SencneS With T5 it is hard to say; Google deemed weapons and violence verboten, so the T5's interaction with a far less censored CLIP is hard to predict
@Felldude My little Schnelly tried to give me an unprompted lewd the other day, I was so proud 🥹
I want to make a Q6_K GGUF. Did you convert your Q4 with the FP8 here or from a larger model ?
From BF16
There is little difference in VRAM usage between Q6 and Q8. If you can afford to run a Q6, you can run the Q8 just fine. From all the data tables I've seen, the difference in data loss between Q6 and Q8 is "little", but it's "twice" as much loss from FP16 as the Q8. You should be good with Q8. It's under 12G; I'm guessing a Q6 would be like under 1G smaller. Still too large for our 8G brothers and sisters, but Q8 should be just fine for 12G VRAM and more than perfect for the 16G VRAM users :)
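A back-of-envelope size estimate backs up the "under 12G" figure for Q8. The ~12B parameter count for the FLUX transformer and the nominal bits-per-weight figures (taken from llama.cpp's quantization scheme) are my assumptions, not numbers from the posts above.

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8 bytes.
PARAMS = 11.9e9  # approximate FLUX transformer parameter count (assumption)

bits_per_weight = {
    "BF16": 16.0,
    "Q8_0": 8.5,     # 32-weight blocks sharing an fp16 scale
    "Q6_K": 6.5625,  # llama.cpp nominal figure for the K-quant
}

for name, bpw in bits_per_weight.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```

This puts Q8_0 just under 12 GiB, consistent with the comment above; note the estimate ignores the non-quantized layers a real GGUF keeps at higher precision, so actual files land a bit off these numbers.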
Thank you Felldude for the Q8 :)
I prefer the NF4 or FP8 over GGUF, but this is due to the fact we don't have the FP32 that is really needed for GGUF to shine
Everyone is different on this. For me NF4 does good images, but there is so much more variance versus F16 that it bothered me. Q8 has way less variance versus the F16 model. I guess I kind of like that more. It's not that NF4 is bad, it's probably my OCD LOL
@SencneS I know I can use Q8 or FP8 with 16GB VRAM, but no, it's not perfect, unless I want to wait a lot, away from my computer.
Q6 is the sweet spot with 16GB if you want to use LoRAs, change the T5... or just use your computer while it's working on its picture. ComfyUI is more efficient and predictable with VRAM, but I prefer to use Forge.
My best speed on a 3050 is 4.5 seconds per iteration with NF4, and around double that or more if I use BF16. GGUF tends to be a second or two more per iteration than FP8; with the mixed BF16/FP8 model I can get 7-9 seconds per iteration.
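Since it/s and s/it are easy to mix up, a quick sketch of what those speeds mean in practice. The 4.5 s/it figure and the "double for BF16" estimate come from the post above; the 4- and 6-step counts are the range this Schnell-based model targets.

```python
# Seconds-per-iteration vs iterations-per-second, and total sampling time.
def gen_time(seconds_per_it: float, steps: int) -> float:
    """Total sampling time, ignoring model-load and VAE-decode overhead."""
    return seconds_per_it * steps

for label, s_per_it in [("NF4", 4.5), ("BF16 (~2x NF4)", 9.0)]:
    print(f"{label}: {1 / s_per_it:.2f} it/s, "
          f"4 steps ~{gen_time(s_per_it, 4):.0f}s, "
          f"6 steps ~{gen_time(s_per_it, 6):.0f}s")
```

So 4.5 s/it is about 0.22 it/s, and a 6-step NF4 generation lands around half a minute of pure sampling on that card.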
@eFeRBe I guess I've never noticed any real difference in generation time between the versions, and I've tested a lot. You're right, I am using ComfyUI and that may be the difference. Honestly, I just tested: the difference between a Q6 and Q8 for me was 3.82s (faster for Q6). This was 1 sample. It might be a Comfy vs Forge thing. Generations for me from start to finish are between 40 and 90 seconds depending on what I'm doing. The 90-second mark is me passing the prompt through a separate text-generation LLM. So I can get up there in generation times if I really have the ultra-complex workflow, but not enough to consider it "excessive generation". But if you're used to like 10-second generations, I guess 40 seconds seems like a lot.
Edit: clarifying note - I'm not generating with Schnell, pretty much always Dev with 25+ steps.
Because schnell is under 10 seconds for me pretty consistently.
@Felldude I just tested your FP8 with my 4060 Ti. I get 2.3s/it, which seems normal (and reassuring). But sometimes I get a warning saying I have no VRAM left. I thought that the --clip-in-fp32 option would load the TE in RAM to be used by the CPU, but it seems that at some point I must have the size of model+clip free in VRAM. Am I wrong, or did I misunderstand that it's possible to load the text encoder in RAM only? (and thank you for your previous answers)
@eFeRBe So Forge is the issue here; after doing some digging, it is acting as if the force-GPU command is active at all times - ComfyUI is 25% faster due to Forge constantly unloading the TE rather than keeping it in RAM and using the CPU
@Felldude I think it's me and not Forge :) FP8 is 14.5GB and it's easy to use up the 1.5GB left, especially if you have several web browsers open and a 4K display, and changing the T5 size doesn't change anything.
And I ran out of RAM (without the V) too, with 32GB, with another Flux model. When it happens it's hard to diagnose precisely, because the mouse pointer moves 1cm / minute...
I ordered 32 more (I knew it would happen one day).
And thx for the pix btw.
@eFeRBe Forge is definitely forcing GPU for CLIP and that is an issue, hopefully someone will submit...I use Comfy 99% of the time so I'm just making a note for people