Version Notes
Trained on a true de-distilled model, following the paper "On Distillation of Guided Diffusion Models", for twice as many steps as before.
Flux is a distilled model, so training it directly gives diminishing returns, which is why so many LoRAs are poor quality. With a de-distilled base, it can now be trained properly for the first time.
V3 is my first de-distilled version. I've also trained it for twice as many steps as V2.
It's much easier to get good results without cherry-picking now, and there's much less overfitting.
There's still room for improvement in V4. There is a lot left to learn from experiments, and I will continue training for twice as long again, but it's already better.
Works pretty well as a general nude model now. Using it on top of Flux Unchained makes it much more flexible for poses and greatly improves photorealism, and the two work symbiotically.
The model is starting to learn how underwear works, and pants are much less likely to turn into socks when mooning. To try underwear, prompt for some type of pants or other clothes followed by "and underwear", and it will add pulled-down underwear.
Usage Notes
Works great in ComfyUI. It seems Forge hasn't added proper support for trained text encoders in a LoRA yet, but setting it to fp16 LoRA seems to use them somewhat.
The text encoder training really fixes a lot and makes it much higher quality.
Qualitative Notes
There are no pubes in any of my datasets. If that's your thing, you will need to add them back in because this will teach the network to shave them off.
Good variety of butt shapes, not all small and round. Should generalize to any butt shape, but trends towards heart-shaped and round.
Description
Early beta model, much room for improvement
Comments (13)
What do you mean month long run? 😱
Flux is much harder to train than any previous models
@Tophness I use the Civitai lora generator and it takes like 12 hours to train one lora.
@GracefulFox yeah im training on my home PC, civitai has large server GPUs
@Tophness Keep it up, buddy. It is coming out awesome!
This Lora makes generating an image take an hour instead of 30 seconds on Forge.
I've had this happen to me with various models / lora combos because it's right on the edge of my vram limits. Only solution is to restart forge. Doesn't happen with hyper / gguf quants though.
What's your vram?
@Tophness I have 20GB VRAM
@klotz I'm 16GB, would've thought 20 would be enough but 24 GB is recommended ig. There are people using them on 12 and even 8, but it takes a lot of hacks to get there
Might be worth trying NeverOOM. I was gonna try that myself but I'm currently training. If it's actually the lora then it's possible the next version won't have this problem. This one used Adafactor optimizer on a large dataset which I don't think many people have done as they couldn't get it to converge, so AdamW8bit might make it run like every other lora. It looks like it's already learned more at 1 epoch/12 hours than it took months to get to with Adafactor, so maybe it was taking an inefficient path during inference too. Every other lora I've trained on SD/SDXL has been Adafactor with no issues though
@Tophness I played around with several things and got it to work by limiting GPU Weights to 10 GB. I have no idea why cutting VRAM in half would help, but it does.
Both Forge and Comfy are VERY poorly coded when it comes to memory management (amazing how few programmers are competent in memory management algorithms). Essentially there are datasets that must NOT be constantly swapped from VRAM, else performance will suffer exponentially. No matter how much VRAM you have, when this faulty behaviour kicks in, you will suffer.
Essentially, inputs (like LoRAs, Flux/SD models, and text models) should NOT be kept in VRAM when total VRAM runs out. Instead, such data can be moved block-by-block per iteration from RAM to VRAM, probably with no performance impact with Flux for anything below a 4090.
Every problem could be solved if the USER was fully allowed to state which data stays in VRAM, and which data switches into VRAM in blocks as needed. Sadly most programmers 'think' the more automated the code choices the better!
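The idea in the comments above (the user decides what stays in VRAM, and everything else is streamed in blocks) can be illustrated with a small toy planner. This is only a sketch under stated assumptions: the block names and sizes are hypothetical, `plan_residency` is an illustrative helper, and none of it reflects how Forge or ComfyUI actually manage memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    size_gb: int  # hypothetical size, whole GB for simplicity


def plan_residency(blocks, budget_gb, pinned=()):
    """Split blocks into (resident, streamed) name lists for a VRAM budget.

    Blocks named in `pinned` are always kept resident, as the user asked.
    Remaining blocks are kept resident greedily, in order, while they fit.
    Everything else would be streamed from RAM when needed.
    """
    pinned = set(pinned)
    resident, streamed = [], []
    used = 0

    # User-pinned blocks are placed first and must fit on their own.
    for block in blocks:
        if block.name in pinned:
            resident.append(block.name)
            used += block.size_gb
    if used > budget_gb:
        raise ValueError("pinned blocks exceed the VRAM budget")

    # Fill the remaining budget greedily with the unpinned blocks.
    for block in blocks:
        if block.name in pinned:
            continue
        if used + block.size_gb <= budget_gb:
            resident.append(block.name)
            used += block.size_gb
        else:
            streamed.append(block.name)
    return resident, streamed


# Hypothetical workload, with a 10 GB budget like the "GPU Weights" limit above.
blocks = [
    Block("text_encoder", 5),
    Block("transformer_block_a", 4),
    Block("transformer_block_b", 4),
    Block("lora", 1),
]
resident, streamed = plan_residency(blocks, budget_gb=10, pinned=("lora",))
print("resident:", resident)
print("streamed:", streamed)
```

A real implementation would also have to account for activations and other working memory, which this sketch ignores. That is one plausible reason a lower weight budget can help in practice.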