The First Flux Model Allowing Nude Men & Women to Co-Exist!
Trained Locally on a 3090 using the SD3 Branch of Kohya's sd-scripts!
Whilst definitely still a proof of concept compared to something like Pony, it (often) does what it was designed to do quite well!
V2.5 Update
This version is a merge of training runs done on Flux De-Distill and Flux Dev2Pro, both of which seek to remove distillation from Flux Dev. The models were merged at a ratio of 0.7:0.3 (Dev2Pro:De-Distill). The dataset is unchanged from version 2, which is why this is v2.5 rather than v3.
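As a rough illustration of what a 0.7:0.3 merge means, here is a minimal sketch of a weighted average over two sets of weights. It is not the workflow actually used for this release: the function name, the toy parameter names, and the values are all made up for the example, and plain Python lists stand in for the tensors a real checkpoint would hold.

```python
def merge_weights(a, b, ratio_a=0.7):
    """Linearly blend two weight dicts that share the same parameter names."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    ratio_b = 1.0 - ratio_a
    return {
        name: [ratio_a * x + ratio_b * y for x, y in zip(a[name], b[name])]
        for name in a
    }


# Toy stand-ins for the two training runs (hypothetical names and values).
dev2pro_run = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
dedistill_run = {"layer.weight": [0.0, 4.0], "layer.bias": [1.0]}

# 0.7 of the Dev2Pro run plus 0.3 of the De-Distill run.
merged = merge_weights(dev2pro_run, dedistill_run, ratio_a=0.7)
```

Every parameter in the merged model sits 70% of the way towards the Dev2Pro run and 30% towards the De-Distill run.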
The result is FAR greater image quality and generally better prompt adherence at the cost of increased generation times. Example images were generated using a modified version of the DynamicThresholdingFull Node to allow more precise Threshold Percentile values. This can be done by adding an extra 0 to step on line 11 of dynthres_comfyui.py.
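For anyone unsure what "adding an extra 0 to step" looks like, ComfyUI nodes describe each float input as a tuple with an options dict, and the `step` entry controls how finely the value can be adjusted. The sketch below only illustrates that format: the default, min and max values are assumptions, and the exact contents of line 11 in dynthres_comfyui.py should be checked in your own copy of the file.

```python
# Assumed shape of the input definition before and after the edit.
# Only the "step" value changes; the other values are placeholders.
before = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
after = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001})


def selectable_values(spec):
    """Count how many distinct values the input can be set to."""
    options = spec[1]
    return round((options["max"] - options["min"]) / options["step"]) + 1


coarse = selectable_values(before)
fine = selectable_values(after)
```

The smaller step allows percentile values with three decimal places instead of two.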
The initial Q8 GGUF release is nearly indistinguishable in quality from FP16 and should run on most hardware. It requires the ComfyUI GGUF custom node. The FP16 version will be uploaded towards the end of the week; an FP8 version may or may not be uploaded depending on demand.
Notes:
Whilst the model can somewhat work with the default Flux guidance of 3.5 and a CFG of 1, it is highly advised to remove the Flux Guidance node entirely and set CFG to something above 1
Generating an image takes 2-3 times as long as with previous versions, but the drastic increase in quality makes this worth it IMO
It is recommended to use a step count between 40 and 60
Due to using the same dataset as v2 it still does have some issues carried over:
Female genitalia seems undertrained
No NSFW poses
Pubic and body hair still tend to appear even when the prompt says to leave them out
Still has trouble distinguishing between circumcised and uncircumcised
Erect/Flaccid is sometimes not interpreted properly with more complex prompts or when generating an image with multiple male characters
It is my hope these issues will be remedied with version 3
V2 Update
Introducing Better Prompt Adherence and Anatomy!
This version was trained on the original SapianF model with an expanded dataset (3x as large) with more aggressive masking and prompting, along with a lower learning rate (22e-6 vs 25e-6). The result is a greater understanding of concepts like erect vs flaccid, dense pubes vs shaved pubes, and an overall improvement to genital anatomy, especially when it comes to male characters!
The dataset for males now contains 175 images, and the female dataset now consists of 75 images, both with a larger variety of poses, angles, and concepts.
Images were masked more aggressively with lower non-masked values, forcing the model to focus on the genitalia specifically, and the captioning for these new images is far more focused on the subject.
Learning rate was also decreased so the model could be trained for a longer period of time and better learn the concepts described.
Model was trained for 6 epochs, with epoch 4 producing the most consistent results. The blocks of this model were merged with the original model by hand with the goal of keeping elements consistent whilst transferring over the concepts which were trained.
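To give an idea of what a block-by-block merge involves, here is a minimal sketch in which each block gets its own blend ratio between the original and the newly trained weights. The actual merge was done by hand and the ratios used are not listed, so the ratios, the helper names, and the toy values below are purely illustrative. The `double_blocks` / `single_blocks` prefixes are assumed to follow the usual Flux checkpoint naming.

```python
import re


def block_ratio(name, ratios, default=0.0):
    """Return the share of the trained weight to use for parameter `name`."""
    match = re.match(r"(double_blocks|single_blocks)\.(\d+)\.", name)
    if match is None:
        return default  # anything outside the listed blocks stays original
    return ratios.get((match.group(1), int(match.group(2))), default)


def merge_by_block(base, tuned, ratios):
    """Blend `tuned` into `base` using a separate ratio for each block."""
    merged = {}
    for name, base_values in base.items():
        r = block_ratio(name, ratios)
        merged[name] = [
            (1.0 - r) * b + r * t for b, t in zip(base_values, tuned[name])
        ]
    return merged


# Toy weights (hypothetical values; real checkpoints hold tensors).
base = {
    "double_blocks.0.img_attn.qkv.weight": [0.0, 0.0],
    "single_blocks.3.linear1.weight": [1.0],
    "final_layer.linear.weight": [5.0],
}
tuned = {
    "double_blocks.0.img_attn.qkv.weight": [2.0, 4.0],
    "single_blocks.3.linear1.weight": [3.0],
    "final_layer.linear.weight": [9.0],
}

# Hypothetical choice: take one block fully, blend another, keep the rest.
ratios = {("double_blocks", 0): 1.0, ("single_blocks", 3): 0.5}
merged = merge_by_block(base, tuned, ratios)
```

Blocks left out of `ratios` keep the original weights, which is one way to keep the rest of the model consistent while carrying over only the trained concepts.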
Notes:
Whilst a big improvement overall, it's still not perfect. Depending on the prompt and seed, elements can still produce suboptimal results, though this is much rarer
It is likely that the only way to fully fix this issue is with even lower learning rates, longer training times, a further expanded dataset, and a higher batch size during training, most of which aren't really going to be possible on my hardware for the time being
It is recommended to run the model with the improved CLIP-L model and LongCLIP model released by zer0int1 a couple days ago. The ComfyUI node for LongCLIP can be found here
About
There are plenty of Flux checkpoints out there now that allow for both nude men and women to be generated, with one caveat...
These models are trained only on members of a single sex, meaning that if a model is trained on nude men, any attempt to generate nude women will result in male genitalia being added unprompted. Similarly, attempting to generate nude men on models trained on nude women will result in female genitalia being added unprompted.
So I set out with what I thought would be a simple task: train either a LoRA or Checkpoint to generate both nude men and nude women.
LoRA training was quickly ruled out due to consistently suboptimal outputs, but after much testing full checkpoint training has clearly yielded better results!
Training/Dataset
The dataset consisted of 45 images of nude men, 30 images of nude women, 15 images of nude men and women together in an image (tasteful), and 50 regularization images generated with my regularization workflow. Images were primarily front and side facing, and consisted mostly of standing and sitting poses from a variety of angles. The dataset was resized to 1024, 768 and 512 for multi-resolution training. Masked training was done by manually drawing a white mask over the genital areas and setting the rest of the mask to 30%.
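The masking described above can be pictured as a per-pixel weight map: 1.0 (white) over the drawn region and 0.3 everywhere else. The sketch below shows how such a map scales a training loss. It is a simplified illustration only: the real masks were drawn by hand rather than as rectangles, the helper names are made up, and the exact way the trainer applies the mask may differ.

```python
def build_mask(height, width, box, background=0.3):
    """Return a height x width grid: 1.0 inside `box`, `background` elsewhere."""
    top, left, bottom, right = box  # bottom and right are exclusive
    return [
        [
            1.0 if top <= y < bottom and left <= x < right else background
            for x in range(width)
        ]
        for y in range(height)
    ]


def masked_mean_loss(per_pixel_loss, mask):
    """Average the per-pixel loss after weighting each pixel by the mask."""
    total = sum(
        loss * weight
        for loss_row, mask_row in zip(per_pixel_loss, mask)
        for loss, weight in zip(loss_row, mask_row)
    )
    return total / (len(mask) * len(mask[0]))


# Toy 4x4 example with a 2x2 white region in the middle.
mask = build_mask(4, 4, box=(1, 1, 3, 3))
uniform_loss = [[1.0] * 4 for _ in range(4)]
weighted = masked_mean_loss(uniform_loss, mask)
```

Pixels outside the white region still contribute, just at 30% strength, so the model keeps learning the overall image while focusing most of its updates on the masked region.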
Of the non-regularization images, 60% were captioned using Joy Caption with modifications as needed, and 40% were captioned by hand using natural language descriptions.
In my testing female anatomy seems to train substantially faster than male anatomy, so image repeats for the men and the men & women datasets were double those of the women and regularization datasets.
Learning rate was 25e-6, and was run for approximately 7,500 steps. Took 10 hours to train on my 3090! Training was completed on both the regular Dev model, along with a Dev model that had a female-focused NSFW LoRA applied prior to training. Both models were merged together in ComfyUI afterwards.
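For a sense of scale, the figures above can be turned into a couple of quick numbers: how the doubled repeats shift the balance of the dataset, and roughly how long each training step took. This is only back-of-the-envelope arithmetic. The base repeat count is not stated, so `base_repeats = 1` is a placeholder, and the way the trainer actually schedules regularization images may differ from this simple count.

```python
# Image counts taken from the dataset description.
counts = {"men": 45, "women": 30, "men_and_women": 15, "regularization": 50}

base_repeats = 1  # placeholder: the real repeat count is not stated
repeats = {
    "men": 2 * base_repeats,            # doubled
    "men_and_women": 2 * base_repeats,  # doubled
    "women": base_repeats,
    "regularization": base_repeats,
}

# Weighted number of samples seen in one pass over the data.
samples_per_pass = sum(counts[name] * repeats[name] for name in counts)
share = {
    name: counts[name] * repeats[name] / samples_per_pass for name in counts
}

# Roughly 7,500 steps in about 10 hours.
seconds_per_step = 10 * 3600 / 7500
```

With these numbers, the 45 images of men make up 45% of each pass despite being less than a third of the raw image count, and each step works out to a little under 5 seconds on the 3090.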
Considerations
As said earlier, this is still a proof of concept. NSFW is very difficult to train with Flux, and with the limited dataset I have, getting a good result can often require some seed searching
The model as of now has only a rudimentary understanding of sub-concepts like pubic hair, erect/flaccid, and circumcised/uncircumcised, and thus results are often of questionable quality. You can often still force the model to work with these sub-concepts by playing with guidance settings and searching through different seeds, but more focused training will likely be required
If you are only looking to generate images of a particular sex, models trained on that specific sex will usually produce better quality images
Description
Version 2 is Here!
Introducing Better Prompt Adherence and Anatomy!
This version was trained on the original SapianF model with an expanded dataset (3x as large) with more aggressive masking and prompting, along with a lower learning rate (22e-6 vs 25e-6). The result is a greater understanding of concepts like erect vs flaccid, dense pubes vs shaved pubes, and an overall improvement to genital anatomy, especially when it comes to male characters!
More details can be found in the main description.