This model was trained on 3-4k images in which hoses are inserted in the mouth or butt. The dataset naturally contains a lot of inflation-related content, but the main purpose is to make hose insertion more reliable in SDXL models.
This dataset was extracted from the Hyperfusion dataset, so the tagging will be similar. Primarily use "hose in mouth", "hose in butt", or "holding hose".
Other inflation-related tags may also work, but I haven't really tested them.
Description
Similar dataset to v1, with ~15% of the images removed (incorrectly tagged or low quality), and trained on Noob_vpred.
Training notes:
Kohya's trainer
optimizer ADOPT
optimizer_args
"betas=(0.9, 0.9999)" "eps=1e-7"
This was the best ADOPT config I found over the past few weeks of training on a dataset of this size.
DoRA LoCon
frozen text encoder (increased training time, but I prefer to not touch the TE if possible)
lr 5e-4
dim 16
alpha 8
conv_dim 8
conv_alpha 4
batch 8
GA 16 (gradient accumulation; effective batch size 8 x 16 = 128)
2.6k images
flip
bucket
resolution 1024
tag dropout 0.1
dropout 0.2
caption_dropout 0.1
scale_weight_norms 6
ip_noise_gamma 0.02
min_snr_gamma 2
zsnr
v_pred
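To illustrate how min_snr_gamma 2 interacts with v_pred, here is a minimal sketch of the per-timestep loss weight. The first function is the usual min-SNR weighting for v-prediction, min(SNR, gamma) / (SNR + 1). The second is only an assumption about what a "soft" variant could look like (a smooth stand-in for the hard min); the exact formula used for this model is not stated here, and the function names are made up for the example.

```python
def min_snr_weight_vpred(snr: float, gamma: float = 2.0) -> float:
    """Standard min-SNR loss weight for v-prediction: min(SNR, gamma) / (SNR + 1).

    The (SNR + 1) denominator keeps the weight finite at SNR = 0, which matters
    when zero terminal SNR (zsnr) is enabled.
    """
    return min(snr, gamma) / (snr + 1.0)


def soft_min_snr_weight_vpred(snr: float, gamma: float = 2.0) -> float:
    """ASSUMED soft variant: swaps the hard min for SNR * gamma / (SNR + gamma).

    This smooth form is always <= min(SNR, gamma) and approaches it when SNR is
    far from gamma. It is an illustration, not necessarily the formula used here.
    """
    soft_min = snr * gamma / (snr + gamma)
    return soft_min / (snr + 1.0)


if __name__ == "__main__":
    # Compare both weights across a range of signal-to-noise ratios.
    for snr in (0.0, 0.01, 0.5, 2.0, 10.0, 100.0):
        hard = min_snr_weight_vpred(snr)
        soft = soft_min_snr_weight_vpred(snr)
        print(f"snr={snr:>7.2f}  min_snr={hard:.4f}  soft_min_snr={soft:.4f}")
```

With gamma 2, both forms down-weight very clean (high SNR) and very noisy (low SNR) timesteps; the soft form just removes the kink at SNR = gamma.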
Extras:
soft_min_snr instead of the default formula
learned timestep loss weights: a small network learns the loss scale for each timestep, with a similar goal to min_snr
sort important tags to the front, and sort them separately from the other tags
tag implication dropout for all common implied tags (~40% drop rate)
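The two caption-related extras above can be sketched roughly as follows. This is a hedged illustration, not the actual training code: the function name, the `IMPORTANT_TAGS` set, and the `IMPLICATIONS` map are hypothetical, and it assumes "sort separately" means the important tags and the remaining tags are each shuffled within their own group.

```python
import random

# Hypothetical example data; the real tag lists are not given in this description.
IMPORTANT_TAGS = {"holding hose"}
IMPLICATIONS = {"holding hose": {"hose"}}  # tag -> tags it implies


def process_caption(tags, important, implications, drop_prob=0.4, rng=None):
    """Drop implied tags with probability drop_prob, then put important tags first."""
    rng = rng or random.Random()
    present = set(tags)

    # Collect tags that are implied by another tag present in the same caption.
    implied = set()
    for tag in tags:
        implied |= implications.get(tag, set()) & present

    # Implication dropout: each implied tag is removed with probability drop_prob.
    kept = [t for t in tags if not (t in implied and rng.random() < drop_prob)]

    # Split into two groups and order each group independently (assumed: shuffle).
    front = [t for t in kept if t in important]
    rest = [t for t in kept if t not in important]
    rng.shuffle(front)
    rng.shuffle(rest)
    return front + rest


if __name__ == "__main__":
    caption = ["solo", "hose", "holding hose", "outdoors"]
    print(process_caption(caption, IMPORTANT_TAGS, IMPLICATIONS, rng=random.Random(0)))
```

Dropping an implied tag some of the time is meant to let the model learn that the more specific tag carries the concept on its own, so prompts do not need to spell out every implied tag.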