CivArchive
    hyperfusion SDXL DoRA 600k images - v10 noob_vpred
    NSFW

    This DoRA was trained on 600k images of hyper sized anime characters. It focuses mainly on breasts/ass/belly/thighs/fat. This dataset is a subset of the larger hyperfusion dataset, filtered down to body shape/size related images only. The full dataset would have taken more than a year to train on SDXL, lol.

    Recommendations:

    • DoRA/LoRA strength: 1.0 (DoRAs work in most WebUIs by now)

    • Resolution: ~1024px

    • Samplers: any sampler PonyXL supports

    • In v10 you can push the LoRA weight higher than in v9, so do that if your concepts are not coming through as strongly as you'd like.
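
    In A1111-style WebUIs, the strength recommendation above translates to the standard LoRA prompt syntax. The filename below is taken from this page's Files section; adjust it to however your copy is named, and the tags shown are purely illustrative:

    ```text
    <lora:hyperfusionSDXLDora_v10NoobVpred:1.0>, 1girl, huge breasts, thick thighs
    ```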

    I've uploaded the 1.4 million custom tags used in hyperfusion here, for integration into your own datasets.

    v10 Noob_vpred Release 2025/07/29:

    • Did you guys think I disappeared? Nope, just hopelessly training a model with a frozen text encoder for 7 months.

    • This new DoRA has the same concepts you are used to by now, but with a few new concepts as usual. Also 200k more images than v9.

    • This version is trained on NoobAI_Vpred, so there is no guarantee it will work with anything else. Especially not on non-v_pred models.

    • Wanted to try training with the Text Encoder frozen one last time, and decided to stick with it no matter how long it took. Now I can definitively say I will be including the TE in future models, just for the sake of time. It works, but it's way too slow for my setup.

    • Use the tag list in v9 for now, until I get around to building the new one with the small number of new concepts.

    • This one should handle concepts a little better than v9_sdxl, and is less prone to exploding gradients as well.

    v9 Pony Release:

    • This model has been training for over 2 months now, but since Flux dropped, I decided to release what I have so far to free up a GPU. Technically it should have trained for longer, but I'm impatient, and some of you are probably tired of waiting anyway.

    • The tags are mostly the same as the last v8 release for SD1, with a few new additions such as blob content. See the tag.csv in "Training Data" for more.

    • Pony is a little tricky to train on, so I was experimenting a lot with this model. Because of this you should try to keep the DoRA strength near 1.0. Anything above 1.1 tends to explode. (weight regularization like scale_weight_norms is critical for training on pony, fyi)

    • To keep training time reasonable I trained at 768x768 resolution initially, and had planned on finishing up training with 1024px resolution, but then Flux happened. The results still seem reasonable.


    I put plans and progress here every now and then.

    Changelog Article Link

    Description

    Trained on 600k (hyper focused) images extracted from the larger hyperfusion dataset.

    The goal of this model was to power through Unet-only training and see how much longer it would take to train without the Text Encoder. The result: it takes at least 3x longer, so I'll be including Text Encoder training in the future. It's just not worth the added time. Based on previous models I expected this to take 3 months to reach the same level as the previous DoRA, but it ended up taking 7 months because of the frozen Text Encoder. It's still far from fully trained (although better at most concepts than v9_sdxl), but it felt good enough to go ahead and release.

    Training Notes:

    • ~600k images

    • LR 3e-4

    • Unet only training (just as an experiment)

    • batch 4

    • gradient accumulation 32

    • dim 16

    • alpha 8

    • c_dim 8 (technically a DoRA LoCon)

    • c_alpha 4

    • optimizer: Adamw8Bit

    • scheduler: linear

    • base model NoobAI_vpred

    • flip aug

    • 525 token length (because appending captions and tags is a lot of tokens)

    • bucketing 1024, min 512, max 1280

    • tag drop chance 0.1

    • tag shuffling

    • --min_snr_gamma 2

    • --ip_noise_gamma 0.02

    • --scale_weight_norms 7

    • 7 months of training on 2x3090

      Custom training configs:

      I have implemented a number of suggested training improvements in Kohya's training code, and kept the ones that seemed to actually help.

      • Drop 75% of tags 5% of the time, to hopefully improve results with short tag lists.

      • soft_min_snr instead of min_snr

      • --no_flip_when_cap_matches: Prevent flipping images when certain tags exist, such as "sequence, asymmetrical, before and after, text on*, written, speech bubble", etc. This should help with text and with characters that have asymmetrical features.

      • --important_tags: Move important tags to the beginning of the list, and sort them separately from the unimportant ones (suggested by NovelAI, if I remember correctly).

      • --tag_implication_dropout: Drop out similar tags to prevent the model from requiring both to be present when generating. For example, with "breasts, big breasts", "breasts" will be dropped out 30-50% of the time. I used the tag implications csv from e621 as a base and added tags as needed. Even with 10%-15% general tag dropout, some tag pairs were still being associated too often, so this definitely made a difference. I think there were about 5k tags in total on the dropout list.

      • 30% of the dataset is captioned with VLMs (CogVLM, Qwen2, etc.), and many of the captions are cleaned up with custom scripts that correct common problems.

      • Tags vs Captions: 70% of the time use tags, ~20% of the time use captions (if they exist), 10% of the time combine tags with captions in different orders.

    • For this training run specifically, I also implemented Learned Loss Weighting, and removed min_snr halfway through. Learned loss weighting basically attempts to learn the noise schedule for different timesteps instead of using a static one. Hard to say if it made the model any better, but I did notice the model start to converge a bit faster after adding it; at the least, it didn't hurt.
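
    For anyone wanting to approximate this setup in stock Kohya sd-scripts, a rough invocation covering the hyperparameters listed above might look like the sketch below. This is my reconstruction, not the author's actual command: the author used a customized fork, and several behaviors above (the 525-token length, soft_min_snr, tag implication dropout, no_flip_when_cap_matches, important_tags, learned loss weighting) are custom patches with no stock flag. The LyCORIS network_args for a DoRA LoCon are also my assumption.

    ```shell
    # Approximate stock sd-scripts invocation (sketch; custom patches not included).
    accelerate launch sdxl_train_network.py \
      --pretrained_model_name_or_path /models/noobai_xl_vpred.safetensors \
      --v_parameterization \
      --network_module lycoris.kohya \
      --network_dim 16 --network_alpha 8 \
      --network_args "algo=locon" "conv_dim=8" "conv_alpha=4" "dora_wd=True" \
      --learning_rate 3e-4 --lr_scheduler linear \
      --optimizer_type AdamW8bit \
      --train_batch_size 4 --gradient_accumulation_steps 32 \
      --resolution 1024,1024 --enable_bucket \
      --min_bucket_reso 512 --max_bucket_reso 1280 \
      --flip_aug --shuffle_caption --caption_tag_dropout_rate 0.1 \
      --min_snr_gamma 2 --ip_noise_gamma 0.02 --scale_weight_norms 7
    ```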
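
    The no_flip_when_cap_matches behavior described above can be sketched as a simple tag check before applying flip augmentation. This is a minimal illustration, not the author's code; the tag lists here are just the examples from the description:

    ```python
    # Sketch of a flip-augmentation guard: skip horizontal flips for images
    # whose tags indicate text or asymmetry. Tag lists are illustrative.
    NO_FLIP_TAGS = {"sequence", "asymmetrical", "before and after",
                    "written", "speech bubble"}
    NO_FLIP_PREFIXES = ("text on",)  # matches "text on*" wildcard tags

    def may_flip(tags):
        """Return False if any tag forbids horizontal flip augmentation."""
        for tag in tags:
            if tag in NO_FLIP_TAGS or tag.startswith(NO_FLIP_PREFIXES):
                return False
        return True
    ```

    A dataloader would then only apply its flip transform when `may_flip(tags)` is true.
    
    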
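
    The tag_implication_dropout idea can likewise be sketched in a few lines. The implication table and function name below are hypothetical stand-ins for the e621-derived csv the author describes:

    ```python
    import random

    # Child tag -> parent tag it implies (illustrative subset of an
    # e621-style implication table). When both appear together, the parent
    # is dropped part of the time so the model doesn't learn to require both.
    IMPLICATIONS = {
        "huge breasts": "breasts",
        "large breasts": "breasts",
        "thick thighs": "thighs",
    }

    def implication_dropout(tags, p=0.4, rng=random):
        """Drop implied parent tags with probability p when a child tag is present."""
        tags = list(tags)
        implied = {IMPLICATIONS[t] for t in tags if t in IMPLICATIONS}
        return [t for t in tags if not (t in implied and rng.random() < p)]
    ```
    
    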
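
    The tags-vs-captions mixing ratio described above (70% tags, ~20% captions, 10% combined) could be implemented roughly like this; the function name and joining format are my own assumptions:

    ```python
    import random

    def build_caption(tags, caption=None, rng=random):
        """Pick the training text for one sample: ~70% tags only, ~20% caption
        only (when one exists), ~10% tags and caption combined in random order.
        Ratios follow the model card's description; details are illustrative."""
        tag_text = ", ".join(tags)
        if caption is None:
            return tag_text
        r = rng.random()
        if r < 0.7:
            return tag_text
        if r < 0.9:
            return caption
        parts = [tag_text, caption]
        rng.shuffle(parts)
        return ", ".join(parts)
    ```
    
    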


    Details

    Downloads
    867
    Platform
    CivitAI
    Platform Status
    Deleted
    Created
    7/29/2025
    Updated
    4/21/2026
    Deleted
    4/16/2026

    Files

    hyperfusionSDXLDora_v10NoobVpred.safetensors

    hyperfusionSDXLDora_v10NoobVpred_trainingData.zip

    Available On (1 platform)
