Please read through the entire description (might need to be expanded) and the version change notes as they cover a lot of information about basic use cases and limitations. The description is always focused on the latest version. Thank you!
This is an SD 1.5 LoRA for the character Dawn / Hikari from Pokemon. It works only with models based on SD 1.x; it will not work with models based on SD 2.x or SDXL!
Example images were picked from base resolution txt2img results with the FreeU extension (https://github.com/ljleb/sd-webui-freeu) enabled at default settings to further improve results (see the last example images for a comparison of FreeU disabled vs enabled). They were then re-created at 2x resolution using txt2img with moderate Hi-res Fix settings (upscaler Lanczos, denoising strength 0.4). This improves faces and other details that are almost impossible to get correctly and consistently at base resolution (a limitation of the technology) while still giving a realistic impression of what the LoRA produces. Base resolution results will have more distorted faces and less detail.
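For anyone scripting this instead of using the UI, the Hi-res Fix settings above could be written down roughly like this. This is only a sketch: the field names follow the AUTOMATIC1111 web UI txt2img API as I understand it, the 512x768 base resolution is an assumption (the description does not state one), and build_hires_payload and final_size are hypothetical helper names. FreeU is an extension and is not configured here.

```python
def build_hires_payload(prompt, negative_prompt="", width=512, height=768):
    """Build a txt2img request body with moderate Hi-res Fix settings."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,              # assumed base resolution, not from the description
        "height": height,
        "enable_hr": True,           # turn on Hi-res Fix
        "hr_scale": 2,               # re-create at 2x resolution
        "hr_upscaler": "Lanczos",    # upscaler named in the description
        "denoising_strength": 0.4,   # denoising strength named in the description
    }


def final_size(payload):
    """Return the output resolution after the Hi-res Fix pass."""
    scale = payload["hr_scale"]
    return payload["width"] * scale, payload["height"] * scale


payload = build_hires_payload(r"dawn \(pokemon\)")
print(final_size(payload))
```

The payload is only built, not sent, so the sketch runs without a web UI instance.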
Please see the version change notes for the training and example image generation models as well as the used weights as they might change between versions. Remember that you might need to adjust weights to best suit your use case!
Remember to add the tag dawn \(pokemon\) (with backslashes intact) to your positive prompt.
The training set contained multiple variations of Dawn's signature look (not including alternate outfits for now, sorry) so you might need to put combinations of the following tags in your positive or negative prompts to get the desired results:
beanie
bracelet
duffel bag
hairclip
knee boots
kneehighs
poketch
scarf
skirt
sleeveless shirt
As the training set also contained a few images with floating bits, you might need to add the negative prompts floating hair and/or floating scarf if not desired.
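If you assemble prompts in a script, the tag handling above might look roughly like this. It is a minimal sketch: build_prompts is a hypothetical helper, and which outfit tags go in the positive or negative prompt is just an example choice. The raw string keeps the backslashes in the trigger tag intact.

```python
TRIGGER_TAG = r"dawn \(pokemon\)"  # raw string so the backslashes survive


def build_prompts(wanted=(), unwanted=()):
    """Return (positive, negative) prompt strings.

    The trigger tag always comes first in the positive prompt;
    wanted outfit tags follow it, unwanted tags go to the negative prompt.
    """
    positive = ", ".join([TRIGGER_TAG, *wanted])
    negative = ", ".join(unwanted)
    return positive, negative


positive, negative = build_prompts(
    wanted=["beanie", "scarf", "skirt", "sleeveless shirt"],
    unwanted=["floating hair", "floating scarf"],
)
print(positive)
print(negative)
```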
Known Limitations / Problems:
The bag and boots were not present in a lot of images and not a focus for the training so they sadly will not be consistent at all.
Same goes for the bracelet and poketch but they might be a bit more recognizable from time to time.
The beanie pattern needs a bit of luck to come out right (sadly, it seems very difficult to train correctly).
Eyes and hair had different colors (blue and black) in the training data, so it is better to specify them explicitly.
You might also need to prompt for specific clothing colors from time to time. I don't really know why yet.
Depending on the clothing, framing etc., it may be more or less difficult to remove the beanie. I recommend a negative prompt with additional weight as well as specifying a hair color (e.g. blue hair) in the positive prompt, which seems to help. Still, that beanie (especially its pattern) really wants to be there.
The kneehighs might sometimes come out in two parts, the kneehighs themselves and a separate part at the thighs, leaving the knee exposed. I have no idea what causes that, but it might help to put thighhighs or clothing cutout in the negative prompts.
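The workarounds for the beanie and the kneehighs can be sketched like this. It assumes the (tag:weight) emphasis syntax of the AUTOMATIC1111 web UI. The weight of 1.4 is an arbitrary example, not a tested recommendation, and weighted is a hypothetical helper.

```python
def weighted(tag, weight):
    """Wrap a tag in (tag:weight) emphasis syntax."""
    return f"({tag}:{weight:g})"


# Positive prompt: trigger tag plus an explicit hair color to help drop the beanie.
positive_no_beanie = ", ".join([r"dawn \(pokemon\)", "blue hair", "kneehighs"])

# Negative prompt: beanie with extra weight, plus the tags that may help
# against the split kneehighs.
negative_no_beanie = ", ".join(
    [weighted("beanie", 1.4), "thighhighs", "clothing cutout"]
)

print(positive_no_beanie)
print(negative_no_beanie)
```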
Description
There are detailed changes below the next paragraph. You might need to expand this version changes box!
Experimented with quite a few changes to the training settings and training set. A few of them seem to have stuck, leading to a new version that seems mostly better to me than the last. So here you go. Seems I just can't help myself.
Most of the changes (look at LoRA metadata for more details):
Updated to much newer Kohya scripts version (~3 months newer)
Updated training images
Removed some lower quality images, added some new ones
Switched to keep original-ish aspect ratios (slightly cropped and resized to compatible SD 1.5 resolutions) where it makes sense instead of forcing squares
Added regularization images to actually make trigger tag work correctly
1 regularization image per training image
Generated by the base model at the same resolution, with the same tags (minus the activation tag)
Currently weighted at half loss during training as they had a bit too much influence otherwise
With this, the LoRA now reverts much closer to base model knowledge without the activation tag (which is correct!)
Might care a bit more about the positioning of the activation tag now as it was trained with "keep tokens" to keep the activation token at the front when shuffling
Normalized repeats to 1 (only using different values if datasets ever need balancing) and set learning rates back to defaults
Compensated with different epoch settings
Went back to training at 128 dims, followed by a resize down to 32
Results were better across the board and the resizing also removes a bit of noise as a bonus
Used NO dynamic resize method as results for [email protected] and sv_ratio@20 did not seem too different from or better than a simple resizing
Added training warmup of 10%
Not sure if this had much impact, might remove again in the future
Added "scale weight norms" with value 1 during training
Supposedly helps against overfitting and might make LoRAs more compatible with others
After initial release: Used FreeU extension for example image generation to further improve results
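Two of the caption-related changes above (regularization captions without the activation tag, and "keep tokens" holding the activation tag at the front when shuffling) can be illustrated with a small sketch. This is not the Kohya scripts' code, only the idea as described in the list; both function names are hypothetical and captions are assumed to be comma-separated tags.

```python
import random

ACTIVATION_TAG = r"dawn \(pokemon\)"


def regularization_caption(caption):
    """Same tags as the training caption, minus the activation tag."""
    tags = [t.strip() for t in caption.split(",")]
    return ", ".join(t for t in tags if t and t != ACTIVATION_TAG)


def shuffle_caption(caption, keep_tokens=1, rng=None):
    """Shuffle tags but keep the first keep_tokens tags in place."""
    rng = rng or random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # only the tags after the kept ones move
    return ", ".join(fixed + rest)


caption = r"dawn \(pokemon\), beanie, scarf, skirt"
print(regularization_caption(caption))
print(shuffle_caption(caption, keep_tokens=1, rng=random.Random(0)))
```

Since the activation tag always sat at the front during training, putting it first in your prompt is probably the safest choice.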
Recommended weight: 1
Training model: Anything V3
Example image generation model: AbyssOrangeMix2 - Hardcore
Well, did not expect to do another version for this. At all. But as it turns out, I do like to experiment with completely useless things sometimes, taking up a lot of my free time for some reason. Maybe some of my other existing LoRAs may follow yet again, now that I have new settings. Or maybe not since I need to update the training sets and don't have quite as much interest in most of the others. We'll see, we'll see!