playground-v2-512px-base-anime-finetune
■This is an experimental fine-tune.
I trained it using onetrainer.
Fine-tuning was performed on a 100,000-image dataset consisting mainly of anime images, plus some realistic and AI-generated images. The training resolution is 512px.
I would like to share the possibilities of playground v2 512px base with everyone.
It has the same architecture as SDXL, so you can download it and use it immediately.
The advantage of this model is its 512px resolution: I think it is ideal if you want to train the SDXL architecture but run into problems such as a lack of VRAM.
It may also be a good choice if you feel that 1024px generation is too large, or if you simply want to generate at 512px.
Fine-tuning is done at 512px. An advantage is that there is no need to prepare a 1024px dataset: you can reuse the datasets you have been using with SD1.5, which is much less burdensome. Training time is also reduced.
1024px eats up training time, cache time, cache space, VRAM, disk space, etc...
A 512px image has a quarter of the pixels of a 1024px one, so training is roughly 4 times faster. I'm sorry if my calculations are wrong... Learning is fast and fun, since you get the benefits of the SDXL architecture even at low resolutions.
This model may have potential.
My wish is for many people to discover base models with potential and to see their possibilities unfold even further. I would be happy if I can help make that happen.
■Please be careful, as sexual images can also be generated.
In some cases a realistic or AI-generated look comes out strongly.
It might be a good idea to add "realistic" to the negative prompt.
"blush" This tag may be effective as it forces an anime style.
This is a very strong tag, so putting it near the beginning may be too strong.
On the other hand, it might be fun to try something other than anime.
New discoveries are made in areas that were not originally intended.
It's okay not to expect too much perfection. This model is still immature. The broken results are more interesting!
It would also be interesting to generate with a wide variety of tags using an automatic tagger.
■The standard size for this model is 512px.
An SD1.5-like ratio such as 512x768 is suitable.
768px and 1024px were not trained, so the results will be disastrous.
If you set a large size when doing i2i, it will fail.
The practical limit is about 1.5x magnification at denoise 0.5.
I like dpmpp_sde with 12 steps and cfg 3-5. Euler a is also stable and good, and generation is faster.
With i2i you can raise the cfg as much as you want; at around cfg 15, contrast and detail become more prominent.
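As a reference, here is a minimal sketch of these settings with diffusers, assuming the checkpoint loads as a standard SDXL single file (the file name and prompt are placeholders, not part of the release):

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "playground-v2-512px-base-anime-finetune.safetensors",  # placeholder file name
    torch_dtype=torch.float16,
)
# "Euler a" equivalent; dpmpp_sde at ~12 steps also works well.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

image = pipe(
    prompt="1girl, upper body, looking at viewer, blush",
    negative_prompt="realistic",   # helps suppress the realistic look
    width=512,
    height=768,                    # SD1.5-like 512x768 ratio suits this model
    num_inference_steps=12,
    guidance_scale=4.0,            # cfg 3-5 recommended
).images[0]
image.save("sample.png")
```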
■Added lora to force anime style.
For more details, see the lora tab.
My recent testing results are also written there.
I'm getting pretty used to inference!
The comfyui workflow has also been updated.
■Added a model that merges the text encoder of Animagine-xl-3.1 into v0.0_aesthetic at a ratio of 0.4.
A detailed explanation is written in the v0.0_aesthetic_TE tab.
It's very experimental, so I can't recommend it with confidence, but if you're interested, please give it a try!
If you try hard, you can generate a person who slightly resembles an anime character. Who does the character in the sample image look like? I worked really hard on it. LOL!
I haven't fine-tuned on the characters, so don't expect too much!
It was unexpected that it was possible to generate images of people holding guitars and swords...
You may be able to generate something else as well.
Maybe the Animagine tag rules will also be effective...?
■Added a merge model with stable quality.
I extracted the difference between playground-v2-1024px-aesthetic and the pre-training model and merged it in at +0.5.
Sizes other than 512px have also improved, making scaling up with i2i more stable.
Although the style and tag recognition will change, the aesthetic side is enhanced, so I recommend this one if you find the original model difficult to use.
There is no problem even with cfg around 3. If the colors come out dark, lower the value.
It's a lot more fun than I expected.
For sexual content, the original model is more responsive.
It may be fun to search for the ideal combination on your own.
The image may be a little blurry and you may need to sharpen it by upscaling or other means.
There is also an image with a sword among the samples. I was surprised; I didn't think it could generate one...
In some cases, sizes such as 768x1152px could be generated without failure.
↓ It may be effective to divide inference into stages like this (see the sketch below):
1. Try your prompts at 512x768px to solidify the concept.
2. Generate a better composition and anatomy at 768x1152px.
3. Improve details with i2i.
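A hedged sketch of that staged flow in diffusers; the file name is a placeholder, and step 2 may still fail since 768px was not trained:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "playground-v2-512px-base-anime-finetune.safetensors",  # placeholder file name
    torch_dtype=torch.float16,
).to("cuda")

prompt = "1girl, upper body, looking at viewer"

# Stage 1: iterate on the prompt at the reliable 512x768 size.
draft = pipe(prompt=prompt, width=512, height=768,
             num_inference_steps=12, guidance_scale=4.0).images[0]

# Stage 2: once the concept is settled, try 768x1152 for better composition.
large = pipe(prompt=prompt, width=768, height=1152,
             num_inference_steps=12, guidance_scale=4.0).images[0]

# Stage 3: refine details with i2i, reusing the same weights;
# i2i tolerates a much higher cfg than txt2img.
i2i = StableDiffusionXLImg2ImgPipeline(**pipe.components).to("cuda")
final = i2i(prompt=prompt, image=large,
            strength=0.5,          # denoise 0.5, the practical limit
            guidance_scale=8.0).images[0]
final.save("final.png")
```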
■There is no consistency in style. The quality is poor, and there are no fixed settings or prompts.
It has no advantage over existing models and has a narrower dataset.
The advantage is that it is lightweight.
If you notice any other benefits, please let me know.
■I am training with danbooru tags.
A small number of tags will produce disastrous results. The tags often used on danbooru and in SD serve as this model's quality tags.
I am only training general tags such as 1girl; artist tags and anime-title tags are not trained.
I would be happy to hear your opinions on which datasets you would like me to use if I continue training in the future.
The order of tags is important. Every tag carries its own imagery.
The more popular a tag, the better the quality may be, but its imagery is also reflected more strongly, so it is effective to offset it with other tags or dilute it by changing the order.
If the effect is too strong, it might be a good idea to lower the tag's weight (for example, "(blush:0.7)" in WebUI/ComfyUI prompt syntax).
"looking at viewer", "upper body", "shiny skin", etc. can easily produce high quality.
I trained without adding the "nsfw" tag, but it still seems to be effective for some reason...
■It's an incomplete and very difficult model, but if you're interested, please give it a try. I'm not very good with prompts, so if you can generate interesting results, please share them so I can make this model even stronger.
Your feedback will motivate me to train on a wider range of datasets.
There are still tags that have not yet been learned, so more diverse expressions will be possible.
■I've added the comfyui workflow that I'm using for generation tests.
It doesn't matter what software you use; please try it in various tools and generate away!
■Merging with an SDXL u-net fails. If anyone knows a way to merge them, it would be helpful if you could let me know.
Once merging becomes possible, you will be able to benefit from the other great SDXL models!
The weights differ from SDXL's, so there is basically no compatibility, but it would be fun to find a way to combine them.
I think it can be merged with other playground models, though. That could also be interesting.
If you have any chemical reactions caused by merging different models, please share!
It doesn't matter if it's real or anime.
■Added the training-source playground-v2-512px-base model for differential merging with other playground_v2 models.
I have uploaded it to the "v0.0_base" model tab, so please check it there.
By performing differential extraction, you can now isolate the aesthetic-training + fine-tuning weights of other playground_v2 1024px models. If you add the difference at +1.0, the 512px base will match the 1024px model; +0.5 gives an intermediate result and works over a wide range. Conversely, if you take the difference against my model, you can extract only my fine-tuning results and add them to another playground_v2 1024px model. There are many possible combinations, and it is fun.
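For illustration, a minimal sketch of that add-difference merge, assuming plain safetensors state dicts with matching keys (file names are placeholders):

```python
from safetensors.torch import load_file, save_file

base  = load_file("playground-v2-512px-base.safetensors")        # v0.0_base checkpoint
donor = load_file("playground-v2-1024px-aesthetic.safetensors")  # placeholder file name
alpha = 0.5  # +1.0 matches the 1024px model; +0.5 gives an intermediate result

merged = {}
for key, w in base.items():
    if key in donor and donor[key].shape == w.shape:
        # add the extracted difference (donor - base), scaled by alpha
        diff = donor[key].float() - w.float()
        merged[key] = (w.float() + alpha * diff).to(w.dtype)
    else:
        merged[key] = w  # keep base weights where keys don't line up

save_file(merged, "merged.safetensors")
```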
I think loras can be trained just as with SDXL.
There are still many unclear points, so I won't provide a detailed explanation, but if there is interest, I would like to share as much information as possible.
■Added a float32 checkpoint and a diffusers-format model for fine-tuning. The training configuration is the onetrainer_config that comes with the diffusers model.
I have uploaded it to the "v0.0_base" model tab, so please check it there.
Both the u-net and the text encoder are fine-tuned.
If your training tool supports SDXL, you can train without any problems. If you are still worried, you can feel more at ease with onetrainer, which I used for training.
Training this model is fun as it learns very well even at 512px.
playground-v2-512px-base is an SDXL model captured mid-training, before the aesthetic fine-tuning.
This is a very rare item that we would normally not be able to obtain. There are endless possibilities.
By using this as a starting point, you may be able to create a specialized model as you wish.
I have only drawn a slightly unsatisfying picture on this wonderful canvas.
It will become a great picture if you add to it.
My dream is to see more SDXL models that can be generated at lower resolutions such as 512px!
It would be fun to add more 512px training to increase concepts at a low training cost. Alternatively, by adding 384px and 768px data and doing multi-resolution training, it could flexibly support lower and higher resolutions while keeping 512px as the core, reducing upscaling failures and making it easier to learn finer details and concepts.
There is no problem even if the sample images during training are not of good quality. When I actually ran inference with automatically generated tags, it worked surprisingly well. It's fine as long as the training doesn't fail and collapse into noise.
Even if your training results are bad, merging other models will add the aesthetic elements your model lacks (those trained on top of 512px_base), strengthening high-resolution output, so you can generate images that exceed your imagination!
It might be interesting to swap in the text encoder of an SDXL model such as "animagine" or "pony" before training, and then train.
The text encoder would then start out already knowing the characters and danbooru tags. All you have to do is train the u-net!
It might also be a good idea to merge the text encoders at 0.5 to keep both sets of properties and train further.
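As an illustration only, a sketch of that 0.5 text-encoder blend with diffusers; the local path is a placeholder, and I'm assuming the Animagine weights load from its Hugging Face repo:

```python
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "./playground-v2-512px-base-diffusers")   # placeholder local path
donor = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1")          # assumed Hugging Face repo id

ratio = 0.5  # 0.0 keeps the original encoders, 1.0 is a full replacement
for ours, theirs in ((pipe.text_encoder, donor.text_encoder),
                     (pipe.text_encoder_2, donor.text_encoder_2)):
    theirs_sd = theirs.state_dict()
    # linear blend of both SDXL text encoders, key by key
    blended = {k: (1 - ratio) * v + ratio * theirs_sd[k]
               for k, v in ours.state_dict().items()}
    ours.load_state_dict(blended)

pipe.save_pretrained("./playground-512px-te-blend")  # then train the u-net on this
```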
I'm new to civitai, so if you have any opinions, I'd appreciate it if you could let me know.
Your reaction is my driving force. m(_ _)m
The total number of downloads has exceeded 300. Thank you for your interest in my immature model, and thank you very much for the many likes. m(_ _)m
■The great pre-trained model used for fine-tuning:
https://huggingface.co/playgroundai/playground-v2-512px-base
If you have any questions, please feel free to ask!
Questions in Japanese are also fine, so please feel free to reach out!
Description (v0.0_aesthetic_TE)
■This is a model that merges the text encoder of Animagine-xl-3.1 into v0.0_aesthetic at a ratio of 0.4.
My impression is that the tag recognition rate increases slightly and failures decrease. Realistic generations also seemed to appear more often, so it would be a good idea to put "realistic" in the negative prompt.
Even that is no guarantee: a realistic person may suddenly be generated, and your heart may stop.
"blush" This tag may be effective as it forces an anime style.
■cfg 3-5 is recommended.
■Adding tags that were not fine-tuned into the base u-net is probably difficult, but Animagine's tag rules may still work. Some anime characters also reproduced slightly better.
■Since the u-net has not been fine-tuned, the fundamental generation quality is unchanged.
■Some anime characters had a higher recognition rate when the text encoder was completely replaced with Animagine-xl-3.1's, but more of them came out looking realistic, so this blend ratio was adopted.
■If you try hard, you can generate a person who slightly resembles an anime character. Who does the character in the sample image look like? I worked really hard on it. LOL!
There are some images with poor reproducibility, but please forgive me!
Don't expect too much!
For some reason, there seems to be a lot of pink hair, right? Don't worry about it...
If I come up with something good other than anime characters, I'll post it.
Surprisingly, some characters could be generated with my model even without the merge. Why? playground_v2 may have known them from the beginning.
I don't even know which characters can be created...
I'm currently working hard to create sample images, but the anime characters that come to mind are probably from famous works.
They are probably included in the playground dataset, so if you're lucky you might be able to generate other characters as well.
Perhaps they can also be generated with models other than this merge model.