playground-v2-512px-base-anime-finetune
■This is an experimental fine tuning, trained with onetrainer.
Fine tuning was performed on a 100,000-image dataset consisting mainly of anime images, with some realistic and AI-generated images mixed in. The training resolution is 512px.
I would like to share the possibilities of playground v2 512px base with everyone.
It uses the same architecture as SDXL, so you can download it and use it immediately.
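If you use diffusers, loading should look roughly like this. This is just a minimal sketch, assuming the checkpoint behaves like any other single-file SDXL checkpoint; the file name is illustrative.

```python
# Minimal sketch: load the checkpoint like any other single-file SDXL model.
# The file name is illustrative; point it at the checkpoint you downloaded.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "playground-v2-512px-base-anime-finetune.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# 512px is the native resolution, so stay around 512x768 instead of 1024px.
image = pipe("1girl, upper body, looking at viewer",
             width=512, height=768).images[0]
image.save("sample.png")
```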
The advantage of this model is 512px: I think it is ideal if you want to train on the SDXL architecture but run into problems such as a lack of VRAM, or if you feel the usual 1024px generation size is too large and would rather generate at 512px.
Fine tuning is done at 512px. One advantage is that there is no need to prepare a 1024px dataset: you can reuse the datasets you have been using for SD1.5, which is much less of a burden, and training time is also reduced.
1024px eats up training time, cache time, cache space, VRAM, disk space, and so on.
A 512px image has only a quarter of the pixels of a 1024px one, so training is roughly 4 times faster. Learning is fast and fun, since you get the benefits of the SDXL architecture even at low resolutions.
This model may have potential.
My wish is for many people to discover base models with potential and to see their possibilities unfold even further. I would be happy if I can help make that happen.
■Please be careful: this model also generates sexual images.
In some cases a realistic or AI-generated look comes out strongly.
It might be a good idea to add "realistic" to the negative prompt.
"blush" This tag may be effective as it forces an anime style.
This is a very strong tag, so putting it near the beginning may be too strong.
On the other hand, it might be fun to try something other than anime.
New discoveries are made in areas that were not originally intended.
Don't expect too much perfection. This model is still immature, and the broken results are the more interesting ones!
It would also be interesting to generate with a variety of tags produced by an automatic tagger.
■The standard size for this model is 512px
An aspect ratio like 512x768, as with SD1.5, is suitable.
768px and 1024px were not trained, so results at those sizes will be disastrous.
If you set too large a size when doing i2i, it will fail.
The practical limit is about 1.5x magnification at denoise 0.5.
I like dpmpp_sde at step: 12, cfg: 3-5. Euler a is also stable and good, and generation is faster.
With i2i you can raise the cfg as much as you want; at around cfg 15, contrast and detail become more prominent.
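Translated to diffusers (continuing from the pipeline loaded above), those settings would look roughly like this; dpmpp_sde corresponds to DPMSolverSDEScheduler and Euler a to EulerAncestralDiscreteScheduler. Only a sketch, in other UIs just pick the samplers directly.

```python
# Rough diffusers equivalents of my favorite settings (reusing `pipe` above).
from diffusers import DPMSolverSDEScheduler, EulerAncestralDiscreteScheduler

prompt = "1girl, upper body, looking at viewer"

# dpmpp_sde, step: 12, cfg: 3-5 (needs the torchsde package installed)
pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, width=512, height=768,
             num_inference_steps=12, guidance_scale=4.0).images[0]

# Euler a: stable and faster
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
```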
■Added a lora that forces an anime style.
For more details, see the lora tab.
My recent testing results are also written there.
I'm getting pretty used to inference!
The comfyui workflow has also been updated.
■Added a model that merges the text encoder of Animagine-xl-3.1 into v0.0_aesthetic at a ratio of 0.4.
A detailed explanation is written in the v0.0_aesthetic_TE tab.
It's very experimental, so I can't recommend it with confidence, but if you're interested, please give it a try!
If you try hard, you can generate someone who slightly resembles a given anime character. Who does the character in the sample image look like? I worked really hard on it. LOL!
I haven't fine-tuned on characters, so don't expect too much!
I didn't expect that it would be able to generate people holding guitars and swords...
You may be able to generate something else as well.
Maybe the Animagine tag rules will also be effective...?
■Added a merge model with stable quality.
I extracted the difference between playground-v2-1024px-aesthetic and the pre-training model and add-merged it at +0.5.
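As a rough sketch, the add-difference merge looks like this. I'm assuming the pre-training model here is playground-v2-512px-base, that all three checkpoints share the same keys, and the file names are illustrative.

```python
# Add-difference merge sketch:
#   merged = finetune + 0.5 * (playground-v2-1024px-aesthetic - 512px base)
from safetensors.torch import load_file, save_file

finetune = load_file("v0.0_aesthetic.safetensors")
aesthetic = load_file("playground-v2-1024px-aesthetic.safetensors")
base = load_file("playground-v2-512px-base.safetensors")

# Extract the aesthetic-training difference and add half of it back in.
merged = {k: finetune[k] + 0.5 * (aesthetic[k] - base[k]) for k in finetune}
save_file(merged, "merged_0.5.safetensors")
```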
Sizes other than 512px have also improved, which increases stability when scaling up with i2i.
Although the style and tag recognition rate change, the aesthetic side is enhanced, so I recommend this one if you find the original model difficult to use.
There is no problem even with cfg around 3. If the colors come out dark, lower the value.
It's a lot more fun than I expected.
For sexual content, the original model is more likely to respond.
It may be fun to search for the ideal combination on your own.
The image may be a little blurry, and you may need to sharpen it by upscaling or other means.
There is also an image with a sword among the samples. I was surprised, because I didn't think it would be possible to generate one...
In some cases, images as large as 768x1152px could be generated without failure.
↓ It may be effective to divide inference into stages like this (a rough code sketch follows the list).
1. Try the prompts at 512x768px to solidify your concept.
2. Generate better composition and human body at 768x1152px.
3. Improve details with i2i.
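Sketched with diffusers, reusing `pipe` from above and the size/denoise limits from the notes in this section (prompt and cfg values are illustrative):

```python
# Rough sketch of the staged workflow above.
from diffusers import StableDiffusionXLImg2ImgPipeline

img2img = StableDiffusionXLImg2ImgPipeline(**pipe.components)
prompt = "1girl, upper body, looking at viewer"

# 1. Solidify the concept at the native 512x768px.
draft = pipe(prompt, width=512, height=768,
             num_inference_steps=12, guidance_scale=4.0).images[0]

# 2. Re-generate the composition and body at 768x1152px.
large = pipe(prompt, width=768, height=1152,
             num_inference_steps=12, guidance_scale=4.0).images[0]

# 3. Improve details with i2i: about 1.5x upscale at denoise (strength) 0.5
#    is the practical limit; higher cfg brings out contrast and detail.
upscaled = large.resize((large.width * 3 // 2, large.height * 3 // 2))
final = img2img(prompt, image=upscaled, strength=0.5,
                guidance_scale=15.0).images[0]
final.save("staged.png")
```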
■There is no consistency in style. The quality is poor, and there are no fixed settings or prompts.
It has no advantage over existing models, and its dataset is narrower.
The advantage is that it is lightweight.
If you notice any other benefits, please let me know.
■I am training with danbooru tags.
A prompt with only a few tags will produce disastrous results. The tags frequently used on danbooru and in SD serve as this model's quality tags.
Only general tags such as 1girl are trained; artist tags and tags for specific anime works are not.
I would be happy to hear your opinions on what datasets you would like me to use if I continue training in the future.
The order of tags is important. Every tag has a unique image.
The more popular a tag, the better the quality may be, but its image is also reflected more strongly, so it is effective to offset it with other tags or to dilute it by changing the order.
If the effect is too strong, it might be a good idea to lower the weight.
"Looking at viewer","upper body","shiny skin"etc... can easily be of high quality.
I'm training without adding the "nsfw" tag, but I feel like it's effective for some reason...
■It's an incomplete and very difficult model, but if you're interested, please give it a try. I'm not very good with prompts, so if you can generate interesting results, please share them so I can make this model even stronger.
Your feedback will motivate me to train on a wider range of datasets.
There are still tags that have not yet been learned, so more diverse expressions will be possible.
■I've added the comfyui workflow that I'm using for generation tests.
It doesn't matter what software you use; please try generating with all kinds of tools!
■Merging with an SDXL u-net fails. If there is a way to merge them, it would be helpful if you could let me know.
Once you are able to merge, you can benefit from other great SDXL models!
Its weights differ from SDXL's, so there is basically no compatibility, but it would be fun to find a way to combine them.
I think it can be merged with other playground models, which could also be interesting.
If you find any chemical reactions from merging different models, please share!
It doesn't matter if it's real or anime.
■Added the training-source playground-v2-512px-base model, for differential merging with other playground_v2 models.
I have uploaded it to the "v0.0_base" model tab, so please check it there.
Now, by differential extraction, you can pull out the aesthetic training + fine tuning weights of the other playground_v2 1024px models: take diff = (playground_v2 1024px) - (512px base), then add-merge it as merged = model + rate x diff. At a rate of +1.0, the 512px base matches the 1024px model; +0.5 gives an intermediate result, so a wide range can be matched. Conversely, taking the difference against my model extracts only my fine tuning results, which can then be add-merged onto another playground_v2 1024px. There are many possible combinations, and it is fun.
I think loras can be trained on it just like SDXL.
There are still many things that are unclear, so I won't provide a detailed explanation, but if there is positive feedback, I would like to share as much information as possible.
■Added a float32 checkpoint and a diffusers model for fine tuning. The training configuration is the onetrainer_config bundled with the diffusers model.
I have uploaded it to the "v0.0_base" model tab, so please check it there.
Both u-net and text encoder are fine tuned.
If the training tool you are using supports SDXL, you can train without any problems. If you are still worried, you can be more at ease with onetrainer, which I used for training.
Training this model is fun as it learns very well even at 512px.
playground-v2-512px-base is an SDXL model that is in the middle of training before aesthetic fine tuning.
This is a very rare item that we would normally not be able to obtain. There are endless possibilities.
By using this as a starting point, you may be able to create a specialized model as you wish.
I have only drawn a slightly unsatisfying picture on this wonderful canvas.
It will become a great picture if you paint over it.
My dream is to see more SDXL models that can be generated at lower resolutions such as 512px!
It would be fun to add more 512px training to expand its concepts at a low training cost. Or, by adding 384px + 768px and doing multi-resolution training, it would be fun to flexibly support lower and higher resolutions while keeping 512px as the base, reducing upscaling failures and making it easier to learn finer details and concepts.
There is no problem even if the sample images during training are not of good quality. When I actually ran inference with automatically generated tags, it worked surprisingly well. It's okay as long as the training doesn't fail and the output doesn't collapse into noise.
Even if your training results are bad, merging in other models adds the aesthetic elements trained after 512px_base that your model lacks and strengthens high resolutions, so you can generate images that exceed your imagination!
It might be interesting to swap in the text encoder of an SDXL model such as "animagine" or "pony" before training.
The text encoder would then already know the characters and danbooru tags; all you have to do is train the u-net!
It might also be a good idea to merge the two text encoders at 0.5 to keep both sets of properties and train further, as in the sketch below.
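A hypothetical sketch of that 0.5 text-encoder merge, assuming both models share the standard SDXL text-encoder architecture (file names are illustrative):

```python
# Average the two models' text encoders at 0.5, save the result, then
# fine-tune only the u-net from there.
from diffusers import StableDiffusionXLPipeline

pg = StableDiffusionXLPipeline.from_single_file("playground-v2-512px-base.safetensors")
donor = StableDiffusionXLPipeline.from_single_file("animagine-xl-3.1.safetensors")

# SDXL has two text encoders; blend each one at 0.5 / 0.5.
for te, te_donor in ((pg.text_encoder, donor.text_encoder),
                     (pg.text_encoder_2, donor.text_encoder_2)):
    sd, sd_d = te.state_dict(), te_donor.state_dict()
    te.load_state_dict({k: 0.5 * sd[k] + 0.5 * sd_d[k] for k in sd})

pg.save_pretrained("playground-512px-te-merge")  # train the u-net from here
```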
I'm new to civitai, so if you have any opinions, I'd appreciate it if you could let me know.
Your reaction is my driving force. m(_ _)m
The total number of downloads has exceeded 300. Thank you for your interest in my immature model! Thank you very much for your many likes. m(_ _)m
■The great pre-trained model used for fine tuning:
https://huggingface.co/playgroundai/playground-v2-512px-base
If you have any questions, please feel free to ask!
Questions in Japanese are also fine, so please feel free to reach out~
Description
■This lora is for the "v0.0_aesthetic+TE_merge" model only.
This lora was created with the purpose of forcing an anime style.
This is not meant to enhance aesthetics; it's intended to suppress realistic results. The sample images are nearly identical to those from the original model, so applying this to other models won't necessarily produce results like those in the sample images.
■Created with 5,000 aesthetic anime images. Only the u-net was trained, so the tag recognition rate does not change.
I trained it on multiple resolutions (512px+768px) to make it compatible with a wide range of resolutions.
A strength of 0.3 is just right. A larger value enforces the style more strongly, but produces more failures.
The lora is effective for background tags such as beach, forest, city, and outdoors, which tend to come out realistic due to a lack of training.
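In diffusers, applying it at strength 0.3 would look roughly like this, assuming the file loads like a standard SDXL lora (the file name is illustrative; in ComfyUI or webui just set the lora weight to 0.3). `pipe` here is the loaded v0.0_aesthetic+TE_merge model.

```python
# Apply the anime-style lora at strength 0.3 (illustrative file name).
pipe.load_lora_weights("playground-anime-style-lora.safetensors")
image = pipe("1girl, beach, looking at viewer",
             width=512, height=768,
             cross_attention_kwargs={"scale": 0.3}).images[0]
```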
■The quality is higher and there are fewer failures if you do not use the lora.
If you are okay with the occasional realistic image, you do not need to use it!
Ultimately, not putting anything in the negative prompt opens up more possibilities!
The sample images also contain only the minimum number of tags to preserve the diversity of the base model.
I'm getting pretty used to inference!
■This lora was created for my own experimental purposes.
I have never created a lora for SDXL, and playground lora is not shared much, so this is a test to see if I can create one.
I will also share my training config for reference.
■I have also updated my comfyui workflow. I like this one these days.
The lora is configured in it as well.
kohya Deep Shrink is great because it can generate 1024px!
I like "dpmpp_2m+Deep Shrink".
There are a lot of distortions, but it's very detailed and fun.
■I will also share my subjective sampler impressions.
dpmpp_sde: Perfect, closest to the dataset, and has few distortions, but it's a little slow.
dpmpp_2m: Rough, dynamic style, and fast. I love it, but there are many distortions, so I use it when the overall atmosphere is good.
euler_a: Most stable, close to the dataset, and fast. However, my impression is that the images it generates are often boring.
■I will also share my favorite settings.
Stable quality:
"dpmpp_sde + Deep Shrink", downscale_factor: 1.5, 768x1152px
A 1.5x resolution is less likely to break down.
Atmosphere-oriented:
"dpmpp_2m + Deep Shrink", downscale_factor: 2, 1024x1536px
Creates high-resolution, attractive images, even if more of them fail.