CFG = 2 for precise movement
CFG = 4 for creative but less precise results
LCM for high-quality, polished output + Hires fix
#example
Pommel_horse, young, uniform_gymnastics
bad quality, worst quality, worst detail, sketch, censor, realistic, 3d
#Don't do it
Don't use a long prompt; just include what you want or what you're replacing.
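A minimal sketch of these settings with diffusers, if you generate from Python; the checkpoint filename and the LCM LoRA repo are assumptions, not files from this page:

```python
# Sketch: low-CFG generation with an LCM LoRA, using the example prompts above.
# "checkpoint.safetensors" is a placeholder for whatever model you run this with.
import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

# LCM for high-quality, polished output: swap the scheduler, load the LCM LoRA.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

image = pipe(
    prompt="Pommel_horse, young, uniform_gymnastics",
    negative_prompt="bad quality, worst quality, worst detail, sketch, censor, realistic, 3d",
    guidance_scale=2.0,      # CFG = 2 for precise movement; try 4 for creative results
    num_inference_steps=8,   # LCM only needs a few steps
).images[0]
image.save("out.png")
```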
Comments
What is the difference between the different versions?
Each whole number represents a radical improvement, either in network size or in better training data. The number after the decimal point (3.1, 2.2, and so on) is the same network after fine-tuning, mostly to convert the realistic style to an anime style or to improve output quality while preserving what was learned before; the same trained network and settings are used. SNR is a training method that speeds up approximation; it's less accurate, but the results are beautiful. Large is a giant 512-node network that keeps only the two important layers and removes the rest, to speed up training and reduce size. 3.1 is best, but if you prefer something smaller and similar, 3.0 is fine. I'm still experimenting with new and better methods.
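(If the SNR method here means Min-SNR loss weighting, which Kohya's sd-scripts exposes as --min_snr_gamma, the idea is roughly this; the function name is mine:)

```python
# Rough sketch of Min-SNR loss weighting for eps-prediction diffusion training.
import torch

def min_snr_weights(alphas_cumprod: torch.Tensor, timesteps: torch.Tensor,
                    gamma: float = 5.0) -> torch.Tensor:
    """Per-sample loss weights: min(SNR(t), gamma) / SNR(t)."""
    a_bar = alphas_cumprod[timesteps]
    snr = a_bar / (1.0 - a_bar)          # signal-to-noise ratio at each timestep
    return torch.clamp(snr, max=gamma) / snr

# In the training step: loss = (min_snr_weights(sched.alphas_cumprod, t) * mse).mean()
```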
@Achref_Arts Interesting. I mainly use OneTrainer for training. Do you use Kohya then, or some cloud thing? I know there are quite a few options in Kohya, which is a big part of why I avoid it; I find the number of options a bit intimidating. Plus, when I initially tried to use it to make a LoRA, it kept giving me OOM errors. Have you tried masking in your training at all? I think it helps a bit to mask out the backgrounds, especially if your backgrounds are not simple. Cuz otherwise you end up with a lot in your training that's purely arbitrary (assuming you're training character-type LoRAs).
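(For context, masked training in trainers like OneTrainer boils down to weighting the per-pixel loss by a subject mask, so background pixels stop contributing; a rough sketch, not any trainer's actual code:)

```python
# Sketch of a masked diffusion loss: background pixels (mask == 0) contribute
# nothing, so training focuses on the subject. Names here are made up.
import torch
import torch.nn.functional as F

def masked_mse(model_pred: torch.Tensor, target: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """mask is 1 on the subject, 0 on background, already expanded to pred's shape."""
    per_pixel = F.mse_loss(model_pred, target, reduction="none") * mask
    # Normalize by the number of kept pixels, not the full tensor size.
    return per_pixel.sum() / mask.sum().clamp(min=1.0)
```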
@Lazman Sorry, I didn't see your comment. The site isn't sending me notifications, and I don't know why.
Yes, I use Kohya on Kaggle with a free account.
They give you 30 free hours per week, 12 hours per session. If the training runs longer than that, I use --network_weights to finish the LoRA training in a new session.
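(--network_weights is the sd-scripts flag that starts a new run from an existing LoRA file; a sketch of what a resumed session could look like, with every path a placeholder:)

```python
# Sketch: continue LoRA training from the previous session's weights with
# kohya-ss sd-scripts. All paths are hypothetical placeholders.
import subprocess

subprocess.run([
    "accelerate", "launch", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "rouwei_07_base_440k.safetensors",
    "--network_module", "networks.lora",
    "--network_weights", "lora_session1.safetensors",  # weights from last session
    "--train_data_dir", "/kaggle/working/dataset",
    "--output_dir", "/kaggle/working/output",
    "--max_train_steps", "2000",
], check=True)
```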
Yes, masks are good if the dataset is small or you want something specific, but I use a better method: I train on images even if they're poor quality, then I create new images using that LoRA, and through prompting I can get exactly the images I want. Then I continue training on top of the first LoRA I trained.
The results are amazing.
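(The second pass of that bootstrap is just batch generation with the first LoRA to build a cleaner dataset; a sketch, with file names and prompts made up:)

```python
# Sketch: generate a cleaner dataset with the first-pass LoRA, then retrain on it.
from pathlib import Path
import torch
from diffusers import StableDiffusionXLPipeline

out = Path("/kaggle/working/dataset2")
out.mkdir(parents=True, exist_ok=True)

pipe = StableDiffusionXLPipeline.from_single_file(
    "rouwei_07_base_440k.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("lora_first_pass.safetensors")  # the rough first LoRA

for i in range(40):
    image = pipe(
        prompt="Pommel_horse, young, uniform_gymnastics",
        negative_prompt="bad quality, worst quality",
        guidance_scale=4.0,
    ).images[0]
    image.save(out / f"{i:03d}.png")
```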
I also use just one or two words as the caption for training. Not only does it speed up training (from 100 steps down to 40), the prompting also becomes very easy and consistent.
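(Writing those short captions is a one-liner per image; a sketch with hypothetical folder and tag names:)

```python
# Sketch: write a one- or two-word caption file next to every training image.
from pathlib import Path

TAGS = "Pommel_horse, uniform_gymnastics"   # one or two trigger words
dataset = Path("/kaggle/working/dataset/10_pommel_horse")

for img in dataset.iterdir():
    if img.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
        img.with_suffix(".txt").write_text(TAGS)
```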
I also don't train at 1024 but at 760. Not only for speed: I've noticed the image quality is much better, because the original model compensates for the missing detail.
And above all, using the appropriate checkpoint is very important. I use rouwei_07_base_440k; it's the best checkpoint for training.
@Achref_Arts "Then I complete the training on top of the first Lora I trained."
Are you saying that you use the new images to fine-tune the first LoRA, or that you use them to make a second LoRA? The first option may be better for generalization; however, I'd argue the second would be better in terms of quality.
Of course there are some nuances to that as well. For one, if the subject is realistic, I'd try not to use any AI-generated images if possible, just cuz of the flaws that are inherent to AI; I'd rather not risk compounding them in the final results.
If anime, things can be stretched. I mean, AI can do really damn good anime.
I mean, I've been trying to turn anime into realistic for some time, but there is something to be said for just getting perfect/near-perfect images the first time, cuz the AI isn't struggling with as many different concepts.
"I also use one or two words together for training."
Better for training speed; idk about generation diversity. One or two is fine for a style LoRA, but with so few even on a single character/outfit LoRA, idk... I couldn't see it converging as well as it would with detailed and accurate prompts. Cuz then the AI sees every image as being literally one or two things, which include every detail within the given image.
"I also don't train on 1024, but on 760"
Eh... I train everything at 1536. Maybe the smaller res might work well for anime or low-detail stuff, but what if you want eyelashes on a realistic image of a person (just for example)? Cuz those smaller resolutions won't do well. Even 1536 is pushing it for such details.
Actually, this even goes for anime. That's why SDXL can produce perfect characters up close, but then blobby-faced weirdies if they're more than 10(ish) feet back from the viewer.
And here's the thing: due to the immense cost of the hardware involved, most people can't afford to train full models, which is why most of what we see are merges and fine-tunes. But if everyone only makes LoRAs with 768-res images (or worse in some cases, lower-res (quality) images downsized to 768), then when models get merged with people's LoRAs, the quality never improves. And that's why, even after how long SDXL has been out, you're still seeing better results with 768 than 1024.
That, and you're probably also following other mainstream advice, such as using the base SDXL model.
That model has been improved upon in so many ways with fine-tunes and merges, so it's a waste (imho) not to take advantage of the best SDXL-based models you can get your hands on. WaiNSFW, plantmilk, pornmaster, realitymaster, and a handful of others are the best I've found, and Illustrious has some amazing potential: it ditches the score tags in favour of improved prompt adherence. It's mainly known for anime style, but with some good LoRAs and the right models, it can do realism that could contend with Flux.
"I use rouwei_07_base_440k It's the best checkpoint for training"
Ok, I guess I got ya wrong regarding model use. Tbh, I haven't even heard of that model, but I'll have to give it a try. Is it best for anime, realism, semi-real, or all of the above? Is it SDXL, Pony, Illustrious, or NoobAI?
But yes, I also agree, best checkpoint is a must.