I can guide you with best practices, tips, and tricks to reach your goals in Stable Diffusion training. Take a look at my Patreon: exclusive content, subscriptions for just $5. :)
About:
This is my first LoRA here at CivitAI. Its purpose is to (try to) replicate the anal missionary position from a POV angle. Be careful, as it sometimes generates body horror and disembodied penises.
It was trained at 768x768, with captions, on 160 high-resolution images. I plan to improve it in future versions.
The demo images of the model aren't edited, but they are cherry-picked and generated using hires-fix (0.40 denoising).
Suggested Settings for Inference:
Model: lazymixRealAmateur_v40 (but it works with almost any model)
Positive: beautiful nude woman, natural skin, busty, huge breasts, brown hair, bob cut, anal, lying on back, short hair, man, eye contact, penis, pussy, angry expression, pov <lora:MissyFold_sd15:1:0.6>
Negative: bad-hands-5 verybadimagenegative_v1.3, polydactyl
Sampler: DPM++ 2M Karras
Steps: 28
CFG: 5~6
Resolution: 512x768
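If you prefer to script these settings instead of clicking through the UI, here's a minimal sketch of the same setup sent to A1111's txt2img API. It assumes the webui was launched with --api and is listening on the default local port; the hires. fix upscale factor is just an example value, not part of the recipe.

import requests

# Suggested settings above, expressed as a payload for /sdapi/v1/txt2img.
payload = {
    "prompt": ("beautiful nude woman, natural skin, busty, huge breasts, brown hair, "
               "bob cut, anal, lying on back, short hair, man, eye contact, penis, pussy, "
               "angry expression, pov <lora:MissyFold_sd15:1:0.6>"),
    "negative_prompt": "bad-hands-5 verybadimagenegative_v1.3, polydactyl",
    "sampler_name": "DPM++ 2M Karras",
    "steps": 28,
    "cfg_scale": 5.5,            # anywhere in the 5~6 range
    "width": 512,
    "height": 768,
    "enable_hr": True,           # hires. fix, as used for the demo images
    "denoising_strength": 0.40,
    "hr_scale": 1.5,             # example upscale factor, adjust to taste
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
images = response.json()["images"]  # list of base64-encoded PNGs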
Note¹: As you can see, the LoRA tag has two weights: <lora:MissyFold_sd15_by_IsnAI:1:0.6>.
This is a feature of A1111 that many people aren't aware of: you can set separate weights for the UNet and the Text Encoder. The TE is always a pain in the ass to train right, so it's usually better to reduce its weight at inference than to leave it undertrained.
Note²: You can get more image variation if you also tweak the resolution a bit. In my tests, the best photos came out in portrait format. Landscape images don't turn out as well because they show too much of the feet, and the dataset has hardly any examples of feet, so the model didn't learn them very well.
Triggers:
Main: woman, anal, lying on back, man, penis, pussy
Tags Used on Training: hands, fingers, teeth, open mouth, half-closed eyes, testicles, indoors, pubic hair, outdoors, eye contact, pov, legs up, feet, smile, arms up, closed eyes, panties aside, tongue, fishnets, fingering, from top, shirt, arm up, deep penetration, thighhighs, oiled, tattoo, penis grab, looking down, spread pussy, glasses, sex toy, leg up, tongue out, clenched teeth, licking, pants, petite, gloves, cap, lips
Comments (20)
I found that if you weight 'anal' and 'pussy' it consistently makes anal. If I didn't, it was probably one of the easiest LoRAs to generate good vaginal missionary with. Great job!
Thank you so much! Congratulations on the discovery! I had no idea this was possible. This is one of the best parts of sharing models, people always find other interesting ways to use them. 😊
Best NSFW Lora I've seen in a long time, I hope you'll make more! 10/10
Very good, it worked quite well on the first try with my other LoRAs. Most come out weird. Keep it up, I hope you do more LoRAs, whether SFW or NSFW stuff.
Thanks a lot! The next version will be more stable to work along with other LoRAs! 😊
Thanks!
Sure, I will try to make it! 😊
Could you post your LoRA training settings (Kohya or other)?
How consistent are the training images? How many?
Hello! Sorry, but it's a trade secret 🤭
Big fan of your LoRAs! Quick heads up, though, that you've got the ordering backwards on LoRA weights in Automatic. When you provide two weights, the first weight is the Text Encoder, and the UNet is the second. In your prompts, you're reducing the UNet contribution of the LoRA, not the TE.
You can confirm this with the alternate input syntax. <lora:MissyFold_sd15_by_IsnAI:1:0.6> and <lora:MissyFold_sd15_by_IsnAI:unet=0.6:te=1> will generate the same output.
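In code terms, the parsing boils down to roughly this (paraphrased from memory of extensions-builtin/Lora/extra_networks_lora.py, simplified and not a verbatim copy, so check the current source if in doubt):

def parse_lora_weights(positional, named):
    # positional[0] is the LoRA name; the first numeric value is the TE multiplier
    te_multiplier = float(positional[1]) if len(positional) > 1 else 1.0
    te_multiplier = float(named.get("te", te_multiplier))
    # the second numeric value is the UNet multiplier, defaulting to the TE value
    unet_multiplier = float(positional[2]) if len(positional) > 2 else te_multiplier
    unet_multiplier = float(named.get("unet", unet_multiplier))
    return te_multiplier, unet_multiplier

# <lora:MissyFold_sd15_by_IsnAI:1:0.6>          -> (TE 1.0, UNet 0.6)
print(parse_lora_weights(["MissyFold_sd15_by_IsnAI", "1", "0.6"], {}))
# <lora:MissyFold_sd15_by_IsnAI:unet=0.6:te=1>  -> same (TE 1.0, UNet 0.6)
print(parse_lora_weights(["MissyFold_sd15_by_IsnAI"], {"unet": "0.6", "te": "1"}))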
I hope that helps if you've been trying to reduce weights in the other direction during training or otherwise rebalancing and getting the opposite of what you expect. You might be able to change the training meta and get balanced training upfront, although I've had very inconsistent results trying to do that either way.
Hey! Thanks!
Omfg, I've been using the order of the multipliers wrong for months (and also passing on the wrong information about it 🥲).
This completely changes the approach I need to take during training, because whenever decreasing the second weight made the LoRA work better, I kept adjusting the TE parameters in response. 🤯
This probably even explains why sometimes I had some inconsistencies that I didn't understand when I modified the TE parameters. 🤔
Thank you so much for this information! It's going to help me a lot!
Btw, you did me another huge favor. In the A1111 LoRA code you showed me, I realized that it's also possible to tweak the network's dimension. I did some small tests here and noticed that my LoRAs are oversized; I'll be able to cut the network dimension down to a third.
Thanks a lot, once again!
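(For reference: I believe the parameter in question is the dyn= named argument in the same built-in Lora extension, e.g. <lora:MissyFold_sd15_by_IsnAI:1:0.6:dyn=11> to truncate the LoRA to a lower rank at load time; I'm citing the argument name from memory, so double-check it against the extension source.)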
Glad to hear! It's hard to find good information about detailed training approaches. I'm surprised Automatic1111's documentation doesn't include this; I wonder what other hidden gems are out there.
In case it's helpful: unfortunately, there isn't always a clear alignment between adjusting the weights as applied and the weights in training (the ongoing text encoder training carries forward and affects the impact of further UNet/TE adjustments), so apologies in advance if I've just given you one more meta parameter to start re-tweaking obsessively.
Interesting idea on adjusting down the dimensions after creation. I hadn't considered that, but am definitely going to try it. Picking the right size is one of the harder things to get right and it makes a huge difference.
@thegipper yeah! In Kohya itself, there are some parameters that don't show up anywhere in the docs. Also, the documentation in the repository is a pretty bad translation of the original Japanese content.
Hahaha, don't worry, I'm always on the lookout for the SOTA, so the more I know, the better. Just yesterday, after reading your message, I ran some tests, and the TE has almost no influence on inference, with minimal difference between 0.1 and 1.
Without a doubt, one of the most subjective parameters is the network dim, and I still haven't quite figured out how network alpha works to this day.
I left the network dim at 32 because it gave me the best results with that set of parameters, but now I need to review and test again, taking the UNet into account, since it was causing the overfitting I thought was coming from the TE 😅.
My understanding is that despite people's weird semantic descriptions of what Alpha "does", for normal LoRAs it's literally just a learning rate multiplier (well, Alpha / Rank is) that dampens the applied learning rate at each step. I think Alpha does affect one or two other properties of the LoRA that the LR doesn't (like the initialization vector weights), but those are quite minor compared to the impact of the ((alpha / rank) * LR) scalar. If you trained a LoRA at 64_32 (dim/alpha) at 1e-4 LR versus 64_16 at 2e-4 LR, you should expect a very similar result (I might actually try that, it seems interesting to know).
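In code terms it looks something like this (a quick sketch of the standard LoRA formulation, not a copy of Kohya's implementation; the layer names and dimensions are just illustrative):

import torch

in_dim, out_dim, rank, alpha = 768, 768, 64, 32
scale = alpha / rank  # 0.5

lora_down = torch.nn.Linear(in_dim, rank, bias=False)  # "A"
lora_up = torch.nn.Linear(rank, out_dim, bias=False)   # "B"

def lora_forward(base_layer, x, multiplier=1.0):
    # base output plus the (alpha / rank)-scaled low-rank delta
    return base_layer(x) + multiplier * scale * lora_up(lora_down(x))

base = torch.nn.Linear(in_dim, out_dim)
y = lora_forward(base, torch.randn(1, in_dim))

# The 64_32 @ 1e-4 vs 64_16 @ 2e-4 comparison from above: both give the same
# (alpha / rank) * LR product of 5e-5, which is why I'd expect similar results.
print((32 / 64) * 1e-4, (16 / 64) * 2e-4)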
The one big thing I'm not clear on is whether the alpha/rank scalar is applied to both training learning rates (UNet and TE) or just the UNet. In theory it should apply to both, but this level of detail is super hard to figure out without digging into the code directly, given the overwhelming noise-to-signal ratio of information about Stable Diffusion out there.
@thegipper hmm, interesting, so the last time I read about alpha the info was wrong, especially the ratio, because they talked about Rank / Alpha, but I never got around to doing any tests.
Now with this info you gave me, I got a bit intrigued, and I'm also gonna test Rank and Alpha and adjust the LR according to them.
After I tested the TE and UNet multipliers in the right places, I'm fairly convinced the TE doesn't have a big influence when generating the image, considering the weight range (0.1 ~ 1.0, as I said), so as long as it doesn't fry, there shouldn't be major changes whether Alpha is applied to it or not.
I tried to do some digging and follow the path of alpha from where it's defined; for now I've stopped at network = LoRANetwork in lora.py (Kohya).
@IsnAI You might want to try increasing the TE learning rate until you're seeing an effect, just so you know, but TE impact is really hard to get definitive clarity on. I agree that I see minimal difference when going from 0 to 1.0 with this model, but it's interesting when you bump it up to 8.0 or 16.0. I've been getting significantly cleaner draws with <lora:MissyFold_sd15_by_IsnAI:8.0:0.6>, but also less creative ones.
Seeing how successful your model is at not overfitting the subject while getting less contribution from the TE at 1.0, though, is making me rethink whether I've been getting this backwards 'til now and overtraining the text encoder.
I'm trying to tune XL training right now, and on some models moving the TE from 0.1 to 1.0 barely touches the output, while on others it's like half of the effect. My gut intuition, partially justified by experimentation and partially inferred, is that the Text Encoder, when tuned right, does two things.
The first (and most experimentally verified) role is just making sure that generated images have enough of the starting framing that the UNet portions actually activate. When I used to train without the TE, it was common to add a LoRA and activation word and get a draw that didn't include the concept whatsoever without other words that would make an image where the concept was relevant. That certainly doesn't seem to be a problem with this model, although I'd be curious how it worked with no text encoder versus a low text encoder.
The second (and possibly illusory / confirmation-bias made-up) aspect of the TE that I've experienced is that, tuned right, it seems to be the key to generalizing pre-existing knowledge rather than retraining it. There's some combination of factors that will sometimes result in the model being able to reapply what you trained it on, and I think that magic is somewhere in the TE encoding but is really hard to aim at.
One of the most bizarrely successful attempts I got there was this model, which was trained only on face references but generalizes to other contexts super reliably. The generalization only really works when the TE is included, and the more the TE is driving the effects in the "normal" case, the stronger that seems to be. It's possible I'm over-reading into that as a factor, though.
@thegipper wow, seriously, I completely missed the most obvious thing. Indeed, as you said, if it's possible to increase the TE's weight to that magnitude, then maybe it's undertrained.
However, as you also pointed out, maybe that's precisely why the model works well.
I'm having trouble finishing the second version of BJ Pov, so I'm tweaking the parameters a lot after this new info. It's pretty annoying trying to make the LoRA understand the concept of 'holding' the tool.
Atm it's generating around 30~40% satisfactory images; in the rest, the hands always come out with too many fingers (or bizarre alien hands with three fingers), or both hands hold the tool in an intertwined way but the fingers merge.
This situation is very weird, given that with much less effort I managed to achieve this in the "Dick Pic" LoRA.
Hmm, about XL, I know nothing at all, and I don't have much intention of adopting it, since 1.5 perfectly meets my goals (with some stubbornness on the part of the TE haha).
XL has many variants, which makes it difficult to create broader content, as the audience is very fragmented. Not to mention that it's kind of restrictive in terms of hardware, while people can practically run 1.5 on a toaster.
Wow, your rubberbands LoRA is incredible. You achieved the biggest goal of all in fine-tuning and training: making the neural network generalize very well to completely new, never-seen data. And from what I saw in the metadata, the dataset is quite simple. You hit the jackpot with this one; I think I'm going to study it to see if I can understand the formula behind this generalization.
I'm also going to look closely into the idea that the TE might be the key to good generalization.














