I had planned to train this LoRA a long time ago, and while I was working on the dataset another similar LoRA was released. Anyway, I am releasing it, so now you have a choice.
1. Main action variations
There are a few concept variations it was trained on:
1.1 Regular Facefuck
Tag "facefuck" at the beginning, and the main action is described with the following phrases:
A man is facefucking a womanA man facefucks a womanA man is fucking a woman's mouthA man fucks a woman's mouthA woman is being facefucked by a manA woman's face being fucked by a man1.2 Impaling
Tag "impaling" and the following phrases:
With his hips remaining stationary, a man fucks a woman's face, impaling her throat with his penis. A man stands still impaling a woman's face on his penisA man forcefully drives woman's head up and down on his erect penis."A man thrusts woman's face into his crotch, making her take his penis down her throat.1.3 Excessive pressure
Training files were marked with the tag "excessive pressure" and with the following phrases:
The man pushes his penis into the woman's mouth as he holds her head firmly in place.The man forces his penis deep into the woman's throat while keeping a strong hold on her head.2 Insertion control
I have included "half-insertion" and "full-insertion" tags to caption the corresponding samples. I have also added natural-language text for that. For example:
The woman takes the entire length of his penis down her throat.The woman takes only half of the man's down her throat and can't take it deeper as it hits the back of her throats.But here I was less systematic, so it not reliably works.
3 Framing control
To control the distance to the camera I used the tags "close-up", "medium shot", and "full shot". It looks like "full shot" does not work well, because there were far fewer samples for that kind of shot.
4 Movement intensity
I used the following tags for movements: "slow movements", "fast movements". Samples with average speed do not have any speed tag.
The "slow movements" tag was sometimes extended in natural language with the phrase "their movements are slow and deliberate".
The "fast movements" tag was sometimes extended with "his movements are fast and forceful".
5 Positioning
There are three main positions which I added to the training data. Each position has a tag at the beginning of the caption, extended with a natural-language description.
On all fours
tag: "on all fours"
example description: "The woman is positioned on all fours", "she is on all fours"
On stomach
tag: "on stomach"
example description: "The woman is lying on her stomach on a couch/bed/table"
On squats
tag: "on squats"
example description: "The woman is sitting on squats with her legs wide spread"
On knees
I didn't add a keyword tag for kneeling, but it works with a natural-language description.
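The tagging scheme from sections 3 to 5 can be sketched as a small caption builder. This is only an illustration, not the script used for the dataset: the function name, the tag order, the comma separator, and the newline between the tag line and the prose are all assumptions. Only the tag strings themselves come from the text above.

```python
from typing import Optional, Sequence

# Tag vocabularies copied from the sections above; everything else is illustrative.
FRAMING_TAGS = ("close-up", "medium shot", "full shot")
POSITION_TAGS = ("on all fours", "on stomach", "on squats")
SPEED_TAGS = ("slow movements", "fast movements")


def build_caption(
    framing: Optional[str],
    position: Optional[str],
    speed: Optional[str],
    sentences: Sequence[str],
) -> str:
    """Put keyword tags on the first line and natural-language text on the second."""
    tags = []
    for value, allowed in (
        (framing, FRAMING_TAGS),
        (position, POSITION_TAGS),
        (speed, SPEED_TAGS),
    ):
        if value is None:
            continue  # e.g. average-speed samples or kneeling carry no tag
        if value not in allowed:
            raise ValueError(f"unknown tag: {value!r}")
        tags.append(value)
    # Assumed layout: comma-separated tags, then the prose on the next line.
    return ", ".join(tags) + "\n" + " ".join(sentences)


caption = build_caption(
    "medium shot",
    "on all fours",
    "slow movements",
    ["She is on all fours.", "Their movements are slow and deliberate."],
)
```

Passing `None` for a slot simply omits that tag, which matches how average-speed samples and kneeling were left untagged.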
6 Bonus
Here is a wildcard template I used for testing. It does not exactly follow the principles I used to construct the training captions, but I am too lazy to write a lot of templates. Maybe I will write 3 templates for each action, but for now I don't want to.
{medium shot|full shot}, {impaling|facefuck}, full-insertion
A man forces fully drives a woman's head pushing his penis into the woman's mouth. He holds her head firmly in place, while she is gagging taking the full length of his penis. The woman's {blonde|red|bunette} hair {tied back in a ponytail|is messy|cascading down her shoulders} and she is wearing a {black top and gold jewelry|nothing|white crop top|dress}, while the man, partially visible, is naked. {Their movements are slow and deliberate|their movements are fast and forceful}. Her {small|massive full|massive sagy|medium sized} breasts are {hidden under a lacy bra|and swinging from side to side|prominently displayed}. {Her mascara is smeared|she has huge cute eyes|she wears a leather choker}. She has {pale|fair|tanned} skin, while the man has {black|pale|fair} skin. The scene is set {in a dimly lit room against a concrete wall|in deep green forest|on a sandy beach, they bodies are shiny from sweat.}Description
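If your UI has no wildcard extension, the `{a|b|c}` syntax of the template above can be expanded with a few lines of Python. This is a minimal sketch under the assumption that each `{...}` group means "pick one option uniformly at random"; the function name is made up for illustration and real wildcard extensions may support more syntax.

```python
import random
import re

# Matches an innermost {...} group, i.e. one with no braces inside it.
_CHOICE = re.compile(r"\{([^{}]*)\}")


def expand_wildcards(template: str, rng: random.Random) -> str:
    """Replace every {a|b|c} group with one randomly chosen option."""

    def pick(match: re.Match) -> str:
        return rng.choice(match.group(1).split("|"))

    previous = None
    # Repeat until nothing changes so that nested groups are resolved too.
    while previous != template:
        previous = template
        template = _CHOICE.sub(pick, template)
    return template


rng = random.Random(0)  # fixed seed makes the expansion reproducible
prompt = expand_wildcards("{medium shot|full shot}, {slow movements|fast movements}", rng)
```

Using a seeded `random.Random` keeps the prompt expansion reproducible, which is convenient when comparing prompts against a fixed generation seed.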
rank16 pruned version created from trained rank32 version
Comments (17)
Looks great! Hope to try it some day if I have enough buzz. Can't wait to see what others make with it!
<3
Bro. As a connoisseur of Hunyuan LoRas, having trained a few hundred myself, I have to hand it to you.
This LoRA is exceptional.
I have two folders with failed checkpoints for a FF LoRA that I never perfected.
Now I have no need of fucking with any of that, because this is the one I wanted all along.
Seriously, great job! Thanks!
Thanks. I have spent a lot of time on it. The dataset is 200+ videos and 200+ images.
During training it stopped improving, so I interrupted training twice, resuming it with different parameters.
If you want to improve the full body view more, put it in its own folder and increase repeats. More exposure will give you better representation.
I understand that balancing is generally a good thing, but in my case full body shots are mostly still images, so adding more repeats means worse movements.
@AstroWeasel oh! yeah ok, don't increase repeats on those!
I did briefly theorize that still images might be a very good thing early in training, say for the first 10% of epochs, followed by learning exclusively on video, as a way to lock in the pose cheaply and then expand on movement later on. I never got too deep into trying it though.
@az420 From my theorizing, on the contrary, still frames can help at later stages of training to add detail, because for images we can afford a much higher resolution. In my case I used 1280x720 images. Actually, from my latest experience, I would rather train the first half of the time exclusively on videos and only then include images, after making sure the movements work well.
@AstroWeasel Interesting! Did you ever try the opposite? I suppose there's logical ways it could go either way. Like when refining, will loss not go in favor of motion because the diffs are against stills? nudging it towards a lack of motion? or would low resolution videos hurt details and quality while refining, even if training on motion... I suppose you found something that works for you though, so that's cool.
@az420 I have tried two things:
1. low res videos only
2. low res videos + high res images.
I don't like low-res-only training because I have sometimes seen a checkered pattern at inference, and the details were often bad. I did a lot of experiments before training my previous "anal" lora, and after low-res training it often used "a wrong hole" and stuck the penis somewhere in the middle. Adding images helped with that.
Low-res videos + hi-res images work better in my opinion, but I think there is a lot of room for improvement in motion. For example, adding an additional tag for images helps to preserve motion. Not 100 percent, but it helps. For the current lora I added the tag "still frame" to all images. Now I want to test different training schedules, because when I sampled each epoch of the lora I noticed that good motion appears much later than a good overall scene, so I am thinking about the following schedule:
1. very low-res video training to get a general understanding of motion
like 320 x 192, 224 x 224 or so
2. increasing the video resolution a bit (as much as we can without getting OOM, which in the case of 24 gigs is about 448 x 256) + high-res images.
3. when quality stops improving, decrease the learning rate and train a few more epochs to finalize. I have checked this third point with this lora.
Maybe I overcomplicate things...
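The three-step schedule above could be written down roughly like this. It is a sketch only, not tied to any particular trainer: the `Stage` class, `lr_scale`, and `should_advance` are hypothetical names, the 0.1 learning-rate factor is a placeholder (no value is given above), and reusing 1280x720 for the images and 448x256 for the final stage is an assumption. The per-epoch scores are assumed to be numbers you assign yourself after looking at samples, higher meaning better.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Resolution = Tuple[int, int]


@dataclass(frozen=True)
class Stage:
    name: str
    video_resolutions: Tuple[Resolution, ...]
    image_resolution: Optional[Resolution]  # None means no still images
    lr_scale: float  # multiplier on the base learning rate (placeholder values)


SCHEDULE = (
    # 1. very low-res video only, to learn motion
    Stage("motion", ((320, 192), (224, 224)), None, 1.0),
    # 2. larger video (about the 24 GB limit mentioned above) plus high-res images
    Stage("detail", ((448, 256),), (1280, 720), 1.0),
    # 3. same data, lower learning rate, a few more epochs to finalize
    Stage("finalize", ((448, 256),), (1280, 720), 0.1),
)


def should_advance(scores: Sequence[float], patience: int = 2) -> bool:
    """True when the last `patience` epochs did not beat the earlier best score."""
    if len(scores) <= patience:
        return False  # not enough history to judge a plateau
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before
```

A plateau check like `should_advance` only formalizes "when quality is not improving"; judging samples by eye each epoch works just as well.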
@AstroWeasel No, that sounds reasonable. I like 256x256 and tend to find the details are fine, but I also augment my set with close-ups and caption them as such. That way you don't need high resolution videos to capture details. The close-ups can be tighter crops of existing videos btw, just make sure they have fewer repeats than whatever the default framing should be.
@az420 makes sense. I will try
@AstroWeasel I applied this for my BDSM loras. The bindings were just not detailed enough in the full body clips. I set aside a folder called "close", put just the hand or foot in it, and captioned those "close-up of a restrained hand <binding type>, part of a restrained pose" to drive home that these were not the full concept. Since no one will ever prompt for "part of a restrained pose", the close-ups serve only to improve details. I am quite convinced it worked well.
I'm having trouble invoking the angles where both of the woman's eyes are visible, and I am getting only pure profile view angles. I welcome advice.
The training data mostly consists of side-view shots, so it is almost impossible to get a good front view or POV. The only thing I can recommend is to avoid the "close-up" tag, as close-ups are 100 percent side views. The "medium-shot" tag may help a bit. You may also describe the look better, for example "the woman looks directly at the viewer with her eyes wide open". Also play with different seeds: I have tried different prompt constructions with a fixed seed and noticed that some seeds define the overall composition very strictly, and with almost any change to the prompt the overall composition and poses stay almost the same until the seed is changed.
@AstroWeasel I mean only to get angles as shown in the samples, not full frontal angles on which you did not train, I understand that.
For example, the woman on the beach in the samples, I've had trouble getting that angle with I2V.
I cannot get motion out of this lora in framepack, but it is usable to keep penis-mouth position correctly if combined with other motion loras.