Example prompt:
missionary sex, POV overhead view of a woman lying on her back with her legs spread having sex with a man. She's on the couch. She is moaning with pleasure. A man is thrusting his penis back and forth inside her pussy rapidly at the bottom of the screen. The camera is zoomed out and holding steady. Her auburn hair is long and curly.
Version 1.2 Updates:
I used the same training videos as before, but this time I blurred the faces. I hoped this would produce a version that doesn't alter faces, and it does seem to work better with character LoRAs. Facial variety also seems a bit better.
I used https://github.com/ORB-HD/deface to blur the faces in the training videos, and then added "woman with a censored and blurred face" to the captions.
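A minimal sketch of that preprocessing step might look like the following. It assumes deface is installed (`pip install deface`), that its default face replacement is a blur, and that `-o` sets the output file (check `deface --help` for your version). It also assumes each clip's caption is a sidecar `.txt` file with the same name as the video, which is a common trainer convention but not something stated above. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Caption phrase appended to every training caption (from the notes above).
BLUR_TAG = "woman with a censored and blurred face"


def deface_command(video: Path, out_dir: Path) -> list[str]:
    """Build a deface CLI call that writes a face-blurred copy of `video` into `out_dir`.

    ASSUMPTION: deface blurs detected faces by default and `-o` sets the output path.
    """
    return ["deface", str(video), "-o", str(out_dir / video.name)]


def tag_caption(caption_file: Path, tag: str = BLUR_TAG) -> None:
    """Append `tag` to a caption file once; re-running does not duplicate it."""
    text = caption_file.read_text(encoding="utf-8").strip()
    if tag not in text:
        text = f"{text}, {tag}" if text else tag
    caption_file.write_text(text + "\n", encoding="utf-8")


def prepare_dataset(src_dir: Path, out_dir: Path) -> None:
    """Blur faces in every .mp4 in src_dir, then copy and tag its caption.

    ASSUMPTION (hypothetical layout): captions are sidecar files named
    <video stem>.txt next to each video.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for video in sorted(src_dir.glob("*.mp4")):
        # Run deface on the clip; raises CalledProcessError if it fails.
        subprocess.run(deface_command(video, out_dir), check=True)
        caption = video.with_suffix(".txt")
        if caption.exists():
            tagged = out_dir / caption.name
            shutil.copyfile(caption, tagged)
            tag_caption(tagged)
```

Calling `prepare_dataset(Path("clips"), Path("clips_blurred"))` would leave the original clips untouched and write blurred copies plus tagged captions to a separate folder, so it is easy to compare before and after.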
Version 1.1 Updates:
I added a few new training videos that zoom in closer on the movement itself, hopefully to train it a bit better. I also lowered the learning rate to 5e-5 and raised the number of repeats to 30.
The result seems to work much better at lower strengths, and hopefully better with character LoRAs now.
Note that this training took 8 hours compared to v1.0's 1.5 hours. There's probably some sweet spot for learning rate and training time to get good results, but it would take more experimentation to figure it out.
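A rough way to compare the two runs is to count clip passes (clips × repeats × epochs) using the figures from the run notes: v1.0 used 8 clips, 10 repeats and 20 epochs in about 1.5 hours; v1.1 used 11 clips, 30 repeats and 20 epochs in about 8 hours. This is a back-of-the-envelope sketch: the batch size isn't stated, so it counts clip passes rather than optimizer steps, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    name: str
    clips: int
    repeats: int
    epochs: int
    hours: float  # approximate wall-clock time from the notes

    @property
    def clip_passes(self) -> int:
        """Every clip is seen `repeats` times per epoch, for `epochs` epochs."""
        return self.clips * self.repeats * self.epochs

    @property
    def seconds_per_pass(self) -> float:
        """Average wall-clock seconds per clip pass."""
        return self.hours * 3600 / self.clip_passes


V1_0 = TrainingRun("v1.0", clips=8, repeats=10, epochs=20, hours=1.5)
V1_1 = TrainingRun("v1.1", clips=11, repeats=30, epochs=20, hours=8.0)

for run in (V1_0, V1_1):
    print(f"{run.name}: {run.clip_passes} clip passes, ~{run.seconds_per_pass:.1f} s each")
print(f"work ratio {V1_1.clip_passes / V1_0.clip_passes:.2f}x, time ratio {V1_1.hours / V1_0.hours:.2f}x")
```

By this count, v1.1 did a little over 4× the work of v1.0 while taking a little over 5× as long, so most of the extra time comes from the added clips and repeats. Lowering the learning rate does not by itself make each step slower.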
I don't feel comfortable sharing the exact MP4s I trained on, since they were ripped from online sites and I don't have the rights to distribute them. However, I'm including my config files and captions in the training data download so that other people can get into training more easily. I was surprised at how quick and easy it was.
I included an example workflow in the training data download (I can't find a better way to upload the workflow), which shows how to make it work nicely with multiple LoRAs, and has dynamic prompt support.
v1.1:
I trained on a 3090 using eleven 3-second videos (24 FPS, at least 50 frames each), and it took around 8 hours to do 20 epochs with 30 repeats.
v1.0:
I trained on a 3090 using eight 3-second videos (24 FPS, at least 50 frames each), and it took around an hour and a half to do 20 epochs with 10 repeats.
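The "at least 50 frames" requirement can be checked before training. The comments mention that the included dataset.toml uses `frame_buckets = [1, 50]`. The sketch below ASSUMES the trainer puts each video into the longest frame bucket that does not exceed its length, never into the single-frame image bucket (1), and drops clips that fit no video bucket; verify this against your trainer's documentation. The function names are hypothetical.

```python
from typing import Optional, Sequence

# Frame buckets reported from the shared dataset.toml (per the comments).
FRAME_BUCKETS = [1, 50]


def frame_count(seconds: float, fps: int) -> int:
    """Approximate number of frames in a clip of the given length."""
    return round(seconds * fps)


def assign_bucket(num_frames: int, buckets: Sequence[int]) -> Optional[int]:
    """Return the longest video bucket that fits inside the clip, or None.

    ASSUMPTION: videos never use the image bucket (1), and a clip shorter
    than every video bucket would be dropped.
    """
    fitting = [b for b in buckets if 1 < b <= num_frames]
    return max(fitting) if fitting else None


for seconds in (3.0, 2.0):
    frames = frame_count(seconds, fps=24)
    print(f"{seconds:.0f} s at 24 FPS = {frames} frames, bucket {assign_bucket(frames, FRAME_BUCKETS)}")
```

Under that assumption, a 3-second clip at 24 FPS (72 frames) fits the 50-frame bucket, while a 2-second clip (48 frames) would be too short.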
Description
Trained with a few more videos, some zoomed in on the movement itself, and at a lower learning rate.
It now seems to work better at lower strengths and is probably better with character LoRAs.
Comments
Any chance you can do one for anal? There aren't any loras for it yet :(
V2 is a clear step forward. The movement feels much more natural now ;)
What a time to be alive!
How much VRAM does it need, btw?
Has anyone tried making these in stereo (cross eyed) yet?
Are you planning to update this? I'd like to see different angles included. You nailed the center front view; seeing it from other angles would be a nice upgrade!
Also, the Hunyuan model seems to have such broad knowledge of poses and movements that it would produce amazing results. I'm trying it with the laying cumshot LoRA right now, and depending on the location it produces very interesting scenes that work very well. Man, I'm so used to problematic fine-tuning of Flux and SD3 that I'm impressed every time by how well this just works =D
Hey, great LoRA! May I ask what software you used for training it: Diffusion-Pipe or Kohya-ss Musubi-Tuner? Do you think a 3090 and 32 GB of RAM are enough for training from videos?
I have an interesting observation. I have been generating at lower resolutions to save time. The 1.0 worked fine, but 1.1 was all broken. However, I mixed the two with 1.0 at full strength and 1.1 at ~0.4. The results are great!
Hey! It's back. Dunno why this model was gone for a day or two.
Where can I find a good guide on how to train models?
Hey, I think it was removed because of the Hunyuan license. Maybe you need to censor the showcase videos. The NSFW LoRA at https://huggingface.co/TheYuriLover/HunyuanVideo_nfsw_lora was also removed, but it's back on Hugging Face without images.
Your LoRA is great! Could you make one for footjobs? Thank you very much!
Just noticed you put the dataset.toml in the zip, thanks! I'll try copying your settings on my next training run and see if I finally get something 'moving' :p
EDIT: I see, so it's just [1,50] in frame_buckets. The interesting part was the number of repeats (30); I only had 5, so I'll probably try more in my next run and see where it goes. One more question remains for me: do video training clips need to be in a dedicated video folder (with the corresponding path in the .toml file), or is putting everything in the default images folder (images and videos alike) the same deal? I've put everything in the same folder since the start.
How was the LoRA trained, with Diffusion-Pipe or something else? What PC specs are needed?
Where is the workflow json file in the training data download? Was it removed in an update?
What are you using to generate captions for videos?