    WAN 2.2 I2V - POV Cowgirl - LOW v0.2
    NSFW

    UPDATE 9/1 v1

    Updated training settings should give even better movement.


    ---
    UPDATE 8/19 v0.2

    I've updated the dataset & training settings so that movement is better preserved. It works much better even without the general NSFW lora. Give it a try and let me know how it goes for you.
    ---

    My attempt at POV cowgirl for WAN 2.2. It works pretty well (doubly so if you also use the general NSFW lora), though the girl's hip motion can be less aggressive than I'd like.

    This is only my second-ever video lora and I'm training locally on a 3090, so it's slow progress. I'm still tweaking the training settings and the training data to hopefully show more hip movement. Give it a try and let me know your thoughts!

    Description

    Updated dataset, tweaked training parameters


    Comments (12)

    jlar843 · Aug 19, 2025

    So it's possible to train an i2v LORA on a 3090?

    TwoMoreLurker
    Author
    Aug 19, 2025

    Yes, though it takes a super long time

    bobbysmithy55555426 · Aug 27, 2025 · 1 reaction

    Can I ask what the dataset for this was like? I'm interested in training my own video loras and want to know how much effort it would be.

    Also, what tools are you using for training, generally speaking, if you're feeling generous with the info :)

    TwoMoreLurker
    Author
    Aug 28, 2025 · 3 reactions

    @bobbysmithy55555426 Sure, I'm happy to share.

    The dataset for this specific Lora is 22 clips of ladies doing their thing: 14 clips are real women and 8 are anime*. All are roughly 5 seconds long, and they are all trimmed to 256x256 pixels. There are probably professional tools that can trim videos, but I do it the caveman way: playing the clip and using the built-in Windows screen capture tool to record a 256x256 section of the screen. (This also ensures the clips all have the same frame rate.)
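    For reference, if you'd rather not screen-capture, something like this ffmpeg invocation should do the same trim-and-crop in one pass. This is just a sketch of an alternative, not what I actually used, and the paths, start time, crop offsets, and 16 fps target are all placeholders:

    ```
    # Take ~5 seconds starting at 0:02 and crop a 256x256 region
    # (crop=width:height:x:y). fps=16 forces a uniform frame rate,
    # and -an drops the audio track.
    ffmpeg -ss 00:00:02 -t 5 -i input.mp4 \
      -vf "crop=256:256:300:100,fps=16" -an clip01.mp4
    ```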

    They are then captioned. I haven't found a good AI tool that will accurately caption NSFW stuff yet, so I do this manually. It sucks, but it's very important: you describe who is doing what, and where. I'll post an example in the next comment.

    I then use diffusion-pipe to train the actual lora. I mostly use the default settings in the WAN low-noise config.

    I've experimented a little, and I've gotten the best results by doing a two-step training: initially at a lower learning rate, then cranking it up for the last ~40% of training. For high noise I'll do about 1200 steps at learning rate 2e-5, then about 800 more at 2e-4. For low noise, roughly 1800 steps at 3e-5, then 1200-1600 more at 3e-4. I train locally with my single consumer-grade GPU, so these trainings altogether take something like 12-15 hours.
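    In diffusion-pipe terms, the only thing I really change between the two steps is the learning rate in the optimizer block of the TOML. A minimal sketch is below; the key names mirror the example configs in the diffusion-pipe repo, so double-check them against the version you're running:

    ```
    # Low-noise step 1 optimizer settings (diffusion-pipe-style TOML).
    # The lr gets raised 10x when resuming for the final ~40% of steps.
    [optimizer]
    type = 'adamw_optimi'
    lr = 3e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    ```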

    I'm sure there are more professional setups, I'm just a guy with a 3090 and too much free time.

    *I particularly like to make anime gens, so I try to make sure they're well represented in the dataset, but depending on the act being performed it can be hard to find uncensored clips. For example, the handy-bj dataset had none, since anime penises are basically always censored.

    TwoMoreLurker
    Author
    Aug 28, 2025 · 1 reaction

    @bobbysmithy55555426 An example caption for one of the clips in this dataset:

    ```
    The video shows an Asian woman with long black hair. She is completely naked.

    She is bouncing up and down, having cowgirl sex with a man, her large breasts bouncing. The view is POV. His penis is sliding in and out of her vagina. The woman is aggressively moving her hips up and down, slamming the penis inside her pussy. The penis goes completely inside the pussy, then comes back out. They are having fast, rough, intense sex.

    The video focuses on her pussy, breasts, and face. As the video progresses, she covers her mouth with her right hand.

    The background has bookshelves lined with books, and a wall with many posters and pictures, indicating that the scene takes place inside an apartment.
    ```

    I am very lazy, so for the most part I keep the description of the action the same for everything, and just quickly change the description of the actors and the background for each, as well as anything "unusual" that happens (like in the above where she covers her mouth with her hand).

    bobbysmithy55555426

    @TwoMoreLurker Awesome, thanks so much for the detailed breakdown! Really appreciate it; I'm trying to push through the activation energy of getting a workflow set up for training video loras, and this helps a lot.

    I'm also just a guy with a GPU and some spare time so these details are sort of more interesting to me than the professional setups :)

    mrdion · Sep 1, 2025

    @TwoMoreLurker Is 256x256 enough to generate clean 480p and 720p?

    TwoMoreLurker
    Author
    Sep 1, 2025 · 1 reaction

    @mrdion It works well for 480p; my GPU can't handle 720p, so I haven't been able to test it.

    emilkwok · Sep 2, 2025

    @TwoMoreLurker Hey there! The results on your Lora are amazing, really great work! I'm also a Lora training enthusiast, and if you don't mind, I was hoping I could ask you a couple of questions:

    1. What's the purpose of your two-step training? I noticed you mentioned training first with a low learning rate, then switching to a higher one. My understanding has always been that you set the learning rate to reach the global minimum loss. Does your two-step process have a special purpose or achieve a specific effect?

    2. How exactly do you implement this two-step training in diffusion-pipe? My guess is that you train with a low learning rate up to a certain point, stop the training, and then in diffusion-pipe you load the Lora file from the first stage to start the second stage with the higher learning rate. Is that the correct process? If not, could you walk me through your specific steps for transitioning from the first to the second training stage in diffusion-pipe?

    Thanks for taking a look. I look forward to your reply!

    TwoMoreLurker
    Author
    Sep 3, 2025

    @emilkwok
    1. Honestly, it's all vibes. The first time I did it, I trained at a higher rate for a first try, but it didn't turn out the way I wanted, so I tweaked the dataset and (because I am lazy and didn't want to retrain from zero) tried resuming from about halfway with a higher learning rate to "offset" the old data. It turned out well, so I just keep doing that and it seems to work. I have absolutely no scientific or mathematical reason for doing it this way; maybe it's worse than just leaving the rate the same the whole time :)
    2. That's exactly right. I save a checkpoint every 10 epochs, and I manually restart training with the --resume_from_checkpoint flag and an updated settings.toml.
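    Concretely, it looks something like this (the launch line follows diffusion-pipe's README, but the config filename here is just a placeholder):

    ```
    # Stage 1: train at the lower learning rate set in the TOML.
    deepspeed --num_gpus=1 train.py --deepspeed --config wan_low.toml

    # Then bump [optimizer] lr in the TOML and resume from the latest
    # saved checkpoint for the final stretch of steps.
    deepspeed --num_gpus=1 train.py --deepspeed --config wan_low.toml \
      --resume_from_checkpoint
    ```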

    emilkwok · Sep 3, 2025 · 1 reaction

    @TwoMoreLurker Thanks for getting back to me! I'm looking forward to your future work. I hope we can chat more about Lora training techniques and tips down the road.