CivArchive
    Grinding - V 2 - Alpha
    NSFW

    Update on pussyjob alpha - After testing, I'm not able to get cum shooting, only post-cum shots. I realized that some of my prompt captions were messed up. I am retraining now.

    She's grinding on you... try not to blow your load!

    Wan T2V, 14B

    Prompting:

    A woman is straddling a man, engaging in sexual intercourse.
    The man is laying down on a bed. The woman is grinding on top of him.
    The scene is shot from the man's point of view.

    Prompting for v2 - alpha pussyjob:

    A woman is straddling a man on the floor. Both of them are naked.
    The woman is rubbing her pussy on the man's penis. The man's penis shoots a stream of cum as he ejaculates.
    The scene is a medium shot from the man's point of view. Realistic.

    Cfg: around 3.0 seems to work well

    Training:

    This lora was created using diffusion-pipe on a 4090. Mostly default settings. See toml files inline below.

    I trained using 57 clips @ 256x256, 24fps, 48 frames each. The clips were extracted from 6 longer publicly available vids (720p-1080p resolution each) using DaVinci Resolve (free). I trained for 60 epochs and tested. Results were ok, but not spectacular. I let it run overnight for a total of around 10 hours of training and landed at 160 epochs with the results you see here.
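
    As a rough sanity check on those numbers, here is a minimal sketch of the step arithmetic. The clip count, epoch count, and hours come from this post, and the batch settings come from config.toml further down. The single-GPU count and the rounding of a partial final batch are assumptions, and the function name is made up for illustration, so treat the result as a ballpark rather than what diffusion-pipe actually logged.

```python
import math

# Values taken from the post and from config.toml
NUM_CLIPS = 57
EPOCHS_TRAINED = 160
MICRO_BATCH_SIZE_PER_GPU = 1
GRADIENT_ACCUMULATION_STEPS = 4
NUM_GPUS = 1         # assumption: a single 4090
TRAINING_HOURS = 10  # "around 10 hours" per the post


def steps_per_epoch(num_clips, micro_batch, grad_accum, num_gpus=1, num_repeats=1):
    """Approximate optimizer steps per epoch.

    Assumption: a partial final batch still counts as one step (rounded up).
    """
    samples_per_step = micro_batch * grad_accum * num_gpus
    return math.ceil(num_clips * num_repeats / samples_per_step)


per_epoch = steps_per_epoch(
    NUM_CLIPS, MICRO_BATCH_SIZE_PER_GPU, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS
)
total_steps = per_epoch * EPOCHS_TRAINED
seconds_per_step = TRAINING_HOURS * 3600 / total_steps

print(f"~{per_epoch} steps/epoch, ~{total_steps} steps total, "
      f"~{seconds_per_step:.0f} s/step on average")
```

    The useful part is the relationship, not the exact figure: doubling gradient_accumulation_steps halves the number of optimizer steps per epoch, so "epochs" alone is not comparable between runs with different batch settings.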

    I captioned each clip using guidance from this article. I also refined my understanding of captioning through discussions with its author (thanks @ComfyTinker!)


    Captioning:

    Each file was captioned manually with something like this example:

    A woman is straddling a man, engaging in sexual intercourse.
    
    The man is laying down on the bed. The woman is grinding on top of him.
    
    Their faces are not visible. The woman and man are naked. She is fit and has large breasts. She has long, flowing, blonde hair. A green wall is visible in the background. The woman is lit from the side by bright, natural light.
    
    The scene is shot from the man's point of view. Realistic.

    Notice that I did not use a made-up keyword! Major learning for me here was that with wan/hunyuan we're not training the text encoder (umT5 for Wan; see llm_path in config.toml below), so using made-up words will result in a 'conceptual' (i.e., always applied, regardless of prompts) lora, rather than a targeted lora that responds to prompts. This is because we aren't able to add new terms to the text encoder with current training methods, so it either drops the made-up keyword or does something else unknown with it.

    Other learnings: I had previously trained with ~50 clips at 128x128 and used the keyword gr1nd1ng. The female motion result was great! The male was a jumbled mess, likely due to the low resolution.
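
    If you want to check your own captions for leftover trigger words, a small lint pass can help. The sketch below is only a heuristic and not part of diffusion-pipe: it assumes a made-up keyword looks like gr1nd1ng (letters mixed with digits), and the function name and regexes are hypothetical. It will also flag legitimate tokens such as 24fps or 1080p, so review the hits by hand.

```python
import re

# Heuristic (assumption): a made-up trigger word mixes letters and digits,
# like the gr1nd1ng keyword mentioned above. Sizes such as 256x256 are allowed.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")
DIMENSION_RE = re.compile(r"^\d+x\d+$", re.IGNORECASE)


def find_suspect_tokens(caption: str) -> list[str]:
    """Return tokens that contain both letters and digits and are not WxH sizes."""
    suspects = []
    for token in TOKEN_RE.findall(caption):
        has_alpha = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        if has_alpha and has_digit and not DIMENSION_RE.match(token):
            suspects.append(token)
    return suspects


example = "gr1nd1ng. The scene is shot at 256x256. Realistic."
print(find_suspect_tokens(example))
```

    Running this over every caption .txt file before training is cheap, and a clean result means the lora has to learn the concept from plain descriptive language the text encoder already understands.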

    Feedback and questions welcome!

    config.toml:

    # Dataset config file.
    output_dir = '/mnt/d/Projects/video-training/grinding/output'
    dataset = '/mnt/d/Projects/video-training/grinding/dataset_256px.toml'
    
    # Training settings
    epochs = 200
    micro_batch_size_per_gpu = 1
    pipeline_stages = 1
    gradient_accumulation_steps = 4
    gradient_clipping = 1.0
    warmup_steps = 100
    
    # eval settings
    eval_every_n_epochs = 1
    eval_before_first_step = true
    eval_micro_batch_size_per_gpu = 1
    eval_gradient_accumulation_steps = 1
    
    # misc settings
    save_every_n_epochs = 10
    checkpoint_every_n_epochs = 10
    #checkpoint_every_n_minutes = 30
    activation_checkpointing = true
    partition_method = 'parameters'
    save_dtype = 'bfloat16'
    caching_batch_size = 1
    steps_per_print = 1
    video_clip_mode = 'single_middle'
    blocks_to_swap = 15 # 10 was too low and caused too much swapping/slow training (180s/step vs 25s/step)
    
    [model]
    type = 'wan'
    # 1.3B
    #ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-1.3B'
    # 14B
    ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-14B'
    transformer_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2_1-T2V-14B_fp8_e5m2.safetensors' #kijai
    vae_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan_2_1_VAE_bf16.safetensors' #kijai
    llm_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/umt5-xxl-enc-bf16.safetensors' #kijai
    dtype = 'bfloat16'
    timestep_sample_method = 'logit_normal'
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'adamw_optimi'
    lr = 5e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    eps = 1e-8
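
    A note on timestep_sample_method = 'logit_normal' from the [model] section: logit-normal sampling generally means drawing a normal sample and squashing it through a sigmoid, which concentrates training on mid-range noise levels instead of spreading it uniformly. The sketch below shows that general technique with mean 0 and standard deviation 1. Those parameters, the function name, and the absence of any shift afterward are assumptions; the post does not say what diffusion-pipe uses internally.

```python
import math
import random


def sample_logit_normal_timestep(rng: random.Random, mean: float = 0.0, std: float = 1.0) -> float:
    """Draw t in (0, 1) by passing a normal sample through the logistic sigmoid."""
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_logit_normal_timestep(rng) for _ in range(10_000)]

# Share of samples that land in the middle half of the timestep range.
# Uniform sampling would put about 50% here; logit-normal puts noticeably more.
middle = sum(0.25 <= t <= 0.75 for t in samples) / len(samples)
print(f"share of timesteps in [0.25, 0.75]: {middle:.2f}")
```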

    dataset.toml (stolen from hearmeman's runpod, but mostly default values from tdrussell):

    # Resolutions to train on, given as the side length of a square image. You can have multiple sizes here.
    # !!!WARNING!!!: this might work differently to how you think it does. Images are first grouped to aspect ratio
    # buckets, then each image is resized to ALL of the areas specified by the resolutions list. This is a way to do
    # multi-resolution training, i.e. training on multiple total pixel areas at once. Your dataset is effectively duplicated
    # as many times as the length of this list.
    # If you just want to use predetermined (width, height, frames) size buckets, see the example cosmos_dataset.toml
    # file for how you can do that.
    resolutions = [256]
    
    # You can give resolutions as (width, height) pairs also. This doesn't do anything different, it's just
    # another way of specifying the area(s) (i.e. total number of pixels) you want to train on.
    # resolutions = [[1280, 720]]
    
    # Enable aspect ratio bucketing. For the different AR buckets, the final size will be such that
    # the areas match the resolutions you configured above.
    enable_ar_bucket = true
    
    # The aspect ratio and frame bucket settings may be specified for each [[directory]] entry as well.
    # Directory-level settings will override top-level settings.
    
    # Min and max aspect ratios, given as width/height ratio.
    min_ar = 0.5
    max_ar = 2.0
    # Total number of aspect ratio buckets, evenly spaced (in log space) between min_ar and max_ar.
    num_ar_buckets = 7
    
    # Can manually specify ar_buckets instead of using the range-style config above.
    # Each entry can be width/height ratio, or (width, height) pair. But you can't mix them, because of TOML.
    # ar_buckets = [[512, 512], [448, 576]]
    # ar_buckets = [1.0, 1.5]
    
    # For video training, you need to configure frame buckets (similar to aspect ratio buckets). There will always
    # be a frame bucket of 1 for images. Videos will be assigned to the longest frame bucket possible, such that the video
    # is still greater than or equal to the frame bucket length.
    # But videos are never assigned to the image frame bucket (1); if the video is very short it would just be dropped.
    frame_buckets = [1, 16, 32, 48]
    # If you have >24GB VRAM, or multiple GPUs and use pipeline parallelism, or lower the spatial resolution, you could maybe train with longer frame buckets
    # frame_buckets = [1, 33, 65, 97]
    
    
    [[directory]]
    # Path to directory of images/videos, and corresponding caption files. The caption files should match the media file name, but with a .txt extension.
    # A missing caption file will log a warning, but then just train using an empty caption.
    path = '/mnt/d/Projects/video-training/grinding/8-256px/'
    
    # You can do masked training, where the mask indicates which parts of the image to train on. The masking is done in the loss function. The mask directory should have mask
    # images with the same names (ignoring the extension) as the training images. E.g. training image 1.jpg could have mask image 1.jpg, 1.png, etc. If a training image doesn't
    # have a corresponding mask, a warning is printed but training proceeds with no mask for that image. In the mask, white means train on this, black means mask it out. Values
    # in between black and white become a weight between 0 and 1, i.e. you can use a suitable value of grey for mask weight of 0.5. In actuality, only the R channel is extracted
    # and converted to the mask weight.
    # The mask_path can point to any directory containing mask images.
    #mask_path = '/home/anon/data/images/grayscale/masks'
    
    # How many repeats for 1 epoch. The dataset will act like it is duplicated this many times.
    # The semantics of this are the same as sd-scripts: num_repeats=1 means one epoch is a single pass over all examples (no duplication).
    num_repeats = 1
    
    # Example of overriding some settings, and using ar_buckets to directly specify ARs.
    # ar_buckets = [[448, 576]]
    # resolutions = [[448, 576]]
    # frame_buckets = [1]
    
    
    # You can list multiple directories.
    # If you have a video dataset as well remove the hashtag from the following 3 lines and set your repeats
    
    # [[directory]]
    # path = '/video_dataset_here'
    # num_repeats = 5
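
    To make the bucketing comments in dataset.toml concrete, here is a minimal sketch of the two rules they describe: aspect ratio buckets evenly spaced in log space between min_ar and max_ar, and videos assigned to the longest frame bucket they can fill (never the image bucket of 1). This is my reading of the comments, not diffusion-pipe's code. The function names are made up, and including both endpoints in the spacing is an assumption.

```python
import math
from typing import List, Optional


def log_spaced_ar_buckets(min_ar: float, max_ar: float, num_buckets: int) -> List[float]:
    """Aspect ratios evenly spaced in log space, endpoints included (assumption)."""
    if num_buckets < 2:
        raise ValueError("need at least two buckets to span a range")
    lo, hi = math.log(min_ar), math.log(max_ar)
    step = (hi - lo) / (num_buckets - 1)
    return [math.exp(lo + i * step) for i in range(num_buckets)]


def assign_frame_bucket(num_frames: int, frame_buckets: List[int]) -> Optional[int]:
    """Longest bucket the video can fill; bucket 1 is reserved for images."""
    candidates = [b for b in frame_buckets if 1 < b <= num_frames]
    return max(candidates) if candidates else None  # None: the clip would be dropped


# Values from dataset.toml above
ar_buckets = log_spaced_ar_buckets(0.5, 2.0, 7)
print([round(ar, 3) for ar in ar_buckets])

frame_buckets = [1, 16, 32, 48]
for frames in (48, 47, 10):
    print(frames, "->", assign_frame_bucket(frames, frame_buckets))
```

    Under this reading, a 47-frame clip falls back to the 32-frame bucket, which is one reason to cut every clip to at least the longest bucket length you want to train on.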
    

    Description

    By request, added pussyjob.


    This is alpha only. Initial results look good, but need to do more testing.

    Comments (16)

    rabelaavxqo549 · Apr 19, 2025 · 1 reaction

    Very good work; I like it very much.

    civitai7_ · Apr 19, 2025 · 1 reaction

    Turned out great, judging by the previews. Thanks again for posting the training configs.

    4800689 · Apr 19, 2025 · 1 reaction

    I haven't tested this lora yet; however, the previews are good. I have a request for you: can you also make a lora for Flux 1D for the same concept, so that I can use it with the I2V model?

    lowcaloriesyrup (Author) · Apr 20, 2025

    I haven't trained for Flux 1D yet and I'm not planning on training for that model in the near future.

    You can use t2v loras with the i2v model though... Another commenter mentioned they had some success with this lora with i2v.

    skinnamarinkydinkydink · Apr 19, 2025

    Any chance you could upload your training data and training configs?

    skinnamarinkydinkydink · Apr 19, 2025 · 1 reaction

    Oh!

    I didn't read the whole post.

    Very informative and thanks for sharing so much info!

    Would still love to see the training data though if you don't mind uploading.

    Looking forward to using this extensively... lolol.

    lowcaloriesyrup (Author) · Apr 20, 2025

    I got burned on another lora when I uploaded my training data (my own stupid mistake). So, I'm not going to upload training data for now. Let me see how some of the other lora creators fare with uploading their data for a bit first, then maybe I'll continue sharing data...

    skinnamarinkydinkydink · Apr 20, 2025 · 1 reaction

    @lowcaloriesyrup - what does it mean to get burned in this context? I have uploaded my data when asked. I don't want to get burned.

    playtime_ai · May 1, 2025

    @leisure_suit_larry Putting together a data set and captioning it is the part that takes time, effort and work. If you post your dataset, you run the risk of someone else retraining on your dataset and uploading a new version of your lora. That is about the only risk.

    lowcaloriesyrup (Author) · Apr 28, 2025 · 11 reactions

    Posted this model to tensor.art. I'll be moving all of my content over there... At least until we can find a decentralized means of sharing models that doesn't cave the moment a payment processor whines.

    hboxgames132 · May 3, 2025 · 3 reactions

    Please, a Hunyuan version for FramePack T.T please

    lowcaloriesyrup (Author) · May 5, 2025 · 2 reactions

    Will do sometime this week!

    zml_w · May 10, 2025 · 5 reactions

    I really like the creativity of this model, but it doesn't perform well on i2v. I sincerely hope you can come up with a version for i2v.

    lowcaloriesyrup (Author) · May 14, 2025 · 4 reactions

    Thanks for the feedback and for letting me know it doesn't work well for i2v. I'm not currently set up to train for i2v, but I will take a look in the near future!

    BreezyHeezy · Jun 29, 2025

    If I could offer some advice: try creating images with the same basic camera angle as the sample videos. Even if the image you're using is zoomed out, it should "recognize" everything and get to work. I'd assume that images that are too far zoomed out would be problematic on something like this, but you could, for example, use a generated image that has the same basic camera angle and prompt something like "camera zooms in slowly as she....". T2V-trained loras tend to work quite well on images, generally, even when it's anime vs. real-life stuff. So I'd just try to get the original image to match that basic camera angle and keep experimenting with the prompts. Should be good to go. And when you get a prompt that works, save it, and then tweak a copy so the working version stays saved.

    zml_w · Jul 1, 2025

    @BreezyHeezy Thank you for your suggestion! I'll try. May I ask if you have any useful templates?

    LORA
    Wan Video

    Details

    Downloads: 3,010
    Platform: CivitAI
    Platform Status: Available
    Created: 4/19/2025
    Updated: 5/21/2026
    Deleted: -
