????? Lora (+???????, +???????) [Taz] - WAN 2.2 14b / 5B / 1.3b T2V & I2V (Wan 2.1 & 2.2)

????? Lora (+???????, +???????) [Taz] - WAN 2.2 14b / 5B / 1.3b T2V & I2V (Wan 2.1 & 2.2) - v0.8 Wan 2.1 14B T2V

NSFW

About this version

I trained this version on the newly re-captioned dataset from the 5B model. The result is incredibly good, and for the first time I'm pretty happy with it. Give it a try. I haven't tested I2V, though it should work for that too. Most examples use the lightning speed-up lora at low resolution (480x832).

Trigger word: PENISLORA

What can this lora do?

This lora can add ????????????? to both men and women viewed from the front/side. Other angles such as POV may have a backwards ????? head.

Other things it can now do:
Side view of the penis

Cumming / Cumshots

Blowjobs (it's captioned with the words "???????" and "??????????")

What can't it do?

There is no penetration in the training data. There is also almost nothing from a POV angle, though there are a few images from above and 1 POV video in the training data.

Sometimes ???????? with ??????? have the ????? slip out the closed mouth.

Recommended Settings

It works pretty well with the new lightning dyno high model; I'll link to it in my example workflow. I like to use the dyno high model (no lightning lora) for the high pass, then for the low pass I use the lightning v2 lora on the regular 2.2 low base model.

Dataset

84 images at 512x resolution

43 videos at 256x resolution

(I let diffusion pipe pick the aspect ratio automatically)

This is the exact same dataset as the 2.2 5B model. I made no changes.
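To give a feel for what "picking the aspect ratio automatically" means at a 512 or 256 target resolution, here is a rough sketch of aspect-ratio bucketing. It is not diffusion pipe's actual code: the function name, the bucket list and the rounding to multiples of 16 are all assumptions made for illustration.

```python
import math

def bucket_size(width, height, resolution=512,
                ar_buckets=(0.5, 0.67, 1.0, 1.5, 2.0), multiple=16):
    """Return a (width, height) training size for the nearest aspect-ratio bucket.

    The area stays close to resolution**2. The bucket list and the
    rounding step are illustrative assumptions, not diffusion pipe defaults.
    """
    ar = width / height
    # Compare in log space so 0.5 and 2.0 are equally far from 1.0.
    nearest = min(ar_buckets, key=lambda b: abs(math.log(b) - math.log(ar)))
    area = resolution * resolution
    w = math.sqrt(area * nearest)
    h = math.sqrt(area / nearest)

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(w), snap(h)

print(bucket_size(1024, 1024))                 # square image at 512
print(bucket_size(1920, 1080))                 # landscape image at 512
print(bucket_size(1920, 1080, resolution=256)) # landscape video at 256
```

The point is only that a "512x" or "256x" setting fixes the pixel budget, while the width and height follow each file's own shape.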

Training

I used the default diffusion pipe settings.

[optimizer]

type = 'adamw_optimi'

lr = 2e-5

betas = [0.9, 0.99]

weight_decay = 0.01

eps = 1e-8

I was baffled why the high model was taking so long to train until I realized, after over 60 hours of training, that I had put my videos in the images directory. This resulted in the high being trained only on videos, and on them twice (once at a very high resolution). Once I fixed this, I went back and trained from 11K steps up to around 13K with the images in the training data. The high model was fine without them, to be honest.
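A quick check before starting a long run would have caught the mix-up described above. This is a minimal sketch, assuming one folder for images and one for videos; the function name, folder layout and extension lists are hypothetical and not part of diffusion pipe.

```python
from pathlib import Path

# Extension lists are assumptions; adjust them to whatever the dataset uses.
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv", ".avi"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def find_misplaced_media(images_dir, videos_dir):
    """Return video files found under images_dir and image files found under videos_dir."""
    misplaced = []
    checks = ((images_dir, VIDEO_EXTS), (videos_dir, IMAGE_EXTS))
    for folder, wrong_exts in checks:
        for path in sorted(Path(folder).rglob("*")):
            if path.is_file() and path.suffix.lower() in wrong_exts:
                misplaced.append(path)
    return misplaced
```

Calling find_misplaced_media("dataset/images", "dataset/videos") (example paths) and refusing to train when the returned list is not empty costs a few seconds compared to 60 hours of training on the wrong data.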

For the low model, I trained properly with videos and images the whole way. Around 6K steps in I raised the image resolution from 512 to 1024 and didn't get an OOM (it fit in almost exactly 24GB). I trained it to around 10.5K steps. Following some advice, I also trained the low on the full timestep range (0 to 1 instead of 0 to 0.85); it may hand over better from high to low when using the speed-up lora with a low step count.
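Here is a small sketch of why the full timestep range might matter with few steps. It assumes a 4-step schedule, a flow shift of 8 and a 2/2 split between high and low; none of these values come from my workflow, they are only for illustration. The shift formula is the usual flow-matching time shift.

```python
def shifted_sigmas(num_steps, shift):
    """Noise levels for a linear schedule from 1.0 downwards, after applying a flow shift."""
    sigmas = []
    for i in range(num_steps):
        s = 1.0 - i / num_steps
        sigmas.append(shift * s / (1.0 + (shift - 1.0) * s))
    return sigmas

def covered(sigma, train_range):
    """True if the noise level falls inside the range the lora was trained on."""
    low, high = train_range
    return low <= sigma <= high

sigmas = shifted_sigmas(num_steps=4, shift=8.0)
handoff = sigmas[2]  # first step given to the low model in an assumed 2/2 split
print(round(handoff, 3), covered(handoff, (0.0, 0.85)), covered(handoff, (0.0, 1.0)))
# prints: 0.889 False True
```

Under these assumptions the low model takes over at a noise level of about 0.889, which is above 0.85, so a low lora trained only on 0 to 0.85 would be used outside its trained range at the handoff.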

I think I might do another version with more angles, such as POV and from behind, to make this work for any situation. In that case I don't think it needs 10K steps per training session; epochs around 5K steps looked fine.

The results

I think it was a combination of improved captioning and the 2.2 base model being better, but this lora turned out really well.

Description

Summary:

This lora is designed to generate a frontal view of the ????? (including side view), ????????????, blow jobs / ??????????, and ??????? all in one. Some things it does better than others. It works for both men and women (or anything you want to attach a ????? to...). Give it a try and post your generations on my page, I love to see my loras used :)

Trigger: PENISLORA (chuck it in the front of the prompt)

(btw I also have a 1.3B lora, check it out, I will update it to v0.8 as well later)

Recommended Strength:

I need more time to test it, but I think between 0.8 and 1 is fine; lower will give weird ????? shapes.

Notes:

Even though I lowered the video resolution, training still used almost all of the A100's VRAM. My method of lowering the resolution may have made the clips too low quality, and I think there may be a slight blur in some generations. Try an earlier epoch like 10 on the same seed to compare, or just try another seed. I cannot for the life of me get it to get the ??????? consistently, you can ???????? good ones or you will get the ??????? ??? come out of her mouth a million times. My major worry is the blur; please report if it's a deal breaker and if this version needs to be redone :(

The good news is that you almost always get a good looking ????? now from many different angles (you can get it to look at you straight on from above the head now). If you get a detached ?????, try anchoring the ????? as described in the v0.7 release notes. See the training data zip for an example workflow and for all the captions, which I put in a csv file for you there :) (If you use an LMM to generate prompts, I recommend feeding it that csv file as a reference for ????? related terms.)

What I learned from 0.7 is that it can do a lot more than I thought. I just need time, like a week or so of running through different strengths and prompts, to see how to get it to do what I want properly. It's already a lot easier than before though, so I have high hopes for this version.

Example Prompts:

I will add some later, but just check any video I post in this version, I will include the prompt.

Negative Prompt:

色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，倒着走, 模糊的眼睛,低细节面部,水印, 图层, 故障, 面, 屏幕残影, 图像干扰, 刺青, tattoo, 晒痕, muscles, biceps,

For this ^ you can remove muscles/biceps if you want. Also add "??? from mouth" or "??? on lips" if trying to do a ??????? related video otherwise ??? will probably ??? out of her mouth.

Can do well:

Generate a good looking ????? from front / side / below / facing directly to the camera angles

Blowjob (including ??????????, and you won't get detached ????? heads if she takes the ????? out of her mouth, like you often do with other ??????? loras)

Tips for certain actions:

For ??????? try: "??? shoots from the tip of the ?????" (this is used in the captioning as well as "??? leaking")

For blow jobs use "???????" or "??????????" both of those are captioned as one word.

For ???????????? use "stroking the ?????" or "grabbing/gripping" the "????? shaft". Though "shaft" and "base" are captioned, it doesn't quite know which part of the ????? is which despite my efforts.

Kinda can do well:

Cumming and ??? shots (negative prompt should include "??? from mouth" and "??? on lips" etc as it tends to shoot ??? from both the ????? and the mouth).

Cannot do well:

POV / view from back of ????? (the ????? head will be in a random position since it's not trained on this angle at all; I feel plenty of POV loras exist already and they do well enough)

Data set:

-71 images (8 new images from last version), 512x512 resolution

-40 videos (13 new videos from last version), 512x320 resolution

(Why so little data each time? I am doing the captioning by hand, as detailed as I can. I am going for quality over quantity, but I do think this dataset could use more images.) I noticed one or two typos or mistakes in the captioning, but I didn't wanna restart and lose 10 hours of training. It should be ok though.

Training:

Loaded lora weights from v0.7 using Diffusion Pipe on Runpod

Additional steps since last version: 4450

16 Epochs total at 5 repeats each (it took around 3 hrs per epoch)

Total training time of around 56 hours (a little over 100 USD on Runpod on an A100 at 98% GPU VRAM usage)
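The numbers above can be cross-checked with a bit of arithmetic. This is a rough sketch, assuming all 16 epochs belong to the 4450 additional steps and rounding the cost to 100 USD; the function and key names are made up for illustration.

```python
def training_summary(images=71, videos=40, repeats=5, epochs=16,
                     added_steps=4450, hours_per_epoch=3,
                     total_hours=56, total_cost_usd=100):
    """Derive per-epoch and per-hour figures from the numbers in the notes above."""
    samples_per_epoch = (images + videos) * repeats
    steps_per_epoch = added_steps / epochs
    return {
        "samples_per_epoch": samples_per_epoch,
        "steps_per_epoch": steps_per_epoch,
        # Samples consumed per optimizer step implied by the two figures above.
        "implied_samples_per_step": samples_per_epoch / steps_per_epoch,
        "epoch_hours": epochs * hours_per_epoch,
        "usd_per_hour": total_cost_usd / total_hours,
    }

for key, value in training_summary().items():
    print(key, round(value, 2))
```

With these inputs each epoch covers 555 samples in about 278 steps, which implies roughly 2 samples per step. The epochs account for about 48 of the 56 hours, and the cost works out to a bit under 2 USD per hour.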

Credits:

Seruva19's ghibli lora and detailed breakdown of his process and sample dataset/captions

Hearmeman's diffusion pipe template for runpod and his simple to follow instructions

Kijai's feedback on proper workflow settings and all the people in the Banodoco discord answering my questions