This is a complete overhaul of the dataset from scratch. Trigger word: PENISLORA
Donate to my Ko-fi and I can train an I2V-optimized version.
Dataset
I took out all the images, so this version was trained purely on video, because of the motion issues we had before. Next I went through the dataset and put anything low resolution (640x) into its own bucket group, because training those clips at higher resolution gave blurry penis heads. Because 9:16 videos train poorly, I converted all of them to cropped 4:3 or 16:9 with black bars. This left me with 4 groups: HD 16:9, HD 4:3, low-res 16:9, and low-res 4:3 (1280x704 and 1088x832 for HD; 640x360 and 640x480 for low res).
The newly added data was mostly 121-frame clips, so the majority of the data is high resolution and longer.
I built a new tool to both trim and crop clips, and I used mradermacher's Qwen3.5-27B-heretic-GGUF with my captioning tool to caption the new clips. I am blown away by how good it was at captioning NSFW. Gemini is still better, but it can only caption SFW datasets. I recommend you check this model out.
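The grouping described above can be sketched as a small lookup. This is only an illustration, not the actual tool: the function name, the 640-pixel width cutoff for "low res", and the nearest-aspect-ratio rule are all assumptions; only the four target resolutions come from the description.

```python
# Target resolution per (tier, aspect) group, as listed in the description.
BUCKETS = {
    ("hd", "16:9"): (1280, 704),
    ("hd", "4:3"): (1088, 832),
    ("low", "16:9"): (640, 360),
    ("low", "4:3"): (640, 480),
}

# Assumption: clips no wider than 640 px count as low resolution.
LOW_RES_MAX_WIDTH = 640

ASPECTS = {"16:9": 16 / 9, "4:3": 4 / 3}


def pick_bucket(width, height):
    """Return ((tier, aspect), (target_w, target_h)) for a landscape clip."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width < height:
        # Portrait (e.g. 9:16) clips are converted to 4:3 or 16:9 beforehand.
        raise ValueError("convert portrait clips before bucketing")
    tier = "low" if width <= LOW_RES_MAX_WIDTH else "hd"
    ratio = width / height
    # Assumption: choose whichever supported aspect ratio is numerically closest.
    aspect = min(ASPECTS, key=lambda name: abs(ASPECTS[name] - ratio))
    key = (tier, aspect)
    return key, BUCKETS[key]


if __name__ == "__main__":
    print(pick_bucket(1920, 1080))
    print(pick_bucket(640, 480))
```

Keeping low-res clips in their own groups means they are never upscaled to the HD targets, which is the point of the split.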
Training
Trained on the Musubi fork by Akane, on the ltx2.3 branch. I had run this for about 5 days straight, tweaking the dataset as I went, and then LTX 2.3 suddenly dropped. So I scrapped the working LTX 2.0 version and started from zero, but with the ideal settings. I accidentally trained the audio on the LTX 2.0 version, and it sounds great despite not being captioned, so I might do V2 on LTX 2.3 with sound next. Note that V1 is not trained on audio.
It took around 24 hours of straight training to reach 17.5K steps at 6 s/it. Maybe I should have trained at a lower resolution to speed things up, but the result was good. Detail on the penis head appeared around 15K steps in. The shaft and motion were pretty solid from 4K steps in. Around 17.5K the average loss started rising and the results got worse, so I stuck with 17K, though the 16.5K checkpoint was also good.
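Choosing a checkpoint just before the average loss turns upward can be sketched roughly as below. This is not how the trainer reports loss. The helper names, the trailing-window average, and the toy loss values are all made up for illustration, and visual results still matter more than the number.

```python
from statistics import fmean


def avg_loss_at(losses, step, window):
    """Mean of the last `window` per-step losses up to `step`.

    `losses[i]` is assumed to be the loss logged for step i + 1.
    """
    if step < 1 or step > len(losses):
        raise ValueError("step is outside the logged range")
    start = max(0, step - window)
    return fmean(losses[start:step])


def pick_checkpoint(losses, checkpoint_steps, window):
    """Return the saved step whose trailing average loss is lowest."""
    return min(checkpoint_steps, key=lambda s: avg_loss_at(losses, s, window))


if __name__ == "__main__":
    # Toy values only, NOT real training logs: loss falls, then rises again.
    toy_losses = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.5, 0.6]
    print(pick_checkpoint(toy_losses, checkpoint_steps=[2, 4, 6, 8], window=2))
```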
Prompting
Same as the old versions. Put the PENISLORA trigger at the front. The word for penis is "Penis". The model is not trained on flaccid penises, and most in the dataset are circumcised. You can also prompt "Penis shown from the front" or "penis shown from the side". "Blow job" is captioned, as is "deepthroat", but there is not a lot of data, so YMMV. Cum may be partially captioned; I tried to remove it from the dataset because I think it needs a separate LoRA, but give it a try (if it is still in the dataset, the caption would be "cum shoots from the penis"). If the penis has no action, you can state "the man's penis is exposed". Use "the man strokes his penis" or "the woman strokes the man's penis" for jerking or hand jobs.
Known Issues
Sometimes the penis head doesn't come out right, especially when shown from odd angles. Try different seeds. The penis may be very bouncy; this is due to some poor captioning on data where the penis was not being stroked or sucked. I think this is easy to fix in V2. Nipples may not be great, and breasts are sometimes weird. Try a different LoRA to fix that. You will probably get a random penis on women if they're nude; a different LoRA may help with that too. I will try to fix these problems in future versions. It may be a bit overcooked. Let me know and I can try to provide earlier checkpoints.
Description
I added 100 more videos to the dataset, focused on penis, stroking, and blow jobs.
Total 191 videos, 83 images.
I recaptioned all the videos from scratch. I used my captioning tool with Grok for the initial run of captions, then went through and adjusted every caption by hand.
For the images, I made some slight adjustments.
Prompting
Note: Only jerking is tested and shown working, but there is about equal training data for jerking, blow jobs, and cum. Words in [brackets] are optional; they are only sometimes captioned.
For jerking:
"The [man] is stroking his [erect] penis"
"The [woman] strokes the man's erect penis"
For blow job:
"The [woman] giving a blow job to the [erect] penis."
(you can also use "deepthroat" for deepthroat bj).
For cum:
"Cum shoots from the [tip] of the penis [onto her breasts]"
The penis is captioned from two angles, as follows:
"The penis is shown from the side."
"The penis is shown from the front."
(A few samples with a view from above are captioned "The penis is shown from above". There is no POV data.)
Training
I trained this on AI Toolkit for 26 hours straight on an RTX 6000 Ada on RunPod (it trained at around 4.4 s/it). Things started to take penis shape around 8K steps, but the most stable seems to be 9.5K steps, though I saw good results at 13K, 15K, and the final 20K steps as well.
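The reported run time and step count can be sanity-checked with simple arithmetic. The helpers below are hypothetical and only convert between wall-clock hours, step counts, and seconds per step. 20K steps in 26 hours works out to roughly 4.7 seconds per step, so the quoted 4.4 figure only fits if it is read as seconds per iteration rather than iterations per second.

```python
def seconds_per_step(total_hours, total_steps):
    """Average wall-clock seconds spent per training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return total_hours * 3600 / total_steps


def hours_for(total_steps, sec_per_step):
    """Wall-clock hours needed for `total_steps` at a fixed step time."""
    return total_steps * sec_per_step / 3600


if __name__ == "__main__":
    # Figures from the text: 26 hours, 20K steps, a quoted rate of 4.4.
    print(round(seconds_per_step(26, 20_000), 2))  # about 4.68 s per step
    print(round(hours_for(20_000, 4.4), 1))        # about 24.4 hours
```

The gap between about 24.4 hours of pure step time and the 26 hours reported would be consistent with overhead such as sampling and checkpoint saving, though that is a guess.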
You can try any checkpoint here, let me know which is best.
I split the videos into 49, 73, 81, 89, and 97-frame buckets (plus images at 1) and set caption dropout to 0.01. Float8 transformer and 4-bit encoder. Latents were cached; videos were trained at 512 resolution and images at 512 and 768. I did not select "Do I2V". If anyone wants it trained specifically for I2V, I'd need them to cover the training cost (my Ko-fi).
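One way to picture the buckets above is as a lookup from a clip's frame count to a bucket. The rule used here is an assumption for illustration and may not match what the trainer does internally: a clip goes to the largest bucket it can fill and would be trimmed to that length. The function name is hypothetical.

```python
import bisect

# Bucket sizes from the text; still images use a bucket of 1.
FRAME_BUCKETS = [49, 73, 81, 89, 97]
IMAGE_BUCKET = 1


def frame_bucket(num_frames):
    """Return the bucket for a clip, or None if it is too short for video.

    Assumption: pick the largest bucket that does not exceed the clip length.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return IMAGE_BUCKET
    idx = bisect.bisect_right(FRAME_BUCKETS, num_frames)
    if idx == 0:
        return None  # shorter than the smallest video bucket
    return FRAME_BUCKETS[idx - 1]


if __name__ == "__main__":
    for n in (1, 48, 49, 80, 97, 121):
        print(n, frame_bucket(n))
```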