Trained on a fairly large collection covering many different concepts, the main one being women lying on their backs showing their asshole. Regularization data was also mixed in to prevent bias.
The model seems to perform well across a wide variety of scenes and has the potential to perform well in many more tasks than the one it was intended for.
Works for T2V (text-to-video) as well as I2V (image-to-video).
Training/Dataset parameters
259 videos at 512x512, 3 seconds each.
189 images at 1024x1024.
Rank 128.
18 epochs on the high-noise model.
22 epochs on the low-noise model.
Higher repeats on concepts with less data.
Learning rates started at 0.0001 for both, and the last ~20% of training ran at about 0.00002.
Videos were sliced and cropped using a custom tool I wrote.
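The learning-rate schedule above can be sketched roughly as follows. This is only an illustration: it assumes a simple step change at the 80% mark, while the actual shape of the drop (step, linear, cosine) is not stated here, and the function name and parameters are hypothetical, not part of diffusion-pipe.

```python
def learning_rate(step: int, total_steps: int,
                  base_lr: float = 1e-4, final_lr: float = 2e-5,
                  switch_fraction: float = 0.8) -> float:
    """Piecewise-constant schedule (assumed shape, not confirmed).

    Returns base_lr for the first ~80% of training and final_lr for
    the remaining ~20%.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # First step at which the lower learning rate applies.
    switch_step = round(total_steps * switch_fraction)
    return base_lr if step < switch_step else final_lr


if __name__ == "__main__":
    # Hypothetical run length, purely for illustration.
    for s in (0, 799, 800, 999):
        print(s, learning_rate(s, total_steps=1000))
```

The same function would be applied independently to the high-noise and low-noise runs, since both started from the same rate.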
Captioning:
Qwen2.5-VL was used for the initial captioning.
The system prompt was adapted for each concept to give the captioning model context and instructions on what to look for.
Captions were then manually reviewed, with minor corrections.
Trained using diffusion-pipe on a single RTX 5090.
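For the "higher repeats on concepts with less data" point, one possible way to pick repeat counts is sketched below. The actual repeat values used for this training are not listed, so the rule (scale each concept up toward the size of the largest one, with a cap), the function name, and the example counts are all assumptions for illustration.

```python
import math


def balance_repeats(concept_counts: dict[str, int],
                    max_repeats: int = 10) -> dict[str, int]:
    """Suggest per-concept repeat counts (hypothetical helper).

    Each concept is repeated so that it contributes roughly as many
    samples per epoch as the largest concept, capped at max_repeats
    to avoid overfitting on very small sets.
    """
    if not concept_counts:
        return {}
    if any(n <= 0 for n in concept_counts.values()):
        raise ValueError("every concept needs at least one sample")
    target = max(concept_counts.values())
    return {
        name: min(max_repeats, math.ceil(target / n))
        for name, n in concept_counts.items()
    }


if __name__ == "__main__":
    # Made-up sample counts, not the real dataset split.
    counts = {"concept_a": 120, "concept_b": 40, "concept_c": 15}
    print(balance_repeats(counts))
```

The cap is a judgment call: a very small concept repeated too many times tends to be memorized rather than learned.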
What it was trained on:
Women in piledriver position
Spreading her buttocks with own hands
Viewer spreading her buttocks
Close-up of vaginas from under
Penis anal insertion
Penis anal pull-out
Penis anal thrusting
Side views
POV views
A variety of pubic hairiness, but I am somehow still unable to control it
Ejaculation, but hard to control
Faces were mostly cut out of frame to minimize character bleeding
What it cannot do:
Do NOT expect good results on standing poses. There are none in the dataset.
See the prompts associated with the samples.