I2V

The I2V version has finally landed, and it performs much better than using the T2V LoRA with the I2V model. It was trained on the exact same dataset, minus the images, as they cannot be used to train I2V. The low noise model was trained at the native resolution of the dataset (1024x1024) for maximum fidelity.

The concepts and prompts are the same as the T2V version.

T2V

Trained on a quite large collection and a lot of different concepts, the main one being women lying on their back showing their asshole. Regularization data were also mixed in to prevent bias.

The model seems to perform well in a wide variety of scenes and has the potential to perform well in many more tasks than the one it was intended for.

Training/Dataset parameters

259 * 512x512 videos of 3 seconds.
189 * 1024x1024 images.
Rank 128.
18 epochs on the high noise.
22 epochs on the low noise.
Higher repeats on concepts with less data.
Learning rates started at 0.0001 for both, and the last ~20% of training ran at about 0.00002.
Videos were sliced and cropped using a custom tool I wrote.
Captioning:
- Qwen2.5-VL was used for the initial captioning.
- The system prompt was adapted for each concept to give the model context and instructions on what to look for.
- Manually reviewed with minor corrections.
Trained using diffusion-pipe on a single 5090.
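
For anyone wanting to reproduce the learning rate drop and the per-concept repeats, here is a rough Python sketch. It is not taken from diffusion-pipe or from my actual config: the function names, the hard step between the two rates, and the "round to the largest concept" rule for repeats are all assumptions made for illustration, and the concept names and clip counts in the usage example are made up.

```python
def learning_rate(step: int, total_steps: int,
                  base_lr: float = 1e-4, final_lr: float = 2e-5,
                  final_fraction: float = 0.2) -> float:
    """Piecewise-constant schedule (assumed shape): base_lr first,
    then final_lr for the last `final_fraction` of training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Step index at which the lower rate takes over.
    switch_step = total_steps - round(total_steps * final_fraction)
    return base_lr if step < switch_step else final_lr


def balanced_repeats(clip_counts: dict[str, int]) -> dict[str, int]:
    """Give concepts with fewer clips more repeats, so each concept ends up
    roughly as large as the biggest one (hypothetical balancing rule)."""
    if not clip_counts:
        return {}
    if any(count <= 0 for count in clip_counts.values()):
        raise ValueError("every concept needs at least one clip")
    largest = max(clip_counts.values())
    return {name: max(1, round(largest / count))
            for name, count in clip_counts.items()}


if __name__ == "__main__":
    # Hypothetical numbers, only to show the calls.
    total = 1000
    for step in (0, 799, 800, 999):
        print(step, learning_rate(step, total))
    print(balanced_repeats({"concept_a": 100, "concept_b": 50, "concept_c": 30}))
```

With 1000 total steps the rate switches at step 800. The repeats helper only suggests a starting point; the values I used were tuned by hand.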

What it was trained on:

Women in piledriver position
- Spreading her buttocks with own hands
- Viewer spreading her buttocks
Close-up of vaginas from under
Penis anal insertion
Penis anal pull-out
Penis anal thrusting
Side views
POV views
A variety of pubic hairiness, but I am somehow still unable to control it
Ejaculation, but hard to control
Faces were mostly cut out of frame to minimize character bleeding