trained on an RTX 4060 Ti (16GB).
Training settings: resolution 244, num frames 17.
It seems learning motion does not require very high resolution.
Comments (15)
How many videos are in the dataset?
17 clips sliced from 2 videos.
How did you train this LoRA? Do you have a guide?
To train the LoRA, I used diffusion-pipe (https://github.com/tdrussell/diffusion-pipe) on WSL2.
Some tips I have:
I only have 16GB of VRAM, so I had to set resolutions=244 and frame_buckets=[17] in ./examples/dataset.toml. The training videos should be about 1 second long, so I sliced my videos before training.
I found that for training motion you don't need a very high resolution, but for training a face you should train at a higher resolution like 512. I trained this LoRA for about 4 hours, 2000 steps.
Too large a dataset may overflow VRAM. I trained on 17 videos.
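The settings mentioned above map onto diffusion-pipe's dataset.toml roughly like this. Only resolutions and frame_buckets come from the comment; the directory path and num_repeats are placeholder values, so check the repo's own examples for the full file:

```toml
# excerpt of ./examples/dataset.toml (sketch, not the complete file)
resolutions = [244]
frame_buckets = [17]

[[directory]]
path = '/home/user/train_videos'  # placeholder path to the sliced clips
num_repeats = 1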
@yanagi099 Thank you for the information! Is there a reason why the videos in the dataset need to be exactly 1 second long?
@screamlouder Actually, you don't have to make the videos exactly 1 second long, because each video is automatically trimmed to the frame count you set in the settings. However, if you want to pick an exact moment in a video, you should slice it out yourself. Sorry for my poor English.
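Slicing out the exact moment can be done with ffmpeg before training; a minimal sketch (file names and the start timestamp are placeholders):

```shell
# cut a 1-second clip starting 5 seconds into the source video;
# -c copy avoids re-encoding but cuts at keyframes, so drop it
# and re-encode if the cut point must be frame-exact
ffmpeg -ss 00:00:05 -i source.mp4 -t 1 -c copy clip01.mp4
```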
@yanagi099 Did you resize the videos to the resolution you set, or just set the training to that resolution and keep the videos in their original format?
@asdrabael I just set the training to that resolution.
@yanagi099 So say I have a video that is 768x1280 and I set the resolution to 384. Will that warp the video, or just resize it to 230x384? Just trying to understand how the function works.
@asdrabael I don't know for sure, but it seems the video is resized to about 384x384 pixels' worth of area while keeping the aspect ratio.
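If the trainer really buckets by total area as the reply suggests (this is an assumption, and the exact rounding rule here is a guess), the resize for a 768x1280 clip at resolution 384 would look roughly like:

```python
import math

def bucket_size(width, height, resolution):
    """Scale so total area is about resolution**2, keep the aspect ratio,
    and round each side to a multiple of 16 (the rounding rule is a guess)."""
    scale = resolution / math.sqrt(width * height)
    new_w = round(width * scale / 16) * 16
    new_h = round(height * scale / 16) * 16
    return new_w, new_h

print(bucket_size(768, 1280, 384))  # a 768x1280 clip lands near 304x496
```

So rather than warping to a square, the clip keeps its shape and only shrinks until its pixel count is close to 384x384.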
@yanagi099
I'm still trying to figure out how you did this. My PC runs Linux. I have 12 video clips, each 1 second long, resized to 242x140. I tried 17 frames down to 10 frames. Nothing else is using my GPU, and everything loads fine, but as soon as step 1 starts I get an OOM error.
It only generates Asian faces for me?
I think this one still has some of the best bouncing/moving boobies of all LoRAs,
and the fact is it was trained on an RTX 4060 Ti, wtf
@Cyberfolk Yeah, but the number of frames and the resolution he used are low, so that part wasn't so surprising; the surprise is how good it turned out despite that :)
For example, I went OOM and crashed on my RTX 3090 (24GB VRAM) when trying to train at 432 resolution and 69 frames, but it worked quite easily at 350 resolution (or something like that) and 65 frames, so these two parameters push VRAM usage up very quickly.
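Those two runs are consistent with activation memory growing roughly as resolution squared times frame count. This is only a back-of-the-envelope proxy, not the exact attention cost:

```python
def relative_activation_cost(resolution, frames):
    # rough proxy: pixels per frame times number of frames
    return resolution ** 2 * frames

oom_run = relative_activation_cost(432, 69)  # setting that crashed on 24GB
ok_run = relative_activation_cost(350, 65)   # setting that trained fine
print(f"{oom_run / ok_run:.2f}x")            # crashed run needs ~1.6x the activations
```

A ~1.6x jump in activation memory is easily the difference between fitting in 24GB and going OOM, which matches the commenter's experience.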