v2: training attempted, but it doesn't work.
I don't know why yet.
To increase the diversity of target frames, I increased the range of frame counts in the dataset.
Previously it was fixed at 81 frames; this time it was trained on 33, 49, 65, 81, 97, 113, and 129 frames.
The LoRA rank has been lowered from 64 to 32.
The front and back padding areas have been increased from 4 frames to 8 frames each.
4 frames were not enough in some cases.
(This occurred when stacking a lot of different LoRAs.)
As we all know, using the End Frame feature in Wan 2.2 causes degradation near the last frame.
I couldn't stand this, so I tried making a LoRA.
This is my first attempt, but it worked better than I expected, so I'm uploading it to share.
This is what the LoRA does: it creates identical extra frames at the beginning and end of the video.
It was trained with a dataset of 4 starting frames + a loop video spanning frames 5-77 + 8 trailing frames, for FLF2V.
The intention is that the loop video can still be completed even after cutting out the frames that the VAE distorts.
After creating a video, additional work is required to select and cut out duplicate frames.
I hope it works as intended.
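The post-editing step above can be sketched in code. This is a minimal illustration under my own assumptions, not part of any actual tool: given the decoded frames, drop the 4-frame buffers the LoRA adds at each end so the remaining frames 5-77 form the loop.

```python
# Minimal sketch of the trimming step described above (illustrative only;
# the function and variable names are mine, not from any real workflow).

def trim_buffers(frames, head=4, tail=4):
    """Drop `head` leading and `tail` trailing buffer frames."""
    if len(frames) <= head + tail:
        raise ValueError("not enough frames to trim")
    return frames[head:len(frames) - tail]

# An 81-frame generation, numbered 1..81 for clarity.
video = list(range(1, 82))
loop = trim_buffers(video)
print(loop[0], loop[-1], len(loop))  # 5 77 73
```

If the VAE damaged more than 4 frames at one end, the `head`/`tail` arguments can simply be widened for that video.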
Description
high - 2500 steps.
low - 1000 steps.
An error occurred at step 1000 due to incorrect settings.
When I retrain, it will start again from step 1.
I'll update when I have a chance.
Comments (12)
Maybe this will help with the problem where interpolation trims out a few frames and ruins a good loop.
Why not just create a better node instead?
It's a bit complicated to explain, but I'll try my best to explain it.
When you create an 81-frame video with FLF2V, you want frame 1 and frame 81 to be identical.
The problem is that frame 81 deteriorates in the VAE stage.
Frame 1 and Frame 81 must be connected, but Frame 81 is difficult to use.
If you cut out that frame, the misalignment of movement will become larger and the chances of solving the problem will be further reduced.
I couldn't solve it with color matching or other methods.
It may be improved by using the recently released SVI, but looking at the method, there seems to be a problem: if you don't like the image in the middle, you have to regenerate the entire video.
(I haven't tried it yet)
If you are talking about something different than this, please leave a more detailed explanation.
@zzozz I'm aware of the issue, but I'm not sure why/how a lora would solve it? The natural solution would be to throw out the last 1-2 frames and reinterpolate between the generated last frame and the goal last frame. Use something like tensorrt rife vfi
@ultimo_intento If you make an 81-frame video, there is movement from frame 1 to frame 81.
With this LoRA, that movement is compressed into roughly frames 5 to 77. The first 4 frames and the last 4 frames are buffers; if the VAE makes some frames worse, we cut them out, and the loop itself is already complete within frames 5 to 77.
Post-editing is required after creating the video, but I tried it this way and I think it works.
@zzozz I understand what you're doing, but it's just not an optimal way and, as you said, requires post processing. Instead, the padding frames aren't needed (also note: a LoRA is not a deterministic fix); it's the clean and intelligent splicing of frames that removes errors introduced by the decoding step. Also, a higher framerate will lower the amount of possible mutation, further improving the quality of the frames generated (at the cost of time, though).
Thank you for the extremely useful LoRA.
It may be for the same reason that WAN S2V processes every 77 frames.
Question: Why are frames 1 through 4 also discarded?
Frames 1 through 4 seem fine to me.
Depending on the WAN model, there can be a difference in image quality between the first 1-2 frames and the subsequent frames.
To guard against this, dummy frames were also placed at the front.
The latest WAN models seem to be fine.
Still, it works for now, so let's keep it like this just in case.
And it's too annoying to train again, because stupid Gemini ruined my training environment.
I think you messed up the explanation of the frames a little or I didn't understand correctly... If I generate 81 frames, should I crop the first 4 and the last 4 to generate the loop?... Or should I do something else?
It depends on the video.
Before frame interpolation, the first four frames were mostly correct, but the trailing ones showed subtle differences depending on the video.
If you're using a video editing program, you should manually set the loop section and compare.
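For the "set the loop section and compare" step, here is a hedged sketch of one way to automate the comparison, assuming the frames are available as pixel arrays (every name here is illustrative, not from any real tool): score each trailing frame against frame 1 and cut just before the closest match.

```python
# Hypothetical helper for finding the duplicate frame that closes the loop.
# Frames are represented as flat lists of pixel values for simplicity;
# a real pipeline would use decoded image arrays.

def frame_distance(a, b):
    """Mean absolute pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def best_loop_end(frames, search_tail=8):
    """Among the last `search_tail` frames, return the index of the one
    closest to the first frame; cut the video just before it."""
    start = frames[0]
    candidates = range(len(frames) - search_tail, len(frames))
    return min(candidates, key=lambda i: frame_distance(start, frames[i]))

# Synthetic demo: 81 frames whose content repeats every 7 frames,
# so frame index 77 duplicates frame index 0.
frames = [[i % 7] * 4 for i in range(81)]
print(best_loop_end(frames))  # 77
```

A pixel-difference score is crude; in practice you would still eyeball the candidate cut in an editor, as the comment above suggests.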
I'm not sure when it started, but it has been working less and less.
I tried training v2, but it failed. It'll take some time to figure out what's causing it.
Or maybe I should move to SVI.