This is a pair of LoRAs trained on the WAN 2.2 I2V 14B low-noise and high-noise
models. Together, these LoRAs model a mostly facial ???????. The man
doesn't have to be present in the first frame.
Trigger phrase:
A man ???? on her

I recommend using a CFG of ~3.5 for high-noise sampling and 5 to 7 for
low-noise sampling. The examples in the showcase were generated using a
constant CFG of 3.5 for the high-noise steps and 7 for the low-noise steps,
24 total sampling steps, a split step of 3, and a shift of 5 for both
samplers. My full workflow is available on my Patreon.

While generating samples, I noticed that the ???????? appear to be
bigger if the man is already in frame with his hand on his ????? and the
generated video has an aspect ratio close to 16:9. I suspect this quirk
comes from the composition of the training dataset.
If you like this model and want to support the creation of future models, please leave a positive review or post your generations to the model page.
