Wan 2.2:
example prompt: Very high quality 4k animation, This is a masterpiece of the best quality, the video is showcasing amazing and highly detailed artwork. Man's POV of him laying down with a woman facing away while riding on top and having anal sex with him. She is riding on top quickly bouncing her butt up and down as his penis goes all the way in and out of her asshole.
Wan 2.1:
This model was trained on realistic videos using the Wan T2V 14B model. Version 1.0 was trained on the I2V 720P model, but Version 1.1 should be compatible with both I2V and T2V. I trained locally on an A6000 using 20 two-second videos at 1920x1080 resolution (24 FPS, 48 frames each). Training for 2800 steps took around 48 hours.
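As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the seconds-per-step figure assumes the whole 48 hours went into the 2800 training steps, which the description does not confirm.

```python
# Back-of-the-envelope figures taken from the description above.
num_videos = 20
clip_seconds = 2
fps = 24
steps = 2800
training_hours = 48  # "around 48 hours", so treat the result as approximate

frames_per_clip = clip_seconds * fps           # frames in one training clip
total_frames = num_videos * frames_per_clip    # frames across the whole dataset
seconds_per_step = training_hours * 3600 / steps

print(f"frames per clip: {frames_per_clip}")
print(f"total frames in dataset: {total_frames}")
print(f"approx. seconds per step: {seconds_per_step:.1f}")
```

This is only an average; actual step time depends on resolution bucketing, block swapping, and other settings not listed here.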
Training Setup
I trained this model locally using diffusion-pipe: https://github.com/tdrussell/diffusion-pipe.git
Here are the toml files I used to train as well as the command to start training (within train.sh): https://drive.google.com/drive/folders/1Ns6IPQlNp-jYz76LlqpYpOoaQwUx3AN9?usp=sharing
Comments (20)
Thanks for creating this and sharing your training setup.
Would you be willing to expand on this a bit more? What training software were you using? What configuration of that training software did you use (eg: diffusion-pipe .toml files)? What resolution were the videos?
Appreciate any additional info!
I updated the description to provide the info you requested <3
I wish there were a tutorial on how to install this on Windows and train. Is any Gradio UI available? mitsumi?
@arkadiuszdukowicz311 Do you have 48GB of VRAM? If not, it may not be feasible for you to train, at least not with videos of the resolution and length I used for this.
I don't know anything about Gradio or mitsumi, unfortunately.
That said, there are plenty of tutorials out there like this one: https://www.stablediffusiontutorials.com/2025/03/wan-lora-train.html
@pikenrover I appreciate the updates. Any chance you could open up the files you shared to be public (eg: anyone with the link can access)?
@arkadiuszdukowicz311 There are a lot of good tutorials out there for running Wan LoRA training locally, but you might also consider just using a pre-configured RunPod template. I've used the one described in this article and it was pretty easy to get up and running. Plus, the model weights are already set up and configured with the template. https://www.reddit.com/r/StableDiffusion/comments/1j6ezug/wan_lora_training_with_diffusion_pipe_runpod/
@arkadiuszdukowicz311 Here's a good article on how to get wan lora training up and running locally: https://civitai.com/articles/12837
@pikenrover How is the A6000 for generation? On a 4090 it takes me 15 minutes with the 720P model and this LoRA to generate a 3-second 720x1280 clip at 16 FPS.
@lowcaloriesyrup Oops, my fault, it should be accessible now.
@swaglordrtz It's been really good for me. My workflow takes about 20 minutes for a 4-5 second video on the bf16 version of the 720P I2V model. That includes upscaling the original 512x768 generation to 1728x2560.
@swaglordrtz I can't speak to generation, and I don't know whether this translates at all, but for training, my 4090 is about 1.5x as fast as the A6000 on an equivalent job. This is for a Wan character LoRA using 1024px images. You probably know this, but the main benefit you get from the A6000 is the extra VRAM. For training, this means you can use higher-resolution videos with more frames. For generation, it means you can load full-weight models and generate higher-resolution or longer clips.
@pikenrover TYSM! I was able to download them!
Can you guys please provide good parameters for training a T2I 14B LoRA from images on an RTX 4090 with 64GB RAM? What would be the optimal learning rate, network dimension, flow shift, number of epochs, batch size, timestep sampling method, blocks to swap, and optimizer settings? Which model precision should I use - bf16, fp16, or fp8? Are there specific model versions or links to recommended model weights that would work best for face training? I've trained on 20 photos of myself, but instead of generating my likeness, I'm getting a completely different person in the results!
@CyberAImania this is a good article for getting diffusion-pipe set up and models downloaded/configured. I had the same question as you though: How do I train using 14B models on my 4090? @SingularUnity pointed me to the correct models to use in the comments of that article. I'm getting good results training wan character loras, but I'm still not as happy as with hunyuan character training results. Still tweaking things.
Here are the pertinent parts of my config.toml. Everything below the model paths is just diffusion-pipe defaults. I'm using 40-50 1024px images. The image set should contain various compositions and angles of your subject. For captioning a character LoRA, you should consider just using a keyword in the caption files, unless there's something specific that you don't want the keyword to be associated with (eg: if there is a lot of variation in hair color in your training data, but you don't want random hair colors appearing in your generated images, you might want to specify hair color in your caption files). Feel free to DM me if you want more help.
epochs = 50
blocks_to_swap = 20
[model]
type = 'wan'
ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-14B' # WAN-AI
transformer_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2_1-T2V-14B_fp8_e5m2.safetensors' #kijai
vae_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan_2_1_VAE_bf16.safetensors' #kijai
llm_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/umt5-xxl-enc-bf16.safetensors' #kijai
dtype = 'bfloat16'
timestep_sample_method = 'logit_normal'
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
[optimizer]
type = 'adamw_optimi'
lr = 2e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8
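The keyword-only captioning described in this comment can be scripted. Below is a minimal sketch, assuming captions live in .txt files that share the image's base name (check the diffusion-pipe README for the layout your version expects). The trigger word `mychar_person`, the function name, and the extension list are all hypothetical and not taken from the commenter's setup.

```python
from pathlib import Path

# Hypothetical trigger word; pick something unlikely to collide with real words.
TRIGGER = "mychar_person"
# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def write_keyword_captions(dataset_dir, keyword, extra=""):
    """Write one .txt caption per image, sharing the image's base name.

    `extra` is optional text for attributes you do NOT want absorbed
    into the keyword (for example a hair color).
    Returns the list of caption paths that were written.
    """
    written = []
    # sorted() materializes the listing first, so new .txt files are not revisited.
    for image in sorted(Path(dataset_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        caption = keyword if not extra else f"{keyword}, {extra}"
        caption_path = image.with_suffix(".txt")
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written.append(caption_path)
    return written
```

Calling `write_keyword_captions("path/to/images", TRIGGER)` writes keyword-only captions; passing something like `extra="blonde hair"` appends that attribute to every caption, which only makes sense if it is true of every image. Note that existing caption files with the same names are overwritten.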
@lowcaloriesyrup pm you
FYI, I just learned this myself... if you train on the T2V model, the resulting lora will work with T2V and I2V. When you train on I2V, it won't work for T2V.
train t2v not i2v on wan
I know, testing a t2v model now.
@pikenrover I did some tests. A LoRA trained on I2V works better (on I2V) than a T2V LoRA used in I2V, with the same seed and other settings. This held with both the 480 and 720 models. But it could also be that a T2V LoRA needs different settings when applied in I2V.
@blo01 I'm finding the same thing actually for a different one of my models. The I2V trained version is the only one that gives any sort of good result.