This LoRA was trained on 80 portrait-ratio videos of 24 frames each. Only the double blocks were trained, so the file is only 172 MB. This is epoch 52 of a 100-epoch training run.
I recommend a strength of 1 to 1.5.
If you are using ComfyUI's native nodes to generate videos with Hunyuan, I suggest testing with the sampler/scheduler combination dpmpp_2m + beta.
To use it, include the trigger word 360c4m3r4, or follow this prompt structure, which I think works very well:
360c4m3r4 The scene features a close-up view of a {CHARACTER DETAILS}, capturing their face and upper body from a low angle. {BACKGROUND DETAILS}. The {CHARACTER} {ACTION}, its {mouth/nose} prominently visible as it moves closer. There are no significant changes in the {CHARACTER} viewing angle during the scene; the focus remains on the {CHARACTER} face throughout.
Example:
360c4m3r4 The scene features a close-up view of a dog yorkshire terrier, capturing their face and upper body from a low angle. The background is a supermarket. The dog yorkshire terrier appears to be walking towards the viewer, its mouth prominently visible as it moves closer. There are no significant changes in the dog yorkshire terrier viewing angle during the scene; the focus remains on the dog yorkshire terrier face throughout.
Carry out your tests and adjustments and see which prompt best suits your needs.
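If you want to try many variations of the prompt structure above, a small script can fill in the placeholders for you. This is only a minimal sketch: the function name build_prompt and its parameter names are hypothetical and not part of any tool mentioned here; only the trigger word and the sentence structure come from the description.

```python
# Hypothetical helper that fills the prompt structure from the description.
# Only the trigger word and the sentence skeleton come from the model page.
TRIGGER = "360c4m3r4"

TEMPLATE = (
    "{trigger} The scene features a close-up view of a {character_details}, "
    "capturing their face and upper body from a low angle. {background}. "
    "The {character} {action}, its {feature} prominently visible as it moves closer. "
    "There are no significant changes in the {character} viewing angle during the scene; "
    "the focus remains on the {character} face throughout."
)


def build_prompt(character, action, background, feature="mouth", character_details=None):
    """Return a full prompt; character_details defaults to the character name."""
    return TEMPLATE.format(
        trigger=TRIGGER,
        character_details=character_details or character,
        character=character,
        action=action,
        background=background.rstrip("."),  # the template adds the final period
        feature=feature,
    )


prompt = build_prompt(
    character="dog yorkshire terrier",
    action="appears to be walking towards the viewer",
    background="The background is a supermarket",
)
print(prompt)
```

Changing only the character, action, and background arguments keeps the camera wording identical between tests, which makes it easier to compare results.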
I trained using my fork of diffusion-pipe. It has a gradio interface and a docker image that makes it easy to use: with just one command you have the entire environment set up and the models already downloaded. If you want to use it, follow the instructions in the README of the repository: https://github.com/alisson-anjos/diffusion-pipe-ui or use the template for Runpod or VastAI.
https://runpod.io/console/deploy?template=t46lnd7p4b&ref=8t518hht
https://cloud.vast.ai/?ref_id=142589&creator_id=142589&name=Hunyuan%20Lora%20Train%20Simple%20Interface
Article: Train LoRA for Hunyuan Video using diffusion-pipe Interface with Docker, RunPod, and Vast.AI | Civitai
Comments (14)
Please share your training tutorial.
I will make a tutorial as soon as possible. I trained this LoRA and the others I made using diffusion-pipe; there are some articles here on Civitai explaining it, and also some videos on YouTube. In my case, I made a fork of the official diffusion-pipe repository, added an interface, and created a docker image, which makes it easier to use diffusion-pipe and run the training. Here are the links to my fork and some tutorials; I will publish my own guide soon.
diffusion-pipe (cli): tdrussell/diffusion-pipe: A pipeline parallel training script for diffusion models.
my fork (interface, docker, template runpod and vastai): alisson-anjos/diffusion-pipe-ui: A pipeline parallel training script for diffusion models.
youtube tutorial: https://youtu.be/wVTZj-RGIXw
article: https://civitai.com/articles/10547/train-lora-for-hunyuan-video-using-diffusion-pipe-interface-with-docker-runpod-and-vastai
@alissonerdx i have a lora trained, i do not use comfyui, i'm on an ssh gpu server, how do i start generating videos with it now? could you please give me the cli commands? i sshed into a gpu server, so it's not on my localmachine.
@co773c710n5 In this case, at the moment, I think you can try the musubi-tuner project for this; it has an inference script:
https://github.com/kohya-ss/musubi-tuner
Example:
python hv_generate_video.py \
    --fp8 \
    --video_size 544 960 \
    --video_length 60 \
    --infer_steps 30 \
    --prompt "solo,Xiangling, cook rice in a pot ,genshin impact ,1girl,highres," \
    --save_path . \
    --output_type both \
    --dit ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --attn_mode sdpa \
    --vae ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt \
    --vae_chunk_size 32 \
    --vae_spatial_tile_sample_min_size 128 \
    --text_encoder1 ckpts/text_encoder \
    --text_encoder2 ckpts/text_encoder_2 \
    --seed 1234 \
    --lora_multiplier 1.0 \
    --lora_weight xiangling_im_lora_dir/xiangling_im_lora-000003.safetensors
For this script to work, I believe you will have to download the original HunyuanVideo models and not the Kijai ones, though I'm not completely sure about this part.
I saw some issues opened by geeve-research in both diffusion-pipe and musubi-tuner, so I believe you are looking for a solution for this, hehehe.
What can you tell us about training only double blocks? I saw you mention it as well on your github for the GUI interface. Wondering if you can share your thoughts on it.
From what I saw, someone said that training only the double blocks decreases motion blur and makes the LoRA more compatible when combined with other LoRAs (multiple LoRAs). I haven't experimented much with this, though; this was the first LoRA where I used this feature.
This is what the person said:
Someone 1:
"Anyone else tried training just the double blocks? (vs training all the blocks and 'only using the double blocks' during inference, since there is a difference) I just tried training, same dataset, same settings, but the difference being training all the blocks as normal (308MB LoRA generated) vs training only the double blocks (172MB LoRA generated) and for 'this particular test' I'm for some reason seeing a better likeness to my training data, and the prompts are being followed better, etc, on the 'only double blocks trained LoRA', just reporting it... I'll have to try many more 'dual trainings' to see if this holds in anyway or is just a fluke (which is the assumption I have atm, as logically 'less was trained' throughout the model) with this particular dataset, etc... (double block only training 'tweak' to diffusion-pipe for ref:"
"~10% speed boost in my case, bit less VRAM"
Someone 2:
"I tested it yesterday, and it performed really well in terms of movement, helping to avoid some blurriness and unnecessary jitters.
Especially when mixed with other Loras, the improvement is very noticeable."
"Below are just the results from my personal tests. I trained two Loras: one for character design and another for NSFW motion. From my perspective, the character design Lora doesn’t show much difference compared to full blocks, but it does result in a smaller Lora size and faster training speed. As for the other Lora, I found that turning off single blocks during inference, as opposed to directly training only with double blocks, results in a 'cleaner' output(but it's much better compared to full blocks.), with less motion blur and a bit less style (since I only trained motion and tried to minimize style). This is more noticeable when mixing Loras. It indeed performs better compared to full blocks, but due to the training steps, I can't yet determine if directly training with double blocks has more advantages than just using double blocks during inference. I'll test this again once I have more training steps."
How are you doing video captions in your dataset? And I just wanted to say your docker image is beautiful. thank you.
Thank you for the docker image and tutorials. When I trained a LoRA with images on a past checkpoint, I needed to create a folder named like "X_trigger" and put all my images and captions in that folder. If I want to train from videos instead of images, can the folder name be arbitrary, or does it need to follow the same convention as for images (X_trigger)?
The name of the folder doesn't make any difference. If you want to mix images and videos, both can be in the same folder. If you want to train images at a resolution x and videos at a resolution y, or with different repetitions, you will need to create a folder for each type, one for images and another for videos. Then, in your dataset settings, you can have more than one [[directory]], which lets you configure each dataset separately (repetitions, resolutions, and so on).
@alissonerdx Thank you for clarification!!
@yue_liang There is just one detail: separating the dataset into multiple [[directory]] entries is not yet possible through the gradio interface. As a workaround, upload your entire dataset through gradio, configure the training parameters, click train, and then press stop right away, just so that it saves the configuration files. Then close gradio and use jupyter lab to separate the files into 2 folders (images and videos). After that, open the dataset configuration file (toml) in the configs folder (/workspace/configs/{name of your dataset}) and edit it to add more than one [[directory]] pointing to your folders. Remember: if you change it manually, do not run the training through the interface, because the interface will overwrite the files; you will have to run the training from the terminal in jupyter lab. This is explained in the article I made: https://civitai.com/articles/10547/train-lora-for-hunyuan-video-using-diffusion-pipe-interface-with-docker-runpod-and-vastai
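As a rough illustration of what the extra [[directory]] entries might look like, the sketch below prints two of them, one for images and one for videos. Everything specific here is an assumption: the folder paths are hypothetical, and the key names path, num_repeats, and resolutions are what I recall from the diffusion-pipe example dataset config, so check them against the repository before using them. It only produces the [[directory]] part; the other settings already present in your saved config file stay as they are.

```python
# Sketch only: builds the [[directory]] entries as text so they can be pasted
# into the dataset toml. Key names (path, num_repeats, resolutions) are assumed
# from the diffusion-pipe example config; the paths below are hypothetical.
def directory_block(path, num_repeats, resolutions):
    """Return one [[directory]] table as TOML text."""
    res = ", ".join(str(r) for r in resolutions)
    return (
        "[[directory]]\n"
        f"path = '{path}'\n"
        f"num_repeats = {num_repeats}\n"
        f"resolutions = [{res}]\n"
    )


dataset_toml = "\n".join([
    # Images: more repetitions, higher resolution (example values only)
    directory_block("/workspace/datasets/my_dataset/images", 5, [512]),
    # Videos: fewer repetitions, lower resolution (example values only)
    directory_block("/workspace/datasets/my_dataset/videos", 1, [256]),
])
print(dataset_toml)
```

The repetition and resolution values are placeholders; pick them according to your own dataset and available VRAM.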
@alissonerdx Yeah, I saw that message as well, thanks. Will keep experimenting!