HUNYUAN VIDEO 1.5
Official Releases (from Tencent, pulled from Comfy-Org's Hugging Face)
Maintained and updated here for convenience.
Not affiliated with Tencent — just a fan of Hunyuan Video.
🎉 Training code for Hunyuan Video 1.5 is now released!
LoRAs incoming... Confirmed: LoRA training works in musubi-tuner. It's a bit picky, but it works.
I find that T2V works best with the er_sde sampler and the beta11 scheduler, and I2V works best with the ipndm sampler and the simple or beta11 scheduler.
TEXT TO VIDEO MODELS
• 720p T2V — FP16 — 16GB
• 480p T2V — FP16 — 16GB
• 480p T2V — FP8 — CFG-Distilled Scaled — 8GB
IMAGE TO VIDEO MODELS
• 720p I2V — FP16 — 16GB
• 480p I2V — FP16 — 16GB
• 480p I2V — FP16 — Step-Distilled — 16GB (added Dec 5)
• 480p I2V — FP8 — Step-Distilled Scaled — 8GB (added Dec 5)
• 480p I2V — FP8 — CFG-Distilled Scaled — 8GB
UPSCALE MODELS
• 1080p SR — FP16 — Distilled — 16GB
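The listed sizes track bytes per parameter: FP16 stores 2 bytes per weight and FP8 stores 1. A rough estimate, assuming a DiT of about 8.3 billion parameters (my assumption for HunyuanVideo 1.5; the 1080p SR model is not modeled), lands close to the 16 GB and 8 GB figures. These are weight sizes only; actual VRAM use during generation also includes activations, the text encoder, and the VAE.

```python
# Rough checkpoint-size estimate: parameter count x bytes per parameter.
# ASSUMPTION: ~8.3e9 parameters for the HunyuanVideo 1.5 DiT.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring scale tensors."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 8.3e9  # assumed parameter count
    for dtype in ("fp16", "fp8"):
        print(f"{dtype}: ~{weight_size_gb(params, dtype):.1f} GB")
```

"Scaled" FP8 files also carry per-tensor scale factors, which add a little on top of the one-byte-per-weight estimate.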
NOTES
• Text-to-Video models can also be used for Image-to-Video.
They behave differently from true I2V models, but still work.
TENCENT UPDATE LOG (summarized)
Dec 05, 2025
• 480p I2V Step-Distilled model released (8 or 12 steps recommended).
• End-to-end generation time reduced by ~75% on an RTX 4090 (≈75 seconds per video).
• Step-distilled quality remains close to the original.
• Optional 4-step mode available for ultra-fast output.
• Training code now open-sourced (Muon optimizer).
• HunyuanVideo-1.5 available on Hugging Face Diffusers.
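If the ~75% figure is read as a 75% cut in end-to-end wall-clock time (my reading of the summary), that is a 4x speedup rather than 1.75x, and a 75-second result implies a baseline of roughly 300 seconds, assuming both numbers refer to the same RTX 4090 setup. A minimal sketch of that arithmetic:

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """A fractional reduction r in wall-clock time gives a speedup of 1 / (1 - r)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def implied_baseline_seconds(new_seconds: float, reduction: float) -> float:
    """Back out the pre-distillation time from the new time and the reduction."""
    return new_seconds / (1.0 - reduction)


print(speedup_from_time_reduction(0.75))   # 4.0
print(implied_baseline_seconds(75, 0.75))  # 300.0
```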
Nov 27, 2025
• Cache inference support added (deepcache, teacache, taylorcache).
• Major speedups.
Nov 24, 2025
• Deepcache inference introduced.
Nov 20, 2025
• Inference code and model weights released.
LightX2V COMPATIBILITY
Tested and WORKING (4 or 8 step generation):
• T2V 720p FP16
• T2V 480p FP16
• I2V 720p FP16
• I2V 480p FP16
• I2V 480p FP8 CFG-D Scaled (Distilled)
Tested and NOT WORKING (full 50-step generation):
• T2V 480p FP8 CFG-D Scaled (Distilled)
Comments (9)
For anyone curious, I had some successful LoRA training tests last night.
I'm still unsure whether FP16 and distilled LoRAs are at all interchangeable.
Currently trying to train a LoRA that understands male anatomy and the basics of oral sex.
Is there an easy cloud option to train LoRAs?
@mckenna I don't know, I haven't looked into that. But judging by what I've been able to train on a 4090, cloud may be the only real option for video training on this model. Training on a 4090 with just a few images sits right at 23.4 GB. A 5090 would have better luck, as would the professional cards with 48+ GB of VRAM, or even two 3090s with NVLink.
@MrReclusive666 I have one 3090, so no chance. The last version of Hunyuan Video trained very well on 50 images. I am looking into creating a Replicate trainer myself.
@mckenna So you are in the same boat as me, pushing the limits of 24 GB of VRAM trying to train this thing.
Musubi does allow the FP16 models. It probably won't take much to train against the FP8 distilled model, which should help, but I haven't gone down that path yet. I'm still working on the overfitting issue: backgrounds and camera position fit WAAAAY before anything else.
@MrReclusive666 Yes, similar. I am trying to create a Replicate trainer, which is way out of my league. Will update if it works :)
@mckenna So, a bit of info: this is all using musubi-tuner.
Yesterday I got my first video training set going. I started with simple BJ videos, and I have to limit them to 256x256 @ 49 frames, otherwise I get OOM.
But thanks to something I learned back in HYV1: as long as you have some matching images for these videos, the quality is A LOT better than with 256x256 clips alone, because the model gets a high-res (768x768) reinforcement of that scene.
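One way to see why a 768x768 still fits alongside 256x256 clips is to compare latent sizes. This sketch assumes a VAE with 16x spatial and 4x temporal compression and causal first-frame handling (so T frames become (T - 1) / 4 + 1 latent frames); those factors are my assumption, and the counts are latent positions, not DiT tokens, since patchification is not modeled.

```python
# ASSUMPTIONS: 16x spatial / 4x temporal VAE compression, causal first frame.
SPATIAL = 16
TEMPORAL = 4


def latent_shape(width: int, height: int, frames: int) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) under the assumptions above."""
    if width % SPATIAL or height % SPATIAL:
        raise ValueError("width and height should be multiples of the spatial factor")
    if (frames - 1) % TEMPORAL:
        raise ValueError("frame count should be 4n + 1")
    return ((frames - 1) // TEMPORAL + 1, height // SPATIAL, width // SPATIAL)


def latent_positions(shape: tuple[int, int, int]) -> int:
    t, h, w = shape
    return t * h * w


clip = latent_shape(256, 256, 49)
still = latent_shape(768, 768, 1)
print(clip, latent_positions(clip))    # (13, 16, 16) 3328
print(still, latent_positions(still))  # (1, 48, 48) 2304
```

Under these assumptions the still costs fewer latent positions than one 49-frame clip while carrying 9x the spatial detail per frame, which is one plausible reason the pairing works within 24 GB.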
So: 16 clips plus 16 high-res stills of those clips, 2 hours of training on a 4090. It does reaaaally well; looking at it, I could have let it train for another hour. Those 2 hours were a total of 100 epochs (3200 steps).
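The 3200-step figure is consistent with batch size 1: 16 clips plus 16 stills is 32 samples per epoch, times 100 epochs. A small sketch of the bookkeeping, assuming batch size 1, no gradient accumulation, and each dataset batched separately (assumptions, not confirmed musubi-tuner internals); it also covers the 36-clip set with a repeat of 3 described in the next comment:

```python
import math
from dataclasses import dataclass


@dataclass
class DatasetSpec:
    """One dataset in the training config (counts only; paths omitted)."""
    name: str
    items: int
    num_repeats: int = 1


def steps_per_epoch(datasets: list[DatasetSpec], batch_size: int = 1) -> int:
    # ASSUMPTION: each dataset is batched on its own, no gradient accumulation.
    return sum(math.ceil(d.items * d.num_repeats / batch_size) for d in datasets)


first_run = [DatasetSpec("clips", 16), DatasetSpec("stills", 16)]
second_set = [DatasetSpec("clips", 36, num_repeats=3), DatasetSpec("stills", 108)]

print(steps_per_epoch(first_run), steps_per_epoch(first_run) * 100)  # 32 3200
print(steps_per_epoch(second_set))  # 216
```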
Today I spent some time (probably way too much time) building a more generalized set that I hope to use as a base for all my future training. The idea is that this set has the basic understanding HYV1 had originally (HYV1 could do a lot even without LoRAs).
That's my goal with this set.
So it's a selection of clips of standard adult content, 36 in total, with 3 high-res reinforcements pulled from each clip: 36 videos and 108 images (split across separate sets, otherwise OOM).
I have the clips running on a repeat of 3 so the images don't overfit waaay before the motion is trained.
The biggest bottleneck for me at the moment is a rank/dim of 8. I might be able to push 16, but 32 OOMed on videos no matter what I did (unless I wanted 1-second clips).
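For reference, LoRA's extra trainable parameters grow linearly with rank: each adapted linear layer gains a down-projection of shape rank x d_in and an up-projection of shape d_out x rank. The layer width and layer count below are hypothetical, chosen only to show the scaling, not HunyuanVideo 1.5's real layout; how much of that growth turns into OOM depends on the trainer's memory layout, which this sketch does not model.

```python
def lora_params(rank: int, d_in: int, d_out: int) -> int:
    """Extra trainable parameters for one LoRA-adapted linear layer.

    LoRA adds A (rank x d_in) and B (d_out x rank) next to the frozen weight.
    """
    return rank * (d_in + d_out)


# HYPOTHETICAL sizes for illustration only.
HIDDEN = 2048
ADAPTED_LAYERS = 100

for rank in (8, 16, 32):
    total = ADAPTED_LAYERS * lora_params(rank, HIDDEN, HIDDEN)
    print(f"rank {rank}: {total:,} trainable params")
```

With these made-up sizes this prints 3,276,800, 6,553,600, and 13,107,200 parameters: each doubling of rank doubles the adapter.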
Musubi is nice; it's smart with clip training and only loads one clip at a time, so as long as one clip fits, they all fit.
One of the things I did to get more frames into training was take a 3-second 24 fps clip, drop it to 16 fps, then save it again as 24 fps. That shrinks 3 seconds into 2. Yes, it makes a fast clip, but that's part of why last night was just a test, and it did exactly what I wanted it to do.
I can generate at 16 fps now and it's not all in weird slow motion, because 24 fps on these clips is technically fast-forward.
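The retiming trick above is simple arithmetic: resampling 3 s to 16 fps keeps 48 frames, and relabeling those frames as 24 fps plays them in 2 s, a 1.5x speed-up. A minimal sketch (the function name is illustrative):

```python
def retime(duration_s: float, resample_fps: float, playback_fps: float):
    """Resample a clip to resample_fps (dropping frames), then relabel it at playback_fps.

    Returns (frame_count, new_duration_s, speed_factor).
    """
    frames = round(duration_s * resample_fps)  # frames kept after resampling
    new_duration = frames / playback_fps       # same frames, played back faster
    return frames, new_duration, duration_s / new_duration


print(retime(3.0, 16, 24))  # (48, 2.0, 1.5)
print(48 / 16)              # 3.0
```

Playing the same 48 frames at 16 fps restores the original 3-second duration, which is why generating at 16 fps no longer looks like slow motion.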
Once I see how this current training comes out, I'll probably start dropping some motion-trained LoRAs for 1.5.
@MrReclusive666 Nice work, bro. I am focused on trying to make an on-demand trainer for picture styles that can be run on RunPod. Replicate did not work out in the end, so the idea is to run the musubi trainer serverless on RunPod.
@mckenna Still working on a character training setup and figuring out the best settings for that, but I've nailed styles.
Trying to make a Jinx LoRA. Haven't nailed Jinx yet, but nailed the Arcane style, lol.