The Secret Sauce, trained on ≈5,600 images
- This model's dimension (rank) is large, so combining it with other LoRA models may not be beneficial.
Comments (62)
Will this work with the GGUF version of Hunyuan Video? I only have 8 GB of VRAM.
Hi, it should be compatible with the GGUF version of the model, but 8 GB of VRAM seems very low. I would say 16 GB is the minimum, and even 24 GB is tight for Hunyuan, unfortunately. I'd recommend trying LTXv instead; it's fast and not as demanding on VRAM.
@pgc Thanks for the reply. Yeah, 8 GB is pretty low. LTXv is also good, but I think Hunyuan is better; thanks for the suggestion, I'll look into upgrading the hardware.
Please add some description.
Does the hunyuan lora understand sex positions?
Hi, no, I didn’t train anything sexual. It’s essentially a LoRA focused on refining the "look": style, face, and upper body. You can check the Flux version images for a better understanding of what it does. The dataset used for the Hunyuan LoRA is roughly the same but much smaller, approximately 1,500 images compared to around 5,600 for the Flux version. I’m currently testing the addition of videos and motions, but that will be a separate model.
@pgc oh, I get it. I didn't know you can train a video lora on images. thanks for clarifying.
@pgc Can you explain how this is Hunyuan-specific? Is it the dimensions of the training images?
Why is the video so short? Are you running out of memory, or do you just not want to wait too long? Are you using HunyuanVideo Enhance-A-Video to further improve quality? Is SageAttention enabled?
A person on Reddit wrote that he makes 20-second videos on an RTX 3090 24 GB.
Here are his settings:
249 frames, interpolated x2 = 498 frames
= ~20 seconds of video
size: 720H x 400W
steps: 7
inference time: 53 seconds (vid2vid, 0.5 denoise)
I have a 3090 and have been experimenting with Hunyuan for two weeks now; my goal is to learn how to make infinitely long, consistent videos using hunyuan_rf_inversion or vid2vid.
The goal is to showcase the overall look, so I’m fine with 2–3-second videos as long as they have decent quality and motion. I have a 4090, and generating 249 frames at this resolution is not feasible, at least with the FP8 version of the model (the official FP8-scaled version doesn’t support LoRA yet). Not using FastHunyuan here.
Also, 20 seconds for 249 frames works out to roughly 12 FPS, which is kinda slow for the motion. I prefer short videos that look real-time rather than long, slow ones.
I use Enhance-A-Video and SageAttention, yes. But since the model isn’t restricted by size or length, there are no "best settings" to recommend; the sky (or rather, your VRAM) is the limit.
As for long video generation, I’m not interested in that at the moment. The current methods aren’t very good: using the last frames of the first video to generate the second, and so on, creates a kind of self-convolution, and the results degrade progressively over time.
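To make the frames-to-seconds arithmetic in the exchange above concrete, here is a tiny sketch. The 249-frame count and the 2x interpolation come from the quoted settings; the 24 fps playback rate is an assumption (a common output rate for Hunyuan Video clips).

```python
# Numbers from the settings quoted above; 24 fps playback is an assumption.
generated_frames = 249        # frames produced by the sampler
interpolation_factor = 2      # frame interpolation doubles the frame count
playback_fps = 24             # assumed playback rate

total_frames = generated_frames * interpolation_factor   # 498
clip_seconds = total_frames / playback_fps                # ~20.8 s, i.e. the "~20 seconds"
native_fps = generated_frames / clip_seconds              # 12, the "kinda slow" motion rate

print(f"{total_frames} frames -> {clip_seconds:.1f} s, "
      f"{native_fps:.0f} fps of generated motion")
```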
Did you train this on AI images?
About 1/5 is AI images generated with the Flux version using prompts from the original dataset;
the other 4/5 is the original dataset (but smaller), composed only of real photoshoots taken with a DSLR and a 70-200mm lens.
OK, thanks for the reply. Yeah, that first image with the grass behind the woman reminded me of faces I've seen in Flux. Interesting how that crosses over to this.
What are the activation words?
"fat cows"
Could you share the workflow for some sample videos, please?
What strengths (model + CLIP) do you use?
I mean, what's the point of posting all that but not your prompt??
The point of posting all that? I don't think you realize the amount of training and rendering time it takes to create these videos. I made about 200 at high resolution, and each one renders in about 320 seconds. That's more than 15 hours of extra computing just to give you guys some examples...
I never understood why people don't post their prompts. That's the ultimate helpfulness there.
I posted 10,000 of these prompts two months ago on the first iteration of the model, Flux 1.0. Is that enough, or do you want more?
@pgc Sorry dude, I'm not seeing the prompts. All of it shows blank. I am visually impaired so that could be it too ;)
@StanleyPain In the "training dataset" download there is a .txt file that contains 10k prompts; it's all the dataset captions merged.
@pgc Thanks I will check it out :)
@StanleyPain did you find the training dataset? where is that 😢
Boobs seem to be less round/smooth with this LoRA
Hey pgc, what native resolution did you use to generate these videos? Do you use any nodes for face restore inside ComfyUI? I'm trying to use Topaz upscale on my generations, but my results aren't even close to yours.
Hi, I rendered most of my last posts at 800x1184 without TeaCache and no additional post-processing, then upscaled using Topaz Video 3.3.3, 2x with frame generation to 60 fps.
Was this updated today? I saw it disappear and then reappear.
I misclicked and deleted the whole page instead of an unpublished updated version. I asked the moderators to restore it; fortunately, that was still possible.
@pgc Good deal. So what's new?
A description would help. What does it even do?
From an OP comment: "Hi, no, I didn’t train anything sexual. It’s essentially a LoRA focused on refining the 'look': style, face, and upper body. You can check the Flux version images for a better understanding of what it does. The dataset used for the Hunyuan LoRA is roughly the same but much smaller, approximately 1,500 images compared to around 5,600 for the Flux version. I’m currently testing the addition of videos and motions, but that will be a separate model."
He's not lying; look at the examples, mostly similar face structure. I will say it helps with backgrounds and stuff, but it seems to default to the same woman.
Also, if you check the other examples in his Flux version, with prompts, he doesn't seem to be using any specific triggers. The model's JSON doesn't say it has triggers. It may literally just be a bunch of photos with descriptions baked into a LoRA, no real overarching words. While you might expect a single unique trigger word, there doesn't have to be one.
@makiaevelio543 This is correct. The thing is, when you train on a vast number of images with various characters/poses/environments/night or day/framing etc., with long LLaVA or Florence2 captions, it's hard to clearly describe what it does and what the model retains most from training. That's why I called it "secret sauce"; it's not a gatekeeping thing.
Almost every face was different, so I don't know what could cause it to default to the same woman.
I resumed training from 15K steps (B3) to 30K steps (C2), but I haven't had time to test it and do comparisons yet.
@pgc It's a valid strategy, more or less what the models themselves use. It's just a "style LoRA". I think maybe some people are new to all this, see the manageable set of straightforward pose LoRAs, and assume this must be like them.
I think the face is just the face that is the average of thousands of faces. At least that's how I've attempted to rationalize it lol
It is a lot more work, but I think what the BigASP creator did to mitigate this problem was to build an entire workflow around prompting images. He created his own model to "properly" rate all of his images, and using that model's output, he would randomly assign only a subset of the proposed tags and descriptions to each image.
This worked twofold: you can properly separate the Instagram models from the amateur photos (by properly rating the images, which I think reduces the "face"), and you can allow simple prompts to hit many more complicated ones (by associating them when they're randomly removed from similar images). A lot of work, and that "rating model" is a big question mark.
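For illustration, here is a minimal sketch of the random caption-subset idea described above. The tag list and the thin_caption() helper are hypothetical; this is not the BigASP creator's actual pipeline, just the gist of keeping only a random part of the proposed tags so simple prompts still map onto richly described images.

```python
import random

def thin_caption(tags, rng=random):
    """Keep a random, non-empty subset of the proposed tags for one image."""
    k = rng.randint(1, len(tags))
    return ", ".join(rng.sample(tags, k))

# Hypothetical tags proposed by a captioning/rating model for a single image.
proposed = ["woman", "outdoor portrait", "golden hour", "dslr photo", "85mm lens"]
print(thin_caption(proposed))   # e.g. "golden hour, woman"
```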
The training data seems to be missing; I don't see it on the Flux versions either.
civitai.com/api/download/models/995043?type=Training Data
I removed the v1 a few days ago; it was not the full training image data, only the captions.
These are the combined captions for the B3 version:
https://drive.google.com/file/d/1cb-1H_kWnjFtSMcVPvTAPw7PrZUVN7oe/view?usp=sharing
Is this trained on AI images or real images?
For the most part real images, but there is a small portion of Flux synthetic data augmentation.
Why did you upscale your stuff with Topaz instead of posting the raw output of what your LoRA does? It misrepresents what the thing does and makes everyone feel bad about their own creations. This is the worst behavior on Civit and there is no accountability; I hate it. This isn't the model page for the Topaz upscaler.
Hi, it just looks a bit sharper; the difference is almost imperceptible. I'm not stopping you from using your own upscale solution. I do things for myself first, then I share if it can be useful for others. Whether you hate it or not, I don't give a single F, to be honest.
@pgc Hey bro, I'm interested in knowing how to raise the quality. Is it just Topaz Video, or is there another program?
@OFIA Hi, I didn't use anything other than Topaz and ComfyUI. You can use the upscale node in ComfyUI with the upscale model of your choice, and you can also use a frame-interpolation node to get a more fluid video, e.g. from 24 to 48 fps. In Topaz, frame interpolation is not limited to a multiplier value; you can set the desired output fps instead. I use 60 fps.
I ran a batch to generate all these videos overnight. I preferred using an external upscaler so I only upscaled the videos I wanted to keep; this way I didn't spend extra power on videos that weren't good enough.
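A minimal sketch of that generate-everything-then-upscale-only-the-keepers idea. The upscale_video() helper and the folder names are hypothetical placeholders for whatever external upscaler and layout you use, not the exact pipeline described above.

```python
from pathlib import Path

def upscale_video(src: Path, dst: Path) -> None:
    """Hypothetical wrapper around an external upscaler (Topaz, a ComfyUI
    upscale workflow, etc.). Replace with a call to your own tool."""
    raise NotImplementedError

raw_dir = Path("renders")        # everything generated overnight
keep_dir = raw_dir / "keep"      # manually move the clips worth keeping here
out_dir = raw_dir / "upscaled"
out_dir.mkdir(parents=True, exist_ok=True)

# Only spend upscaling time (and power) on the selected clips.
for clip in sorted(keep_dir.glob("*.mp4")):
    upscale_video(clip, out_dir / clip.name)
```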
@pgc Bro, can you please share your workflow? I want to use it; I'm new to video.
Hi, I noticed that this LoRA was trained on a significant amount of data. Which hardware did you use, and how much VRAM was required?
It was trained with diffusion-pipe. I have a 4090, but I don't remember the peak VRAM usage during training; I think 16 GB is enough, but I'm not sure. To train on videos, 512x512 px, 33 frames, rank 32 barely fits on a 24 GB GPU, so video training isn't possible with anything less than a 3090/4090.
How was the LoRA trained, with diffusion-pipe or something else? And what PC specs are necessary?
It was trained with diffusion-pipe, yes. I have a 4090, but I don't remember the peak VRAM usage during training; I think 16 GB is enough, but I'm not sure. To train on videos, 512x512 px, 33 frames, rank 32 barely fits on a 24 GB GPU, so video training isn't possible with anything less than a 3090/4090.
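Purely as an illustration, the video-training knobs mentioned above (resolution, clip length in frames, LoRA rank) collected in a plain Python dict; this is not diffusion-pipe's actual config format, and the VRAM figure just restates the comment.

```python
# Illustrative only: the settings named above that drive VRAM usage when
# training a video LoRA. Not diffusion-pipe's real config schema.
video_training_settings = {
    "resolution_px": (512, 512),  # per-frame training resolution
    "frames_per_clip": 33,        # clip length used for video training
    "lora_rank": 32,              # adapter rank ("dimension")
    "gpu_vram_gb": 24,            # reportedly barely enough at these settings
}
```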
Is there a workflow you recommend for accuracy with Hunyuan V2V please?
And what are the best settings to use for motion accuracy vs video quality? thank you!
From the examples it seems like the training data was completely AI/synthetic data. All the faces look like your typical AI face.
Is there a B2?
Mostly women, or different demographics of people?
scroll down one click
What does this do if added? It seems to do nothing.
It's a secret
it adds sauce (shhhhh)
I also have this question, since there is no freaking description or any word on wtf it does?!
Yes. Nothing wrong with giving a description so people will know.
Hi, what strength is recommended for images? Thank you.