It's Schnell Simulacrum v1512 merged in a very specific way with Hunyuan.
Works with many Flux loras AND Hunyuan loras.
I know for certain that this is a confusingly derived model, and inference behaves differently in my ComfyUI. I cannot be certain as to why, and as of right now I do not understand WHY this is happening.
https://civarchive.com/articles/11578/accidental-discovery-flux-loras-work-on-hunyuan
There is LITERALLY NO PRECEDENT TO THIS. So if you're expecting some sort of miracle or magical cure-all to merge Flux and Hunyuan this isn't going to happen.
However, there ARE multiple loras that have very obvious effects on the outcome; especially those that train the CLIP_L.
I repeat, this is a BRAND NEW discovery that I made, and I almost never get attention like this from model releases. In this case I don't have all the answers.
Just know that I'm looking, and trying to FIND those answers for you all. That's the best I can do for now.
Consult the article for the ComfyUI flow chart that I used to merge the things. The outcome is essentially the same.
Combine the merged Hunyuan with OTHER HUNYUAN LORAS to steer the system toward more specific outcomes; since this is highly unpredictable and unprecedented currently.
All the models are compacted.
To preface... I have no idea how to help you use this. Schnell Sim v1512 was trained heavily on plain English mixed with booru tagging and has full burned pathways for anime, 3d, and realistic. It supports negative prompting and runs optimally between cfg 3.5 and 6; but this isn't Flux, it has different rules, and yet the model works. I'll be working towards a reason in the coming weeks as to why, but as of right this minute I have no idea why.
I'll need to analyze the call chain and the block loading to see if Comfy's guidance system is somehow introducing T5 into the mix... I actually have about 50 things I need to check.
/////
Good luck...
/////
I STRONGLY advise the use of loras until I actually train this thing properly.
Prompt it mixed with actions and Schnell Simulacrum tagging.
Contents:
Hunyuan base model BF16
I chose the BF16 because it's the most responsive to the bf16 Schnell Sim.
In its current state it's forced to be bf16 until I release a proper checkpoint loader node with quantization attached to each of the subsets.
Simulacrum Schnell v1512 BF16 LINK
There is a full series of articles and notes on this model; which includes tagging, structures, prompt use, and careful prompt planning.
CLIP_24_L_OMEGA BF16
This is trained with over 38 million samples at this point. Try not to stub your toe, it can probably identify the bandaid you're looking for.
LLAMA FP8
This is more conveniently small and moderately fast.
It shouldn't impact performance too much.
Hunyuan BF16 VAE
This is the only one that seems to compact correctly and actually yield the correct values from the system.
Merging Help:
ComfyUI:
Checkpoint Save Node -> connect all the endpoints of the chain here.
Lora Loader:
****
We are using this to load the CLIP from the Flux model.
If the Flux model does not have a CLIP it will likely not work; but it may.
****
The standard Lora loader needs to be used to merge the clip correctly with the CLIP_L omega 24. Hook both the model and the clip into it, but only connect the OUTPUT from the clip onward; leave the model output unconnected.
Strength:
Model: 0
Clip: 1.0
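If it helps to see what those numbers mean: strength 0 on the model leaves the diffusion weights untouched, while strength 1.0 on the clip applies the full LoRA delta to the CLIP weights. A minimal sketch of that rule (this is NOT ComfyUI's actual code; the scalar "weights" are hypothetical stand-ins for real tensors):

```python
# Illustration only: real weights are tensors, these are stand-in scalars.
def scaled_delta(weight, delta, strength):
    # The generic LoRA merge rule: W' = W + strength * delta
    return weight + strength * delta

model_w, clip_w = 10.0, 10.0  # hypothetical base weights
delta = 2.0                   # hypothetical LoRA delta

merged_model = scaled_delta(model_w, delta, 0.0)  # strength 0 -> unchanged
merged_clip = scaled_delta(clip_w, delta, 1.0)    # strength 1.0 -> full delta
print(merged_model, merged_clip)  # 10.0 12.0
```

So with 0 / 1.0, the lora only ever touches the CLIP chain, which is exactly why we don't hook the model output from this loader.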
Hunyuan Lora Loader:
****
We ARE using the models from these now.
****
This loads loras by single blocks or double blocks. Flux loras should be loaded by SINGLE BLOCK, at less than 1.0; 1.0 will superimpose TOO MUCH POWER for most loras, and the majority of loras are literally BURNT TO A CRISP so they cannot be used. Lucky for us, Sim Schnell is like a perfectly cooked pot roast and is not burnt at all. It's seared to perfection on low heat over a period of weeks.
Strength:
Double Blocks: 0.2
Single Blocks: 0.8
Run both chains to the Checkpoint Save;
Model OUT from Hunyuan Lora Model chain
Clip OUT from Flux Lora Loader chain
VAE OUT from the VAE Load node
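For anyone wondering what the single/double block strengths above are doing mathematically, here is a rough sketch of block-wise LoRA merging: each low-rank pair is expanded (`up @ down`) and added to the matching base weight, scaled by a per-block-type strength. This is NOT ComfyUI's implementation; the key names and shapes are made up for illustration, and numpy stands in for torch tensors:

```python
import numpy as np

def apply_lora(weights, lora, single_strength=0.8, double_strength=0.2):
    """Merge LoRA deltas into base weights, scaled per block type.

    weights: key -> base weight matrix
    lora:    key -> (down, up) low-rank factor pair
    The key names here are hypothetical, not real Hunyuan/Flux state-dict keys.
    """
    merged = dict(weights)
    for key, (down, up) in lora.items():
        if key not in merged:
            continue  # skip keys the base model doesn't have
        s = single_strength if "single_blocks" in key else double_strength
        merged[key] = merged[key] + s * (up @ down)  # W' = W + s * (up @ down)
    return merged

# Toy example: one "single block" weight of shape (8, 8), rank-2 LoRA.
base = {"single_blocks.0.linear.weight": np.zeros((8, 8))}
lora = {"single_blocks.0.linear.weight": (np.ones((2, 8)), np.ones((8, 2)))}
out = apply_lora(base, lora)
# up @ down is an 8x8 matrix of 2s; at strength 0.8 every entry becomes 1.6
print(out["single_blocks.0.linear.weight"][0, 0])
```

At 0.8 single / 0.2 double, the single-block deltas dominate, which matches the note above about most loras being too hot to load at full strength.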
It will take more than 90 gigs of RAM if set to CPU mode TO COMPACT THE MODEL, so bear with it; it takes time to convert a model. I think it took nearly 15 minutes.
You may now use your merged model; which generates faster, and will utilize many optimizations that the system provides.
You may be able to load it in many different and unique ways, or compact your own using a similar process.
This is GROUND WORK; laying foundation for the potential to build something truly great.
My version of Schnell is protected under a modified Apache 2.0 with the stipulation that small businesses, companies, and research firms may profit from this; but larger corporations and larger for-profit research groups cannot directly profit from this without monetary compensation.
For Hunyuan's case, I yield the license to Hunyuan for the merge as this isn't my base model. All rights are reserved by the original model creators and there will be no fight if a cease and desist is issued.
I will make a full apache protected finetune eventually, just not today my friends. Not today.
I am an independent researcher stationed in the USA and will respect the licensing and rights of the model owners based on local USA law and the implications therein.
Description
This isn't... supposed to work.
Just load it in the main Checkpoint loader in ComfyUI and pray you have the sanity to use it.
Comments (60)
k so what it do tho
Many things that it shouldn't do.
@AbstractPhila How much vram? Options for poor people? I am cursed with 12gb, pls I do anything.
@loveaiv I'll work something out. For now this is just the compacted one because it's convenient.
ok be more vague then.
@crombobular yea how stupid is this post so far :D
@crombobular
I released another model immediately before this one named nearly the same thing; and that model is the basis of this one.
What exactly is the disconnect here? There's a full tag guide there.
32GB :(
It's a complicated piece of equipment. If I don't ship it like this, I'm almost guaranteeing the majority of you won't use it correctly.
The listed download sizes never reflect the real values. 5 on the clip, 15 on the unet, 7 on the workflow, etc.
I'll release the broken up one later, but this should be the primary showcase.
I think he meant 32GB
DO WHUT NOW
Yes. I broke all the barriers.
@AbstractPhila Then you need to do a better write-up, because I'm not sure anyone even understands what this does!
UPD: You have to load this model with the "Load Checkpoint" node, just like you do with SDXL or SD1.5 models. CLIP and VAE are built in. What's interesting is that it takes less time to render a 288x384x33 video than with regular Q5 Hunyuan.
but this doesn't make much sense; you say in another comment that you go OOM on your setup, so how did you render videos if you go OOM?
@NoArtifact Not sure about @DigitalGarbage but on my 4090 plus 64GB I get one render followed by a crash on the next attempt. So it renders, then goes OOM. I'm getting more luck by being ruthless with the memory management, still pretty flaky though. EDIT: Got it working as an FP8 model, not fully tested it for quality, I have no idea how to prompt it and I'm too tired to work on it right now.
@iamtherealian001 I'll just pass then, got a 3090 too but only 32 GB RAM, unless I see videos popping up that are clearly better than the ones made on Hunyuan base
@iamtherealian001 How long does the render take? I have a similar setup
I have instant crash with checkpoint node.
@mikebobby681369 I pulled to the newest version of Comfy.
@NoArtifact it's going OOM on the second pass with the upscaler, or if I set a high resolution and/or too many frames for the first pass. I am comparing it with the Q5 versions of both llama and hunyuan, using the fastvideo lora; I could get even more than 200 frames @ 1280x720, but it takes a lot of time.
This one instead won't allow you to generate any clips longer than 1 second at good quality, but it is really decent, I should say; in overall comparison, results with this merge are more stable in terms of composition, movement, and overall character appearance.
@DigitalGarbage If you do a manual merge, try setting the double blocks to 0 and single blocks to 1.5, you'll see a tremendous difference. This one is merged at 0.8 single 0.2 double I think.
I think the 0.2 double blocks impacted the overall interpolation too much.
@Okures Sorry for the delay, I had a bit of a disaster trying to fix the cooling on my 5800X, long story short I'm now back to my 6700K which seems to be working a lot better for some reason.
To answer your question: the hassle wasn't worth it with the raw downloaded model. I could get 105 frames at 384x288, 16 steps, in about 80 seconds; however, that does not include the 5-minute restarts in between.
What I've done instead is save out the model without the clip and VAE at BF16. I'm able to use that with the OMEGA 24 CLIP-L that he recommends, plus the quantized llama-Q4_K_M and the standard BF16 VAE. That fits in 20.5 GB VRAM plus 30 GB RAM and renders in 75 seconds.
I've used Tensor Cutter to cut it down to FP8. That seems to make no performance difference but gives a different output.
For all of this I've been using the Fast Lora at 0.7 strength.
After messing around with it for a bit, I believe this is a good demonstration that we could get some interesting model merges but, for me, the most impressive thing about it is the OMEGA 24 CLIP.
@iamtherealian001 really appreciate the detailed reply. No worries mate.
what's the minimum vram needed to run this beast?
I suppose it's 24GB. Can't imagine this running with lower VRAM, it goes OOM even on my setup (3090 + 64GB RAM)
@DigitalGarbage save it FP8 then, probably half the size, no?
All of it probably x3
@DigitalGarbage It's 32gb I'm about to try it on an a6000 48gb on Vast I'll update with results
Set your ComfyUI's vram reservation to around 2 gigs reserved for system and just run it. Run the rest using vram dumping nodes.
There are some nodes that quantize at runtime; find some of those if you need them.
I ran it on a 12 GB 4070. Runs fine at resolutions up to 640*480.
any chance to prune it below 24 GB?
You mention we can use Flux and Hunyuan Lora?
I tried some Flux lora and it wasn't picking it up. I tried a Character lora and it was not the person at all. I tried a style lora and it doesn't pick it up.
I've loaded them into a Hunyuan Video LoRA Loader, LoraLoaderModelOnly and Load LoRA node just in case and nope.
This thing is extremely heavy, so I couldn't upscale. I stayed at a maximum setting of 512x768 at 45 frames, 24fps, and even reduced to 512x512 for faster results.
RAM was pegged at 63/64GB and VRAM at 23/24GB.
yeah I would also like more details about this merging theory and the compatibility between hunyuan and flux. Never heard about this before, sounds like a dream... where does all this come from?
thank you
@LatentDream It works my friends. It seems to have little response to Flux1D loras; which isn't too shocking.
I trained Schnell entirely fixated on timesteps, language comprehension, and booru tagging association. I fed nearly the same database into SDXL and sdxl didn't have the intelligence to grasp it.
The T5 and Flux however, seems to have done just that. It caught it, fixated, and then rationalized it.
I THINK, the reason this works, is because each lora I trained has the CLIP_L trained alongside it in a low-heat fashion, and the loras themselves are trained specifically on Schnell; which is the heavily timestep-distilled variation of flux.
Essentially, this implies a few potentials to me.
1. Flux is inherently still a mystery, and this only adds to the mixing pot.
2. Timesteps, Shift, and interpolative math are cross-learned and the behaviors are expansive beyond the layers that superimpose those behaviors.
3. These blocks are divergent in a way that manifests outcome; which is astonishing to me.
This tells me Shuttle will likely work, if someone extracts a shuttle lora from the Shuttle 1.0 model.
Shuttle looks better than Simulacrum, and probably trained for a lot longer on higher quality data; but it doesn't have the comprehensive capability due to my absolute fixation on context over quality.
@AbstractPhila You're right, I did use a Flux Dev lora. I'll go back and try with a Flux Schnell one. There aren't many compared to dev, but I hope this works.
@AbstractPhila Getting no difference with Schnell loras. Can you post an example with a lora that worked for you or is it just in theory that this model should work?
@Catz Are you loading the loras using the Single Block and Double Block method as I described?
You must load the Flux lora in multiple hunyuan lora loaders, and then load the clip from said lora to merge with the parent clips from the model.
This isn't a simple process, which is why I released the compacted version. You need 3 loaders to load one lora.
@AbstractPhila why not share your workflow?
@thefoodmage I just made one, I'll share it.
@AbstractPhila The workflow is flowing, but not working as advertised for me. I have not managed to use a FLUX Lora to output what it is supposed to, although a couple times it would result in static noise so it is doing something with the Lora.
It was especially disappointing to see the model only kinda willing to make 40k space marines with FLUX loras as aids, which the bare Hunyuan models I have used so far were already moderately good at. The statement "If it doesn't work, try with a much much higher strength." could be expanded upon, such as what the single to double block ratio or the clip strength should be. The idea is very enticing, but getting there is harder than expected (although the warning was issued). Might this have something to do with me using a 7900 XTX? I do not think so, but mentioning it for the record.
The model is still fun to use so maybe it will be my go-to regardless, but damn, still want it to work right.
@SDuser666 Sorry, not much training with marines in SimSchnellv1512. I'll include more cool shit like this next time.
It didn't work for my loras even with the workflow but I noticed that flux and hunyuan have things in common regarding lora training
I don't have time to review the entire gradient tape, but I did notice that my particular setup for training Flux Schnell is quite similar in its core elements to the setup used to train Hunyuan.
It's possible my Schnell is an anomaly based entirely on the methodology, captions, and planning I used; and this sort of outcome is a very rare outcome; but I don't think so.
There's a lot of potential for research exploration but I don't have time today.
I'm confused. Where do I use the 600mb simulacrum file in the workflow? is it another lora?
It's already merged...? It says right there in the description.
Ah, sorry. Overlooked that.
@dchan23732 It's okay. I know how complicated this stuff is. Good luck on your journey.
I used this lora to get an image of a woman, alas, the result was a dark-skinned African woman.
Not everything seems to be working.
https://civitai.com/models/975857/ozge-ozacar-turkish-actress?modelVersionId=1092815
I have... a plan to fix it. It's a 5 stage distillation interpolation, but it needs... a catalyst. I don't have a catalyst picked out yet. There are 12 potential interpolation candidates for this process and so far I'm not liking any of the ones I've found.
It should be trainable on an a100 though... Maybe on the 5090.
In any case, the outcome should be a more yielding model.
The opposite might be true, where I'm just creating a more rigid Hunyuan. So it's hard to say if it'll work right now.
@AbstractPhila Maybe you should announce a fundraiser among civitai users? Even 10 dollars from, say, 100 people is enough for something :) Or maybe you should write to the Hunyuan development team? If it were possible to combine hunyuan with flux loras, it would be a jump into space; they would be ahead of many competitors. In any case, I will be looking forward to some kind of result. Good luck.
@mrsanders1313840 It's really not going to be very expensive to train. I think I figured out how, actually.
It's not as complicated as I thought it might be.
The BULK of information about humans and avatar control is in the VAE, rather than the UNET.
I hope everything works out for you and the flux loras will work. By the way, the Chinese are already tuning hunyuan: https://github.com/SkyworkAI/SkyReels-V1
@mrsanders1313840 Hold the phone, that skyreels thing has native I2V?! How good is it?
@firemanbrakeneck Probably better than Hunyuan. The Hunyuan paper says they basically focused on avatar faces and upper bodies.
@firemanbrakeneck it was just posted. We'll keep watching. https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy
@AbstractPhila As always, Kijai is the best :) https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy We'll keep watching.
Will hunyuan loras work (perhaps badly, like XL-pony), or was the structure changed significantly?
Edit: No errors, but poor results.
Based on my (banging rocks together) testing this checkpoint is very impressive and very frustrating. The impressive part is that the checkpoint will almost always produce something good. The frustrating part is that "something good" can be very loosely related to the prompt.
Can I get a cyborg assassin with the appearance of a 10 year old girl running across the screen towards a robot? No.
Can I get a cyborg assassin with the appearance of a 10 year old girl running across the screen? No.
Can I get a cyborg assassin running across the screen? No.
A panicked crowd? Forget it.
Characters want to cluster in the center, ages can be under 8 or over 20, they will do a light jog when asked multiple times.
And yet the consistent quality makes me want to try endlessly...
Sounds like the intentional age gap burn. Everything with certain traits isn't even supposed to exist, likely being introduced by HunYuan itself rather than Flux.