
Hey everyone,
A while back, I posted about Chroma, my work-in-progress, open-source foundational model. I got a ton of great feedback, and I'm excited to announce that the base model training is finally complete, and the whole family of models is now ready for you to use!
A quick refresher on the promise here: these are true base models.
I haven't done any aesthetic tuning or used post-training stuff like DPO. They are raw, powerful, and designed to be the perfect, neutral starting point for you to fine-tune. We did the heavy lifting so you don't have to.
And by heavy lifting, I mean about 105,000 H100 hours of compute. All that GPU time went into packing these models with a massive data distribution, which should make fine-tuning on top of them a breeze.
As promised, everything is fully Apache 2.0 licensed—no gatekeeping.
TL;DR:
Release branch:
Chroma1-Base: This is the core 512x512 model. It's a solid, all-around foundation for pretty much any creative project. Use this one if you're planning a longer fine-tune where you train at high resolution only in the final epochs to make it converge faster.
Chroma1-HD: This is the high-res fine-tune of Chroma1-Base at 1024x1024 resolution. If you're looking to do a quick fine-tune or LoRA for high-res, this is your starting point.
Research Branch:
Chroma1-Flash: A fine-tuned version of Chroma1-Base I made to find the best way to speed up these flow-matching models. It's technically an experimental result: an attempt to train a fast model without any GAN-based training. The delta weights can be applied to any Chroma version to make it faster (just make sure to adjust the strength; see the sketch after this list).
Chroma1-Radiance [WIP]: A radically retuned version of Chroma1-Base that operates directly in pixel space, so it technically should not suffer from VAE compression artifacts.
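For anyone curious how a delta-weight merge like the Flash one works mechanically, here is a minimal sketch, assuming safetensors checkpoints with matching key names; the file names and exact key layout below are placeholders, not the actual release layout:

```python
# Minimal sketch: merged = base + strength * delta.
# File names and key layout are assumptions; adjust to the actual release.
from safetensors.torch import load_file, save_file

base = load_file("chroma1-base.safetensors")          # hypothetical file name
delta = load_file("chroma1-flash-delta.safetensors")  # hypothetical file name
strength = 1.0  # lower this when applying the delta to other Chroma versions

merged = {}
for key, tensor in base.items():
    if key in delta:
        merged[key] = tensor + strength * delta[key].to(tensor.dtype)
    else:
        merged[key] = tensor  # keys without a delta pass through unchanged

save_file(merged, "chroma1-base-flash.safetensors")
```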
Quantization options
Alternative option: FP8 Scaled Quant (format used by ComfyUI, with a possible inference speed increase)
Alternative option: GGUF Quantized (you will need to install the ComfyUI-GGUF custom node)
Special Thanks
A massive thank you to the supporters who make this project possible.
Anonymous donor whose incredible generosity funded the pretraining run and data collection. Your support has been transformative for open-source AI.
Fictional.ai for their fantastic support and for helping push the boundaries of open-source AI.
Support this project!
https://ko-fi.com/lodestonerock/
BTC address: bc1qahn97gm03csxeqs7f4avdwecahdj4mcp9dytnj
ETH address: 0x679C0C419E949d8f3515a255cE675A1c4D92A3d7
my discord: discord.gg/SQVcWVbqKx
Comments
You TL;DR'd and it doesn't work.
You need the FluxMod nodes for ComfyUI and the example workflow. The model is not a regular Flux.1 model, it's been kajiggered.
It would be really cool if you could talk to Civitai about getting your new model supported on their systems. I think it would get a lot more attention and in turn raise more money. A rolling release similar to NoobAI could be a good idea. I think this is a really cool project.
Getting really good results.
I wonder if I can get ai-toolkit to fine-tune LoRAs on this
LoRA training is coming soon.
might need to ask ostris for that
@Lodestone i'm open to doing it via other methods too. I will wait for the GitHub LoRA training code to be finished.
How do I get this to work on Forge OP?
Forge will need to add support. As of this post, only ComfyUI is supported.
I applaud the Open Sourced initiative.
Can you add a training config for the Kohya SS GUI for training LoRAs on this model?
no, kohya is not supported yet.
i have lora trainer code here but there are no instructions on how to use it atm, it's still WIP
https://github.com/lodestone-rock/flow
@Lodestone Understood. Let me know when it is. I am eager to start training.
@keirproductions173 If you have multiple GPUs, Lode's flow trainer is quite performant. It's pretty straightforward to convert a Kohya dataset to the flow jsonl format. The one gotcha is that repeats need to be expressed within the jsonl as duplicate lines, but that too is pretty easy to script (see the sketch below). There are several of us in the Discord server who have been testing it.
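For reference, a hypothetical sketch of that conversion, assuming Kohya's usual "<repeats>_<concept>" folder layout with .txt caption files next to each image; the jsonl field names ("image_path", "caption") are guesses, so check the flow repo for the real schema:

```python
# Hypothetical Kohya-folder -> flow-jsonl converter. Field names are
# assumptions; repeats become duplicate jsonl lines, as described above.
import json
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def convert(kohya_root: str, out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as out:
        for folder in sorted(Path(kohya_root).iterdir()):
            if not folder.is_dir():
                continue
            # Kohya encodes repeats as a numeric folder prefix, e.g. "10_mychar".
            prefix = folder.name.split("_", 1)[0]
            repeats = int(prefix) if prefix.isdigit() else 1
            for img in sorted(folder.iterdir()):
                if img.suffix.lower() not in IMAGE_EXTS:
                    continue
                caption_file = img.with_suffix(".txt")
                caption = caption_file.read_text(encoding="utf-8").strip() if caption_file.exists() else ""
                line = json.dumps({"image_path": str(img), "caption": caption})
                out.write((line + "\n") * repeats)  # duplicate lines = repeats

convert("./kohya_dataset", "./dataset.jsonl")
```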
Amazing model.
Should be everyone's default model. period.
Is there an in-depth prompting guide? Can't imagine that there aren't any control words or something
since this model is trained using VLM captions it's recommended to use LLMs to extend your prompt
i also put higher weight towards tags on v.12 so you can prompt it with tags too
@Lodestone does it support e621 tags in v.12?
Anyway, mixing tags and regular text may cause issues, because a tag can mean something different from the everyday word. For example, 'balls' as a tag does not mean "many spherical objects".
Does the t5xxl model support a tag-source prefix? For example, something like this:
"Tags e621: solo, anthro..."
Then we could use tags from different datasets. The same tag can have different meanings on danbooru and e621, and a prefix would also prevent tag words from being activated accidentally.
@ArcticFoxWithMonocle booru tags and e6 are supported but there's no prefix to indicate which tags belong to which
Using LLMs to fluff up a prompt also worked well with stock Flux. If you have the compute, shoving one into the workflow is a thing that can be done.
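As a rough sketch of what that can look like, assuming a local OpenAI-compatible server (e.g. Ollama) on its default port; the endpoint, model name, and system instruction are all placeholders:

```python
# Hypothetical prompt-expansion helper against an OpenAI-compatible API.
import json
import urllib.request

def expand_prompt(short_prompt: str) -> str:
    payload = {
        "model": "local-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": "Rewrite the user's image prompt as one detailed, unambiguous paragraph. Output only the rewritten prompt."},
            {"role": "user", "content": short_prompt},
        ],
    }
    req = urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",  # assumed local endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(expand_prompt("a photograph of a woman reading in a dim cafe"))
```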
An extremely promising model ruined by the fact that the more trained it gets, the more often it generates anime images (even when specifically prompted heavily for realistic + negative prompt against anime). Sad
hey, can you provide some prompt examples? or some statistics on this issue?
balancing the dataset is not easy and i need the feedback to properly tune the dataset weighting.
rn the effort is to learn NSFW stuff as fast as possible but i cannot guarantee the distribution is balanced.
@Lodestone "A very beautiful 30 years old woman sitting on a bench" gives me an anime girl, while what I really want is a mature 30-year-old in the flesh.
@MOVZX try to prepend "A photograph" or something that indicates a photo. this model responds well to proper, unambiguous prompting
I've noticed that it's extremely sensitive to some words. For example just the word "detailed" in my prompt, which otherwise described realistic appearances ("a real life photo of..."), caused it to generate a flat color anime drawing (a good one, but still unexpected). Removing it generated a perfectly realistic image.
@Lodestone A photograph of a stunning 35-year-old woman with flowing, golden blond hair, captured in natural light as she relaxes on a plush sofa within a warmly decorated living room. Her posture is relaxed yet elegant. She is wearing a soft, cream-colored knitted sweater over a silk blouse, paired with tailored black pants that accentuate her figure. The lighting highlights the contours of her medium-sized breasts subtly, adding to the realistic charm of the scene.
Still produces non-human characters, like an anime doll :(
Also, the image style may depend on resolution and CFG value.
I only recently started experimenting with this checkpoint, and I've sometimes experienced what's mentioned here. The model is very prompt sensitive, which is a good thing. As @Lodestone mentioned, adding "A photograph" helped but did not fully resolve the issue for some of my prompts. Combining "A photograph of" at the beginning of the prompt with "The background is a" followed by a description of the scene at the end really helped steer the image towards a realistic setting. Not sure if that helps the fine-tuning, but possibly the more illustration-like samples used for training did not include backgrounds, while the photo-realistic ones included scene descriptions?
amazing. ty, but how do I make this work for lowly 12gb humans?
ComfyUI can usually run models that don't fit on your video card.
For example, I was using this model on 6 GB of VRAM by running ComfyUI with the `--lowvram` option.
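For reference, assuming the standard ComfyUI entry point, that just means launching it as:

```
python main.py --lowvram
```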
I also have 12 GB VRAM, Q8 works fine for me: https://huggingface.co/silveroxides/Chroma-GGUF/tree/main/chroma-unlocked-v12
It's not so obvious, but to run GGUF models you should use the Load Chroma Diffusion Model and Padding Removal custom nodes from ComfyUI_FluxMod, with the T5 clip only. Hope it helps someone.
How to use First Block Cache (from wavespeed for example) with this model?
Simply adding the caching node causes an error:
ComfyUI_FluxMod/flux_mod/layers.py", line 103, in forward
    x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
AttributeError: 'list' object has no attribute 'scale'
it's incompatible with wavespeed atm, this model has had a major architectural overhaul. long story short, this is not a FLUX model anymore, it's a different model altogether.
@Lodestone I read about that on the Hugging Face repo, but I was hoping this cache could be ported or somehow adapted for this model.
@ArcticFoxWM It's not too hard to implement a First Block Cache node for Chroma. Someone just needs to do it. ChatGPT o1 pro / Claude 3.7 may be able to manage it if given the wavespeed fbcache comfy node plus info on Chroma's arch changes. A rough sketch of the idea is below.
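To make the idea concrete: run only the first block each step, and if its output barely changed since the previous step, replay the cached residual of the remaining blocks instead of running them. This is a hypothetical sketch, not Chroma's real module API:

```python
# Hypothetical First-Block-Cache sketch; `blocks` stands in for Chroma's
# transformer blocks, whose real signatures take extra args (modulation, etc.).
import torch

class FirstBlockCache:
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.prev_first = None   # first-block output from the previous step
        self.cached_rest = None  # residual contributed by the remaining blocks

    def forward(self, x: torch.Tensor, blocks) -> torch.Tensor:
        first = blocks[0](x)
        if self.prev_first is not None:
            # Relative L1 change of the first block's output between steps.
            change = (first - self.prev_first).abs().mean() / self.prev_first.abs().mean()
            if change < self.threshold:
                self.prev_first = first
                return first + self.cached_rest  # skip the remaining blocks
        out = first
        for block in blocks[1:]:
            out = block(out)
        self.prev_first = first
        self.cached_rest = out - first  # residual to replay on skipped steps
        return out
```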
ping me when forge adds support, i'm an absolute noob in terms of doing something complex like adding support myself (someone mentioned in the comments that it's possible)
You should open an issue, since i don't see any issue for it: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues?q=is%3Aissue%20state%3Aopen%20chroma
Somehow nobody else has made that feature request over there, so I did just now.
https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/2744
@WithoutOrdinary This patch is not working. Executing the command git apply forge.patch returns an error: "error: corrupt patch at line 110". Something was left out somewhere
@mrsanders1313840 I cannot help with the patch, I do not use Forge. I do believe you need to be on a specific version though.
@mrsanders1313840 The patch was fixed, works on my setup!
Is the lora training code on github finished ?
yes, but there are no instructions on how to use it yet
are you not updating the model on here? there's already a v14 on huggingface but this is v11 here
Uploading to Civitai is a pain in the ass and slow. It usually takes 2-3 failed uploads before the site works long enough to successfully push a new version. I'll bug Lode to push a more recent checkpoint.
Prompt adherence and visual consistency is otherwise quite good, but no matter what I put in the positive or negative prompt, I can't get v14 to generate a dark or nighttime picture. Every subject has a spotlight on them from off camera 24/7.
v15 just released on huggingface, have you tried that yet?
@crombobular Downloaded and tried, but same issue. Based on the effect from tweaking the prompt, feels like the tagging of the training data may have led the model to conflate dark/dimly-lit rooms with camera flash at some point.
@crombobular v16 is better at low light, but continues to be unable to do a completely dark scene.
@vbt92 yeah, saw. i guess it's going to take a few more epochs
The model is still training; some flaws in prompt following are expected and fairly normal at this point.
@Yulexuan I continue to see improvements in low light in v19 across the same prompt, seed and sampling criteria, but when prompted for a dark scene, it still makes it look like someone turned their flash on, which continues to suggest to me a gap in the training data.
@Yulexuan Just checked v26. The model seems much more baked now, but a scene described as completely dark still suffers the same spotlight treatment. Not sure if the training set can be changed midstream, but this seems to be an issue that's only compounding with each epoch.
Chroma is said to be uncensored, yet I don't see any NSFW images here. Would anyone mind posting some? Thanks.
I can't figure out why, but it just doesn't show my posts; only one made it through for some reason. I guess it has to do with using the newer checkpoints from huggingface, but even when I strip that data and manually choose the checkpoint here, they just fail to get linked to this page, so I don't bother anymore. Anyway, yeah, it's definitely not censored, but some anatomy bits are a bit off in the first pass/low res; inpainting them with low denoise seems to do the trick more often than not. Newer checkpoints get polished more and more, too.
@ailu91 Thanks. I'm seeing the images on your profile. Looks even less sophisticated than the original Flux. I guess I will still stay with my own SDXL checkpoint for quite a while. Non-SDXL checkpoints come nowhere close in terms of NSFW.
@SubtleShader Chroma does make it much easier to compose an image; it follows prompts surprisingly well as long as the content was trained. But yeah, realistic checkpoints from Pony or Illustrious surpass it currently. Also, it doesn't seem to do well with LoRAs, barely any effect.
@ailu91 If you check my style, you will see that neither Flux nor Chroma can currently do that in terms of posing, perspective and anatomical creativity. Pony & Illustrious can, but they are not photorealistic enough for me even after img2img.
@SubtleShader I guess it boils down to the specific datasets. Eventually models that try to be too broad just cannot match the more focused ones.
There is potential here though; I've abandoned any other Flux models for now
@ailu91 same, chroma is so much better than using flux + nude lora. it's not great at lighting yet and still has some quirks but it's also only at 16/50 epochs (check their huggingface for the latest).
@SubtleShader let's give it some time, they are just ~16 epochs in. Juggernaut guys have plans to release NSFW-capable checkpoint as well
THANX! Have some questions:
1) LoRAs, ControlNets, does all this not work?
2) Please make fp8 and q4/q5 versions
LoRAs work, but not always correctly, because it's a schnell-based model; LoRAs for actresses do work. You can download the GGUF version here https://huggingface.co/Clybius/Chroma-GGUF/tree/main
Less outdated GGUF models can be found here https://huggingface.co/silveroxides/Chroma-GGUF/tree/main
Will someone make a workflow for low VRAM machines?
The workflow on the HuggingFace repo works with the GGUF versions here (I use 4_0 with 8GB VRAM): https://huggingface.co/silveroxides/Chroma-GGUF