I quantized the latest versions of Lodestone's Chroma for use on lower-VRAM machines. Enjoy. For the full versions, see https://huggingface.co/lodestones/Chroma/tree/main
For me it works best with Euler/Beta or DPM++ 2M/SGM Uniform at 16-20 steps, or Restart/SGM Uniform at 6-10 steps. Use CFG 2 in Forge, or 2.7 in Comfy with res_2s/bong_tangent. Download and use the Hyper_LowStep (strength 1.0) or Chroma2Schnell (strength 0.125) LoRA from https://huggingface.co/silveroxides/Chroma-LoRA-Experiments/tree/main for fewer steps and faster generation times.
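A quick sketch that just collects the recommended settings above as plain data, so they are easy to drop into whichever UI you use. The strings are display names, not any backend's internal identifiers, and the step ranges and strengths are the ones recommended above.

```python
# Recommended Chroma Q4_K_S settings from the description above, gathered as data.
# This is not a Forge/ComfyUI API call; pick the matching options in your own UI.
SAMPLER_PRESETS = [
    {"sampler": "Euler",    "scheduler": "Beta",        "steps": (16, 20)},
    {"sampler": "DPM++ 2M", "scheduler": "SGM Uniform", "steps": (16, 20)},
    {"sampler": "Restart",  "scheduler": "SGM Uniform", "steps": (6, 10)},
]

CFG = {
    "Forge": 2.0,
    "ComfyUI": 2.7,  # used with the res_2s sampler / bong_tangent scheduler
}

# Low-step LoRAs from silveroxides/Chroma-LoRA-Experiments and their suggested strengths.
LOW_STEP_LORAS = {"Hyper_LowStep": 1.0, "Chroma2Schnell": 0.125}

for preset in SAMPLER_PRESETS:
    low, high = preset["steps"]
    print(f'{preset["sampler"]}/{preset["scheduler"]}: {low}-{high} steps')
```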
Description
The latest Chroma, unlocked and detail-calibrated, quantized to Q4_K_S. Excels at lighting, human/animal anatomy, art styles, and image clarity without needing LoRAs. To use it, update Forge to the latest version, or update Comfy and use an appropriate Chroma workflow.
Comments
Why is it marked with model type "Other" if it is based on Flux.1 Schnell?
Civitai has it marked as "Other", so that's a question for them, I suppose.
milt68 Wait, when I create a new model I set the model type myself. There is a dropdown with SD 1.5 by default, but you can change it.
So which model is this actually based on? A little more information would be helpful.
green_tomato, it is based on Flux Schnell but pruned and completely retrained. Follow the link in my description and check Lodestone's model card.
milt68 I understand that. Training and pruning don't change the architecture; that is what I'm saying. Pony, Illustrious, and NoobAI models are technically SDXL models, but the original models' authors somehow bought new categories from CivitAI (I suspect); they are still SDXL, not "Other", for sure.
Why am I raising this? On my CinEro V6 model I got a comment that my Illustrious model is not Illustrious because it cannot render exactly the same anime-ish dress of one character. Another user could say that CinEro V6 can't render some specific feature from some other SDXL model... So what should I do? Do everything I can to start a new family called "CinEro"?
I think this way we will break down the whole structure of diffusion model families. I vote for minimizing the differentiation of models by training dataset and keeping the division by architecture only.
homoludens Agreed. Let me see if I can update the details.
It's because Civitai doesn't have a Chroma category for some reason.
homoludens From what I understand, Chroma is a Flux-based model that seems to have fewer limitations around (NSFW) stuff. It functions fairly similarly, but unfortunately there aren't a lot of LoRAs etc. for it (maybe because it doesn't have a category on Civitai, lol). AFAIK it is not just a simple prune; it is different enough to warrant its own subcategory of Flux, similar to Kontext (this is the best way I can describe what I think is the case). Go to the Pixaroma Discord if you want, I'm sure someone can explain it there, just don't post NSFW stuff.
d3darth333 I pointed out above that if the architecture is the same, creating a new category just makes things blurrier. In fact, many modern SDXL models in categories like Pony, Illustrious, and NoobAI are made from mixtures of resources across all four categories, and this family division has no technical reason, only the promotion of the author.
Same thing for Chroma. If it is based on the Flux.1 S architecture, then Flux.1 S LoRAs should work. If someone communicates with CivitAI and invests big money in creating a category, it will be good for the owner of the "first in category" model, but it only adds informational noise and fragmentation to the model hierarchy.
ONE FUNNY THING...
In ComfyUI I tried to load the CLIP models from HiDream with the Flux node and use them with HiDream and Flux Dev GGUF models. These HiDream + Flux.1 D combinations actually work, and I can't understand how.
This blows my mind because the UNet architectures are different. How it works, I have no idea. Maybe some class with automatic recognition of the proper architecture kicks in, but these games with model family names make everything so unclear...
d3darth333 It's being retrained as we speak, so the finished model will be properly trained on human anatomy. At least it strays away from the idea that vaginas and boobs are semi-okay but penises are not. Hmpf...
milt68 Maybe I'm not explaining things clearly enough...
>> It's being retrained as we speak, so the finished model will be properly trained on human anatomy.
My CinEro V6 model was trained on datasets that passed through multiple iterations of natural skin and realism improvements. It renders natural imperfections and pigment variations better than many other Illustrious/Pony models.
According to your statement, I should demand that CivitAI introduce a new category.
@Sateluco should also demand their own category/family, because their models also have their own flavor, different from CinEro.
This path leads us to chaos, as far as I can see.
The fact that you use training (precise fine-tuning on a small dataset of a few hundred HQ images, or big datasets of a few thousand images, as I do) doesn't mean you create a new family. Training doesn't change the neural net structure: the number of layers, the type of activation, the number of UNet layers. It just changes the weights of the links between neurons. Why should I expect a new family if I didn't create any novel structure?
homoludens No, it was a pretty clear explanation. Chroma definitely belongs to the Flux architecture.
"According to your statement I should demand from CivitAI to introduce the new category".
Not at all. Simply put, when I looked at the other Chroma checkpoints they were all listed under "Other" rather than "Flux" on Civitai. As I said above, I think you're right: they should be listed under Flux, since the base model for Chroma is Flux.1 S. So I changed the details on this one.
milt68 OK then... I thought a misunderstanding had occurred.
homoludens Okay, well, you clearly know more about how these work than I do. I do agree that for the most part Illustrious, Pony, and SDXL could all be in the same category. I'm not sure if Pony prompts are set up slightly differently, but as far as the general question "can I use a LoRA with this" goes, it works. Still, I think a lot of people are going for a pretty specific look, and while the architecture may be similar, the images they are trained on [now that I think about it, does that only affect the prompts, or is there an actual structural difference in how the vectors look?] differ enough to warrant their own category. Obviously, if you're in a pinch and looking for a very specific LoRA, I think most users would want the Pony and Illustrious results when looking for SDXL models; I certainly have started filtering by all three. I'm not a coder and have only been messing around with this for a couple of months.
That is interesting about HiDream. I only played with HiDream for like a day; the way you structure prompts with full sentences is similar. Well, I just asked ChatGPT because I'm curious, and it gave me a pretty long answer, but it said it boils down to:
The HiDream CLIP model outputs a compatible embedding vector.
You use the node path that accepts embeddings instead of text prompts.
It works because Flux doesn’t inherently care who made the embedding—it just needs the right shape and semantic alignment.
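To make the "right shape" part concrete, here is a minimal sketch of what a diffusion backbone actually receives from a text encoder. It only checks tensor shape, not semantic alignment, and the repo ID is an illustrative assumption (both Flux and HiDream ship a T5-XXL-sized encoder among their text models).

```python
# Minimal sketch: the conditioning a UNet/DiT consumes is just a tensor of shape
# [batch, seq_len, hidden_dim]. Any encoder producing the same shape (and trained
# toward compatible semantics) can be dropped into the same slot.
import torch
from transformers import AutoTokenizer, T5EncoderModel

REPO = "google/t5-v1_1-xxl"  # illustrative; swap in whichever encoder checkpoint you have

def text_conditioning(prompt: str) -> torch.Tensor:
    tokenizer = AutoTokenizer.from_pretrained(REPO)
    encoder = T5EncoderModel.from_pretrained(REPO, torch_dtype=torch.float16)
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        return encoder(input_ids=ids).last_hidden_state

cond = text_conditioning("a lighthouse at dusk, volumetric fog, film grain")
print(cond.shape)  # e.g. torch.Size([1, 14, 4096]); the backbone only sees this shape
```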
I am curious how the images turn out aesthetically.
milt68 Yeah, I mean, maybe it was the workflow example I got, which left a multitude of prompts around, or the fact that the only NSFW LoRAs I could find on here were anthropomorphic, but it seemed like its primary NSFW usage right now was mostly furry art. Which I'm not against, it's just not generally my "cup of tea" so to speak, unless that tea has been spiked at least... Anyway, I did find that Chroma's image generation gave some pretty great NSFW images of women, but it struggled hard with anything having to do with a man and woman eloping.
I am curious, do you write your own prompts or are you using an AI to generate them for you? Because your example images for this model are great, and the wording is pretty poetic. I suppose part of me wants this process to retain some sense of human artistry, and IMO that could best be demonstrated by the language used for prompts. In the end, though, AI is so convenient that if you know how to use it for every step of the way, it tends to win out. I guess humans still seem to do better at finessing the settings to some degree, because ChatGPT gives awful suggestions in my experience for guidance and denoise/noise.
d3darth333, I write my prompts and enhance them, if I need to make them more descriptive, using GPT4All that I run locally with prompt-specific LLMs.
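For anyone curious what that looks like in practice, a minimal sketch using the gpt4all Python bindings; the GGUF filename and the instruction wording are assumptions, not milt68's exact setup.

```python
# Minimal sketch of local prompt enhancement with the gpt4all Python bindings.
# The model filename below is an example; point it at whichever instruct GGUF you have.
from gpt4all import GPT4All

draft = "woman on a cliff, stormy sea, sunset"
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

with model.chat_session():
    enhanced = model.generate(
        "Rewrite this draft into one rich, immersive natural-language image prompt. "
        "Describe lighting, mood, and composition in full sentences:\n" + draft,
        max_tokens=300,
        temp=0.7,
    )

print(enhanced)  # paste the result into the positive prompt
```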
milt68 Are LoRAs going to produce the same quality results, though, just because they have similar architecture? Perhaps I should try; I haven't played around with that much. Until this recent Nvidia Studio driver update, Flux was pretty taxing on my system, and I do mostly NSFW stuff, so I kind of defaulted back to SDXL just for convenience.
I guess what I'm trying to get at here is that the priority for Civitai should generally be "this model, embedding, LoRA, etc. works best for", not just what is compatible. I get homoludens' stance that we can't just call every new model its own checkpoint category, but if it differs enough that quality suffers when you don't use a Chroma-specific LoRA, then I would want a Chroma-specific category to find those LoRAs. Perhaps what the designers/devs of Civitai could do is have a tier-based system for search results that first lists "designed for" models and then compatible models, or maybe just have it be a switch/checkbox in the filter menu. I know that doesn't really address the issue of how to determine what qualifies as category-worthy, but it would at least make it easier to understand for newer people like me, as well as give more specific results when needed.
d3darth333 Yep, you should try. Most of mine work even if they give errors in Comfy. It's a little more complicated to use them in Forge.
d3darth333
>> That is interesting about HiDream. I only played with HiDream for like a day; the way you structure prompts with full sentences is similar.
HiDream (if you manage to finally get it working) gives VERY GOOD results in terms of adherence and aesthetics, BUT...
Good results can be achieved only with HiDream Full I1 at around 50 steps. When using the GGUF format to fit into 12-16 GB VRAM, changes in steps, resolution (keeping ~1 Mpx resolution is critical), or sampler make the image fall apart immediately.
I managed to get good results with a tiny sweet spot: UniPC + Simple, 1 Mpx resolution, 45+ steps, and a dual GGUF CLIP encoder (Llama + T5). It can render text as requested most of the time. It gives a photorealistic color palette and realistic anatomical and structural consistency. BUT it is way too slow; no way to make it work at 20 steps or lower.
d3darth333 `I am curious, do you write your own prompts or are you using an AI to generate them for you? Because your example images for this model are great, and the wording is pretty poetic.`
Flux models use Google's T5 text encoder, which loves beautiful, immersive natural language. Llama-based text encoders (which Flux seems compatible with) also like this immersive language. If you aren't a native English speaker, the only way is to use an LLM to convert your draft into a beautiful, rich visual description. This is crucial for Flux and HiDream.
PS: IMHO, ChatGPT sucks at creative tasks like this. I prefer Claude.
PS2: You can find some of my images posted on the Fluxmania checkpoint page. They are produced with a ComfyUI workflow containing a Qwen 2.5 VL 3B node that accepts an image as input and produces quite a good visual description. This works well for my dataset preparation pipeline, and a 3B vision-language model is super tiny in comparison with ChatGPT. Dunno why ChatGPT doesn't work for you.
homoludens I had "decent" results, but it was very clear my graphics card was not up to par for its capabilities, and it took forever. Now I know why... Makes me wish I had just forked out for 12 GB instead of 10 back then; that seems to be a breakpoint for a lot of things in image gen, SDXL LoRA training at 1024x1024, etc.
homoludens 'They are produced with a ComfyUI workflow containing a Qwen 2.5 VL 3B node that accepts an image as input and produces quite a good visual description.' I'll check them out. So basically Qwen is used as an auto-prompt to describe the image, and you feed it in through VAE Encode to a latent? Do you use IPAdapter or ControlNet at all? I was basically told you needed those early on to do images of a consistent character/subject (unless you had a LoRA/embedding for it), but that does not seem to be the case.
d3darth333 No... Qwen 2.5 VL 3B describes the image in natural language, and I use an Empty Latent to generate the image from scratch with the automatically generated text, as if you had written it yourself, without a reference image.
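Outside ComfyUI, the same captioning step can be sketched with the Hugging Face transformers API; the model ID, image path, and instruction below are assumptions for illustration, not the exact node settings used in this workflow.

```python
# Minimal sketch: caption an image with Qwen2.5-VL-3B-Instruct and reuse the text as a
# prompt (or as a dataset caption). Requires the transformers and qwen-vl-utils packages.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"  # assumed repo ID
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "reference.jpg"},  # hypothetical input image
        {"type": "text", "text": "Describe this image as a rich natural-language prompt."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
caption = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
print(caption)  # feed this into the positive prompt of a text-to-image workflow
```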
