
Hey everyone,
A while back, I posted about Chroma, my work-in-progress, open-source foundational model. I got a ton of great feedback, and I'm excited to announce that the base model training is finally complete, and the whole family of models is now ready for you to use!
A quick refresher on the promise here: these are true base models.
I haven't done any aesthetic tuning or used post-training stuff like DPO. They are raw, powerful, and designed to be the perfect, neutral starting point for you to fine-tune. We did the heavy lifting so you don't have to.
And by heavy lifting, I mean about 105,000 H100 hours of compute. All that GPU time went into training these models on a massive, diverse data distribution, which should make fine-tuning on top of them a breeze.
As promised, everything is fully Apache 2.0 licensed—no gatekeeping.
TL;DR:
Release branch:
Chroma1-Base: This is the core 512x512 model. It's a solid, all-around foundation for pretty much any creative project. Use this one if you plan to fine-tune for a long run and only train at high resolution during the final epochs, which helps it converge faster.
Chroma1-HD: This is the high-res fine-tune of Chroma1-Base at 1024x1024 resolution. If you're looking to do a quick fine-tune or LoRA for high-res, this is your starting point.
Research Branch:
Chroma1-Flash: A fine-tuned version of Chroma1-Base, made as an experiment in speeding up flow-matching models without any GAN-based training. The delta weights can be applied to any Chroma version to make it faster (just make sure to adjust the strength).
Chroma1-Radiance [WIP]: A radically retuned version of Chroma1-Base that operates directly in pixel space, so it technically should not suffer from VAE compression artifacts.
Quantization options
Alternative option: FP8 Scaled Quant (Format used by ComfyUI with possible inference speed increase)
Alternative option: GGUF Quantized (You will need to install ComfyUI-GGUF custom node)
Special Thanks
A massive thank you to the supporters who make this project possible.
Anonymous donor whose incredible generosity funded the pretraining run and data collections. Your support has been transformative for open-source AI.
Fictional.ai for their fantastic support and for helping push the boundaries of open-source AI.
Support this project!
https://ko-fi.com/lodestonerock/
BTC address: bc1qahn97gm03csxeqs7f4avdwecahdj4mcp9dytnj
ETH address: 0x679C0C419E949d8f3515a255cE675A1c4D92A3d7
my discord: discord.gg/SQVcWVbqKx
Comments
Simple questions: Why use Chroma vs. Qwen, Flux, Krea, or SDXL? What edge does it have, and what do I get out of it that's different? Which version should I use with low VRAM? How do I properly prompt it? The author doesn't say much about any of this.
Compared to SDXL, Chroma is more capable. Compared to the others, Chroma is both trained from the ground up to be uncensored and built to work well as a base for others making their own fine-tunes. It is also slightly more lightweight than Flux/Krea/Qwen.
@aiaiaiai829569 Thanks, this makes it stand out. I guess the AIO is the one to go for? Any special prompting requirement?
@Learning2025 AIO is an unofficial thing made by some guy, just download the official version here and use that.
Chroma is unslopped and uncensored. It's trained on NSFW and can do porn out of the box, covering a wide range of fetishes in a way that can't be replicated by stacking a hodgepodge of LoRAs on a censored model.
I get much more realistic results with Chroma than with any other model I've tested so far.
@Learning2025 There is no one specific prompting requirement, but it works most closely to Flux/Krea. In my experience, it is also very sensitive to prompt changes and the negative prompt. If you are stuck with plastic skin, unrealistic style, or other issues, I recommend copying the prompts of others and then modifying them to your liking.
Chroma's the best, fuck the rest.
I love Chroma because of the insane range. Scroll the community feed below, or check my profile, to see how diverse Chroma can be. It's uncensored, handles NSFW and gore, and it's Apache 2.0.
From my tests, Qwen can repeat looks across seeds, Krea feels a bit too polished, and stock Flux misses some concepts. With Chroma I usually don’t need LoRAs for my use. Some Flux LoRAs can work with Chroma, but you’ll need to experiment. Also, it’s open-source 😎.
I prefer plain-language prompting over SDXL’s tag soup. Qwen, Flux, Krea, and SDXL all have strengths, so the choice is yours. I use Chroma 1-HD GGUF Q8 on an RTX 3060 12 GB and I’m waiting to try Chroma1-Radiance when there’s a GGUF.
About a prompt guide: from what I understand, Chroma was trained on a slightly different dataset mix, including Danbooru-style tags, so prompting is a bit different from standard Flux while the overall logic stays similar. I’m not a pro and I’m still learning, but if you want to improve your prompting there are solid threads on the official Chroma page on Hugging Face — just scroll a little: https://huggingface.co/lodestones/Chroma/discussions?not-for-all-audiences=true
You can also search Reddit and ask questions in the Discord community.
Guys, thank you all.
Chroma is in a different league regarding, well... NSFW, but also just general prompt adherence.
@Jorot Just a heads up: I also have a 3060 and found that using fp8 is about 20% faster than Q8 GGUF with no loss of anything.
@mweldonsd594 I will give it a try, thanks!
A.M.A.Z.I.N.G
If you want to see the photorealistic potential of Chroma HD:
res_6s
sigmoid_offset
50 steps
cfg 4
flan_t5-xxl-F32
Profphotos Lora 0.75
Very, very slow (14s/it for me), but these are photos, not AI images. And that's with every perversion you can think of. :-)
I'm sure this works, but it seems a bit overkill. I get great results using res_2s and bong_tangent with 20 steps, and you don't need any LoRA at all, just the right prompt.
Try this in the positive prompt: "Professional photography. Bokeh. Flickr. Instagram. OnlyFans. 2010s." (yes, "onlyfans" really works)
And this in the negative prompt: "Low quality. Low resolution. Minimal detail. Blurry. Harsh lighting. Bad anatomy. Body horror. Horrible hands and feet. Broken fingers or toes. Extra fingers or toes. Missing fingers or toes. Unrealistic. Cartoon. Anime. Comic. Painting. Drawing. Illustration. Watermark. 3D. Plastic. Fake. Airbrushed. Photoshop. AI generated. Slop. Monochrome. Desaturated. Sepia. Polaroid. Green tint. Yellow tint."
You have to be detailed and precise when prompting for a specific style because the model is super sensitive to specific words. For example, things like "realistic", "high quality" or "high resolution" in the positive prompt can actually make it worse, because real photos aren't captioned like that. You have to think about how photos are actually described, e.g. "2010s" as the era the photo was taken in, while avoiding words more likely to describe artwork (that's why "realistic" is bad: people describe artwork as realistic, not anything that actually passes for real).
I know this all sounds super anal but once you get it right I promise the results are worth it.
so effectively 300 steps give you photorealism? wow... (res_6s does 6 substeps per step, so 6*50 = 300)
@Kaleidia Probably but I'm using res_2s with 20 steps (doing 2 substeps per step, so 2*20=40) and get good results, see my comment above for details. The prompt is more important than anything else.
@nuclear_diffusion_ the m version is not doing full substeps; it generates steps (according to the number) in between, but those are just approximations, not full steps. At least that is how I understood the RES4LYF package description: https://github.com/ClownsharkBatwing/RES4LYF?tab=readme-ov-file#sampler-settings
@Kaleidia Yeah but I'm using res_2s not res_2m...anyway, the point is you can still get good realism without a super slow workflow.
Haha. just for Science. 27s/it on my RTX 5060Ti 16GB in my main PC :D
@nuclear_diffusion_ sorry, got my wires crossed there somewhere, using res_2s with 20 steps as well and it is more than enough for all kinds of images I do atm...
50 steps with res_6s will melt your PC. Well, not really, but you get the point. :) It's an absolute overkill, as others already suggested. I'm getting amazing images even with res_2s at ~30 steps. The other suggestions are good.
Low VRAM/RAM users will have to use the FP16 or Q8 GGUF version of the t5-xxl text encoder and load it into system RAM instead of VRAM to make space for Chroma.
I don't understand this discussion at all. I am a photographer and have been trying to take photorealistic pictures since SD 1.4. As an addition to my portfolio. To photograph “models” I can't afford at “locations” I can't get to. But here, it's always about doing something quickly. No, not 50 steps, rather 25 or 23 or even better 22. Why? The most important thing for me is prompt following. I want to write what I want and get it. Not score 9, score 8, score 7... or “only fans.” What's wrong with you people? When I have an idea and the images are implemented in such a way that I can use them as source material, I'm happy. If I need 5000 steps to do that, I don't give a damn. And if my computer needs 2 days to do it, I don't give a damn either. Why do I have to produce 100 images a minute? To jerk off faster?
https://civitai.com/images/47418498
https://civitai.com/images/45861979
@nanunana hey, you are totally right to do it your way, it just felt a bit odd to me at least. Speed doesn't factor in for me either if the result is what you want; my standards are just a bit lower, so getting something nice out of roughly 40 steps is enough for me. I'm also not a fan of those strange tags like "masterpiece" or image-platform names; to me they just clutter up the prompt and in some cases even confuse the text encoder. Write full sentences and you are mostly on the good side. Chroma is a bit fiddly here, as tags can influence the outcome if used too much; it uses T5, and that wants natural language, i.e. sentences with proper commas and full stops. From a tag list without any order you will mostly get some anime-cartoony image without any proper style.
Even the negative prompt should be in natural language, it just works better imo...
@nuclear_diffusion_ if you use the word "flickr", you'll get flickering stripes on the photo. That's not a feature people want to see in a photo; it looks like a printer with misaligned cartridges :D
@blhll I haven't noticed that personally, but the prompt probably still works if you remove it. The main idea is to give the model context cues for what sort of image it is, e.g. the prompt includes "Flickr" so this is likely to be a professional photo, or the prompt includes "Snapchat" so this is likely to be an amateur photo. The model is captioned on that sort of data and understands the association.
@nanunana I suggested "onlyfans" because the site is 99.99% real photos and the model is captioned with that sort of data in training, so including it as a context cue biases the result towards realism. That's what you want, right? I'm not trying to fuck with you I'm suggesting it because it works.
Chroma has good prompt adherence but it's not perfect and the model can't read your mind to know exactly what you want, so tricks like this are often helpful to nudge it in the right direction. The model is trained on a wide variety of styles without bias towards any particular one so you have to steer it to get what you want.
Your method of waiting 2 days for an image gen might work for you, but other people looking for advice might be disappointed with that, so I thought it might be helpful to suggest an alternative that still gets good results. You're welcome to continue doing things your way.
@nuclear_diffusion_ that's new info about OnlyFans for me, but are you sure those are real pictures? There are hundreds and hundreds of models for social media, and I've often been asked which is the best model for creating NSFW content for social media...
@nanunana I don't know if you could build a reliable workflow for that yet since the model is still new and there's not much yet in the way of loras. But definitely the word "onlyfans" in a prompt has a positive effect for photorealism. Try it yourself and see.
@nuclear_diffusion_ I don't see any difference there; your prompts must have differences elsewhere. The word "OnlyFans" alone doesn't change anything here. Maybe the negative...
@nanunana Chroma is hard to prompt because they used a fairly small/limited model for captioning across a wide variety of image styles. That's why a LoRA for a style, like photography, probably helps. You can't just use intuitive natural language with the base version, and weird little hacks like the ones others mentioned (not necessarily THOSE ones) work, all for that reason: limited/weak captioning and a diverse dataset.
That generally makes the base model, IMO, not ideal for most people, unless you want to really mess with the prompting anyway. But that is the creator's intention: it's a blank slate for others to train, not a thing intended to stand alone by itself.
Hi, what's new in v1.0-HD-rev-0.1 compared to the old v1.0HD? What's the difference?
The revised version is fixed
@2P2 fixed in what sense
According to the notes on huggingface, 0.1 was rebased on Chroma v48. The first release was based on v50.
According to the author, it was overtrained, which is apparently bad for the model and for future fine-tuning. The revised version fixes this.
@2P2 I'm getting a weird, glitchy image when I use the workflow from your official tiger eye PNG. I downloaded every file necessary. Like you did, I also combine two checkpoints, chroma_v10HDRev01.safetensors (17gb) and chroma-unlocked-v48.safetensors (17gb), with the same prompt (Extreme close-up photo of a tiger eye. large title "Chroma1" overlayed in the center of the image) and settings, but I still get a strange blue-screen image. Why is this?
Is it necessary to use two checkpoints to get the best quality image, or is one enough?
@haidensd58757 I simply use 48. It'd be like refining with any other model type: you're trying to use one model to improve what the other lacks. You could make a hard-baked merge of those two models, but it's much quicker to tune the weights, even if you're probably swapping VRAM each time. You could also bypass or pause in between the passes. It's a common pattern when using Pony+SDXL and Chroma/Qwen+Wan.
So what are the vram requirements? If I can run flux dev can I run this on a 12gb card? What about fp8?
Yes, yes, you can do both
It's lighter than Flux Dev, so if you can run that, you can run Chroma.
@2P2 it works for me, it just takes forever to load; once loaded, it takes around 5 minutes. Does this need a VAE? Because the problem is the faces are distorted.
@Starry_Eyes It uses the standard Flux VAE, so just use that.
Faces, especially in the distance, can come out distorted, yes. The model hasn't been fine-tuned yet, so expect less than ideal coherency with objects that are further away. If you use ComfyUI, you can somewhat mitigate this by using some of the more advanced samplers such as res_2s from the RES4LYF nodes, or by generating your images at higher resolution. Chroma tolerates up to 1536 px in both dimensions perfectly fine. I think you can push it even to 2k, but grid artifacts might appear in this range. Or you could generate at lower resolution and use something like the Ultimate SD Upscaler to upscale your original low-res image and fix the artifacts.
Also, I can recommend checking out the Skimmed CFG nodes and using CFG skimming so that you can increase the CFG in the sampler without "burning" the image, in order to solve some of the coherency problems. You can drag some of my recent Chroma images into ComfyUI to see how to use the Skimmed CFG node.
I hope this helps!
@mmdd2543 hey, so how the heck do I achieve photorealism? Everything still has that AI-generated look. I'm using the v50 GGUF, euler seems to work the best, and I do use "shot on Sony camera", etc. in the prompt.
What cfg range should I be using?
@Starry_Eyes The end result, be it photorealism or more artistic style, is a complex mix of many factors - The precision of the diffusion model and the text encoder (FP16/FP8/GGUF8/GGUF6 etc.); the usage of the right LoRAs; using high enough step count; using a good sampler + scheduler combination; using tricks to manipulate the noise which can make the image even more detailed and so on.
To break this down, always use the highest precision diffusion model and text encoder that can fit in your VRAM and system RAM. The diffusion model should ideally be in FP16 precision, but if your VRAM does not allow this, the next best quality level is GGUF Q8. It's a bit slower than FP8, but it's closer to FP16 in terms of quality with only a small hit to generation times. If that's still too big for your VRAM, go one level down (GGUF Q6 and so on). The text encoder can be cached to system RAM without slowing down the generation speed, so if you have a lot of ram (at least 32 GB), you should be able to comfortably use a T5 text encoder in FP16 precision.
Sampler and scheduler combination is a very broad topic, but generally, I like to use res_2s + bong tangent at ~30 steps, or res_3m at 50-60 steps when I want the best quality out of Chroma. This is more steps than I would normally use with other models with these samplers, but it's necessary since this base model hasn't been fine-tuned yet.
As for CFG, in terms of photorealism, I've found the ideal CFG range for Chroma to be 3.5-4.0. Unfortunately, since this model hasn't been fine-tuned yet, as I already mentioned, using a CFG of 3.5-4.0 can result in incoherency such as messed-up fingers, faces, and other small features in your images. That's why I use CFG skimming or Adaptive Projected Guidance (APG) to be able to crank up the CFG and force the model to "draw" the details more coherently, while also suppressing the burning effect that inevitably results from high CFG.
What about prompting? You should write in natural language since this is what the T5 text encoder understands. Also, use terms that one would use to describe a real photo in order to steer the model into photorealism. "Sprinkling" in photographic terms like "Shot with XYZ camera with XYZmm lens" or "Shot on Kodak Gold 200 analog film" (if you want to emulate film photography look) helps too. In the negative prompt use negative qualifiers (low quality, jpeg artifacts, etc.) and also non-photographic styles (watercolor, sketch etc.).
Last but not least, if you use ComfyUI, you can drop any of my images in ComfyUI and the workflow will be automatically recreated for you. You can study these workflows to see how they helped create the images.
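For anyone generating outside ComfyUI, here's a minimal sketch of the same advice using the diffusers ChromaPipeline (the official pipeline docs are linked further down this thread). The repo id and parameter values are my assumptions pulled from the tips above, not an official recipe, so double-check them against the model page:

import torch
from diffusers import ChromaPipeline

# Load the highest precision that fits your VRAM (bf16 here; see the
# precision notes above for FP8/GGUF alternatives in other frontends).
pipe = ChromaPipeline.from_pretrained(
    "lodestones/Chroma1-HD",  # assumed repo id, check Hugging Face
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="Professional photo of a woman reading in a cafe, shot on "
           "Kodak Gold 200 analog film, 85mm lens, natural window light",
    negative_prompt="low quality, jpeg artifacts, watercolor, sketch, cartoon, 3D render",
    guidance_scale=4.0,      # the 3.5-4.0 photorealism range suggested above
    num_inference_steps=30,
).images[0]
image.save("chroma_test.png")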
If you enable partial loading and low-VRAM options in whatever tools you use, VRAM won't matter as long as your system RAM (and potentially virtual memory/swap) can handle the overhead.
@mmdd2543 Thanks for the tip with cfg skimming. which values are you using in the cfg skimmed node (and which node from the 4 in the pack?) in comfyui? I always had problems with bad hands and feet with the final release version of Chroma1 (v50).
@mmdd2543 I'd love to look at some of your workflows, but unfortunately the JPGs I downloaded from your profile don't seem to contain any of the workflows. For example:
Good model. Is there any generation guide?
No, there are just random recommendations. What would such a guide look like?
Use natural language in both the positive and negative prompts, and test the different samplers and schedulers; everyone has their preferred combo (mine is res_2s with either sigmoid_offset, the beta scheduler with custom settings, or bong_tangent if I'm lazy). Most of those live in custom nodes or packages, so the stock baseline is euler with the beta scheduler. That gives OK images, but it can be better.
does the F8 model need a VAE?
Yes. If you don't have the FLUX vae already, you can download it from here:
https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors
Put it in the Flux category. Nobody checks the other categories.
It doesn't do any harm if it's in Flux. Big finetuners haven't even heard of it.
It already has 1,492 likes on Hugging Face in total, and it's more than just a distilled Flux; it's not that similar.
By that same logic, Pony, Illustrious, and NoobAI should be placed in the SDXL category as well. It doesn’t do any harm if they’re listed under SDXL, right? After all, they have far more in common with each other than Chroma does with Flux.
It's just marketing; don't compare it to Hugging Face. Civitai tuners and mergers are mostly on Civitai. I don't want a big tune, just some fixes, and people here are capable of doing that.
It's enough if somebody merges LoRAs into it.
Can 4070S handle Chroma?
ohh, it is overpowered and it runs Chroma, of course. Even older GPUs run Chroma very well if quantized
@2P2 FP8?
@marcin3005 Considering that the 4070 Super has 12GB VRAM, the FP8 precision should work fine for you. If you are willing to wait a bit longer for your images but want quality similar to FP16 precision, you can try the Q8 GGUF version here.
my 3060 can handle chroma, takes about 3 minutes.
@Starry_Eyes Try fp8, or better yet, fp8 flash with cfg1 and 8 steps. Should be much faster.
What settings do you use? What sampler, scheduler, dcfg, cfg, steps?
Results of my attempts look more like a child's drawing than photorealism. :(
I use fp8 - is it very bad?
fp8 isn't bad, and it's not the reason.
Try https://civit.ai/models/1908534
Lora won't help if basics are weak.
start with CFG = 2.5 / Steps = 20 / Sampler = DPM++ 2M or Heun / Scheduler = SGM_uniform or Beta. Increase CFG and Steps as needed. Start your positive prompts with things like the names of specific camera models and/or "professional photo, 85mm lens, f/8 aperture, 1/30 shutter speed, ISO 100". Start your negative prompts with non-photo image types like "sketch, drawing, illustration, painting, cartoon, anime, 2d, 2.5d, cgi, render", etc. Note that Chroma has a tendency to switch to non-photo style if your prompt subject contains non-real things like fantasy creatures (dragons, orcs & other whatnot) and settings, so the positive/negative prompts here can help ward that off. So can the LoRA @2P2 mentioned.
It's also helpful to read through other comments, such as the ones @Starry_Eyes posted recently, to get more answers to questions similar to yours.
standard workflow w. fp8 & meaningful prompting. basics are next lvl without loras
Thanks for trying to help, but nothing works.
Results are terrible, every time. I think I've tried every setting I've seen here. Maybe I downloaded the wrong model, VAE, or something else, maybe I'm just a noob. The results are much worse than anything I've tried with non-Chroma. This one is definitely not for me, although I see you are making good use of it.
@marcin3005 You can maybe start with the really fast and easy-to-use unofficial Rapid all-in-one version of Chroma. Put the v2 (I haven't tested v3 yet) safetensors file in your checkpoint folder and use the basic text-to-image ComfyUI workflow. Use CFG 2, 12 steps, and euler/beta. https://huggingface.co/Phr00t/Chroma-Rapid-AIO For realistic images I always start with "an amateur photo of a..." or "a professional photo of...". The tag "UHD" at the beginning can boost image quality but sometimes looks overcooked.
Hey, just made a test with this beautiful contribution.
https://civitai.com/images/98248382
In my opinion, Chroma likes coherent and meaningful sentences, preferably long and detailed. Here are two example images from a prompt found in the Chroma gallery, quickly and roughly edited, without changing my ComfyUI settings or seed:
https://civitai.com/posts/21827006
Original:
Photograph, dynamic composition, dynamic pose, action shot. Lighting is: black and white, monochrome, soft light, diffused light, reflector, subtle contrast, beauty dish, high key portrait, (light particles, bokeh, silhouette:0.7), (depth of field:1.5). Colors are: ivory white subject and navy blue highlights. Subject is: Woman, solo, embarrassed, spoken blush, freckles. Her hair is crew cut, big hair,. Her eye color is purple. She is wearing tuxedo, lock and key choker, . She is Looking back, . Background and details are: background\:snow, abstract, whimsical, weird. View is: fisheye, masterpiece, best quality
Rework:
A action shot of a girl while the Lighting is black and white, monochrome aestetics with soft and diffused lights and reflectors shines, subtle contrast comes in with high key peaks. light particles flowing in with a bokeh effect bringing a silhouette with depth of field and the Colors are scattered in ivory with white subjects and navy blue highlights. she is looking embarrassed, having blushing cheeks with freckles and crew cut hair with her eyes purple. She is wearing a tuxedo, a lock and a key choker. She is Looking back while the Background and dense details are snowy, a abstract, whimsical and weird View in fisheye focus.
@marcin3005 Hi! I tried to help another user who similarly had problems with achieving good results. You can refer to my reply to them here. In it I explain what to do to get better results. I hope this helps! Don't hesitate to ask any questions. I'll try to help, if I can.
@marcin3005 Just for the sake of my own curiosity, how are you wording your prompts? I'm asking because there's something of a learning curve if you're used to prompting for SDXL/Pony/NoobAI/Illustrious models and are overly-reliant on *booru tags. While Chroma can use them, it's way more responsive to "natural language" descriptions of the images you want to create.
Check this out: @Jorot posted this image a few days ago and the posting contains the full positive and negative prompts used to create it.
https://civitai.com/images/97558235
It makes for a very good example of a basic photorealistic prompt you can use as a reference.
Quick side note for folks who would mention using "the basic ComfyUI workflow": Please bear in mind that not everyone uses ComfyUI so that advice isn't always helpful while specifics can be. I'm using Stable-Diffusion.cpp myself, which is a command-line app, because I need Vulkan for rendering instead of CUDA or ROCm. (plus I absolutely can't stand patch-panel style UIs.)
@rasta040 and @mmdd2543 - Heh... I guess we were all writing at the same time. ;-)
@marcin3005 - Yeah, what they said.
@MrSnichovitch Gee, thanks :) I agree with all of the above.
In my case I just slap together what ChatGPT gives me after feeding it notes and tips I’ve collected from Hugging Face, Reddit, and the Chroma Discord. I may tweak something, but I’m usually too tired after work to think :d.
@marcin3005 Scroll down and you’ll find many talented artists sharing their images with prompts, try recreating them with your own ideas. I believe it’s one of the best ways to learn the basics.
I'm also thinking about writing a short Chroma prompt guide, posting it in the official Discord for feedback, then cross-posting it to CivitAI. I'm still learning myself, but it could help people get started. Chroma is crazy versatile, can't wait for Radiance.
Thank you all for your interest in my problems.
I'll try all the tips, but it will take some time. I'll respond as I go.
@rasta040 here are the test results.
original:
https://civitai.com/images/98387381
rework:
https://civitai.com/images/98387545
In my opinion, it's hard to be satisfied with the results :(
@marcin3005 @Starry_Eyes A resolution of about one megapixel would be desirable (portrait and landscape both work nicely with Chroma) to provide more details. I simply run euler, 25 steps, beta, and despite having 16GB of VRAM I love the original FP8. Prompting is one (crucial) element.
@marcin3005 Yep... definitely something weird here. Don't think it would be the t5xxl encoder because the prompts don't contain anything extraordinary that would throw any version of it for a loop, at least not in my experience. Try using the euler sampler (plain euler, not euler_a) and the beta scheduler as @rasta040 stated they're using above and see if that helps.
On that note, you're using the standard FLUX.1 Dev vae, correct?
@MrSnichovitch Yes, standard ae.safetensors file. Same one I used with Flux.
@Jorot You're right, watching others is a good learning experience. It's worked so far, but not with Chroma.
I'm convinced that in my case, it's not about prompts or settings, but something more technical.
Maybe I should have asked this question from the beginning: could this be because I'm using FORGE, not COMFY like everyone else?
Same thing for me: hands and faces all distorted, and it always looks CG, never photorealistic.
I've tried using both the t5xxl fp16 and fp8 encoders. Been using Chroma HD float-scaled learned topk8, and have tried the regular Chroma HD and v50. Same results. Steps around 30, sampler euler, sometimes res_multistep. Same thing. I haven't tried res_2s yet.
I also switched to the GGUF versions too.
@MrSnichovitch It's ae.safetensors, right?
@Starry_Eyes yep, that's it.
@MrSnichovitch so I'm also looking into the beta scheduler like you suggested; I don't know what node sigma is supposed to attach to.
@marcin3005 @Starry_Eyes I'm not running Forge, but this may be related: in stable-diffusion.cpp, two specific parameters need to be set to use Chroma, "--guidance 0" and "--chroma-disable-dit-mask". "--guidance" is for setting the distilled guidance scale (normally 3.5 with SD models), and Chroma produces garbage if it's not set to zero in this particular program. "--chroma-disable-dit-mask" is something of a secondary flag that needs to be set if t5 masking is disabled, which s-d.cpp does by default (seems weird, but this is a weird program still in heavy development.) If there are settings in Forge that allow you to set distilled guidance scale to 0, and disable both DiT and t5 masking, try it and see if it helps.
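For reference, a full stable-diffusion.cpp invocation for Chroma might look something like this. The two Chroma-specific flags are the ones above; the rest are standard s-d.cpp options as I remember them, so double-check against the project README:

sd -p "professional photo of a lighthouse at dusk" \
  --diffusion-model chroma-unlocked-v50.safetensors \
  --vae ae.safetensors \
  --t5xxl t5xxl_fp16.safetensors \
  --cfg-scale 4.0 --steps 30 \
  --guidance 0 --chroma-disable-dit-mask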
@MrSnichovitch does Comfy have that? Do I put "--guidance 0" and "--chroma-disable-dit-mask" as command-line arguments at startup, where the --lowvram is?
I've put them in the command line; nothing seems to have changed.
@Starry_Eyes Sorry, no. Those parameters are specific to stable-diffusion.cpp and I don't use ComfyUI. The idea is that you need to find equivalent settings in Comfy and see if changing them makes a difference... If you already see a setting in some node labeled "Distilled CFG Scale" ( not the normal "CFG Scale", but a separate setting), set it to zero or the lowest value possible. Hopefully, someone running Comfy who knows where that is can help because I wasn't able to find anything in the ComfyUI wiki.
@marcin3005 You definitely should see a "Distilled CFG Scale" slider in Forge. Should be to the left of the regular CFG Scale slider according to a screengrab from June 27th I found on the Forge github page. Set that to the lowest value possible (if not zero) and see if things improve.
@MrSnichovitch so I've been trying the ClownShark sampler; it helps with photorealism, the problem is everything still looks badly photoshopped.
I guess all I can do is hi-res fix the images in Kontext or regular Flux for now.
@marcin3005 I remember Chroma needs specific ComfyUI nodes with shifted sampling to achieve the best results; it can't be loaded like a normal Flux model. IDK if Forge supports Chroma officially or just loads it like Flux Dev.
@MrSnichovitch dcfg = 0 gives a higher probability of anime/cartoonish outputs. For a better chance of photorealism, use dcfg > 2.
@marcin3005 try these settings: Euler/Beta or DPM2_a/Beta, CFG = 4-6.
@marcin3005 Okay... so reading a number of comments in other threads has led me down a selection of rabbit holes, and one brought me to a couple of github pages that seem to have pointed to a golden carrot for you:
https://github.com/maybleMyers/chromaforge
This is a version of stable-diffusion-webui-forge that's been specifically patched to work properly with Chroma, utilizing patches from https://github.com/croquelois/forgeChroma. The maybleMyers fork of Forge is being kept well up to date (the last patch was on Sept. 2nd), so you should be able to use it as a drop-in replacement for your current version of Forge, get workable results with Chroma, and still use whatever other models you've been using.
@mphobbit Could probably be chalked up to differences in generation programs, but Distilled CFG values > 0 in s-d.cpp lead to absolute body horror and incoherent images. Avoiding cartoony results is more a matter of prompting tweaks. One can never know if a given setting will improve or degrade results without testing.
@MrSnichovitch
> Avoiding cartoony results is more a matter of prompting tweaks. <
No, with the same prompt it gave more anime with dcfg < 2 (I experimented both locally and on TA). Probably dcfg somehow adds to the regular CFG. Ah, I forgot: if the model doesn't want to cooperate, it can bypass prompting tweaks. It's not only about anime-related topics; Chroma liked to fall into anime for general sci-fi until the v40s versions.
> but Distilled CFG values > 0 in s-d.cpp lead to absolute body horror and incoherent images
On a fixed seed, dcfg has no significant influence.
UPD:
This picture https://civitai.com/images/99065632 generated with dcfg = 3.5
And this one - https://civitai.com/images/99066364 - with dcfg =0.
@mphobbit For my own edification, what program(s) are you using and what backend (CUDA, ROCm, CPU, etc)?
@MrSnichovitch wrote you in PM (just not to flood comments)
@Beezer79 which workflow should I use with this AIO, ser?
You can try the workflow here: https://civitai.com/images/99728949 I can get a relatively consistent photorealistic style with it.
Great model. Quick question: what LoRAs, if any, work with this?
Nope
Flux loras can work but not always, depends on how they were trained. There are some adapted to Chroma, mainly from Silveroxides. On Forge you must change a bit of code because Forge freaks out if there's too much difference
https://github.com/croquelois/forgeChroma/issues/4#issuecomment-2864621714
TijuanaSlumlord has already made some great LoRAs specifically for Chroma — give them a try!
Civitai won't add Chroma as a model type, so discoverability of Chroma LoRAs is very difficult.
I haven't tried a lot of LoRAs with Chroma, but the few that I did try worked pretty well. For example, I use these a lot in my images: Chroma - Professional Photos, Grainscape UltraReal, Sony Alpha A7 III Style, Background LoRA. All but the first one were trained on Flux Dev. I hope Civitai will add a Chroma category for easier discoverability of Chroma content.
@ailu91 Forge supports Chroma since June without extensions.
@mphobbit Yes, but they didn't touch lora_unmatch. I'm not saying to use an extension (the patch was merged into Forge, hence the support), but explaining why a LoRA might not get loaded. Forge will refuse to load a LoRA that has too many mismatches with Chroma, basically stuff that was trained for Flux, but those MIGHT work if we remove the restrictions; there may also be other issues not on Forge's side.
please people, don't hide your prompts for this, they are crucial for getting anything good with this model. sharing with others would really help us.
Are the VAE, CLIP, and text encoder files the same ones used with the Flux model? Do I need to download the VAE and text encoder from your page?
No need for CLIP. For VAE, just use the standard ae.safetensors that works with FLUX. As for text encoders — go with whatever you like. You can stick with the default FLUX options like t5xxl_fp16, t5xxl_fp8_e4m3fn, or try other T5-based encoders like gner or flan.
@Akalabeth Do I need to download his "text_encoder" folder, which contains two files of 4.5G and 4.9G, and place them in the corresponding local locations?
@sunweixi1993786 No, this is for generating through Diffusers. Just use the same stuff you’d use for FLUX.
@Akalabeth May I ask if this model can use Flux's Lora?
New prompt to brute force realism. Useful when generating anime characters. Remove ''The photo has natural motion blur from camera movement.'' if you want a sharper image at the cost of realism:
The image captures a raw, illicit Snapchat photo, imbued with the spontaneous, personal feel of a stolen moment. The lighting is a harsh, direct flash from a smartphone, creating stark highlights and deep shadows, typical of an impromptu photo. Overall impression is one of a low-to-mid quality smartphone photo, vertical in orientation. The photo feels illicit, personal. Candid photo using an iPhone camera. The photo has natural motion blur from camera movement. Reddit. Snapchat. OnlyFans. Flickr. Twitter. Facebook. Instagram. Amateur. 2010s.
A high detail photograph taken using Kodak Portra 400 film, with a 55mm lens. high quality professional RAW photograph. candid amateur photograph. vintage film photograph. film. film grain. bokeh. depth of field. clear details. cosplay.
A candid amateur vintage photograph, resembling an accidental snapshot. The photograph lacks a clear subject and has a chaotic, awkward composition. The overall effect is deliberately candid and amateur.
Negative Prompt:
sketch. drawing. illustration. painting. digital art. cartoon. anime. 2d. 2.5d. unreal engine. 3D render. CGI. computer graphics. fake. synthetic. artificial. distorted. over-saturated. over-processed. low resolution. low quality. low detail. pixelated. image noise. bokeh. blur. blurry. blurry background. airbrushed. unrealistic skin. plastic skin. waxy skin. waxy appearance. porcelain skin. doll-like.
Add to start of negative : "Make everything look artificial and fake. Use flat, cartoon-like rendering styles, similar to caricatures or digital illustrations. Depict humans with exaggerated, distorted, or cartoonish features, as if drawn in a caricature or comic style. Apply stylized effects like paintings, anime, vector art, or comics. Prefer smooth, flat surfaces with minimal texture and detail. Use simplified shapes and soft, low-contrast colors. Avoid realistic lighting or depth. Use unnatural shading and overly clean surfaces. Avoid anything that looks natural or lifelike. Introduce flaws like blurry textures, washed-out colors, distorted anatomy, or unrealistic proportions. Emphasize artificial lighting, visual effects, and surreal compositions. Prioritize low resolution and minimal surface detail. Include visual traits often found in AI-generated art, such as plastic textures, soft blur, poor anatomy, extra limbs, or inaccurate facial features. The final result should not resemble anything captured with a real camera." This should be better : "Use flat, cartoon-like rendering styles, similar to caricatures or digital illustrations. Apply stylized effects like paintings, anime, vector art, or comics. Prefer smooth, flat surfaces with minimal texture and detail. Use simplified shapes and soft, low-contrast colors. Avoid realistic lighting or depth. sketch. drawing. illustration. painting. digital art. cartoon. anime. 2d. 2.5d. unreal engine. 3D render. CGI. computer graphics. fake. synthetic. artificial. distorted. over-saturated. over-processed. low resolution. low quality. low detail. pixelated. image noise. bokeh. blur. blurry. blurry background. airbrushed. unrealistic skin. plastic skin. waxy skin. waxy appearance. porcelain skin. doll-like."
ALSO, I have wasted so much time generating at a size too small. Use something larger like 1024x1280 instead and I start to see much clearer results -- even using the "hyperspeed" lora at 16 steps.
@tumor1486 I don't think you should give orders in the prompt since this is not how the trained images are captioned. You are only needlessly increasing the token count. So instead of writing "Make everything look artificial and fake" try "everything is artificial and fake" (drop the "make").
Also, you should not make your prompts excessively long. Your example negative prompt is way, way too long. Chroma has a 256-token limit, if I'm not mistaken. That's for the combined positive and negative prompts. If you go over that limit, prompt adherence will suffer and image artifacts might appear. Try to be descriptive with fewer words. You don't really need to write an essay to get a good image out of Chroma. :)
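If you want to check whether a prompt is blowing past the budget, you can count T5 tokens directly. A quick sketch using the transformers tokenizer (the tokenizer repo id is my assumption; any T5-XXL tokenizer should give the same counts):

from transformers import AutoTokenizer

# Chroma's text encoder is T5-XXL; its tokenizer alone is enough for counting.
tok = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")

prompt = "Professional photography. Bokeh. Flickr. Instagram. 2010s."
print(len(tok(prompt).input_ids))  # token count, including the closing EOS token

The exact budget is debated just below (256 vs. the 512 default in diffusers), so measure and keep prompts comfortably short either way.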
@mmdd2543 Disclaimer: I am not an expert.
From ChatGPT:
https://huggingface.co/docs/diffusers/api/pipelines/chroma
"Key excerpt (from the ChromaPipeline doc): the call method's parameter list includes max_sequence_length: int (defaults to 512), the maximum sequence length to use with the prompt. This clearly states that the default maximum token length for inputs to ChromaPipeline is 512 tokens."
Depending on the length of a prompt, you can go up to 1024 it seems. t5TokenizerOptions: min_padding at the default of 1; min_length: 0, 256, 512, 768, or 1024, depending on the length of the prompt. I have not tried other numbers.
My current negative with min_length at 256 for "t5TokenizerOptions" :
"Use flat, cartoon-like rendering styles, similar to caricatures or digital illustrations. Apply stylized effects like paintings, anime, vector art, or comics. Prefer smooth, flat surfaces with minimal texture and detail. Use simplified shapes and soft, low-contrast colors. Avoid realistic lighting or depth. sketch. drawing. illustration. painting. digital art. cartoon. anime. 2d. 2.5d. unreal engine. 3D render. CGI. computer graphics. fake. synthetic. artificial. distorted. over-saturated. over-processed. low resolution. low quality. low detail. pixelated. image noise. bokeh. blur. blurry. blurry background. airbrushed. skin_reflection. perfect_skin. caricature_like"
The positive prompt is equally important; use anything that mentions photography terms, something like:
"Professional photography. Bokeh. Flickr. Instagram. OnlyFans. 2010s. This image is ultra detailed RAW professional photograph, captured with a Leica M11, 50mm lens, moody tones, razor-sharp subject, natural light with subtle shadows, full-frame sensor detail, timeless Leica color rendering, film-like texture, perfect clarity photo, intricate details."... rest of a prompt, for example.
also, that is with dpmpp_2m with bong_tangent or beta, or euler with beta;
for example, res_2s or dpm_2_ancestral are about 2 times slower
@mmdd2543 also, part from ChatGPT: "1. Stronger Text Encoders (like T5-XXL):
These models do detect the difference between instructional and descriptive phrasing.
Instructional phrasing (like “Use...”, “Apply...”) often yields:
Slightly stronger visual enforcement of those styles.
Better alignment with prompt intent in image generation.
Especially important when using compound prompts with many descriptors.
2. Smaller Encoders or Non-instruction-Tuned Ones:
Might not differentiate much between the two.
They just extract token-level meaning and semantic similarity — both versions might encode similarly.
3. Image Model’s Role:
If the image model (e.g., Chroma) is well-aligned to the text encoder, the difference can be meaningful.
If it's loosely coupled or doesn’t deeply "listen" to embeddings, then differences may be marginal."
Using Draw Things on my 5th gen iPad Air with M1 chip, it only produces grey squares or a pixelated grey image. Can someone please help me fix this?
My settings are:
{"batchSize":4,"model":"chroma_v50_f16.ckpt","seed":2915148897,"cfgZeroInitSteps":0,"loras":[],"decodingTileHeight":640,"separateClipL":false,"speedUpWithGuidanceEmbed":true,"hiresFix":false,"zeroNegativePrompt":false,"maskBlur":1.5,"teaCache":false,"strength":1,"clipSkip":2,"width":1024,"maskBlurOutset":0,"cfgZeroStar":false,"batchCount":1,"controls":[],"steps":22,"decodingTileOverlap":128,"causalInferencePad":0,"tiledDecoding":true,"height":1024,"preserveOriginalAfterInpaint":false,"guidanceScale":4.2000000000000002,"decodingTileWidth":640,"shift":3.1581929,"tiledDiffusion":false,"sampler":12,"seedMode":2,"resolutionDependentShift":true,"sharpness":1.1000000000000001}
cfgZeroInitSteps":0 ?
Is there any workflow recommendation that is easy to use and reduces memory usage?
Trying to get a Radiance [WIP] model to work in ComfyUI, I have converted it to .safetensors and get this error on run: UNETLoader
ERROR: Could not detect model type of: D:\ComfyUI\ComfyUI\models\diffusion_models\chroma\2025-09-09_22-35-05.safetensors
Any suggestions?
I mean it's a good bit of work to do. You need to apply this pull request: https://github.com/comfyanonymous/ComfyUI/pull/9682 or just check out this repository/feature https://github.com/blepping/ComfyUI/tree/feat_support_chroma_radiance
Then make changes to the stock workflow using the stub vae and empty latent image.
And what you will find after all of that is a model that does not currently produce images all that well compared to the released models, and takes 30%+ longer to generate.
@zoot_allure855 Thanks!
radiance support with the new comfyui version. https://github.com/comfyanonymous/ComfyUI/releases/tag/v0.3.60
Congratulations on getting the Chroma base model category added!
Hell Yeah!
@Jorot congrats, it took 10 months (or 1 year since Chroma Fur Alpha)
Is online generation the next step?
It is still being trained but there is also a 2k res version that is far better already imo: https://huggingface.co/lodestones/chroma-debug-development-only/tree/main/2k-test
hi, which scheduler do you use with the er_sde sampler?
I use beta or sgm_uniform; kl_optimal is possible at your own risk, as it's very particular.
Has anyone got a good working workflow for this 2K version? I still have issues with bad hands and feet. Thanks.
it's in .pth format, do I just change the extension to .safetensors??
@Hieheihei you will find the safetensors and GGUF files of the 2K version here: https://huggingface.co/silveroxides/Chroma-Misc-Models/tree/main/Chroma-DC-2K
any prompting guide or dataset tags?
No prompting guide as of yet, but your best bet to get started is to describe what you want to see in prose phrasing rather than short tags. Plenty of images in the gallery below with the "circled I" in the bottom right corner have full prompts you can use as reference and you can glean tips for photorealism from them too. Just play with it and see what you get.
On tags, Chroma is sort of like the love child of FLUX.1 Schnell and IllustriousXL, so a number of danbooru/E621 tags do work as you'd expect, but using them has a tendency to put the model into "waifu mode" where the results skew heavily towards cartoony/anime images. I'm not sure if specific character or artist tags work or not, but some might.
I bid 10.010 for 1.0, so hopefully it will be promoted!
RIP.
good
May I ask if this model can use Flux's Lora?
hit or miss
yup
It works to some extent. Better to retrain in Chroma using your Flux dataset. For characters, avoid high ranks like 16 or above; a Chroma LoRA rank can be as low as 4.
I would like to ask about the FP8 series. Under "Chroma1-HD" there are two versions, "Chroma1-HD-FLASH_float8_e4m3FN" and "Chroma1-HD_float8_e4m3fn": how do I choose between them? And under "Chroma-V50" there are three, "Chroma-unlock-V50-annealed_float8", "chroma-unlocked-v50-flash-heun_float8", and "chroma-unlocked-v50_float8": how do I choose among those, and what are the differences?
The Flash version is the distilled one. Think Flux Pro -> Flux Dev: Flux Pro needs more steps to get a good image, just like Chroma1-HD, while Flux Dev gets good images much quicker, just like Chroma1-HD Flash, but you throw away some of its ability to use negative prompts.
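(In practice that means: with a Flash-style checkpoint, drop CFG to 1.0 and steps to around 8, as others in this thread suggest. At CFG 1.0 classifier-free guidance is effectively off, which is exactly why the negative prompt stops doing much.)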
Any tips for subject training that won't result in Flux lines creeping into images?
I trained a LoRA on Flux Dev in Ostris' AI Toolkit and didn't have any problems using it in Chroma.
From my recollection, you can reduce "Flux lines" with a simple setting in the training config. There is a GitHub thread discussing the phenomenon and fixing it by turning off the setting "Apply T5 Attention Mask":
https://github.com/kohya-ss/sd-scripts/issues/1488
https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/1712
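If you're training with kohya's sd-scripts, that setting should correspond to the --apply_t5_attn_mask flag; that's from memory, so treat it as an assumption and verify against the linked issue and the current sd-scripts docs. Leaving it off is the fix those threads describe.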
Chroma HD really needs a Nunchaku version as it's 2x slower than Flux. Or Teacache support at the very least. Other than that it's fantastic, even as a base model. Powerful, unique, fully uncensored, and can do a hell of a lot without relying on lora spam...especially if you like to experiment. I imagine though that many have tried it and given up too quickly. Chroma absolutely deserves and needs more exposure though so we can get these finetunes, loras, and other utility underway.
There is Chroma cache https://github.com/feffy380/comfyui-chroma-cache; it degrades the quality a bit but offers around a 2x speedup. MagCache is an option as well, but that one is a bit inflexible.
Then there is the distilled version of Chroma, https://huggingface.co/lodestones/Chroma1-Flash . You lose some flexibility in using negative prompts (just like Flux Dev and Flux Schnell), but you can get great generations at low step counts.
here is another link https://huggingface.co/Clybius/Chroma-fp8-scaled/tree/main/Chroma1-HD?not-for-all-audiences=true just select the one with "flash" in the name
@mwcircle430 Yeah, I'm aware of all that, but Chroma cache and MagCache degrade too much. The only things that are reasonable to use right now are one of silveroxides' "heun flash" LoRAs or maybe the "hyper chroma low step" LoRA. I'd rather run the model proper with a high-quality cache with minimal detail loss, or even better Nunchaku, since you'd get the speed plus keep most of your details. The strength of the model shines without hypers, turbos, flash, etc. I appreciate the thought though, but I've gone through every existing method out there to improve speed without killing details or complex coherence and whatnot.
@mwcircle430 why on earth is CivitAI censoring the "f word" lmao. Prejudice against Macromedia or fast superheroes?
Sorry about my stupid question.
Do I still need to get the correct CLIP? I'm using the diffusion model loader to load the model.
wdym
It's hard to understand your question, but if you are asking which CLIP models to use and how: assuming you are using ComfyUI, you need at the very least the T5-XXL text encoder, which can be loaded with the "Load CLIP" node. The regular T5 encoder is FP16, but you can use FP8-scaled or quantized GGUF versions of it with basically no noticeable downsides (at least for me). Make sure "type" is set to "Flux".
You can also use Chroma with node "DualCLIPLoader" and load the older "clip_l" text encoder at the same time as T5. I personally don't know if this makes a significant difference, but I've seen some people say clip_l helps with tag based prompting.
All these are on huggingface, just search for "t5xxl fp16"/"t5xxl fp8" or "clip_l" and they should be some of the first results.
Thanks for your sharing! I have one question: how can I eliminate banding and artifacts, especially since they become very noticeable after HiRes-Fix and Ultimate SD Upscale?
Needs a good step count+sampler+scheduler combo
Flux based models just tend to be a huge pain with stuff like hires-fix and img2img. Depending on what you're trying to achieve, it might actually be better to do your img2img passes of Flux outputs in something else, like a SDXL finetune. In other words, use Chroma/Flux for establishing the composition of the image and use SDXL for adding detail, just my suggestion.
You cannot get rid of them completely, but you can make them barely visible with specific settings, for example: RES4LYF res_2s/bong_tangent; other image dimensions (banding tends to appear in large images); if you have a long prompt, shorten it as much as you can (remove all the fluff words that you cannot draw) and adjust the negative prompt too. In T5TokenizerOptions set padding to 0 and length to 256. The model is very sensitive to the terms used in the prompt, so try making drastic changes to it. Do not use model weight quantization; it ruins model quality. I am not sure that Ultimate SD Upscale even works for Chroma; at least I couldn't get any good results from it. The model can produce great results, but it is frustrating to use.
@huc2cvt678 That makes sense. I certainly wouldn't expect a script made for SD1.5 to work in Chroma (Ultimate SD upscale).
Can you elaborate a bit more on "RES4LYF res_2s/bong-tangent" though lol. What is it and what does it do? Is it a node?
A tip for people using ComfyUI on lower to medium end machines:
1. I speed up my generations by using the Chroma Cache node starting at 0.30 with a cache interval of 1. This will make the latter 2/3 of the steps much faster in exchange for slightly lower quality.
2. Do the initial sampling at SD1.5 resolution (e.g. 432x768), as the model was trained on both 512 and 1024 resolutions. Use a medium step count such as 33 and a good sampler/scheduler combo (I use DPM++ 2M and beta). If you are doing furry/anime generations, also consider a higher CFG of 4.0 to 6.0.
3. Next upscale to SDXL resolution using an upscaling model (I personally use 4x-Nomos2-hq-mosr as it is both really fast and quite accurate)
4. Sample again (still using the cached model) for just 11 steps and 0.20-0.35 denoise. This will clean up the image, fix the details, and bring out the style a bit more (a code sketch of this two-pass flow follows the tips below).
Extra tips:
- Chroma can be prompted similarly to Illustrious, and in fact it knows booru and e621 tags. Personally I use simple natural language with booru tags sprinkled in occasionally, like a slightly more verbose version of Illustrious prompting.
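If you're not in ComfyUI, the same two-pass recipe sketches out roughly like this in diffusers. Flagged assumptions: that your diffusers version ships ChromaImg2ImgPipeline alongside ChromaPipeline, that the repo id is right, and that a plain Lanczos resize can stand in for the 4x-Nomos2 upscaling model (a real upscaling model will preserve detail better):

import torch
from diffusers import ChromaImg2ImgPipeline, ChromaPipeline
from PIL import Image

pipe = ChromaPipeline.from_pretrained(
    "lodestones/Chroma1-HD",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "a fox girl reading under a tree, soft natural light"

# Pass 1 (tip 2): compose at SD1.5-ish resolution.
low = pipe(prompt=prompt, width=432, height=768,
           guidance_scale=4.0, num_inference_steps=33).images[0]

# Tip 3: upscale. Lanczos here for simplicity; an upscaling model does better.
big = low.resize((864, 1536), Image.LANCZOS)

# Pass 2 (tip 4): light img2img cleanup. 36 steps at strength 0.3 works out
# to roughly the 11 effective denoising steps suggested above.
i2i = ChromaImg2ImgPipeline.from_pipe(pipe)  # reuse the already-loaded weights
final = i2i(prompt=prompt, image=big, strength=0.3,
            guidance_scale=4.0, num_inference_steps=36).images[0]
final.save("chroma_two_pass.png")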
Hello, does the FP8 version of HD-1.0 run on an RTX 4070 SUPER 12GB?
@zDoog Yes, that's the GPU I'm using. It's not very fast at high resolutions, but it works.
Could you share a workflow?
Can it be used on 6GB VRAM?
Yes, start with a Q3 or Q4 GGUF. Mind you, this model runs slower than Flux, but you should be able to run it on 6GB VRAM. Maybe start with the Flash version, which is geared towards 8 steps. You can find it here for starters: https://huggingface.co/silveroxides/Chroma1-Flash-GGUF/tree/main, then choose either Chroma1-HD-Flash-Q3_K_S.gguf (4.29GB) or Chroma1-HD-Flash-Q4_K_S.gguf (5.43GB). Good luck!
@Eshemizu Thank you!
@bomg66 You can also do the same thing with t5 xxl
There's also a project by (yet again!) Lodestone called RamTorch, which uses system RAM to hold the model and feed the GPU as needed. It's slower, but not terribly so, so you could even run a full checkpoint instead of very low quants. It does hammer even 64GB, so have that much or more.
@ailu91 interesting. How does RamTorch work with ComfyUI? Have you had any experience with it? Thanks :)
@Beezer79 I haven't seen it in Comfy yet, just in the ChromaForge fork. It takes a couple of minutes on the first run to push whatever it needs to RAM, but subsequent generations feed directly to the GPU without any wait time. It's kind of silly though, because it leaves 12GB of VRAM completely unused. It doesn't seem to run with Radiance.
I like it for upscaling to absurd sizes now. Give it a couple of days, it's still baking :)
@ailu91 thanks for the information. I will test it on my second PC with a 12GB RTX 3060 and 96GB of RAM when it's ready.
I have used the main version on a GTX 1060 6GB and it worked... besides the painful time it takes to sample.
Also, Lodestone mentioned on his discord that at some point he will be making/coding his own version of a nunchaku quant specifically for Chroma. Not sure exactly when but it's on his roadmap. So at some point we'll have a smaller, much faster, and still quality version of Chroma.
v1.0-HD is a very good model!
Is there a way to use this model with controlnet open pose?
I haven’t tried it myself, but looking at that post it seems Chroma can work with ControlNet (in that case the Reddit user used Flux Union Pro). So yeah, ControlNet does work with Chroma.
https://www.reddit.com/r/comfyui/comments/1lj3qm9/chroma_controlnet_is_it_possible/
I just get a light noise, needs a retrain
May I ask how to configure a workflow for inpainting (local redrawing) with this model? When I applied Flux's inpainting workflow, I got an error or couldn't inpaint at all!
It is very different from Flux 1
@2P2 Is there a separate Chroma version of the inpainting and hi-res upscale workflow? After I imported this workflow, I couldn't perform operations such as changing clothes or removing them, and face restoration became even blurrier! The workflow simply cannot be used. But it seems this model can be used with Flux's LoRAs.
What's the best quality speed LoRA now?
@2P2 thanks. I'm unable to get good quality out of my generations anymore; they are all pixelated. I had this issue when I started using it and figured out how to get around it, but now I can't figure it out again. Do you have any settings suggestions?
Any .py script/walkthrough to train LoRA .safetensors to use with Chroma1-HD?
Interesting - I will have a look at this. Thank you!
This is what I was looking for, specific information/scripts for how to train a LoRA with Chroma: https://github.com/tdrussell/diffusion-pipe/blob/main/docs/supported_models.md
https://hf.co/silveroxides/Chroma-LoRA-Experiments
These got removed and replaced with a repo that only has the Flash LoRAs. Does anyone know where you can find the rest that were removed? Some of the speed LoRAs there worked better than Flash.
I can push a whole mirror
@2P2 yes please
@steven317 yes, I'm done
@2P2 Thank you so much
@steven317 Posted some new and old loras there, keeping updated.
And Chroma Rapid AIO from Phr00t has been removed too :/
I can mirror it (v6 is preferable)
I was sleeping on Chroma. The quality and realism that can be achieved is insane. LoRAs also train extremely well.
Are the GGUF models up to date with the latest release? And is there any workflow for them?
@2P2 Is Radiance compatible with a normal HD LoRA? I'm guessing not. Also, I feel like the GGUFs listed on the actual model page might not be the current version? Remember, a fixed version was released a day after.
@TheNecr0mancer No, the HD loras are obsolete since Chroma1 HD
@2P2 I mean the chroma1 hd lora
@TheNecr0mancer No, it gives artifacts on Chroma1 HD and Radiance. It's on the right, check:
https://image.tensorartassets.com/cdn-cgi/image/anim=true,plain=false,w=2048,f=jpeg,q=100/model_showcase/0/693e90df-67fc-cc06-ea51-f40e3c8f8644.png
@2P2 I don't think you're understanding what I'm saying; I have no idea which other model you're referring to. I'm only talking about Chroma1 HD to Radiance, no other models. Chroma1 HD LoRAs, the ones from the currently available model.
For any newbies who want quickstart suggestions for realistic images... here are my recommendations and the reasoning why.
As a new user of Chroma, I was initially both impressed and disappointed. Images would typically look OK or completely washed out. It appears that the model can't produce what you want correctly, but it's actually an issue with the prompt lacking style context.
1. Style lora (https://civitai.com/models/1908534/chroma-professional-photos)
Chroma is a powerful base model with no inherent stylization baked in. This means that unless you prompt heavily for the style, you'll likely get poor outputs, so a guide is helpful. The LoRA above at a strength of 0.7 is effective at guiding the model toward photographic images while still allowing other styles to be prompted.
2. Sampler and scheduler.
These seem incredibly important. I've gone with DPM++ 2M and bong_tangent for the scheduler.
Other configurations I've tried all struggled excessively with fingers and toes; the combo above gets it right 9 times out of 10. It may not be ideal for all styles or subject types, but it has been recommended in a few places and seems to work well.
3. Speed LoRA (HD-Flash_r12-FP32.safetensors). This LoRA is recommended by the author in another post, which I can't find anymore, but it has worked well for me.
LoRA strength 1.0; make sure CFG is set to exactly 1.0 and steps to around 16.
Without the LoRA, go for CFG 4.0 and 30-40 steps.
Even if you don't plan to keep using the speedup LoRA, having it now while you iterate on your prompts will save a lot of time.
4. A hint for CLIP: go with an FP16 text encoder, either the one recommended with the model or Google's T5. FP8 can impact output quality.
1. I prefer https://civit.ai/models/1967914
2. RES4LYF isn't worth it
3. The temp link has expired. It was https://hf.co/silveroxides/Chroma-LoRAs/resolve/main/HD-Flash/HD-Flash_r12-fp32.safetensors
4. More like a recommendation than a hint; FP16 isn't much different from FP8 anyway.
@2P2 Your feedback and corrections are appreciated. Have you taken any specific steps to correct finger/toe issues without using the samplers in RES4LYF? Curious how you'd recommend running it. Most examples I found were for older versions.
@Diecron312 I'm not the only one who thinks RES4LYF is cancer; normal samplers and schedulers are enough. Someone just started with RES4LYF and other users rushed to try it. As I already said, I usually get the correct digit count, though I've been using GGUF (it saves a lot of memory), and Chroma itself is good at anatomy.
@Diecron312 Also nice, https://civit.ai/models/1995853?modelVersionId=2259088
Running everything at FP16 means you need 64GB of RAM, or your system will slow to a crawl. Chroma uses nearly 40GB when using t5xxl_fp16.
Update: After a lot of learning, I'm now doing dpmpp_2m with the simple scheduler at CFG 4.0, 40 steps. This takes ~28 seconds per image and gives outstanding results.
Alternatively, Chroma1-HD patched with flash weights, with the same scheduling at CFG 1.0 and 16 steps (~8.5s per image). (A diffusers sketch of both configurations follows after the prompt example below.)
Flash LoRAs and weights neuter this model's ability to output realistic subjects with incredible levels of detail, because you have no negative prompt to guide the realism. You then have to use a style LoRA, but those need high strength to fight the model's tendency to infer anime, drawings, etc., and the results are quite far from what the model is capable of. I still use them to force realism when running the patched weights mentioned above, because the model ignores negative prompts when they're applied. I mostly do this when I want to quickly iterate through a new prompt, getting it mostly there before switching back to the raw HD base model.
I've now learned how to prompt better for realism (hint: this was a lot easier when a LoRA wasn't running, hiding the model's true intent from you). The truth is, sometimes you'll add a word or phrase the model has only seen in anime images, or ask for a celebrity in a pose the model has no information on (or worse, in AI-gen/anime/CGI images that aren't properly tagged); those can all degrade the result a lot. Watch for the realism fading into other styles when you add specific keywords, and consider alternatives (e.g. using correct anatomical terms like "breasts" can help a lot).
If you're having trouble, here's the current prompt I start with. The negative is the most important part for realistic images, but it won't help if you're using a Flash LoRA or weights, as mentioned above:
positive:
high resolution professional photograph,
the french woman is dressed in a sundress, dances in forest.
soft cinematic lighting, warm tones, soft bokeh.
negative: illustration, anime, drawing, painting, hentai, fake, generated
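For anyone running those settings outside ComfyUI, here is a minimal diffusers sketch of both configurations. The ChromaPipeline availability, the model id, and DPMSolverMultistepScheduler as the dpmpp_2m counterpart are my assumptions, not the poster's exact setup; verify them against your diffusers version.

```python
import torch
from diffusers import ChromaPipeline, DPMSolverMultistepScheduler

pipe = ChromaPipeline.from_pretrained(
    "lodestones/Chroma1-HD", torch_dtype=torch.bfloat16
).to("cuda")
# DPMSolverMultistepScheduler is diffusers' rough equivalent of dpmpp_2m.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Base model: CFG 4.0, 40 steps, negative prompt active.
image = pipe(
    prompt="high resolution professional photograph, ...",
    negative_prompt="illustration, anime, drawing, painting, hentai, fake, generated",
    guidance_scale=4.0,
    num_inference_steps=40,
).images[0]
image.save("out.png")

# Flash-patched weights: CFG must be exactly 1.0, ~16 steps. At CFG 1 the
# negative prompt is ignored, as discussed above.
# image = pipe(prompt="...", guidance_scale=1.0, num_inference_steps=16).images[0]
```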
Is this Comfy only or will it work in Forge?
What are the best parameters for character LoRA training with v1.0 HD?
I'm using rank 4, alpha 4, lr 0.0001, AdamW, ~1000 steps.
Short realistic picture prompt:
A high quality professional RAW photograph captured with Canon EOS 5D Mark IV paired with the Canon EF 24-70mm f/2.8L II USM lens.
Please give this a thumbs up if you find it useful, to help more people find it!
This is what we call "forcing it", for sure. "Candid amateur photo of..." is a great way to start.
Sample with DDIM for HALF the generation time. It will be lower quality.
Sample with DPM++ SDE 2S for ACCURATE generations; it'll give similar outputs.
res_x, lawson_x, etc. all take longer and give less quality than DPM++ SDE 2S. Heun is the only other one I'd consider, but those take even longer. For the higher-order samplers, imagine each step being run that many extra times; it's rarely worth it. res_4s is basically an easy way to run res_2s at 60 steps instead of 15.
Hyper_Chroma_low_steps (1.0) + lenovo (0.3) + goontube (0.3) = barely even needing to specify "candid amateur photo of". And honestly, telling the model you want it to look real matters more than forcing the realism. It'll be real if you describe real things.
This negative... is probably worth money: "This low quality greyscale unfinished sketch is inaccurate and flawed. The image is very blurred and lacks detail with excessive chromatic aberrations and artifacts. There is an overall blur on the photo. The image is overly saturated with excessive bloom. It has a toony aesthetic with bold outlines and flat colors. The people have generic and stereotypical faces. The people are too pretty, like a model or idol. The people are too old or too young. worst quality, low quality, JPEG compression residue, incomplete, extra fingers, poorly drawn hands, deformed fingers, malformed fingernails, malformed limbs, fused fingers, deformed feet, bad feet, missing toes, extra toes, missing fingers, three legs. poorly drawn faces. deformed. disfigured. bad neck anatomy, walking backwards. There's a watermark and a signature. This image contains styles such as 3D or CGI rendering, AI generated, Anime, cartoon, manga, cel-shaded, stylized illustration, painterly, sketch, illustration styles, non-photorealistic, CGI style."
I HAVE TRIED ALMOST ALL OF THEM.
@llhappyll822 Sample with DPM++ SDE, but which scheduler? Also, do I need to download something else? I don't have DPM++ SDE 2S, only the version without the 2S.
Another simple suggestion for better images: LoRA.
https://civitai.com/models/1908534 professional photos
https://civitai.com/models/2014953 general enhancer
I've found a big problem with the Nunchaku Chroma workflow: it works fine at first, but the next time I open it, everything is a mess. I didn't change anything, and it all breaks. No idea why.
Idk, I had to run uv pip install comfyui-frontend-package==1.28.4
[edit: There may be problems in certain versions; if a workflow doesn't open, downgrade or upgrade]
@2P2 I found that if I first open it with CFG 4.5 and 32 steps, run it a few times, and then make adjustments, it keeps working normally and doesn't break.
It looks like it can't generate a foreskin-covered penis well.
It is indeed the biggest challenge humanity and AI has encountered so far.
but you can train images on that yourself, right? lol
Many such cases!
That said, it probably does better than most "base" models on this site. Be the change you want to see and cook us up a LoRA :)
@siddoney01 Yes, and I have seen them. People have created private LoRAs (or at least older Flux LoRAs that I've seen applied to Chroma) that can make perfect uncircumcised penises. If you have one, you could simply take the photos yourself and create one. Might be a little narcissistic, or it might be your thing lol. I suggest using your own, as you might be able to get all states and angles, then supplement with photos from online.
"unretracted foreskin" works well. Why can't you gen it with Chroma?
~~I have to use a foreskin lora for Illustrious XL though~~
@ryuty77292
it doesn't work for me :(
SD WEBUI Neo is bad
@2P2 I was told that it is the most updated version etc. of SD. What do you recommend?
@ApexThunder_Ai This isn't SD, it's based on Flux. Have a look at comfyui and use the Chroma template.
Sorry if this might be a dumb question, but do I have any chance of running this on my 2060 with 6GB? I got FLUX to work with GGUF models, but I assume there's no way here, or is there?
GGUFs should be available for this too.
It could be possible to use the full checkpoint or a larger quant like Q8 if you use comfyui-multigpu's DisTorch v2 loader and have some free system RAM. There are also more GGUF variants on HF.
@Rad2 Thanks! I had a look for GGUFs on hf, and found them.
If you can run Flux, you can run this; this model is smaller than Flux. It's basically Flux Schnell with some stuff pulled out and then trained for 50 iterations.
You might wanna check out https://huggingface.co/spooknik/CenKreChro-SVDQ . It's a Chroma x Krea merge quantized to a Nunchaku version; it should be way faster.
If you generate images on CivitAI: I just tried it myself, and yeah, it didn't work for me either.
So, did you choose the Flux VAE in the generation menu? It seems like I can't select it either, because there's a filter that only shows models compatible with Chroma (see the screenshot: https://imgur.com/l7LzltE ).
I've run into this problem before when generating images with Chroma and trying to use Flux LoRAs: I couldn't select them because the filter only displays models related to Chroma.
And by the way, that claim that "Chroma is unstable" is bullshit. Locally, I generate many images every day without any problems.
I'm also having trouble with this. I need to submit a lot of attempts to get one that actually outputs an image. This happens even when selecting Chroma and nothing else (no VAE or LoRAs).
And it's not only image generation; training LoRAs for Chroma on the website is messed up too. The training process halts at the beginning, and whether it goes through or cancels itself seems to be a coin flip.
@GutterMind2020267 Damn, that sucks. I bet it's an issue on CivitAI's side; I don't have any trouble generating images locally. It's probably a bug or something wrong with their Chroma setup.
@Jorot Everything is wrong with them. And yes, I've been getting various good images with Chroma outside the CivitAI generator too; it works as intended, and I can treat it as a better version of SDXL.
Chroma photorealism slider
https://civitai.com/models/1995853?modelVersionId=2259125
Has anyone had any success with an inpainting workflow for Chroma?
@2P2 Yes, thank you, but does it work as well as it does with SD/SDXL? I've basically tried that with Chroma (and Flux) before and got extremely mixed results: sometimes it works fine, and other times it spits out complete garbage.
@awoo (Inpainting is basically img2img.) img2img works as intended.
@2P2 Do all the standard inpainting nodes still work properly?
I do have decent success with img2img (especially latent upscale), but when I tried inpainting with Flux before, I ran into heaps of issues. I was wondering if someone has already made an Inpaint Model Conditioning node tweaked for either Flux or Chroma? Or maybe I just need to rearrange my inpainting workflows?
@awoo I don't think what you're asking for exists, but it probably wouldn't hurt to scour this question and others like it on the forums + Discord: https://huggingface.co/lodestones/Chroma/discussions/116?not-for-all-audiences=true (the answer references the answer given by 2P2)
@llhappyll822 And I replied using my HF sock puppet. Seriously, I have been using a detailer and a tiled upscale refiner; they work well for me on Chroma1 HD.
@2P2 lmfao small world
@llhappyll822 countless alt accounts
@llhappyll822 thanks I'll give that a try
For the life of me, I cannot get this model to do a blowjob that is more than just half the tip in the mouth, or a tongue sticking out.
Are you describing the dick at all? In my experience, lots of models struggle with things like BJs if you actively describe the dick in the prompt; the weights of the dick-related tags overpower the "oral"/"blowjob"/"fellatio" tags, and the model tries its hardest to render a "complete" dick, which results in this.
Possibly even putting things like "penis" or "glans" in the negative could help as well.
Basically, think of it as "only tag what you see": during a BJ you don't really see that much of the cock, so cock-related tags should be absent.
@awoo I gave this a shot and it just does not want to behave... so odd
@TheNecr0mancer Can you share some of your outputs with the embedded workflow? (Assuming you're using ComfyUI and not the CivitAI service, which I honestly think is not that good lol.)
@awoo Yeah, I'm using Comfy. Can you give me a full example positive prompt that seems to work for you, so I can test? Thanks.
Where did the DC 2K, scaled_hybrid_rev2, and other versions go?
Deleted?
https://civitai.com/models/1956921
https://civitai.com/models/1964020/chroma-dc-2k-lora
Why? I don't know. But you can find the model here: https://huggingface.co/silveroxides/Chroma-Misc-Models/tree/main
That's a bummer. I was wondering the same. I wanted to tag them in the new image I posted but couldn't find them either. I wonder why they're gone now.
Are there any nodes that help with prompt adherence while keeping CFG at 1?
I use the Normalized Attention Guidance (NAG) node from this package in ComfyUI. It works well for me with a scale of 4. https://github.com/pamparamm/sd-perturbed-attention
@2P2 @Beezer79 I didn't realize NAG worked; I thought I saw people saying it didn't. Where in the workflow does it go exactly? Before which node, after which node?
Thanks
@TheNecr0mancer
Repo (1): use KSamplerWithNAG instead of KSampler; you can just connect your negative prompt to nag_negative.
Repo (2): the Normalized Attention Guidance node goes into the KSampler.
@2P2 I would like to be able to just directly connect the NAG node, but I don't understand where to put it.
@TheNecr0mancer With my custom node tip: from the model or LoRA node, and from your negative prompt, into the Normalized Attention Guidance node, and then into the KSampler; or, as in my personal Chroma NAG workflow, into the CFGGuider (with CFG 1) and Basic Scheduler, and then into the SamplerCustomAdvanced.
@Beezer79 do you have workflows with both examples that I could dissect?
@TheNecr0mancer :facepalm:
@Beezer79 I wish NegPIP could work with Chroma1
FWIW, doesn't the Flash version use no negative prompt out of the box?
@awoo More like: if CFG=1, the negative prompt doesn't work at all; you can use NAG instead of it with any CFG value.
@2P2 I've figured out how to add the NAG node now. Is it normal that it makes generation slower, as if I were using CFG higher than 1?
Also do you guys have any recommended settings for the nag node?
I currently have cfg 1
Nag scale 2.5
Nag tau 2.5
Nag alpha 0.25
Nag sigma end 0.75
@TheNecr0mancer I have only changed the NAG scale to 4 and left all the other values at their presets.
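For intuition on what those knobs do, here is a rough PyTorch sketch of the NAG attention update, based on my reading of the Normalized Attention Guidance paper; the ComfyUI node's internals may differ, so treat the exact formula as an assumption.

```python
import torch

def nag_update(z_pos, z_neg, scale=4.0, tau=2.5, alpha=0.25):
    # Extrapolate the attention output away from the negative branch
    # (this is what nag_scale controls).
    z_g = z_pos + scale * (z_pos - z_neg)
    # tau clamps the per-token L1-norm ratio so the guidance can't blow up.
    ratio = z_g.abs().sum(-1, keepdim=True) / z_pos.abs().sum(-1, keepdim=True).clamp(min=1e-6)
    z_g = z_g * torch.clamp(tau / ratio, max=1.0)
    # alpha blends the guided output back toward the positive branch.
    return alpha * z_g + (1.0 - alpha) * z_pos
```

As I understand it, nag_sigma_end just stops applying this update once the sampler's sigma drops below that value, i.e. NAG only acts on the earlier, noisier steps.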
@Beezer79 [resolved]
@TheNecr0mancer "makes generation speed slower as if i was using CFG higher than 1?"? Rephrase and answer: If CFG=1, it doesn't use negative prompt and it is faster to generate. Must be explicitly 1, not lower nor higher, you'll get a normal generation speed then
@2P2 cfg is 1. But generates slower as if cfg is higher than one.
Also NAG node is causing OOM. But if I use a workflow without NAG node and set cfg to 2.5 no OOM
@TheNecr0mancer
1) OOM due to the node? ¯\_(ツ)_/¯
2) Don't use ChenDarYen/ComfyUI-NAG. pamparamm/sd-perturbed-attention has a better NAG node
@2P2 @Beezer79 The pamparamm NAG node has absolutely no effect on my image results. The page says it's only designed for SD 1.5 and SDXL.
@2P2 I think something may be wrong with comfyui 0.3.66
Does anybody know where to train a LoRA with this model? Is it just like Flux?
Right here: https://civit.ai/models/train (you need at least 1k Buzz)
Something else:
https://github.com/ostris/ai-toolkit
https://github.com/tdrussell/diffusion-pipe
https://github.com/Nerogar/OneTrainer
@2P2 ok cool thanks bro
It's probably better than Flux. Most of the LoRAs I've tested feel over-baked. Use JoyCaption or similar to caption your photos with real language, and it'll likely know exactly what you're asking for.
@llhappyll822 I've tried installing JoyCaption many times, and it never installs..... By the way, Grok can also caption images, but it doesn't feel as good as JoyCaption.
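If you want to batch-caption a dataset the way described above, here is a hedged sketch that writes the sidecar .txt captions the trainers listed above (ai-toolkit, diffusion-pipe, OneTrainer) read. The model id is a placeholder for whichever JoyCaption checkpoint (or any other image-to-text model) you actually use.

```python
from pathlib import Path
from transformers import pipeline

# "YOUR/CAPTION-MODEL" is a placeholder, not a real repo id.
captioner = pipeline("image-to-text", model="YOUR/CAPTION-MODEL")

for img_path in Path("dataset").glob("*.png"):
    caption = captioner(str(img_path))[0]["generated_text"]
    # Trainers pair image.png with image.txt by filename stem.
    img_path.with_suffix(".txt").write_text(caption)
```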
Does anyone know of any "correct hands" LoRAs that work with Chroma? I'm getting really tired of fighting with prompt text to get 4 fingers and 1 thumb on each hand.
Side note: please don't recommend workflows for ComfyUI. I don't (and won't) use Comfy.
FP8 and low-step LoRAs have a tendency to do that. The only way to get consistent anatomy is going BF16 with 40-50 steps and no fast LoRA. Also, generating at 2-megapixel resolutions helps with details. Going for the slowest possible settings is the only way to achieve consistency.
@piconejo Cool, then it's 1 hour per pic for me.
Yeah, you're right, bad hands are my main complaint with Chroma too. I guess it's mainly a Flux-related issue. It would be really nice to have some "Chroma Hands" LoRA.
Although in my experiments, simpler prompts tend to give more consistent anatomy, and some LoRAs can help fix it (like Chroma – Professional Photos).
And I guess @piconejo is right about the BF16 version with higher step counts.
@Jorot I've tried Chroma - Professional Photos (which is awesome in its own right) at low strength -- around 0.25 to 0.4 -- to help improve hands, and in some instances it does work, but its influence over generation can still be too strong for non-photorealistic images.
Believe me, I wish I had the system resources to use the full BF16 model, but FP8 is pretty much all my hardware can handle. And it's not like the FP8 model or other quants can't produce great results. It's just them hands, man... them nasty ol' AI-generated hands, the butt of so many jokes and memes.
I rarely get issues with hands, but it may be a multi-faceted approach. I use a heavily quantized version of Chroma v48.
The biggest thing for me is using the correct sampler. You should be able to select one: DPM++ SDE 2S at 16 steps, with a Hyper_Low_step LoRA (like a distill). Those two things.
Yes, it will double generation time instead of using DDIM, for example. But it will almost never give me incorrect anatomy.
You could also force it with negatives, which I almost always do as well. "incomplete, extra fingers, poorly drawn hands, deformed fingers, malformed fingernails, malformed limbs, fused fingers, deformed feet, bad feet, missing toes, extra toes, missing fingers, three legs. poorly drawn faces. deformed. disfigured." should be enough, PLUS all your other negatives! lol, that's just my limb section.
@MrSnichovitch @Jorot That's an issue with the sampler, scheduler, or prompt; I've tried lots of them. If you have grammar or logic issues in your natural-language prompt, it will wreck the result with bad fingers or body horror. So I've started using AI to check for grammar and other textual mistakes.
Regarding the scheduler: it should be either beta or beta57. For the sampler, my top results were with dpmpp_2m, res_multistep, and deis_2m. I'm obviously using Comfy; not sure if those are available on CivitAI or elsewhere.
@hildezart726 Thanks for the advice! I’ll give it a try.
@hildezart726 I can confirm that. euler + beta and deis_2m + beta57 work best for me at the moment.
I'm using the 8-step LoRA, and limbs and such seem fine to me; nothing deformed.
@lilililili123 No problems here either.
Can anybody train a body slider LoRA? It generates a fat body a lot of the time... Thanks. BTW, I think v35 and v42 are better for me.
Try adjusting your prompt; not everything should be solved by a LoRA. Mention the body type and that's it.
@hildezart726 I use "slender" and "slim". Any suggestions?
@yazi6393wy136 "slender" works perfectly fine for me, also in combination with "toned".
Don't use tags; natural language is better.
@piconejo Can you give some examples?
@yazi6393wy136 Natural language example: Extreme close-up photo of a tiger eye. Large title "Chroma1" overlaid in the center of the image.
@yazi6393wy136 A young woman with a fit body and narrow waist, petite. (for example)
Civitai's best model. Best model in the whole AI community.
Who would've thought not being a narcissist would make better models (looking at you, Pony v7).
Can you tell me what settings to use and how to word this prompt then, please?
realistic, 4k, professional photograph, award winning layout, a woman wearing a long robe with the word buzz as a repeated pattern on the beach, looking at viewer, a giant lobster emerging from the ocean, behind the lobster, is godzilla eating a hero sandwich
Because it can't do that at all. What should the settings be, etc.?
@justafish Anyway, NewbieAI will release their model in 2 months; they have been training on a better architecture. Maybe it won't be like Pony V7 xD
@mystifying Haha, what is this prompt? My attempt using the FP8 version: https://civitai.com/images/109473284
You can give your prompt to an AI and tell it to be descriptive with natural language (for the T5 text encoder, not CLIP).
@Kaalciv Thank you!!!!! So the solution was details: when you want bizarre or complex concepts from Chroma, give a lot of details. This is very helpful (◕‿◕✿)
@Kaalciv Looks like shit. "Your prompt" is glorified crap.
@navidisileli998
How do I avoid generating huge... hoses on girls? With a fairly simple prompt, I get a Thailand moment with the girls every other time. Has anyone else encountered something like this?
Use the latest Chroma-HD
Nice story, dude
It depends largely on how you prompt and how much weight you put on the tokens.
As I have personally noticed, if you prompt with booru tags, futanari often ends up everywhere, but if you prompt in natural language, the results are much better. Also, try not to play around with the weights on those tokens. And here's some advice that isn't the most convenient but often helps: explicitly designate female genitals, whether through a description of nudity or just "camel toe".
Half of dataset is gay content. Worst slop ever.
@tannoralarick779 No
what the fucking fuck fuckster fucking fuck of the fucks in the fucker fuck fuck is this fuck
@tannoralarick779 To be fair, it's not. If you look at the images people are posting, it probably has the most variety of themes and representation across genders and genres. Probably one of the most versatile models for any taste.
@blhll Luckily, the Lustify creator promised a finetune.
@andryamron823 He needs to finish LUSTIFY v8 (SDXL) first.
@tannoralarick779 Half of humanity is male.
@llhappyll822 95% of Chroma users are male. And not gay, though I have my doubts about that now.
@remsenharman138 I'd rather the model be able to generate a penis, which this model does better than most. Sure, that means more users who specifically like penises will like it, but for me it's like the "finger" problem; Chroma just struggles with hands instead of penises lol.
@llhappyll822 The penis is realistic on every model now, be it SD 1.5 or realistic Pony or Illustrious. What about emotions (subtle ones, not the basics)? Can it do a naughty face, for example, without 100 reinforcing tags?
man it gets better omg
watching those radiance updates with a slack jaw
@llhappyll822 Pixel space is our future
Where do I get the VAE and the CLIP for the GGUF models?
TE only; don't download "Full", you only need its encoder: https://huggingface.co/easygoing0114/flan-t5-xxl-fused/tree/main
https://huggingface.co/wangkanai/flux-dev-fp16/blob/main/vae/flux/flux-vae-bf16.safetensors
Can we get Chroma1-Flash?
https://huggingface.co/lodestones/Chroma1-Flash/tree/main
Guys, how do you train LoRAs for Chroma HD? My training workflow using the Flux trainer nodes won't work with it.
On civitai or with AI Toolkit
Wow, Chroma really came through. I'm blown away by how amazing each new release is. I just know that when you consider it a finished product, it's gonna be crazy awesome.
I have a question: why is it that every time I mention "breast", the output is torn clothes or an exposed nipple? I'm going nuts.
Use NSFW negative prompts to prevent nudity.
I often just keep "nipples" in my negative to prevent this exact problem. Also I find describing the clothes and specifying "cleavage" can help.
Best realistic model so far
Man, do I LOVE Chroma!!! BUT... how do I post images created with Chroma to CivitAI and have CivitAI read the metadata? This is very frustrating; I have to add all my metadata manually, which is very time-consuming.
Sorry, that doesn't work. It saves the file, but when I go to upload an image to CivitAI, it says, "We weren't able to detect any resources used in the creation of this image. You can add them manually using the + Resource button", and all of the tools, techniques, and both positive and negative prompts are empty.
This one should work: https://github.com/xxmjskxx/ComfyUI_SaveImageWithMetaDataUniversal ... Believe me, my fight with Comfy and its metadata handling is legendary by now, to the point where I made my own custom nodes and scripts, just for the Comfy team to break things again with Node 2.0.
@TijuanaSlumlord That works perfectly! Thank you so much!
If you're using Chroma locally in ComfyUI, you don't need any tool for metadata. The metadata is automatically saved in the PNG files and is restored when you upload the PNGs to a post.
This model is truly amazing. I hope it can improve at generating realistic Asian boys and anatomy in the future. It has the potential to be the perfect combination of Flux and Pony.
Just as much of a stillborn piece of crap as Pony v7.
Get lost, you life-embittered loser :)
@TurboCoomer lol :) I liked all his comments roasting Pony 7, but Chroma is not the same case.
@TurboCoomer Better to be wronged by life than wronged in the head like you. Heard of Z-Image? That's the kind of thing Chroma should have become.
I don't know how Lodestone managed to mangle the underlying Flux Schnell so badly that it completely lost its anatomy and now eats as much as three Flux Devs running at once; even a 5090 strains for almost a minute and a half on Chroma. On top of that, 1.0 loses to the pre-release iteration 48 in terms of anatomy. Lodestone responded to measured criticism and requests to polish Chroma with excuses. If at first there was faith that he wouldn't leave it like this, it's now clear he's the same windbag and charlatan as Astralite.
If Chroma, for all its contentious anatomy, were as light and fast as Schnell, it could be used and developed. But with such insane resource hunger, it's rest in piss.
@Chertilo
I feel like Chroma1-HD is the best model for furry art, not counting proprietary ones. The frustrating flaw of all the Qwen image models is that they fall flat at drawing anyone recognisable (e.g. Flareon).
Although it is indeed slow and picky with prompts, when it delivers, it DELIVERS, like 10/10.
Thanks for suggesting using Chroma v48, maybe it will do better.
Can you share an example of the objective critique you wrote to Lodestone about this model?
@Chertilo Paradoxically, I tried downloading Z-Image from this site; overall the images are fine, but they completely miss the prompt I give.
It feels like the model can't do NSFW or anatomical variety at all; everyone comes out looking identical, no matter what.
That's why I got curious: which Z-Image model are you talking about?
@IamKinky I left his (Lodestone's) channel long ago and can't find that post of mine (and others') now, and rewriting a sizeable text here listing all of Chroma's failures is, I hope you'll understand, too tedious. Among Chroma's biggest problems I'd cite the lack of distillation, oversharpening, and terrible reproduction of anime style. And it shouldn't take one and a half to two minutes to generate a single image on a 5090; that's simply unacceptable and unusable.
@Undar The highest prompt adherence belongs only to the monstrous Flux 2 and NanoBanana, which require hundreds of gigs of VRAM and RAM. In its own niche, Z-Image has no equal. As for the naughty stuff, it's coming very soon: after the base model release, the NSFW fine-tunes will pour out like crazy. I'm waiting impatiently myself, rubbing my furry coomer palms.
Is this model text-to-image or image-to-image?
Text to image
img2img also works fine; I already tested it. Denoising around 0.5-0.7 is perfect.
Wow, an awesome model.
It works with ComfyUI, and I can also use Flux LoRAs...
Amazing results with SRPO.
You can also use Flux ControlNets, which is also a plus.
Chroma Z Image https://huggingface.co/lodestones/Zeta-Chroma
WTF, I was torn between Chroma and Z-Image; both are amazing, but each lacks a few things. I will test this.
@snap2887 There's also a "Chroma?" Lumina2, but Lodestone stopped training it a year ago.
Well, it doesn't work with either the Chroma or Z-Image workflows...
@Monastyr Generally, authors need to literally merge code into ComfyUI to get their models to run. If the model works, the Comfy people will try; they have great Radiance-x0 workflows, so they keep up. Also, it's brand new and will probably not show you anything at all lol.
Not really ready for prime time yet. Training only started not long ago.
Truly magnificent work, a project worthy of applause.
yes
This is my go-to model for 99% of creations. I've tried Z-Image, Qwen-Image, and all the others; this is the best. The only downside is that it's very slow for me. I use Z-Image for prototyping (10x faster), then use Chroma once the prompt is worked out. Great work.
How do you benefit from uncensored Chroma then? Isn't Z better in your case?
Same situation. This handles male/female anatomy the most correct way, but it's slow even on my 5070 Ti with 16GB... ZIT sucks at private body parts.
I never got anything good out of this model. With or without VAE, every possible sampler. Eventually just deleted it.
There are 2 models in the workflow, and neither of them is the one available to download here.
Why publish something if you are not providing all the resources?
I don't know exactly which one you're looking at, but the links provided in the description should get you to those models.
You can try this link for quantized versions: https://huggingface.co/collections/silveroxides/chroma
If you use ComfyUI, I recommend Chroma1-HD-fp8mixed-final.safetensors
https://huggingface.co/silveroxides/Chroma1-HD-fp8-scaled/tree/main
And to speed things up and decrease the steps needed for generation, one of these as well.
https://civitai.com/models/2032955/chroma-flash-heun
Is this trained on anime or mostly realistic images?
Hi! In my experience, it’s about 50/50, but it’s easier to get good anime/stylized/cartoonish results out of the box. That said, there are fine-tunes dedicated to realism (like UnCanny Chroma, GonzaLoma Chroma) and others for anime/stylized results (like Chroma-Anime-AIO, and more). There are several options, but these are the ones I’ve personally tried so far.
Having issues with ForgeUI; it keeps asking me for text encoder 2.
If you are running Forge Neo, then Chroma runs in the UI under the FLUX UI preset in the top-left corner.
Chroma needs the VAE (download ae.safetensors) and a text encoder (download t5xxl_fp8_e4m3fn.safetensors). Put the ae.safetensors VAE in "Forge\models\VAE" and the t5xxl_fp8_e4m3fn.safetensors file in "Forge\models\Text_encoder".
If you are using forge neo then these two pages have useful info. https://github.com/Haoming02/sd-webui-forge-classic/wiki/Inference-References
and https://github.com/Haoming02/sd-webui-forge-classic/wiki/Download-Models .
This is probably the most artistic and aesthetically pleasing model I have seen recently. It is simply a masterpiece.
Absolute game-changer. Everything other models boast about is true here: incredible prompt adherence, rather fast (depending on the model), and you often need neither LoRAs nor negatives to get exactly what you want. Your characters don't look like they've been in an industrial accident, either.
Just be descriptive and clear. The only LoRA I use a lot is Lenovo.
Any chance some settings changed so NSFW is no longer allowed? I trained a new LoRA but am unable to use or test it, because I keep getting a warning saying I can't use my LoRA with an SFW model :( (perhaps this is a CivitAI hiccup?)
Most fun model to use, and it learns LoRAs incredibly fast.
I built my 5070 Ti 16GB so I could generate images locally...
Turns out the generation speed is slower than Perchance online...
Did I set it up wrong?
Impossible to answer based on just that.
Online is almost always going to be faster, but most of us are on Linux because it's 3-5x faster; then there are micro-improvements, flags, settings, etc., which I won't get into.
You can't just say "Perchance"...
Any good workflows with upscale/seedVR?
Are you going to make another Chroma model based off the new Flux.2 model?
They tried and failed, regardless of what they'll tell you. The creator has more or less stopped traditionally training models, given how lucky they got with Flux 1, and has more fun mutilating the process in hopes of improving it. That generally means all the results since Chroma 1 have been... useless?
@makiaeveli Indeed. Nothing but out-of-focus, blurry images. Kinda sucks that something like this went tits up and is now pushing up daisies.
@makiaeveli What is this FUD? They are actively working on more than one model. All you need to do is go to their Hugging Face page to see that.
It looks like this model fails with the Hyper-8 LoRA, resulting in endlessly repeating tiny blocks. Just curious: is that confirmed, and is there a technical reason for it? Can it be used with Nunchaku?
you could probably ask on their discord
I've tried a lot of new models, including ZIT and Flux 2, and couldn't recreate it; it would be a great shame if a new version based on Flux 2 couldn't come out.
The Kaleidoscope version seems abandoned; I am similarly sad. Klein is so good, and the current Zeta (ZIT-based) project looks like it may take months more. The loss is still decreasing, and you can try it every day on Lodestone's Hugging Face, but right now it's a lot more like crayon drawings on acid than a modern photo model.
