HiDream-O1-Image (codename Peanut) is an 8B text-to-image foundation model from HiDream.ai, built on a Pixel-level Unified Transformer (UiT) that operates end to end on raw pixels with no external VAE or separate text encoder. The same checkpoint handles text-to-image, instruction-based editing, and multi-reference subject personalization natively at up to 2,048 x 2,048.
Originally released by HiDream.ai on Hugging Face. All credit for the model goes to the HiDream.ai team. Civitai is hosting a mirror so creators can run it on-site - head to the original repo for weights, updates, the technical report, and to follow the project directly.
Built by
- HiDream.ai - upstream organization and authors of the technical report.
Versions mirrored on Civitai
Two checkpoints are mirrored, both as fp8 SafeTensors:
- Standard - full 50-step model. Best quality. Guidance scale 5.0.
- Dev - distilled 28-step model. Faster, guidance scale 0.
HiDream also publishes a 200B+ Pro variant upstream, but weights are not public, so it is not mirrored here.
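For orientation, here is a minimal sketch of what running either checkpoint could look like, assuming the model loads through diffusers' generic DiffusionPipeline entry point; that integration and the repo id are assumptions, so check the upstream repo for the supported loader.

```python
# Sketch only: assumes HiDream-O1-Image loads via diffusers' generic
# DiffusionPipeline entry point. Verify against the upstream repo.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "HiDream-ai/HiDream-O1-Image",  # upstream Hugging Face repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

# Standard checkpoint: 50 steps, guidance 5.0 (per the list above).
image = pipe(
    prompt="a lighthouse at dusk, long-exposure photograph",
    num_inference_steps=50,
    guidance_scale=5.0,
    width=2048,
    height=2048,  # native 2K synthesis, no upscaler
).images[0]
image.save("lighthouse.png")
```

For the Dev checkpoint, swap in num_inference_steps=28 and guidance_scale=0.0.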
One model, three tasks
The same checkpoint handles text-to-image, instruction-based editing with a single reference image, and multi-reference subject-driven personalization with up to ten reference images. Mode is selected by what you pass at inference - no separate adapters or LoRAs needed.
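As a rough illustration of that input-driven mode switching, the sketch below reuses the pipe object from the snippet above; the image and reference_images keyword names are guesses for illustration, not a confirmed API.

```python
# Illustrative only: kwarg names ("image", "reference_images") are
# assumptions about how the unified checkpoint exposes its three modes.
# `pipe` is the DiffusionPipeline loaded in the earlier sketch.
from PIL import Image

# 1) Plain text-to-image: pass a prompt, nothing else.
out = pipe(prompt="a ceramic teapot on a wooden table")

# 2) Instruction-based editing: one source image plus an edit instruction.
src = Image.open("teapot.png")
out = pipe(prompt="make the teapot cobalt blue", image=src)

# 3) Multi-reference personalization: up to ten subject references.
refs = [Image.open(f"subject_{i}.png") for i in range(3)]
out = pipe(prompt="the subject hiking in the Alps", reference_images=refs)
```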
Native 2K and multilingual text
Direct synthesis up to 2,048 x 2,048 without upscaling. Strong long-text rendering in both English and Chinese (LongText-Bench 0.979 EN / 0.978 ZH), 0.90 on GenEval for compositional prompts, and 89.83 on DPG-Bench for dense prompt alignment.
Reasoning-driven prompt agent (upstream only)
The HiDream repo ships a separate "thinking" prompt agent (Gemma-4-31B or an OpenAI-compatible API) that rewrites raw instructions into self-contained prompts before generation. That agent is not part of the Civitai mirror - if you want it, run upstream locally.
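For anyone wiring that up themselves, the OpenAI-compatible path would look something like this sketch; the endpoint, model name, and system prompt here are placeholders, not the repo's actual values.

```python
# Sketch of the prompt-rewrite step against an OpenAI-compatible server.
# base_url, model name, and the system prompt are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def rewrite_prompt(raw_instruction: str) -> str:
    """Expand a terse instruction into a self-contained generation prompt."""
    resp = client.chat.completions.create(
        model="local-llm",  # whatever model the server exposes
        messages=[
            {"role": "system",
             "content": "Rewrite the user's instruction as a detailed, "
                        "self-contained image-generation prompt."},
            {"role": "user", "content": raw_instruction},
        ],
    )
    return resp.choices[0].message.content

print(rewrite_prompt("make it look like winter"))
```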
Links
- Hugging Face: HiDream-ai/HiDream-O1-Image
- GitHub: HiDream-ai/HiDream-O1-Image
- Technical report: PDF
- License: MIT
Comments (7)
If it's an Open Source product I'll definitely try it.
Cool, but I'll let you in on a secret as to why it might not be successful. This one, and other new models too. I think I'll be posting this with every new model.
NO INFORMATION ON HOW TO USE IT!!
What does it work with? Forge, Nano, A1111? Just ComfyUI? If it's Comfy only, please provide full instructions on where to place which files, what nodes are required, and so on. A lot of people started out with A1111 and SD models, and switching to Comfy's spaghetti code is unacceptable to them, which is why they still use SDXL or Illustrious: you just download those models from Civitai and they work.
If you want your model to be successful, prepare a simple tutorial on YouTube, etc., explaining what to do and how to do it.
Translated with DeepL.com (free version)
Wait, so there's no official prompting guide for this model? That's a shame.
It's simply not ready yet: https://huggingface.co/Comfy-Org/HiDream-O1-Image/tree/main
If you read the Hugging Face page, they specify that the model has no VAE, and I don't think a CLIP either, but it uses Gemma as a prompt enhancer. So... just the checkpoint in the diffusion-models and text-encoders directories. It wouldn't run anyway, but that may be to your point.
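If that reading is right, the placement being described would presumably be the standard ComfyUI folders; the filename below is a guess, and as noted, it likely won't load yet.

```
ComfyUI/models/
├── diffusion_models/
│   └── hidream-o1-image-fp8.safetensors   # the mirrored checkpoint (name is a guess)
└── text_encoders/                          # per the card, no separate text encoder ships
```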
Also, there is information provided. They ship an app.py that opens a Gradio server with a prompt helper. I mean, you're right that they don't provide detailed prompt examples, but there are a few.
idk someone posted a workflow, maybe it does work? https://civitai.red/models/2618821
You know, you could just learn ComfyUI ONCE, and not have any of these problems.
@brnfd24434343d I use Comfy when I have to; I just don't like this type of interface ;)
An AIO model? I wonder how you'd fine-tune it. But at least it stays within reach of the community without needing a RunPod instance.
