ERNIE‑Image - CivArchive (CivitAI Archive)

ERNIE‑Image - Image

Originally Posted: https://ernie.baidu.com/blog/posts/ernie-image

ERNIE-Image is an open text-to-image model from the ERNIE-Image team at Baidu. Built on a single-stream Diffusion Transformer (DiT) with 8B parameters in a latent diffusion (LDM) framework, it ships with a lightweight Prompt Enhancer that expands brief inputs into richer, more structured prompts to better unlock the model's capabilities. With only 8B DiT parameters, ERNIE-Image achieves state-of-the-art performance among open weights text-to-image models — and it is built not just for visual appeal, but for controllability: accurate content depiction matters as much as aesthetics. In practice, it excels at complex instruction following, precise text rendering, and structured image generation — areas where many existing open weights models still fall short.

Key Features

• Competitive performance at compact scale: With only 8B DiT parameters, ERNIE-Image remains competitive with substantially larger models and achieves leading performance among open-weight models on several challenging benchmarks.
• Precise text rendering: ERNIE-Image handles dense, long-form, and layout-sensitive text especially well, producing readable and faithful results in Chinese, English, and other languages.
• Robust instruction following: The model reliably handles complex prompts, multi-object relations, and knowledge-intensive descriptions, making it well suited for tasks that demand fine-grained control.
• Structured visual generation: ERNIE-Image is especially effective on images with clear layout or narrative structure, such as posters, manga/anime storyboards, multi-panel compositions, and cohesive multi-element visuals.
• Broad stylistic range: Beyond clean graphic design and illustration-style outputs, the model supports realistic photography and distinctive stylized aesthetics, including softer, more cinematic and film-like tones.
• Easy to deploy and adapt: Thanks to its compact size, ERNIE-Image runs on consumer-grade hardware (24 GB VRAM), bringing high-quality image generation within reach for research and production use. The moderate parameter count also makes fine-tuning and adaptation straightforward for researchers and developers.
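
As a rough sanity check on the 24 GB VRAM figure, the raw weight footprint can be estimated from the parameter count alone. The sketch below is a back-of-envelope calculation, not a measurement: it assumes exactly 8e9 parameters (the page only says "8B"), and the helper name `weight_footprint_gib` is made up for illustration. It counts DiT weights only and ignores the text encoder, VAE, Prompt Enhancer, and activations.

```python
def weight_footprint_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the raw weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: "8B DiT parameters" taken as exactly 8e9; the exact count is not given.
DIT_PARAMS = 8e9

# bf16 stores 2 bytes per parameter, fp8 stores 1 byte per parameter.
for name, width in [("bf16", 2), ("fp8", 1)]:
    print(f"{name}: ~{weight_footprint_gib(DIT_PARAMS, width):.1f} GiB")
```

Under these assumptions the bf16 weights come to a little under 15 GiB, which leaves headroom on a 24 GB card for the other components; an 8-bit variant would need roughly half of that.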

Description

https://huggingface.co/Comfy-Org/ERNIE-Image/tree/main/diffusion_models

FAQ

Comments (25)

elevendr · Apr 17, 2026 · 1 reaction

CivitAI

So we got a new image model. I wonder how this compares to Flux 2 Klein 9B and Z Image Turbo?

Rhodynoliaeth · Apr 17, 2026 · 1 reaction

Very satisfactory. There’s no real censorship in place; it just needs some fine-tuning. There’s no significant degradation of image quality when using long prompts. The overall performance is quite good. But for now, the only models available by default are Asian models.

liutyi · Apr 17, 2026 · 2 reactions

https://wiki.liutyi.info/display/AI/ERNIE+Image+Turbo+test+v2 Vs https://wiki.liutyi.info/display/AI/Z+Image+Turbo+test+v2 vs https://wiki.liutyi.info/display/AI/FLUX.2+Klein+9B+test+v2

liutyi · Apr 17, 2026 · 5 reactions

OK in general. Turbo has some diagonal artifacts, visible on night lights. It might have limb issues a bit more often than Klein and much more often than ZIT. OK with complex prompts, but it may render an Asian person instead of an explicitly mentioned other race. It has a built-in prompt enhancer (PE), which helps with short prompts and not that much with long ones. The PE translates prompts to Chinese, BTW, but despite that it increases the diversity of faces/races. It may generate a naked woman, but it may also ignore that part of the prompt. Text to render is better put in quotes (“”); it may render without them, but at much lower quality.

liutyi · Apr 26, 2026

Another fast visual comparison using 20 images (test created by Gemini)

- https://wiki.liutyi.info/display/AI/ERNIE+Image+test+2.20.gemini
- https://wiki.liutyi.info/display/AI/ERNIE+Image+Turbo+test+2.20.gemini
- https://wiki.liutyi.info/display/AI/FLUX.2+Klein+9B+test+2.20.gemini
- https://wiki.liutyi.info/display/AI/FLUX.2+Klein+base+9B+test+v2.20.gemini
- https://wiki.liutyi.info/display/AI/Z+Image+Turbo+test+v2.20.gemini
Just to see how one of the top models goes thru the test
- https://wiki.liutyi.info/display/AI/Nano+Banana+2+test+v2.20.gemini

liutyi · Apr 17, 2026 · 10 reactions

CivitAI

The model has a built-in prompt enhancer (PE), so I ran a test with and without it: same 40 prompts, same seed, PE ON and PE OFF. The same will be done for the Turbo version. ERNIE Image Turbo test v2 without PE is now available, and test v1 is ready for Turbo with both PE ON and PE OFF. There is a demo on HF for testing the Turbo model.

Khorai · Apr 17, 2026 · 1 reaction

CivitAI

I can't wait for the model to be quantized so it can run smoothly on my 5080. The images I have seen look promising and I will be keeping an eye out.

mrmrswiggly612 · Apr 17, 2026

It is quantized already on Hugging Face. Full range of GGUF on unsloth.

Khorai · Apr 17, 2026

@mrmrswiggly612 Thx for letting me know :D I'll check it out!

ferretduck · Apr 17, 2026 · 2 reactions

https://civitai.red/models/2546115/ernie-ernie-turbo-gguf as well

Khorai · Apr 17, 2026

@ferretduck thx! don't know why it didn't show up when i searched civit.

svvabd323 · Apr 18, 2026 · 20 reactions

CivitAI

Asian faces predominate.

emailhackedbypro969 · Apr 18, 2026 · 4 reactions

Guess Baidu isn't a Western company.

TheP3NGU1N · Apr 23, 2026 · 3 reactions

No worse than ZiT. Just add "caucasian" to your prompt and :gasp: it stops happening.

haidensd58757 · Apr 18, 2026 · 3 reactions

CivitAI

I just did a test on HF. Result? zImage and Klein 9b are better in terms of realism, but Ernie is slightly better at prompt following: in one shot it follows your prompt accurately, while in zImage you have to try 3 times to get it right.

zzkszzks603 · Apr 18, 2026 · 28 reactions

CivitAI

Impressions so far

Pros:

Blazing fast. It’s twice as fast as zimage turbo (tested on an RTX 5080).

Perfect rendering of feet and various types of socks.

Accurate prompt comprehension with solid adherence.

Once the prompt is locked in, it enters a stable "gacha" (rolling) rhythm, consistently producing high-quality results.

Cons:

Characters are unattractive and lack variety. Additionally, celebrity name keywords are ineffective.

The probability of anatomical errors (like three legs) is relatively high.

amazingbeauty · Apr 23, 2026

Is the step time lower than ZIT?

EricRollei21 · Apr 18, 2026 · 7 reactions

CivitAI

Seems to be less censored than other models, great with text, fairly fast (but turbo model not as good as base) but also very hard to get fine details on things and hands and feet not always good. It can't generate over about 1500x1500 without creating body horror.

zwelimbalo88 · Apr 20, 2026 · 9 reactions

CivitAI

It's also SUPER easy to train

jonog247634 · Apr 20, 2026

Are you using ai-toolkit to train? If so can you help with decent settings for it plz. I tried a test and couldn't get any Loras to work with default workflow

Viennar · Apr 20, 2026 · 9 reactions

CivitAI

Ernie Turbo runs 30-50% slower than Z turbo. It is impossible to generate at high resolutions.

TheEarthIsFlat · Apr 20, 2026 · 1 reaction

CivitAI

On my Intel Core i9-12900KS, 5090 FE, and 64 GB of Corsair Dominator Platinum DDR5 RAM at 6800 MT/s, this model is super fast, but as far as NSFW, or even just sexy women in general, I still like Z-Image better.

bionovafood863 · Apr 29, 2026

Made me laugh. Z-Image is at the end of the tunnel in this regard. This applies to sex.

antonovfedir193 · May 9, 2026

CivitAI

Treat me like I'm a big dummy. How do I use this in Neo? I've downloaded the model, but I can't seem to figure out the additional required files or how to set up the ERNIE interface.

Starry_Eyes · May 10, 2026

CivitAI

I downloaded the FP8 version, but I'm pretty sure it's the prompt enhancer that keeps crashing Comfy. Are there any workflows that don't use it?

Checkpoint

Ernie

by CivitaiOfficial


base model

Details

Downloads: 1,647
Platform: CivitAI
Platform Status: Available
Created: 4/16/2026
Updated: 7/17/2026

Deleted

Files

ernieImage_image.safetensors

Size: 14.96 GB

SHA256: 94a35abaa0899cccc34d2e37310abf74a0a714256526117bba782c7eb4eb91c7
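
Because the same file is offered through several mirrors, it can be worth checking a download against the SHA256 listed above before loading it. The following is a minimal sketch using only Python's standard library; the function names and the 1 MiB chunk size are illustrative choices, not part of any ERNIE-Image tooling, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# SHA256 of ernieImage_image.safetensors as listed on this page.
EXPECTED_SHA256 = "94a35abaa0899cccc34d2e37310abf74a0a714256526117bba782c7eb4eb91c7"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so a ~15 GB checkpoint never sits fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("ernieImage_image.safetensors")` should return `True` for an intact copy saved under that name in the current directory. Mirrors that list a different file name (such as `ernie-image.safetensors`) may or may not be byte-identical, so a mismatch there does not by itself prove corruption.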

Mirrors

HuggingFace (2 mirrors)

ernie-image.safetensors

CivitAI (2 mirrors)

ernieImage_image.safetensors

ernieImage_bf16.safetensors

ModelScope CN (1 mirrors)

ernie-image.safetensors
