CivArchive
    ERNIE‑Image - Image Turbo

    Originally Posted: https://ernie.baidu.com/blog/posts/ernie-image

    ERNIE-Image is an open text-to-image model from the ERNIE-Image team at Baidu. Built on a single-stream Diffusion Transformer (DiT) with 8B parameters in a latent diffusion (LDM) framework, it ships with a lightweight Prompt Enhancer that expands brief inputs into richer, more structured prompts to better unlock the model's capabilities. With only 8B DiT parameters, ERNIE-Image achieves state-of-the-art performance among open weights text-to-image models — and it is built not just for visual appeal, but for controllability: accurate content depiction matters as much as aesthetics. In practice, it excels at complex instruction following, precise text rendering, and structured image generation — areas where many existing open weights models still fall short.

    Key Features

    • Competitive performance at compact scale: With only 8B DiT parameters, ERNIE-Image remains competitive with substantially larger models and achieves leading performance among open weights models on several challenging benchmarks.

    • Precise text rendering: ERNIE-Image handles dense, long-form, and layout-sensitive text especially well, producing readable and faithful results in Chinese, English, and other languages.

    • Robust instruction following: The model reliably handles complex prompts, multi-object relations, and knowledge-intensive descriptions, making it well suited for tasks that demand fine-grained control.

    • Structured visual generation: ERNIE-Image is especially effective on images with clear layout or narrative structure — posters, manga/anime storyboards, multi-panel compositions, and cohesive multi-element visuals.

    • Broad stylistic range: Beyond clean graphic design and illustration-style outputs, the model supports realistic photography and distinctive stylized aesthetics, including softer, more cinematic and film-like tones.

    • Easy to deploy and adapt: Thanks to its compact size, ERNIE-Image runs on consumer-grade hardware (24 GB of VRAM), bringing high-quality image generation within reach for research and production use. The moderate parameter count also makes fine-tuning and adaptation straightforward for researchers and developers.
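
As a rough illustration of the hardware claim above, the following sketch estimates the memory needed just to hold 8B DiT weights at a few common precisions. It is back-of-the-envelope arithmetic, not a measurement: the helper name `weight_memory_gib` is hypothetical, the precisions listed are assumptions (the page does not say which precision the 24 GB figure refers to), and the estimate leaves out the text encoder, VAE, Prompt Enhancer, and activations.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 1024**3


# 8B DiT parameters, as stated on this page.
DIT_PARAMS = 8e9

# Assumed precisions; the page does not specify which one it has in mind.
PRECISIONS = {"fp32": 4, "bf16/fp16": 2, "fp8": 1}

for name, nbytes in PRECISIONS.items():
    print(f"{name}: {weight_memory_gib(DIT_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, 16-bit weights come to roughly 15 GiB, which leaves some headroom on a 24 GB card, while 32-bit weights alone would not fit.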

    Comments (25)

    elevendr · Apr 17, 2026 · 1 reaction
    CivitAI

    So we got a new image model. I wonder how this compares to Flux 2 Klein 9B and Z Image Turbo?

    Rhodynoliaeth · Apr 17, 2026 · 1 reaction

    Very satisfactory. There’s no real censorship in place; it just needs some fine-tuning. There’s no significant degradation of image quality when using long prompts. The overall performance is quite good. But for now, the only models available by default are Asian models.

    liutyi · Apr 17, 2026 · 5 reactions

    OK in general. Turbo has some diagonal artifacts, visible on night lights. It may have limb issues a bit more often than Klein and much more often than ZIT. It is OK with complex prompts, but it may render an Asian person instead of an explicitly mentioned other race. It has a built-in prompt enhancer (PE), which helps with short prompts and not that much with long ones. The PE translates prompts to Chinese, by the way, but despite that it increases the diversity of faces and races. It may generate a naked woman, but it may also ignore that part of the prompt. Text to render is better placed in “”; the model may render it without them, but at much lower quality.

    liutyi · Apr 17, 2026 · 10 reactions
    CivitAI

    The model has a built-in prompt enhancer (PE), so I ran a test with and without it: the same 40 prompts and the same seed, once with PE on and once with PE off. The same will be done for the Turbo version. Now available: ERNIE Image Turbo test v2 without PE. Test v1 is also ready for Turbo with both PE on and PE off. On HF there is a demo available for Turbo to test the model.

    Khorai · Apr 17, 2026 · 1 reaction
    CivitAI

    I can't wait for the model to be quantized so it can run smoothly on my 5080. The images I have seen look promising, and I will be keeping an eye out.

    mrmrswiggly612 · Apr 17, 2026

    It is quantized already on Hugging Face. Full range of GGUF on unsloth.

    Khorai · Apr 17, 2026

    @mrmrswiggly612 Thx for letting me know :D I'll check it out!

    Khorai · Apr 17, 2026

    @ferretduck thx! don't know why it didn't show up when i searched civit.

    svvabd323 · Apr 18, 2026 · 20 reactions
    CivitAI

    Asian faces predominate.

    emailhackedbypro969 · Apr 18, 2026 · 4 reactions

    Guess Baidu isn't a Western company.

    TheP3NGU1N · Apr 23, 2026 · 3 reactions

    No worse than ZiT. Just add "caucasian" to your prompt and :gasp: it stops happening.

    haidensd58757 · Apr 18, 2026 · 3 reactions
    CivitAI

    I just did a test on HF. Result? zImage and Klein 9b are better in terms of realism, but Ernie is slightly better at prompt following: in one shot it follows your prompt accurately, while in zImage you have to try 3 times to get it right.

    zzkszzks603 · Apr 18, 2026 · 28 reactions
    CivitAI

    Impressions so far

    Pros:

    Blazing fast. It’s twice as fast as zimage turbo (tested on an RTX 5080).

    Perfect rendering of feet and various types of socks.

    Accurate prompt comprehension with solid adherence.

    Once the prompt is locked in, it enters a stable "gacha" (rolling) rhythm, consistently producing high-quality results.

    Cons:

    Characters are unattractive and lack variety. Additionally, celebrity name keywords are ineffective.

    The probability of anatomical errors (like three legs) is relatively high.

    amazingbeauty · Apr 23, 2026

    Is the step time lower than ZIT?

    EricRollei21 · Apr 18, 2026 · 7 reactions
    CivitAI

    Seems to be less censored than other models, great with text, and fairly fast (though the turbo model is not as good as base). But it is also very hard to get fine details, and hands and feet are not always good. It can't generate above about 1500x1500 without creating body horror.

    zwelimbalo88 · Apr 20, 2026 · 9 reactions
    CivitAI

    It's also SUPER easy to train

    jonog247634 · Apr 20, 2026

    Are you using ai-toolkit to train? If so, can you help with decent settings for it, please? I tried a test and couldn't get any LoRAs to work with the default workflow.

    Viennar · Apr 20, 2026 · 9 reactions
    CivitAI

    Ernie Turbo runs 30-50% slower than Z turbo. It is impossible to generate at high resolutions.

    TheEarthIsFlat · Apr 20, 2026 · 1 reaction
    CivitAI

    On my Intel Core i9-12900KS, 5090 FE, and 64 GB of Corsair Dominator Platinum DDR5 RAM at 6800 MT/s, this model is super fast, but as far as NSFW, or even just sexy women in general, I still like Z-Image better.

    bionovafood863 · Apr 29, 2026

    Made me laugh. Z-Image is at the end of the tunnel in this regard. This applies to sex.

    antonovfedir193 · May 9, 2026
    CivitAI

    Treat me like I'm a big dummy. How do I use this in Neo? I've downloaded the model, but I can't seem to figure out the additional required files or how to set up the ERNIE interface.

    Starry_Eyes · May 10, 2026
    CivitAI

    I downloaded the FP8 version, but I'm pretty sure it's the prompt enhancer that keeps crashing Comfy. Are there any workflows that don't use it?

    Checkpoint
    Ernie

    Details

    Downloads: 640
    Platform: CivitAI
    Platform Status: Available
    Created: 4/16/2026
    Updated: 5/15/2026
    Deleted: -