    Z-Image Turbo - Quantized for low VRAM - Text encoder fp8 scaled

    Z-Image Turbo is a distilled version of Z-Image, a 6B-parameter image model based on the Lumina architecture, developed by the Tongyi Lab team at Alibaba Group. Source: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

    I've uploaded quantized versions of the model, from bf16 down to fp8. At fp8, the weights' precision - and consequently their size - is halved, giving a substantial performance boost while keeping most of the quality. Inference time should be similar to regular "undistilled" SDXL, with better prompt adherence, resolution, and detail. Ideal for weaker PCs.
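
    Curious how "fp8 scaled" works under the hood? Below is a minimal PyTorch sketch of the general idea, not this checkpoint's actual conversion script (per-tensor scaling is an assumption; real conversions may scale per channel): each tensor is rescaled so its largest value fits fp8's range, stored at half the bytes of bf16, and the scale is kept for dequantization.

        import torch

        def quantize_fp8_scaled(w: torch.Tensor):
            # Per-tensor scale so the largest weight maps to fp8's max value (448 for e4m3).
            fp8_max = torch.finfo(torch.float8_e4m3fn).max
            scale = w.abs().max().clamp(min=1e-12) / fp8_max
            w_fp8 = (w / scale).to(torch.float8_e4m3fn)  # half the bytes of bf16
            return w_fp8, scale

        def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
            # Cast back up and undo the scaling at load time.
            return w_fp8.to(torch.bfloat16) * scale

        w = torch.randn(4096, 4096, dtype=torch.bfloat16)
        w_q, s = quantize_fp8_scaled(w)
        print(w_q.dtype, (dequantize(w_q, s) - w).abs().mean().item())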

    Features

    • Lightweight: the Turbo version was trained for low step counts (5-15), and the fp8 quantization is roughly 6 GB, making it accessible even on low-end GPUs.

    • Uncensored: many concepts censored by other models (<cough> Flux <cough>) are doable out of the box.

    • Good prompt adherence: comparable to Flux.1 Dev's, thanks to its powerful text encoder Qwen 3 4B.

    • Text rendering: comparable to Flux.1 Dev's; some say it's even better despite the model being much smaller (though probably not as good as Qwen Image's).

    • Style flexibility: photorealistic images are its biggest strength, but it can also do anime, oil painting, pixel art, low poly, comic book, watercolor, vector art / flat design, sketch, pop art, infographics, etc.

    • High resolution: capable of generating up to 4MP natively (i.e. before upscaling) while maintaining coherence.

    Dependencies

    Instructions

    Workflow and metadata are available in the showcase images; a code sketch of these settings, in ComfyUI's API format, follows this list.

    • Steps: 5 - 15 (6 - 11 is the sweet spot)

    • CFG: 1.0. At CFG 1.0, negative prompts are ignored, so there's no need for them.

    • Sampler/scheduler: depends on the art style. Here are my findings so far:

      • Photorealistic:

        • Favourite combination for the base image: euler plus the beta, simple, or bong_tangent (from RES4LYF) scheduler - fast and good even at low (5) steps.

        • Most multistep samplers (e.g.: res_2s, res_2m, dpmpp_2m_sde) are great, but some are about 40% slower at the same step count. They might work better with a scheduler like sgm_uniform.

        • Almost any sampler will work fine: sa_solver, seeds_2, er_sde, gradient_estimation.

        • Some samplers and schedulers add too much texture; you can tame it by increasing the shift (e.g.: set shift to 7 in ComfyUI's ModelSamplingAuraFlow node).

        • Some require more steps (e.g.: karras).

      • Illustrations (e.g.: anime):

        • res_2m or rk_beta produce sharper and more colourful results.

      • Other styles:

        • I'm still experimenting; to be safe, use euler (or res_2m) + simple for now.

    • Resolution: up to 4MP native. Avoid going higher than 2048 px per side. When in doubt, use the same resolutions as SDXL, Flux.1, Qwen Image, etc. (it works even as low as 512 px, like in the SD 1.5 days). Some examples:

      • 896 x 1152

      • 1024 x 1024

      • 1216 x 832

      • 1440 x 1440

      • 1024 x 1536

      • 2048 x 2048

    • An upscale pass and/or detailers are recommended to fix smaller details like eyes, teeth, and hair. See my workflow embedded in the main cover image.

      • If going over 2048 px on either side, I recommend the tiled upscale method, i.e. Ultimate SD Upscale at low denoise (<= 0.3).

      • Otherwise, I recommend giving your second-pass KSampler either a low denoise (< 0.5) or a later starting step (e.g.: start at step 5 of 9 total).

      • Either way, I recommend setting the shift to 7 to avoid noisy textures in your results. Keep in mind that some schedulers (e.g.: bong_tangent) may override the shift with their own.

      • At this stage, you may even use samplers that didn't work well in the initial generation. For most cases, I like the res_2m + simple combination.

    • Prompting: officially, long and detailed prompts in natural language work best, but I've tested comma-separated keywords/tags, JSON, whatever - any of these should work fine. Keep it in English or Mandarin for more accurate results.
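
    To make the settings above concrete, here's a minimal txt2img sketch in ComfyUI's API format using the recommended defaults (8 steps, CFG 1.0, euler + simple, shift 7, 1024 x 1024). The node class names are stock ComfyUI nodes and the text encoder is this upload's file, but the diffusion model/VAE file names and the prompt are placeholders - substitute your own. This illustrates the settings; it is not the exact workflow embedded in the showcase images. It assumes ComfyUI is listening on 127.0.0.1:8188.

        import json, urllib.request

        graph = {
            "1": {"class_type": "UNETLoader",  # placeholder file name
                  "inputs": {"unet_name": "z_image_turbo_fp8.safetensors",
                             "weight_dtype": "default"}},
            "2": {"class_type": "CLIPLoader",  # this upload's text encoder
                  "inputs": {"clip_name": "zImageTurboQuantized_textEncoderFp8Scaled.safetensors",
                             "type": "lumina2"}},
            "3": {"class_type": "VAELoader",   # placeholder file name
                  "inputs": {"vae_name": "z_image_vae.safetensors"}},
            "4": {"class_type": "ModelSamplingAuraFlow",  # shift 7 tames noisy textures
                  "inputs": {"model": ["1", 0], "shift": 7.0}},
            "5": {"class_type": "CLIPTextEncode",  # positive prompt (placeholder text)
                  "inputs": {"clip": ["2", 0],
                             "text": "photo of a red fox in a snowy forest at dusk"}},
            "6": {"class_type": "CLIPTextEncode",  # ignored at CFG 1.0, but required
                  "inputs": {"clip": ["2", 0], "text": ""}},
            "7": {"class_type": "EmptyLatentImage",
                  "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
            "8": {"class_type": "KSampler",
                  "inputs": {"model": ["4", 0], "seed": 42, "steps": 8, "cfg": 1.0,
                             "sampler_name": "euler", "scheduler": "simple",
                             "positive": ["5", 0], "negative": ["6", 0],
                             "latent_image": ["7", 0], "denoise": 1.0}},
            "9": {"class_type": "VAEDecode",
                  "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
            "10": {"class_type": "SaveImage",
                   "inputs": {"images": ["9", 0], "filename_prefix": "zimage"}},
        }

        req = urllib.request.Request("http://127.0.0.1:8188/prompt",
                                     data=json.dumps({"prompt": graph}).encode(),
                                     headers={"Content-Type": "application/json"})
        print(urllib.request.urlopen(req).read().decode())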

    FAQ

    • Is the model uncensored?

      • Yes, it might just not be well trained on the specific concept you're after. Try it yourself.

    • Why do I get too much texture or artifacts after upscaling?

      • See instructions about upscaling above.

    • Does it run on my PC?

      • If you can run SDXL, chances are you can run Z-Image Turbo fp8. If not, it might be a good time to purchase more RAM or VRAM.

      • All my images were generated on a laptop with 32 GB RAM and an RTX 3080 Mobile with 8 GB VRAM.

    • How can I get more variation across seeds?

      • Start sampling at a later step (e.g.: from step 3 to 11; see the sketch after this FAQ); or

      • Give clear instructions in the prompt, something like: give me a random variation of the following image: <your prompt>

    • I'm getting an error in ComfyUI, how do I fix it?

      • Make sure your ComfyUI has been updated to the latest version. Otherwise, feel free to post a comment with the error message so the community can help.

    • Is the license permissive?

      • It's Apache 2.0, so quite permissive.
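
    Regarding the "start at a later step" tip: in ComfyUI this maps to the stock KSamplerAdvanced node, which lets you begin denoising partway through the schedule. The fragment below is a hedged drop-in replacement for node "8" in the sketch under Instructions (node ids refer to that sketch):

        variation_sampler = {
            "class_type": "KSamplerAdvanced",
            "inputs": {"model": ["4", 0],
                       "add_noise": "enable",
                       "noise_seed": 123,        # vary this per image
                       "steps": 11, "cfg": 1.0,
                       "sampler_name": "euler", "scheduler": "simple",
                       "positive": ["5", 0], "negative": ["6", 0],
                       "latent_image": ["7", 0],
                       "start_at_step": 3,       # "from step 3 to 11"
                       "end_at_step": 11,
                       "return_with_leftover_noise": "disable"}}

    Per the tip above, skipping the first steps leaves more room for the varying noise to shape the result.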

    Description

    This is Z-Image Turbo's text encoder, Qwen 3 4B, quantized to scaled fp8, which is half the size of its bf16 counterpart with a negligible difference in quality, as you can see in the comparison image.

    Just download it to your "models/text_encoders" folder (or equivalent). In ComfyUI, you can load it via the Load CLIP node (or equivalent) with the type set to lumina2.
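
    To double-check what you downloaded, the safetensors library can list the stored tensors and their dtypes; for a scaled-fp8 file you should see float8 weight tensors alongside scale tensors (the exact key names are the file's own and not documented here):

        from safetensors import safe_open

        path = "models/text_encoders/zImageTurboQuantized_textEncoderFp8Scaled.safetensors"
        with safe_open(path, framework="pt", device="cpu") as f:
            for name in list(f.keys())[:10]:  # a few entries are enough for a spot check
                t = f.get_tensor(name)
                print(name, tuple(t.shape), t.dtype)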

    Credits to https://huggingface.co/jiangchengchengNLP/qwen3-4b-fp8-scaled/blob/main/qwen3_4b_fp8_scaled.safetensors

    Checkpoint
    ZImageTurbo

    Details

    Downloads: 758
    Platform: CivitAI
    Platform Status: Available
    Created: 12/7/2025
    Updated: 12/10/2025
    Deleted: -

    Files

    zImageTurboQuantized_textEncoderFp8Scaled.safetensors