Release – Version 0.2 (Unsure which model for your GPU? See Rule of Thumb below.)
What’s new?
Since this is meant to become a semi-realism model, I pushed it further in that direction and added more details. I also intentionally switched to new showcase samplers because different seeds simply looked better in this version. A few images were replaced as well.
(Feedback is highly appreciated!)
Note:
Because this is a checkpoint/LoRA merge (I only use LoRAs that I have trained myself), it can cause issues if you add another LoRA that was trained for many epochs. Try starting with a LoRA strength of about 0.3 and increase it gradually from there.
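One way to follow the "start low, increase gradually" advice is to render the same seed at a handful of strengths and compare the results. A minimal sketch in Python; the function name, the step size of 0.1 and the upper limit of 1.0 are assumptions for illustration, not values from the model author:

```python
def lora_strength_sweep(start=0.3, stop=1.0, step=0.1):
    """Return LoRA strengths to try, from `start` up to `stop` inclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Round the count first so float noise (e.g. 6.999...) does not drop a value.
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(max(count, 0))]


# Strengths to test one by one with a fixed seed and prompt.
strengths = lora_strength_sweep()
```

Stop at the first strength where artifacts appear and go back one step.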
Advanced tip:
In the ModelSamplingAuraFlow node, you can adjust the shift value between 3.00 and 3.10. This can help if you get images with weird hands or other recurring visual glitches.
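If you drive ComfyUI through its API-format workflow JSON, the tip above could be tested by generating one node entry per candidate value. This is only a sketch: the input name `shift`, the step of 0.02 and the model link `["1", 0]` (node id "1", output 0) are assumptions, so check them against your own exported workflow.

```python
def aura_flow_nodes(model_ref, low=3.00, high=3.10, step=0.02):
    """Build one ModelSamplingAuraFlow node dict per candidate value."""
    count = int(round((high - low) / step)) + 1
    nodes = []
    for i in range(count):
        value = round(low + i * step, 2)
        nodes.append({
            "class_type": "ModelSamplingAuraFlow",
            # "shift" is the assumed name of the adjustable input.
            "inputs": {"model": model_ref, "shift": value},
        })
    return nodes


# Hypothetical link to the model loader node in an exported workflow.
nodes = aura_flow_nodes(model_ref=["1", 0])
```

Each dict can replace the existing ModelSamplingAuraFlow entry in a copy of the workflow, so the same seed is rendered once per value.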
• bf16 Diffusion Model (fp8/fp16 coming soon, message me if you need it urgently ^^)
• No CLIP and no VAE included (ask me if you need help)
• Recommended settings: CFG 1, 8 steps (max. 15)
• Sampler: Euler A, Scheduler: Simple or Beta (Beta highly recommended)
• Sample images are not upscaled and no Hi-Res Fix was used
Original ComfyUI Models: Link (here you can find CLIP and VAE)
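The recommended settings above can be written down as a small Python dict and checked before a run. A sketch under stated assumptions: the keys follow ComfyUI's KSampler input names, `euler_ancestral` is assumed to be the identifier for "Euler A", and treating 8 as the lower bound for steps is my reading of "8 steps (max. 15)".

```python
# Assumed ComfyUI-style identifiers for the sampler and schedulers named above.
RECOMMENDED = {
    "cfg": 1.0,
    "steps": 8,
    "sampler_name": "euler_ancestral",
    "scheduler": "beta",
}
ALLOWED_SCHEDULERS = ("simple", "beta")
MAX_STEPS = 15


def check_settings(settings):
    """Return a list of deviations from the recommended settings."""
    problems = []
    if settings.get("cfg") != 1.0:
        problems.append("cfg should be 1")
    if not 8 <= settings.get("steps", 0) <= MAX_STEPS:
        problems.append("steps should be between 8 and 15")
    if settings.get("sampler_name") != "euler_ancestral":
        problems.append("sampler should be euler_ancestral")
    if settings.get("scheduler") not in ALLOWED_SCHEDULERS:
        problems.append("scheduler should be simple or beta")
    return problems
```

An empty list means the settings match the recommendations.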
First Release – Version 0.1
This is my first Z-Image Turbo checkpoint/LoRA merge release, so it’s still an early version (V0.1).
• bf16/fp8/fp16 Diffusion Model
• No CLIP and no VAE included (Ask me if you need help with that.)
• Recommended settings: CFG 1, 8 steps (max. 15)
• Sampler: Euler A, Scheduler: Simple or Beta (Beta highly recommended)
• Sample images are not upscaled and no Hi-Res Fix was used
Original ComfyUI Models: Link (here you can find CLIP and VAE)
I’m still learning and improving, so future updates are planned. Feedback is highly appreciated!
Rule of Thumb
NVIDIA Turing (RTX 20-series)
→ ❌ no real BF16 support, FP16 is the practical option
→ Quality: usually fine, but a bit more fragile than newer formats
NVIDIA Ampere (RTX 30-series)
→ ✅ BF16 works well (problems? try updating your PyTorch/CUDA or use FP16)
→ Quality: generally very close to FP32, little noticeable loss
NVIDIA Ada Lovelace (RTX 40-series)
→ ✅ BF16 stable, FP8 partly usable, depending on software support
→ Quality: BF16 ~ FP32; FP8 can show noticeable quality drops depending on workload
NVIDIA Blackwell (RTX 50-series, e.g., 5090)
→ ✅ BF16 very solid, FP8 better supported but not magic
→ Quality: FP8 is usable, but there is still some quality loss in many cases... not huge, but real
FP32: still needs to be released by Z-Image
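The rule of thumb above can be sketched as a small lookup on the GPU's CUDA compute capability. The mapping of capabilities to generations (7.5 for Turing, 8.0 to 8.7 for Ampere, 8.9 for Ada Lovelace, 10.0 and up for Blackwell) and the function name are assumptions added for illustration; data-center GPUs are outside this list.

```python
def recommend_precision(compute_capability):
    """Map a (major, minor) compute capability to the rule of thumb above."""
    cc = tuple(compute_capability)
    if cc >= (10, 0):   # assumed: Blackwell (RTX 50-series)
        return {"format": "bf16", "fp8": "better supported"}
    if cc >= (8, 9):    # assumed: Ada Lovelace (RTX 40-series)
        return {"format": "bf16", "fp8": "partial"}
    if cc >= (8, 0):    # assumed: Ampere (RTX 30-series)
        return {"format": "bf16", "fp8": "storage only"}
    # Turing (RTX 20-series) and older: no real BF16 support.
    return {"format": "fp16", "fp8": "storage only"}


# Example: an RTX 3090 reports compute capability (8, 6).
choice = recommend_precision((8, 6))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns such a tuple for the current GPU.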
Note: You can load FP8 on almost any GPU and benefit from lower VRAM usage when loading, but on hardware without proper FP8 support it is automatically converted to FP16 or FP32 for computation. Because the original data is already quantized to FP8, this can introduce some quality loss, and there is no real FP8 compute speedup, only memory and data transfer benefits.
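The memory side of this note is simple arithmetic: the size of the weights scales with the bytes stored per parameter. A rough sketch, where the parameter count is a hypothetical placeholder rather than a figure from this release, and activations, CLIP and VAE are ignored:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_size_gib(num_params, fmt):
    """Approximate size of the stored weights in GiB for a given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1024**3


# Hypothetical parameter count, for illustration only.
params = 6_000_000_000
for fmt in ("fp32", "bf16", "fp8"):
    print(f"{fmt}: {weight_size_gib(params, fmt):.1f} GiB")
```

FP8 halves the storage compared with BF16/FP16, which is where the lower VRAM usage and faster transfers come from, even when the computation itself runs in FP16 or FP32.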
Comments (19)
This looks very nice! Wish I could find some HW to generate locally. In the meantime, I'll have to test it on a rented GPU ^^;
Hey, sorry to hear that.
I'm currently uploading the fp8 version and then the fp16 version, which, according to my information, should run better on older PCs. Maybe that will work for you!
@VisionaryAI_Studio That's the hope! :D
I'm on the bleeding(ish) edge with my Comfy and PyTorch/CUDA installs, have forced everything in Comfy to BF16, and haven't noticed any problems.
@StanleyPain yes, but I am still using a dark-age GPU, and SDXL is already struggling ^^;
I need to figure out how to make it run reliably on Apple Silicon
@VisionaryAI_Studio can confirm that fp8 is running on my ancient 1660 Ti
@wnaa Hey, thanks for the info!
The weights might be stored in FP8 (which saves RAM), but through software emulation they’re automatically converted to FP16. You won’t get the actual FP8 speedup without a GPU whose Tensor Cores support FP8.
Important: I’m just a layperson and I get my info from the internet. :)
@VisionaryAI_Studio yes, FP8 computation is not always possible, but even on GPUs with FP8 support, computing in higher precision often yields better results. The FP8 speedup comes from CPU/GPU RAM swapping: less data to move, faster gen :D
@n_Arno Thanks for clarifying! I've updated the note, even though probably hardly anyone reads it anyway ^^
Wonderful. Great work. It produces very vivid results.
Thank you very much for your feedback and kind comment!
Stay strong, my computer…! You can do it!!
I also used mine as a heater :D
Absolutely awesome model! In my opinion, it’s even better than the original. For a v0.1 release, this is already incredibly strong work. Great job — I’d love to see more like this! 👌
Thank you very much! Luckily, I have someone to give me tips 😁
Which is better, FP16 or bf16? I have an RTX 3090.
Hey!
bf16; if it causes problems, fp16.
It depends a little on how up-to-date your PyTorch/CUDA is, but bf16 should be the right choice for you.
I've updated the information a little, thanks for asking!
If it's my LoRA being merged, heunpp2 + linear_quadratic with CFG 1.5 and 14-20 steps would be the best choice.
Hey, thanks for the info.
But I don't merge other people's LoRAs
14-20 steps would take the "turbo" out of the whole thing ^^

















