This is a quantized Z-Image model that supports ComfyUI's latest "TensorCoreFP8Layout".
On supported GPUs, ComfyUI can run calculations directly in FP8 instead of dequantizing to BF16, which is much faster than both BF16 and classic FP8 scaled models.
It also supports the latest ComfyUI quantization features:
FP8 scaled: higher precision than pure FP8.
Mixed precision: important layers are kept in BF16, giving higher precision than a pure FP8 scaled model.
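A minimal sketch of the "FP8 scaled" idea, for illustration only (this is not ComfyUI's actual implementation): the weight is stored in float8_e4m3fn together with a higher-precision scale factor, so the full FP8 range is used instead of just casting the raw BF16 values.

```python
import torch

def to_fp8_scaled(w_bf16: torch.Tensor):
    # Per-tensor scale chosen so the largest weight maps to the FP8 maximum (448 for e4m3fn)
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = w_bf16.abs().amax().float() / fp8_max
    w_fp8 = (w_bf16.float() / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8, scale = to_fp8_scaled(w)   # roughly half the memory of the BF16 tensor
```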
If you run into errors, update your ComfyUI.
Update:
(12/9/2025): Files updated for ComfyUI v0.4.
ComfyUI v0.4 changed how it handles calibrated metadata. The Turbo and Qwen3 files have been re-uploaded with updated metadata; please redownload them to avoid a quality regression.
(12/6/2025): Added quantized Qwen3 4B.
Mixed precision:
Not every layer is quantized: the early, final, and some middle layers remain in BF16, which is why this model is about 500 MB larger than the classic FP8 model.
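As a rough illustration of the mixed-precision idea (the layer-name filters and the scale key below are hypothetical, not the model's real ones):

```python
import torch

KEEP_BF16 = ("img_in", "txt_in", "final_layer", "norm")   # hypothetical layer-name filters

def mixed_precision_quantize(state_dict):
    out = {}
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    for name, w in state_dict.items():
        # Quantize only 2-D linear weights that are not in the keep list
        if w.ndim == 2 and not any(key in name for key in KEEP_BF16):
            scale = w.abs().amax().float() / fp8_max
            out[name] = (w.float() / scale).to(torch.float8_e4m3fn)
            out[name + ".scale"] = scale               # hypothetical key for the stored scale
        else:
            out[name] = w                              # early/final/norm layers stay in BF16
    return out
```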
Post-training calibration and FP8 tensor core support:
If you have a newer GPU (Nvidia: RTX 4xxx and later, AMD: gfx1200, gfx1201, gfx950):
These GPUs have hardware support for FP8 math. This model ships with post-training calibration metadata; ComfyUI reads it automatically and uses the FP8 tensor cores to run calculations directly in FP8 instead of dequantizing to BF16.
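Under the hood this maps onto PyTorch's FP8 matmul primitive. A rough sketch of the direct-FP8 path (assumes an FP8-capable CUDA GPU and a recent PyTorch; torch._scaled_mm is a low-level API whose exact signature has changed between releases):

```python
import torch

def to_fp8_scaled(t):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = t.abs().amax().float() / fp8_max
    return (t.float() / scale).to(torch.float8_e4m3fn), scale

x_fp8, sx = to_fp8_scaled(torch.randn(128, 4096, device="cuda", dtype=torch.bfloat16))
w_fp8, sw = to_fp8_scaled(torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16))

# The matmul runs directly on the FP8 tensor cores; the weight is never expanded to BF16.
y = torch._scaled_mm(x_fp8, w_fp8.t(), scale_a=sx, scale_b=sw, out_dtype=torch.bfloat16)
```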
On an RTX 4090, compared with the original BF16 Z-Image model:
gguf q4_K model: -26% it/s (dequantization overhead)
classic FP8 scaled model: -8% it/s (dequantization overhead)
this model: +31% it/s
this model + torch.compile: +60% it/s
RTX 5xxx GPUs should be even faster thanks to newer tensor cores and better FP8 support, but this is untested.
AMD GPUs are untested as well.
You're welcome to share your results in the comment section.
If your GPU does not have FP8 tensor cores:
This model still saves you ~50% VRAM, and it is slightly more accurate than the classic FP8 scaled model thanks to mixed precision.
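For contrast, a sketch of the fallback on such GPUs (illustrative only): the weight stays in FP8 in VRAM and only a transient BF16 copy is made per layer, so memory is saved even though the compute itself stays in BF16.

```python
import torch
import torch.nn.functional as F

def fallback_linear(x_bf16, w_fp8, scale, bias=None):
    # Dequantize just-in-time; the FP8 weight is what lives in VRAM long-term
    w_bf16 = w_fp8.to(torch.bfloat16) * scale.to(torch.bfloat16)
    return F.linear(x_bf16, w_bf16, bias)
```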
Tips:
torch.compile is recommended.
It's a built-in PyTorch feature: no dependencies required and easy to use.
The "TorchCompileModelAdvanced" node from ComfyUI-KJNodes is recommended; set "dynamic" to True (see the sketch at the end of these tips).
Note: the first time you use torch.compile it has to compile the model, which usually takes about 2 minutes, and the progress bar will appear stuck at step 0. Do NOT cancel the job.
It's compatible with Sage Attention, etc.
ComfyUI only uses the FP8 tensor cores for linear layers, not attention, which means it is 100% compatible with all kinds of attention optimizations (Sage Attention, etc.).
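For reference, a minimal torch.compile sketch showing what the node does conceptually (assumes a CUDA GPU; inside ComfyUI the TorchCompileModelAdvanced node wires this into the diffusion model for you):

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(4096, 4096), torch.nn.GELU()).cuda().bfloat16()
net_compiled = torch.compile(net, dynamic=True)   # dynamic=True matches the recommended node setting

x = torch.randn(64, 4096, device="cuda", dtype=torch.bfloat16)
y = net_compiled(x)   # first call compiles (slow); later calls reuse the compiled kernels
```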
FAQ:
Will you quantize other models to TensorCoreFP8?
If a substantially different version is released (e.g. the Z-Image base model), I will make a TensorCoreFP8 version for it, assuming there is no official one.
I do not accept commissions.