🚀 Z-Image AIO Collection
⚡ Base & Turbo • All-in-One • Bilingual Text • Qwen3-4B
⚠️ IMPORTANT: Requires ComfyUI v0.11.0+
📥 Download ComfyUI
✨ What is Z-Image AIO?
Z-Image AIO is an All-in-One repackage of Alibaba Tongyi Lab's 6B-parameter image generation models.
Everything integrated:
✅ VAE already built-in
✅ Qwen3-4B Text Encoder integrated
✅ Just download and generate!
🎯 Available Versions
🔥 Z-Image-Turbo-AIO (8 Steps • CFG 1.0)
Ultra-fast generation for production & daily use
⚫ NVFP4-AIO (7.8 GB) 🚀
🎯 ONLY for NVIDIA Blackwell GPUs (RTX 50xx)!
⚡ Optimized for maximum speed
💾 Smallest file size
🚀 FP4 precision, blazing fast
Perfect for: RTX 5070, 5080, 5090 owners who want maximum speed
🟡 FP8-AIO (10 GB) ⭐ RECOMMENDED
✅ Best balance of size & quality
✅ Works on 8GB VRAM
✅ Fast downloads
✅ Ideal for most users
Perfect for: Daily use, testing, RTX 3060/4060/4070
🔵 FP16-AIO (20 GB)
💾 Same file size as BF16
🔄 ComfyUI auto-casts to BF16 for compute
⚠️ Does NOT enable FP16 compute mode
📦 Alternative download option
Note: Z-Image does not support FP16 compute - activation values exceed FP16's max range, causing NaN/black images. Weights are cast to BF16 during inference regardless of file format.
Perfect for: Alternative to BF16 download (identical inference behavior)
💎 BF16-AIO (20 GB) ⭐ RECOMMENDED FOR FULL PRECISION
✅ BFloat16 full precision
✅ Absolute best quality
✅ Professional projects
✅ Also works on 8GB VRAM
Perfect for: Professional work, maximum quality
🎨 Z-Image-Base-AIO (28-50 Steps • CFG 3-5)
Full creative control for pros & LoRA training
🟡 FP8-AIO (10 GB)
✅ Efficient for daily use
✅ Full CFG control
✅ Negative prompts supported
✅ 8GB VRAM compatible
Perfect for: Daily work with full control
🔵 FP16-AIO (20 GB)
💾 Same file size as BF16
🔄 ComfyUI auto-casts to BF16 for compute
⚠️ Does NOT enable FP16 compute mode
📦 Alternative download option
Note: See technical explanation in FAQ below.
Perfect for: Alternative to BF16 download (identical inference behavior)
💎 BF16-AIO (20 GB) ⭐ RECOMMENDED FOR FULL PRECISION
✅ Maximum quality
✅ Ideal for LoRA training
✅ Professional projects
✅ Highest precision
Perfect for: LoRA training, professional work
🆚 Turbo vs Base: When to Use?
⚡ Use TURBO when:
⚡ Speed is priority → 8 steps = 3-10 seconds
📸 Production workflows → Consistent high quality
💾 Quick iterations → Rapid prototyping
🎯 Simple prompts → Less complex scenes
🎨 Use BASE when:
🎨 Creative exploration → Higher diversity
🔧 LoRA/ControlNet development → Undistilled foundation
📝 Complex prompting → Full CFG control
🚫 Negative prompts needed → Remove unwanted elements
⚙️ Recommended Settings
⚡ Turbo Settings (incl. NVFP4)
📊 Steps: 8
🎚️ CFG: 1.0 (don't change!)
🎲 Sampler: res_multistep OR euler_ancestral
📈 Scheduler: simple OR beta
📐 Resolution: 1920×1088 (recommended)
🚫 Negative Prompt: ❌ Not used!
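Why the negative prompt does nothing at CFG 1.0: standard classifier-free guidance blends the positive (conditional) and negative (unconditional) predictions as uncond + cfg × (cond - uncond). At cfg = 1.0 this reduces to the positive prediction alone. Below is a minimal sketch that uses made-up numbers in plain Python lists as stand-ins for model outputs. `cfg_combine` is a hypothetical helper, not a ComfyUI function.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance, element-wise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Hypothetical predictions for the positive and the negative prompt.
positive = [0.5, -1.25, 2.0]
negative = [0.25, 0.75, -1.0]

# Scale 1.0 (Turbo): the result is exactly the positive prediction,
# so the negative prompt has no influence.
print(cfg_combine(positive, negative, 1.0))  # [0.5, -1.25, 2.0]

# Scale 4.0 (a Base-style setting): the negative prediction now matters.
print(cfg_combine(positive, negative, 4.0))  # [1.25, -7.25, 11.0]
```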
🎨 Base Settings
📊 Steps: 28-50
🎚️ CFG: 3.0-5.0 (start with 4.0)
🎲 Sampler: euler ⭐ OR dpmpp_2m
📈 Scheduler: normal ⭐ OR karras
📐 Resolution: 512×512 to 2048×2048
🚫 Negative Prompt: ✅ Fully supported!
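You can write the two presets down as data and use them to sanity-check a workflow before starting a long batch. This is a minimal sketch. `Preset` and `check_settings` are hypothetical names, and the ranges are copied from the Turbo and Base lists above (Turbo uses the 8-step value from this section).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    name: str
    steps: tuple[int, int]     # inclusive (min, max)
    cfg: tuple[float, float]   # inclusive (min, max)
    uses_negative: bool


# Values taken from the Turbo and Base settings lists.
TURBO = Preset("Turbo", steps=(8, 8), cfg=(1.0, 1.0), uses_negative=False)
BASE = Preset("Base", steps=(28, 50), cfg=(3.0, 5.0), uses_negative=True)


def check_settings(preset: Preset, steps: int, cfg: float, negative: str = "") -> list[str]:
    """Return readable warnings; an empty list means the settings match the preset."""
    warnings = []
    if not preset.steps[0] <= steps <= preset.steps[1]:
        warnings.append(f"{preset.name}: steps {steps} outside {preset.steps[0]}-{preset.steps[1]}")
    if not preset.cfg[0] <= cfg <= preset.cfg[1]:
        warnings.append(f"{preset.name}: CFG {cfg} outside {preset.cfg[0]}-{preset.cfg[1]}")
    if negative.strip() and not preset.uses_negative:
        warnings.append(f"{preset.name}: negative prompt will have no effect")
    return warnings


print(check_settings(TURBO, steps=8, cfg=1.0))                     # []
print(check_settings(TURBO, steps=8, cfg=4.0, negative="blurry"))  # two warnings
print(check_settings(BASE, steps=30, cfg=4.0, negative="blurry"))  # []
```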
📋 Quick Overview
Turbo Versions
⚫ NVFP4 | 7.8 GB | RTX 50xx only | Max Speed 🚀
🟡 FP8 | 10 GB | 8GB VRAM | Recommended ⭐
🔵 FP16 | 20 GB | → BF16 compute | See FAQ ⚠️
💎 BF16 | 20 GB | 8GB VRAM | Max Quality ⭐
Base Versions
🟡 FP8 | 10 GB | 8GB VRAM | Efficient
🔵 FP16 | 20 GB | → BF16 compute | See FAQ ⚠️
💎 BF16 | 20 GB | 8GB VRAM | LoRA Training ⭐
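The file sizes in the table follow roughly from bytes per weight. This is a back-of-the-envelope sketch with these assumptions: about 6B parameters for the DiT and about 4B for Qwen3-4B, both stored in the listed format, and the small VAE and file metadata ignored. NVFP4 is left out because this page does not say how its text encoder is stored. The sketch also shows why FP16 and BF16 files come out the same size: both use 2 bytes per weight.

```python
# Assumptions: ~6B DiT parameters + ~4B Qwen3-4B parameters, both stored in
# the same format; the VAE and file metadata are ignored.
DIT_PARAMS = 6e9
TEXT_ENCODER_PARAMS = 4e9
BYTES_PER_WEIGHT = {"BF16": 2, "FP16": 2, "FP8": 1}


def estimated_size_gb(fmt: str) -> float:
    """Approximate checkpoint size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return (DIT_PARAMS + TEXT_ENCODER_PARAMS) * BYTES_PER_WEIGHT[fmt] / 1e9


for fmt in BYTES_PER_WEIGHT:
    print(f"{fmt}: ~{estimated_size_gb(fmt):.0f} GB")
# BF16: ~20 GB
# FP16: ~20 GB
# FP8: ~10 GB
```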
💡 Prompting Guide
✅ Good Example:
Professional food photography of artisan breakfast plate.
Golden poached eggs on sourdough toast, crispy bacon, fresh
avocado slices. Morning sunlight creating warm glow. Shallow
depth of field, magazine-quality presentation.
❌ Bad Example:
breakfast, eggs, bacon, toast, food, morning, plate
📝 Tips
DO:
✅ Use natural language
✅ Be detailed (100-300 words)
✅ Describe lighting & mood
✅ Specify camera angle
✅ English OR Chinese (or both!)
DON'T:
❌ Tag-style prompts (tag1, tag2, tag3)
❌ Very short prompts (under 50 words)
❌ Negative prompts with Turbo
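You can turn the DO/DON'T lists into a quick pre-flight check. This is a rough heuristic sketch. `lint_prompt` is hypothetical, the 50-word threshold comes from the list above, and "tag-style" is approximated as three or more comma-separated fragments that average two words or fewer.

```python
def lint_prompt(prompt: str, min_words: int = 50) -> list[str]:
    """Flag prompts that break the DON'T rules above (heuristic only)."""
    warnings = []
    word_count = len(prompt.split())
    if word_count < min_words:
        warnings.append(f"short prompt: {word_count} words (aim for 100-300)")
    # Tag-style: many comma-separated fragments, each only a word or two long.
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 3:
        avg_words = sum(len(f.split()) for f in fragments) / len(fragments)
        if avg_words <= 2:
            warnings.append("tag-style prompt: rewrite as descriptive sentences")
    return warnings


bad = "breakfast, eggs, bacon, toast, food, morning, plate"
print(lint_prompt(bad))
# ['short prompt: 7 words (aim for 100-300)', 'tag-style prompt: rewrite as descriptive sentences']
```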
🌐 Bilingual Text Rendering
English:
Neon sign reading "OPEN 24/7" in bright blue letters
above entrance. Modern sans-serif font, glowing effect.
Chinese:
Traditional tea house entrance with sign reading
"古韵茶坊" in elegant gold Chinese calligraphy.
Both:
Modern cafe with bilingual sign. "Morning Brew" in
white script above, "晨曦咖啡" in Chinese below.
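The examples above follow one pattern: quote the exact text, then say how it looks and where it appears. The small sketch below assembles a prompt that way. `sign_prompt` is a hypothetical helper. It also swaps any inner double quotes for single quotes, which is an assumption meant to keep each quoted span unambiguous.

```python
def sign_prompt(scene: str, texts: list[tuple[str, str, str]]) -> str:
    """Build '<scene>. "<text>" in <style> <position>, ...' from (text, style, position)."""
    clauses = []
    for text, style, position in texts:
        safe = text.replace('"', "'")  # keep one clear pair of quotes per text
        clauses.append(f'"{safe}" in {style} {position}')
    return f"{scene.rstrip('.')}. " + ", ".join(clauses) + "."


print(sign_prompt(
    "Modern cafe with bilingual sign",
    [("Morning Brew", "white script", "above"), ("晨曦咖啡", "Chinese", "below")],
))
# Modern cafe with bilingual sign. "Morning Brew" in white script above, "晨曦咖啡" in Chinese below.
```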
📥 Installation
Step 1: Download
Choose your version based on:
GPU: RTX 50xx → NVFP4 possible
VRAM: 8GB → FP8 recommended
Purpose: LoRA Training → Base BF16
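You can read the checklist above as a small decision function. This is a sketch under stated assumptions. `pick_version` is hypothetical, and the GPU is described by its RTX series number (20, 30, 40, 50). File names other than `Z-Image-Turbo-FP8-AIO.safetensors` and `Z-Image-Turbo-BF16-AIO.safetensors` are inferred from the version names in this collection.

```python
def pick_version(rtx_series: int, purpose: str = "general", max_quality: bool = False) -> str:
    """Hypothetical helper mirroring the Step 1 checklist.

    rtx_series: 20, 30, 40 or 50 (RTX 20xx ... RTX 50xx).
    purpose: "general" or "lora_training".
    """
    if purpose == "lora_training":
        return "Z-Image-Base-BF16-AIO.safetensors"      # assumed file name
    if max_quality:
        return "Z-Image-Turbo-BF16-AIO.safetensors"
    if rtx_series >= 50:  # NVFP4 needs Blackwell (RTX 50xx)
        return "Z-Image-Turbo-NVFP4-AIO.safetensors"    # assumed file name
    return "Z-Image-Turbo-FP8-AIO.safetensors"          # default for most users


print(pick_version(40))                           # Z-Image-Turbo-FP8-AIO.safetensors
print(pick_version(50))                           # Z-Image-Turbo-NVFP4-AIO.safetensors
print(pick_version(30, purpose="lora_training"))  # Z-Image-Base-BF16-AIO.safetensors
```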
Step 2: Place File
ComfyUI/models/checkpoints/
└── Z-Image-Turbo-FP8-AIO.safetensors
Step 3: Load & Generate
Open ComfyUI (v0.11.0+!)
Use "Load Checkpoint" node
Select your AIO version
Generate!
No separate VAE or Text Encoder needed!
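You can check that a downloaded AIO file really bundles several components, and see which weight dtype it stores, by reading only the safetensors header. The header is an 8-byte little-endian length followed by a JSON object that maps tensor names to dtype, shape and offsets. This is a minimal standard-library sketch. Grouping tensors by the first dotted segment of their name is an assumption, and the exact prefixes depend on how the checkpoint was packed.

```python
import json
import struct
import sys
from collections import Counter, defaultdict


def summarize_checkpoint(path: str) -> dict[str, Counter]:
    """Count stored dtypes per top-level tensor-name prefix in a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    summary: dict[str, Counter] = defaultdict(Counter)
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        prefix = name.split(".", 1)[0]
        summary[prefix][info["dtype"]] += 1
    return dict(summary)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python inspect_ckpt.py ComfyUI/models/checkpoints/Z-Image-Turbo-FP8-AIO.safetensors
    for prefix, dtypes in summarize_checkpoint(sys.argv[1]).items():
        print(prefix, dict(dtypes))
```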
🙏 Credits
Original Model
👨‍💻 Developer: Tongyi Lab (Alibaba Group)
🏗️ Architecture: Single-Stream DiT (6B parameters)
📄 License: Apache 2.0
Links
🔗 Z-Image Base: https://huggingface.co/Tongyi-MAI/Z-Image
🔗 Z-Image Turbo: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
🧠 Text Encoder: https://huggingface.co/Qwen/Qwen3-4B
📜 Version History
v2.2 - FP16 Clarification
📝 Updated FP16 descriptions for technical accuracy
⚠️ Clarified: FP16 weights ≠ FP16 compute
🔄 FP16 files are cast to BF16 during inference
v2.1 - NVFP4 Release 🚀
➕ Z-Image-Turbo-NVFP4-AIO (7.8GB)
⚡ Optimized for NVIDIA Blackwell (RTX 50xx)
🚀 Maximum speed generation
v2.0 - Base AIO Release
➕ Z-Image-Base-BF16-AIO
➕ Z-Image-Base-FP16-AIO
➕ Z-Image-Base-FP8-AIO
🆕 ComfyUI v0.11.0+ support
🆕 Qwen3-4B Text Encoder
v1.1 - FP16 Added
➕ Z-Image-Turbo-FP16-AIO
🔧 Wider GPU compatibility
v1.0 - Initial Release
✅ Z-Image-Turbo-FP8-AIO
✅ Z-Image-Turbo-BF16-AIO
✅ Integrated VAE + Text Encoder
❓ FAQ
Q: Which version should I choose?
RTX 50xx + Speed → NVFP4 🚀
Most users → Turbo FP8 ⭐
Full precision → BF16 ⭐
LoRA Training → Base BF16
Q: Turbo or Base?
Fast & simple → Turbo ⚡
Full control → Base 🎨
Q: Will NVFP4 work on my RTX 4090?
❌ No! NVFP4 is only for RTX 50xx (Blackwell architecture).
Use FP8 instead for RTX 40xx and older.
Q: Do I need a separate VAE/Text Encoder?
❌ No! Everything is already integrated.
Just use "Load Checkpoint" and go!
Q: Does it work on 8GB VRAM?
✅ Yes! All versions work on 8GB VRAM.
(NVFP4 requires RTX 50xx regardless of VRAM)
⚠️ Q: What about FP16 for older GPUs (RTX 20xx/30xx)?
Important technical clarification:
Z-Image does NOT support FP16 as a compute dtype. Here's why:
🔍 Technical reason:
- FP16 max value: 65,504
- BF16 max value: ~3.39e+38 (nearly the same range as FP32, whose max is ~3.40e+38)
- Z-Image's activation values exceed FP16's range
- Result: Overflow → NaN → Black images
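The range argument is easy to reproduce. This minimal sketch uses only the Python standard library: `struct`'s `e` format rounds to IEEE half precision, and BF16 is emulated by rounding a float32 to its top 16 bits. The value 70,000 is an arbitrary stand-in for a large activation, not a value measured from Z-Image.

```python
import math
import struct

FP16_MAX = (2 - 2**-10) * 2**15   # 65504.0, largest finite half-precision value
BF16_MAX = (2 - 2**-7) * 2**127   # ~3.39e38, largest finite bfloat16 value


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision; values beyond the range become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):  # value does not fit in FP16
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round x (which must fit in float32) to bfloat16 by keeping the top
    16 bits of its float32 encoding, round-to-nearest-even. NaN is not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits = (bits >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


activation = 70000.0  # hypothetical large activation, just above FP16_MAX
print(to_fp16(activation))                        # inf
print(to_bf16(activation))                        # 70144.0 (rounded, but finite)
print(to_fp16(activation) - to_fp16(activation))  # nan (inf - inf)
```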
What actually happens:
ComfyUI automatically casts weights to BF16 for computation
You can see this in logs: "model weight dtype X, manual cast: torch.bfloat16"
"Weight dtype" (file format) ≠ "Compute dtype" (actual calculation)
For RTX 20xx users (no native BF16):
BF16 is emulated via FP32 = slower but works
There is no way to run Z-Image in true FP16 compute
FP8 with CPU offload may be a better option for limited VRAM
TL;DR: FP16 and BF16 files behave identically during inference. Choose based on download preference, not GPU compatibility.
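To check the compute dtype on your own setup, you can pull both values out of the log line quoted in the answer above. This is a small sketch. The regular expression assumes the line has exactly the shape shown there (`model weight dtype X, manual cast: Y`), and the sample line with `torch.float16` is illustrative.

```python
import re
from typing import Optional, Tuple

# Shape of the ComfyUI log line quoted above:
#   "model weight dtype <weight>, manual cast: <compute>"
LOG_PATTERN = re.compile(
    r"model weight dtype (?P<weight>[\w.]+), manual cast: (?P<compute>[\w.]+)"
)


def dtypes_from_log(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (weight dtype, compute dtype) from the first matching line, else None."""
    match = LOG_PATTERN.search(log_text)
    if match is None:
        return None
    return match.group("weight"), match.group("compute")


# Illustrative line for an FP16 file (X = torch.float16)
line = "model weight dtype torch.float16, manual cast: torch.bfloat16"
print(dtypes_from_log(line))  # ('torch.float16', 'torch.bfloat16')
```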
🚀 Get Started Now!
Download → Load Checkpoint → Generate!
Recommended versions:
🟡 FP8 for most users (best size/quality balance)
💎 BF16 for maximum quality
⚫ NVFP4 for RTX 50xx speed
All versions work on 8GB VRAM
Happy generating! 🎨
Description
💎 Z-Image-Turbo-BF16-AIO | Maximum Quality Photorealistic
8-Step Generation • Max Precision • Bilingual Text • All-in-One
✨ What is Z-Image-Turbo-BF16-AIO?
This is the BF16 All-in-One version of Alibaba's Z-Image-Turbo model. It offers the highest precision in this collection for professional photorealistic generation in just 8 steps.
Key Features:
💎 BFloat16 precision - Maximum quality
⚡ 8-step generation - Still lightning fast
📦 All-in-One - VAE + Text Encoder integrated
📸 Photorealistic - Professional grade
🌐 Bilingual - English & Chinese text rendering
💾 20GB file - Full precision
🎯 8GB VRAM - Yes, still works!
🎯 Quick Start
Installation:
Download Z-Image-Turbo-BF16-AIO (20GB)
Place in ComfyUI/models/checkpoints/
Load with "Load Checkpoint" node
Generate maximum quality!
Recommended Settings:
Steps: 9
CFG: 1.0
Sampler: res_multistep
Scheduler: simple
Resolution: 1920×1088
📊 Test Results
All tests on RTX 4060 (8GB VRAM) • 1920×1088 • 9 steps • CFG 1.0 • res_multistep + simple
🎬 Test 1: Urban Coffee Shop
Prompt:
Modern coffee shop interior with industrial design. Exposed brick walls,
wooden beams on ceiling, pendant lights hanging above bar. Professional
espresso machine on marble counter, barista preparing latte art. Customers
sitting at wooden tables with laptops. Large windows showing city street
outside. Warm afternoon lighting, cozy atmosphere. Photorealistic style,
professional architectural photography, 8K detail.
Time: ~TBD seconds
[Image placeholder]
🎬 Test 2: Traditional Chinese Architecture
Prompt:
Beautiful traditional Chinese temple courtyard during golden hour. Red
wooden pillars with intricate gold carvings, curved tile roofs with
upturned eaves. Stone lion statues flanking entrance. Cherry blossoms
in full bloom around courtyard. Red lanterns hanging from eaves. Soft
sunset light casting warm glow. Ancient architecture, peaceful atmosphere.
Professional travel photography, ultra-sharp detail, cinematic composition.
Time: ~TBD seconds
[Image placeholder]
🎬 Test 3: Gourmet Food Photography
Prompt:
Professional food photography of gourmet sushi platter on black slate plate.
Assorted nigiri and maki rolls with fresh salmon, tuna, and avocado.
Garnished with pickled ginger, wasabi, and microgreens. Chopsticks placed
beside plate. Rustic wooden table surface. Soft natural window light from
side creating subtle shadows. Shallow depth of field, appetizing presentation.
Restaurant-quality styling, commercial food photography, magazine-worthy.
Time: ~TBD seconds
[Image placeholder]
🎬 Test 4: Modern Architecture
Prompt:
Stunning contemporary architecture, white concrete building with curved
organic shapes. Floor-to-ceiling glass windows reflecting blue sky and
clouds. Minimalist modern design with clean geometric lines. Surrounded
by landscaped gardens with native plants. Shot from low angle emphasizing
height and drama. Bright daylight, high contrast shadows. Professional
architectural photography, ultra-sharp focus, award-winning composition.
Time: ~TBD seconds
[Image placeholder]
🎬 Test 5: Bilingual Signage
Prompt:
Modern fusion restaurant exterior at evening time. Large illuminated sign
above entrance reading "Dragon Kitchen" in elegant English script, with
"龙厨" in traditional Chinese characters below. Both texts in matching
warm golden glow. Contemporary storefront with glass facade, interior
lights visible. Urban street setting with pedestrians. Bilingual text
perfectly rendered, professional signage design. Evening photography,
moody atmosphere, vibrant lighting.
Time: ~TBD seconds
[Image placeholder]
💡 Prompting Guide
Natural Language Works Best:
Good Example:
✅ A cozy bookstore with floor-to-ceiling wooden shelves filled with
colorful books, comfortable reading nooks with cushions near large
windows, warm pendant lighting, peaceful afternoon atmosphere,
professional interior photography
Bad Example:
❌ bookstore, books, chairs, window, cozy, warm light
Bilingual Text Rendering:
English:
Neon sign reading "OPEN 24/7" in bright blue letters above entrance.
Modern sans-serif font, glowing effect.
Chinese:
Traditional tea house sign with "古韵茶坊" in elegant gold Chinese
calligraphy on red wooden board with ornate border.
Tips:
✅ Natural language (not tags!)
✅ Detailed (100-300 words)
✅ Include lighting, mood, style
✅ English or Chinese work!
✅ No negative prompts needed
⚙️ Settings
Tested Configuration:
Resolution: 1920×1088
Steps: 9
CFG: 1.0
Sampler: res_multistep
Scheduler: simple
Other Resolutions:
1024×1024 - Square
1536×1024 - Landscape
1024×1536 - Portrait
1920×1088 - Wide (tested!)
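Every size listed above is a multiple of 16. That also explains 1088 instead of 1080: 1080 = 67.5 × 16, while 1088 = 68 × 16. The minimal sketch below snaps a target size to that grid. The multiple-of-16 constraint is an assumption inferred from the listed sizes, not a documented Z-Image requirement.

```python
def snap(size: int, multiple: int = 16) -> int:
    """Round size to the nearest multiple (halfway cases round up), minimum one multiple."""
    return max(multiple, (size + multiple // 2) // multiple * multiple)


def snap_resolution(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    return snap(width, multiple), snap(height, multiple)


print(snap_resolution(1920, 1080))  # (1920, 1088)
print(snap_resolution(1024, 1536))  # (1024, 1536), already aligned
```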
Performance:
RTX 4060 8GB: ~3-5 seconds
Yes, works on 8GB VRAM!
🔧 Installation
Step 1: Download
Z-Image-Turbo-BF16-AIO.safetensors (20GB)
Step 2: Place File
ComfyUI/models/checkpoints/
└── Z-Image-Turbo-BF16-AIO.safetensors
Step 3: Load & Generate
Use "Load Checkpoint" node
VAE & text encoder already included!
Set: 9 steps, CFG 1.0, res_multistep + simple
Write detailed prompt
Generate maximum quality!
🏆 Advantages
BF16 Benefits:
💎 Maximum precision - BFloat16 format
🎨 Best possible quality - No precision loss
✨ Professional grade - For critical work
📸 Finest details - Every pixel perfect
vs FP8 Version:
📈 Maximum quality (FP8 is very close)
💾 Larger file (20GB vs 10GB)
🎯 For absolute best results
Still works on 8GB VRAM!
vs Other Models:
⚡ 8 steps vs 20-50 (SDXL/Flux)
🌐 Bilingual text (unique!)
📦 All-in-One (simple!)
🚀 3-5 seconds per image
❓ FAQ
Q: BF16 vs FP8 quality?
A: BF16 keeps the full 16-bit weights. FP8 stores weights in 8 bits, so it is very close but slightly lossy.
Q: Worth the extra size?
A: For professional work, yes! For testing/casual use, FP8 is great.
Q: Need negative prompts?
A: No! At CFG 1.0 the negative prompt has no effect.
Q: Can I change settings?
A: Keep CFG 1.0, res_multistep, 9 steps for best results.
Q: Does it work on 8GB VRAM?
A: Yes! Tested on RTX 4060 8GB.
Q: Can it render Chinese text?
A: Yes! Example: Sign reading "咖啡店" (coffee shop)
Q: Commercial use?
A: Yes! Apache 2.0 license.
🎯 Perfect For
💎 Professional work - Maximum quality needed
📸 Commercial photography - Critical projects
🎨 High-end content - No compromises
🌐 Bilingual materials - EN/CN text
🏢 Architecture visualization - Detailed renders
💼 Client work - Best quality delivery
🍔 Food photography - Magazine grade
✔️ 8GB VRAM - Still accessible!
🔍 Troubleshooting
Weird images?
Check CFG = 1.0
Use res_multistep
9 steps exactly
Text not working?
Put in quotes: "COFFEE"
Describe style & position
EN or CN only
Out of memory?
Lower resolution
This is the max-precision version
Try FP8 instead
💾 Requirements
VRAM: 8GB (RTX 4060 tested)
RAM: 16GB minimum
Storage: 22GB
ComfyUI: v0.11.0+ (latest recommended)
🆚 When to Use BF16 vs FP8?
Choose BF16 if:
✅ Professional/commercial work
✅ Maximum quality needed
✅ No file size concerns
✅ Best of the best
Choose FP8 if:
✅ Testing/casual use
✅ Limited storage
✅ Faster downloads
✅ Quality still excellent
Both are great! FP8 = 95% quality, BF16 = 100%
🙏 Credits
Model: Tongyi Lab (Alibaba)
Architecture: Single-Stream DiT (6B)
License: Apache 2.0
Format: BF16 All-in-One
Size: 20GB
Precision: BFloat16 (maximum)
VRAM: 8GB
Speed: ~3-5s @ 1920×1088
Release: November 2025
Maximum quality for professionals! 💎