━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✨ **Z-Image-Turbo — QwenVL Dual-Mode Auto-Prompt**
ComfyUI · Apache-2.0 · Fast S3-DiT Turbo
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Fast text-to-image S3-DiT diffusion model by Tongyi-MAI (Alibaba) for ComfyUI. The QwenVL vision model either auto-expands a keyword prompt or reads the style of a reference image; you choose the mode. Dual-orientation support: landscape 1920×1088 (16:9) or portrait 1088×1920 (9:16) with zero rewiring. 12-step turbo inference runs smoothly on 16 GB VRAM. Apache-2.0 licensed, so commercial use is OK.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✨ **Features**
✅ **Mode A: Keyword → Auto-Expand** — Type subject (e.g., "mountain landscape") → QwenVL PromptEnhancer expands to rich visual prompt → generate
✅ **Mode B: Reference Image → Style Capture** — Drop any reference image → QwenVL describes it → generates new image inspired by its style & composition
✅ **Dual Orientation** — One number switch: 1920×1088 landscape (16:9) or 1088×1920 portrait (9:16); workflow auto-reconfigures, no node rewiring
✅ **Pure Z-Image-Turbo** — Apache-2.0 S3-DiT (Alibaba); no FLUX dev components; no licensing gatekeeping
✅ **Turbo 12-Step Sampling** — euler sampler + beta scheduler + ModelSamplingAuraFlow shift=3 + FluxGuidance for balanced quality/speed
✅ **16 GB Blackwell Ready** — Tested on RTX 5080 (NVFP4 UNet); ~44 seconds per 1920×1088 image at 12 steps
✅ **Flexible Quality/Speed** — 6 steps ≈ 25s; 9 steps ≈ 33s; default 12 steps ≈ 44s; raise to 15 for max detail
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📦 **Required Models** (~13 GB total with the NVFP4 UNet and BF16 text encoder; QwenVL adds ~3 GB on first run)
**UNet (pick one based on your GPU):**
• z_image_turbo_nvfp4.safetensors (4.5 GB) — RTX 50 / Blackwell ⭐ recommended for 16 GB
• ~6 GB FP8 community quant — any GPU, 16 GB (CivitAI search "Z-Image-Turbo FP8")
• z_image_turbo_bf16.safetensors (12.3 GB) — any GPU, needs 24 GB+ VRAM
**Text encoder & VAE (same for all GPUs):**
• qwen_3_4b.safetensors (7.5 GB, BF16) — or qwen_3_4b_fp8_mixed.safetensors (5.6 GB) for lower VRAM
• ae.safetensors (~600 MB) — Autoencoder VAE (encode/decode latents)
• QwenVL LLM (Qwen3-VL-2B-Instruct, auto-downloaded ~3 GB on first queue) — Vision model for image read & prompt enhancement
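The choice above can be sketched as a small lookup. This is only an illustration: the sizes and VRAM figures are copied from the list, the FP8 filename is a placeholder (the community quant can have any name), and the preference order (highest precision that fits) is an assumption, not something the workflow enforces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UNetVariant:
    filename: str
    size_gb: float
    min_vram_gb: int
    needs_blackwell: bool


# Figures copied from the list above. The FP8 filename is a placeholder.
# Order = assumed preference: highest precision first.
VARIANTS = [
    UNetVariant("z_image_turbo_bf16.safetensors", 12.3, 24, False),
    UNetVariant("z_image_turbo_nvfp4.safetensors", 4.5, 16, True),
    UNetVariant("z_image_turbo_fp8.safetensors", 6.0, 16, False),
]


def pick_unet(vram_gb: int, is_blackwell: bool) -> UNetVariant:
    """Return the first variant in VARIANTS that fits the given GPU."""
    for variant in VARIANTS:
        fits_vram = vram_gb >= variant.min_vram_gb
        fits_arch = is_blackwell or not variant.needs_blackwell
        if fits_vram and fits_arch:
            return variant
    raise ValueError(f"No listed variant fits {vram_gb} GB VRAM")


def download_size_gb(unet: UNetVariant,
                     text_encoder_gb: float = 7.5,
                     vae_gb: float = 0.6) -> float:
    """Disk total for UNet + text encoder + VAE (QwenVL not included)."""
    return round(unet.size_gb + text_encoder_gb + vae_gb, 1)
```

For a 16 GB RTX 50 card, `pick_unet(16, True)` returns the NVFP4 file, and `download_size_gb` gives 12.6 GB, which is where the "~13 GB" figure comes from.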
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬇️ **Download Links** — all from https://huggingface.co/Comfy-Org/z_image_turbo
📁 **ComfyUI/models/diffusion_models/** (or unet/) — pick one:
| File | Size | GPU | Source |
|------|------|-----|--------|
| z_image_turbo_nvfp4.safetensors | 4.5 GB | RTX 50 / Blackwell 16 GB ⭐ | Comfy-Org/z_image_turbo → split_files/diffusion_models/ |
| z_image_turbo_fp8.safetensors *(any name)* | ~6 GB | Any GPU, 16 GB | CivitAI — search "Z-Image-Turbo FP8" (community quant, Apache-2.0 derivative) |
| z_image_turbo_bf16.safetensors | 12.3 GB | Any GPU, 24 GB+ | Comfy-Org/z_image_turbo → split_files/diffusion_models/ |
⚠️ *Workflow file loads `z_image_turbo_nvfp4.safetensors` by default — change the filename in UNETLoader node to match whatever file you downloaded.*
📁 **ComfyUI/models/text_encoders/** (or clip/)
• qwen_3_4b.safetensors (7.5 GB) → split_files/text_encoders/ (BF16, default)
• qwen_3_4b_fp8_mixed.safetensors (5.6 GB) → split_files/text_encoders/ (FP8, saves VRAM)
📁 **ComfyUI/models/vae/**
• ae.safetensors (~600 MB) → split_files/vae/
⚠️ *The workflow also loads `qwen_3_4b.safetensors` by default. Swap the filename in the CLIPLoader node if you use the FP8 text encoder. QwenVL is downloaded automatically by the AILab node on first use.*
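If you prefer scripting the downloads, a minimal sketch with the `huggingface_hub` package could look like the following. It assumes the repo paths are exactly the `split_files/...` folders named above plus the filenames, and that the default NVFP4 + BF16 combination is wanted; `plan` and `download_all` are illustrative helper names, not part of the workflow.

```python
import shutil
from pathlib import Path

REPO_ID = "Comfy-Org/z_image_turbo"

# (path inside the repo, ComfyUI models subfolder); paths assumed from the
# "split_files/..." locations listed above.
FILES = [
    ("split_files/diffusion_models/z_image_turbo_nvfp4.safetensors", "diffusion_models"),
    ("split_files/text_encoders/qwen_3_4b.safetensors", "text_encoders"),
    ("split_files/vae/ae.safetensors", "vae"),
]


def plan(comfy_root: str) -> list[tuple[str, Path]]:
    """Map each repo file to its destination under <comfy_root>/models/."""
    models = Path(comfy_root) / "models"
    return [(src, models / sub / Path(src).name) for src, sub in FILES]


def download_all(comfy_root: str) -> None:
    """Fetch every planned file into the HF cache, then copy it into place."""
    # Imported here so the planning code works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    for src, dest in plan(comfy_root):
        dest.parent.mkdir(parents=True, exist_ok=True)
        cached = hf_hub_download(repo_id=REPO_ID, filename=src)
        shutil.copy(cached, dest)  # copies the file content, not the symlink
        print(f"{src} -> {dest}")
```

Calling `download_all("/path/to/ComfyUI")` would fetch all three files. Copying out of the cache uses extra disk space; delete the cache entry afterwards if that matters.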
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🧩 **Required Custom Nodes** (2 packs)
1. **ComfyUI-QwenVL** (AILab / 1038lab) — PromptEnhancer node (expand keywords) + VL image-to-text (read reference images)
2. **ComfyUI-Easy-Use** (yolain) — anythingIndexSwitch for the mode toggle (keyword vs. image) and the orientation picker (landscape/portrait)
Install via ComfyUI Manager (search each pack name).
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚀 **How to Use**
**Quick Start:**
1. Download & place models in ComfyUI/models/diffusion_models/ (or unet/), text_encoders/ (or clip/), and vae/
2. Install 2 custom node packs via ComfyUI Manager
3. Load ZIMG_Turbo_AUTO_dual_v1.json into ComfyUI
4. Choose mode (top-left):
- **Mode 0 (Mode A):** Keyword input → auto-expand via QwenVL
- **Mode 1 (Mode B):** Reference image → auto-describe via QwenVL
5. Set orientation (top section): 0 = landscape, 1 = portrait
6. (Optional) Adjust CFG scale & steps slider
7. Queue → generate
**Mode A Workflow (Keyword):**
- Type a short prompt seed → QwenVL expands it to 30–50 words → sampler renders the detailed prompt
- Example subjects (copy-paste ready):
- `elegant woman, golden hour rooftop, cinematic editorial` ← workflow default
- `mountain lake at golden hour`
- `cozy coffee shop morning`
- `futuristic rainy city night`
**Mode B Workflow (Reference Image):**
- Drag reference.png to LoadImage node → QwenVL analyzes composition, color, mood → generates new image inspired by style
- Best for: style transfer, "I want something like this but different subject"
**Orientation Toggle:**
- Orientation = 0: 1920×1088 (≈16:9 landscape)
- Orientation = 1: 1088×1920 (≈9:16 portrait)
- Toggle updates resolution, aspect ratio, and sampler dynamically; no manual node changes needed
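The orientation switch amounts to an index-to-size lookup. The sketch below mirrors that mapping in plain Python and works out the arithmetic behind the two sizes; the function names are illustrative and the workflow itself does this with switch nodes, not code.

```python
from math import gcd

# Orientation index -> (width, height), as listed above.
SIZES = {0: (1920, 1088), 1: (1088, 1920)}


def latent_size(orientation: int) -> tuple[int, int]:
    """Return (width, height) for orientation 0 (landscape) or 1 (portrait)."""
    if orientation not in SIZES:
        raise ValueError("orientation must be 0 (landscape) or 1 (portrait)")
    return SIZES[orientation]


def describe(width: int, height: int) -> dict:
    """Reduce the aspect ratio and check divisibility by 16."""
    g = gcd(width, height)
    return {
        "reduced_ratio": (width // g, height // g),
        "ratio": round(width / height, 4),
        "multiple_of_16": width % 16 == 0 and height % 16 == 0,
    }


info = describe(*latent_size(0))
print(info)  # reduced ratio 30:17, i.e. 1.7647 versus 1.7778 for exact 16:9
```

1920×1088 reduces to 30:17, so it is slightly taller than true 16:9. Both sides are multiples of 16, whereas 1080 is not (1080 % 16 == 8); 1088 is the next multiple of 16 above 1080. Crop 8 pixels if you need exact 1920×1080 output.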
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚙️ **Settings & Parameters**
• **Sampler** — euler + beta scheduler (built-in ComfyUI, no extra nodes required)
• **Base Steps** — 12 (default; set 6–15 via steps slider)
• **CFG Scale** — 5.0 (Guidance strength; range 3.0–7.0)
• **ModelSamplingAuraFlow** — shift = 3 (Distillation-optimized noise scheduler)
• **FluxGuidance** — ON (stabilizes output diversity)
• **Seed** — Randomize or fix for reproducibility
• **Output Format** — PNG + preview in ComfyUI
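For reference, the defaults and ranges above can be written down as a small settings object with a range check. This is a hypothetical helper for keeping notes or scripting around the workflow; ComfyUI does not require it, and the field names are assumptions chosen to match the labels in the list.

```python
from dataclasses import dataclass


@dataclass
class SamplerSettings:
    # Defaults copied from the settings list above.
    steps: int = 12
    cfg: float = 5.0
    sampler_name: str = "euler"
    scheduler: str = "beta"
    shift: float = 3.0  # ModelSamplingAuraFlow shift

    def validate(self) -> list[str]:
        """Return a list of warnings for values outside the suggested ranges."""
        problems = []
        if not 6 <= self.steps <= 15:
            problems.append(f"steps={self.steps} is outside the 6-15 range")
        if not 3.0 <= self.cfg <= 7.0:
            problems.append(f"cfg={self.cfg} is outside the 3.0-7.0 range")
        return problems


print(SamplerSettings().validate())                   # defaults pass: []
print(SamplerSettings(steps=20, cfg=8.0).validate())  # two warnings
```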
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 **Performance Tips**
• **Keyword Mode Tricks** — Use descriptors: "portrait of woman, soft studio lighting, sharp focus" yields better results than a bare subject name
• **Reference Mode Tips** — Clear, well-composed images work best; abstract/blurry refs may confuse QwenVL reader
• **Speed Tuning** — 6 steps ≈ 25s; 9 steps ≈ 33s; 12 steps ≈ 44s. Quality gain plateaus after 15 steps
• **VRAM Efficiency** — Workflow typically stays under 12 GB while sampling (peak ~13 GB with NVFP4 + BF16 CLIP); leaves headroom on a 16 GB card for normal desktop use
• **Batch Generation** — Queue 5–10 images in one session; model stays loaded between generations
• **CFG Sensitivity** — Z-Image responds well to 5.0–6.0; above 7.0 may degrade details. Experiment with reference images.
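To plan a batch, the three timings quoted above can be interpolated. This is a rough sketch: the numbers are the RTX 5080 measurements from this page, anything outside 6 to 12 steps is a straight-line extrapolation, and other GPUs will differ.

```python
# (steps, seconds per 1920x1088 image) as quoted in the speed tip above.
MEASURED = [(6, 25.0), (9, 33.0), (12, 44.0)]


def estimate_seconds(steps: int) -> float:
    """Piecewise-linear estimate; extrapolates from the nearest segment."""
    if steps <= MEASURED[1][0]:
        (x0, y0), (x1, y1) = MEASURED[0], MEASURED[1]
    else:
        (x0, y0), (x1, y1) = MEASURED[1], MEASURED[2]
    return y0 + (y1 - y0) * (steps - x0) / (x1 - x0)


def batch_minutes(n_images: int, steps: int) -> float:
    """Total sampling time for a queue of n_images, in minutes."""
    return n_images * estimate_seconds(steps) / 60


print(estimate_seconds(12))             # 44.0
print(round(batch_minutes(10, 12), 1))  # 7.3
```

By the same extrapolation, 15 steps would land near 55 seconds per image, but that figure was not measured.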
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📝 **Notes & AI Disclosure**
• **AI-Generated Content** — All example outputs are AI-generated by Z-Image-Turbo. Suitable for creative projects, design exploration, and stock imagery. Respect local AI disclosure laws when publishing.
• **Hardware Tested** — RTX 5080 16 GB VRAM (NVFP4 UNet + BF16 CLIP), CUDA 12.6, Blackwell SM120
• **VRAM Usage** — NVFP4 + BF16 CLIP: ~13 GB peak (16 GB Blackwell OK); BF16 + BF16 CLIP: ~20 GB (needs 24 GB)
• **Model Downloads** — Exact links verified 2026-06-29; check HF if repo changes
• **No Commercial Restrictions** — Apache-2.0; free for personal & commercial use (see licensing section)
• **Workflow Reuse** — Feel free to modify, share, fork—workflow itself is CC0 public domain
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⭐ **Found this useful?**
• Like if it saved you time generating images
• Comment your results—I read every one
• Follow for new ComfyUI workflows, all tested on 16 GB VRAM
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚖️ **Model Attribution & Licensing**
**Z-Image-Turbo** (Tongyi-MAI / Alibaba)
• License: Apache 2.0 — https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
• Commercial use permitted; derivative models OK
• Attribution appreciated (link to official repo)
**Qwen3-VL-2B-Instruct** (Qwen team, Alibaba Cloud)
• License: Apache 2.0 + Alibaba Qwen Community License
• Used for vision-based prompt enhancement
• Commercial use OK under community license
**Qwen3-4B (text encoder, loaded with CLIPLoader type lumina2)**
• License: Apache 2.0
• Commercial use permitted
**ComfyUI Custom Nodes**
• ComfyUI-QwenVL (MIT/Apache), ComfyUI-Easy-Use (MIT)
**Workflow License** — CC0 Public Domain. This JSON workflow is original work, free to use, modify, and redistribute without attribution (though credit is always appreciated).
All example outputs are AI-generated. Model weights remain the property of Tongyi-MAI (Alibaba); weights must be downloaded separately from official sources.
Details: Platform CivitAI · Status Available · Downloads 39 · Created 6/30/2026 · Updated 7/1/2026