Z-Image-Turbo — QwenVL Dual-Mode Auto-Prompt

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✨ **Z-Image-Turbo — QwenVL Dual-Mode Auto-Prompt**
ComfyUI · Apache-2.0 · Fast S3-DiT Turbo
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

ComfyUI workflow for Z-Image-Turbo, a fast text-to-image S3-DiT diffusion model by Tongyi-MAI (Alibaba). A QwenVL vision model either expands your keywords into a full prompt or describes a reference image; you choose the mode. Dual-orientation support: landscape 1920×1088 (16:9) or portrait 1088×1920 (9:16) with zero rewiring. 12-step turbo inference runs smoothly on 16 GB VRAM. Apache-2.0 licensed; commercial use OK.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✨ **Features**

✅ **Mode A: Keyword → Auto-Expand** — Type subject (e.g., "mountain landscape") → QwenVL PromptEnhancer expands to rich visual prompt → generate
✅ **Mode B: Reference Image → Style Capture** — Drop any reference image → QwenVL describes it → generates new image inspired by its style & composition
✅ **Dual Orientation** — One number switch: 1920×1088 landscape (16:9) or 1088×1920 portrait (9:16); workflow auto-reconfigures, no node rewiring
✅ **Pure Z-Image-Turbo** — Apache-2.0 S3-DiT (Alibaba); no FLUX dev components; no licensing gatekeeping
✅ **Turbo 12-Step Sampling** — euler sampler + beta scheduler + ModelSamplingAuraFlow shift=3 + FluxGuidance for balanced quality/speed
✅ **16 GB Blackwell Ready** — Tested on RTX 5080 (NVFP4 UNet); ~44 seconds per 1920×1088 image at 12 steps
✅ **Flexible Quality/Speed** — 6 steps ≈ 25s; 9 steps ≈ 33s; default 12 steps ≈ 44s; raise to 15 for max detail

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📦 **Required Models** (~13 GB with the NVFP4 UNet and BF16 text encoder, plus ~3 GB for the auto-downloaded QwenVL model)

**UNet (pick one based on your GPU):**
• z_image_turbo_nvfp4.safetensors (4.5 GB) — RTX 50 / Blackwell ⭐ recommended for 16 GB
• ~6 GB FP8 community quant — any GPU, 16 GB (CivitAI search "Z-Image-Turbo FP8")
• z_image_turbo_bf16.safetensors (12.3 GB) — any GPU, needs 24 GB+ VRAM

**Text encoder & VAE (same for all GPUs):**
• qwen_3_4b.safetensors (7.5 GB, BF16) — or qwen_3_4b_fp8_mixed.safetensors (5.6 GB) for lower VRAM
• ae.safetensors (~600 MB) — Autoencoder VAE (encode/decode latents)
• QwenVL LLM (Qwen3-VL-2B-Instruct, auto-downloaded ~3 GB on first queue) — Vision model for image read & prompt enhancement
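To compare download sizes across the variants listed above, the file sizes can simply be summed. The sketch below uses only the sizes quoted in this section (the FP8 UNet figure is the approximate "~6 GB"); the dictionary keys and the function name are made up for illustration. Note that this is disk space, not VRAM usage.

```python
# File sizes in GB, copied from the model list above.
UNET_GB = {"nvfp4": 4.5, "fp8": 6.0, "bf16": 12.3}  # fp8 is approximate (~6 GB)
TEXT_ENCODER_GB = {"bf16": 7.5, "fp8_mixed": 5.6}
VAE_GB = 0.6
QWENVL_GB = 3.0  # auto-downloaded on first queue, approximate


def download_size_gb(unet: str, text_encoder: str, include_qwenvl: bool = False) -> float:
    """Sum the on-disk size of one UNet + text encoder + VAE combination."""
    total = UNET_GB[unet] + TEXT_ENCODER_GB[text_encoder] + VAE_GB
    if include_qwenvl:
        total += QWENVL_GB
    return round(total, 1)


if __name__ == "__main__":
    # Print every UNet / text encoder pairing, without the QwenVL download.
    for unet in UNET_GB:
        for encoder in TEXT_ENCODER_GB:
            print(f"{unet:>5} + {encoder:<9} -> {download_size_gb(unet, encoder)} GB")
```

With the default pairing (NVFP4 UNet + BF16 text encoder) this gives 12.6 GB, which is where the "~13 GB" figure comes from; adding QwenVL brings it to roughly 15.6 GB.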

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⬇️ **Download Links** — official files from https://huggingface.co/Comfy-Org/z_image_turbo (the FP8 UNet is a CivitAI community quant)

📁 **ComfyUI/models/diffusion_models/** (or unet/) — pick one:

| File | Size | GPU | Source |
|------|------|-----|--------|
| z_image_turbo_nvfp4.safetensors | 4.5 GB | RTX 50 / Blackwell 16 GB ⭐ | Comfy-Org/z_image_turbo → split_files/diffusion_models/ |
| z_image_turbo_fp8.safetensors *(any name)* | ~6 GB | Any GPU, 16 GB | CivitAI — search "Z-Image-Turbo FP8" (community quant, Apache-2.0 derivative) |
| z_image_turbo_bf16.safetensors | 12.3 GB | Any GPU, 24 GB+ | Comfy-Org/z_image_turbo → split_files/diffusion_models/ |

⚠️ *Workflow file loads `z_image_turbo_nvfp4.safetensors` by default — change the filename in UNETLoader node to match whatever file you downloaded.*

📁 **ComfyUI/models/text_encoders/** (or clip/)
• qwen_3_4b.safetensors (7.5 GB) → split_files/text_encoders/  (BF16, default)
• qwen_3_4b_fp8_mixed.safetensors (5.6 GB) → split_files/text_encoders/  (FP8, saves VRAM)

📁 **ComfyUI/models/vae/**
• ae.safetensors (~600 MB) → split_files/vae/

⚠️ *The workflow loads `qwen_3_4b.safetensors` in the CLIPLoader node by default; change the filename there if you use the FP8 variant. The QwenVL model is downloaded automatically by the ComfyUI-QwenVL (AILab) node on first use.*
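If you prefer scripting the downloads over clicking through the repo, something like the following may work. It is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repo paths are exactly the `split_files/...` folders listed above; the short keys (`unet_nvfp4`, `te_bf16`, ...) and function names are made up for this example. The CivitAI FP8 UNet is not covered.

```python
import shutil
from pathlib import Path

REPO_ID = "Comfy-Org/z_image_turbo"

# key -> (path inside the repo, subfolder under ComfyUI/models/)
FILES = {
    "unet_nvfp4": ("split_files/diffusion_models/z_image_turbo_nvfp4.safetensors", "diffusion_models"),
    "unet_bf16": ("split_files/diffusion_models/z_image_turbo_bf16.safetensors", "diffusion_models"),
    "te_bf16": ("split_files/text_encoders/qwen_3_4b.safetensors", "text_encoders"),
    "te_fp8": ("split_files/text_encoders/qwen_3_4b_fp8_mixed.safetensors", "text_encoders"),
    "vae": ("split_files/vae/ae.safetensors", "vae"),
}


def plan(comfy_root, keys):
    """Return (repo_path, destination) pairs without downloading anything."""
    models_dir = Path(comfy_root) / "models"
    pairs = []
    for key in keys:
        repo_path, subfolder = FILES[key]
        pairs.append((repo_path, models_dir / subfolder / Path(repo_path).name))
    return pairs


def download(comfy_root, keys):
    """Fetch each file into the Hugging Face cache, then copy it into ComfyUI."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    for repo_path, dest in plan(comfy_root, keys):
        dest.parent.mkdir(parents=True, exist_ok=True)
        cached = hf_hub_download(repo_id=REPO_ID, filename=repo_path)
        shutil.copy2(cached, dest)  # copies the real file, not the cache symlink
```

For the default setup you would call `download("ComfyUI", ["unet_nvfp4", "te_bf16", "vae"])`. Copying out of the cache doubles the disk usage until you clear the cache.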

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🧩 **Required Custom Nodes** (2 packs)

1. **ComfyUI-QwenVL** (AILab / 1038lab) — PromptEnhancer node (expand keywords) + VL image-to-text (read reference images)
2. **ComfyUI-Easy-Use** (vjumpkung) — anythingIndexSwitch for mode toggle (keyword vs. image) + orientation picker (landscape/portrait)

Install via ComfyUI Manager (search each pack name).

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🚀 **How to Use**

**Quick Start:**

1. Download & place models in ComfyUI/models/diffusion_models/ (or unet/), text_encoders/ (or clip/), and vae/
2. Install 2 custom node packs via ComfyUI Manager
3. Load ZIMG_Turbo_AUTO_dual_v1.json into ComfyUI
4. Choose mode (top-left):
   - **Mode 0 (Mode A):** Keyword input → auto-expand via QwenVL
   - **Mode 1 (Mode B):** Reference image → auto-describe via QwenVL
5. Set orientation (top section): 0 = landscape, 1 = portrait
6. (Optional) Adjust CFG scale & steps slider
7. Queue → generate
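The final queue step can also be scripted against ComfyUI's HTTP endpoint `POST /prompt`, which is handy for batches. The sketch below assumes a local server at `http://127.0.0.1:8188` and a copy of the workflow exported in API format. It also assumes the sampler node exposes an integer `seed` input, which may not match this workflow exactly. All function names are illustrative.

```python
import copy
import json
import urllib.request


def with_seed(graph, seed):
    """Return a copy of an API-format graph with every literal seed replaced."""
    new_graph = copy.deepcopy(graph)
    for node in new_graph.values():
        inputs = node.get("inputs", {})
        # Linked inputs are [node_id, output_index] lists; only touch plain ints.
        if isinstance(inputs.get("seed"), int):
            inputs["seed"] = seed
    return new_graph


def queue_prompt(graph, server="http://127.0.0.1:8188"):
    """POST one graph to the ComfyUI queue and return the server's JSON reply."""
    body = json.dumps({"prompt": graph}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def queue_batch(api_json_path, seeds, server="http://127.0.0.1:8188"):
    """Queue the same workflow once per seed."""
    with open(api_json_path, encoding="utf-8") as handle:
        graph = json.load(handle)
    return [queue_prompt(with_seed(graph, seed), server) for seed in seeds]
```

A call such as `queue_batch("workflow_api.json", range(1000, 1005))` would queue five images with fixed seeds; the filename here is hypothetical and stands for whatever you named your API-format export.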

**Mode A Workflow (Keyword):**
- Type a short keyword prompt → QwenVL expands it into a 30–50 word prompt → sampler renders the image
- Example subjects (copy-paste ready):
  - `elegant woman, golden hour rooftop, cinematic editorial` ← workflow default
  - `mountain lake at golden hour`
  - `cozy coffee shop morning`
  - `futuristic rainy city night`

**Mode B Workflow (Reference Image):**
- Drag a reference image onto the LoadImage node → QwenVL analyzes composition, color, mood → generates a new image inspired by its style
- Best for: style transfer, "I want something like this but different subject"

**Orientation Toggle:**
- Orientation = 0: 1920×1088 (16:9 landscape)
- Orientation = 1: 1088×1920 (9:16 portrait)
- Toggle updates resolution, aspect ratio, and sampler dynamically; no manual node changes needed
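The orientation switch boils down to a two-entry lookup. The sketch below mirrors the two sizes given above and adds a guess at why the short side is 1088 rather than 1080: 1080 is not divisible by 16, and 1088 is the next multiple of 16 above it. The divisibility rationale is an assumption, not something stated by the workflow, and the function names are illustrative.

```python
# Orientation index -> (width, height), as listed above.
SIZES = {0: (1920, 1088), 1: (1088, 1920)}


def latent_size(orientation: int) -> tuple:
    """Return (width, height) for orientation 0 (landscape) or 1 (portrait)."""
    return SIZES[orientation]


def snap_up(value: int, multiple: int = 16) -> int:
    """Round value up to the next multiple (ceiling division on integers)."""
    return -(-value // multiple) * multiple


if __name__ == "__main__":
    width, height = latent_size(0)
    print(width, height)               # 1920 1088
    print(snap_up(1080))               # 1088
    print(round(width / height, 4))    # 1.7647, slightly narrower than 16/9 = 1.7778
```

So the "16:9" and "9:16" labels are close approximations: the true ratio of 1920×1088 is about 1.765 rather than 1.778.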

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚙️ **Settings & Parameters**

• **Sampler** — euler + beta scheduler (built-in ComfyUI, no extra nodes required)
• **Base Steps** — 12 (default; set 6–15 via steps slider)
• **CFG Scale** — 5.0 (Guidance strength; range 3.0–7.0)
• **ModelSamplingAuraFlow** — shift = 3 (Distillation-optimized noise scheduler)
• **FluxGuidance** — ON (stabilizes output diversity)
• **Seed** — Randomize or fix for reproducibility
• **Output Format** — PNG + preview in ComfyUI
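For anyone scripting or sanity-checking these values, the defaults and ranges above can be captured in a small settings object. This is a sketch only: the class and method names are invented here, and the input names `steps`, `cfg`, `sampler_name`, `scheduler`, `seed`, `denoise` follow ComfyUI's stock KSampler node, which this workflow may or may not use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSettings:
    """Defaults copied from the settings list above."""
    steps: int = 12
    cfg: float = 5.0
    sampler_name: str = "euler"
    scheduler: str = "beta"
    shift: float = 3.0  # ModelSamplingAuraFlow shift

    def problems(self) -> list:
        """Return human-readable warnings for values outside the stated ranges."""
        issues = []
        if not 6 <= self.steps <= 15:
            issues.append(f"steps={self.steps} is outside the suggested 6-15 range")
        if not 3.0 <= self.cfg <= 7.0:
            issues.append(f"cfg={self.cfg} is outside the suggested 3.0-7.0 range")
        return issues

    def ksampler_inputs(self, seed: int) -> dict:
        """Literal inputs for a stock KSampler node (links such as model are omitted)."""
        return {
            "seed": seed,
            "steps": self.steps,
            "cfg": self.cfg,
            "sampler_name": self.sampler_name,
            "scheduler": self.scheduler,
            "denoise": 1.0,
        }
```

`SamplerSettings().problems()` returns an empty list for the defaults, while something like `SamplerSettings(steps=20, cfg=8.0)` reports both values as out of range.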

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 **Performance Tips**

• **Keyword Mode Tricks** — Add descriptors: "portrait of woman, soft studio lighting, sharp focus" yields better results than a bare subject name
• **Reference Mode Tips** — Clear, well-composed images work best; abstract or blurry references may confuse the QwenVL reader
• **Speed Tuning** — 6 steps ≈ 25s; 9 steps ≈ 33s; 12 steps ≈ 44s. Quality gain plateaus after 15 steps
• **VRAM Efficiency** — Workflow stays under 12 GB active; safe for concurrent desktop use
• **Batch Generation** — Queue 5–10 images in one session; model stays loaded between generations
• **CFG Sensitivity** — Z-Image responds well to 5.0–6.0; above 7.0 may degrade details. Experiment with reference images.
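To budget a batch, the three timings above (6, 9 and 12 steps on an RTX 5080) can be interpolated linearly. The sketch below does that and extends the last segment for step counts above 12, so the 15-step figure is an extrapolation rather than a measurement. Other GPUs will differ, and the function names are illustrative.

```python
# (steps, seconds per 1920x1088 image) as reported above for an RTX 5080.
MEASURED = [(6, 25.0), (9, 33.0), (12, 44.0)]


def estimate_seconds(steps: int) -> float:
    """Piecewise-linear estimate; outside 6-12 steps the nearest segment is extended."""
    if steps <= MEASURED[1][0]:
        (x0, y0), (x1, y1) = MEASURED[0], MEASURED[1]
    else:
        (x0, y0), (x1, y1) = MEASURED[1], MEASURED[2]
    return y0 + (y1 - y0) * (steps - x0) / (x1 - x0)


def batch_minutes(images: int, steps: int = 12) -> float:
    """Rough wall-clock time for a queue of images at a given step count."""
    return images * estimate_seconds(steps) / 60


if __name__ == "__main__":
    print(estimate_seconds(12))            # 44.0
    print(estimate_seconds(15))            # 55.0 (extrapolated)
    print(round(batch_minutes(10), 1))     # 7.3
```

By this estimate a 10-image queue at the default 12 steps takes a little over 7 minutes.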

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📝 **Notes & AI Disclosure**

• **AI-Generated Content** — All example outputs are AI-generated by Z-Image-Turbo. Suitable for creative projects, design exploration, and stock imagery. Respect local AI disclosure laws when publishing.
• **Hardware Tested** — RTX 5080 16 GB VRAM (NVFP4 UNet + BF16 CLIP), CUDA 12.6, Blackwell SM120
• **VRAM Usage** — NVFP4 + BF16 CLIP: ~13 GB peak (16 GB Blackwell OK); BF16 + BF16 CLIP: ~20 GB (needs 24 GB)
• **Model Downloads** — Exact links verified 2026-06-29; check HF if repo changes
• **No Commercial Restrictions** — Apache-2.0; free for personal & commercial use (see licensing section)
• **Workflow Reuse** — Feel free to modify, share, fork—workflow itself is CC0 public domain

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⭐ **Found this useful?**

• Like if it saved you time generating images
• Comment your results—I read every one
• Follow for new ComfyUI workflows, all tested on 16 GB VRAM

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚖️ **Model Attribution & Licensing**

**Z-Image-Turbo** (Tongyi-MAI / Alibaba)
• License: Apache 2.0 — https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
• Commercial use permitted; derivative models OK
• Attribution appreciated (link to official repo)

**Qwen3-VL-2B-Instruct** (Alibaba DAMO Academy)
• License: Apache 2.0 + Alibaba Qwen Community License
• Used for vision-based prompt enhancement
• Commercial use OK under community license

**Qwen3-4B (Lumina2 Type, Text Encoder)**
• License: Apache 2.0
• Commercial use permitted

**ComfyUI Custom Nodes**
• ComfyUI-QwenVL (MIT/Apache), ComfyUI-Easy-Use (MIT)

**Workflow License** — CC0 Public Domain. This JSON workflow is original work, free to use, modify, and redistribute without attribution (though credit is always appreciated).

All example outputs are AI-generated. Model weights remain the property of Tongyi-MAI (Alibaba); weights must be downloaded separately from official sources.