    Z-Image-Turbo — QwenVL Dual-Mode Auto-Prompt - v1.0
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    ✨ **Z-Image-Turbo — QwenVL Dual-Mode Auto-Prompt**
    ComfyUI · Apache-2.0 · Fast S3-DiT Turbo
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    Fast text-to-image workflow for ComfyUI built on Z-Image-Turbo, the S3-DiT diffusion model by Tongyi-MAI (Alibaba). A QwenVL vision model either expands a short keyword prompt or describes a reference image's style; you choose the mode. Both orientations work with zero rewiring: landscape 1920×1088 (16:9) or portrait 1088×1920 (9:16). 12-step turbo inference runs smoothly on 16 GB of VRAM. Apache-2.0 licensed, so commercial use is OK.
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    ✨ **Features**
    
    ✅ **Mode A: Keyword → Auto-Expand** — Type subject (e.g., "mountain landscape") → QwenVL PromptEnhancer expands to rich visual prompt → generate
    ✅ **Mode B: Reference Image → Style Capture** — Drop any reference image → QwenVL describes it → generates new image inspired by its style & composition
    ✅ **Dual Orientation** — One number switch: 1920×1088 landscape (16:9) or 1088×1920 portrait (9:16); workflow auto-reconfigures, no node rewiring
    ✅ **Pure Z-Image-Turbo** — Apache-2.0 S3-DiT (Alibaba); no FLUX dev components; no licensing gatekeeping
    ✅ **Turbo 12-Step Sampling** — euler sampler + beta scheduler + ModelSamplingAuraFlow shift=3 + FluxGuidance for balanced quality/speed
    ✅ **16 GB Blackwell Ready** — Tested on RTX 5080 (NVFP4 UNet); ~44 seconds per 1920×1088 image at 12 steps
    ✅ **Flexible Quality/Speed** — 6 steps ≈ 25s; 9 steps ≈ 33s; default 12 steps ≈ 44s; raise to 15 for max detail
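The timings listed above fit a simple linear model reasonably well. A minimal sketch, assuming generation time grows linearly with step count on the same RTX 5080 / NVFP4 setup; the intercept simply absorbs whatever per-queue work does not scale with steps, and numbers on other GPUs will differ.

```python
# Timings reported on this page (RTX 5080, NVFP4 UNet, 1920x1088).
# Assumption: seconds ~= overhead + per_step * steps.
REPORTED_SECONDS = {6: 25.0, 9: 33.0, 12: 44.0}


def fit_linear(points: dict[int, float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (overhead_s, seconds_per_step)."""
    xs = list(points)
    ys = [points[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    per_step = sxy / sxx
    return mean_y - per_step * mean_x, per_step


def estimate_seconds(steps: int, overhead: float, per_step: float) -> float:
    """Predicted wall-clock seconds for a given step count."""
    return overhead + per_step * steps


if __name__ == "__main__":
    overhead, per_step = fit_linear(REPORTED_SECONDS)
    print(f"overhead ~{overhead:.1f} s, ~{per_step:.2f} s/step")
    print(f"15 steps ~{estimate_seconds(15, overhead, per_step):.0f} s")
```

On these three data points the fit gives about 5.5 s of overhead plus roughly 3.2 s per step, so 15 steps would land near 53 s. That figure is an extrapolation, not a measurement.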
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    📦 **Required Models** (~13 GB for the recommended NVFP4 UNet + BF16 text encoder + VAE; the QwenVL model adds ~3 GB on first queue)
    
    **UNet (pick one based on your GPU):**
    • z_image_turbo_nvfp4.safetensors (4.5 GB) — RTX 50 / Blackwell ⭐ recommended for 16 GB
    • ~6 GB FP8 community quant — any GPU, 16 GB (CivitAI search "Z-Image-Turbo FP8")
    • z_image_turbo_bf16.safetensors (12.3 GB) — any GPU, needs 24 GB+ VRAM
    
    **Text encoder & VAE (same for all GPUs):**
    • qwen_3_4b.safetensors (7.5 GB, BF16) — or qwen_3_4b_fp8_mixed.safetensors (5.6 GB) for lower VRAM
    • ae.safetensors (~600 MB) — Autoencoder VAE (encode/decode latents)
    • QwenVL LLM (Qwen3-VL-2B-Instruct, auto-downloaded ~3 GB on first queue) — Vision model for image read & prompt enhancement
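The download total depends on which variants you pick. A small sketch that adds up the sizes listed on this page, treating ae.safetensors as 0.6 GB and the QwenVL auto-download as 3 GB (both approximate figures from the list above):

```python
# Approximate sizes in GB, as listed on this page.
UNET = {"nvfp4": 4.5, "fp8": 6.0, "bf16": 12.3}
TEXT_ENCODER = {"bf16": 7.5, "fp8_mixed": 5.6}
VAE_GB = 0.6     # ae.safetensors, listed as ~600 MB
QWENVL_GB = 3.0  # Qwen3-VL-2B-Instruct, auto-downloaded on first queue


def total_gb(unet: str, text_encoder: str, include_qwenvl: bool = True) -> float:
    """Sum the chosen UNet, text encoder, VAE and (optionally) QwenVL sizes."""
    total = UNET[unet] + TEXT_ENCODER[text_encoder] + VAE_GB
    if include_qwenvl:
        total += QWENVL_GB
    return round(total, 1)


if __name__ == "__main__":
    print("NVFP4 + BF16 encoder, no QwenVL:", total_gb("nvfp4", "bf16", False))
    print("NVFP4 + BF16 encoder + QwenVL:  ", total_gb("nvfp4", "bf16"))
    print("BF16 + BF16 encoder + QwenVL:   ", total_gb("bf16", "bf16"))
```

With the recommended NVFP4 UNet and BF16 text encoder this comes to about 12.6 GB before QwenVL and about 15.6 GB after it; the all-BF16 set is about 23.4 GB.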
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    ⬇️ **Download Links** — official files from https://huggingface.co/Comfy-Org/z_image_turbo (the optional FP8 UNet is a CivitAI community upload)
    
    📁 **ComfyUI/models/diffusion_models/** (or unet/) — pick one:
    
    | File | Size | GPU | Source |
    |------|------|-----|--------|
    | z_image_turbo_nvfp4.safetensors | 4.5 GB | RTX 50 / Blackwell 16 GB ⭐ | Comfy-Org/z_image_turbo → split_files/diffusion_models/ |
    | z_image_turbo_fp8.safetensors *(any name)* | ~6 GB | Any GPU, 16 GB | CivitAI — search "Z-Image-Turbo FP8" (community quant, Apache-2.0 derivative) |
    | z_image_turbo_bf16.safetensors | 12.3 GB | Any GPU, 24 GB+ | Comfy-Org/z_image_turbo → split_files/diffusion_models/ |
    
    ⚠️ *The workflow file loads `z_image_turbo_nvfp4.safetensors` by default; change the filename in the UNETLoader node to match whatever file you downloaded.*
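The table's GPU guidance can be read as a simple rule. A hypothetical helper (not part of the workflow) that returns the filename to type into the UNETLoader node; the FP8 name is a placeholder because community quants use their own filenames, and cards under 16 GB return `None` because the table does not cover them.

```python
from typing import Optional

NVFP4 = "z_image_turbo_nvfp4.safetensors"
FP8 = "z_image_turbo_fp8.safetensors"  # placeholder: use your quant's actual filename
BF16 = "z_image_turbo_bf16.safetensors"


def pick_unet(vram_gb: float, blackwell: bool) -> Optional[str]:
    """Apply the table: BF16 needs 24 GB+, NVFP4 targets RTX 50 / Blackwell,
    FP8 covers other 16 GB cards. Returns None below 16 GB (not covered)."""
    if vram_gb < 16:
        return None
    if vram_gb >= 24:
        return BF16
    return NVFP4 if blackwell else FP8


if __name__ == "__main__":
    for vram, is_blackwell in [(16, True), (16, False), (24, False), (12, False)]:
        print(vram, is_blackwell, pick_unet(vram, is_blackwell))
```

This sketch prefers BF16 whenever 24 GB+ is available; NVFP4 on a larger Blackwell card should also work, it just is not what the rule picks.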
    
    📁 **ComfyUI/models/text_encoders/** (or clip/)
    • qwen_3_4b.safetensors (7.5 GB) → split_files/text_encoders/  (BF16, default)
    • qwen_3_4b_fp8_mixed.safetensors (5.6 GB) → split_files/text_encoders/  (FP8, saves VRAM)
    
    📁 **ComfyUI/models/vae/**
    • ae.safetensors (~600 MB) → split_files/vae/
    
    ⚠️ *The CLIPLoader node defaults to `qwen_3_4b.safetensors`; swap the filename if you use the FP8 text encoder. The QwenVL model is downloaded automatically by the AILab node on first use.*
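Before loading the workflow it can help to confirm the default files are where ComfyUI looks for them. A minimal sketch, assuming the default filenames named above and a standard ComfyUI folder layout; edit `EXPECTED` if you use the FP8 or BF16 variants. QwenVL is not checked because the AILab node fetches it itself.

```python
import sys
from pathlib import Path

# Default filenames used by the workflow; each key lists the folder
# plus its legacy alias (diffusion_models/unet, text_encoders/clip).
EXPECTED = {
    ("diffusion_models", "unet"): ["z_image_turbo_nvfp4.safetensors"],
    ("text_encoders", "clip"): ["qwen_3_4b.safetensors"],
    ("vae",): ["ae.safetensors"],
}


def missing_files(comfy_root: str) -> list[str]:
    """Return 'folder/file' entries not found under <comfy_root>/models."""
    models = Path(comfy_root) / "models"
    missing = []
    for folders, names in EXPECTED.items():
        for name in names:
            # A file counts as present if it sits in any of the accepted folders.
            if not any((models / folder / name).is_file() for folder in folders):
                missing.append(f"{folders[0]}/{name}")
    return missing


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    gaps = missing_files(root)
    print("all model files found" if not gaps else "missing: " + ", ".join(gaps))
```

Run it as `python check_models.py /path/to/ComfyUI` (the script name is arbitrary).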
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    🧩 **Required Custom Nodes** (2 packs)
    
    1. **ComfyUI-QwenVL** (AILab / 1038lab) — PromptEnhancer node (expand keywords) + VL image-to-text (read reference images)
    2. **ComfyUI-Easy-Use** (yolain) — anythingIndexSwitch for mode toggle (keyword vs. image) + orientation picker (landscape/portrait)
    
    Install via ComfyUI Manager (search each pack name).
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    🚀 **How to Use**
    
    **Quick Start:**
    
    1. Download & place models in ComfyUI/models/diffusion_models/ (or unet/), /text_encoders/ (or clip/), and /vae/
    2. Install 2 custom node packs via ComfyUI Manager
    3. Load ZIMG_Turbo_AUTO_dual_v1.json into ComfyUI
    4. Choose the mode (top-left):
       - **Mode 0 (Mode A):** Keyword input → auto-expand via QwenVL
       - **Mode 1 (Mode B):** Reference image → auto-describe via QwenVL
    5. Set orientation (top section): 0 = landscape, 1 = portrait
    6. (Optional) Adjust CFG scale & steps slider
    7. Queue → generate
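The same queue step can also be scripted against ComfyUI's local HTTP server, which accepts a workflow graph in API format at `POST /prompt`. A minimal sketch, assuming you first export ZIMG_Turbo_AUTO_dual_v1.json in API format from ComfyUI (the regular UI save uses a different JSON layout) and that the server runs at the default `127.0.0.1:8188`.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default local ComfyUI address


def set_input(graph: dict, node_id: str, name: str, value) -> None:
    """Overwrite one input of one node in an API-format graph."""
    graph[node_id]["inputs"][name] = value


def build_payload(graph: dict, client_id: str = "zimg-script") -> dict:
    """Wrap the graph in the body shape the /prompt endpoint expects."""
    return {"prompt": graph, "client_id": client_id}


def queue_prompt(graph: dict, url: str = COMFY_URL) -> dict:
    """POST the graph to /prompt and return the server's JSON reply."""
    data = json.dumps(build_payload(graph)).encode("utf-8")
    req = urllib.request.Request(
        f"{url}/prompt", data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, after loading the exported graph with `json.load`, calling `set_input(graph, "12", "index", 1)` on the mode switch and then `queue_prompt(graph)` would queue one Mode 1 run. The node ID `"12"` and input name `"index"` are hypothetical placeholders; look up the real ones in your own API export.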
    
    **Mode A Workflow (Keyword):**
    - Type a short subject → QwenVL expands it into a 30–50-word visual prompt → sampler renders the detailed image
    - Example subjects (copy-paste ready):
      - `elegant woman, golden hour rooftop, cinematic editorial` ← workflow default
      - `mountain lake at golden hour`
      - `cozy coffee shop morning`
      - `futuristic rainy city night`
    
    **Mode B Workflow (Reference Image):**
    - Drag reference.png to the LoadImage node → QwenVL analyzes composition, color, and mood → generates a new image inspired by its style
    - Best for: style transfer, "I want something like this but different subject"
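Both modes end in a single prompt string; the mode number only decides which branch feeds the sampler. A rough sketch of that routing, assuming a 0-based index switch like the anythingIndexSwitch node the workflow uses; it illustrates the selection logic only and is not the node's actual code.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

MODE_KEYWORD = 0  # Mode A: QwenVL-expanded keyword prompt
MODE_IMAGE = 1    # Mode B: QwenVL description of the reference image


def index_switch(index: int, inputs: Sequence[T]) -> T:
    """Return the input at a 0-based index, rejecting out-of-range modes."""
    if not 0 <= index < len(inputs):
        raise IndexError(f"mode {index} is outside 0..{len(inputs) - 1}")
    return inputs[index]


if __name__ == "__main__":
    branches = ["expanded keyword prompt", "reference image description"]
    print(index_switch(MODE_IMAGE, branches))
```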
    
    **Orientation Toggle:**
    - Orientation = 0: 1920×1088 (16:9 landscape)
    - Orientation = 1: 1088×1920 (9:16 portrait)
    - Toggle updates resolution, aspect ratio, and sampler dynamically; no manual node changes needed
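The orientation number amounts to swapping width and height. A small sketch of that mapping, assuming the VAE downsamples 8× (an assumption, not stated on this page). It also shows that 1920×1088 is close to, but not exactly, 16:9, presumably because 1088 is a multiple of 64 while 1080 is not even a multiple of 16.

```python
ORIENTATIONS = {0: (1920, 1088), 1: (1088, 1920)}  # (width, height)
VAE_FACTOR = 8  # assumption: 8x spatial downsampling in latent space


def dimensions(orientation: int) -> tuple[int, int]:
    """Pixel size for orientation 0 (landscape) or 1 (portrait)."""
    return ORIENTATIONS[orientation]


def latent_size(orientation: int) -> tuple[int, int]:
    """Latent width/height under the assumed VAE factor."""
    width, height = dimensions(orientation)
    return width // VAE_FACTOR, height // VAE_FACTOR


if __name__ == "__main__":
    w, h = dimensions(0)
    print(f"{w}x{h}, ratio {w / h:.3f} vs 16:9 = {16 / 9:.3f}")
    print("latent:", latent_size(0))
```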
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    ⚙️ **Settings & Parameters**
    
    • **Sampler** — euler + beta scheduler (built-in ComfyUI, no extra nodes required)
    • **Base Steps** — 12 (default; set 6–15 via steps slider)
    • **CFG Scale** — 5.0 (Guidance strength; range 3.0–7.0)
    • **ModelSamplingAuraFlow** — shift = 3 (shifts the noise schedule toward high-noise steps; suited to the distilled Turbo model)
    • **FluxGuidance** — ON (stabilizes output diversity)
    • **Seed** — Randomize or fix for reproducibility
    • **Output Format** — PNG + preview in ComfyUI
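To make the ModelSamplingAuraFlow and euler entries concrete: a flow-matching time shift maps a timestep t in [0, 1] to σ = shift·t / (1 + (shift − 1)·t), which, as far as I know, is the mapping ComfyUI applies here with shift = 3. With shift > 1 every σ lands above its t, so more of the step budget is spent at high noise. A minimal sketch, assuming evenly spaced timesteps instead of the beta scheduler the workflow actually uses, and a toy "model" that always predicts the same clean value so the Euler loop is easy to check.

```python
def shift_sigma(t: float, shift: float = 3.0) -> float:
    """Time shift: sigma = shift*t / (1 + (shift - 1)*t)."""
    if shift == 1.0:
        return t
    return shift * t / (1 + (shift - 1) * t)


def shifted_schedule(steps: int, shift: float = 3.0) -> list[float]:
    """steps+1 sigmas from 1 down to 0. Simplification: evenly spaced t,
    not the beta scheduler used by the workflow."""
    return [shift_sigma(1 - i / steps, shift) for i in range(steps + 1)]


def euler_sample(x, predict_x0, sigmas):
    """Euler integration in sigma using the model's clean-sample estimate."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = predict_x0(x, sigma)  # estimate of the clean sample
        d = (x - denoised) / sigma       # dx/dsigma
        x = x + d * (sigma_next - sigma)
    return x


if __name__ == "__main__":
    sigmas = shifted_schedule(12)
    print([round(s, 3) for s in sigmas])
    # Toy oracle: the "clean image" is the scalar 2.0; start from noise -1.0.
    print(euler_sample(-1.0, lambda x, s: 2.0, sigmas))
```

Because the toy model always predicts the right clean value, Euler lands on 2.0 whatever the schedule; with a real network the schedule decides where the limited steps are spent.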
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    💡 **Performance Tips**
    
    • **Keyword Mode Tricks** — Add descriptors: "portrait of woman, soft studio lighting, sharp focus" gives better results than a bare subject name
    • **Reference Mode Tips** — Clear, well-composed images work best; abstract or blurry references may confuse the QwenVL reader
    • **Speed Tuning** — 6 steps ≈ 25s; 9 steps ≈ 33s; 12 steps ≈ 44s. Quality gain plateaus after 15 steps
    • **VRAM Efficiency** — Workflow stays under 12 GB active; safe for concurrent desktop use
    • **Batch Generation** — Queue 5–10 images in one session; model stays loaded between generations
    • **CFG Sensitivity** — Z-Image responds well to 5.0–6.0; above 7.0 may degrade details. Experiment with reference images.
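The CFG scale blends two predictions per step. The standard classifier-free guidance combination, which ComfyUI applies, is uncond + scale·(cond − uncond). A minimal elementwise sketch on plain lists shows why high values can overshoot: the gap between the conditional and unconditional predictions is multiplied by the scale. As far as I know, ComfyUI also skips the unconditional pass when the scale is exactly 1.0, so any CFG above 1 costs an extra model evaluation per step.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond, uncond = [0.30, 0.70], [0.10, 0.20]
    for scale in (1.0, 5.0, 7.0):
        print(scale, [round(v, 2) for v in cfg_combine(cond, uncond, scale)])
```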
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    📝 **Notes & AI Disclosure**
    
    • **AI-Generated Content** — All example outputs are AI-generated by Z-Image-Turbo. Suitable for creative projects, design exploration, and stock imagery. Respect local AI disclosure laws when publishing.
    • **Hardware Tested** — RTX 5080 16 GB VRAM (NVFP4 UNet + BF16 CLIP), CUDA 12.6, Blackwell SM120
    • **VRAM Usage** — NVFP4 + BF16 CLIP: ~13 GB peak (16 GB Blackwell OK); BF16 + BF16 CLIP: ~20 GB (needs 24 GB)
    • **Model Downloads** — Exact links verified 2026-06-29; check HF if repo changes
    • **No Commercial Restrictions** — Apache-2.0; free for personal & commercial use (see licensing section)
    • **Workflow Reuse** — Feel free to modify, share, or fork it; the workflow itself is CC0 public domain
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    ⭐ **Found this useful?**
    
    • Like if it saved you time generating images
    • Comment your results—I read every one
    • Follow for new ComfyUI workflows, all tested on 16 GB VRAM
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    ⚖️ **Model Attribution & Licensing**
    
    **Z-Image-Turbo** (Tongyi-MAI / Alibaba)
    • License: Apache 2.0 — https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
    • Commercial use permitted; derivative models OK
    • Attribution appreciated (link to official repo)
    
    **Qwen3-VL-2B-Instruct** (Alibaba DAMO Academy)
    • License: Apache 2.0 + Alibaba Qwen Community License
    • Used for vision-based prompt enhancement
    • Commercial use OK under community license
    
    **Qwen3-4B** (text encoder, Lumina2 type)
    • License: Apache 2.0
    • Commercial use permitted
    
    **ComfyUI Custom Nodes**
    • ComfyUI-QwenVL (MIT/Apache), ComfyUI-Easy-Use (MIT)
    
    **Workflow License** — CC0 Public Domain. This JSON workflow is original work, free to use, modify, and redistribute without attribution (though credit is always appreciated).
    
    All example outputs are AI-generated. Model weights remain the property of Tongyi-MAI (Alibaba); weights must be downloaded separately from official sources.
    
