This ComfyUI workflow is designed for Z-Image Base + ControlNet + Turbo high/low-noise staged generation. It combines reference-image structure control, Z-Image Base foundation generation, Turbo low-noise refinement, and an optional 1.5x final enhancement pass into a single controlled image-generation pipeline.
Unlike a simple Z-Image text-to-image workflow, this graph does not rely only on the prompt. It starts from a reference image, extracts structural guidance through a ControlNet preprocessor, uses Z-Image Base to establish the main image layout in the high-noise stage, then hands the partially denoised latent to Z-Image Turbo for a lower-noise finishing stage. This makes the workflow useful when users want stronger composition control, clearer structure preservation, and better final detail than a single-model, single-pass generation can provide.
The workflow is built around two Z-Image model routes. The first route uses z_image_bf16.safetensors as the Base model. This stage is responsible for the early image structure, composition, global layout, subject placement, lighting direction, and main visual identity. The second route uses z_image_turbo_bf16.safetensors as the Turbo model. This stage is used later in the sampling process to refine the already-formed image, improve details, clean edges, strengthen texture, and produce a sharper final result.
The key ControlNet component is Z-Image-Fun-Controlnet-Union-2.1.safetensors, loaded through ModelPatchLoader and applied with ZImageFunControlnet. This gives the Z-Image Base stage a structure-guided generation route. The reference image is loaded, resized, and processed through AIO_Preprocessor using DepthAnythingV2Preprocessor. This creates a depth-based control image, which helps the workflow preserve spatial relationships, foreground-background separation, object depth, and overall composition.
Depth control is useful when the user wants the final image to follow the reference image’s structure without copying the image directly. For example, if the reference contains a person standing in a specific position, a low-angle composition, a strong foreground object, or a clear depth relationship between subject and background, the depth preprocessor can turn that image into structural guidance. The final prompt can then replace the subject, setting, lighting, or style while still following the overall spatial logic.
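To make the depth-control idea concrete, here is a minimal standalone sketch of turning a reference image into a depth map. It uses the Hugging Face transformers depth-estimation pipeline with a Depth-Anything-V2 checkpoint as a stand-in for the AIO_Preprocessor / DepthAnythingV2Preprocessor node; the model ID and file names are assumptions for illustration, and inside ComfyUI this step is handled entirely by the preprocessor node.

```python
# Standalone illustration only. In the graph, AIO_Preprocessor with
# DepthAnythingV2Preprocessor produces the control image; this sketch just
# mirrors the idea outside ComfyUI. Model ID and file names are assumptions.
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # assumed checkpoint
)

reference = Image.open("reference.png").convert("RGB")
result = depth_estimator(reference)

# result["depth"] is a grayscale PIL image: bright = near, dark = far.
# This is the kind of map the ZImageFunControlnet route consumes as guidance.
result["depth"].save("depth_control_image.png")
```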
The workflow also includes a Z-Image Fun Distill LoRA route through Z-Image-Fun-Lora-Distill-4-Steps-2602-ComfyUI.safetensors. This LoRA is applied to the Base model route before sampling. It helps make the high-noise stage more efficient and more responsive, while still allowing the Base model to establish the main structure. This is useful for online workflows where users need faster iteration without giving up too much control.
A central feature of this workflow is SplitSigmas. The BasicScheduler creates a sigma schedule, and SplitSigmas divides it into high-sigma and low-sigma sections. The high-sigma section is used for the first stage. This is where the image is still forming from noise, so the model has the most influence over composition and structure. The low-sigma section is used for the second stage. This is where the image already exists, so the model focuses more on refinement, detail, texture, and final polish.
In the uploaded setup, the sigma split happens around step 8 in a 10-step schedule. This means most of the early generation is used to build the image foundation, while the later stage is used for finishing. This is the core logic behind the “high/low noise” workflow. Base handles the high-noise foundation. Turbo handles the low-noise completion.
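A minimal sketch of that split, assuming SplitSigmas behaves like a simple index cut on the scheduler's sigma tensor (the helper name and the placeholder linear schedule below are illustrative; the real values come from BasicScheduler):

```python
import torch

def split_sigmas(sigmas: torch.Tensor, step: int):
    """Cut one sigma schedule into a high-sigma part (early, noisy steps)
    and a low-sigma part (late, refinement steps). Both parts share the
    sigma value at the split point so the two stages line up."""
    high = sigmas[: step + 1]
    low = sigmas[step:]
    return high, low

# Example: a 10-step schedule split at step 8, as in the uploaded graph.
# Placeholder linear values; BasicScheduler produces the real schedule.
schedule = torch.linspace(1.0, 0.0, steps=11)  # 10 steps -> 11 sigma values
high_sigmas, low_sigmas = split_sigmas(schedule, step=8)
print(high_sigmas)  # drives the Z-Image Base + ControlNet stage
print(low_sigmas)   # drives the Z-Image Turbo refinement stage
```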
The first generation stage uses Z-Image Base with ControlNet. It receives the prompt conditioning, negative conditioning, depth control image, ControlNet model patch, VAE, sampler, guider, and high-sigma schedule. The CFG value in the Base stage is higher than in the Turbo stage, which helps it follow the prompt and the structural control signal more closely. DetailDaemon is also used in this stage with a stronger detail setting, helping the early render produce richer forms and more detailed structure.
The second generation stage uses Z-Image Turbo. Instead of starting from zero, it receives the denoised latent output from the Base stage. This means Turbo does not need to solve the full composition from scratch. It continues from the already-formed image and works mostly on the low-noise portion of the schedule. This makes the final result cleaner and more efficient. It also reduces the risk of Turbo producing a fast but structurally weaker image on its own.
The Turbo stage uses its own CFGGuider, DetailDaemonSamplerNode, and low-sigma schedule. In the uploaded setup, the Turbo guidance is lower than the Base stage. This is appropriate because the second stage is not meant to fully redesign the image. It is mainly used to polish the image while preserving the Base-stage layout. If the Turbo stage is too strong, it may drift away from the first-stage composition. If it is too weak, the final result may not gain enough refinement.
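The handoff between the two stages can be summarized in pseudocode. This is a conceptual sketch only: sample_with_guider is a hypothetical placeholder for the SamplerCustomAdvanced plus CFGGuider combination, not a real ComfyUI call. The point is that the Turbo stage continues from the Base-stage latent instead of starting from fresh noise.

```python
# Conceptual sketch, not real ComfyUI API. sample_with_guider() below is a
# placeholder stub: in the real graph this role is played by
# SamplerCustomAdvanced driven by a CFGGuider and DetailDaemonSamplerNode.
def sample_with_guider(guider, sampler, sigmas, latent):
    # Placeholder body; the real node runs the denoising loop over `sigmas`
    # starting from `latent` and returns the (partially) denoised result.
    return latent

def staged_generation(noise, base_guider, turbo_guider,
                      sampler, high_sigmas, low_sigmas):
    # Stage 1: Z-Image Base + ControlNet on the high-sigma portion.
    # Higher CFG, so the prompt and depth guidance shape the composition.
    latent = sample_with_guider(base_guider, sampler, high_sigmas, noise)

    # Stage 2: Z-Image Turbo on the low-sigma portion.
    # Lower CFG; it refines the already-formed latent rather than redesigning it.
    latent = sample_with_guider(turbo_guider, sampler, low_sigmas, latent)
    return latent
```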
The workflow also includes a 1.5x final enhancement section. After the staged generation, the image can be decoded, scaled with ImageScaleBy using Lanczos at 1.5x, encoded back into latent space with VAEEncode, and refined again with SamplerCustomAdvanced. This gives the workflow a practical final polish route. It is useful when the generated image already looks good but needs higher resolution, cleaner surface detail, stronger edge definition, or more refined texture.
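As a reference for the resampling choice, here is a minimal PIL sketch of the 1.5x Lanczos step; the file names are placeholders, and inside the graph this step is actually performed by VAEDecode, ImageScaleBy, VAEEncode, and SamplerCustomAdvanced working on latents.

```python
# Illustration of the 1.5x Lanczos resize only. In the graph the chain is
# VAEDecode -> ImageScaleBy (lanczos, 1.5) -> VAEEncode -> SamplerCustomAdvanced.
from PIL import Image

image = Image.open("stage2_output.png")           # placeholder file name
new_size = (round(image.width * 1.5), round(image.height * 1.5))
upscaled = image.resize(new_size, resample=Image.Resampling.LANCZOS)
upscaled.save("stage3_input.png")                 # placeholder file name

# The upscaled image is then re-encoded to latent space and given a light,
# low-denoise refinement pass to sharpen detail at the new resolution.
```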
This three-part structure makes the workflow more powerful than a normal Z-Image Base ControlNet graph. The first part extracts structure from the reference image. The second part uses Base + ControlNet to create the main image under strong structural control. The third part uses Turbo and optional 1.5x refinement to improve the final visual quality. This gives creators a flexible balance between control, speed, and detail.
The prompt section supports both English and Chinese prompt writing. The uploaded graph includes a detailed fantasy/anime-style prompt describing a red-skinned oni girl, horns, ornate black dress, spirit cats, floating objects, fantasy atmosphere, glowing cats, colorful starry background, and close-up composition. The workflow can also support cinematic photography prompts, fashion visuals, fantasy scenes, anime characters, concept art, and social media cover-style images.
The negative prompt route is also important. The workflow includes negative conditioning to suppress blur, ugly artifacts, bad results, and other unwanted output problems. Users can expand this negative prompt based on the target style. For anime images, terms related to bad anatomy, bad hands, extra fingers, deformed face, text, watermark, and logo are useful. For realistic images, terms related to overexposure, underexposure, low contrast, plastic skin, distorted faces, and bad lighting can be added.
Main features:
- Z-Image Base + Z-Image Turbo staged generation workflow
- High-noise Base stage for structure and composition
- Low-noise Turbo stage for refinement and final detail
- Z-Image-Fun-Controlnet-Union-2.1 ControlNet support
- DepthAnythingV2Preprocessor reference-image depth control
- ZImageFunControlnet structural guidance route
- SplitSigmas high/low-noise schedule separation
- BasicScheduler 10-step sampling structure
- DetailDaemonSamplerNode detail control
- SamplerCustomAdvanced multi-stage sampling
- Z-Image Fun Distill 4-step LoRA support
- Qwen 3 4B text encoder support
- AE VAE support
- Optional 1.5x final upscale and latent refinement
- Suitable for pose, depth, composition, fantasy, anime, and cinematic visual generation
Recommended use cases:
Z-Image Base ControlNet testing, Z-Image Turbo refinement, high/low-noise staged rendering, depth-guided image generation, reference composition control, anime character generation, fantasy illustration, cinematic poster creation, fashion photography concepts, controlled AI cover images, RunningHub online workflow publishing, Civitai showcase images, prompt testing, and Base-vs-Turbo pipeline research.
Suggested workflow:
Start by loading a clear reference image. The reference image should have a strong composition and readable depth structure. If the source image is too blurry or visually chaotic, the depth preprocessor may produce weaker guidance. A clean reference with clear foreground, subject, and background usually gives better ControlNet results.
Use the DepthAnythingV2Preprocessor to generate the control image. This control image guides the structure of the final generation. If the output does not follow the reference closely enough, increase the ControlNet strength or choose a clearer reference. If the output feels too locked to the reference, reduce the ControlNet strength.
Write a prompt that defines the new image identity. Since ControlNet handles structure, the prompt should focus on what the final image should become. Describe the subject, character, clothing, materials, lighting, atmosphere, background, camera angle, and style. For complex fantasy images, describe the main subject first, then add environment and decorative details.
Use the Base high-noise stage to build the foundation. This stage should determine the main composition, body structure, scene layout, and lighting logic. If the Base stage is weak, the Turbo stage will not fully fix it. Adjust the reference image, prompt, ControlNet strength, or seed until the foundation is stable.
Use the Turbo low-noise stage for final refinement. This stage should improve clarity and texture without replacing the whole image. If Turbo changes the image too much, reduce its influence or keep the low-noise stage more conservative. If the result lacks detail, increase the DetailDaemon settings carefully.
Use the 1.5x final refinement only after the main image is good. Upscaling a bad result only makes the problems larger. First make sure the composition, face, pose, and lighting work. Then use the final upscale and latent refinement section for output polish.
When evaluating the result, check reference structure, prompt adherence, subject identity, depth consistency, detail quality, edge clarity, and whether the Base-to-Turbo handoff preserved the image correctly. A good result should feel structurally controlled but still visually creative and polished.
This workflow is designed for creators who want more control than normal text-to-image generation and more quality than a basic ControlNet pass. By combining Z-Image Base, Z-Image Turbo, ControlNet Union, depth preprocessing, SplitSigmas, DetailDaemon, and final latent refinement, it provides a practical advanced pipeline for controlled Z-Image generation inside ComfyUI.
🎥 YouTube Video Tutorial
Want to know what this workflow actually does and how to start fast?
This video explains what the tool is, how to launch the workflow instantly, and shares my core design logic — no local setup, no complicated environment.
Everything starts directly on RunningHub, so you can experience it in action first.
👉 YouTube Tutorial: https://youtu.be/mYpdxdHGlQM
Before you begin, I recommend watching the video thoroughly — getting the full context helps you understand the tool faster and avoid common detours.
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2022635998764601346?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points per daily login, and enjoy 4090 performance with 48 GB of super power!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1quZ7BpEPE/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
🎥 YouTube Video Tutorial
Want to know what kind of tool this workflow is and how to launch it quickly?
The video covers the tool's positioning, how to get it running quickly, and my design approach.
The demo runs directly on RunningHub, so you can see the actual results right away.
👉 YouTube Tutorial: https://youtu.be/mYpdxdHGlQM
Before starting, I recommend watching the video in full; grasping the overall approach will help you get up to speed faster and avoid common detours.
⚙️ Try the Workflow Online
You can try it online right now, with no installation required.
👉 Workflow: https://www.runninghub.ai/post/2022635998764601346?inviteCode=rh-v1111
Open the link above to run the workflow directly and watch the results generate in real time.
If the results look good, you can also deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points per daily login, and enjoy 4090 performance with 48 GB of super power!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you are in Mainland China or the Asia-Pacific region, you can watch the video below for a hands-on demonstration and a breakdown of the design ideas.
📺 Bilibili Video: https://www.bilibili.com/video/BV1quZ7BpEPE/
I will keep updating model resources on Quark Drive (夸克网盘):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support their creation and learning.
