CivArchive
    Z image base controlnet duo sampling - v1.0

    Introduction

    Everything written below is based purely on my personal experience and observations. I can’t guarantee everything is correct, so take it as reference material only. Discussion and corrections are always welcome.

    As an open-source model focused on photorealistic image generation, Z-Image-Base (ZIB) is easily one of the current SOTA models. Its image quality is extremely strong, and its prompt adherence and fine-detail control are honestly on another level. For example, you can restyle a single button purely through the prompt without noticeably affecting the rest of the image.

    However, ZIB is much weaker at handling spatial relationships between multiple subjects, especially multiple characters. Basic two-person poses are usually manageable, but uncommon poses often fail badly, with frequent misalignment and anatomical issues.
    For threesomes or more complex interaction scenes, no matter how detailed the prompt is, generation feels more like rolling a gacha: it occasionally produces good results, but most outputs end up with broken anatomy or spatial errors.

    Because of this, I started experimenting with ControlNet + multiple preprocessors to guide composition more reliably.

    The combination I tested was ZIB + 8 Steps LoRA + ControlNet. I found that ZIB's ControlNet can solve certain structural problems, but the resulting image quality still often feels lacking:

    • overall sharpness is weaker

    • lighting tends to look flat and gray

    • prompts related to lighting respond poorly

    • ControlNet strength at 1.0 often produces terrible-looking results

    Under the 8 Steps LoRA setup, adjusting CFG (usually between 1 and 2) can sometimes help, but the workflow still feels heavily constrained by the base image. Some reference images even produce extremely strange lighting behavior.

    Overall, the experience feels very different from SDXL workflows, where high-quality results often work almost out-of-the-box with minimal tweaking. I’ve also seen similar complaints on Reddit about ZIB’s ControlNet implementation feeling relatively weak.

    Another important issue is that in ZIB, generation resolution directly affects focal length and depth-of-field behavior. Different resolutions can produce dramatically different compositions and camera feel. Solving this became one of the key goals of my workflow.


    Workflow

    After a lot of experimentation, I ended up building a relatively simple ControlNet workflow that gave me much more satisfying results.

    The core idea is straightforward:

    1. Use any checkpoint you like together with the 8 Steps LoRA
      (This is extremely important. For ZIB, the 8 Steps LoRA is almost mandatory in my opinion. It significantly improves image quality and detail rendering.
      Of course, if your checkpoint already has something similar baked in, you don’t need to add it separately.)

    2. Use ControlNet together with a suitable resolution — usually a relatively low resolution — and perform a short initial sampling pass (around 2 steps)
      This stage establishes a solid base latent with correct character positioning and spatial relationships.

    3. Upscale the latent to the target resolution, then remove ControlNet and continue sampling using only the checkpoint + 8 Steps LoRA
      This preserves the structural consistency from ControlNet while dramatically improving image quality, lighting, and detail in the second pass.
      More importantly, you can freely adjust the second-pass resolution without heavily affecting focal length or depth-of-field behavior.

    4. The “small-resolution first pass + large-resolution second pass” approach also works well outside of ControlNet workflows
      It helps reduce the coupling between focal length behavior and image sharpness/resolution.
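    The four steps above can be sketched as a plain-Python plan. This is only a sketch of the parameter flow, not real ComfyUI code; `build_duo_sampling_plan` and its defaults are my own placeholder names based on the values recommended later in this guide:

    ```python
    def build_duo_sampling_plan(aspect=(2, 3), first_short=768, second_long=1536,
                                first_steps=2, total_steps=8, cn_strength=1.0):
        """Return the two-pass configuration described in the workflow."""
        w, h = aspect
        # First pass: low resolution, ControlNet attached, short sampling run.
        scale1 = first_short / min(w, h)
        pass1 = {
            "size": (round(w * scale1 / 64) * 64, round(h * scale1 / 64) * 64),
            "steps": first_steps,
            "controlnet_strength": cn_strength,
        }
        # Second pass: upscale the latent, drop ControlNet, finish the step budget.
        scale2 = second_long / max(w, h)
        pass2 = {
            "size": (round(w * scale2 / 64) * 64, round(h * scale2 / 64) * 64),
            "steps": total_steps - first_steps,
            "controlnet_strength": 0.0,  # ControlNet removed after the first pass
        }
        return pass1, pass2

    p1, p2 = build_duo_sampling_plan()
    print(p1["size"], p2["size"])  # (768, 1152) (1024, 1536)
    ```

    Sizes are snapped to multiples of 64 simply to stay latent-friendly; adjust the snapping to whatever your sampler actually requires.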


    Recommended Models

    Personally, I use a 1:1 merge of:

    • Big Love

    • Pornmaster

    I feel Big Love performs very well in anatomy and clothing structure, while Pornmaster produces character aesthetics that fit my taste better.
    The merged result feels surprisingly balanced in actual use.
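    For reference, a 1:1 merge is just a per-key average of the two models' weights. A minimal sketch with toy scalar "weights" standing in for real tensors (the layer names and values here are made up):

    ```python
    def merge_state_dicts(a, b, ratio=0.5):
        """Linear merge: ratio * a + (1 - ratio) * b, over the shared keys."""
        return {k: ratio * a[k] + (1 - ratio) * b[k] for k in a.keys() & b.keys()}

    # Toy stand-ins for the two checkpoints' state dicts.
    big_love   = {"layer.0": 0.25, "layer.1": 1.0}
    pornmaster = {"layer.0": 0.75, "layer.1": 0.5}

    merged = merge_state_dicts(big_love, pornmaster)  # 1:1 merge (ratio=0.5)
    print(merged["layer.0"], merged["layer.1"])  # 0.5 0.75
    ```

    In practice you would do this with a merge node or script over the full state dicts; the ratio parameter is what "1:1" fixes at 0.5.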


    Recommended Sampler

    Under ZIB + 8 Steps workflows, I strongly recommend samplers that inject noise at every step.
    These samplers consistently produce better anatomy and micro-detail quality than more deterministic alternatives.

    My personal recommendation is:

    • Euler A


    Key Parameters

    In this workflow, there are basically only three major parameters you need to tune.


    1. ControlNet Resolution (First-Pass Resolution)

    The first sampling pass establishes the base composition latent, so this resolution matters a lot.

    I usually default to:

    • short edge = 768

    This feels like a very balanced starting point.

    In ZIB, lower resolutions effectively produce a “longer focal length / shallower depth-of-field” look:

    • subjects become larger

    • background elements become fewer and more compressed

    • the model focuses more attention on the main subjects

    • prompt responsiveness for character details improves noticeably

    This parameter can be adjusted depending on your goal:

    Situations where lowering or increasing this resolution helps

    1. You already know the kind of depth-of-field or focal feel you want
      Adjust this value to match the desired camera look.

    2. Your subject differs heavily from the reference image
      For example:

      • different body type

      • different pose

      • weak prompt responsiveness

      Lowering the first-pass resolution enlarges the subject and reduces the influence of the original image, making the output follow your prompt more strongly.

    3. The reference image contains distracting or unwanted elements
      Lowering the resolution can help suppress them.
      Though in many cases, adjusting ControlNet strength is even more effective.

    4. Extremely low resolutions (for example 128) are usually too destructive
      The initial latent becomes too small, causing heavy detail loss and significantly reducing adherence to the reference image.
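    The "too destructive" point is easy to see from the latent size. Assuming the usual 8x VAE downsampling of latent diffusion models (an assumption on my part for ZIB), a 128-pixel edge leaves only 16 latent pixels to carry the composition:

    ```python
    def latent_size(w, h, vae_factor=8):
        """Latent grid size for a given pixel resolution, assuming 8x VAE downsampling."""
        return (w // vae_factor, h // vae_factor)

    print(latent_size(768, 1152))  # (96, 144)  - the default first pass
    print(latent_size(128, 192))   # (16, 24)   - far too coarse to hold structure
    ```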


    2. ControlNet Strength

    This controls how strongly ControlNet influences the generation.

    I usually use:

    • 1.0

    Without the second sampling pass, 1.0 often produces awful-looking results.
    But in the dual-sampling workflow, 1.0 works surprisingly well:

    • strong structural adherence

    • while still allowing the second pass to restore image quality and details


    3. Final Resolution (Second-Pass Resolution)

    This is your final upscale sampling resolution.

    I usually use:

    • long edge = 1536

    This tends to produce clean and detailed images while keeping rendering mistakes relatively manageable.

    Since the base latent structure has already been established during the first pass, the second-pass resolution has much less influence on focal length and depth-of-field behavior.
    This gives you much more freedom to scale image quality independently.
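    A small sketch of the resize between passes: the first-pass size is scaled uniformly until its long edge hits the second-pass target (`upscaled_size` is my own helper name, not a ComfyUI node):

    ```python
    def upscaled_size(first_size, second_long_edge):
        """Second-pass pixel size: scale the first pass so its long edge hits the target."""
        w, h = first_size
        f = second_long_edge / max(w, h)
        return round(w * f), round(h * f)

    print(upscaled_size((768, 1152), 1536))  # (1024, 1536)
    print(upscaled_size((768, 768), 1536))   # (1536, 1536)
    ```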

    Higher resolutions produce:

    • more sharpness

    • more texture detail

    • richer micro-details

    But in very complex scenes, excessively large resolutions can also introduce:

    • incorrect clothing details

    • broken background objects

    • random hallucinated elements

    In most cases, I avoid going beyond:

    • 1920

    The second-pass resolution generally has relatively little impact on prompt adherence.


    Personal Experience & Tuning Tips

    My default starting setup is usually:

    • first-pass resolution: 768

    • ControlNet strength: 1.0

    • second-pass resolution: 1536

    Then I adjust from there based on the results.

    If the generated subject differs too much from what I want

    I primarily reduce the first-pass resolution to weaken the influence of the original image.

    If that still isn’t enough — or if the resolution becomes so low that important details disappear — I also reduce ControlNet strength.

    Typical lower limits for me are roughly:

    • first-pass resolution ≥ 384

    • strength ≥ 0.8

    Though in special cases, I’ve gone as low as:

    • resolution = 256

    • strength = 0.5


    If the reference image contains many characters or very small subjects

    1536 may not provide enough detail density.

    In those cases, I increase the second-pass resolution moderately to improve detail rendering.

    Usually I stay below:

    • 1920


    Sampling Step Distribution

    I usually use:

    • first pass = 2 steps

    You can adjust this depending on your needs.

    For example:

    • if adherence to the reference image is insufficient,
      you can slightly increase first-pass steps

    Personally, I generally keep:

    • first pass ≤ 3 steps

    • second pass ≥ 6 steps
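    One way to express this split is as (start, end) step windows over a shared schedule, similar in spirit to the start/end step parameters of ComfyUI's advanced KSampler; `split_steps` is just an illustrative helper:

    ```python
    def split_steps(total=8, first=2):
        """Split a total step budget into (start, end) windows for the two passes."""
        assert 0 < first < total, "first pass must use some, but not all, of the steps"
        return (0, first), (first, total)

    pass1, pass2 = split_steps()
    print(pass1, pass2)  # (0, 2) (2, 8)
    ```

    Increasing first-pass adherence then just means widening the first window, e.g. `split_steps(total=8, first=3)`.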


    When nothing seems to work

    If repeated parameter tuning still fails to produce the result I want, I often take the best partially successful output and use it as the new reference image.

    Then I repeat the process iteratively.

    Surprisingly often, this works much better than endlessly fighting the original reference image.


    End

    Hopefully this workflow can help people struggling with ZIB ControlNet setups.

    And finally, good luck to everyone — hope you all generate the images you actually want.


    Workflows
    ZImageBase

    Details

    Downloads
    32
    Platform
    CivitAI
    Platform Status
    Available
    Created
    5/12/2026
    Updated
    5/14/2026
    Deleted
    -

    Files

    zImageBaseControlnet_v10.json
