image_z_image_SDXL_Refiner
This ComfyUI workflow generates or processes images with the ZImage Base model (Phase 1), refines them with an SDXL-based model (Phase 2), applies automatic face and skin enhancement, and finishes with two-stage upscaling.
Installation
Required Custom Nodes
Install via ComfyUI Manager:
ComfyUI-Easy-Use - https://github.com/yolain/ComfyUI-Easy-Use
ComfyUI-Levelpixel - https://github.com/levelpixel/ComfyUI-Levelpixel
ComfyUI-Impact-Pack - https://github.com/ltdrdata/ComfyUI-Impact-Pack
ComfyUI-Impact-Subpack - https://github.com/ltdrdata/ComfyUI-Impact-Subpack
ComfyUI KJ Nodes - https://github.com/kijai/ComfyUI-KJNodes
ComfyUI LoRA Manager - https://github.com/cubiq/ComfyUI_Lora_Manager
ComfyUI-pysssss - https://github.com/pythongosssss/ComfyUI-Custom-Scripts
ComfyUI Comfyroll Custom Nodes - https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes
Required Model Files
Detection Models (ComfyUI/models/ultralytics/):
bbox/face_yolov8n_v2.pt - Face detection
segm/skin_yolov8n-seg_800.pt - Skin segmentation
SAM Model (ComfyUI/models/sams/):
sam_vit_b_01ec64.pth - Segment Anything Model
Checkpoints (ComfyUI/models/checkpoints/ or ComfyUI/models/unet/):
ZImage Base checkpoint (for Phase 1)
SDXL Refiner or any SDXL-based checkpoint (for Phase 2)
Upscale Models (ComfyUI/models/upscale_models/):
RealESRGAN_x4plus.pth or similar
The detection models and the SAM model download automatically on first use. They can also be obtained manually from: https://github.com/ltdrdata/ComfyUI-Impact-Pack
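To confirm that the fixed-name model files listed above are in place before the first run, a small check like the following may help. It is a minimal sketch: the helper name find_missing_models is hypothetical, and "ComfyUI" as the install root is an assumption to adjust for your setup. Checkpoints and upscale models are left out because their file names vary.

```python
from pathlib import Path

# Relative paths taken from the "Required Model Files" list above.
REQUIRED_FILES = [
    "models/ultralytics/bbox/face_yolov8n_v2.pt",
    "models/ultralytics/segm/skin_yolov8n-seg_800.pt",
    "models/sams/sam_vit_b_01ec64.pth",
]


def find_missing_models(comfy_root):
    """Return the required files that are absent under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location, not a fixed path.
    for rel in find_missing_models("ComfyUI"):
        print("missing:", rel)
```

An empty result means all three files were found. Anything printed as missing should appear after the first run or can be placed there manually.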
Workflow Structure
Input Selection
Uses CR Latent Input Switch to choose between:
Input 1: Uploaded image (LoadImage → VAEEncode)
Input 2: Empty latent for generation from scratch (default: 896x1152)
Phase 1: ZImage Base Generation
Processes selected input using ZImage Base checkpoint
Default settings: 50 steps, CFG 5, uni_pc_bh2 sampler, ddim_uniform scheduler
Supports LoRAs via first Lora Loader
Output goes to Phase 2
Phase 2: SDXL Refiner
Refines Phase 1 output using SDXL-based checkpoint
KSamplerAdvanced settings: 50 steps, CFG 1.9, start step 40
Sampler: dpmpp_3m_sde_gpu, Scheduler: beta57
Supports multiple LoRAs via "Phase 2 Lora Loader"
Trigger words managed by TriggerWord Toggle node
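The start step determines how much of the schedule Phase 2 actually samples. The arithmetic can be sketched as below. This assumes the end_at_step input of KSamplerAdvanced is left at or above the total step count, which the settings above do not state. The function name refiner_steps is hypothetical.

```python
def refiner_steps(total_steps, start_step, end_step=None):
    """Count the schedule steps that the refinement pass actually samples."""
    if end_step is None or end_step > total_steps:
        # Assumption: the pass runs through to the end of the schedule.
        end_step = total_steps
    return max(0, end_step - start_step)


steps_run = refiner_steps(total_steps=50, start_step=40)
share = steps_run / 50
print(steps_run, share)  # 10 0.2
```

Under that assumption, Phase 2 samples only the last 10 of 50 steps, so it adjusts fine detail rather than recomposing the Phase 1 image. A lower start step gives the SDXL model more influence.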
Detailing System
Automatic face detection: YOLOv8n v2 (bbox/face_yolov8n_v2.pt)
Skin segmentation: YOLOv8n-seg (segm/skin_yolov8n-seg_800.pt)
SAM model for precise mask generation
FaceDetailer settings: 25 steps, CFG 6, denoise 0.25, bbox_threshold 0.3
Upscaling
Two-stage progressive upscaling
Uses RealESRGAN or similar models
Each stage independently controlled via Fast Groups Bypasser
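To estimate the final resolution when one or both stages are enabled, the sizes can be computed stage by stage. This is only a sketch. The per-stage scale factors are not documented here, so the values 1.5 and 2.0 below are hypothetical placeholders; replace them with the factors set in your upscale groups.

```python
def progressive_sizes(width, height, factors):
    """Return the (width, height) after each upscale stage, in order."""
    sizes = []
    for factor in factors:
        width, height = round(width * factor), round(height * factor)
        sizes.append((width, height))
    return sizes


# 896x1152 is the default empty latent size; the factors are made up.
print(progressive_sizes(896, 1152, [1.5, 2.0]))
# [(1344, 1728), (2688, 3456)]
```

Passing a single factor models a run with one upscale stage bypassed.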
Model Compatibility
Phase 1: Requires ZImage Base checkpoint
Phase 2: Accepts any SDXL-architecture checkpoint:
Official SDXL Refiner
SDXL base checkpoints
Pony-based models
Illustrious-based models
Other SDXL derivatives
Important: Different SDXL variants may require different sampler/scheduler settings. The workflow uses dpmpp_3m_sde_gpu with beta57 scheduler for Phase 2, and uni_pc_bh2 with ddim_uniform for Phase 1. For Pony or Illustrious models, you may need to adjust:
Scheduler (try karras, normal, simple)
Sampler (try euler_a, dpmpp_2m)
CFG scale and step counts
Usage
Basic Setup
Set output folder: Enter name in "Save Subdirectory Name"
Choose input: CR Latent Input Switch - 1 for uploaded image, 2 for generation
Load models: ZImage Base for Phase 1, SDXL model for Phase 2
Set prompts: Phase 1 prompts for generation, Phase 2 prompts for refinement
Configure LoRAs: Load in respective Lora Loader nodes, toggle trigger words
Fast Groups Bypasser
Control workflow sections:
Phase 1 - ZImage Base Generation: Main generation (keep enabled)
Phase 2 - SDXL Refiner: Refinement pass (keep enabled)
Model Unload, Clear Cache and VRAM: Enable if low VRAM (default: disabled)
Detailer Bridge: Prepares for face/skin enhancement (keep enabled)
Upscale 1: First upscale pass (disable to skip)
Upscale 2: Second upscale pass (disable to skip)
Output Files
All images save to: ComfyUI/output/[subdirectory]/
Each saved image includes complete metadata: prompts, seeds, steps, CFG, models, LoRAs, and all workflow settings.
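The embedded metadata can be read back without opening ComfyUI. The sketch below uses only the standard library and assumes the save node writes PNG tEXt chunks the way ComfyUI's built-in SaveImage node does, with the node graph stored as JSON under the "prompt" key. If this workflow's save node uses a different format or key, the result will simply be empty. Both function names are hypothetical.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(path):
    """Return {keyword: text} for every tEXt chunk in a PNG file."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    out = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, text = body.partition(b"\x00")
            out[key.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length
        if ctype == b"IEND":
            break
    return out


def load_prompt_metadata(path):
    """Parse the JSON stored under the assumed "prompt" key, if present."""
    text = read_png_text(path).get("prompt")
    return json.loads(text) if text else None
```

Calling load_prompt_metadata on a saved file returns a dictionary of nodes and their inputs when the assumption holds, or None otherwise. CRC values are skipped rather than verified, which is enough for inspection.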
Default Settings
Phase 1 (ZImage Base):
Steps: 50
CFG: 5
Sampler: uni_pc_bh2
Scheduler: ddim_uniform
Phase 2 (SDXL Refiner):
Steps: 50
CFG: 1.9
Start step: 40
Sampler: dpmpp_3m_sde_gpu
Scheduler: beta57
FaceDetailer:
Steps: 25
CFG: 6
Denoise: 0.25
bbox_threshold: 0.3
Empty Latent: 896 x 1152
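When changing the empty latent size, it can be useful to check that the new dimensions map cleanly onto the latent grid. The sketch below assumes a VAE that downsamples by a factor of 8 per side, which holds for SDXL and is taken here as an assumption for ZImage Base. The function name latent_grid is hypothetical.

```python
def latent_grid(width, height, factor=8):
    """Return the latent (width, height) for a given pixel size."""
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not a multiple of {factor}")
    return width // factor, height // factor


print(latent_grid(896, 1152))  # (112, 144)
print(896 * 1152)              # 1032192
```

The default comes to about 1.03 megapixels, close to the 1,048,576 pixels of a 1024 x 1024 image, so other aspect ratios with a similar pixel count are reasonable starting points.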
Troubleshooting
Out of VRAM errors:
Enable Model Unload/Clear Cache group via Fast Groups Bypasser
Disable one or both upscale stages
Lower step counts
Face detailer not activating:
Lower bbox_threshold (default is 0.3, try 0.2)
Ensure faces are clearly visible and adequately sized
Verify detector model files downloaded correctly
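The effect of lowering bbox_threshold can be illustrated with a simplified filter. This is not the Impact Pack's actual code, only a sketch of the idea that detections scoring below the threshold are dropped. The confidence scores are made up for illustration.

```python
def keep_detections(confidences, bbox_threshold=0.3):
    """Keep only detections whose confidence meets the threshold."""
    return [c for c in confidences if c >= bbox_threshold]


scores = [0.85, 0.27, 0.22, 0.10]  # hypothetical detector confidences

print(len(keep_detections(scores, 0.3)))  # 1
print(len(keep_detections(scores, 0.2)))  # 3
```

Small or partly hidden faces tend to score lower, so a lower threshold picks up more of them at the cost of more false positives.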
Using non-standard SDXL models (Pony, Illustrious):
Adjust Phase 2 sampler/scheduler settings
Common alternatives: euler_a sampler with karras scheduler
Test different CFG values
Check model card for recommended settings
Technical Details
Optimized for: RTX 4090 with 24GB VRAM
Processing flow:
Select input (uploaded image or empty latent)
Phase 1: ZImage Base generation/processing
Phase 2: SDXL refinement
Face and skin region detection
Targeted detail enhancement
Progressive dual upscaling
Save with complete metadata
Execution time: roughly 40 seconds to 3 minutes, depending on hardware, settings, and enabled stages. With the current settings on an RTX 4090, a run takes about 40 seconds.