Update 2/11/2026
V1 is currently in training with a significantly improved dataset of hundreds of thousands of realistic, anime, and illustration images, covering both NSFW and SFW samples. It will substantially supersede the quality of this experimental checkpoint, and early results are promising. Please do not do any development or training on this model, as it is quickly becoming obsolete; the new release will hopefully arrive within the next two weeks. Yes - rather than a vague "at some point in the future," I am putting myself on the line and giving you a commitment 😹
The purpose of this finetune is to provide realistic NSFW results that are completely uncensored, which is difficult to achieve with Z-Image-Base. As a standalone finetune it is not 100% there yet on genital anatomy, due to the tight dataset and limited training time, but in my experience combining it with existing community LoRAs for Z-Image-Base produces excellent results, better than either alone.
This is very much a proof-of-concept finetune and requires significant enhancements and cleanup of the dataset. A future goal is to make an NSFW model that stands on its own feet, but I expect that would need at least a 50,000-image dataset to allow deeper, longer training on garbled concepts such as genitalia without over-fitting.
For those who are interested, below are all the relevant statistics and configuration for this training:
Number of images: 7458
Number of steps: 46000
GPU: 2x B200
Max VRAM Usage during Training: 85.2GB
Iteration speed: between 1.00s/it - 1.10s/it
Training Suite: DiffSynth-Studio
Total Training Time: 13 hours
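As a quick sanity check, the reported step count and iteration speed are consistent with the stated total training time (the 1.05 s/it figure below is just the midpoint of the reported range, not a measured value):

```python
# Cross-check the reported training stats: 46,000 steps at roughly
# 1.00-1.10 s/it should land near the stated 13-hour total.
steps = 46_000
sec_per_it = 1.05  # assumed midpoint of the reported 1.00-1.10 s/it range

hours = steps * sec_per_it / 3600
print(f"{hours:.1f} hours")  # ~13.4 hours, in line with the ~13 hours reported
```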
My DiffSynth-Studio script for running training (disregard num_epochs; I was just going to shut down after a certain number of hours due to budgeting):
accelerate launch --config_file examples/z_image/model_training/full/accelerate_config.yaml examples/z_image/model_training/train.py \
--dataset_base_path /workspace/data/flatpack \
--dataset_metadata_path /workspace/data/flatpack.csv \
--max_pixels 1638400 \
--dataset_repeat 50 \
--save_steps 2000 \
--model_id_with_origin_paths "Tongyi-MAI/Z-Image:transformer/*.safetensors,Tongyi-MAI/Z-Image-Turbo:text_encoder/*.safetensors,Tongyi-MAI/Z-Image-Turbo:vae/diffusion_pytorch_model.safetensors" \
--learning_rate 1e-5 \
--num_epochs 32 \
--remove_prefix_in_ckpt "pipe.dit." \
--output_path "./models/train/Z-Image_full" \
--trainable_models "dit" \
--use_gradient_checkpointing \
--weight_decay 0.01 \
--dataset_num_workers 8

I have also bundled various helper scripts I made to help with preparing the dataset and, importantly, for fixing the model post-training so it works within ComfyUI and other inference tools. You can check them out here: