Z-Image Turbo v2
It has been 6 months, so I thought I'd revisit this LoRA since it is one of the most popular ones, despite all the noise in there. Looking back, I neither cropped out the borders nor captioned them on some of the 650 images, which gave you what looked like social media images with the UI and everything. That has been resolved with a complete redo for this version 2, trained for half the number of steps on 200 images, and quality and consistency have improved. Please share your results via Add Post below.
I have had good results on the ZIT model at 12 steps, cfg 1, with the seeds_v3 sampler and the beta scheduler, or at 25 steps with cfg 2.
Reminder: if you use it with other LoRAs, decrease its strength. I use it with my Speedo Tan Line ZIT LoRA at 0.5 and this one at 0.8, and it looks great. Using it at 1 by itself is fine.
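The settings above can be written down as a small lookup for reuse between sessions. This is only a sketch: the preset names, dictionary keys, and the `lora_strengths` helper are made up for illustration and are not part of ComfyUI or any other tool. Only the numbers come from the notes above.

```python
# Hypothetical preset and key names; the values are the ones quoted above.
PRESETS = {
    "fast": {"steps": 12, "cfg": 1.0, "sampler": "seeds_v3", "scheduler": "beta"},
    # No sampler or scheduler was stated for the 25-step setting, so none is listed.
    "slow": {"steps": 25, "cfg": 2.0},
}


def lora_strengths(stacked: bool) -> dict:
    """Return LoRA strengths: 1.0 when used alone, reduced when stacked."""
    if stacked:
        # Strengths reported above for combining with the Speedo Tan Line LoRA.
        return {"this_lora": 0.8, "speedo_tan_line": 0.5}
    return {"this_lora": 1.0}


if __name__ == "__main__":
    print(PRESETS["fast"])
    print(lora_strengths(stacked=True))
```

Treat the stacked values as a starting point; a different LoRA combination will probably need different strengths.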
v1
Trained with over 650 images and 30,000 steps, this can produce decent results, but also monsters. The workflows I used are embedded in the samples. I used it with my Speedo Tan Line LoRA and set them to about 0.6 each; otherwise I got trash. I used sa_solver/beta, 12 steps.
seed_v3/beta is my current choice, but that changes like every week.
Trigger word: pen15
TLDR:
These ZIT LoRAs were my first attempt at anything other than character LoRAs, so I don't have detailed tests showing exactly what worked, but here's the rundown:
I have started captioning through the QwenVL node in ComfyUI using the NSFW model, Qwen3-VL-4B-Thinking-abliterated (NSFW), with a workflow that batch processes a folder of images to resize and caption them.
Many of the images for this one were captioned with Python code I modified to batch process image files using joy-caption-alpha-two. It did not do a good job of differentiating between flaccid and erect. I have another, more focused dataset in which I resized and cropped all the images to 512 squares and which contains only erect penises (coming soon), but this one surprisingly gave equal or better results, for ZIT at least.
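For the 512 square crops mentioned above, the only real arithmetic is picking the crop box. Here is a minimal sketch of a center crop, assuming that is what you want. The function name `center_square_box` is made up, and this is not the actual script or workflow used for the dataset.

```python
TARGET_SIDE = 512  # square size mentioned above


def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


if __name__ == "__main__":
    # A 1024x768 landscape image keeps the middle 768x768 region.
    print(center_square_box(1024, 768))
```

With Pillow, the box can be passed straight to `img.crop(box)` and followed by `img.resize((TARGET_SIDE, TARGET_SIDE))`. A center crop can cut off the subject in off-center shots, so check the results by eye.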
For my latest character LoRA, I followed the advice from this article with excellent results, and gave QwenVL these instructions:
Caption EVERYTHING you see except for the man and his hair or body type. That means outfit, backgrounds, lighting, camera angles, skin details, water droplets, etc. must all go into the caption. Refer to him as Brock
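If you want to reuse those instructions for a different character, the name can be made a parameter. The sketch below also assumes captions are saved as a `.txt` file next to each image with the same base name, which is a common dataset layout but is an assumption here. Both helper names are hypothetical and nothing in it calls QwenVL itself.

```python
from pathlib import Path


def build_caption_instruction(name: str) -> str:
    """Fill the character name into the instruction text quoted above."""
    return (
        "Caption EVERYTHING you see except for the man and his hair or body type. "
        "That means outfit, backgrounds, lighting, camera angles, skin details, "
        "water droplets, etc. must all go into the caption. "
        f"Refer to him as {name}"
    )


def caption_path(image_path: str) -> Path:
    """Assumed layout: the caption sits next to the image with a .txt suffix."""
    return Path(image_path).with_suffix(".txt")


if __name__ == "__main__":
    print(build_caption_instruction("Brock"))
    print(caption_path("dataset/img_001.jpg"))
```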
I used AI Toolkit and its adapter to train this on my 5090:
ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v2.safetensors
Description
Trained with a new dataset, with results similar to the Z Image Base model.















