Z-Image-Turbo version of Shiho, a realistic Japanese woman, trained with AI-Toolkit.
Comments (5)
Great LoRA! Really captures the character, and very good with her anatomy.
The LoRA looks very good and avoids that generic AI-model sameface problem. Can you tell us more about the training process? I can see that you trained for 185,000 steps. That seems like a lot! Do you gain more precision and quality with more steps? What are the other specs, like learning rate, resolutions, number of images in the dataset, and caption style?
It would be a really great help to learn from your experience training this model!
Also, the LoRA's file size is huge! Does the rank affect the quality a lot?
@blackestcurse93 Thank you, blackestcurse93, for your excellent feedback and insightful questions! I'm delighted you noticed that the LoRA avoids the "sameface problem," as my primary goal was to achieve true photorealism that is indistinguishable from a photograph, prioritizing unique details over generic ideals.
Here are the details about the complex training process:
Training Methodology: Incremental and Adaptive
Instead of a fixed schedule, I adopted an iterative approach, treating every 5,000 steps as one training turn. The learning rate started at $1.0 \times 10^{-4}$ and was then gradually lowered based on the results of each turn.
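For anyone who wants to picture the turn structure, here is a minimal sketch of the arithmetic. The step counts and the starting learning rate come from the description above. The `next_lr` rule, its `decay` factor, and its `floor` are purely hypothetical stand-ins, since the actual adjustments were made by judging the results of each turn and not by a fixed formula.

```python
STEPS_PER_TURN = 5_000   # one training "turn", as described above
TOTAL_STEPS = 185_000    # total steps reported for this LoRA
INITIAL_LR = 1.0e-4      # starting learning rate


def num_turns(total_steps=TOTAL_STEPS, steps_per_turn=STEPS_PER_TURN):
    """Number of complete turns contained in the total step count."""
    return total_steps // steps_per_turn


def next_lr(current_lr, improved, decay=0.8, floor=1e-6):
    """Hypothetical rule: keep the rate if the last turn looked better,
    otherwise lower it by `decay`, never going below `floor`.
    The decay and floor values are assumptions for illustration only."""
    if improved:
        return current_lr
    return max(current_lr * decay, floor)


if __name__ == "__main__":
    print("turns:", num_turns())
    print("rate after one non-improving turn:", next_lr(INITIAL_LR, improved=False))
```

With these numbers, 185,000 steps at 5,000 steps per turn works out to 37 turns, so there were 37 points at which the learning rate could be reviewed.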
Dataset and Iteration:
My total dataset comprised 30,000 images. However, to maximize feature diversity and prevent mode collapse, the images were not used all at once. I employed an incremental system in which 4,000 to 5,000 images were used at a time and rotated out (replaced) every 5,000 steps. This frequent rotation, combined with the adaptive learning rate, was crucial for capturing the subtle nuances that define realistic details.
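The rotation described above could be sketched roughly as follows. This is an illustration, not the actual pipeline: the function name `rotating_subsets`, the full replacement of the subset each turn, and the reshuffle once the pool is exhausted are all assumptions, and AI-Toolkit itself is not involved here.

```python
import random


def rotating_subsets(image_ids, subset_size, num_turns, seed=0):
    """Yield one training subset per turn (hypothetical helper).

    Assumption: each turn gets a completely fresh slice of a shuffled pool,
    and the pool is reshuffled once every image has been used.
    """
    pool = list(image_ids)
    if subset_size > len(pool):
        raise ValueError("subset_size cannot exceed the dataset size")
    rng = random.Random(seed)
    rng.shuffle(pool)
    cursor = 0
    for _ in range(num_turns):
        if cursor + subset_size > len(pool):
            # Every image has been seen; start a new pass in a new order.
            rng.shuffle(pool)
            cursor = 0
        yield pool[cursor:cursor + subset_size]
        cursor += subset_size


if __name__ == "__main__":
    # 30,000 images, 5,000 per turn, 37 turns of 5,000 steps each.
    for turn, subset in enumerate(rotating_subsets(range(30_000), 5_000, 37)):
        print(f"turn {turn}: {len(subset)} images")
```

Under these assumptions, six turns cover the whole 30,000-image pool once, so 37 turns pass over the full dataset roughly six times.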
Dataset Diversity and Captioning:
The images included a wide range of poses, clothing, expressions, locations, and hairstyles, which helps prevent the model from collapsing into a single "AI look."
The captions for the entire dataset were generated using Qwen2.5-VL, ensuring deep and precise tagging of every element.
On 185,000 Steps:
Yes, 185,000 steps is a large number, but it was necessary for my goal. This extensive training time was required to deeply embed the subtle, non-ideal features (like minor skin imperfections, natural expressions, etc.) that distinguish photorealism from typical AI generation. This precision is what allows the model to overcome the sameface issue and achieve higher overall quality.
I hope this sheds light on the process! I'm happy to share my experience if it helps others push the boundaries of realism.
@blackestcurse93 I forgot to mention the LoRA Rank (Dimension)!
The LoRA Rank was set to 128 (DIM=128).
This higher rank was absolutely critical. It allowed the model to effectively store the vast amount of micro-details and subtle variations captured through the high step count (185,000) and the iterative dataset process. DIM=128 helps prevent the loss of those realistic, non-ideal features that distinguish photorealism from generic AI faces.
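On the file-size question: a LoRA adapter on a linear layer stores two low-rank matrices, so its parameter count grows linearly with the rank. The sketch below shows that relationship. The layer dimensions (3072 by 3072) are hypothetical and are not taken from Z-Image-Turbo, and the 2 bytes per parameter assumes 16-bit weights.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter on a d_in x d_out linear layer:
    a (d_in x rank) down-projection plus a (rank x d_out) up-projection."""
    return rank * (d_in + d_out)


def lora_bytes(d_in, d_out, rank, bytes_per_param=2):
    """Approximate storage for one adapter (2 bytes assumes fp16/bf16)."""
    return lora_params(d_in, d_out, rank) * bytes_per_param


if __name__ == "__main__":
    d = 3072  # hypothetical layer width, for illustration only
    for rank in (16, 32, 64, 128):
        print(rank, lora_params(d, d, rank), lora_bytes(d, d, rank))
```

For any fixed set of layers, rank 128 therefore takes eight times the storage of rank 16, which is why the file is large. Whether the extra capacity translates into better quality depends on the dataset and training, as discussed above.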



