Follow me on Patreon!
SoReal! - POV
Overview
Reach your hands out for the stars! This model is the first in a series of Z-Image LORAs, aimed at bringing diversity to Z-Image in both concepts and humanity itself.
Compatibility & Usage
Due to its small size and low rank, the model should have minimal influence on the base model. This improves compatibility with other LORAs across Base/Turbo, and with other checkpoints as well.
'Trigger words' aren't real - don't ask for one, just prompt normally. If you want a hand (literally), use 'a man's hand' or 'a woman's hand', which should normally get you what you want.
I'll upload a full concept list soon to show the range of concepts the model has been trained on. Note that this doesn't guarantee the model can reproduce all of them.
When using Z-Image Turbo, strengths between 0.95 and 1.5 work best in my experience for V1, and 0.9 - 1.2 for V2.
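As a rough illustration of what the strength value does: in the standard LoRA formulation, the adapter adds a low-rank update to each targeted weight matrix, and the strength multiplier scales that update linearly. The sketch below assumes the common `W + strength * (alpha / rank) * (B @ A)` convention. Real loaders (ComfyUI, for example) may differ in detail, and the toy 2x2 matrices and function names are illustrative only, not this model's weights.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_lora(weight, lora_down, lora_up, alpha, strength):
    """Return weight + strength * (alpha / rank) * (lora_up @ lora_down).

    weight:    d_out x d_in base matrix
    lora_down: rank x d_in  (often called A)
    lora_up:   d_out x rank (often called B)
    """
    rank = len(lora_down)
    scale = strength * alpha / rank
    delta = matmul(lora_up, lora_down)  # the low-rank update, d_out x d_in
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # rank 1 x d_in 2
B = [[0.5], [0.25]]    # d_out 2 x rank 1
for s in (0.0, 1.0, 1.5):
    print(s, apply_lora(W, A, B, alpha=1.0, strength=s))
```

At strength 0 the base weights come back unchanged; higher strengths push every targeted weight further along the same learned direction, which is one reason a useful strength range has an upper bound.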
Limitations
Anatomy is still rough. I'm planning one more training run to try to address this for NSFW concepts, though this may mean splitting into a general-purpose model (V2) and an NSFW-specialised model (V2-NSFW).
Future
Future iterations of this model will see stronger prompt adherence, anatomy adherence and general composition and quality through +/- reinforcement learning.
I am planning a substantial finetune of Z-Image called 'SoReal!' (or, alternatively, ZoReal!). I want it to be the best amateur finetune possible, and to achieve this, I have:
1. Trained a custom quality model.
2. Trained a custom one-shot demographic model (height, weight, skin tone, ethnicity, age in years, body shape) built on ConvNeXt-XL, with an average accuracy of 89% for its top-confidence (top-1) prediction.
3. Finetuned wd-tagger-large-v3 on a large sample dataset of 50k hand-tagged images with human-assisted active learning.
4. Fed those tagged images, together with their quality, demographic and general labels and the image metadata (incl. EXIF and camera metadata), to Gemini 3 Flash to generate captions.
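Step 4 amounts to packing every label source into a single captioning request. A minimal sketch of how that context might be assembled, assuming the outputs of steps 1 to 3 and the EXIF/camera metadata are already available as plain Python values. The function name, field layout and prompt wording are hypothetical, and the actual Gemini API call is deliberately left out.

```python
def build_caption_prompt(quality, demographics, tags, metadata):
    """Combine a quality score, demographic labels, tagger output and image
    metadata into one text prompt for a captioning model (hypothetical format)."""
    lines = ["Write a detailed, natural-language caption for this image."]
    lines.append(f"Quality score: {quality:.2f}")
    if demographics:
        demo = ", ".join(f"{k}: {v}" for k, v in sorted(demographics.items()))
        lines.append(f"Subject demographics: {demo}")
    if tags:
        lines.append("Tags: " + ", ".join(tags))
    if metadata:
        meta = ", ".join(f"{k}: {v}" for k, v in sorted(metadata.items()))
        lines.append(f"Camera metadata: {meta}")
    # Labels are model predictions, so ask the captioner to treat them as hints.
    lines.append("Use the labels as hints, but describe only what is visible.")
    return "\n".join(lines)

prompt = build_caption_prompt(
    quality=0.87,
    demographics={"age": 34, "skin tone": "medium"},
    tags=["outdoors", "pov", "hand"],
    metadata={"Make": "Canon", "FocalLength": "35mm"},
)
print(prompt)
```

Putting each label source on its own line keeps missing fields easy to skip, so images without EXIF data simply produce a shorter prompt.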
No over-trained LORAs baked in, no dramatic loss of generalisation, just a good, all-round, NSFW-ready, finetuned model.
However, I am now severely limited by my compute and financial situation, so if you'd like to help make SoReal!, well, so real, you can follow me on Patreon!
Dataset & Training
An initial dataset of 2,500 images was collected from a variety of sources. Deduplication and quality scoring (through MANIQA) reduced it to roughly 1,400 to 1,500 images, and the model was trained on this set at a minimum batch size of 10. Masked loss was introduced after roughly 40,000 samples (not steps) to improve anatomy and concept adherence.
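Counting in samples rather than steps makes the switch point independent of batch size: at the minimum batch size of 10, 40,000 samples is 4,000 optimiser steps, or roughly 27 passes over 1,500 images. Below is a minimal sketch of that schedule combined with a masked mean-squared-error loss. It assumes a per-position mask where 1 marks the region to train on and 0 is ignored entirely (some trainers instead down-weight unmasked regions), and uses flat Python lists in place of tensors. The function names are illustrative, not the author's trainer.

```python
MASK_AFTER_SAMPLES = 40_000  # approximate switch point quoted in the text

def samples_seen(step, batch_size):
    """Number of training samples processed after `step` optimiser steps."""
    return step * batch_size

def use_masked_loss(step, batch_size, threshold=MASK_AFTER_SAMPLES):
    """True once the sample count (not the step count) reaches the threshold."""
    return samples_seen(step, batch_size) >= threshold

def mse(pred, target):
    """Plain mean squared error over every position."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def masked_mse(pred, target, mask):
    """Mean squared error over positions where mask == 1 only."""
    weighted = [m * (p - t) ** 2 for p, t, m in zip(pred, target, mask)]
    total = sum(mask)
    return sum(weighted) / total if total else 0.0

def training_loss(pred, target, mask, step, batch_size):
    if use_masked_loss(step, batch_size):
        return masked_mse(pred, target, mask)
    return mse(pred, target)

# 40,000 samples at batch size 10 is reached at step 4,000.
print(use_masked_loss(3_999, 10), use_masked_loss(4_000, 10))  # False True
pred, target, mask = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 3.0, 0.0], [1, 1, 0, 0]
print(training_loss(pred, target, mask, step=100, batch_size=10))    # 5.0
print(training_loss(pred, target, mask, step=4_000, batch_size=10))  # 2.0
```

Before the switch, the error at the last position counts towards the loss; afterwards only the masked region does, which focuses the gradient on the subject rather than the background.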
Validation loss, computed on a held-out 10% of the dataset, was used to prevent overfitting while still maintaining strong concept adherence and generalisation.
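A held-out 10% split lets overfitting show up as validation loss rising while training loss keeps falling. A minimal sketch, assuming a seeded random split and simple patience-based early stopping on the validation curve. The author's exact stopping rule isn't stated, so the patience value, the loss values and the function names here are hypothetical.

```python
import random

def train_val_split(items, val_fraction=0.10, seed=0):
    """Shuffle deterministically and hold out `val_fraction` of the items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def best_checkpoint(val_losses, patience=3):
    """Return the index of the lowest validation loss, stopping early once
    `patience` consecutive evaluations fail to improve on the best so far."""
    best_idx, best_loss, bad_evals = 0, float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, bad_evals = i, loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break
    return best_idx

train, val = train_val_split(range(1500))
print(len(train), len(val))  # 1350 150
print(best_checkpoint([0.30, 0.25, 0.22, 0.23, 0.24, 0.26, 0.20]))  # 2
```

With a patience of 3, the run stops after three evaluations without improvement and keeps checkpoint 2, even though a later evaluation happens to dip lower; a larger patience trades extra compute for a chance to catch such late improvements.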
The model was trained with AdamW, using the adv-optm Python package.
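For reference, AdamW differs from Adam by applying weight decay directly to the weights instead of folding it into the gradient. The sketch below is the textbook AdamW update for a single scalar parameter in pure Python. It does not use or reproduce the adv-optm package's API, and the hyperparameter defaults are common choices, not the settings used to train this model.

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; `t` is the 1-based step count.
    Returns the updated (param, m, v)."""
    # Decoupled weight decay: shrink the weight directly, independent of grad.
    param -= lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adamw_step(x, 2 * (x - 3), m, v, t, lr=0.1, weight_decay=0.0)
print(round(x, 3))
```

Because the decay term is applied separately, its strength does not get rescaled by the adaptive denominator, which is the main practical difference from Adam with L2 regularisation.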
Licensing
If you'd like to release a merge of this model, please contact me.
Made with <3 By BitcrushedHeart
Comments (14)
Man the full finetunes are going to be so crazy
Working on it! ;)
@BitcrushedHeart Any ETA by chance? Or general ballpark of number of training images and other training parameters? 👀
Realbooru might be a decent place to get data btw
@BitcrushedHeart giving you a heads up so you don't waste your time: there is apparently an issue with FP16 training on Z-Image Base, so I'd wait. However, an FP32 model leaked. You can probably find it (I'm not sure if I can share the link), and that would let you train in FP32 if you have the hardware and then, presumably, quantize it to FP16.
@iamjustheretodow6140 I usually train at bf16 - I'll take a look and see if this is affected
Also, dataset for full finetune is currently 120k images :)
@BitcrushedHeart
https://x.com/bdsqlsz/status/2017966918158995689?s=20
https://x.com/bdsqlsz/status/2017964791059644659?s=20
Very good
Take a look at this post. Using this model as the base for training might yield better results: I have tried it, and its limb accuracy is far higher than that of the bf16 version: https://www.reddit.com/r/comfyui/comments/1qt88kg/z_image_base_teacher_model_fp32_leaked/
I honestly can't see anything special in your examples and LoRA. You could achieve all of your examples with a good prompt on Z-Image Turbo or Z-Image Base.
It's reinforcing a concept, not making a new one. I'm not going to willingly burn through the base model with a rank 16 lora!
Yes, using AI to repeatedly try different prompts can achieve most of the effects, but it's extremely time-consuming. According to your logic, all AI art style selectors are meaningless because they can all be written manually.
I honestly can't see anything special on your comment.. All of your words could've been written by <input generic ai>.
+1 on v2, great as a general realism lora but also for NSFW concepts.