    Flat Color - Style

    Trained on images with flat colors, no visible lineart, and little to no indication of depth.

    ℹ️ LoRAs work best when applied to the base models on which they were trained. Please read the About This Version section on the appropriate base model pages for workflow/training information.

    This is a small style LoRA I thought would be interesting to try with a v-pred model (noobai v-pred), in particular for the reduced color bleeding and strong blacks.

    The effect is quite nice and easy to evaluate during training, so in later versions I've extended the dataset with videos for text-to-video models like Wan and Hunyuan, and it is now what I generally use to test LoRA training on new models.

    Recommended prompt structure:

    Positive prompt:

    flat color, no lineart, blending, negative space,
    {{tags}}
    masterpiece, best quality, very aesthetic, newest
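
    For scripted generation, the {{tags}} slot is just a placeholder for the subject tags of a given prompt. A minimal Python sketch of assembling the full positive prompt (the subject tags in the example are hypothetical, not from the model card):

    # Assemble the recommended positive prompt structure.
    STYLE_TAGS = "flat color, no lineart, blending, negative space"
    QUALITY_TAGS = "masterpiece, best quality, very aesthetic, newest"

    def build_prompt(subject_tags: str) -> str:
        """Substitute subject tags into the {{tags}} slot of the template."""
        return f"{STYLE_TAGS},\n{subject_tags}\n{QUALITY_TAGS}"

    print(build_prompt("1girl, silver hair, looking at viewer"))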

    Description

    Trained on Z Image Base with diffusion-pipe

    Same dataset/training settings as used for version 2.1 [z-image-turbo]

    Training Config:

    # dataset-zimage.toml
    
    # Resolution settings.
    resolutions = [1024]
    
    # Aspect ratio bucketing settings
    enable_ar_bucket = true
    min_ar = 0.5
    max_ar = 2.0
    num_ar_buckets = 7
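    # Assuming buckets spaced evenly in log space (per diffusion-pipe's
    # dataset docs), these settings give seven aspect-ratio buckets of
    # roughly 0.5, 0.63, 0.79, 1.0, 1.26, 1.59, 2.0.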
    
    [[directory]] # IMAGES
    path = '/training_data/images'
    num_repeats = 5
    resolutions = [1024]
    
    # config-zimage-base.toml
    
    output_dir = '/mnt/d/zimage/training_output'
    dataset = 'dataset-zimage.toml'
    
    # training settings
    epochs = 50
    micro_batch_size_per_gpu = 1
    pipeline_stages = 1
    gradient_accumulation_steps = 1
    gradient_clipping = 1
    
    # eval settings
    eval_every_n_epochs = 1
    #eval_every_n_steps = 100
    eval_before_first_step = true
    eval_micro_batch_size_per_gpu = 1
    eval_gradient_accumulation_steps = 1
    
    # misc settings
    save_every_n_epochs = 5
    checkpoint_every_n_minutes = 120
    activation_checkpointing = true
    partition_method = 'parameters'
    save_dtype = 'bfloat16'
    caching_batch_size = 8
    steps_per_print = 1
    
    [model]
    type = 'z_image'
    diffusion_model = '/diffusion_models/z_image_bf16.safetensors'
    vae = '/models/vae/ae.safetensors'
    text_encoders = [
        {path = '/models/text_encoders/qwen_3_4b.safetensors', type = 'lumina2'}
    ]
    dtype = 'bfloat16'
    #diffusion_model_dtype = 'float8'
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'AdamW8bitKahan'
    lr = 2e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
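
    diffusion-pipe is launched through DeepSpeed, along the lines of deepspeed --num_gpus=1 train.py --deepspeed --config config-zimage-base.toml. The dataset size isn't published here, so as a rough feel for what the settings above imply, a small Python sketch (num_images is a placeholder):

    # Optimizer-step arithmetic implied by the two configs above.
    num_images = 100   # placeholder; the actual dataset size isn't published
    num_repeats = 5    # from dataset-zimage.toml
    epochs = 50        # from config-zimage-base.toml
    micro_batch = 1    # micro_batch_size_per_gpu
    grad_accum = 1     # gradient_accumulation_steps
    num_gpus = 1       # assumed single-GPU run

    steps_per_epoch = (num_images * num_repeats) // (micro_batch * grad_accum * num_gpus)
    total_steps = steps_per_epoch * epochs
    print(steps_per_epoch, total_steps)  # 500 steps/epoch, 25000 steps total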

    Comments (17)

    Unhing3d · Jan 28, 2026

    What is your experience with Z-Image Base so far? Do you think we finally have a good successor to SDXL?

    motimalu (Author) · Jan 29, 2026

    It's a very pretty model! Seems Tongyi put effort into improving the base aesthetic before releasing it, maybe at the expense of diverging from the derived Turbo model.
    It takes a long time to converge when training though, so I guess it will take a bit of time to tell whether it will succeed SDXL as a community model.

    KitagawaYoshino · Jan 28, 2026

    you are the fastest lora maker in the world

    wizaibi · Jan 28, 2026

    Anyone able to do outlines?

    Latterday · Jan 29, 2026

    I'm curious what training parameters you used for the z-image base model training.

    motimalu (Author) · Jan 29, 2026

    Hello, I've added the training parameters to the model card's "About this version" section.

    Erebussy · Jan 29, 2026

    Curious why you dropped the learning rate to .00002 instead of the default ~.0001.

    I've personally noticed the z-image base model converges really slowly, and I have been testing with bumping the learning rate up rather than down. Curious if you know something I don't; I'm trying to figure out what works and what doesn't.

    Maybe it's because this style in particular is really simple, and so the model doesn't need to learn a lot?

    motimalu (Author) · Jan 29, 2026

    Hello, I dropped from .00005 to .00002 when attempting to train z-image-turbo, then used the same z-image-turbo config to train z-image-base as a first blind attempt.

    Convergence does seem slow, and I've had difficulty training other datasets with this config; as you say, for this simple style not a lot needs to be learned.

    There's a lot of variance on each seed, and base generations seem to be very close to the dataset on one seed and then not at all on the next, which makes it a little hard to measure too.

    Erebussy · Jan 30, 2026

    @motimalu Thanks for the reply!

    I've been bumping up the learning rate slowly and am going to try .0003 next. I read somewhere that it starts to break down around .0004 though, so we will see.

    Even at these rates though, I can't help but notice it always feels like the model is ~85% there but never seems to converge completely. I'm going to try much more elaborate tagging and much higher step counts (maybe 5k next, assuming it'll be good around 4k?).

    My hypothesis for now is that my natural aversion to overcooking the model is holding me back and I just need to increase the step count dramatically. The problem with z-image-base is it's really hard to tell when you start losing flexibility, since it hides it so well.

    Anyways, just trying to figure this stuff out, thanks for your time!

    ACatStick · Jan 29, 2026

    Could you please explain how the tagging was done? I captioned all the visual elements in the images with natural language in Chinese, but the samples showed real people. Do I need to add a trigger word?

    motimalu (Author) · Jan 30, 2026

    Hello, yes, a trigger word could help.
    Tags that the model already understands, and that reliably generate something close to the dataset, should be included at the start of the dataset tags, I guess.
    Negative prompts are also supported by the z-image-base model, so you could try that too.
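
    A minimal sketch of the trigger-word suggestion above: prepend the trigger/style tags to every caption, assuming diffusion-pipe-style sidecar .txt captions next to each image (the directory path comes from dataset-zimage.toml):

    from pathlib import Path

    TRIGGER = "flat color, no lineart"        # trigger words from this model card
    data_dir = Path("/training_data/images")  # path from dataset-zimage.toml

    # Prepend the trigger tags to each sidecar caption file.
    for caption in data_dir.glob("*.txt"):
        text = caption.read_text(encoding="utf-8").strip()
        if not text.startswith(TRIGGER):
            caption.write_text(f"{TRIGGER}, {text}\n", encoding="utf-8")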

    profitaiai · Jan 30, 2026

    really good

    MaxAkt · Feb 13, 2026

    Cool one!

    thegodofmankind · Feb 27, 2026

    cool

    diffusional_reactor · Mar 6, 2026

    Can the Flux K version edit anime images too?

    motimalu (Author) · Mar 7, 2026

    It might work; I didn't train it with control/context images for editing though. Something I've been meaning to try.

    itachiii · Mar 14, 2026

    Can you please create this flat art style (anbeauty0902) for noob or IL?

    this is really good

    Type: LORA
    Base model: ZImageBase

    Details

    Downloads: 2,684
    Platform: CivitAI
    Platform Status: Available
    Created: 1/28/2026
    Updated: 5/14/2026
    Deleted: -
    Trigger Words: flat color, no lineart