Z-Image-Turbo Photorealistic Lighting LoRA (Flow-DPO)

This is a LoRA adapter for Alibaba-Tongyi/Z-Image-Turbo, fine-tuned with Flow-DPO (Direct Preference Optimization for Flow Matching) to improve photorealistic lighting, cinematic shadows, and overall image quality.

By applying Flow-DPO to spatially aligned image pairs, this LoRA reduces the "flat," "washed-out," or "plastic" look common in heavily distilled few-step models and produces more realistic lighting in just 8 inference steps.

🧠 Training Details & Methodology

This model was trained using a custom implementation of Flow-DPO (Improving Video Generation with Human Feedback, arXiv:2501.13918).
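
For orientation, the per-pair Flow-DPO objective can be sketched as follows. This is a minimal pure-Python illustration of the general form of the loss with a constant $\beta$, not the custom training code used for this LoRA; the function names and scalar inputs are assumptions made for the example. Each `err_*` argument stands for the squared error between a predicted velocity and the flow-matching target velocity, for the chosen (`w`) or rejected (`l`) image, under the trainable policy or the frozen reference model.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def flow_dpo_loss(err_w_policy, err_w_ref, err_l_policy, err_l_ref, beta=1.0):
    """Preference loss for one (chosen, rejected) pair at a shared timestep."""
    # Negative gap: the policy fits this sample better than the reference does.
    win_gap = err_w_policy - err_w_ref
    lose_gap = err_l_policy - err_l_ref
    logit = -0.5 * beta * (win_gap - lose_gap)
    # -log(sigmoid(logit)) is the same as softplus(-logit).
    return softplus(-logit)

# Policy identical to the reference: the loss sits at log(2).
baseline = flow_dpo_loss(0.5, 0.5, 0.5, 0.5)
# Policy fits the chosen image better than the reference: the loss drops.
improved = flow_dpo_loss(0.1, 0.5, 0.5, 0.5)
```

The loss falls when the policy improves on the chosen image relative to the reference and rises when it improves on the rejected image instead; $\beta$ scales how strongly that difference is weighted.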

1. The Dataset (Strict Spatial Alignment)

To keep the model from hallucinating content or altering image structure (a symptom of catastrophic forgetting), the preference dataset was constructed with strict spatial alignment between the two images in each pair:

Win (Chosen): High-quality, professional photographs with perfect lighting and textures.
Lose (Rejected): The same images, degraded programmatically (Gaussian blur, lowered contrast, extreme exposure shifts, Gaussian noise, and heavy JPEG compression artifacts).
Alignment: No cropping or warping was applied, so the two images in a pair differ only in lighting and texture, and the Flow Matching trajectory learns to correct only those.
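
The pair construction above can be sketched roughly as follows. This is an illustrative NumPy example, not the actual data pipeline: the function name, parameter values, and the box blur (used here as a cheap stand-in for Gaussian blur) are all assumptions, and JPEG compression is omitted to keep the sketch dependency-free. The key property it shows is that the degraded image keeps the exact shape and pixel grid of the original.

```python
import numpy as np

def degrade(image, rng, blur_radius=2, contrast=0.6,
            exposure_shift=0.15, noise_std=0.05):
    """Return a degraded copy of `image` (float array in [0, 1], shape H x W x C)."""
    out = image.astype(np.float64)
    h, w = out.shape[:2]

    # Box blur (stand-in for Gaussian blur): average over shifted copies.
    if blur_radius > 0:
        pad = ((blur_radius, blur_radius), (blur_radius, blur_radius), (0, 0))
        padded = np.pad(out, pad, mode="edge")
        k = 2 * blur_radius + 1
        acc = np.zeros_like(out)
        for dy in range(k):
            for dx in range(k):
                acc += padded[dy:dy + h, dx:dx + w]
        out = acc / (k * k)

    # Lower contrast by pulling every pixel toward mid-gray.
    out = 0.5 + contrast * (out - 0.5)
    # Shift exposure, then add Gaussian noise.
    out = out + exposure_shift
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    # No cropping or warping: the output stays pixel-aligned with the input.
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((16, 16, 3))   # placeholder for a "win" photograph
degraded = degrade(clean, rng)    # matching "lose" image
```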

2. Discrete Timestep Distillation Preservation

Unlike standard diffusion models, where $t$ is sampled continuously from $[0, 1]$, Z-Image-Turbo is a distilled model optimized for 8 fixed timesteps. During Flow-DPO training, we read the discrete $t$ values directly from the FlowMatchEulerDiscreteScheduler and restricted random timestep sampling to those 8 values. This keeps the LoRA compatible with the turbo model's 8-step sampling and avoids introducing blur in the output.
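
A minimal sketch of this restricted sampling is shown below. It is written in plain Python for illustration; the evenly spaced grid is a placeholder, since in practice the 8 values would be read from the scheduler after configuring it for 8 steps, and all function names here are hypothetical.

```python
import random

def placeholder_sigma_grid(num_steps=8):
    """Evenly spaced noise levels in (0, 1]; a stand-in for the real schedule."""
    return [1.0 - i / num_steps for i in range(num_steps)]

def sample_discrete_t(grid, batch_size, rng):
    """Draw one noise level per sample, uniformly from the fixed grid."""
    return [rng.choice(grid) for _ in range(batch_size)]

def interpolate(clean, noise, t):
    """Rectified-flow interpolation: t = 0 is the clean latent, t = 1 is pure noise."""
    return [(1.0 - t) * c + t * n for c, n in zip(clean, noise)]

rng = random.Random(0)
grid = placeholder_sigma_grid()
# Every training example gets a t that the 8-step sampler actually visits.
batch_t = sample_discrete_t(grid, batch_size=4, rng=rng)
```

Sampling from the grid rather than from a continuous range means the adapter is only ever trained at noise levels the distilled model sees at inference time.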

3. Hyperparameters

Base Model: Alibaba-Tongyi/Z-Image-Turbo (6B Single-Stream DiT)
Learning Rate: 1e-4
KL Penalty ($\beta$): 1.0
Effective Batch Size: 1
Mixed Precision: bfloat16

⚠️ Limitations

Not an Image-to-Image Restorer: This LoRA changes the prior distribution of the Text-to-Image generation. It is designed to generate better original images from text prompts, not to be used as an img2img filter to fix user-uploaded bad photos (unless combined with RF-Inversion techniques, which are highly unstable for 8-step models).
Color Saturation: Pushing the LoRA scale too high (e.g., above 1.5) can produce over-sharpened or oversaturated images, a side effect of DPO margin maximization. Keep the scale between 0.6 and 1.0 for the most photorealistic results.
