CivArchive
    z-image-turbo-flow-dpo - v1.0
    NSFW

    Z-Image-Turbo Photorealistic Lighting LoRA (Flow-DPO)

    This is a specialized LoRA adapter for Alibaba-Tongyi/Z-Image-Turbo, fine-tuned with Flow-DPO (Direct Preference Optimization for Flow Matching) to improve photorealistic lighting, cinematic shadows, and overall image quality.

    Because Flow-DPO is applied to spatially aligned image pairs, this LoRA targets the "flat," "washed-out," or "plastic" look common in ultra-fast distilled models and aims for physically plausible lighting in just 8 inference steps.

    🧠 Training Details & Methodology

    This model was trained using a custom implementation of Flow-DPO (Improving Video Generation with Human Feedback, arXiv:2501.13918).

    1. The Dataset (Strict Spatial Alignment)

    To keep the model from hallucinating or altering image structure (a form of catastrophic forgetting), the preference dataset was built from strictly spatially aligned pairs:

    • Win (Chosen): High-quality, professional photographs with perfect lighting and textures.

    • Lose (Rejected): The same images, degraded programmatically (Gaussian blur, lowered contrast, extreme exposure shifts, Gaussian noise, and heavy JPEG compression artifacts).

    • Alignment: No cropping or warping was applied, so the Flow Matching trajectory learns to correct only lighting and texture.
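
    The construction of a "lose" image from a "win" image can be sketched as follows. This is a minimal illustration, not the author's pipeline: the function names, the parameter values, and the use of NumPy are all assumptions, and JPEG compression is left out because it needs an image codec. Since every operation is per-pixel or a same-size blur, the degraded image stays spatially aligned with the original.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """1D Gaussian kernel, truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an (H, W, C) float image; output keeps the shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    out = np.apply_along_axis(conv, 0, padded)  # blur along height
    return np.apply_along_axis(conv, 1, out)    # blur along width

def degrade(img, rng, blur_sigma=1.5, contrast=0.6, gain=1.5, noise_std=0.03):
    """Hypothetical 'lose' image builder; all default values are illustrative."""
    out = gaussian_blur(img, blur_sigma)
    out = (out - 0.5) * contrast + 0.5                       # lower contrast
    out = out * gain                                         # exposure shift
    out = out + rng.normal(0.0, noise_std, size=out.shape)   # Gaussian noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
win = rng.random((16, 16, 3))   # stand-in for a real photograph in [0, 1]
lose = degrade(win, rng)        # same shape, no cropping or warping
```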

    2. Discrete Timestep Distillation Preservation

    Standard diffusion and flow-matching training samples $t$ continuously, with $t \in [0, 1]$. Z-Image-Turbo, by contrast, is a distilled model optimized for 8 fixed timesteps. During Flow-DPO training, we extracted the exact discrete $t$ values from the FlowMatchEulerDiscreteScheduler and restricted random sampling to those 8 nodes. This lets the LoRA keep the turbo model's 8-step speed without introducing blurry outputs.
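
    A minimal sketch of restricting training noise levels to a fixed set of nodes is shown below. It is a stand-in, not the author's code: in practice the node values would be read from the scheduler itself (for example `scheduler.sigmas` after `scheduler.set_timesteps(8)` in diffusers), whereas here they are approximated with a linear grid and the common time-shift warp. The shift value of 3.0 and all function names are assumptions.

```python
import random

def shifted_sigmas(num_steps: int, shift: float) -> list[float]:
    """Linearly spaced sigmas in (0, 1], warped by shift * s / (1 + (shift - 1) * s)."""
    base = [1.0 - i / num_steps for i in range(num_steps)]  # 1.0 down to 1/num_steps
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in base]

def sample_training_sigmas(nodes, batch_size, rng):
    """Draw each sample's noise level from the discrete nodes only."""
    return [rng.choice(nodes) for _ in range(batch_size)]

def interpolate(x0, noise, sigma):
    """Rectified-flow interpolation x_t = (1 - sigma) * x0 + sigma * noise."""
    return [(1.0 - sigma) * a + sigma * b for a, b in zip(x0, noise)]

nodes = shifted_sigmas(num_steps=8, shift=3.0)  # shift=3.0 is an assumed value
rng = random.Random(0)
batch_sigmas = sample_training_sigmas(nodes, batch_size=4, rng=rng)
```

    Sampling from `nodes` instead of a continuous range means the adapter is only ever trained at noise levels the 8-step sampler actually visits.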

    3. Hyperparameters

    • Base Model: Alibaba-Tongyi/Z-Image-Turbo (6B Single-Stream DiT)

    • Learning Rate: 1e-4

    • KL Penalty ($\beta$): 1.0

    • Effective Batch Size: 1

    • Mixed Precision: bfloat16
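
    To show where $\beta$ enters, here is a sketch of a DPO-style preference loss on flow-matching errors. It follows the general form used by Diffusion-DPO-style objectives; the exact scaling and any timestep weighting in the author's custom implementation are not stated, so the factor of 1/2 and the function name are assumptions. Each `err_*` argument is the squared velocity-prediction error of the LoRA model ("policy") or the frozen base model ("ref") on the win or lose image.

```python
import math

def flow_dpo_loss(err_w_policy, err_w_ref, err_l_policy, err_l_ref, beta=1.0):
    """-log sigmoid(-(beta / 2) * ((win gap) - (lose gap))) for one preference pair."""
    win_gap = err_w_policy - err_w_ref    # negative when the policy beats the reference on "win"
    lose_gap = err_l_policy - err_l_ref   # positive when the policy is worse than the reference on "lose"
    z = 0.5 * beta * (win_gap - lose_gap)
    # -log(sigmoid(-z)) equals softplus(z); this form is numerically stable.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy identical to the reference: the loss sits at log(2).
neutral = flow_dpo_loss(0.3, 0.3, 0.7, 0.7)
# Policy fits the win image better and the lose image worse: loss drops below log(2).
improved = flow_dpo_loss(0.2, 0.3, 0.8, 0.7)
```

    A larger $\beta$ makes the loss more sensitive to deviations from the reference model, which is why it acts as a KL-style penalty.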

    ⚠️ Limitations

    • Not an Image-to-Image Restorer: This LoRA shifts the prior distribution of text-to-image generation. It is designed to generate better original images from text prompts, not to act as an img2img filter that fixes bad user-uploaded photos (unless combined with RF-Inversion techniques, which are highly unstable for 8-step models).

    • Color Saturation: Pushing the LoRA scale too high (e.g., > 1.5) can produce over-sharpened or oversaturated images, a side effect of DPO margin maximization. Keep the scale around 0.6 to 1.0 for the most photorealistic results.
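
    The effect of the LoRA scale can be illustrated with the usual low-rank merge rule, W' = W + scale * (up @ down). The sketch below uses tiny list-of-lists matrices and a hypothetical helper name; it only shows that the learned update grows linearly with the scale, not how any particular inference tool loads this file.

```python
def apply_lora(weight, down, up, scale):
    """Return weight + scale * (up @ down) for small list-of-lists matrices."""
    rows, cols, rank = len(weight), len(weight[0]), len(down)
    return [
        [
            weight[i][j] + scale * sum(up[i][r] * down[r][j] for r in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]

base = [[1.0, 0.0], [0.0, 1.0]]   # toy base weight
up = [[1.0], [2.0]]               # (rows x rank), rank 1
down = [[0.5, -0.5]]              # (rank x cols)

recommended = apply_lora(base, down, up, scale=1.0)
too_strong = apply_lora(base, down, up, scale=2.0)  # update is twice as large
```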

    Description

    LoCon
    Z Image Turbo

    Details

    Downloads: 13
    Platform: SeaArt
    Platform Status: Available
    Created: 2/25/2026
    Updated: 2/25/2026
    Deleted: -
