CivArchive
    z-image-turbo-flow-dpo - v1.0
    NSFW

    Z-Image-Turbo Photorealistic Lighting LoRA (Flow-DPO)

    This is a specialized LoRA adapter for Alibaba-Tongyi/Z-Image-Turbo, finetuned using Flow-DPO (Direct Preference Optimization for Flow Matching) to significantly enhance photorealistic lighting, cinematic shadows, and overall image quality.

    By utilizing Flow-DPO on perfectly spatially-aligned image pairs, this LoRA fixes the common "flat," "washed-out," or "plastic" artifacts often found in ultra-fast distilled models, delivering stunning, physically accurate lighting in just 8 inference steps.

    🧠 Training Details & Methodology

    This model was trained using a custom implementation of Flow-DPO (Improving Video Generation with Human Feedback, arXiv:2501.13918).
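The core of Flow-DPO is a preference loss over flow-matching errors: the policy is rewarded for predicting the velocity target better than a frozen reference on the "win" sample, and worse on the "lose" sample. A minimal numpy sketch of that loss (function and argument names are illustrative, not from the training code; the \(-\beta/2\) scaling follows arXiv:2501.13918):

```python
import numpy as np

def flow_dpo_loss(v_w, v_ref_w, u_w, v_l, v_ref_l, u_l, beta=1.0):
    """Flow-DPO loss sketch.

    v_*     : velocity predicted by the trainable (LoRA) policy
    v_ref_* : velocity predicted by the frozen reference model
    u_*     : flow-matching target velocity (noise - data direction)
    _w / _l : chosen ("win") vs. rejected ("lose") sample
    """
    err = lambda v, u: np.mean((v - u) ** 2)
    # How much the policy improves (negative) or worsens (positive)
    # over the reference on each branch:
    delta_w = err(v_w, u_w) - err(v_ref_w, u_w)
    delta_l = err(v_l, u_l) - err(v_ref_l, u_l)
    # L = -log sigmoid(-beta/2 * (delta_w - delta_l)),
    # written as softplus for numerical stability.
    z = -0.5 * beta * (delta_w - delta_l)
    return float(np.log1p(np.exp(-z)))
```

The loss drops below log 2 exactly when the policy fits the chosen sample better (relative to the reference) than it fits the rejected one, which is what pushes the LoRA toward the "win" lighting distribution.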

    1. The Dataset (Strict Spatial Alignment)

    To prevent the model from hallucinating or altering image structures (Catastrophic Forgetting), the preference dataset was constructed using strict spatial alignment:

    • Win (Chosen): High-quality, professional photographs with perfect lighting and textures.

    • Lose (Rejected): The exact same images degraded programmatically (Gaussian blur, lowered contrast, extreme exposure shifts, Gaussian noise, and heavy JPEG compression artifacts).

    • Alignment: No cropping or warping was applied, ensuring the flow-matching trajectory learned solely to correct lighting and texture.
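A pixel-aligned degradation of this kind can be sketched with plain numpy (parameter values here are illustrative, not the ones used for the dataset; JPEG compression artifacts would additionally require an encoder such as Pillow's):

```python
import numpy as np

def degrade(img, blur_sigma=1.5, contrast=0.6, exposure=0.0,
            noise_std=0.05, seed=0):
    """Build a 'Lose' sample from a 'Win' image (float array in [0, 1]).

    Spatial alignment is preserved: no crop or warp, the pixel grid
    is untouched, so only lighting/texture statistics change.
    """
    rng = np.random.default_rng(seed)
    out = img.astype(np.float64)
    # Separable Gaussian blur.
    radius = int(3 * blur_sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * blur_sigma**2))
    k /= k.sum()
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="same"), axis, out)
    # Lower contrast around the mean, then shift exposure.
    out = (out - out.mean()) * contrast + out.mean() + exposure
    # Additive Gaussian noise.
    out = out + rng.normal(0.0, noise_std, out.shape)
    return np.clip(out, 0.0, 1.0)
```

Because the degraded image occupies the exact same pixel grid as the original, the win/lose pair differs only in the attributes the LoRA is meant to learn.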

    2. Discrete Timestep Distillation Preservation

    Unlike standard diffusion models, where $t$ is sampled continuously from $t \in [0, 1]$, Z-Image-Turbo is a distilled model optimized for 8 fixed timesteps. During Flow-DPO training, we dynamically extracted the exact discrete $t$-distribution from the FlowMatchEulerDiscreteScheduler and restricted random timestep sampling to those 8 nodes. This ensures the LoRA retains the turbo model's extreme speed without introducing output blurriness.
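In practice the 8 nodes are read directly from the scheduler after `set_timesteps(8)`. The sketch below only approximates them with the common shifted flow-matching schedule (the linear base schedule and the shift value are assumptions, not the scheduler's actual internals) to show the key idea: training timesteps are drawn from the discrete node set, never from a continuous $[0, 1]$:

```python
import numpy as np

def turbo_timesteps(num_steps=8, shift=3.0, num_train_timesteps=1000):
    """Approximate the 8 discrete inference nodes of a turbo model.

    The real values come from FlowMatchEulerDiscreteScheduler; this
    shifted schedule (sigma -> shift*sigma / (1 + (shift-1)*sigma))
    is only a stand-in for illustration.
    """
    sigmas = np.linspace(1.0, 1.0 / num_steps, num_steps)
    sigmas = shift * sigmas / (1 + (shift - 1) * sigmas)
    return sigmas * num_train_timesteps  # scheduler-style timesteps

def sample_training_t(nodes, batch_size, rng):
    # Restrict DPO training to the distilled model's fixed nodes
    # instead of sampling t uniformly over [0, 1].
    return rng.choice(nodes, size=batch_size, replace=True)
```

Sampling off these nodes would optimize the LoRA at timesteps the 8-step sampler never visits, which is what produces the blurry outputs this restriction avoids.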

    3. Hyperparameters

    • Base Model: Alibaba-Tongyi/Z-Image-Turbo (6B Single-Stream DiT)

    • Learning Rate: 1e-4

    • KL Penalty ($\beta$): 1.0

    • Effective Batch Size: 1

    • Mixed Precision: bfloat16

    ⚠️ Limitations

    • Not an Image-to-Image Restorer: This LoRA changes the prior distribution of the Text-to-Image generation. It is designed to generate better original images from text prompts, not to be used as an img2img filter to fix user-uploaded bad photos (unless combined with RF-Inversion techniques, which are highly unstable for 8-step models).

    • Color Saturation: Pushing the LoRA scale too high (e.g., > 1.5) might result in over-sharpened or overly saturated images due to the nature of DPO margin maximization. Keep the scale around 0.6 - 1.0 for the most photorealistic results.


    LoCon
    ZImageTurbo

    Details

    Downloads
    876
    Platform
    CivitAI
    Platform Status
    Available
    Created
    2/25/2026
    Updated
    4/28/2026
    Deleted
    -

    Files

    zit_fdpo_v1.safetensors

    Mirrors

    CivitAI (1 mirror)
    Other Platforms (TensorArt, SeaArt, etc.) (2 mirrors)