    DPO (Direct Preference Optimization) LoRA for XL and 1.5 - OpenRail++ - SDXL - V1.0

    What is DPO?

    DPO is Direct Preference Optimization, a method for fine-tuning a diffusion model on pairs of images that humans have compared, steering the model toward the preferred outputs. Meihua Dang et al. trained Stable Diffusion 1.5 and Stable Diffusion XL with this method using the Pick-a-Pic v2 dataset (https://huggingface.co/datasets/yuvalkirstain/pickapic_v2); their paper is available at https://huggingface.co/papers/2311.12908.
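
    As a rough illustration of what that preference data looks like, the snippet below streams a few records from Pick-a-Pic v2 with the Hugging Face datasets library; the column names (caption, label_0, label_1) are assumptions taken from the dataset card and may differ.

```python
# A sketch of peeking at the Pick-a-Pic v2 preference data.
# Column names are assumptions from the dataset card and may differ;
# streaming avoids downloading the full (very large) dataset.
from datasets import load_dataset

ds = load_dataset("yuvalkirstain/pickapic_v2", split="train", streaming=True)

for record in ds.take(3):
    # Each record pairs one prompt with two candidate images and a human preference.
    preferred = 0 if record["label_0"] > record["label_1"] else 1
    print(record["caption"], "-> preferred image:", preferred)
```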

    What does it Do?

    The DPO-tuned models have been observed to produce higher-quality images than their untuned counterparts, most noticeably in how closely the output follows the prompt. These LoRA let you bring that improved prompt adherence to other fine-tuned Stable Diffusion models, as shown in the sketch below.
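
    A minimal sketch of applying the SDXL LoRA on top of a checkpoint with diffusers follows; the LoRA filename is a placeholder for whichever file you download from this page.

```python
# A sketch of loading the extracted SDXL DPO LoRA into a diffusers pipeline;
# the LoRA filename below is a placeholder for the downloaded file.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # or any fine-tuned SDXL checkpoint
    torch_dtype=torch.float16,
).to("cuda")

pipe.load_lora_weights("dpo-sdxl-lora.safetensors")  # placeholder filename

image = pipe("a red fox reading a newspaper in a rainy cafe, detailed").images[0]
image.save("dpo-lora-example.png")
```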

    Who Trained This?

    These LoRA are based on the work of Meihua Dang (https://huggingface.co/mhdang), published at https://huggingface.co/mhdang/dpo-sdxl-text2image-v1 and https://huggingface.co/mhdang/dpo-sd1.5-text2image-v1 and licensed under OpenRail++.

    How were these LoRA Made?

    They were created using Kohya SS by extracting the LoRA as the difference between each DPO-tuned checkpoint and its base model, all of them OpenRail++-licensed checkpoints hosted on CivitAI and HuggingFace (a rough command sketch follows the links below).

    1.5: https://civarchive.com/models/240850/sd15-direct-preference-optimization-dpo extracted from https://huggingface.co/fp16-guy/Stable-Diffusion-v1-5_fp16_cleaned/blob/main/sd_1.5.safetensors.

    XL: https://civarchive.com/models/238319/sd-xl-dpo-finetune-direct-preference-optimization extracted from https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors.
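
    For reference, a rough sketch of how an extraction like the XL one above could be reproduced with the extract_lora_from_models.py script from kohya-ss/sd-scripts (which Kohya SS wraps) is given below; the checkpoint paths and LoRA rank are placeholders, and exact flag names may vary between sd-scripts versions.

```python
# A sketch of extracting a LoRA as the difference between a DPO-tuned checkpoint
# and its base model using kohya-ss/sd-scripts. Paths are placeholders and flag
# names may differ between versions.
import subprocess

subprocess.run(
    [
        "python", "networks/extract_lora_from_models.py",
        "--sdxl",                                            # omit for the 1.5 extraction
        "--model_org", "sd_xl_base_1.0_0.9vae.safetensors",  # base model
        "--model_tuned", "sdxl-dpo-finetune.safetensors",    # DPO-tuned checkpoint (placeholder)
        "--save_to", "dpo-sdxl-lora.safetensors",            # output LoRA file
        "--dim", "64",                                       # LoRA rank (assumed)
        "--device", "cuda",
    ],
    check=True,
)
```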

    These LoRA are also hosted on HuggingFace at https://huggingface.co/benjamin-paine/sd-dpo-offsets/.
