    SPO-SDXL_4k-p_10ep_LoRA_webui - v1.0

    Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

    arXiv Paper

    Github Code

    Project Page

    Abstract

    Generating visually appealing images is fundamental to modern text-to-image generation models. A potential solution to better aesthetics is direct preference optimization (DPO), which has been applied to diffusion models to improve general image quality including prompt alignment and aesthetics. Popular DPO methods propagate preference labels from clean image pairs to all the intermediate steps along the two generation trajectories. However, preference labels provided in existing datasets are blended with layout and aesthetic opinions, which would disagree with aesthetic preference. Even if aesthetic labels were provided (at substantial cost), it would be hard for the two-trajectory methods to capture nuanced visual differences at different steps.

    To improve aesthetics economically, this paper uses existing generic preference data and introduces step-by-step preference optimization (SPO) that discards the propagation strategy and allows fine-grained image details to be assessed. Specifically, at each denoising step, we 1) sample a pool of candidates by denoising from a shared noise latent, 2) use a step-aware preference model to find a suitable win-lose pair to supervise the diffusion model, and 3) randomly select one from the pool to initialize the next denoising step. This strategy ensures that diffusion models focus on the subtle, fine-grained visual differences instead of layout aspects. We find that aesthetics can be significantly enhanced by accumulating these improved minor differences.
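The three numbered steps above can be sketched in plain Python. This is a toy illustration only: `denoise` and `prefer` are hypothetical stand-ins for the real diffusion model and the step-aware preference model, and the actual DPO-style loss on the win-lose pair is not shown.

```python
import random

def spo_step(latent, denoise, prefer, pool_size=4):
    """One step of step-by-step preference optimization (toy sketch).

    `denoise` and `prefer` are hypothetical stand-ins for the real
    diffusion model and the step-aware preference model."""
    # 1) Sample a pool of candidates by denoising from a shared latent.
    pool = [denoise(latent) for _ in range(pool_size)]
    # 2) Rank the pool with the step-aware preference model; the best and
    #    worst candidates form the win-lose pair used to supervise the
    #    diffusion model (the DPO-style loss itself is omitted here).
    ranked = sorted(pool, key=prefer, reverse=True)
    win, lose = ranked[0], ranked[-1]
    # 3) Randomly pick one candidate to initialize the next denoising
    #    step, so training does not always follow the "winning" path.
    next_latent = random.choice(pool)
    return (win, lose), next_latent
```

Because step 3 selects randomly rather than greedily, the model is supervised on subtle per-step differences without the whole trajectory collapsing onto one layout.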

    When fine-tuning Stable Diffusion v1.5 and SDXL, SPO yields significant improvements in aesthetics compared with existing DPO methods while not sacrificing image-text alignment compared with vanilla models. Moreover, SPO converges much faster than DPO methods due to the step-by-step alignment of fine-grained visual details. Code and model: https://rockeycoss.github.io/spo.github.io/

    Model Description

    This model is fine-tuned from stable-diffusion-xl-base-1.0. It has been trained on 4,000 prompts for 10 epochs. This checkpoint is a LoRA checkpoint. For more information, please visit the project page linked above.
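For readers unsure what "a LoRA checkpoint" means: it stores low-rank weight deltas (an A/B matrix pair per layer) that are added on top of the frozen base model's weights at load time, scaled by the LoRA strength. A minimal numpy illustration with toy dimensions (real SDXL layers are far larger):

```python
import numpy as np

def apply_lora(W, A, B, scale=1.0):
    """Merge a LoRA update into a base weight matrix: W' = W + scale * (B @ A).
    The update B @ A has rank at most r, which keeps the checkpoint small."""
    return W + scale * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2              # toy sizes, for illustration only
W = rng.normal(size=(d_out, d_in))    # frozen base weight
A = rng.normal(size=(r, d_in))        # trained "down" projection
B = rng.normal(size=(d_out, r))       # trained "up" projection

W_merged = apply_lora(W, A, B, scale=1.0)
```

The `scale` factor here is what UIs expose as the LoRA weight/strength slider discussed throughout the comments below.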

    Citation

    If you find our work useful, please consider giving us a star and citing our work.

    @article{liang2024step,
      title={Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization},
      author={Liang, Zhanhao and Yuan, Yuhui and Gu, Shuyang and Chen, Bohan and Hang, Tiankai and Cheng, Mingxi and Li, Ji and Zheng, Liang},
      journal={arXiv preprint arXiv:2406.04314},
      year={2024}
    }


    Comments (42)

    Stablediffusionlover · Jun 12, 2024 · 8 reactions

    So it's basically a fine-tuned SDXL, nothing special, right?

    rockeycoss (Author) · Jun 20, 2024 · 2 reactions

    Yes. Fine-tuned to improve its image generation performance.

    Rangiku209090 · Jul 9, 2024

    @rockeycoss can you do it with the Pony XL version?

    zGenMedia · Sep 25, 2024

    @Doctor2024 I assume yes

    VeerGeer · Sep 20, 2024

    Whoa, this is an incredibly useful little tool to finagle the compositional quality of an SDXL model.

    At extreme strengths (2.0 or above, I would wager), high sigma values become undesirably "over-perfected" - however, for smaller refinement steps, you can crank it up to surprising strengths (over 4.0, lol) without it breaking down.

    I am very impressed!

    yikecited809 · Nov 2, 2024 · 6 reactions

    Sorry, but as a non-CS major, I still don't get it...

    The description says it dynamically adjusts denoising performance between steps, but as a LoRA, I didn't see how I can set that. Is it adjusted automatically? Or is there a hardwired pattern in it, according to which the different denoising is controlled?

    Please tell me how to use it if convenient. Thanks in advance!

    VeerGeer · Jan 4, 2025

    Unfortunate copy-paste, but this web interface is a little too reliant on 'click on tiny icon to see comments':


    It's fantastic for img2img, as its effects are per-step - so lower step sizes ("a lower denoise value") make it behave itself even at ridiculous values.

    For generating a full image ("txt2img"), values from -1.0 to 2.0 (usually 1.0 to 1.6) do well, depending on how the prompt and model interact with each other* - and how cluttered or non-cluttered you want it.
    (* some models have very noisy trees and vegetation, as an example of where you might want negatives)

    In img2img and low denoises (e.g. 18 steps / 0.22 denoise), you can crank it up to 4.0, 5.0, even 8.0 in some silly models, and get some very shiny outputs.


    Importantly, it lets you clean your prompts of the needless clutter added in attempts to make the output shinier, prettier, sparklier, whatever.
    Want it shiny? Just img2img with this LoRA cranked high, simple.

    yikecited809 · Jan 21, 2025 · 1 reaction

    @VeerGeer Thanks!

    voidukkha · Nov 23, 2024

    does this "just work" as a LoRA, or does it have to be applied in a specific way so that it can do its SPO magic?

    VeerGeer · Jan 4, 2025 · 1 reaction

    This LoRA mostly Just Works on arbitrary SDXL models; it modifies the output to be "more aesthetically pleasing" (or simpler/cleaner, at negative strengths).

    How the strengths vary the output depends on the base model, naturally.
    Some models turn SPARKLE SPARKLE SHINY SHINY at strengths of 1.5, others don't degenerate into "AI Aesthetic Quality Preference Optimization sparks and glitter" until values of 3.0 to 5.0.


    Usually does a good job of guiding full txt2img at strengths of -1.0 to 2.0, can produce really nice results in img2img of the txt2img image at strengths of 3.0 to 8.0 at progressively lower denoise values.

    Step (iteration) count always has an impact on its output, as it's a mechanism for making each step go towards a prettier output - rather than only attempting to set a prettier start direction (the older DPO implementations).
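The per-step intuition in the comment above can be turned into a crude back-of-the-envelope heuristic (my own simplification, not anything from the paper): a per-step LoRA's total influence grows roughly with strength times the number of steps that actually run, which is why low-denoise img2img tolerates seemingly ridiculous strengths.

```python
def rough_lora_impact(strength: float, steps: int, denoise: float = 1.0) -> float:
    """Crude heuristic (an assumption, not from the SPO paper): per-step
    influence accumulates over the steps that actually run, and img2img
    runs only about steps * denoise of the schedule."""
    return strength * max(1, round(steps * denoise))

# Full txt2img at strength 1.5 vs. low-denoise img2img at strength 8.0
# land in the same ballpark, matching the experience described above:
rough_lora_impact(1.5, 30)        # -> 45.0
rough_lora_impact(8.0, 18, 0.22)  # -> 32.0
```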

    GitarooMan · Dec 1, 2024 · 16 reactions

    So the net effect is that it produces a similar effect to a positive or detail lora, yes? Some side by side examples with weight suggestions would be useful. The amount of jargon on this page is not.

    Separate_Ways · Dec 2, 2024 · 2 reactions

    It improves the quality of the images a lot; it is really wonderful. Highly recommended.

    Bigboyblaziken · Dec 12, 2024 · 9 reactions

    I've tried it on Illustrious now, and it seems to still work great. The tech-talk description almost made me too scared to actually try this, and I have no idea HOW it works, but what I can say is that using it at the default value of 1 just works. I don't have to understand exactly what it does myself, if it does what it says when using it.

    I tested this and a few other similar loras now, in my favorite IL model:

    https://civitai.com/models/827184/wai-nsfw-illustrious-sdxl

    https://civitai.com/models/99619/control-lora-collection

    https://civitai.com/models/1001945/detail-slider-lora-or-illustrious-xl

    I can't really describe exactly what it does by itself, just that it is really noticeable. It, well, adds details, but to me it seems to do so in a slightly different way from, for instance, the detail slider LoRA, which isn't as extreme as this one. Also, this one seems to make images very slightly more realistic, which might not always be what you want. So this by itself can probably already improve images, but I actually got pretty good results by using all 3 of these LoRAs together. But be warned: it can quickly become too much, and you might have to fine-tune a lot. This mainly depends on your model and what you want to do, though.

    plan_truster · Dec 14, 2024 · 8 reactions

    I don't understand what it's trying to achieve, nor does it seem to make any significant difference that could be described as "making the output better". Seems like a whole lot of snake oil to me.

    EDIT: after more testing, it DOES seem to improve small details, especially the eyes.

    spinolover · Dec 27, 2024 · 10 reactions

    Sometimes you just stumble upon some magical LoRA that you missed on CivitAI, and this is one of them.

    Ignoring all the jargon and description, I gave it a test in my usual setting and it works wonders. I simply added the LoRA to my prompt using a weight of 1. From my testing, weights of 1-2 work well, while going above that may start causing distorted images.

    I encourage everyone to give it a try and test it first-hand instead of giving an opinion without trying~

    VeerGeer · Jan 3, 2025

    It's fantastic for img2img, as its effects are per-step - so lower step sizes ("a lower denoise value") make it behave itself even at ridiculous values.

    For generating a full image ("txt2img"), values from -1.0 to 2.0 (usually 1.0 to 1.6) do well, depending on how the prompt and model interact with each other - and how cluttered or non-cluttered you want it.

    In img2img and low denoises (e.g. 18 steps / 0.22 denoise), you can crank it up to 4.0, 5.0, even 8.0 in some silly models, and get some very shiny outputs.


    Importantly, it lets you clean your prompts of the needless clutter added in attempts to make the output shinier, prettier, sparklier, whatever.
    Want it shiny? Just img2img with this LoRA cranked high, simple.

    WorstAirtist · Jan 2, 2025 · 16 reactions

    I don't get what this is supposed to be. Is it a LoRA?

    Is it an Embedding?

    Do I need a 3rd-party program to use it?

    The long explanation is word salad. If it does say anything, only people who are very tech-savvy understand it.

    VeerGeer · Jan 3, 2025 · 17 reactions

    It's an SDXL add-on LoRA that mostly doesn't care about what base model you attach it to, and it attempts to amplify the "visual quality" of each step in the generation process.

    What this means is that at high strengths, it will introduce a lot of sparkles and visual clutter when doing the initial big steps ("a high denoise value") - but it will not introduce too much clutter if you only use it for small steps ("a low denoise value").


    It's a quality slider, is what it is. You can also use negative values to make an image less cluttered.


    It works very well at values of 1.0 to 1.6 for txt2img; -1.0 to -0.5 is also fine if the prompt + model has a lot of noise (some models have very noisy trees, for example).

    With img2img - the sky is the limit, really.
    I've run it at 8.0 at low denoises just to get an insanely sparkly output, then blend those sparkles in where I found it relevant.

    cheers

    WorstAirtist · Jan 4, 2025 · 7 reactions

    @VeerGeer Thank you for the layman's explanation. I think "Quality Slider" is probably the most informative portion, but the rest adds details that might not otherwise be obvious.

    Always been a 'bullet pointer.'

    And the way the creator words it sounds good n all but it doesn't tell the 'how' or the 'what' in a clear way.

    That's why I say 'quality slider' is basically the summary of those 3 paragraphs.

    And some of these are posted as LoRAs but are actually embeddings, etc.

    Thanks.

    VeerGeer · Jan 4, 2025 · 8 reactions

    @WorstAirtist
    Yes, the word salads are unappetizing.
    It's a rather sad consequence of people who can turn "Infinite Word Soup" into a clear message usually being very uninterested in partaking in academic settings. Myself included. :- )


    In some sense, it is an embedding - just that it is embedded into the decision-making process of the SDXL UNET (the part responsible for actually putting pixels to the words), rather than embedded into the text encoder - the part responsible for putting words to the pixels - as a "(text )embedding" would be.


    The difference between this "SPO", and "DPO" - LoRAs that in the past have also been trained for 'human operator preference'* - is that this newer "SPO" attempts to guide the generation process at each road turn (step), whereas prior DPO's only tried to set a better initial road.

    This difference thus opens up new opportunities for how to use it, such as the "high strengths at low denoise values in img2img, for shiny, but not cluttered outputs" that I preach in the comment section.



    * (Cranking the strength of a DPO LoRA will usually result in larger breasts, proving that someone definitely sat down and picked out which of several pictures was generated "better")

    sephwalker525 · Feb 15, 2025 · 16 reactions

    @WorstAirtist 

    First things first: you could actually have answered your first question yourself, and I'll tell you exactly how. In the upper right of any CivitAI model, there is a table below the create/download buttons (or in other words to the right of the model's preview images) which has important details about that model such as:

    1. What kind of model is it? (e.g. Checkpoint, LoRA, Embedding, Lycoris, and so on)

    2. What format is the file in?

    3. For LoRAs, Embeddings, and so on: What are the trigger words, if any? What base model(s) does it work with? (Technically, you can use an SD LoRA on any SD base model, but the results may not be very great if you use one trained on, say, Pony, with, say, SD1.5)

    4. Plus some other information that isn't quite as useful or relevant to this conversation.

    5. Below this table is, sometimes, an additional info section which is, by default, collapsed. This isn't used anywhere near as often as it should be, but, when it is used, it will typically contain "quick use" instructions. (e.g. it may say, "Use [this sampler] with [these settings]," or "Use [these tags] as a prefix for your prompt and [these tags] as a suffix, and use [these tags] as a default or prefix for your negative prompt," and other stuff like this, basically the same info you'd likely get from the big description box, but, boiled down to only the info you actually need to be able to immediately make use of the model)

    So, in short, the question "is model X a Y model or Z model?" can go unasked, because it is always labeled very clearly in the upper right of the model page. In this model's case, it is a LoRA, though you've already been informed of that.

    Secondly, do you need a third party program? Why would you think that you would? I've never seen a model page tell me to run code. And, I mean... It's just a LoRA, you know..? All LoRAs pretty much act the same.

    You can think of LoRAs like plugins or extensions for your model, more or less. Effectively, they temporarily add training data to a model on-demand. So if you have your SDXL base model, and you use, say, a high-quality art LoRA, then, when you begin generating, the model gets loaded up, and then the LoRA is loaded in on top, adding more "knowledge" to the model about HQ art and (usually) also making it lean more towards that stuff (temporarily, because this 'extra knowledge' is only stored in memory, the actual model file on your drive is untouched). Hence, it basically works just like plugins do. For as long as you use the LoRA, it extends the functionality of the base model.

    That said, many LoRAs are very strong and will cause the model to lean very heavily in a certain direction. Using the same example as before, if we use that high quality art LoRA, but then try to get it to generate real photography, it may very well end up struggling and produce art instead, or some awkward, uncanny mix of the two.

    So, rule of thumb with LoRAs: use as few as possible to accomplish your goal (more LoRAs = more vRAM and higher probability of conflicts that can cause crappy outputs) and only use LoRAs when you actually want the effect the LoRA gives.

    Some LoRAs play nicer with others, though. For example, this one is very likely to work plenty fine alongside other LoRAs. But, I wouldn't recommend, for example, combining a massive photography focused LoRA with a massive classical art focused LoRA alongside half a dozen style LoRAs that also conflict with each other. Also, LoRAs with trigger words are less likely to cause conflicts, though that doesn't mean they won't.

    To be clear, by "conflict" I am referring to the output being bad due to multiple LoRAs with conflicting concepts being in use all at once. If you feel unsure, there are countless guides out there that explain this stuff in as much or as little detail as you desire. Rather than popping onto a random LoRA and complaining that its description is word salad, consider learning the basics first?

    Finally, the meat of your comment: "word salad". Here's the thing: yes, you're right, the description is ridiculous and unnecessary. That sort of info is better placed on a GitHub repo or on a separate page. Folks want to know "What does it do?" and "How do I use it, exactly?" when they read a model page's description.

    However, while I agree on principle that this lengthy technical explanation filled with jargon is absolutely not needed here (nor, I believe, is it wanted here, either), I also know that if they'd done what you expected (provided answers to "what is it" and "how do I use it"), the description box would've amounted to: "Makes your generations prettier using SPO (link to expansion). It's a LoRA just like any other, use it at 1.0 to 1.5, or whatever up to you."

    You'd still have been just as confused, and I think you'd have been even more likely to click away because the page would now appear utterly devoid of any information. I suspect that what you were missing is that basic "How to read Civit.AI model pages to identify what kind of model you're looking at" and that basic "What are LoRAs and how best to use them."

    If you'd had that prerequisite info, it would not have mattered that the description was so ridiculous and overtly technical, because you use all LoRAs in exactly the same way. The only things you need to know from the LoRA's creator is "are there any trigger words" and "at what strength should this be used", both of which can typically be found out by simply checking the preview images' prompts, should they not be mentioned in the sidebar table or the description.

    WorstAirtist · Feb 15, 2025 · 5 reactions

    @sephwalker525 I'm aware of everything you've said. However, sometimes people forget to tick boxes. The primary reason I asked to begin with is that the name itself doesn't quite say.

    I've seen things that were actually 'Embeddings' be tagged as 'Loras' or have no tag at all, as well as other various things that go in other folders. So I asked to make sure I placed it in the correct folder both so that I could actually use it, and so it wouldn't get downloaded, never seen, and thus never used because it wasn't in the correct place.

    And I asked because, though I have a solid general understanding, I stopped keeping up with 'language models' back when HTML was the big thing, and I'm not as well-read as some are nowadays.

    There's nothing wrong with asking for help to be sure. And not everything is as black and white as 'just check the description.'

    Grandfather9 · Feb 19, 2025

    @VeerGeer Hey Veer, I am kind of struggling. If I generate something without it and then want to improve the quality of the existing image, how do I run it? I tried img2img with it at various strengths and at various img2img strengths, for example 80% prompt/20% image + the LoRA at 250%, and other combinations going to 0% prompt, 100% image + LoRA, but the results are unsatisfactory at best.

    I don't understand what this sentence means, could you elaborate? I am using DrawThings as I have a Mac and that might be a setting I am not familiar with.

    - "I've run it at 8.0 at low denoises just to get an insanely sparkly output, then blend those sparkles in where I found it relevant."

    Edit: OK, I read the comment a few times. Does low denoise mean a small number of steps? How small? Less than 10? Does high denoise mean a high number of steps? More than 30, I'd guess?

    VeerGeer · Feb 20, 2025 · 2 reactions

    @Grandfather9 Most interfaces shortcut steps and denoise values.
    'Denoise values' in this context are likely what you're describing as the "X" in "x% prompt, y% image"

    The way most interfaces shortcut it is by doing "steps * X", so 30% 'denoise' at 20 steps would become 6 steps.

    Excessively high SPO LoRA values, for the purpose of 'sparkling an image up', work best around 10 to 16 steps - and usually, you'd want to stay in the '20%-35% prompt' range.
    Which would, in your program's case, likely mean 60 steps at '20% prompt' (end result: 12 steps), or 45 steps at '35% prompt' (end result: 16 steps).
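The "steps * X" shortcut described above can be sketched as follows (a simplification; exact rounding behavior varies by interface):

```python
def effective_steps(total_steps: int, denoise: float) -> int:
    """Sketch of the 'steps * X' shortcut: most interfaces run only the
    last total_steps * denoise steps in img2img ('% prompt' in Draw
    Things terms, denoise value 0-1 elsewhere)."""
    return max(1, round(total_steps * denoise))

# The worked examples from this thread:
effective_steps(20, 0.30)  # -> 6
effective_steps(60, 0.20)  # -> 12
effective_steps(45, 0.35)  # -> 16
```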

    Grandfather9 · Feb 22, 2025

    @VeerGeer I'll test it, thanks for the help!

    sum1els · Jan 7, 2025 · 21 reactions

    I looked through the training prompts at github, and you could really improve them by simply spell-checking and proof-reading them. For example, "a humbuguer" and "image of a" are not helpful captions. I can easily see that a large portion of the captions are just a mess, even without seeing the training images. Extraneous characters or missing spaces between words can also mess with the training. Only 130 words appear in more than 50 prompts, and most of those are a, an, of, and, etc. while about 4350 of the roughly 8300 words only appear in a single caption, and many of those aren't really words at all.

    MuMeGosui · Jan 12, 2025 · 14 reactions

    The SPO-SDXL_4k-p_10ep_LoRA is a model based on Stable Diffusion XL 1.0, designed to generate higher-quality and aesthetically pleasing images through a technique called Step-aware Preference Optimization (SPO). Here's how it works:

    Step-wise Evaluation: Multiple candidate images are generated at each denoising step.

    Preference Comparison: A trained preference model evaluates and selects the highest-quality images from the candidates.

    Random Initialization: At each step, a random candidate is chosen to initialize the next optimization process.

    This method improves the texture and details of the image progressively, significantly enhancing aesthetic quality.

    This explanation is provided by ChatGPT.


    Azzer_Studios · Jan 16, 2025 · 6 reactions

    This is really good; everyone should try this. It's like RTX on.

    Ressentimente · Jan 19, 2025 · 7 reactions

    This thing is insane. Just turn it off if doing multiple upscales, or everything gets extremely shiny.

    heavenvisiontwy993 · Jan 20, 2025 · 10 reactions

    I don't see a license. What can it be used for, commercial or non-commercial?

    VeerGeer · Jan 22, 2025 · 1 reaction

    It's not a product, it's "just" an academic pursuit.
    So that does put it into a limbo state of "try and see, lol, send the author(s) some beers if you feel bad about it"

    The important part is likely "adoption" so that the author(s) get 'clout' for being standouts in the crowded field.

    heavenvisiontwy993 · Jan 24, 2025 · 13 reactions

    @VeerGeer I'm considering trying a small side business selling T-shirts with interesting designs. I was thinking of incorporating this LoRA because I believe it might be useful, but I think it's appropriate to ask whether that is allowed.

    Of course, I modify the images with details and use different editing software to achieve uniqueness - this is the case I want to know is permitted, as well as whether use without any edits is allowed. My question is about the general terms and conditions.

    I wrote to the creator, but I haven't been able to reach them yet. They seem inactive. Usually there are licenses or descriptions about what is prohibited, but in this case there's nothing. I want to ensure transparency and fairness on my part toward the other person.

    enumoshe · Jan 21, 2025 · 23 reactions

    DPO just fixes anatomy, but this SPO improves image composition too.

    krigeta · May 23, 2025 · 23 reactions

    Can one use this with Illustrious?

    VeerGeer · May 27, 2025 · 11 reactions

    Yes, though the "useful weights" are in the PonyXL range (1.6 - 4.0) rather than the standard SDXL range (0.6 - 2.0).

    necrophagism777 · Jun 18, 2025 · 8 reactions

    Can't live without this anymore; it almost always makes your gens better.

    prodajie · Jul 13, 2025 · 9 reactions

    A triumph for AI image composition. Thank you!

    a1161327317 · Aug 8, 2025 · 3 reactions

    Works very well, but it leans toward realistic outputs.

    bulariaishungry974 · Oct 26, 2025 · 3 reactions

    If doing medieval images, just be aware this LoRA heavily pushes the theme from Western to Eastern, and also shifts the overall image color toward blue.

    Cranker · Nov 5, 2025 · 16 reactions

    What does it do?

    ZUSIMO · Dec 16, 2025 · 1 reaction

    I too wonder the same thing.

    Kojenseuki · Feb 28, 2026

    When used together with the Illustrious Anime model, the image gains a slight sense of depth. Recommended.