👋 If you like what I do and want to support the development, feel free to buy me a coffee:
Hello, you’re probably wondering: why so many versions?
Well… I’d be asking the same thing if I were in your place. The reason is simple: it’s designed this way to offer more control, since, unlike normal LoRAs, DMD2 works best at its maximum strength.
For example:
HD 1 CFG Scale has “diluted” strength, so it requires the help of triggers or manually increasing its LoRA strength. This makes it very useful for combining with PDXL LoRAs in Illustrious, since you can simply raise the strength without losing details.
DPM A1 and DPM A15 already come with boosted strength and detail, so they don’t require triggers. A1 is the standard strength, while A15 adds an extra +15%.
V4 is an experiment to generate images in 2 steps. It was created in the opposite way to HD 1 CFG: instead of reducing strength to improve stability, V4 increases strength to a ratio of 1.35 (0.20 more than DPM A15).

V5 enhances checkpoint details, emphasizes styles, and makes prompts more effective without changing your model.

V6 improves colors and details without changing the style; its sweet spot is CFG scale 1.
V7 (Visual Only): Built upon the foundations of V6 but heavily cleaned up to focus strictly on visual aesthetics. It sacrifices some cognitive prompt understanding (intelligence) to prioritize pure style.
V7.5 FIX (The Structural Update): The definitive, fully integrated version. It restores the deep geometric understanding and strict prompt obedience that the basic variants lacked. It features a massive parameter increase to fix the "blindness" of earlier versions, making it highly accurate for complex prompts. (Tip: Because of its dense structure, if you notice issues with hands at extremely low step counts like 3, simply increase your steps to 4-6 or lower your CFG scale to 0.6 - 1.4).

In short: it all depends on your taste and goal. For example, V4 will produce more “noise” (details) and may sacrifice some realism unless you use it with a realistic checkpoint, V7 is great for pure stylistic overhauls, and V7.5 FIX is your heavy-duty engine for complex, highly specific prompts.
But what is this for?
This LoRA is based on the architecture and style of DMD2, a well-known approach for optimizing diffusion models by focusing on reducing the number of generation steps without compromising visual quality.
So... What is DMD2?
DMD2 (Distribution Matching Distillation 2) is a distillation technique for diffusion models, designed to generate high-quality images from noise in just a few denoising steps.
According to the literature (Yin et al., 2024, Improved Distribution Matching Distillation for Fast Image Synthesis), DMD2 distills a many-step teacher diffusion model into a few-step student that reaches quality comparable to traditional models like DDPM (Ho et al., 2020, Denoising Diffusion Probabilistic Models).
DMD2 trains the student by matching its output distribution to the teacher's, adding a GAN objective and a multi-step generator on top of the original DMD to accelerate convergence and improve quality.
In the context of LoRAs, DMD2 serves as the base for training Low-Rank Adaptation modules that fine-tune a pretrained model (such as Stable Diffusion) for specific tasks, minimizing computational cost while preserving visual quality.
In conclusion:
The LoRAs described here (HD_DMD2_1_CFG-SCALE, DPM_4STEPS_A1, DPM_4STEPS_A15 and V4) are adaptations leveraging the DMD2 structure to operate with a CFG scale of 1.
This is particularly interesting because normally a higher CFG scale is needed to maintain the same quality, but these LoRAs can reduce the step count to 4, 6, 8, or 10 (10 being the minimum allowed on Civitai) while achieving impressive results—cutting generation times from minutes to just a few seconds.
The Compatibility Edge: The original base DMD2 is notoriously rigid—it only accepts LCM and Euler samplers and is strictly locked to a CFG scale of 1. My custom architecture breaks these limits. If you use fewer than 6 steps, it can comfortably handle up to 2 CFG. If you increase the steps to 10-14, you can push the CFG scale up to 4-5 without breaking the model, and it supports a much wider variety of sampler/scheduler combinations.
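The step/CFG limits described above can be sketched as a small helper (the function name is hypothetical, and the middle range is my own interpolation between the two stated limits — treat these as heuristics, not hard rules):

```python
def max_cfg_for_steps(steps: int) -> float:
    """Rough upper bound on CFG scale for a given step count,
    based on the compatibility notes above."""
    if steps < 6:
        return 2.0   # very few steps: stay at or below CFG 2
    elif steps < 10:
        return 3.0   # assumed middle ground (not stated in the description)
    else:
        return 5.0   # 10-14 steps: CFG 4-5 is reported to work

# Example: check the bound for a few step counts
for steps in (4, 8, 12):
    print(steps, max_cfg_for_steps(steps))
```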
Key Features
Optimized for fast generation: Designed to produce high-quality images with a very low number of inference steps (4, 6, or 8), enabling quick and efficient generation.
Low effective CFG scale: Works optimally around a CFG scale of 1, providing an ideal balance between creativity and fidelity without overfitting.
Three variants for different needs: Includes versions tailored for 8, 6, and 4 steps, offering flexibility depending on speed and detail requirements.
Robust visual quality: Maintains strong detail in colors, textures, and composition even with reduced steps—perfect for applications requiring both speed and quality.
Wide applicability: Suitable for users aiming to optimize generation time without sacrificing definition in their images.
Usage Instructions & Recommendations
If the LoRA you’re using requires more steps to achieve a good result, you can increase the LoRA strength or add positive prompts with keywords like "hdr" to improve lighting and detail, and negative prompts like "flat color" to control saturation and shadows.
Alternatively, you can lower the LoRA strength, which allows you to use higher CFG scales without oversaturating the image. However, since this LoRA is primarily designed for CFG scale 1, the ideal strength may vary depending on your specific use case.
Experiment with both strength and CFG scale to find the optimal balance for your workflow and desired style.
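A simple way to run that experiment is a small grid sweep over LoRA strength and CFG scale, generating one image per combination with a fixed seed so results are comparable (the value ranges below are only examples):

```python
from itertools import product

# Candidate values to sweep (example ranges only)
strengths = [0.6, 0.8, 1.0]
cfg_scales = [1.0, 1.5, 2.0]

# Every (strength, cfg) pair to try; feed each pair to your
# generation loop with the same seed and compare the outputs.
combos = list(product(strengths, cfg_scales))
print(len(combos), "combinations to test")
```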
Thanks so much for your support! ♥
Description
I've been trying to reduce the steps to two, and the simplest solution is to merge it with a detailer LoRA to recover detail at those tiny step counts. But I don't want to go through with that: I just realized that Civitai's on-site generation has a minimum of ten steps, so the improvements wouldn't be visible there. I'll just leave this version here.
If you'd like to use the 4-step version, which actually works, I recommend the original version:
FAQ
Comments (9)
Hello, is V4 your attempt at a 2-step LoRA or is it something else?
Yes, but not completely. I'm still testing whether it's possible to reduce the steps to 2, and although it works with certain checkpoints, it's quite complicated to refine (I have some ideas, but it's still just a theory):
import os
import torch
from safetensors.torch import load_file, save_file

# File paths
ruta_lora = "/content/dmd2.safetensors"  # original DMD2 LoRA
ruta_salida = "/content/Improved_dmd2_LoRA_1_2_Steps.safetensors"  # optimized LoRA

# Verify the input file exists
if not os.path.exists(ruta_lora):
    raise FileNotFoundError(f"File {ruta_lora} does not exist")

# Load the existing LoRA weights
pesos_lora = load_file(ruta_lora)

# Dictionary for the optimized LoRA
lora_optimizado = {}

# Parameters for 1-2 step optimization
escala_base = 0.4        # high scale for ultra-fast convergence
umbral_norma = 0.8       # low threshold to avoid saturation at CFG 1.5-2
refuerzo_atencion = 1.4  # stronger boost for detail at few steps
factor_input = 0.6       # aggressive reduction in early layers
factor_output = 1.0      # preserve output layers for quality

for clave, peso in pesos_lora.items():
    # Compute the weight norm for adaptive scaling
    norma_peso = torch.norm(peso, p=2).item()
    escala_adaptativa = escala_base * min(1.0, umbral_norma / (norma_peso + 1e-8))  # avoid division by zero

    # Scale according to layer type
    if "input_blocks" in clave or "down" in clave:
        # Aggressive scaling for early layers (1-2 step convergence)
        escala = escala_adaptativa * factor_input
    elif "output_blocks" in clave or "up" in clave:
        # Preserve quality in output layers
        escala = escala_adaptativa * factor_output
    elif "attn" in clave:
        # Boost attention layers for fine detail at few steps
        escala = escala_adaptativa * refuerzo_atencion
    else:
        # Moderate scaling for other layers (e.g. text encoders)
        escala = escala_adaptativa * 0.85

    # Apply the scale and clamp weights for stability at higher CFG
    peso_escalado = peso * escala
    peso_escalado = torch.clamp(peso_escalado, min=-umbral_norma, max=umbral_norma)

    # Ensure FP16 compatibility (DMD2 uses FP16)
    if peso_escalado.dtype != torch.float16:
        peso_escalado = peso_escalado.to(dtype=torch.float16)

    lora_optimizado[clave] = peso_escalado

# Save the optimized LoRA
save_file(lora_optimizado, ruta_salida)
print(f"SDXL LoRA optimized for 1-2 steps saved to: {ruta_salida}")

# Optional test: generate an image to check quality
try:
    from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

    # Load the SDXL pipeline
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")

    # Configure the scheduler for 1-2 steps (LCM-inspired)
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        algorithm_type="sde-dpmsolver++",
        use_karras_sigmas=True,
    )

    # Load the optimized LoRA
    pipe.load_lora_weights(ruta_salida, adapter_name="optimizado_dmd2")

    # Generate a test image
    imagen = pipe(
        prompt="ultra realistic portrait, detailed skin texture, natural lighting",
        num_inference_steps=2,  # optimized for 1-2 steps
        guidance_scale=1.5,     # stable CFG
        negative_prompt="overexposure, blurry, artifacts, oversaturated colors",
        generator=torch.manual_seed(42),  # reproducibility
    ).images[0]
    imagen.save("prueba_optimizada_1_2_pasos.png")
    print("Test image generated successfully")
except Exception as e:
    print(f"Could not run the test: {e}")
I see, I wish I could help out but I really lack the knowledge required :/
Been trying out quite a few 2-step generations locally now; the contrast goes ham, but it might just be the checkpoints I'm using? Been lowering/raising the weight and changing the denoise value here and there.
Been getting pretty clear and sharp images using Ultimate SD Upscaler, with 2 steps per tile there as well. I'll post some here when I've tried it out some more!
fytek Thanks for trying it out. I actually based it on the 4-step version, so it was expected to fail. What I'd really like to do is convert the 1-step version they have, but since it's a .bin file, it's too heavy for a free Google Colab account (all my attempts end in restarts).
I'm thinking of merging this version with the LCM LoRA. Maybe that will make it more compatible with DPM or reduce the number of steps: https://huggingface.co/tianweiy/DMD2/tree/main
I have used DMD since it came out; it's a huge game changer, so I will definitely enjoy this... I have no idea how exactly to make a DMD LoRA myself, but I've noticed that none of the DMD LoRAs I've found work well with Animagine-based mixes. (I'm guessing the CLIPs don't match or something? The output comes out blurry-ish, hard to describe.) How difficult would it be to make an Animagine-based DMD LoRA?
There are several ways to do this, but the fastest is to refine the existing version (simply merge it with some Animagine LoRA) or to fine-tune it on a high-quality image dataset (style or concept).
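The merge option can be sketched as a simple weighted blend of two LoRA state dicts (a hypothetical helper, not the exact method used here; with safetensors you would load each file via `load_file` into a dict of tensors first — plain numbers behave the same way for illustration):

```python
def merge_loras(lora_a, lora_b, alpha=0.5):
    """Linearly blend two LoRA state dicts: alpha=0 keeps A, alpha=1 keeps B.
    Keys present in only one dict are copied through unchanged."""
    merged = {}
    for key in set(lora_a) | set(lora_b):
        if key in lora_a and key in lora_b:
            merged[key] = lora_a[key] * (1 - alpha) + lora_b[key] * alpha
        else:
            merged[key] = lora_a.get(key, lora_b.get(key))
    return merged

# Toy example with scalar "weights" (real LoRAs hold tensors):
print(merge_loras({"w": 1.0}, {"w": 3.0}, alpha=0.5))  # {'w': 2.0}
```

The same arithmetic works on torch tensors, so the merged dict can be written back out with `safetensors.torch.save_file`.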
Herrscher_AGGA hmm, I see. Would you happen to have a guide or somewhere I could follow to try to achieve this? I'm not sure I understand fine-tuning it with a high-quality image set of a style or concept, since I just want to make a base DMD LoRA that works with said base checkpoint without any bias, as yours do.
incredible!