Miso-diffusion-M (Beta)

*Important: please read this before using the model. It is very experimental, and I am still trying to find the optimal settings.

You can download the CLIP text encoder here: https://huggingface.co/suzushi/miso-diffusion-m-beta

Miso Diffusion M (Beta) is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is a step up from the previous version (alpha): it was trained on 160k images for 3 epochs to see how the model adapts to anime.

Recommended settings: Euler sampler, CFG 5, 28-40 steps. DPM++ 2M also works, but I haven't tested it much. Prompt with Danbooru-style tags. I recommend simply generating with a batch size of 4 to 8 and picking the best result.

Quality tag

Masterpiece, Perfect Quality, High quality, Normal Quality, Low quality

Aesthetic Tag

Very Aesthetic, aesthetic

Pleasent

Very pleasent, pleasent, unpleasent

Additional tag: high resolution, elegant
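
As a rough illustration of how the tags above might be combined with Danbooru-style subject tags, here is a minimal Python sketch. The helper name build_prompt, the RECOMMENDED_SETTINGS dictionary, the example subject tags, and the choice to put quality and aesthetic tags first are all assumptions made for illustration; none of this is part of ComfyUI or of the model itself.

```python
# Hypothetical helper for assembling a prompt; not part of any existing tool.

# Values taken from the recommended settings on this page.
RECOMMENDED_SETTINGS = {
    "sampler": "euler",
    "cfg": 5,
    "steps": 28,       # the page suggests 28-40
    "batch_size": 4,   # the page suggests 4-8, then pick the best image
}

# Tags copied as written on this page.
QUALITY_TAGS = ["Masterpiece", "Perfect Quality", "High quality",
                "Normal Quality", "Low quality"]
AESTHETIC_TAGS = ["Very Aesthetic", "aesthetic"]


def build_prompt(subject_tags, quality="Masterpiece",
                 aesthetic="Very Aesthetic", extra=()):
    """Join quality, aesthetic, subject, and extra tags with commas.

    Putting the quality and aesthetic tags first is an assumption,
    not something the page specifies.
    """
    if quality not in QUALITY_TAGS:
        raise ValueError(f"unknown quality tag: {quality!r}")
    if aesthetic not in AESTHETIC_TAGS:
        raise ValueError(f"unknown aesthetic tag: {aesthetic!r}")

    ordered = []
    seen = set()
    for tag in [quality, aesthetic, *subject_tags, *extra]:
        tag = tag.strip()
        if tag and tag not in seen:  # skip blanks and duplicates
            seen.add(tag)
            ordered.append(tag)
    return ", ".join(ordered)


# Example subject tags are placeholders chosen for illustration.
prompt = build_prompt(["1girl", "solo", "looking at viewer"],
                      extra=["high resolution"])
print(prompt)
```

The resulting string can be pasted into the positive prompt field, with the sampler, CFG, steps, and batch size set by hand to the values in RECOMMENDED_SETTINGS.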

Training was done on a GH200. The LR scheduler was switched to cosine this time.

Training settings: Adafactor with a batch size of 40, lr_scheduler: cosine
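
To give a feel for the scale of this run, the sketch below estimates the number of optimizer steps and shows the shape of a cosine schedule. It assumes "160k" means exactly 160,000 images, that 40 is the effective batch size with no gradient accumulation, and that the schedule is a plain cosine decay to zero with no warmup. The page does not state the base learning rate, so the function returns a multiplier rather than an actual rate.

```python
import math

NUM_IMAGES = 160_000  # assumption: "160k" taken as exactly 160,000
EPOCHS = 3
BATCH_SIZE = 40       # assumption: effective batch size, no accumulation

steps_per_epoch = math.ceil(NUM_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def cosine_multiplier(step, total):
    """Fraction of the base LR at `step` for plain cosine decay, no warmup."""
    progress = min(max(step / total, 0.0), 1.0)  # clamp to [0, 1]
    return 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, total_steps // 2, total_steps):
    print(step, round(cosine_multiplier(step, total_steps), 4))
```

Under these assumptions the run is about 4,000 steps per epoch and 12,000 steps in total, with the learning rate at half its starting value midway through training.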

SD3.5-specific settings:

enable_scaled_pos_embed = true

pos_emb_random_crop_rate = 0.2

weighting_scheme = "flow"

Train Clip: true, Train t5xxl: false

Files

misoDiffusionMBeta_misoDiffusionMBeta.safetensors
