SD1.5 Flow Matching Finetune (Evangelion 260K)

Model Details

This model is an experimental fine-tune of the NAI-v2 (Stable Diffusion 1.5) architecture, adapted to use a Rectified Flow / Flow Matching objective instead of the classical DDPM objective.

The goal of this project was to test whether the Flow Matching objective accelerates the learning of new concepts and effectively shifts the model's distribution toward modern anime styles and characters, extending the base model's knowledge beyond its Summer 2023 cutoff.

- Developed by: [aipracticecafe](https://huggingface.co/aipracticecafe)

- Objective: Rectified Flow / Flow Matching

- Base Model: [nai-anime-v2](https://huggingface.co/NovelAI/nai-anime-v2) (SD 1.5 based)

- Resolution: Trained at 512px base resolution with aspect ratio bucketing. Because training was strictly at 512px, the model's native capacity to generate 1MP images may be degraded compared to the base model. However, it can still generate 768x1024 images without problems.

- Status: Experimental. This is the first model with somewhat stable results. The model has not been fine-tuned on a high-quality aesthetic dataset yet, so outputs might be unstable.
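For readers unfamiliar with the objective named above, the following is a sketch of one common Rectified Flow / Flow Matching formulation (the convention used by SD3-style models). The card does not state which exact parameterization was used for this fine-tune, so the symbols and sign convention here are assumptions: $x_0$ is a clean latent, $\epsilon$ is Gaussian noise, $c$ is the text conditioning, and $v_\theta$ is the UNet output.

```latex
\begin{aligned}
x_t &= (1 - t)\,x_0 + t\,\epsilon, \qquad t \in [0, 1],\quad \epsilon \sim \mathcal{N}(0, I) \\
v^{\star} &= \frac{\mathrm{d}x_t}{\mathrm{d}t} = \epsilon - x_0 \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0,\,\epsilon,\,t}\,\bigl\lVert v_\theta(x_t, t, c) - (\epsilon - x_0) \bigr\rVert^2
\end{aligned}
```

Under this convention the network predicts a velocity along a straight path between data and noise, rather than the added noise as in the classical DDPM objective.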

Dataset Details

- Size: 260,000 samples.

- Composition: Focused on relevant characters and artists, with enough samples of each to ensure adequate representation (e.g., >500 samples per character). Includes modern anime characters from series like Girls Band Cry, Oshi no Ko, Watashi ga Koibito ni Nareru Wake Naijan Murimuri!, Frieren, etc. It also includes other series such as Neon Genesis Evangelion and Date a Live.

- Quality Clusters: The dataset was clustered into 4 quality buckets: worst score, bad score, good score, and masterpiece.

- Prompting: Danbooru tags were upsampled using a SwinV2 tagger. Prompts follow a structured format with random shuffling of general tags every epoch. Loss was scaled using a tag weight float based on tag rarity to handle data imbalance.
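The tag shuffling and rarity-based loss weighting described above could be sketched roughly as follows. This is an illustration only: the card does not give the actual weighting formula, so the square-root-of-inverse-frequency rule, the clamping range, and all function names are hypothetical.

```python
import math
import random
import statistics


def shuffle_general_tags(prefix, general, suffix, rng):
    """Keep the structured prefix/suffix fixed and shuffle only the general tags."""
    shuffled = list(general)
    rng.shuffle(shuffled)
    return ", ".join([*prefix, *shuffled, *suffix])


def tag_rarity_weights(tag_counts, min_weight=0.5, max_weight=2.0):
    """Hypothetical rule: rarer tags get larger weights, clamped to a safe range."""
    median = statistics.median(tag_counts.values())
    return {
        tag: min(max_weight, max(min_weight, math.sqrt(median / count)))
        for tag, count in tag_counts.items()
    }


def sample_loss_weight(tags, weights):
    """Average the per-tag weights into one float that scales the sample's loss."""
    known = [weights[t] for t in tags if t in weights]
    return sum(known) / len(known) if known else 1.0
```

In a training loop, the float returned by `sample_loss_weight` would multiply the per-sample loss before averaging over the batch, and `shuffle_general_tags` would be called with a fresh random state each epoch.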

Training Details

- Hardware: 2x RTX 4090 (Took ~3 days)

- Epochs: 20

- Batch Size: 12 per GPU (Effective Batch Size: 336 = 12 per GPU × 14 gradient accumulation steps × 2 GPUs). A large effective batch size is crucial for learning new concepts, as stated in the Illustrious paper.

- Learning Rate: 2.5e-5 (Constant with 0.05 warmup)

- Precision: Mixed Precision (fp32/bf16).

- Optimizer: 8-bit AdamW (bitsandbytes)

- Shift: 2

- Latent Caching: VAE latents and TE tokens were precomputed and chunked.
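As a sanity check on the numbers above, the effective batch size and an approximate step count can be worked out as below. The step counts are estimates under two assumptions that the card does not confirm: that every epoch visits all 260,000 samples once (bucketing and dropped partial batches can change this), and that "0.05 warmup" means a warmup fraction of total optimizer steps.

```python
import math

per_gpu_batch = 12
grad_accum_steps = 14
num_gpus = 2

# Samples contributing to each optimizer step.
effective_batch = per_gpu_batch * grad_accum_steps * num_gpus  # 336

dataset_size = 260_000
epochs = 20

# ASSUMPTION: one full pass over the dataset per epoch, last partial batch kept.
steps_per_epoch = math.ceil(dataset_size / effective_batch)  # 774
total_steps = steps_per_epoch * epochs  # 15480

# ASSUMPTION: 0.05 is a warmup ratio over all optimizer steps.
warmup_steps = round(0.05 * total_steps)  # 774

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the warmup would cover roughly the first epoch, after which the learning rate stays constant at 2.5e-5.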

Inference & Usage

Because this model uses a Flow Matching objective, it cannot be used with standard DDPM/DDIM samplers without modification.

WebUI: We have published an updated fork of Forge WebUI on [Codeberg](https://codeberg.org/aipracticecafe/stable-diffusion-webui-forge) that adds native support for Rectified Flow models.

ComfyUI: It should work using standard Flow Matching / SD3 sampling nodes (e.g., using ModelSamplingRectifiedFlow and a linear shift scheduler).
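To illustrate why a Rectified Flow compatible sampler is needed, here is a minimal Euler sampler sketch over a shifted time schedule. It assumes the common convention where t = 1 is pure noise, t = 0 is data, the model predicts the velocity (noise minus data), and the shift uses the SD3-style warp `shift * t / (1 + (shift - 1) * t)`. None of this is taken from the fork's source code; `velocity_fn` is a hypothetical stand-in for the UNet, and latents are plain lists for simplicity.

```python
def shifted_sigmas(steps, shift):
    """Times from 1.0 (noise) down to 0.0 (data), warped by the shift factor."""
    sigmas = []
    for i in range(steps + 1):
        t = 1.0 - i / steps
        sigmas.append(shift * t / (1.0 + (shift - 1.0) * t))
    return sigmas


def euler_sample(velocity_fn, x, steps=48, shift=3.0):
    """Integrate dx/dt = v(x, t) from t = 1 to t = 0 with plain Euler steps."""
    sigmas = shifted_sigmas(steps, shift)
    for t_cur, t_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, t_cur)
        # t_next < t_cur, so each step moves the sample toward the data end.
        x = [xi + (t_next - t_cur) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy check: with the ideal constant velocity (noise - data),
    # Euler integration recovers the data point from the noise sample.
    data, noise = [1.0, -2.0], [0.3, 0.7]
    ideal_velocity = [n - d for n, d in zip(noise, data)]
    print(euler_sample(lambda x, t: ideal_velocity, noise))
```

A DDPM/DDIM sampler instead assumes a noise-prediction model and a variance-preserving schedule, which is why it produces broken results with this checkpoint unless modified.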

Tag Ordering

It's recommended to follow the structured prompt template:

```
1girl/1boy, character name, from what series, everything else in any order, artists, quality/metadata tags.
```

Alternatively, placing the artist tag directly after the series name can strengthen the artist's effect.
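The template can also be assembled programmatically. The helper below is a hypothetical convenience function, not part of any WebUI; the example tags are illustrative and not a tested prompt.

```python
def build_prompt(subject, character, series, general_tags=(), artists=(), quality_tags=()):
    """Join tags in the recommended order, skipping empty entries."""
    parts = [subject, character, series, *general_tags, *artists, *quality_tags]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(
        build_prompt(
            "1girl",
            "ayanami rei",
            "neon genesis evangelion",
            general_tags=["blue hair", "plugsuit"],
            quality_tags=["masterpiece", "good score", "best quality", "aesthetic"],
        )
    )
```

To try the alternative ordering with the artist right after the series, pass the artist tag as the first entry of `general_tags` instead of `artists`.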

Negative prompt:

```
(very displeasing, displeasing, bad score, worst score, bad quality, worst quality, worst detail:1.1), sketch, loli, child, patreon logo, watermark, signature, blurry, bad hands, bad anatomy, bad fingers, extra fingers, extra limbs, deformed limbs, very displeasing, displeasing, comic, speech bubble, lowres, twitter username, artist name, (light particles:0.4)
```

Recommended Parameters:

- Sampler: Euler / DPM++ SDE (Must be Rectified Flow compatible)

- Steps: 48

- CFG Scale: 6

- Shift: 3 - 4

- Positive Prompt: masterpiece, good score, best quality, aesthetic, ...

- Negative Prompt: worst quality, low quality, bad anatomy, bad score, ...

- Resolution: 768 x 1024
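For scripted use, the settings above can be collected in one place, together with the standard classifier-free guidance combination that the CFG scale controls. The dictionary keys are hypothetical and do not correspond to any specific UI or API, and the shift value of 3.0 is simply the lower end of the recommended 3 - 4 range.

```python
# Hypothetical settings container mirroring the recommended parameters.
RECOMMENDED = {
    "sampler": "euler",
    "steps": 48,
    "cfg_scale": 6.0,
    "shift": 3.0,
    "width": 768,
    "height": 1024,
}


def apply_cfg(v_uncond, v_cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the negative-prompt
    prediction toward the positive-prompt prediction."""
    return [u + cfg_scale * (c - u) for u, c in zip(v_uncond, v_cond)]
```

With a scale of 1 the result equals the conditional prediction; larger values push the output further away from the negative prompt.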

Bias and Limitations

- Unstable Outputs: Due to the lack of a final high-quality aesthetic fine-tuning phase, some outputs may be unstable or exhibit artifacts.

- Resolution Degradation: Training exclusively on 512px aspect ratio buckets has reduced the model's ability to natively generate 1 Megapixel images (a feature present in the original NAI-v2).

- Anatomy & Color: Standard biases from the Danbooru dataset apply.

License

This model is based upon Stable Diffusion 1.5. It is distributed under the terms of the CreativeML Open RAIL-M and the terms of the CC BY-NC-SA 4.0 license. This means that the terms of both licenses apply at the same time.


Files

naiV2FmAlpha_v10.safetensors
