    NovelAI_Diffusion_V2 - novelai_v2

    This is the novelai_v2 model released by NovelAI, converted to safetensors format. There is no difference in the output.

    It's based on SD1.5, so anyone can use it for inference right after downloading. For more details, please check here:

    https://blog.novelai.net/novelai-diffusion-v2-weights-release-b9d5fef5b9a4

    This model might seem outdated, but its training quality is very high. They are always ahead of us. Tag recognition is far better than v1 and rivals recent large-scale SDXL fine-tunes. Plus, since only the U-Net was trained, it's a clean model with no text encoder (TE) contamination.
    It’s useful not only for T2I, but also for merging with existing SD1.5 models or enhancing other models’ details and styles via I2I.

    ■This model is based on SD1.5, but its native resolution is 1024px, allowing for high-resolution generation.

    The VAE also seems to have improved, with the previous fading issue gone and colors now appearing more vibrant. That alone would likely benefit existing SD1.5 anime models as well.

    It can also generate at 1024x1536px; while slightly less stable, it's still practical. It might be more stable around 1344px.

    All of my sample images were generated at 1024x1536 without using Hires fix.

    The results are vivid and extremely sharp.

    It also has a strong ability to render fine details such as eyes and small accessories.

    It might also be interesting to try merging it with other models.

    If merged with an existing 512px model, it should be possible to generate images at 768px-class resolutions, such as 640x960.
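
    A minimal merge sketch using safetensors, assuming two SD1.5 checkpoints with matching key layouts; the second file name and the 0.5 ratio are placeholders, not a tested recipe:

    ```python
    from safetensors.torch import load_file, save_file

    a = load_file("novelaiDiffusionV2_novelaiV2.safetensors")  # this model
    b = load_file("existing_512px_anime_model.safetensors")    # hypothetical 512px model
    alpha = 0.5  # blend ratio: 0.0 keeps model A, 1.0 keeps model B

    merged = {}
    for key, ta in a.items():
        tb = b.get(key)
        if tb is not None and tb.shape == ta.shape:
            merged[key] = ((1 - alpha) * ta.float() + alpha * tb.float()).half()
        else:
            merged[key] = ta  # keep A's tensor where the models don't line up

    save_file(merged, "merged_1024_512.safetensors")
    ```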

    For this model, please set CLIP skip to 2.
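
    Putting the notes above together (SD1.5 architecture, 1024px native resolution, CLIP skip 2), a minimal diffusers sketch might look like this; the prompt, step count, and CFG value are only illustrations:

    ```python
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "novelaiDiffusionV2_novelaiV2.safetensors",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(
        "best quality, amazing quality, 1girl, solo, looking at viewer",
        negative_prompt="worst quality, bad quality, very displeasing",
        width=1024,
        height=1024,             # native resolution
        num_inference_steps=28,
        guidance_scale=7.0,
        clip_skip=2,             # the "CLIP skip 2" setting mentioned above
    ).images[0]
    image.save("sample.png")
    ```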

    ■Currently, Civitai's image generation with SD1.5 is limited to 512px. I requested 1024px support, but it's unclear if it will be added. Sorry to those wanting to use Civitai for inference...

    https://feedback.civitai.com/p/please-consider-adding-768px-and-1024px-resolution-options-for-image

    It would help if you could upvote it—more support will show its importance.

    ■The model understands many concepts and responds well to tag prompts.

    Since it's trained on U-Net only, it's clean and a great starting point for fine-tuning.

    It already knows many concepts, so training the text encoder may not be necessary.

    Currently, 1536px can cause character splitting (duplicated figures), but training a LoRA at 1280px or 1536px should improve stability.

    ■I've prepared a ComfyUI inference workflow—feel free to use it as a reference.

    The workflow using TIPO and wildcards is recommended, since it lets you try many variations without having to come up with tags yourself.

    I haven't fully understood this model yet either, so I'm sure there are better ways to generate images.

    ■If high-resolution inference is slow, HyperLoRA (Hyper-SD) might help reduce the step count.
    I’m not fully familiar with its usage, but I’ve added a workflow for reference.
    Let me know if you have better workflows or speed-up methods.

    https://huggingface.co/ByteDance/Hyper-SD/blob/main/Hyper-SD15-8steps-CFG-lora.safetensors
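
    For non-ComfyUI users, a diffusers sketch along the lines of the examples in the Hyper-SD repository might look like this (the 8-step and CFG settings follow the LoRA's name; the rest is my assumption):

    ```python
    import torch
    from diffusers import StableDiffusionPipeline
    from huggingface_hub import hf_hub_download

    pipe = StableDiffusionPipeline.from_single_file(
        "novelaiDiffusionV2_novelaiV2.safetensors", torch_dtype=torch.float16
    ).to("cuda")

    # Load and fuse the 8-step CFG-preserving LoRA linked above.
    pipe.load_lora_weights(
        hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-8steps-CFG-lora.safetensors")
    )
    pipe.fuse_lora()

    image = pipe(
        "best quality, 1girl",
        num_inference_steps=8,  # instead of the usual ~20-30
        guidance_scale=5.0,     # this LoRA variant keeps CFG usable
        width=832, height=1216,
        clip_skip=2,
    ).images[0]
    ```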

    ■It's best to use the same base-resolution aspect ratios as those used in SDXL, listed below (see the sketch after the list). If you don't mind a bit of instability, 1024x1536 is also possible.

    1024x1024

    896x1152

    832x1216

    768x1344

    640x1536
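
    If you want to sweep those buckets programmatically, a small loop over (width, height) pairs works; this is a sketch reusing the same kind of pipeline as above:

    ```python
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "novelaiDiffusionV2_novelaiV2.safetensors", torch_dtype=torch.float16
    ).to("cuda")

    buckets = [(1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536)]
    for w, h in buckets:
        img = pipe("best quality, 1girl", width=w, height=h, clip_skip=2).images[0]
        img.save(f"sample_{w}x{h}.png")
    ```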

    ■It's still in early testing and quality isn't great yet, but I made a DoRA to stabilize 1024x1536 generation.
    I'll continue testing and update it when I have time.

    In my opinion, besides the benefits of high resolution, a LoRA also helps reduce overexposure and oversaturation, making the image more balanced, so creating a style LoRA is a good choice.

    https://civarchive.com/models/1253884?modelVersionId=2133885

    ■I created a negative TI to help stabilize quality—feel free to try it out.

    https://civarchive.com/models/1809022?modelVersionId=2047219

    ■I also created a semi-realistic style DoRA.

    https://civarchive.com/models/1253884?modelVersionId=2134238

    ■Here are my recommended samplers:

    ・euler_ancestral: Most stable and least likely to break, though results are average.

    ・dpmpp_sde: Great balance of texture and stability. Slower than others but needs half the steps. I prefer it over 2m/3m.

    ・2m/3m: Needs the same step count as other samplers; may break with low steps.

    ・gradient_estimation: Similar to euler but converges faster, making it more stable at low steps.

    I like using the "simple" scheduler.

    The "GITS scheduler" is sharp, stylish, and vivid, with faster speed and quick convergence. but it can react strongly to changes—unstable settings may cause issues.Hands and anatomy are prone to breaking down.If results degrade, adjust settings or switch back to a regular scheduler.

    ■Uncond-Zero is recommended, as it slightly improves speed and enhances generation stability through its auto-CFG effect.

    https://github.com/Extraltodeus/Uncond-Zero-for-ComfyUI

    ■Tag Order

    "1boy, 1girl, characters, series,other General tags..."

    However, most of the official explanations are for v3 and later, so they may not apply to v2.
    Using an order that makes sense to you is probably fine.

    The novelai_v1 method may sometimes work better and could even be more correct.

    The order of quality tags is somewhat unclear, but in the official V2 model examples, quality tags appear to be placed at the beginning. From V3 onward, they are added at the end. But please let me know if I'm wrong.

    Well, in practice, tag order affects strength and what becomes the main subject.

    For simple prompts, putting quality tags first may help achieve high quality more easily.

    In detailed prompts, quality, metadata, and rating tags may introduce unwanted elements, so placing them at the end can sometimes help avoid interference. Maybe...
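
    As a concrete illustration of the two placements (the character and series names are arbitrary examples, not special tags):

    ```python
    quality = "best quality, amazing quality"
    content = "1girl, hatsune miku, vocaloid, smile, outdoors"

    prompt_quality_first = f"{quality}, {content}"  # matches the official V2 examples
    prompt_quality_last = f"{content}, {quality}"   # the V3-onward convention
    ```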

    If you want to check which tags the model recognizes, the most reliable way is to look at the suggested tags that appear when generating images on the actual NovelAI website.

    ■New unique tag list (the blog doesn't mention other tags, but the rest may be the same as nai_v1).

    Here, too, you can find valuable information.

    https://docs.novelai.net/image/tags.html

    https://docs.novelai.net/image/qualitytags.html

    ■Quality Tags

    best quality

    amazing quality

    great quality

    normal quality

    bad quality

    worst quality

    ■Aesthetics Tags

    very aesthetic

    aesthetic

    displeasing

    very displeasing

    ■year tag

    year 2022 etc...

    Due to danbooru dataset trends, images from 2020+ are generally higher quality. Especially after 2022.

    Pre-2018 images are mixed unless from professionals. The best way to predict which year tags work well is by checking image trends on the danbooru site.

    This model is from late 2023, so tags after that may not function. 2023 tags seem to work well but are less reliable. Tags from 2022 and earlier should be safe.

    Personally, I found year tags effective for older styles like 2014.

    Recent years didn’t bring much benefit—sometimes they added nice atmosphere, but often caused black-and-white images or text artifacts.

    year 2020 and year 2021 were relatively better.

    Unless you specifically want that year's style, it's more stable to avoid using year tags as quality indicators.

    ■Rating tags

    rating:general

    rating:sensitive

    rating:questionable

    rating:explicit

    NSFW (There is no difference in results between uppercase and lowercase.)

    For novelai_v2, it's unclear if adding "rating:" is correct.

    I tested both with and without it, but couldn't confirm.

    ■Renamed Tags

    v should instead be written as peace sign

    double v should instead be written as double peace

    |_| should instead be written as bar eyes

    \||/ should instead be written as open \m/

    :| should instead be written as neutral face

    ;| should instead be written as neutral face

    eyepatch bikini should instead be written as square bikini

    tachi-e should instead be written as character image
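
    The renames above as a lookup table, with a small helper to apply them to a comma-separated prompt (the helper is just a convenience sketch, not anything official):

    ```python
    RENAMES = {
        "v": "peace sign",
        "double v": "double peace",
        "|_|": "bar eyes",
        "\\||/": "open \\m/",
        ":|": "neutral face",
        ";|": "neutral face",
        "eyepatch bikini": "square bikini",
        "tachi-e": "character image",
    }

    def apply_renames(prompt: str) -> str:
        tags = [t.strip() for t in prompt.split(",")]
        return ", ".join(RENAMES.get(t, t) for t in tags)

    print(apply_renames("1girl, v, tachi-e"))
    # -> 1girl, peace sign, character image
    ```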

    Description

    Training Data (46.79 KB): comfyui_workflow

    7/16: I moved the quality tags back to the beginning. Added a speed-up workflow using uncondzero.

    7/14: Added a 15-step workflow using HyperLoRA.

    Checkpoint: SD 1.5
    by hjhf

    Details

    Downloads: 724
    Platform: CivitAI
    Platform Status: Available
    Created: 7/13/2025
    Updated: 9/30/2025

    Files

    novelaiDiffusionV2_novelaiV2_trainingData.zip

    novelaiDiffusionV2_novelaiV2.safetensors

    Mirrors

    Other Platforms (TensorArt, SeaArt, etc.) (1 mirror)

    novelaiDiffusionV2_novelaiV2.safetensors