CivArchive
    NovelAI_Diffusion_V2 - novelai_v2
    NSFW

    This is the novelai_v2 model released by NovelAI, converted to safetensors format. There is no difference in the output.

    It is based on SD1.5, so anyone can use it for inference right after downloading. For more details, please check here:

    https://blog.novelai.net/novelai-diffusion-v2-weights-release-b9d5fef5b9a4

    This model might seem outdated, but its training quality is very high. NovelAI is always ahead of us. Tag recognition is far better than v1 and rivals recent large-scale SDXL fine-tunes. Plus, because only the U-Net was trained, it's a clean model with no TE (text encoder) contamination.
    It's useful not only for T2I, but also for merging with existing SD1.5 models or enhancing other models' details and styles via I2I.

    ■This model is based on SD1.5, but its native resolution is 1024px, allowing for high-resolution generation.

    The VAE also seems to have improved, with the previous fading issue gone and colors now appearing more vibrant. That alone would likely benefit existing SD1.5 anime models as well.

    It can also generate at 1024x1536px; while slightly less stable, it's still practical. It might be more stable around 1344px.

    All of my sample images were generated at 1024x1536 without using Hires fix.

    The results are vivid and extremely sharp.

    It also has a strong ability to render fine details such as eyes and small accessories.

    It might also be interesting to try merging it with other models.

    If merged with an existing 512px model, the result should be able to generate at 768px-class resolutions, such as 640x960.

    For this model, please set CLIP skip to 2.

    kohya_deep_shrink is also effective for t2i, so it might be a good idea to try using it.

    Doing so can sometimes reduce the breakdown of backgrounds and fingers, leading to more stable results.

    ■Currently, Civitai's image generation with SD1.5 is limited to 512px. I requested 1024px support, but it's unclear if it will be added. Sorry to those wanting to use Civitai for inference...

    https://feedback.civitai.com/p/please-consider-adding-768px-and-1024px-resolution-options-for-image

    It would help if you could upvote it—more support will show its importance.

    ■The model understands many concepts and responds well to tag prompts.

    Since it's trained on U-Net only, it's clean and a great starting point for fine-tuning.

    It already knows many concepts, so training the text encoder may not be necessary.

    Currently, 1536px can cause character splitting, but training LoRA at 1280 or 1536px should improve stability.

    ■I've prepared a ComfyUI inference workflow—feel free to use it as a reference.

    The workflow using Tipo and wildcards is recommended since it allows you to try various variations without having to come up with tags yourself.

    I haven't fully understood this model yet either, so I'm sure there are better ways to generate images.

    ■If high-resolution inference is slow, HyperLoRA might help reduce the step count.
    I’m not fully familiar with its usage, but I’ve added a workflow for reference.
    Let me know if you have better workflows or speed-up methods.

    https://huggingface.co/ByteDance/Hyper-SD/blob/main/Hyper-SD15-8steps-CFG-lora.safetensors

    ■It's best to use the same base resolutions and aspect ratios as those used in SDXL. If you don't mind a bit of instability, 1024x1536 is also possible.

    1024x1024

    896x1152

    832x1216

    768x1344

    640x1536
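As a rough illustration of how the list above could be used, here is a minimal Python sketch that picks the listed base resolution closest to a desired aspect ratio. The function name `closest_resolution` and the idea of also offering the swapped (landscape) versions are my own assumptions for illustration; the original text only lists the portrait sizes.

```python
import math

# Base resolutions from the list above, as (width, height).
BASE_RESOLUTIONS = [(1024, 1024), (896, 1152), (832, 1216), (768, 1344), (640, 1536)]


def closest_resolution(aspect_ratio, allow_landscape=True):
    """Return the (width, height) whose width/height ratio is closest to aspect_ratio.

    allow_landscape is an assumption: it adds the swapped versions of the
    listed sizes (e.g. 1216x832), which the original list does not mention.
    """
    candidates = list(BASE_RESOLUTIONS)
    if allow_landscape:
        candidates += [(h, w) for (w, h) in BASE_RESOLUTIONS if w != h]
    # Compare in log space so that 2:3 and 3:2 are treated symmetrically.
    target = math.log(aspect_ratio)
    return min(candidates, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(closest_resolution(2 / 3))   # portrait 2:3
print(closest_resolution(16 / 9))  # widescreen
```

With these inputs the sketch selects 832x1216 for a 2:3 portrait and 1344x768 for 16:9. Whether landscape sizes are as stable as portrait ones with this model is not something the text above confirms.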

    ■It's still early testing and quality isn't great yet, but I made a DoRA to stabilize 1024x1536 generation.
    I'll continue testing and update it when I have time.

    In my opinion, besides the benefits of high resolution, LoRA also helps reduce overexposure and oversaturation, making the image more balanced—so creating a style LoRA is a good choice.

    https://civarchive.com/models/1253884?modelVersionId=2133885

    ■I created a negative TI to help stabilize quality—feel free to try it out.

    https://civarchive.com/models/1809022?modelVersionId=2047219

    ■I also created a semi-realistic style DoRA.

    https://civarchive.com/models/1253884?modelVersionId=2134238

    ■Here are my recommended samplers:

    ・euler_ancestral: Most stable and least likely to break, though results are average.

    ・dpmpp_sde: Great balance of texture and stability. Slower than others but needs half the steps. I prefer it over 2m/3m.

    ・2m/3m: Needs the same step count as other samplers; may break with low steps.

    ・gradient_estimation: Similar to euler but converges faster, making it more stable at low steps.

    I like using the "simple" scheduler.

    The "GITS scheduler" is sharp, stylish, and vivid, with faster speed and quick convergence, but it can react strongly to changes, so unstable settings may cause issues. Hands and anatomy are prone to breaking down. If results degrade, adjust settings or switch back to a regular scheduler.

    ■Uncondzero is recommended as it slightly improves speed and enhances generation stability through the autocfg effect.

    https://github.com/Extraltodeus/Uncond-Zero-for-ComfyUI

    ■Tag Order

    "1boy, 1girl, characters, series, other general tags..."

    However, most of the official explanations are for v3 and later, so they may not apply to v2.
    Using an order that makes sense to you is probably fine.

    The novelai_v1 method may sometimes work better and could even be more correct.

    The order of quality tags is somewhat unclear, but in the official V2 model examples, quality tags appear to be placed at the beginning. From V3 onward, they are added at the end. Please let me know if I'm wrong.

    Well, in practice, tag order affects strength and what becomes the main subject.

    For simple prompts, putting quality tags first may help achieve high quality more easily.

    In detailed prompts, quality, metadata, and rating tags may introduce unwanted elements, so placing them at the end can sometimes help avoid interference, though I'm not certain.
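To make it easier to compare the two placements discussed above, here is a small Python sketch that assembles a prompt with the quality tags either first or last. The helper `build_prompt` and its parameters are hypothetical names for illustration, not part of any NovelAI or ComfyUI API.

```python
def build_prompt(subject_tags, general_tags, quality_tags, quality_first=True):
    """Join tag groups into a single comma-separated prompt.

    subject_tags: e.g. "1girl", character and series tags (kept first in the body).
    general_tags: the remaining descriptive tags.
    quality_first: True puts quality tags at the start (as in the V2 examples),
                   False appends them at the end (as in V3 and later).
    """
    body = list(subject_tags) + list(general_tags)
    ordered = list(quality_tags) + body if quality_first else body + list(quality_tags)

    # Drop empty entries and duplicates while keeping the first occurrence.
    seen = set()
    result = []
    for tag in ordered:
        tag = tag.strip()
        if tag and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return ", ".join(result)


quality = ["best quality", "very aesthetic"]
print(build_prompt(["1girl"], ["smile", "outdoors"], quality))
print(build_prompt(["1girl"], ["smile", "outdoors"], quality, quality_first=False))
```

Generating the same seed with both outputs is one simple way to check which order works better for a given prompt.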

    If you want to check which tags the model recognizes, the most reliable way is to look at the suggested tags that appear when generating images on the actual NovelAI website.

    ■New unique tag list (the blog doesn't mention other tags, but the rest may be the same as nai_v1).

    Here, too, you can find valuable information.

    https://docs.novelai.net/image/tags.html

    https://docs.novelai.net/image/qualitytags.html

    ■ Quality Tags

    best quality

    amazing quality

    great quality

    normal quality

    bad quality

    worst quality

    ■Aesthetics Tags

    very aesthetic

    aesthetic

    displeasing

    very displeasing

    ■year tag

    year 2022 etc...

    Due to danbooru dataset trends, images from 2020 onward are generally higher quality, especially after 2022.

    Pre-2018 images are mixed unless from professionals. The best way to predict which year tags work well is by checking image trends on the danbooru site.

    This model is from late 2023, so tags after that may not function. 2023 tags seem to work well but are less reliable. Tags from 2022 and earlier should be safe.

    Personally, I found year tags effective for older styles like 2014.

    Recent years didn’t bring much benefit—sometimes they added nice atmosphere, but often caused black-and-white images or text artifacts.

    year 2020 and year 2021 were relatively better.

    Unless you specifically want that year's style, it's more stable to avoid using year tags as quality indicators.

    ■Rating tags

    rating:general

    rating:sensitive

    rating:questionable

    rating:explicit

    NSFW (There is no difference in results between uppercase and lowercase.)

    For novelai_v2, it's unclear if adding "rating:" is correct.

    I tested both with and without it, but couldn't confirm.

    ■Renamed Tags

    v should instead be written as "peace sign"

    double v should instead be written as "double peace"

    |_| should instead be written as "bar eyes"

    \||/ should instead be written as "open \m/"

    :| should instead be written as "neutral face"

    ;| should instead be written as "neutral face"

    "eyepatch bikini" should instead be written as "square bikini"

    "tachi-e" should instead be written as "character image"
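If you reuse prompts written for other danbooru-style models, the renames above can be applied automatically. The following Python sketch is one way to do that; `RENAMED_TAGS` simply copies the list above, and `rename_tags` is a hypothetical helper name. It matches whole tags only, so "v" inside "very aesthetic" is left alone.

```python
# Old tag -> new tag, copied from the list above.
RENAMED_TAGS = {
    "v": "peace sign",
    "double v": "double peace",
    "|_|": "bar eyes",
    r"\||/": r"open \m/",
    ":|": "neutral face",
    ";|": "neutral face",
    "eyepatch bikini": "square bikini",
    "tachi-e": "character image",
}


def rename_tags(prompt):
    """Replace renamed tags in a comma-separated prompt, matching whole tags only."""
    tags = [tag.strip() for tag in prompt.split(",")]
    return ", ".join(RENAMED_TAGS.get(tag, tag) for tag in tags if tag)


print(rename_tags("1girl, v, :|, tachi-e, very aesthetic"))
```

This sketch does not handle weighted syntax such as "(v:1.2)"; that would need extra parsing.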

    ■Please feel free to ask if you have any questions!

    Questions in Japanese are welcome too, so feel free to reach out!

    Description

    Training Data (46.79 KB):comfyui_workflow

    7/16: I moved the quality tags back to the beginning. Added a speed-up workflow using uncondzero.

    7/14: Added a 15-step workflow using HyperLoRA.


    Comments (17)

    hjhf
    Author
    Jul 18, 2025· 4 reactions
    CivitAI

    Currently, Civitai's image generation with SD1.5 is limited to 512px. I've submitted a request to add 1024px support.

    https://feedback.civitai.com/p/please-consider-adding-768px-and-1024px-resolution-options-for-image

    It would help if you could upvote it—more support will show its importance.

    This isn't just about SD1.5—some models like SDXL are also built for 1536px, but can’t show their full potential yet.

    If resolution selection improves overall, inference quality will likely get much better.

    512px results aren’t great, so I’m unsure if the inference feature is still valuable...

    If you think it’s better to disable it, please let me know.

    Well, at least it doesn't generate completely noisy images, so for now I'll keep the inference feature enabled...

    hjhf
    Author
    Jul 22, 2025

    Thanks to everyone who upvoted!

    I’m not sure if Civitai will take notice, but showing support like this really matters—so thank you!

    hjhf
    Author
    Jul 18, 2025
    CivitAI

    Civitai inference test: with euler_a, cfg 8, and 30 steps, 512px generation works.

    Alternatively, you can use a workaround: set denoise to 1 in the i2i hi-res fix menu. This mostly ignores the original image and generates a high-resolution version with little resemblance.

    i2i increases resolution by 1.5x. Try it with any image using the size changes below.

    ■512x768 > 768x1152

    ■544x810 > 832x1216 (Standard resolution)

    ■682x1024 > 1024x1536 (Slightly less stable, but sharp when successful.)
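The three size pairs above can be reproduced by multiplying each side by 1.5 and rounding up to the next multiple of 64. That rounding rule is my own assumption, chosen because it matches the listed pairs; how Civitai actually rounds sizes is not stated. A minimal Python sketch, with `hires_size` as a hypothetical helper name:

```python
import math


def hires_size(width, height, scale=1.5, multiple=64):
    """Scale both sides and round each one UP to the next multiple.

    scale=1.5 is the i2i factor mentioned above; multiple=64 is an
    assumption that happens to reproduce the listed size pairs.
    """
    def snap(value):
        return int(math.ceil(value * scale / multiple) * multiple)

    return snap(width), snap(height)


for size in [(512, 768), (544, 810), (682, 1024)]:
    print(size, "->", hires_size(*size))
```

Working backwards, any source size whose scaled sides land just below a target multiple of 64 should produce that target.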

    Even with the same seed, changing the image causes variation, so it seems the original image has some influence.

    It works with solid colors too, but using a more complex image with a person might yield better results.

    This is more of a fun trick than a proper method, and it's fragile—if the results are poor, it's best to give up...

    Sorry if I wasted your valuable buzz...

    Depending on the case, it may be better to enjoy the uniqueness as a test rather than aim for good results with this model on Civitai for now.

    LovelaceA
    Jul 26, 2025· 3 reactions
    CivitAI

    A great model from two years ago... I really hope to see v4.5 released...

    hjhf
    Author
    Jul 26, 2025

    It might be released just as people start thinking it’s outdated…

    Architecturally, models like Neta-Lumina already seem more advanced and promising, especially with video models improving at text-to-image too.

    I truly hope v4.5 won’t be overlooked when it's released—since it's a specialized architecture, it depends on the community to build the ecosystem for it to even run.

    LovelaceA
    Jul 26, 2025· 1 reaction

    hjhf Yeah, I know. Literally all SD1.5 models are legacy.

    Regarding NovelAI v4.5, after a lot of trials I really feel it is the most advanced anime model at the moment. The fact that the Strengthening & Weakening Vectors can easily go over 10 suggests the architecture is already completely different from SDXL, Pony, Illustrious, or even Flux.

    Even at the free-generation resolution of 832x1216, subtle details are rendered very accurately compared to all other models. Maybe the training image resolution is even higher, perhaps already at the 2K level? However, without LoRA and other community tools, I personally feel it is a wild model: the output is very hard to control, and backgrounds tend to be quite messy when many artist style tags are applied. Even the same prompt can generate completely different results when the seed changes. Enhance seems inferior to hires fix in ComfyUI; it keeps changing a lot of details rather than adding more, and the enhanced image always looks a bit washed out. The noise parameter in Enhance is also kind of weird and seems to generate more artifacts than details. Vibe Transfer can help a bit, but I still keep imagining what could be done if v4.5 were publicly released.

    NovelAI's VAE may also be quite advanced. I noticed that if I encode and decode a NovelAI-generated picture in ComfyUI, it deteriorates noticeably.

    Surprisingly, people don't really talk about it; I saw more complaints about how it did not produce results similar to V3 or V4. Even the dev team doesn't mention how to use v4.5 better compared to v4, LOL. Maybe they already have their cash cow, so they don't pay that much attention to customers' thoughts anymore.

    And yeah, I noticed the recently announced Neta-Lumina model. A new architecture definitely brings a lot of hope. I feel Illustrious can beat NovelAI V4 with all the community tools, but V4.5 is not something an SDXL architecture can catch up to. Neta-Lumina may have more potential if a lot of training is applied to its unet, clip, and VAE, but that will take a lot of time...

    hjhf
    Author
    Jul 26, 2025· 1 reaction

    Thank you for sharing all the information.
    Sorry for the long message, but I wanted to share my thoughts as well...

    I haven’t used NovelAI v4.5 much yet, but it’s impressive that the vectors can go beyond 10 without issues.

    Just a guess, but maybe there’s some kind of weight regularization like in A1111.

    I also got the impression that Enhance was just doing basic i2i-style modifications when I tried it in the past.

    I find NovelAI’s prompts, inference techniques, and tools like Vibe Transfer and Director Tools fascinating too—I'd love to replicate them locally, so I often wish they shared more papers or technical details.

    NovelAI v4.5’s rendering is impressive. I suspect the U-Net + 16ch VAE setup offers both flexibility and detailed structure.

    True, since NovelAI has tweaked the VAE before, they might have done something similar in v4.5.

    The unpredictability of NovelAI v4.5 likely comes from the diversity of its large-scale training.

    Local models like Pony or Illustrious often rely on biased merges or style LoRAs instead of base models, which makes them easier to control.

    With closed models, you can’t adjust them the same way, so you need to learn their quirks.

    It’s a bit of a shame—if we could use them locally, they’d probably be even better.

    Also, models like SD1.5 and SDXL seem to handle rough prompts well, which might be why some prefer NovelAI v3 over v4 or v4.5.

    T5 or LLM-based models often need more precise prompts unless heavily biased like Flux.

    Some models, not just NovelAI v4.5, are underrated simply because they don't perform well with simple prompts.

    That said, without some guidance, it’s hard to craft good prompts. If NovelAI shared more about prompt building, it might help users avoid guesswork. Even powerful models lose their potential if kept closed, so listening to user feedback could really help.

    Also, NovelAI is a closed model, but it started it all, and we usually follow in their footsteps. So rather than rejecting it, I think it's valuable for the community to experiment, share insights, and find ways to replicate it locally. That was especially true back in the SD1.5 era with the leaks.

    Neta-Lumina is promising—its potential may even surpass NovelAI v4.5.

    Every architecture is a rough gem that needs refining, but that takes community effort.

    Sadly, many get ignored early and forgotten in favor of what's already proven.

    If people explore Neta-Lumina’s possibilities, even in small ways, it could become something great.

    Here’s a bit of a side note, but my take on NovelAI:

    Every version of NovelAI is highly refined—their fine-tuning is on another level.

    They train only the U-Net until tag recognition is near perfect, without losing flexibility.

    Their dataset creation and training methods are highly refined.

    Even with great new base architectures, it's extremely hard for the community to surpass NovelAI through fine-tuning alone.

    NovelAI was already impressive from v1, and v2 showed how far a small model like SD1.5 could go.

    With v3, they explored optimal SDXL training early on—some had noticed the need for VPred-like methods, but the community largely ignored it until NovelAI revealed details.

    I was deeply impressed by the NovelAI_v3 paper.

    It used many improvements that a few had long advocated, showing they recognized their value early on—and proved their effectiveness through the model's fine-tuning quality.

    VPred has only recently become widely recognized and practical.

    With SDXL, unless you're using true VPred models like CosXL, Playground v2.5, or Terminus-XL, training for VPred can be tough—but projects like NoobAI are doing great.

    I find much to agree with in the new architecture choices of v4 and v4.5.

    Honestly, I like NovelAI v4.5’s architecture—it’s close to my ideal setup.

    U-Net + T5 feels like a solid evolution, improving prompt understanding while keeping familiarity.

    It’s more stable than switching to DiT, and personally, I still get better results from U-Net. The 16ch VAE is also a big plus.

    I saw that as a solid and proper direction for evolution. Kolors, with its U-Net + LLM setup, seemed like a worthy SDXL successor, but it was sadly overlooked—dismissed due to licensing and its 4ch VAE. I still see that as a missed opportunity.

    That's why I feel NovelAI consistently trains each of their models in a near-optimal way, and I truly respect that.

    Many fine-tuned local models now produce high-quality images, but I often question whether their training, hacks, or base models were truly optimal.

    That said, models like Neta Lumina, Pony v7, and Chroma, which are trained with LLMs or T5, seem like cleaner, more solid starting points with fewer fundamental flaws, so I have high hopes for them.

    LovelaceA
    Jul 27, 2025· 2 reactions

    hjhf Thank you for the long reply. Glad to see we share many similar thoughts.

    I forgot to mention something about NovelAI v4.5 that impressed me: full-body and very wide-angle shots. With all the local models I have used, even Flux, details and quality drop quickly when you zoom out. I get it, the AI has fewer pixels to render and guess details with. But NovelAI v4.5 significantly improved this. Even with limited pixels, shapes and details are rendered quite accurately.

    NovelAI's closed-source business model definitely provides them with much more funding to experiment. While I also see the potential in the new architectures that Neta-Lumina and Chroma use, in the long run the gap between open-source and closed-source models may still widen because of funding. Professional teams, trial and error, training equipment, dataset prep... all of it needs a lot of money.

    Direction and sunk cost also matter. I was kind of hoping the Illustrious dev team would bring in a new architecture, but they seem stuck. Pony v7... I heard it uses a new base model, AuraFlow, but it is kind of niche, and I'm not sure about community support. Chroma is training slowly, and I want to see what it can become in the end. I agree that Neta-Lumina is the youngest but has the greatest potential, precisely because it is the youngest! There is a lot of room for imagination. I still see the path to catching up with closed-source models as challenging, given the huge difference in resources.

    I really hope the NovelAI team shares more about their development, or provides native ComfyUI integration, ControlNet-style tools such as tile and canny, or higher subscription tiers that can generate at higher resolutions... But all of that looks quite far away. Their most recent dev blog post is almost a year old, and on Discord their reaction to feedback and suggestions is quite slow, if there is any at all. I get it, it is a for-profit company and they have their own priorities.

    I really appreciate the community's contributions in making open-source models and merges that all of us can use. But community-produced work also has limits. As you mentioned, merged or mixed models tend to be heavily biased so that they react better to simple prompts. There are also millions of merges and mixes, but people barely discuss the details of the unet, clip, and VAE; I don't know that much about the technical side, but those are quite important too. Another issue is that community feedback is usually quite vague ("I like / don't like the aesthetic", or problems caused by the user's own settings) and leans toward the NSFW side. Professional or technical feedback, and even sustainable funding, are hard to find in the community.

    Merging or even fine-tuning, from what I have used along the way, is like adding seasoning to cooked dishes, or cooking them one more time. If well executed, it can still provide good aesthetic value, but it just won't change the basics (the architecture). I do like things like community-trained ControlNets and speed-up LoRAs, though.

    Anyway, I don't want to turn this into a long essay. I guess we just need to wait and see the bright future ahead!

    hjhf
    Author
    Jul 27, 2025

    LovelaceA 

    Thanks for the reply. I also hope for a bright future for local models.

    I agree with your analogy—most people just mix or season pre-cooked dishes.

    Many enjoy consuming, but few are interested in actually cooking.

    I hope more people get into training, more architectures get explored, and we build a more diverse community with more model choices.

    Right now, the focus tends to be too narrow, and new possibilities often get ignored—it’d be great if we could explore them together.

    It's interesting how NovelAI v4.5 renders shapes and details accurately even at a distance.

    I don’t know their exact architecture, and they likely don’t use the methods I’m thinking of, but as an example, models like Cascade use stepwise latent upscaling.

    Cascade also has UltraPixel, which adds another latent upscaling step to improve detail—making images sharper even at the same resolution.

    So enhancing latent information, not just output resolution, could also be a factor.

    Thanks for sharing the article—it was really interesting.

    You're right, NovelAI explored a different architecture on their own.

    As I mentioned, I see v4.5’s U-Net + T5 + 16ch VAE as a solid evolution and close to the best current architecture.

    The T5 article was interesting, especially since it offered a different view from mine.

    Some say T5 struggles with NSFW due to censorship, but I haven’t found that to be true.

    Even if there is some bias, it hasn’t affected training in my experience.

    I fine-tuned PixArt-Sigma, which uses T5 only, and never had trouble with NSFW learning.

    It’s a strong architecture that learns concepts very well—I was able to train almost all concepts in my dataset without issue, and others have too.

    I honestly have doubts about TE training.

    Even models like SD1.5 or SDXL with CLIP don’t really need it.

    It can speed up concept learning as a hack, but it’s a double-edged sword with side effects.

    That trade-off might be acceptable for personal, single-concept LoRA, but it's too risky for large-scale fine-tuning.

    If early learning of unknown concepts is essential, combining it with large-scale textual inversion like Pivotal Tuning makes more sense.

    Chroma and Pony_v7 don't use TE training either, and it hasn’t been an issue.

    Even Flux LoRAs often cover NSFW concepts and are popular.

    As a side note, lumina image 2.0 uses Gemma as TE, which is a censored LLM, but I believe it only uses intermediate outputs, so it’s fine.

    NovelAI has never fine-tuned TE, and likely wouldn’t do something that irregular.

    If T5 had serious issues, they'd likely switch to another LLM as the TE.

    If I'm wrong, well... that's a bit embarrassing.

    Sorry for the long message again, but I really enjoyed this meaningful exchange with you.

    It was a great chance to organize my thoughts—thank you.

    LovelaceA
    Jul 27, 2025· 1 reaction

    hjhf I greatly appreciate the feedback. Yes, I also enjoy exchanging information. I rarely have the chance to talk about these things with others.

    It is quite frustrating to play around with NovelAI's V4.5 now: I know it is the best anime model at the moment, but I keep thinking, "What if it were an open-source model? I'm sure it could be super powerful." I'm getting quite impatient, especially knowing there is nothing I can do to help. There is no way individuals can supply an adequate amount of knowledge or funding.

    I guess the best is yet to come!

    hjhf
    Author
    Jul 27, 2025

    I understand the frustration—it’s natural to feel that way seeing how impressive their model is.

    But in a way, I’m also grateful. They’ve always shown us the direction to aim for, and that can be encouraging.

    I felt despair when SDXL was released, due to its massive parameters, the heavy load from high resolution, complex architecture, and the difficulty of training.

    However, NovelAI v3 showcased such incredible quality that many people, including myself, felt it was worth pursuing the architecture.

    As a result, the community took on large-scale training and grew to the point where it could create models that come close.

    I believe v4.5 plays a similar role now—it’s something to strive toward. Even if we can’t match it directly, we can look for different paths, like video models, or new base architectures.

    So I try to enjoy what I can do now, while staying excited about the future. Let's both keep exploring and having fun with what's ahead. Even if what I can do is small and limited, none of it is ever wasted: every experience and bit of knowledge becomes a valuable asset for the future.

    DoNotSayMyName
    Aug 6, 2025· 1 reaction
    CivitAI

    Which VAE should I use when generating on this model?
    Tried without VAE - the result is very far from the screenshots

    hjhf
    Author
    Aug 6, 2025· 4 reactions

    This model already has a built-in VAE for NovelAI v2, so there’s no need to specify a different one.

    I'm not sure what you mean by "very far," but if the colors look more vibrant, that's likely the intended result. In my experience, the NovelAI v2 VAE has better color than v1 and is closer to the look of vae-ft-mse-840000-ema. If you noticed a similar feel, the correct VAE is likely applied.

    If you want results closer to my sample images, each one includes the prompt and workflow, so feel free to use them as a reference.

    I also shared a clean, general-purpose workflow with the model.

    There’s nothing particularly special, so you should be able to reproduce it even with WebUI.

    Some images in the gallery were adjusted using DoRAs or embeddings listed below—those might help too.

    https://civitai.com/models/1253884?modelVersionId=2070815

    https://civitai.com/models/1809022?modelVersionId=2047219

    https://civitai.com/models/1809022?modelVersionId=2061380

    If you have any other questions or issues, feel free to ask!

    If you're open to sharing your image in the gallery, I could use it as a reference and try generating something similar. It might help identify if a more tailored prompt is needed for good results with this model, and I’d be happy to test and share my findings.

    allen00191
    Aug 10, 2025
    CivitAI

    Is the new NovelAI v4.5 out yet?

    Tresm
    Nov 24, 2025

    Yes, but it is not open source.

    ZuiXunZhenLi
    Apr 5, 2026
    CivitAI

    Could you please release a NovelAI V4.5 version?!

    Checkpoint
    SD 1.5
    by hjhf

    Details

    Downloads
    1,471
    Platform
    CivitAI
    Platform Status
    Available
    Created
    7/13/2025
    Updated
    5/21/2026
    Deleted
    -

    Files

    novelaiDiffusionV2_novelaiV2.safetensors

    novelaiDiffusionV2_novelaiV2_trainingData.zip


    Available On (2 platforms)

    Same model published on other platforms. May have additional downloads or version variants.