RouWei - 0.6 vpred

NSFW

In-depth retraining of Illustrious to achieve the best prompt adherence, knowledge and state-of-the-art performance.

Big dreams come true

The version number is just an index of the current final release, not a fraction of the planned training.

HF repo

Large-scale finetune using a GPU cluster with a dataset of ~13M pictures (~4M with natural text captions)

Fresh and vast knowledge about characters, concepts, styles, culture and related things
The best prompt adherence among SDXL anime models at the time of release
Solved the main problems with tag bleeding and biases common to Illustrious, NoobAI and other checkpoints
Excellent aesthetics and knowledge across a wide range of styles (over 50,000 artists (examples), including hundreds of unique cherry-picked datasets from private galleries, some of them received from the artists themselves)
High flexibility and variety without a stability tradeoff
No more annoying watermarks for popular styles thanks to a clean dataset
Vibrant colors and smooth gradients without a trace of burning; full range even with epsilon
Pure training from Illustrious v0.1 without involving third-party checkpoints, LoRAs, tweakers, etc.

There are also some issues and changes compared to the previous version; please RTFM.

Dataset cut-off - end of April 2025.

Features and prompting:

Important change:

When you are prompting artist styles, especially mixing several, their tags MUST BE in a separate CLIP chunk. Just add BREAK after them (for A1111 and derivatives), use a conditioning concat node (for Comfy), or at least put them at the very end. Otherwise, significant degradation of results is likely.

Basic:

The checkpoint works with both short, simple prompts and long, complex ones. However, if there are contradictory or weird things, unlike with other models they won't be ignored and will affect the output. No guide-rails, no safeguards, no lobotomy.

Just prompt what you want to see and don't prompt what shouldn't be in the picture. If you want a view from above, don't put a ceiling in the positive; if you want a cropped view with the head out of frame, don't describe the character's facial features in detail, and so on. Pretty simple, but sometimes people miss it.

Version 0.8 comes with advanced understanding of natural text prompts. It doesn't mean that you are obliged to use it; tags only is completely fine, especially because understanding of tag combinations is also improved.

Do not expect it to perform like Flux or other models based on T5 or LLM text encoders. The whole SDXL checkpoint is smaller than that text encoder alone; in addition, Illustrious v0.1, which is used as the base, completely forgot a lot of general things from vanilla SDXL base.

However, even in its current state it works much better, allows new things that are usually impossible without external guidance, and makes manual editing, inpainting, etc. more convenient.

To achieve the best performance you should keep track of CLIP chunks. In SDXL the prompt is split into chunks of 75 tokens (77 including BOS and EOS) that are processed by CLIP separately, and only then concatenated and passed as conditioning to the UNet.
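A rough way to see where chunk borders fall is sketched below. It is an approximation under stated assumptions: `rough_tokens` is a hypothetical stand-in that splits on words and punctuation instead of running the real CLIP BPE tokenizer, so real token counts will often be higher; `split_into_chunks` treats BREAK as A1111-style frontends are described here (it closes the current chunk and starts a new one); and the `<bos>`/`<eos>`/`<pad>` strings are placeholders, not real token IDs.

```python
import re

CHUNK_SIZE = 75  # content tokens per CLIP chunk; 77 with BOS and EOS


def rough_tokens(text: str) -> list[str]:
    # Hypothetical stand-in for the real CLIP BPE tokenizer: words and punctuation.
    # Real counts are often higher, since BPE splits rare words into several tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def split_into_chunks(prompt: str) -> list[list[str]]:
    """Group tokens into chunks of at most 75; BREAK always starts a new chunk."""
    chunks: list[list[str]] = []
    for segment in re.split(r"\bBREAK\b", prompt):
        tokens = rough_tokens(segment)
        for start in range(0, len(tokens), CHUNK_SIZE):
            chunks.append(tokens[start:start + CHUNK_SIZE])
    return chunks


def with_special_tokens(chunk: list[str]) -> list[str]:
    # Each chunk is encoded on its own as BOS + tokens + EOS, padded to 77.
    # "<bos>", "<eos>" and "<pad>" are placeholders, not real token IDs.
    padded = ["<bos>"] + chunk + ["<eos>"]
    return padded + ["<pad>"] * (CHUNK_SIZE + 2 - len(padded))


if __name__ == "__main__":
    demo = "by kantoku, best quality, masterpiece BREAK 1girl, fox ears, standing"
    for i, chunk in enumerate(split_into_chunks(demo)):
        print(f"chunk {i}: {len(chunk)} tokens")
```

Even with this crude count, it shows why a long character description can spill into the next chunk and get separated from the traits it belongs to.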

If you want to specify some features for a character/object and separate them from other parts of the prompt, make sure they are in the same chunk and optionally separate it with BREAK. It will not completely solve the problem of trait mixing, but it can reduce it and improve overall understanding, since the text encoders in RouWei are better than others at processing the whole sequence rather than individual concepts.

The dataset contains only booru-style tags and natural text expressions. Despite including a share of furry art, real-life photos, western media, etc., all captions have been converted to classic booru style to avoid a number of problems caused by mixing different tagging systems. So e621 tags won't be understood properly.

Sampling parameters:

~1 megapixel for txt2img, any AR with both sides a multiple of 32 (1024x1024, 1056x, 1152x, 1216x832, ...). Euler_a, 20..28 steps.
CFG: for the epsilon version 4..9 (7 is best), for the vpred version 3..5
Sigmas multiply may improve results a bit; CFG++ samplers work fine. LCM/PCM/DMD/... and exotic samplers are untested.
Some schedulers don't work well.
Highres fix: x1.5 latent + denoise 0.6, or any GAN upscaler + denoise 0.3..0.55.
For the vpred version, lower CFG 3..5 is needed!
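The resolution rule above (about 1 megapixel, both sides a multiple of 32) and the x1.5 hires-fix step can be sketched as a small helper. Treating "~1 megapixel" as exactly 1024x1024 pixels and snapping to the nearest multiple of 32 are assumptions of this sketch; `txt2img_size` and `hires_size` are hypothetical names, and the sizes they pick may differ slightly from the examples listed (for 3:2 it returns 1248x832).

```python
import math

TARGET_PIXELS = 1024 * 1024  # "~1 megapixel", assumed here to mean 1024x1024
STEP = 32                    # both sides must be multiples of 32


def round_to_step(value: float, step: int = STEP) -> int:
    # Snap to the nearest multiple of `step` (never below one step).
    return max(step, round(value / step) * step)


def txt2img_size(aspect_w: float, aspect_h: float) -> tuple[int, int]:
    """Hypothetical helper: width/height near 1 MP for an aspect ratio, on a 32-px grid."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(TARGET_PIXELS / ratio)
    return round_to_step(height * ratio), round_to_step(height)


def hires_size(width: int, height: int, scale: float = 1.5) -> tuple[int, int]:
    # Target size for the x1.5 hires-fix pass, kept on the same 32-px grid.
    return round_to_step(width * scale), round_to_step(height * scale)


if __name__ == "__main__":
    w, h = txt2img_size(3, 2)
    print(w, h, hires_size(w, h))
```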


Quality classification:

Only 4 quality tags:

masterpiece, best quality

for positive and

low quality, worst quality

for negative.

Nothing else. Actually, you can even omit the positive ones and reduce the negative to low quality only, since they can affect the basic style and composition.

Meta tags like lowres have been removed and don't work; better not to use them. Low-resolution images have been either removed or upscaled and cleaned with DAT, depending on their importance.

Negative prompt:

worst quality, low quality, watermark

That's all, no need for "rusty trombone", "farting on prey" and others. Do not put tags like greyscale or monochrome in the negative unless you understand what you are doing. Extra tags from the brightness/colors/contrast section below can be used.

Artist styles:

Grids with examples, list/wildcard (also can be found in "training data").

Use them with "by "; it's mandatory. It will not work properly without it.

"by " is a meta-token for styles that prevents them from being mixed up with or misinterpreted as tags/characters with similar or close names. This gives better results for styles and at the same time avoids the random style fluctuation you may observe in other checkpoints.

Combining multiple artists gives very interesting results; the mix can be controlled with prompt weights and spells.

YOU MUST ADD `BREAK` after artist/style tags (for A1111), use concat conditioning (for Comfy), or put them at the very end of your prompt.

For example:

by kantoku, by wlop, best quality, masterpiece BREAK 1girl, ...
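The artist-style rules above (the mandatory "by " prefix, and style tags in their own chunk via BREAK or at the very end) can be expressed as a tiny prompt builder. This is a hypothetical helper for illustration, not part of any frontend; in Comfy the two parts would instead be encoded separately and joined with a conditioning concat node. The quality tags and negative prompt are the ones recommended in this description.

```python
# Hypothetical prompt builder (illustration only).
POSITIVE_QUALITY = ["best quality", "masterpiece"]
NEGATIVE_PROMPT = "worst quality, low quality, watermark"


def artist_tag(name: str) -> str:
    # Artist tags must carry the "by " prefix; don't double it if already present.
    name = name.strip()
    return name if name.startswith("by ") else f"by {name}"


def build_prompt(artists: list[str], main: str, use_break: bool = True) -> str:
    """Keep artist/quality tags in a separate CLIP chunk (BREAK) or at the very end."""
    style = ", ".join([artist_tag(a) for a in artists] + POSITIVE_QUALITY)
    if use_break:
        # A1111-style: BREAK closes the style chunk before the main prompt.
        return f"{style} BREAK {main}"
    # Fallback for frontends without BREAK: style tags go last.
    return f"{main}, {style}"


if __name__ == "__main__":
    print(build_prompt(["kantoku", "wlop"], "1girl, fox ears, standing"))
    print(NEGATIVE_PROMPT)
```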

General styles:

2.5d, anime screencap, bold line, sketch, cgi, digital painting, flat colors, smooth shading, minimalistic, ink style, oil style, pastel style

Booru tags styles:

1950s (style), 1960s (style), 1970s (style), 1980s (style), 1990s (style), 2000s (style), animification, art nouveau, pinup (style), toon (style), western comics (style), nihonga, shikishi, minimalism, fine art parody

and everything from this group.

Can be used in combinations (with artists too), with weights, in both positive and negative prompts.

Characters:

Use the full-name booru tag and proper formatting, like karin_(blue_archive) -> karin \(blue archive\); use skin tags for better reproduction, like karin \(bunny\) \(blue archive\). An autocomplete extension might be very useful.
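The conversion from a raw booru tag to prompt form described above (underscores to spaces, parentheses escaped with backslashes so they aren't read as emphasis) is mechanical enough to script. A minimal sketch; `booru_to_prompt` and `tags_to_prompt` are hypothetical helpers, and emoticon tags that really contain underscores (such as ^_^) would need special-casing that is not handled here.

```python
# Hypothetical helpers (illustration only).
def booru_to_prompt(tag: str) -> str:
    r"""Convert e.g. karin_(blue_archive) to karin \(blue archive\)."""
    text = tag.strip().replace("_", " ")
    # Bare parentheses mean emphasis in A1111/Comfy prompts, so escape literal ones.
    return text.replace("(", r"\(").replace(")", r"\)")


def tags_to_prompt(tags: list[str]) -> str:
    # Booru tags are joined with commas, as recommended for this model.
    return ", ".join(booru_to_prompt(t) for t in tags)


if __name__ == "__main__":
    print(tags_to_prompt(["1girl", "karin_(bunny)_(blue_archive)", "looking_at_viewer"]))
```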

Most characters are recognized just by their booru tag, but it will be more accurate if you describe their basic traits. Here you can easily redress your waifu/husbando just by prompting, without suffering from the typical leaks of basic features.

Natural text:

Use it in combination with booru tags; it works great. Use only natural text after typing styles and quality tags. Or use just booru tags and forget about it; it's all up to you. To get the best performance, keep track of CLIP 75-token chunks.

About 4M images in the dataset had hybrid natural-text captions made by Claude, GPT, Gemini and ToriiGate, then refactored, cleaned and combined with tags in different variations for augmentation.

Unlike typical captions, these contain character names, which is very useful. Better keep it clean: a short and convenient description works best. Better not to use long and sloppy BS like

A mysteriously enchanting feminine entity of indeterminate yet youthful essence, whose celestial visage radiates with the ethereal luminescence of a thousand dying stars, blessed with locks cascading like the golden rivers of ancient mythology, perhaps styled in a manner reminiscent of contemporary fashion trends though not necessarily adhering to any specific aesthetic paradigm. Her eyes, pools of unfathomable depth and hue, sparkle with the wisdom of millennia yet maintain an innocent quality that defies temporal constraints...

For captioning you can use ToriiGate in short mode.

And don't expect it to be as good as Flux and others; it tries very hard and after several rolls you can usually get what you want, but it is not that stable and detailed.

Oh yeah

tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tailjob, tail through clothes, tail under clothes, lifted by tail, tail biting, tail penetration (including a specific indication of vaginal/anal), tail masturbation, holding with tail, panties on tail, bra on tail, tail focus, presenting own tail...

(booru meaning, not e621) and many others with natural text. The majority work perfectly; some require a lot of rolling.

Brightness/colors/contrast:

You can use extra meta tags to control it:

low brightness, high brightness, low saturation, high saturation, low gamma, high gamma, sharp colors, soft colors, hdr, sdr

Example

They work in both the epsilon and vpred versions and work really well.

The epsilon version relies on them too much. Without low brightness, low gamma or limited range (in the negative) it might be difficult to achieve true 0,0,0 black; the same is often true for white.

Both the epsilon and vpred versions have effectively true ZSNR: the full range of colors and brightness without the commonly observed flaws. But they behave differently; just try it.

Vpred version

The main thing you need to know: lower your CFG from 7 down to 5 (or less). Otherwise, usage is similar, with advantages.

It seems that starting from v0.7, vpred works flawlessly. It shouldn't suffer from tags close to the 75-token chunk borders being ignored, as happens with NAI. It is harder to get burned images: even at CFG 7 the result is usually just oversaturated but with smooth gradients, which can be useful for some styles. Yes, it can make anything from (0,0,0) to (255,255,255). You will find the brightness meta tags described above quite useful for easier/lazy prompting; natural text expressions also work. To get the darkest image, put high brightness into the negative and/or use the low brightness and low gamma tags. If you don't like very bright skin on a dark background and want to reduce contrast (or, on the contrary, enhance the effect), use hdr/sdr in the negative/positive.

It was reported that in rare cases, on some prompts, there is a drop in contrast. Other vpred models appear to behave the same way with such prompts, and adding a "separator" closer to the border of the 75-token chunk fixes this. However, with 0.7 I haven't encountered this myself.

To run the vpred version you will need a dev build of A1111, Comfy (with a special loader node), Forge or reForge. Otherwise use the same parameters as for epsilon (Euler a, 20..28 steps), but with CFG 3..5. There is no need to use CFG rescale, but you can try it; CFG++ works great.

Base model:

The model here has received a small UNet polishing pass after the main training to improve small details, bump up resolution, etc. However, you may also be interested in RouWei-Base, which sometimes performs better on complex prompts despite minor mistakes in small details. It also comes in FP32, e.g. if you want to use FP32 text encoder nodes in Comfy, merge it or finetune it.

It can be found in the Hugging Face repo.

Known issues:

Of course there are:

Artist and style tags must be separated into a different chunk from the main prompt or come last
There may be some positional or combinational bias in rare cases, but it's not yet clear.
There are some complaints about a few of the general styles.
The epsilon version relies too much on brightness meta tags; sometimes you will need to use them to get the desired brightness shift
Some newly added styles/characters might not be as good and distinct as they deserve to be
To be discovered

Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly or inaccurately or has a strong watermark, please report it and it will be added explicitly. Follow for new versions.

JOIN THE DISCORD SERVER

License:

Same as Illustrious. Feel free to use it in your merges, finetunes, etc., but please leave a link or mention; it is mandatory.

How it's made

I'll consider making a report or something like it later. For sure.

In short, 98% of the work is related to dataset preparation. Instead of blindly relying on loss weighting based on tag frequency from the NAI paper, a custom guided loss-weighting implementation along with an asynchronous collator for balancing was used. ZTSNR (or close to it) with epsilon prediction was achieved using noise scheduler augmentation.

Spent compute: over 8k H100 hours (apart from research and failed attempts)

Thanks:

First of all, I'd like to acknowledge everyone who supports open source and develops and improves code. Thanks to the authors of Illustrious for releasing the model, and thanks to the NoobAI team for being pioneers in open finetuning at such a scale, sharing experience, and raising and solving issues that previously went unnoticed.

Personal:

Artists who wish to remain anonymous - sharing private works; a few anonymous persons - donations, code, captions, etc.; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; LOL2024 - a lot of unique datasets; T.,[] - datasets, testing, advice; rred, dga, Fi., ello - donations; TekeshiX - datasets. And other fellow brothers who helped. Love you so much ❤️.

And of course everyone who gave feedback and requests; it's really valuable.

If I forgot to mention anyone, please notify.

Donations

If you want to support: share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support original artists.

AI is my hobby; I'm spending money on it and not begging for donations. However, it has turned into a large-scale and expensive undertaking. Consider supporting it to accelerate new training and research.

(Just keep in mind that I can waste it on alcohol or cosplay girls)

BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

XMR: 47F7JAyKP8tMBtzwxpoZsUVB8wzg2VrbtDKBice9FAS1FikbHEXXPof4PAb42CQ5ch8p8Hs4RvJuzPHDtaVSdQzD6ZbA5TZ

If you can offer GPU time (A100+), PM me.

Description

Vpred experimental version


Comments (30)

DraconicDragon · Nov 15, 2024 · 1 reaction

Much fluffy, me likey >-<

127912 · Nov 15, 2024 · 9 reactions

Very promising. I hope you focus efforts on vpred if future versions are planned, the eps examples are much less flattering.

plan_truster · Nov 15, 2024 · 6 reactions

MinthyBASED

Korinth · Nov 15, 2024 · 1 reaction

Works on my machine. I probably should find a set of artist tags specifically for this model but I can't complain. Not blurry/smudgy like certain other models right off the bat.

2329252 · Nov 16, 2024 · 1 reaction

From the about section (artist tags):
https://mega.nz/folder/ATYVQbKI#JZOo3_alb9NhZPaTsIVv7g

Korinth · Nov 16, 2024

@mewtsy Oh, yeah I saw this. I meant more like a mix I really like as opposed to a list of which it was trained on, my bad for the confusion lol

Q_7 · Nov 15, 2024

why did you train "by"?

Minthybasis

Author

Nov 15, 2024

It allows a good separation of tags/characters from artists even if they share a name or parts of it (which is quite common), and avoids unexpected style shifts and biases.

Korewaai · Nov 16, 2024

1. Was TE frozen?
2. How many epochs?

Minthybasis

Author

Nov 16, 2024

1. It was trained

2. Well, not sure if it is applicable, because for very popular things like "Hatsune Miku" there were only 4 repeats of images and for rare ones over 20, in addition to loss scaling. On average, fewer than 10.

fhaifhai · Nov 16, 2024 · 3 reactions

what an awesome checkpoint, I love it! one question though if you don't mind: what usage of commas do you suggest? separating each token with them or just using them targeted to divide the prompt into topical groups?

Minthybasis

Author

Nov 16, 2024· 1 reaction

Oh, that's a question. Basically, try to stick to default booru tags separated by commas; it will work best. Then replace some or add extra natural text phrases if needed, without commas.

"1girl, standing, looking at viewer, leg up" - default and good. "1girl standing looking at viewer leg up" - not. "Cute girl posing in front of camera lifting her leg" - fine.

fhaifhai · Nov 16, 2024

@Minthybasis thanks for replying, that is helpful. do I use underscores for the booru tags or not? does it matter?

Minthybasis

Author

Nov 16, 2024· 1 reaction

@fhaifhai No, trained without underscores, replace them with spaces for best results.

blackfuture82729 · Nov 17, 2024 · 7 reactions

Holy moly, you just keep these awesome models coming, for FREE!!! You're the best, truly the best.

blackfuture82729 · Nov 17, 2024 · 1 reaction

@Minthybasis Just an observation: this model, and 4th tail, suffer from degraded quality when using any of the "eyeshadow"/"eyeliner"/"x lips" tags and their derivatives, something that gets fixed in your merges; maybe you could test it and improve it for the next iterations? The rest is exquisite. Hair on horn works better but still needs more strength (heavily biased towards certain hair colors), same for long bangs/long hair between eyes.

nonezxc · Nov 20, 2024 · 3 reactions

Great model

Rating_Agent · Nov 20, 2024 · 4 reactions

The best model!!

2710333 · Nov 21, 2024 · 3 reactions

My first ever really used illustration model and one of my favorite models ever. It's so simple yet so detailed, accurate and high quality. I hope you will make another one that's even better. Just the tail censor and all that stuff is so cool. The different art styles: not too many and not too few. It's a small but great selection. Please make a 1.0.

Good stuff. Great potential.

Edit: after using it more and reading the description again I noticed that you mentioned booru. I looked it up and my quality went straight up. (I'm kinda new to this scene)

If you make a 1.0 please make it as similar as possible with the tagging.

4750586 · Nov 21, 2024 · 1 reaction

My first exp in illust and that is magnificent!
But can u tell me: does this model need any VAE or not?

Minthybasis

Author

Nov 22, 2024· 1 reaction

The FP16-fix VAE is included, or you can use whatever you like.

TOF_enjoyer · Nov 22, 2024 · 1 reaction

since the release of Illustrious I waited for you to make a finetune of it because your finetunes were always awesome. While this finetune is amazing for Illustrious, I just dislike the model's coloring; no matter what, it just gives amateur coloring

Minthybasis

Author

Nov 22, 2024

Can you please post an example? Btw, better colors are expected in the new vpred version along with a few other fixes; it will be out in a few days.

TOF_enjoyer · Nov 23, 2024 · 1 reaction

@Minthybasis https://files.catbox.moe/qppmej.png
there is nothing wrong with the image itself, it is just the colors and brightness: no matter what I did, they are either too bright or too dark. It is just a matter of preference, nothing wrong with the model itself

justAlizard · Nov 25, 2024 · 2 reactions

Hello @TOF_enjoyer !

I believe I may be able to answer this question since I was having the same issue before I figured out a solution.

Currently, weights for those brightness and gamma prompts do not work. Instead, to control their intensity you need to use prompt editing.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing

I took a look at your image and noticed you were using low brightness in your negative prompt. If you simply replace that with [low brightness:8] instead I believe you will have a much more satisfying result.

You may already know this but essentially how it reads is: "Start low brightness at sampling step 8". (which in your case would be about a third of your steps.)

Increasing the number will darken the image and vice versa, you can control gamma in similar fashion.

Forgot to mention, a bit of an annoyance occurs when using hires fix with these prompts, since they will be applied again, meaning your image will either get darker or brighter. You can get around this by generating your image with the prompt but then removing the prompt before applying a hires fix.
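The prompt-editing trick from this comment can be generated programmatically, for example when scripting batches. A sketch only: `start_step` and `delayed_tag` are hypothetical helpers, and the sketch relies on the A1111 `[to:when]` syntax from the linked wiki page, where the tag is added once sampling reaches step `when`.

```python
# Hypothetical helpers (illustration only) for A1111 prompt editing.
def start_step(total_steps: int, fraction: float) -> int:
    # Convert "about a third of the steps" into a step number, e.g. 24 steps -> 8.
    return round(total_steps * fraction)


def delayed_tag(tag: str, step: int) -> str:
    """A1111 prompt editing [to:when]: `tag` is only added from step `when` onward."""
    return f"[{tag}:{step}]"


if __name__ == "__main__":
    steps = 24
    negative = "worst quality, low quality, " + delayed_tag("low brightness", start_step(steps, 1 / 3))
    print(negative)
```

A larger step number delays the tag more, which matches the advice above that increasing the number darkens the image when low brightness sits in the negative.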

TOF_enjoyer · Nov 30, 2024

@justAlizard thanks, I will try to play with this again

TOF_enjoyer · Nov 30, 2024

@justAlizard ok, I played around a bit; there was an improvement but not a satisfying one. But your method inspired me to use 2 models instead of 1: I generate the image with RouWei and then inpaint and upscale using 4th tail. This gave me the strong composition of Wei and the beautiful colors of 4th tail. I might play with it more, but I think it is better to wait for the model to mature a little, even 4th tail version 0.5. I had the same opinion about Pony as I have about NoobAI models

DevilSShadoW · Dec 3, 2024

@justAlizard thanks for this, it was driving me mad. Hope future versions fix this.

Dewal76 · Nov 24, 2024 · 3 reactions

I see that you are a fellow fox girl enjoyer based on the model and images you made. Who is your favorite fox girl character?

Minthybasis

Author

Nov 25, 2024· 3 reactions

They are all so cute, it's too hard to choose!

Well, top tier waifus: Shiro (Senko), Senko, Tamamo (Fate), Wakamo (BA), Sussurro, Suzuran.

Checkpoint · Illustrious · by Minthybasis

base model · anime

Details

Downloads: 556
Platform: CivitAI
Platform Status: Available
Created: 11/15/2024
Updated: 6/28/2026

Files

rouwei_06Vpred.safetensors
Size: 6.46 GB
SHA256: 84c8888446ca93b6b8c204bb558a9f16138138a1cd0f4fdc264c18fe4c006163

Mirrors

HuggingFace (5 mirrors): rouwei_06Vpred.safetensors
CivitAI (1 mirror): rouwei_06Vpred.safetensors

Available on (1 platform): SeaArt
