CivArchive

    In-depth retraining of Illustrious to achieve the best prompt adherence, knowledge, and state-of-the-art performance.

    Big dreams come true

    The version number is just an index of current final release, not a fraction of the planned training.

    HF repo

    Large-scale finetune on a GPU cluster with a dataset of ~13M pictures (~4M with natural text captions)

    • Fresh and vast knowledge about characters, concepts, styles, culture and related things

    • The best prompt adherence among SDXL anime models at the moment of release

    • Solved the main problems with tag bleeding and biases that are common for Illustrious, NoobAI and other checkpoints

    • Excellent aesthetics and knowledge across a wide range of styles (over 50,000 artists (examples), including hundreds of unique cherry-picked datasets from private galleries, including those received from the artists themselves)

    • High flexibility and variety without stability tradeoff

    • No more annoying watermarks for popular styles thanks to clean dataset

    • Vibrant colors and smooth gradients without trace of burning, full range even with epsilon

    • Pure training from Illustrious v0.1 without involving third-party checkpoints, Loras, tweakers, etc.

    There are also some issues and changes compared to the previous version, please RTFM.

    Dataset cut-off - end of April 2025.

    Features and prompting:

    Important change:

    When you are prompting artist styles, especially mixing several, their tags MUST BE in a separate CLIP chunk. Just add BREAK after them (for A1111 and derivatives), use a conditioning concat node (for Comfy), or at least put them at the very end. Otherwise, significant degradation of results is likely.

    Basic:

    The checkpoint works with both short, simple prompts and long, complex ones. However, if a prompt contains contradictory or weird things, they won't be ignored as in other checkpoints; they will affect the output. No guardrails, no safeguards, no lobotomy.

    Just prompt what you want to see and don't prompt what shouldn't be on the picture. If you want to have a view from above - don't put ceiling into positive, if you want to have crop view with head out of frame - don't make detailed description of character facial features, and so on. Pretty simple but sometimes people are missing it.

    Version 0.8 comes with advanced understanding of natural text prompts. That doesn't mean you are obligated to use it; tags only is completely fine, especially because understanding of tag combinations has also improved.

    Do not expect it to perform like Flux or other models based on T5 or LLM text encoders. The whole SDXL checkpoint is smaller than that text encoder alone; in addition, illustrious-v0.1, which is used as the base, has completely forgotten a lot of general things from vanilla sdxl-base.

    However, even in its current state it works much better and allows new things that are usually impossible without external guidance, as well as making manual editing, inpainting, etc. more convenient.

    To achieve the best performance you should keep track of CLIP chunks. In SDXL the prompt is split into chunks of 75 tokens (77 including BOS and EOS) that are processed by CLIP separately and only then concatenated and passed as conditioning to the unet.

    If you want to specify some features for a character/object and separate them from other parts of the prompt, make sure they are in the same chunk and optionally separate it with BREAK. It will not completely solve the problem of trait mixing, but it can reduce it and improve overall understanding, since RouWei's text encoders are better than others at processing the whole sequence rather than individual concepts.
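The chunking described above can be sketched roughly as follows. This is only an illustration, not the code of any particular UI: it assumes the prompt has already been tokenized into a flat list of ids, uses the CLIP start/end token ids (49406 and 49407), and pads a short chunk with the end token, which is an assumption since padding conventions differ between UIs and encoders. The function name `chunk_token_ids` is hypothetical.

```python
BOS_ID = 49406  # CLIP <|startoftext|>
EOS_ID = 49407  # CLIP <|endoftext|>
CHUNK_SIZE = 75  # meaningful tokens per chunk (77 with BOS and EOS)


def chunk_token_ids(token_ids, chunk_size=CHUNK_SIZE, pad_id=EOS_ID):
    """Split token ids into BOS + 75 tokens + EOS windows.

    Padding short chunks with EOS is an assumption made for this sketch.
    """
    chunks = []
    # max(..., 1) guarantees one (fully padded) chunk for an empty prompt.
    for start in range(0, max(len(token_ids), 1), chunk_size):
        body = list(token_ids[start:start + chunk_size])
        body += [pad_id] * (chunk_size - len(body))
        chunks.append([BOS_ID] + body + [EOS_ID])
    return chunks


# A 100-token prompt spills into a second chunk: tokens 76..100 are encoded
# separately from the first 75, so related tags may end up split apart.
chunks = chunk_token_ids(list(range(100)))
print(len(chunks), [len(c) for c in chunks])
```

In UIs that support it, BREAK ends the current chunk early, so the tags after it start a fresh chunk instead of sharing one with the tags before it.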

    The dataset contains only booru-style tags and natural text expressions. Despite having a share of furries, real-life photos, western media, etc., all captions have been converted to classic booru style to avoid a number of problems from mixing different systems. So e621 tags won't be understood properly.

    Sampling parameters:

    • ~1 megapixel for txt2img, any AR with resolution multiple of 32 (1024x1024, 1056x, 1152x, 1216x832,...). Euler_a, 20..28 steps.

    • CFG: for epsilon version 4..9 (7 is best), for vpred version, 3..5

    • Sigmas multiply may improve results a bit, CFG++ samplers work fine. LCM/PCM/DMD/... and exotic samplers untested.

    • Some schedulers don't work well.

    • Highresfix - x1.5 latent + denoise 0.6 or any gan + denoise 0.3..0.55.

    • For vpred version lower CFG 3..5 is needed!
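As a rough aid for the resolution advice above, here is a small sketch that picks a width and height close to 1 megapixel for a given aspect ratio, with both sides rounded to a multiple of 32. The helper name `snap_resolution` and its rounding strategy are assumptions made for illustration; any resolution that satisfies the two rules should be fine.

```python
import math


def snap_resolution(aspect_ratio, target_pixels=1024 * 1024, multiple=32):
    """Return (width, height) near target_pixels, both multiples of `multiple`."""
    # Ideal (unrounded) sides for the requested aspect ratio and pixel count.
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


for ratio in (1.0, 3 / 2, 16 / 9, 2 / 3):
    w, h = snap_resolution(ratio)
    print(f"AR {ratio:.2f}: {w}x{h} ({w * h / 1e6:.2f} MP)")
```

Because each side is rounded independently, the resulting aspect ratio and pixel count are only approximate.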

    Quality classification:

    Only 4 quality tags:

    masterpiece, best quality

    for positive and

    low quality, worst quality

    for negative.

    Nothing else. Actually you can even omit positive and reduce negative to low quality only, since they can affect basic style and composition.

    Meta tags like lowres have been removed and don't work, better not to use them. Low resolution images have been either removed or upscaled and cleaned with DAT depending on their importance.

    Negative prompt:

    worst quality, low quality, watermark

    That's all, no need for "rusty trombone", "farting on prey" and others. Do not put tags like greyscale, monochrome in the negative unless you understand what you are doing. Extra tags from the brightness/colors/contrast section below can be used.

    Artist styles:

    Grids with examples, list/wildcard (also can be found in "training data").

    Use them with "by "; it's mandatory. They will not work properly without it.

    "by " is a meta-token for styles that avoids mixing or misinterpretation with tags/characters of similar or close names. This gives better results for styles and at the same time avoids the random style fluctuation that you may observe in other checkpoints.

    Multiple artists give very interesting results and can be controlled with prompt weights and spells.

    YOU MUST ADD BREAK after artists/style tags (for A1111) or concat conditioning (for Comfy) or put them in the very end of your prompt.

    For example:

    by kantoku, by wlop, best quality, masterpiece BREAK 1girl, ...
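The layout of that example can be assembled programmatically. The sketch below is a hypothetical helper (`build_prompt` is not part of any UI or of this model's tooling) that prefixes each artist with "by ", keeps artists and quality tags in the first chunk, and puts everything else after BREAK, which is the A1111-style separator; in Comfy the two parts would instead go to separate encoders joined by conditioning concat.

```python
def build_prompt(artists, quality_tags, content_tags):
    """Put 'by <artist>' and quality tags before BREAK, the rest after it."""
    style_chunk = [f"by {name}" for name in artists] + list(quality_tags)
    return f"{', '.join(style_chunk)} BREAK {', '.join(content_tags)}"


prompt = build_prompt(
    artists=["kantoku", "wlop"],
    quality_tags=["best quality", "masterpiece"],
    content_tags=["1girl", "solo"],
)
print(prompt)
```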

    General styles:

    2.5d, anime screencap, bold line, sketch, cgi, digital painting, flat colors, smooth shading, minimalistic, ink style, oil style, pastel style

    Booru tags styles:

    1950s (style), 1960s (style), 1970s (style), 1980s (style), 1990s (style), 2000s (style), animification, art nouveau, pinup (style), toon (style), western comics (style), nihonga, shikishi, minimalism, fine art parody

    and everything from this group.

    Can be used in combinations (with artists too), with weights, both in positive and negative prompts.

    Characters:

    Use full name booru tag and proper formatting, like karin_(blue_archive) -> karin \(blue archive\), use skin tags for better reproducing, like karin \(bunny\) \(blue archive\). Autocomplete extension might be very useful.
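The tag conversion shown above (underscores to spaces, parentheses escaped so they are not read as emphasis brackets) can be sketched as a small helper. The name `format_booru_tag` is hypothetical, and the blanket underscore replacement is a simplification: a few booru tags, such as emoticon tags, keep their underscores and would need to be excluded.

```python
def format_booru_tag(tag: str) -> str:
    """Convert a raw booru tag into prompt form with escaped parentheses."""
    text = tag.replace("_", " ")
    # Backslash-escape parentheses so the UI does not treat them as weights.
    return text.replace("(", r"\(").replace(")", r"\)")


print(format_booru_tag("karin_(blue_archive)"))
print(format_booru_tag("karin_(bunny)_(blue_archive)"))
```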

    Most characters are recognized just by their booru tag, but it will be more accurate if you describe their basic traits. Here you can easily redress your waifu/husbando just by the prompt without suffering from the typical leaks of basic features.

    Natural text:

    Use it in combination with booru tags, works great. Use only natural text after typing styles and quality tags. Use just booru tags and forget about it, it's all up to you. To get the best performance, keep track of the 75-token CLIP chunks.

    About 4M images in the dataset had hybrid natural-text captions made by Claude, GPT, Gemini, ToriiGate, then refactored, cleaned and combined with tags in different variations for augmentation.

    Unlike typical captions, these contain character names, which is very useful. Keep it clean: a short and convenient description works best. Better not to use long and sloppy BS like

    A mysteriously enchanting feminine entity of indeterminate yet youthful essence, whose celestial visage radiates with the ethereal luminescence of a thousand dying stars, blessed with locks cascading like the golden rivers of ancient mythology, perhaps styled in a manner reminiscent of contemporary fashion trends though not necessarily adhering to any specific aesthetic paradigm. Her eyes, pools of unfathomable depth and hue, sparkle with the wisdom of millennia yet maintain an innocent quality that defies temporal constraints...

    For captioning you can use ToriiGate in short mode.

    And don't expect it to be as good as flux and others, it tries very hard and after several rolls usually you can get what you want, but it is not that stable and detailed.

    Oh yeah

    tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tailjob, tail through clothes, tail under clothes, lifted by tail, tail biting, tail penetration (including a specific indication of vaginal/anal), tail masturbation, holding with tail, panties on tail, bra on tail, tail focus, presenting own tail...

    (booru meaning, not e621) and many others with natural text. The majority work perfectly, some require a lot of rolling.

    Brightness/colors/contrast:

    You can use extra meta tags to control it:

    low brightness, high brightness, low saturation, high saturation, low gamma, high gamma, sharp colors, soft colors, hdr, sdr

    Example

    They work in both the epsilon and vpred versions, and work really well.

    The epsilon version relies on them too much. Without low brightness or low gamma or limited range (in negative) it might be difficult to achieve true 0,0,0 black; the same is often true for white.

    Both epsilon and vpred versions have something like true zsnr, with a full range of colors and brightness without the commonly observed flaws. But they behave differently, just try it.

    Vpred version

    Main thing you need to know - lower your CFG from 7 down to 5 (or less). Otherwise, usage is similar, with some advantages.

    It seems that starting from v0.7 vpred works flawlessly now. It shouldn't suffer from ignoring tags close to the 75-token chunk borders like nai. It is more difficult to get burned images - even on cfg7 the result is usually just over-saturated but with smooth gradients, which can be useful for some styles. Yes it can make anything from (0,0,0) to (255,255,255). You will find the brightness meta tags described above quite useful for easier/lazy prompting, natural text expressions also work. To get the darkest image - put high brightness into negative and/or use low brightness, low gamma tags. If you don't like very bright skin on dark background and want to reduce contrast (or on the contrary, enhance the effect) - use hdr/sdr in negative/positive.

    It was reported that in rare cases on some prompts there is a drop in contrast. Looks like other vpred models have same behaviour with such prompts, adding a "separator" closer to the border of the 75-token chunk fixes this. However, with 0.7 I haven't encountered this myself.

    To launch vpred version you will need dev build of A1111, Comfy (with special loader node), Forge or Reforge. Just use same parameters (Euler a, cfg 3..5, 20..28 steps) like epsilon. No need to use Cfg rescale, but you can try it, cfg++ works great.

    Base model:

    The model here has had a small unet polishing after the main training to improve small details, bump up resolution and so on. However, you may also be interested in RouWei-Base, which can sometimes perform better on complex prompts despite having minor mistakes in small details. It also comes in FP32, for example if you want to use fp32 text encoder nodes in Comfy, merge it or finetune.

    It can be found in Huggingface repo

    Known issues:

    Of course there are:

    • Artist and style tags must be separated into a different chunk from the main prompt or come very last

    • There may be some positional or combinational bias in rare cases, but it's not yet clear.

    • There are some complaints about few of the general styles.

    • Epsilon version relies too much on brightness meta tags, sometimes you will need to use them to get desired brightness shift

    • Some newly added styles/characters might not be as good and distinct as they deserve to be

    • To be discovered

    Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly or inaccurately, or has a strong watermark - please report, will add them explicitly. Follow for new versions.

    JOIN THE DISCORD SERVER

    License:

    Same as illustrious. Feel free to use in your merges, finetunes, etc. but please leave a link or mention, it is mandatory

    How it's made

    I'll consider making a report or something like it later. For sure.

    In short, 98% of the work is related to dataset preparation. Instead of blindly relying on loss-weighting based on tag frequency from the nai paper, a custom guided loss-weighting implementation along with an asynchronous collator for balancing was used. Ztsnr (or close to it) with Epsilon prediction was achieved using noise scheduler augmentation.

    Spent compute - over 8k hours of H100 (apart from research and failed attempts)

    Thanks:

    First of all I'd like to acknowledge everyone who supports open source and develops and improves code. Thanks to the authors of illustrious for releasing the model, thanks to the NoobAI team for being pioneers in open finetuning at such a scale, sharing experience, raising and solving issues that previously went unnoticed.

    Personal:

    Artists who wish to remain anonymous - sharing private works; a few anonymous persons - donations, code, captions, etc.; Soviet Cat - GPU sponsoring; Sv1. - llm access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; LOL2024 - a lot of unique datasets; T.,[] - datasets, testing, advice; rred, dga, Fi., ello - donations; TekeshiX - datasets. And other fellow brothers that helped. Love you so much ❤️.

    And of course everyone who gave feedback and made requests, it's really valuable.

    If I forgot to mention anyone, please notify.

    Donations

    If you want to support - share my models, leave feedback, make a cute picture with kemonomimi-girl. And of course, support original artists.

    AI is my hobby, I'm spending money on it and not begging for donations. However, it has turned into a large-scale and expensive undertaking. Consider supporting it to accelerate new training and research.

    (Just keep in mind that I can waste it on alcohol or cosplay girls)

    BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

    ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

    XMR: 47F7JAyKP8tMBtzwxpoZsUVB8wzg2VrbtDKBice9FAS1FikbHEXXPof4PAb42CQ5ch8p8Hs4RvJuzPHDtaVSdQzD6ZbA5TZ

    if you can offer gpu-time (a100+) - PM.

    Description

    Vpred for v0.8

    FAQ

    Comments (113)

    AquaShades Jun 9, 2025· 1 reaction
    CivitAI

    it's so peak ❤️❤️

    reakaakasky Jun 9, 2025· 4 reactions
    CivitAI

    it's so peak x2

    EBIX Jun 9, 2025· 1 reaction
    CivitAI

    i have a question which might be dumb. noob vpred is more trained than illust and over all a better base so why not train on top of that model?

    reakaakasky Jun 9, 2025· 4 reactions

    noob v-pred is a good model to use, but not a good model to train anymore. The more a model was trained, the less it can learn.

    Minthybasis
    Author
    Jun 9, 2025· 4 reactions

    That's a good question, not dumb at all.

    The first version of RouWei was started at approximately the same time as Noob, but with slightly different goals and approaches. All later versions are developments of the previous one, since at the moment of training there was no better base model according to the specified criteria.

    The NoobAi checkpoint has both a number of advantages and serious issues. After a long training, existing knowledge in the base will not play a significant role, but the inherent biases and problems may only become more pronounced and make training more difficult. Therefore, at this point, choosing another base will not bring any benefits.

    Also, like it was mentioned, noob can be a bit troublesome to train.

    base model serve as a block of clay, the clay need to be pure in substance so it can be mold into anything(meaning it has no biases), and the clay need to be complete and full with volume, otherwise when training subsequence loras/finetunes it has no existed weights/concept to be adjusted from/with cause it is not present in that particular base model

    but a v pred version base model might be interesting. is that possible? a base model that is trained in v pred configuration.

    Minthybasis
    Author
    Jun 10, 2025

    @dfijgklerhjkldghtjykghljg Well there is no 'base' model for vpred, it has been converted from base without extra aesthetic tuning on top of it. So the published one can be considered as the base for vpred.

    Also, some issues in vpred were fixed as much as possible in a relatively small training.

    OneRing Jun 9, 2025
    CivitAI

    I wanted to ask one question, did you use not illustrious 0.1, but version 1.0 or 2.0? I noticed that at high resolutions the model starts to produce "a lot of gray" like new versions of illustrious. I'm just curious)

    Minthybasis
    Author
    Jun 9, 2025

    As discussed previously, the history of the checkpoint: Illustrious-v0.1 -> Rouwei 0.6 -> Rouwei 0.7 -> Rouwei 0.8. Switching to another base is tantamount to losing everything that was achieved earlier for the sake of somewhat questionable improvements.

    As for gray - might be related to bugs in text encoder, speaking briefly - can be solved by moving some tags into other 77-tokens chunk. Haven't seen this in new illustrious versions, but that's interesting. Does it only happen when increasing the resolution or with some prompts?

    OneRing Jun 9, 2025

    @Minthybasis only with increasing resolution. I experimented with prompts and in principle with some settings reforge, forge and even standard automatic1111 at high resolutions (without upscaling) a pronounced "soap picture and grayness" begins, I noticed this in illustratious 1.0, 1.1, and 2.0, in 0.1 there was no such thing, so I thought that the original model had changed, but in your model everything does not go into grayness so much, but there is a slight tendency, perhaps this will not affect training LORAs, I have not tested it yet.

    Minthybasis
    Author
    Jun 10, 2025

    @OneRing Please write if there are issues with lora, these things need to be investigated.

    OneRing Jun 17, 2025

    @Minthybasis trained LORA, the problem practically disappears and the larger the LORA, the fewer problems.

    dfijgklerhjkldghtjykghljg Jun 10, 2025· 2 reactions
    CivitAI

    I now use 0.8 base to train loras, awesome awesome model, will test v pred variant later on

    alternative_Universe Jun 11, 2025
    CivitAI

    Vpred is awesome, with the natural language added and the details, and more characters (specially nikke hehe), but I notice that most of the time it generates busty females even if I don't put the tag for it and put a lot of negatives to avoid a huge size, still, it just ignore it and make them big

    Minthybasis
    Author
    Jun 11, 2025· 1 reaction

    Oh no, it's supposed to generate cunnies by default 😭! Just kidding, does it occurs on any specific artists, characters or very common in general?

    @Minthybasis cunnies huh 🤨, but hey to each their own haha, but I've been testing some nikke characters Who are not that "gifted", but still even with negative they have huge breast , and no soecif artist, hit 2.5D or illustration style

    Dewal76 Jun 12, 2025

    putting in negative is not enough, did you go with adding (the name:1.5) in negative?

    Minthybasis
    Author
    Jun 12, 2025

    @alternative_Universe Hm, finding on danbooru characters from Nikke with actually small breasts is a little challenge, lol. Copy-pasting prompts or (re)writing manually gives exactly desired size, from flat to huge. Could you upload some examples for reproducing?

    It should just work without any negatives. Except may be if you're using some breast-related tags like cleavage/paizuri/etc. which have slight bias for size, but anyway shouldn't be that bad.

    @Minthybasis 2.5d style ,1girl, Privaty /(nikke/),(nikke), /(goddess of victory: nikke/) ,taking selfie in the bathroom, posing seductively, bitting own lips, front view , serious, shy, red jacket,sweaty,masterpiece, best quality, newest, absurdres, highres, high contrast,hdr

    It's the prompt with negative like big breast or busty, characters like Alice,private, Dorothy etc, always makes them busty lol,don't know why it ignores the negatives

    Minthybasis
    Author
    Jun 13, 2025

    @alternative_Universe What about just using a general tag to specify the size you want? https://files.catbox.moe/gmg103.jpg

    Also, I would like to point out that \ should be used instead of / to escape emphasis brackets, or the result will be unpredictable. Also, the tags newest, absurdres, highres haven't been specifically introduced in the dataset; I don't know whether their use will have a positive effect or the opposite.

    alternative_Universe Jun 13, 2025· 1 reaction

    @Minthybasis  will try again, I used mediium size breast or natural size but they still huge lol,also,thanks for the \, I was using it wrong then this whole Time,thanks for the link I will check it

    @Minthybasis sorry to bother again,but the link is not available,any chance for a reupload?, still not working with negatives, getting very busty ladies lol, so want to try that list :(

    Minthybasis
    Author
    Jun 13, 2025· 1 reaction

    @alternative_Universe Sure, reupload in few hours when get back to pc.

    Minthybasis
    Author
    Jun 13, 2025· 1 reaction

    @alternative_Universe Here https://files.catbox.moe/6otrqv.jpg no accidental nipples this time.

    alternative_Universe Jun 13, 2025· 2 reactions

    @Minthybasis you know i was expecting an advanced list with words,cant believe was that simple lol, thank sooo much, being enjoying and exploring 0.8 vpred a lot

    dawn66666666 Jun 12, 2025· 5 reactions
    CivitAI

    Version 0.8 is great, better than NoobV1.1 in my opinion. It contains some concepts I like and trains better than noob.

    dawn66666666 Jun 12, 2025· 3 reactions
    CivitAI

    It would be even better if we have auxiliary models such as "ip-adapter" on this basis. Will you train "ip-adapter"?

    Anyway, the 0.8 version of the model already does a great job.

    Minthybasis
    Author
    Jun 12, 2025

    Thank you. I had some thoughts about training controlnet/ip adapter. Maybe later, but chances are not too high since there are already a lot of plans within limited time/compute and models from sdxl/illustrious/noob seems to work okay.

    dawn66666666 Jun 13, 2025· 1 reaction

    @Minthybasis Yes, most of them work fine, but relatively speaking, they are not as perfect as the 0.8 model, and they would be much more perfect if properly fine-tuned.

    wtre59 Jun 13, 2025· 5 reactions
    CivitAI

    After testing all three versions, surprisingly, only the base model seems to work well in my runtime environment, vpred (either the file provided on civitai or hf) inexplicably burned up, epsilon had anatomical problems, and base using the same seed rendered correct anatomy, and outside of that, the base model seems to be able to handle Zero Terminal SNR (I'm not sure if this is in line with design expectations), well, what's certain is that the base model is pretty awesome!

    (Translated with DeepL)

    E_EE Jun 14, 2025· 2 reactions
    CivitAI

    love

    bl4ckfuture107 Jun 14, 2025· 5 reactions
    CivitAI

    Vpred 0.8 is what other model makers should aspire to. Yeah, you have to use BREAK, but, let me tell you all, not a single model can do as much as RouWei can in terms of knowledge. Prompt adherence is also superb and stability is much improved.
    I would also ask to improve bangs (wispy, fanged, choppy, v-bangs, loosely tucked bangs, long hair between eyes).

    kanareika1 Jun 16, 2025

    Can you share your settings, please? Your UI (Comfy, Forge, etc), samplings, cfg, example prompts? For me it feels like model is barely trying to follow prompt, which is very underwhelming. I understand, that it's a base model and that i'm doing something wrong probably. It's like there is zero details - maybe it's normal, because i'm used to being spoonfed by WAI, but still.

    bl4ckfuture107 Jun 16, 2025

    @kanareika1 Sure, I use Euler A, 28 steps, 3.5/4 CFG (helps with backgrounds), I often do 1216x832/832x1216/1024x1024 resolution. For scheduler, use sgm uniform, imo, it's the best for RouWei, karras doesn't work well, and normal creates visible artifacts.
    I generate images this way:
    masterpiece, best quality, (one of the General Styles) BREAK 1girl, (animal girl prompts if you have any) hair-color, hair-length, hair-style (twintails, etc), eye color, body tags (breast size, thick thighs/curvy/etc), body traits (tattoos/markings/piercings) BREAK clothes, actions, facial expressions and background.

    wtre59 Jun 16, 2025
    CivitAI

    I'm a little curious as to what was done to optimise the negative prompts during RouWei's training, and exactly what types of meta tags were removed - after some comparative generation using the base version, I found that the inclusion of the negative prompts still gave a modest improvement in the quality of the generated images, and the same is true for the simple inclusion of high resolution tags in the positive prompt.

    In my personal experience, some of the meta tags in the booru dataset reflect the ‘quality’ of the data in some way - for example

    https://danbooru.donmai.us/wiki_pages/lossy-lossless

    and

    https://danbooru.donmai.us/wiki_pages/photoshop_(medium),

    actually implies a lowres/highres effect that is more useful than their original tag.

    (Translated with DeepL)

    Minthybasis
    Author
    Jun 16, 2025· 3 reactions

    Resolution tags like absurdres, highres, lowres, etc. Although the majority of them were removed, some may still persist, and with the illustrious legacy, where they work in the base, they can kind of work. If you like the effect - of course you should use them, I just want to warn that after upscale and with some styles they can have a negative effect. The mentioned photoshop and lossy tags might be quite useful (if they don't add unwanted biases), this is a good discovery.

    One example of the optimizations is the newly introduced meta-tags that characterize pictures in specific ways (starting from color and finishing with peculiarities of composition, added effects and others). But they are needed first of all for better training, not for inference, because when called they may have too strong effects and biases.

    The words about keeping the negative prompt clean mostly relate to situations where people spam numerous tags and then complain about flexibility or other issues that are actually caused by this. If you know what you are doing - make it whatever your creativity wants.

    dfijgklerhjkldghtjykghljg Jun 18, 2025
    CivitAI

    out of curiosity, is the date metadata tag completely removed? sometimes I want to isolate and direct to a specific period of time, some artist changed their style, some copyrighted series changed its style, etc etc, I love the year 2025~2005 tag and newest <-> oldest tag.

    Minthybasis
    Author
    Jun 18, 2025· 1 reaction

    If those tags were in the original set of tags from danbooru - they should work. But no special tags based on picture upload date were introduced.

    I decided not to use them because of the need for complex augmentation to make such divisions work really well and the possible side effects. Perhaps something like this will be introduced in the future.

    @Minthybasis awesome!❤

    ClownReplica Jun 18, 2025· 1 reaction
    CivitAI

    Can someone explain the CLIP chunks to me? I'd like to understand this and BREAK better so I can hopefully generate better pictures without having to go the complex Region Prompting route.
    Also, do other models merged with RouWei inherit this?

    Minthybasis
    Author
    Jun 18, 2025· 4 reactions

    SDXL uses text encoder parts from CLIP (2 of them with different sizes, actually), which originally can process only 77 input tokens: 75 meaningful ones, excluding the BOS and EOS tokens. When you use a prompt longer than 75 tokens, it is divided into chunks of 75 tokens that are encoded separately. After that, the hidden states from the last layer before projection (or from a deeper one if clip skip is used) for each processed chunk are concatenated and used as input for the unet.

    So, if you want to use some tags, describe features, things, etc. that are related to something specific - it is better to have them in the same clip chunk, so the text encoder will be able to assess them together. The same goes for splitting things that you don't want to mix, or that give a bad result when placed together.

    Of course, clip text encoders are quite small and dumb and the unet has its own attention when processing combinations, but managing parts of the prompt might be quite beneficial in several cases.

    Yes, it can separate character features mentioned in the prompt, the outcome depends on the checkpoint. But still it will not be as stable as in models with more complex text encoders like t5 or llms, or with region prompting.

    MiracleKey Jun 23, 2025· 2 reactions
    CivitAI

    im done... best model ever i have seen... full control of image generations, ez to add every style by lora... i love this model

    IJDEIH Jun 27, 2025· 6 reactions
    CivitAI

    I think I'm done merging models for now. Rouwei 0.8 vpred is simply the best option for anti-slop 2D image generation.

    As for feedback, the masterpiece and best quality tags seem to prevent the generation of certain styles like jagged lines, flat colors, pixel art, or oekaki. While these styles might not typically be considered masterpieces, the tags have such a strong effect that I struggled to achieve my desired results even with careful use of quality tags.

    Regardless, it's an amazing model when you know how to use it. Thank you for sharing it.

    I can vote for your input, one of my comic book lora was facing the same situation, even when the prompt is prioritized on monochrome and greyscale, meaning these two tags were put in front of the clip text chunk, the effect of comic book drawing was not showing up, until masterpiece tag and best quality tag were taken off.

    but now, just as a general common working principle, when I use my loras I do not put any broad scale comprehensive quality control meta-tag in both positive and negative prompt, let the lora work on the entire vector space, full spectrum influence on model weights.

    OneRing Jun 28, 2025· 3 reactions
    CivitAI

    At the moment I am training a fairly large LORA and I was so surprised that with the previous settings without any significant changes the losses decreased by about 2.5 times, this is amazing. I was wondering, in addition to increasing the dataset, were there any significant changes in the training settings that managed to achieve such an impressive result? No model even comes close to the results of Rouwei at the moment.

    Minthybasis
    Author
    Jun 30, 2025· 1 reaction

    Glad to hear that you got good results!

    Hm, honestly I don't know. The main changes in v0.8 compared with 0.7 are lots of extra augmentation, better captions and a very diverse, balanced dataset, so it may be some of these or all of them together.

    izoraaJun 29, 2025· 5 reactions
    CivitAI

    This one and noobai are my favorites😊♥ Thank you for your hard work in bringing us such a great model!

    alternative_UniverseJul 2, 2025· 7 reactions
    CivitAI

    the model is so good, it should always be available on site generation

    LOL2024Jul 3, 2025

    It was available on site gen before, but because there weren't enough bids it's unavailable now.

    @LOL2024 yeah I know, I enjoyed that time

    shab987Jul 24, 2025
    CivitAI

    why do I get weird blurry outfit with comfy, with LLM clip also the same. WAI Rouwei works fine but it's based on v0.7.

    Minthybasis
    Author
    Jul 25, 2025

    Could you upload some examples of the issue with metadata?

    reakaakaskyJul 25, 2025· 7 reactions
    CivitAI

    removed the whole prompt and accidentally hit the generate button. the model shows its true subconsciousness. it's nsfw expert 🤣

    reakaakaskyJul 25, 2025

    Just curious, If no prompt guidance, the model will 95% output r18 and super xxx things.
    Does it mean those contents were not properly tagged during training?
    Or the dataset mainly contains xxx things? (this seems unlikely)

    Minthybasis
    Author
    Jul 25, 2025

    reakaakasky No, it likely comes from caption drops, which are an important part of augmentation during training. But this phenomenon is quite strange, because most pictures with the 'drop_possible' flag are sfw.
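
    The caption drops mentioned above can be sketched roughly as follows. This is a minimal illustration, not the author's training code: the sample layout, the `caption` field and the `drop_prob` value are assumptions, and only the 'drop_possible' flag name comes from the reply.

```python
import random


def maybe_drop_caption(sample, drop_prob, rng):
    """Return the caption to train on, or "" if the caption is dropped.

    sample    -- dict with a "caption" string and an optional "drop_possible" flag
                 (hypothetical layout, chosen for this sketch)
    drop_prob -- probability of dropping an eligible caption (assumed value)
    rng       -- random.Random instance, passed in so runs are reproducible
    """
    eligible = sample.get("drop_possible", False)
    if eligible and rng.random() < drop_prob:
        return ""  # the model sees this image with an empty prompt
    return sample["caption"]


rng = random.Random(0)
flagged = {"caption": "1girl, blue hair", "drop_possible": True}
unflagged = {"caption": "1girl, red hair", "drop_possible": False}

always_dropped = maybe_drop_caption(flagged, 1.0, rng)
never_dropped = maybe_drop_caption(flagged, 0.0, rng)
not_eligible = maybe_drop_caption(unflagged, 1.0, rng)
```

    Images whose captions are dropped are what the model learns to produce for an empty prompt, so the content of the eligible subset shapes what an empty prompt generates.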

    7456414Jul 25, 2025· 4 reactions
    CivitAI

    incredible model <3 great prompt adherence

    Sandi22Jul 29, 2025
    CivitAI

    Hi Minthybasis, hi artists, what do you think is the difference between the vpred and epsilon version?

    Minthybasis
    Author
    Jul 30, 2025· 1 reaction

    The main difference is the way it samples images: vpred gives more control over brightness/colors. However, some people find e-pred more creative, and it has better compatibility with popular loras.

    Also there can be some slight differences in styles and default traits.
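
    For reference, the two prediction targets are usually written as below. The notation ($x_0$ for the clean latent, $\epsilon$ for Gaussian noise, $\alpha_t$ and $\sigma_t$ for the schedule coefficients with $\alpha_t^2 + \sigma_t^2 = 1$) is the standard variance-preserving convention, assumed here rather than taken from the thread.

```latex
x_t = \alpha_t x_0 + \sigma_t \epsilon, \qquad
\text{e-pred target: } \epsilon, \qquad
\text{vpred target: } v_t = \alpha_t \epsilon - \sigma_t x_0
```

    If the schedule reaches $\alpha_T = 0$ at the last timestep, then $x_T = \epsilon$ and $v_T = -x_0$, so the vpred target still carries information about the image there while the e-pred target does not. This is one common explanation for the difference in brightness control, not a statement about how this particular model was trained.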

    clueless_engineerJul 30, 2025· 6 reactions
    CivitAI

    You've created something special with this model. Adding the stabilizer lora on top of it makes it spectacular.

    NTR_BLACKAug 1, 2025· 2 reactions
    CivitAI

    I'd like to know if this VPRED model supports ZSNR.

    I'm planning to use this model for LoRa character training.

    Does it use noise offset or ZSNR?

    Minthybasis
    Author
    Aug 1, 2025· 2 reactions

    Yes, it was trained with ztsnr, has full range and is capable of generating any brightness/saturation. You should enable the option in training.

    As for noise offset, it is a crutch that should be avoided with normal vpred models. Even with e-pred, pyramid noise is way better in most cases.

    krewgAug 7, 2025· 1 reaction

    What is this ZSNR and how does it affect image generation with VPRED NoobAI models? Could you please enlighten me? I have this weird noise/grainy look to my images that I wish to get rid of.

    NTR_BLACKAug 11, 2025· 1 reaction

    krewg Zero Terminal SNR, for better colour and lighting (both lighter and darker)

    try changing the sampler or lowering cfg
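
    A rough sketch of what "zero terminal SNR" means for the noise schedule is given below. It assumes the usual Stable Diffusion scaled-linear beta schedule (0.00085 to 0.012 over 1000 steps) and the commonly used recipe of shifting and rescaling sqrt(alpha_bar); the function names are hypothetical, and nothing here is confirmed as the exact procedure used for this model.

```python
import math


def scaled_linear_betas(n=1000, start=0.00085, end=0.012):
    # Assumed SD-style schedule: linear in sqrt(beta), then squared.
    lo, hi = math.sqrt(start), math.sqrt(end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]


def alphas_cumprod(betas):
    # alpha_bar_t = product of (1 - beta_s) for s <= t
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last step has alpha_bar = 0."""
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    # The last value becomes exactly 0, the first value keeps its magnitude.
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Convert alpha_bar back to per-step alphas, then to betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]


original = scaled_linear_betas()
rescaled = rescale_zero_terminal_snr(original)

terminal_before = alphas_cumprod(original)[-1]  # small but non-zero signal left
terminal_after = alphas_cumprod(rescaled)[-1]   # no signal left at the last step
```

    With the original schedule a little of the image survives at the last timestep, so the model never trains on pure noise. After rescaling, the last step is pure noise, which is what lets a model reach very dark or very bright images.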

    degurshaftAug 2, 2025· 10 reactions
    CivitAI

    I’ve already tried quite a few different Noob and IL mixes, but this is the first model that immediately impressed me so much with its stability and responsiveness. Your model makes me want to fully switch to it, but I’m struggling with the question of how best to train LoRAs for it. I have many LoRAs trained in OneTrainer using the base NoobAI vPred and Illustrious, but they all look rather mediocre on this model.

    Should I be using RouWei itself as the base model for training, and should the parameters be similar to training on NoobAI, considering it’s a vPred model? (Though I still don’t understand how vPred RouWei works in Comfy for me without v_prediction sampling 🤔)

    Most of the time I have to work with very small datasets (10–20) to create lesser-known characters in a style as close to the original as possible, so I use Prodigy to squeeze everything I can out of limited material. I’d be very grateful if you could advise me which of the attached configs (EPS IL or vPred Noob) would be better to rely on when training LoRAs for your model, or if my approach is completely wrong and I should reconsider it entirely.

    IL - https://files.catbox.moe/vv9btp.json 

    NAI - https://files.catbox.moe/kodhms.json

    In any case, thank you for such great work and good luck with future versions

    Minthybasis
    Author
    Aug 2, 2025· 3 reactions

    Thank you for the kind words.

    You should use rouwei as base model, parameters from noob-vpred should be fine. Some differences may come from a different noise scheduler, so you can try to play around enabling debiased noise estimation or mnsnr, or train with edm2. But all this is also relevant to any vpred model including noobai.

    The vpred version of rouwei contains a flag in the state dict, which is an unspoken standard and allows software to detect and use vpred sampling by default. If you manually set e-pred sampling mode with the vpred version, it will generate only noise blobs.

    Your configs look okay (not sure about loss_weight_strength, but it seems that it is used only for specific loss functions); basically the differences are only in the epsilon or velocity prediction types. I'm not a big expert in style lora training, but I haven't heard about any specific model-related nuances here.
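
    The state-dict flag described above can be checked without loading any weights, because a .safetensors file starts with an 8-byte little-endian length followed by a JSON header that lists every tensor name. The sketch below reads that header and looks for a marker key. The marker name `v_pred` is an assumption (the reply does not name the key), and the helper names are hypothetical.

```python
import json
import os
import struct
import tempfile

MARKER_KEY = "v_pred"  # assumed marker name, not confirmed by the thread


def read_safetensors_keys(path):
    """Return the tensor names stored in a .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    return {name for name in header if name != "__metadata__"}


def guess_prediction_type(keys, marker=MARKER_KEY):
    # A loader following this convention switches to vpred sampling
    # when the marker tensor is present.
    return "v_prediction" if marker in keys else "epsilon"


def write_dummy_checkpoint(path, names):
    """Write a tiny valid .safetensors file with one float32 value per name."""
    header, offset = {}, 0
    for name in names:
        header[name] = {"dtype": "F32", "shape": [1],
                        "data_offsets": [offset, offset + 4]}
        offset += 4
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)
        f.write(b"\x00" * offset)


def demo():
    # Build two throwaway files so the check can be tried without a real model.
    with tempfile.TemporaryDirectory() as tmp:
        vpred = os.path.join(tmp, "vpred.safetensors")
        eps = os.path.join(tmp, "eps.safetensors")
        write_dummy_checkpoint(vpred, ["model.weight", MARKER_KEY])
        write_dummy_checkpoint(eps, ["model.weight"])
        return (guess_prediction_type(read_safetensors_keys(vpred)),
                guess_prediction_type(read_safetensors_keys(eps)))
```

    Calling `read_safetensors_keys` on a real checkpoint shows whether such a marker is present, which helps when a UI produces noise blobs and it is unclear whether detection failed.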

    degurshaftAug 2, 2025· 1 reaction

    Minthybasis Thank you for the explanation! Regarding such a high loss_weight_strength, I was thinking that setting it to an elevated value might help the LoRA train more aggressively given the small dataset size, while Prodigy could soften the consequences if needed, since the risk of overfitting with it is lower than with other optimizers. Still, I can’t shake the uneasy feeling that everyone else is using Adam or something similar and speaks rather negatively about Prodigy… it kind of makes me feel like an idiot because of that

    Minthybasis
    Author
    Aug 2, 2025

    degurshaft Well, there are a lot of different optimizers and at least a dozen are actively used. Each one has its own application: prodigy and some schedule-free ones can give a good result with small-dataset peft, adamw works great for large-scale applications, ademamix can be the optimal choice for post-training, etc.

    You can try different, compare them and select the one with best results for specific case. There is nothing stupid about it.

    degurshaftAug 2, 2025

    Minthybasis Yep, I really chose Prodigy because of its advantages with small datasets. But to my disappointment, after running comparisons with identical settings and taking into account all the models specifics you mentioned in the description, for some reason I can’t achieve visual quality comparable to generations on the WAI with a lora trained on the base IL.

    If possible, could I take a little of your time and contact you on Discord to send some examples with metadata? I’d really appreciate it if you could maybe point out what I’m doing wrong

    Minthybasis
    Author
    Aug 3, 2025

    degurshaft I don't mind, but I'm not a big expert in lora training. Likely it's better to ask people who make it regularly.

    To achieve pretty look, you can add extra tweakers, enhancers, may be even some other styles that makes image closer to desired with low weights. Or try to generate on merges.

    reakaakaskyAug 4, 2025

    >> I really chose Prodigy because of its advantages with small datasets.

    doubt, personal exp: prodigy always blows up my training, so I stuck with adamw + lr 0.00005, rank/alpha =1, bs=8 ~1000 steps, when training a tiny dataset ~10 imgs, and let it slowly cook.

    degurshaftAug 4, 2025

    reakaakasky That’s exactly the kind of opinion about Prodigy I come across most often. Maybe you could share your config so I can try to compare? In my experience, no matter how many times I try switching back to AdamW, it still doesn’t capture the original style as well as Prodigy. I usually stick to 1000 steps too, though I’m not sure about your lr, the value you mentioned seems more like the encoder lr

    reakaakaskyAug 4, 2025

    degurshaft i don't have prodigy settings anymore.
    iirc, prodigy can learn things very fast and very well, but usually has stability issue, e.g. when stack it with other lora, there will be color blobs. That's why i switched back to adamw. I'm not 100% sure that was because of prodigy.
    lr 0.00005, yeah, that is unet lr, very low, let it cook. I don't train TE.

    degurshaftAug 4, 2025

    reakaakasky I was actually asking specifically about AdamW. Im actively testing it right now, and it would be interesting to see other peoples approach to training with this optimizer. I havent noticed any artifacts when mixing a LoRA trained on Prodigy with another one.

    And again about that lr, even if that value (0.00005) is for unet, it still seems oddly low to me. Shouldnt it either match the global lr or be just slightly lower? For AdamW I use 0.0005, and unet is set to the same. I also dont train the t encoder, but Im thinking of trying it cuz I saw somewhere that activation tokens start working better

    reakaakaskyAug 4, 2025

    degurshaft oh... I misunderstood.

    alpha=rank=8

    lr=0.00005

    batch size=8

    steps ~1000

    min snr gamma=1

    no noise offset etc. basically everything is default.

    ____

    don't know what is the "global lr", I guess just a default value if no unet/te lr was specified.
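
    The settings listed above can be written down as a small sketch, together with the arithmetic they imply. The dictionary keys are hypothetical and not tied to any particular trainer. The min-SNR weighting follows its usual definition (min(SNR, gamma) / SNR for epsilon prediction, divided by SNR + 1 instead for v-prediction), which is assumed here because the thread does not spell it out.

```python
# Values taken from the comment above; key names are illustrative only.
settings = {
    "rank": 8,
    "alpha": 8,
    "unet_lr": 5e-5,
    "batch_size": 8,
    "steps": 1000,
    "min_snr_gamma": 1.0,
    "train_text_encoder": False,
}


def lora_scale(rank, alpha):
    # LoRA updates are multiplied by alpha / rank; equal values give 1.0.
    return alpha / rank


def passes_per_image(steps, batch_size, n_images):
    # How many times each image is seen over the whole run.
    return steps * batch_size / n_images


def min_snr_weight(snr, gamma, v_prediction=False):
    """Per-timestep loss weight under min-SNR-gamma (assumed standard form)."""
    clipped = min(snr, gamma)
    if v_prediction:
        return clipped / (snr + 1.0)
    return clipped / snr  # requires snr > 0


scale = lora_scale(settings["rank"], settings["alpha"])
passes = passes_per_image(settings["steps"], settings["batch_size"], 10)
```

    With about 10 images, 1000 steps at batch size 8 means each image is seen roughly 800 times, which puts the very low learning rate in context. With gamma=1, every timestep whose SNR is above 1 is down-weighted.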

    degurshaftAug 4, 2025

    reakaakasky Is such a low lr related to using a large batch size? I ve just never really tried a batch larger than 2, since I felt that increasing it eats away at the details, especially with Prodigy (considering it requires lr=1 for all networks)

    reakaakaskyAug 4, 2025

    degurshaft don't know.

    I usually don't care about details when training on a small dataset. as Minthy mentioned

    To achieve pretty look, you can add extra tweakers, enhancers, may be even some other styles that makes image closer to desired with low weights. Or try to generate on merges.

    I use other LoRA to add similar details.

    necrophagism777Aug 4, 2025· 6 reactions
    CivitAI

    The prompt adherence and responsiveness is really incredible for a SDXL model

    Sundowner4547Aug 4, 2025
    CivitAI

    Any workflow for Comfy please?

    randomsnuwAug 23, 2025

    default workflow works well with it. latest comfyui recognizes eps/vpred on load without any special nodes

    rancidy164Aug 9, 2025· 5 reactions
    CivitAI

    All I get is weird colored blobs. I'm using Euler A and same prompt as the sample image. It's weird because all the other illustrious based models I've tried work just fine. I'm using Invoke.

    LOL2024Aug 10, 2025

    Have you checked your negative prompts? This model usually outputs bad results if the negative prompt is too long. Also try removing embeddings; some embeddings like lazyneg work badly in Rouwei

    rancidy164Aug 10, 2025

    LOL2024 Thanks for the reply! I only have "bad quality, worst quality, watermark, artist_logo," as negative prompts and I turned off all embeddings and loras as well, trying to do bare minimum workflow. I also tried many other schedulers aside from Euler A and I don't think it could be the VAE as I already see the image forming weirdly in the preview.

    LOL2024Aug 10, 2025

    rancidy164 Have you tested both EPS and V-pred, and do both give similarly corrupted results?

    rancidy164Aug 11, 2025

    LOL2024 Epsilon actually works!

    Minthybasis
    Author
    Aug 11, 2025

    You should update your software because it can't detect that the model is using vpred sampling. If you're using a1111 - switch to dev branch, it works there.

    exomoyoAug 16, 2025

    I had horrible results at first, but after changing the scheduler from karras to simple it now works as intended

    nip_ottoAug 12, 2025· 19 reactions
    CivitAI

    https://drive.google.com/file/d/1hdfc6mJF4MEuFyKjxyDlClnC8QYwAHKY/view?usp=sharing

    crappily replaced artist and character tags in danbooru.csv with ones in training data, sacrificing entries count and alias

    PLGFAug 26, 2025· 10 reactions
    CivitAI

    This checkpoint ignores 90% of my prompt for some reason. It gets the character, but keeps making them nude even though I describe their clothes. Also, it doesn't seem to know what 'foot focus' means.. I can tell it would be really good if it would just adhere to my prompt.

    BKM_UAAug 26, 2025

    Maybe you didn't add "by" when using the artist tag? For example, you wrote "wlop" instead of "by wlop"

    Minthybasis
    Author
    Aug 26, 2025· 4 reactions

    Can you upload some examples where you get bad results with metadata? The model has some biases and places where you can slip, which are described, but in general it should give the opposite experience to what you're getting.

    mahououAug 28, 2025· 4 reactions
    CivitAI

    can you add newest, recent, early, old etc time related tags to get different styles from certain years better?

    Minthybasis
    Author
    Aug 30, 2025· 1 reaction

    You mean time tags in general (like for 1990s, 2010s, etc.) or for each artist style?

    mahououOct 7, 2025

    @Minthybasis yes getting certain artist styles is hard if their artstyles change

    Minthybasis
    Author
    Oct 8, 2025· 1 reaction

    @mahouou That's a complex thing, because lazily introducing tags based on a timeline won't solve the problem. In v0.8 I tried to split some styles by using a combination of texture harmonics and embeddings to cluster them and split them into buckets. But I wasn't able to set up a system that would do this fully automatically for any style, without the need for manual tweaks and supervision.

    I'll try to introduce more for next version of dataset. Btw if you want to have this for some exact artists - just list them, they will be prioritized.

    BaxterSep 3, 2025· 10 reactions
    CivitAI

    This model is really great, my favorite Illustrious finetune! Have you had the chance to look at the progress in regards to the new Lumina model? It seems to vastly improve over Illustrious base models and I was wondering if you ever thought about tinkering or finetuning the base Lumina model or the NetaYume model? The results look vastly more natural and less "AI" so to speak. If you have the time, I recommend taking a look :)

    Minthybasis
    Author
    Sep 13, 2025· 8 reactions

    Thank you! Sorry for the late reply, got messed up by civit notifications.

    Yes, the Lumina looks very promising, one can say that this is what was expected from the sd3/3.5. Considering that in any case it is necessary to move in that direction, eventually I'll release a finetune of dit or some hybrid model. But no promises of timeframes, too much uncertainty.

    yilingshuSep 25, 2025
    CivitAI

    Which vae is used for this ckpt? I can't find vae from downloading.

    https://civitai.com/images/102352031

    This is my test generated result, the quality of details is not good.

    Minthybasis
    Author
    Sep 27, 2025

    The model is supposed to work with standard sdxl vae (fp16 fix, baked in, not eq), can also be used with its forks with boosted contrast/saturation.

    To improve the quality of generated pictures it is better to give more detailed prompt, specifying exactly what you want to get, and then use styles (one or several together).

    QHvI7vwtWDSep 29, 2025
    CivitAI

    I don't like the look of euler ancestral, what other options do I have? I used dpm 3m sde on other models but sadly it doesn't work well here

    Minthybasis
    Author
    Oct 8, 2025

    The epsilon version has some issues with unconventional samplers (actually more with schedulers). As for vpred, most should work fine. Cfg++ samplers work great and give nice results, so maybe you should try them.

    q6zfd1bpa319Oct 19, 2025· 9 reactions
    CivitAI

    Do you have any plans for v0.9? I already like v0.8 a lot, just wondering since the time between v0.7 and v0.8 was like 5 months, and it's been about 5 months since v0.8.

    Minthybasis
    Author
    Oct 25, 2025· 30 reactions

    Hi, yes I do have plans for the future. Currently I'm testing a modification of the text encoder, which is the weakest part of sdxl, and conversion of the whole latent space to 16 channels (to use the flux vae). Test variants for the text encoder are published here; an alpha version of the 16ch pretrain will likely be released next week.

    Once things are clear and the new dataset is ready, I'll run a large training for a major version with all the new features, which should be something really new.

    Of course if everything goes well.

    nvo76Nov 27, 2025
    CivitAI

    So this is like an advanced model for newbies? I just don't get the point

    bl4ckfuture107Dec 10, 2025
    CivitAI

    Hey Minthy, both v0.8 versions have issues with long_bangs: it's quite hit or miss compared to v0.7

    BIG_AMar 7, 2026· 3 reactions
    CivitAI

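    Summary (not a mindless translation; written so that simple-minded people can follow it. If you still can't understand it, image generation isn't for someone of your IQ, so go home and play with yourself):

    What is this model?

    It is an AI model (based on the SDXL architecture) made specifically for drawing **anime-style** pictures. Think of it as a "genius artist" that went through intensive training on a huge number of images (13 million).

    ### What makes it great? (Core strengths)

    1. Obedient (high prompt adherence): It draws what you tell it to draw and rarely "improvises" strange things. It is good at understanding both **tags** (e.g. 1girl, blue hair) and **natural language** (e.g. "a girl with blue hair").

    2. Knowledgeable (large knowledge base): It knows over 50,000 artist styles, countless anime characters, and many specific concepts (such as all sorts of tail/ear play).

    3. Draws well (excellent aesthetics): Vivid colors, smooth transitions, rarely overexposed or crushed to black, and because the training data is clean, the output rarely has annoying watermarks.

    4. Clear-headed (tag bleeding solved): Earlier models tend to mix up character features (e.g. drawing character A with traits of character B); this model largely solves that.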

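    ### The "golden rules" of using it (how to get good results)

    This is the most important part; otherwise the model may not show what it can do.

    1. Artist styles need "special treatment"

    * Rule: If you want to imitate an artist's style (e.g. by wlop), you must add a "BREAK" after these style tags (if you use software like A1111), or put them at **the very end of the prompt**.

    * Reason: This tells the AI clearly that "these are style references", so it does not confuse them with the character description in the image (e.g. "1girl"). This is its biggest difference from other models.

    * Wrong: 1girl, by wlop, by kantoku, ... (mixed together like this, results get worse)

    * Right: by wlop, by kantoku BREAK 1girl, ... or 1girl, ... by wlop, by kantoku

    2. How to refer to artists and characters

    * Artists: You must use the format "by artist name"; this is mandatory. For example, by kantoku.

    * Characters: Preferably use the standard Booru tag format. If the character name has parentheses, escape them with backslashes: karin_(blue_archive) is written as karin \(blue archive\). You can also add the outfit version tag for more precision, e.g. karin \(bunny\) \(blue archive\).

    3. A short negative prompt is enough

    * No need to write a big pile of random words. In general, the negative prompt (what you don't want to see) only needs worst quality, low quality, watermark.

    4. Make good use of the "brightness/color" tags

    * If you want to control the brightness or colors of the image, you can use special meta tags such as low brightness, high saturation, hdr (high dynamic range), and so on. These work well in both versions.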
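
    The tag formatting rules above (the "by" prefix for artists, spaces instead of underscores, backslash-escaped parentheses) can be sketched as two small helpers. The function names are hypothetical, and the blanket underscore replacement is a simplification: tags that legitimately contain underscores, such as emoticon tags, would need an exception list.

```python
def booru_to_prompt(tag):
    """Convert a Booru-style tag into the prompt form described above."""
    # Underscores become spaces, parentheses get a backslash so the UI
    # does not read them as attention-weight syntax.
    return tag.replace("_", " ").replace("(", r"\(").replace(")", r"\)")


def artist_tag(name):
    """Prefix an artist name with the mandatory "by "."""
    return "by " + name


character = booru_to_prompt("karin_(blue_archive)")
outfit = booru_to_prompt("karin_(bunny)_(blue_archive)")
artist = artist_tag("kantoku")
```

    Here `character` is `karin \(blue archive\)` and `outfit` is `karin \(bunny\) \(blue archive\)`, matching the examples in the rules.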

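    ### Two different versions: Epsilon and Vpred

    The model comes in two minor versions with slightly different "personalities":

    * Epsilon version (mainstream):

    Recommended guidance scale (CFG Scale): around **7**.

    Traits: stable results, but **relies heavily on brightness tags**. If you don't lower the brightness manually (low brightness), it may struggle to draw true pure black.

    * Vpred version (newer / high potential):

    Recommended guidance scale (CFG Scale): **3~5**, don't go too high.

    * Traits: fuller colors, less likely to burn (overexpose), wider brightness range (pure black and pure white come easily). But reportedly in rare cases the contrast may be slightly off and needs some tricks to fine-tune.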

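    ### Quick summary

    * This is a top-tier SDXL anime model that is especially good at understanding complex prompts and a huge number of artist styles.

    * Core trick: separate artist style tags with "BREAK", or put them at the very end of the prompt.

    * The negative prompt doesn't need to be complicated; a few simple words will do.

    * Which version to choose? Use Epsilon for convenience (remember to adjust brightness); try Vpred for richer colors and stronger black/white contrast (remember to lower CFG).

    * Remember: it is far smaller than newer models like Flux. It tries hard, but it isn't all-powerful, and sometimes you need several rolls to get a satisfying result.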

    GPUPoorChadMar 12, 2026· 1 reaction
    CivitAI

    Mind making something like this for SD 1.5? Only reason I care is that you can do SD 1.5 on phones and really fast too with apps like Local Dream that use Qualcomm NPU like honestly feels better than like my GTX 1080 when SD 1.5 was the only thing bafflingly, so a model updated and more fitted for memory constrained phone would be amazing. SD 1.5 is pretty dumb, and not sure where you would improve however so not sure if it's worth it just throwing it out there. Could be a better more modern base with similar memory foot print, but likely require full retraining pretty much if it's generalized. I'd really like to have okayish model on my phone I can generate away with in bed for fun, would make my day :)

    bl4ckfuture107Apr 7, 2026

    Doable? Yes, but SD 1.5 has fewer parameters, so you will never get the same knowledge as the SDXL based one.

    GPUPoorChadApr 8, 2026· 1 reaction

    @bl4ckfuture107 yes, but there is definitely still devices that can't run full fat SDXL mostly phones

    bl4ckfuture107Apr 8, 2026

    @GPUPoorChad I agree with you, really. I would love to see an SD 1.5 implementation to run on my AMD NPU

    CannedPsychoMar 17, 2026
    CivitAI

    Do you need to have regional prompter turned on in order for v0.8 to function properly?

    bl4ckfuture107Mar 19, 2026

    No, you use BREAK without modifications to split the prompt in ~75 token chunks.
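
    As a rough illustration of the BREAK usage described above: in A1111-style UIs the uppercase keyword BREAK ends the current CLIP chunk (the rest is filled with padding) and starts a new one, so the text on each side is encoded separately. The helpers below are hypothetical and only handle strings; they do not tokenize or count tokens.

```python
import re


def with_style_chunk(style_tags, content_tags):
    """Put artist/style tags in their own chunk, followed by the content tags."""
    return ", ".join(style_tags) + " BREAK " + ", ".join(content_tags)


def split_on_break(prompt):
    """Show which parts of a prompt would land in separate chunks."""
    parts = re.split(r"\bBREAK\b", prompt)
    return [p.strip(" ,") for p in parts if p.strip(" ,")]


prompt = with_style_chunk(["by wlop", "by kantoku"], ["1girl", "blue hair"])
chunks = split_on_break(prompt)
```

    Here `prompt` is `by wlop, by kantoku BREAK 1girl, blue hair`, and `chunks` holds the style part and the content part as two separate strings. No regional prompting extension is involved.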

    Neon_signsApr 12, 2026
    CivitAI

    Any plans for anima?