    Rouwei-Gemma - v0.1_t5gemma2b_33k7
    NSFW

    Trained adapter to use an LLM as the text encoder for Rouwei 0.8 (and other SDXL models).

    Update v0.2:

    New version based on the t5gemma-2b text encoder model with improved performance.

    To run it you need the t5gemma-2b encoder model (ungated mirror; downloading instructions below).

    You need an updated set of custom nodes to make it work.

    Detailed launch instructions and prompting tips are below.

    What is it:

    A drop-in replacement for the CLIP text encoders in SDXL models that achieves better prompt adherence and understanding.

    Similar to ELLA, SDXL-T5, and likely others, but this one is focused on anime models and advanced knowledge without censorship.

    Key features:

    • State-of-the-art prompt adherence and natural-language (NL) prompt understanding among SDXL anime models

    • Supports both long and short prompts, with no 75-token limit per chunk

    • Preserves original knowledge of styles and characters while allowing amazing flexibility in prompting

    • Supports structured prompts that describe individual features for characters, parts, elements, etc.

    • Maintains full compatibility with booru tags (alone or combined with NL), allowing easy and convenient prompting

    How to run the latest version:

    1. Install/update custom nodes for Comfy

    • Option a: Go to ComfyUI/custom_nodes and type git clone https://github.com/NeuroSenko/ComfyUI_LLM_SDXL_Adapter

    • Option b: Open the example workflow, go to ComfyUI Manager, and press the Install Missing Custom Nodes button.

    2. Make sure you have the latest Transformers: activate the ComfyUI venv and type pip install transformers -U

    3. Download the adapter and put it into /models/llm_adapters

    4. Download T5Gemma

    • Option a: After activating the ComfyUI venv, type hf download Minthy/RouWei-Gemma --include "t5gemma-2b-2b-ul2_*" --local-dir "./models/LLM" (correct the path if needed; a Python alternative is sketched after these steps).

    • Option b: Download the safetensors file and put it into ComfyUI/models/text_encoders (to be implemented in the next nodes update)

    5. Download a Rouwei checkpoint (vpred, epsilon, or base) if you don't have one yet

    6. Use any image from the showcase as a reference workflow; feel free to experiment
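
    For step 4, option a, the same files can also be fetched from Python with huggingface_hub instead of the hf CLI. This is a minimal sketch assuming a standard ComfyUI layout; adjust local_dir if your models folder lives elsewhere:

    # Sketch: download the t5gemma encoder files used by RouWei-Gemma,
    # equivalent to the `hf download` command in step 4, option a.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="Minthy/RouWei-Gemma",
        allow_patterns=["t5gemma-2b-2b-ul2_*"],  # only the t5gemma encoder files
        local_dir="./models/LLM",                # correct the path if needed
    )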

    Instructions for previous versions based on the gemma-3-1b LLM can be found in this HF repo.

    Current performance:

    In terms of prompt understanding, this version stands above the CLIP text encoders of various models. It lets you specify more details and individual parts for each character/object that work more or less consistently rather than by pure randomness, make a simple comic (stability varies), and define positions and more complex compositions.

    However, it is still at an early stage: there can be difficulties with rare things (especially artist styles) and some biases. It also works with a quite old and small UNet that needs proper training (and possibly modification), so don't expect it to perform like top-tier open-source image generation models such as Flux and QwenImage.

    Usage and Prompting with examples:

    The model is quite versatile and can accept various formats, including multilingual inputs or even base64.

    But it is better to stick to one of several prompting styles:

    (Examples in showcase or in HF repo readme)

    Natural language

    kikyou (blue archive) a cat girl with black hair and two cat tails in side-tie bikini swimsuit is standing on all fours balancing on top of swim ring. She is scared with tail raised and afraid of water around.

    Just plain text. It is better to avoid very short and very long prompts.

    Booru tags

    Regular booru tags.

    Until emphasis support is added to the nodes, avoid adding \ before brackets. Also, unlike with CLIP, misspellings will likely lead to wrong results.

    Combination of tags and NL:

    masterpiece, best quality, by muk (monsieur).
    1girl, kokona (blue archive), grey hair, animal ears, brown eyes, smile, wariza,
    holding a yellow ball that resembles crying emoji

    The easiest and most convenient approach for most cases.

    Structured prompting:

    bold line, masterpiece, classroom.
    ## Asuka:
    ouryuu Asuka Langley in school uniform with tired expression sitting at a school desk, head tilt.
    ## Zero two:
    Zero two (darling in the franxx) in red bodysuit is standing behind and making her a shoulder massage.

    It can understand Markdown # for separating sections, JSON, XML, or simple separation with new lines and colons. Prompt structuring improves results when prompting several characters with individual features. Depending on the specific case, it can work very stably, work above random level in most cases, or require some rerolls, but it allows you to achieve things that are impossible otherwise due to biases or complexity.
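
    For illustration, a JSON-style version of the structured example above might look like the following (the field names are arbitrary, not a fixed schema the model requires):

    {
      "scene": "bold line, masterpiece, classroom",
      "Asuka": "ouryuu Asuka Langley in school uniform with tired expression sitting at a school desk, head tilt",
      "Zero two": "Zero two (darling in the franxx) in red bodysuit is standing behind and making her a shoulder massage"
    }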

    All together:

    Any combination of the above. Recommended for the most complex cases.

    Quality tags:

    masterpiece or best quality for positive.

    worst quality or low quality for negative.

    It is better to avoid spamming quality tags because it can cause unwanted biases.

    The current state of the custom nodes does not support prompt weights or standard spells. Also, (brackets) should be left as is; there is no need to add \.

    Other settings and recommendations are the same as for the original RouWei.

    Knowledge and Train Dataset:

    The training dataset utilises about 2.7M pictures from this dataset and a few other sources. Still quite a small number.

    Training and code

    Forward code example, and an example of obtaining hidden states from t5gemma (a rough sketch of the latter is included below).

    sd-scripts fork for LoRA training.

    (More training code/trainer fork coming soon)
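
    For reference, here is a minimal sketch of what obtaining hidden states from t5gemma can look like with Transformers. The model id, dtype, and encoder-only forward pass are assumptions for illustration; the linked examples above show the actual code used with the adapter.

    # Sketch: run the t5gemma encoder and take the last hidden states
    # that an adapter like this one consumes as conditioning.
    import torch
    from transformers import AutoTokenizer, AutoModel

    model_id = "google/t5gemma-2b-2b-ul2"  # or a local / ungated mirror path
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

    prompt = "1girl, kokona (blue archive), grey hair, animal ears, smile"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        encoder_out = model.get_encoder()(**inputs)  # encoder-only forward

    hidden_states = encoder_out.last_hidden_state  # (1, seq_len, hidden_dim)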

    Compatibility:

    Designed to work with Rouwei; works with most Illustrious-based checkpoints, including NoobAI and popular merges. UNet parts of LoRAs work; TE parts need to be retrained.

    Near future plans:

    • Custom nodes improvements, including emphasis support

    There will be another version trained on a larger dataset to estimate capacity and to decide between joint training with the encoder or leaving it untouched.

    If no flaws are found, it will then be used as the text encoder for the large training run of the next version of the Rouwei checkpoint.

    I'm willing to help/cooperate:

    Join the Discord server where you can share your thoughts, make proposals, requests, etc. Write to me directly here, or DM me on Discord.

    Thanks:

    Part of the training was performed using Google TPUs and sponsored by OpenRoot-Compute.

    Personal: NeuroSenko (code), Rimuru (idea, discussions), Lord (testing), DraconicDragon (fixes, testing), Remix (nodes code)

    Also many thanks to those who supported me before:

    A number of anonymous persons, Bakariso, dga, Fi., ello, K., LOL2024, NeuroSenko, OpenRoot-Compute, rred, Soviet Cat, Sv1., T., TekeshiX

    Donations:

    BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

    ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

    XMR: 47F7JAyKP8tMBtzwxpoZsUVB8wzg2VrbtDKBice9FAS1FikbHEXXPof4PAb42CQ5ch8p8Hs4RvJuzPHDtaVSdQzD6ZbA5TZ

    License

    MIT license for adapter models.

    This tool uses original or finetuned versions of the google/t5gemma-2b-2b-ul2 and google/gemma-3-1b-it models.

    Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms.

    Description

    Update, t5gemma

    Checkpoint
    Other

    Details

    Downloads: 86
    Platform: CivitAI
    Platform Status: Available
    Created: 8/20/2025
    Updated: 11/15/2025
    Deleted: -

    Files

    rouweiGemma_v01T5gemma2b33k7.safetensors