CivArchive
    Rouwei-16channel - v0.1_alpha
    NSFW

    Experimental conversion of the SDXL architecture to a 16-channel latent space

    This is an experimental pretrain on top of Rouwei-0.8 that works in a 16-channel latent space and uses the Flux autoencoder (ae).

    Goals:

    • Achieve finer detail while keeping compute requirements low and preserving all existing knowledge and performance

    • Make joint sampling possible with Flux, Chroma, Lumina and other models that share the same latent space

    Current state:

    Early alpha version; it is still pretty raw. Images may contain extra noise and artefacts in small details, ranging from negligible to significant. The upscaling method, samplers/schedulers, styles and even the prompt all affect this.

    Using GAN upscale models in pixel space instead of latent upscaling gives much smoother results; raising the base resolution also helps.

    The model currently uses epsilon prediction; it can be converted to v-prediction or another objective in the future.
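    For background on what such a conversion changes: with SDXL's variance-preserving noise schedule, the noisy latent is x_t = alpha*x0 + sigma*eps with alpha^2 + sigma^2 = 1, and a v-prediction model predicts v = alpha*eps - sigma*x0 instead of eps. The sketch below checks that eps is recoverable from v; it illustrates the two targets only and is not the author's conversion procedure. Scalar values stand in for latent tensors.

```python
import math


def v_target(x0, eps, alpha, sigma):
    """v-prediction target: v = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0


def eps_from_v(v, x_t, alpha, sigma):
    """Recover the epsilon target from v, assuming alpha**2 + sigma**2 == 1."""
    return alpha * v + sigma * x_t


# One timestep of a variance-preserving schedule (scalar stand-ins for tensors).
alpha, sigma = math.cos(0.3), math.sin(0.3)
x0, eps = 0.8, -1.2
x_t = alpha * x0 + sigma * eps          # forward noising process
v = v_target(x0, eps, alpha, sigma)
recovered_eps = eps_from_v(v, x_t, alpha, sigma)
```

    Both targets carry the same information at every timestep, which is why an epsilon model can later be fine-tuned toward v-prediction rather than retrained from scratch.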

    Usage:

    Comfyui

    Workflow example (or just pick any image from the showcase)

    1. Download the checkpoint (FP32 and UNet-only versions can be found in the HF repo)

    2. Download these nodes (or use "Install Missing Nodes" in ComfyUI Manager)

    3. Use the SDXL 16ch loader node to load it, then work just as you normally would with SDXL

    4. DO NOT REMOVE the Latent Multiply nodes: latents must be scaled before and after processing, just as in regular SDXL inference. This step simply isn't hidden yet.
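    A minimal sketch of what the scaling in step 4 amounts to, assuming the first multiply applies the release's scale factor of 0.6 before sampling and the second applies its inverse before VAE decoding. The helper names are hypothetical, not ComfyUI APIs, and the exact multiplier values in the author's workflow may differ (for example if the loader already applies a factor), so check them against the example workflow.

```python
# Hypothetical helpers sketching the two Latent Multiply steps (not a ComfyUI API).
SCALE_FACTOR = 0.6  # value used for this release; the SDXL default is 0.13025


def multiply_latent(latent, factor):
    """Element-wise multiply of a latent stored as nested lists [C][H][W]."""
    return [[[value * factor for value in row] for row in channel] for channel in latent]


def to_model_space(vae_latent, scale=SCALE_FACTOR):
    """Before sampling: raw VAE latent -> the scaled latent the UNet was trained on."""
    return multiply_latent(vae_latent, scale)


def to_vae_space(model_latent, scale=SCALE_FACTOR):
    """After sampling: undo the scaling so the VAE decoder sees raw latents."""
    return multiply_latent(model_latent, 1.0 / scale)


# Tiny 16-channel, 1x2 latent used only to show the round trip.
raw = [[[0.5 * c, -0.25 * c]] for c in range(16)]
restored = to_vae_space(to_model_space(raw))
```

    Skipping either multiply leaves the sampler or the decoder working on latents whose magnitude is off by a factor of 0.6, which is one plausible source of washed-out or noisy results.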

    If you get the error "mat1 and mat2 shapes cannot be multiplied (_x16 and 4x3)", disable the preview option for KSampler. It happens because the preview uses the TAESD VAE, which is designed for 4-channel latents.
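    The numbers in that error can be reproduced with a plain shape check, assuming the preview flattens the latent to (pixels x channels) and multiplies it by a channel-to-RGB matrix built for 4 channels. The function names below are illustrative only; the error text mirrors PyTorch's matmul message.

```python
def matmul_shape(a_shape, b_shape):
    """Result shape of a 2-D matmul, raising the same style of error as PyTorch."""
    (n, k_a), (k_b, m) = a_shape, b_shape
    if k_a != k_b:
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied ({n}x{k_a} and {k_b}x{m})"
        )
    return (n, m)


def preview_projection_shape(width, height, latent_channels, preview_channels=4):
    """Project every latent pixel to RGB with a (preview_channels x 3) matrix."""
    pixels = (height // 8) * (width // 8)  # the VAE downsamples each side by 8
    return matmul_shape((pixels, latent_channels), (preview_channels, 3))


print(preview_projection_shape(832, 1216, 4))  # (15808, 3): a 4-channel latent fits
```

    With 16 latent channels the same call raises "mat1 and mat2 shapes cannot be multiplied (15808x16 and 4x3)": an 832x1216 image gives a 104x152 latent, i.e. 15808 rows, which matches the error reported in the comments below.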

    Other UI

    Since the only differences are the tensor shapes, the VAE used and the latent scaling factor, it should be easy to add support to any other UI.

    LoRA adapters, ControlNet, IP-Adapters and other extensions are untested.
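    To make those differences concrete, here is a small descriptor a UI could switch on. The class and field names are hypothetical; the channel counts and scale factors come from this page, and both VAEs reduce each spatial side by 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentFormat:
    """Hypothetical per-model latent description a UI could branch on."""
    name: str
    vae: str
    channels: int
    scale_factor: float
    downsample: int = 8  # both VAEs reduce each spatial side by 8

    def latent_shape(self, width, height, batch=1):
        """(batch, channels, height, width) of the latent for an image size."""
        return (batch, self.channels, height // self.downsample, width // self.downsample)


SDXL = LatentFormat("SDXL", "SDXL VAE", channels=4, scale_factor=0.13025)
ROUWEI_16CH = LatentFormat("Rouwei-16channel", "Flux ae", channels=16, scale_factor=0.6)

for fmt in (SDXL, ROUWEI_16CH):
    print(fmt.name, fmt.vae, fmt.latent_shape(1024, 1024), fmt.scale_factor)
```

    Everything downstream (empty-latent creation, VAE choice, scaling, previews) can read from one such object instead of hard-coding 4 channels and 0.13025.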

    Joint sampling:

    Since the model operates in a 16-channel latent space similar to Flux, Chroma, Lumina-Image and some others, you can build complex workflows (if you have enough memory). This lets you use all of RouWei's knowledge of characters, styles and concepts together with the performance of bigger models.

    Here is an example workflow. Using just a few (1 to 4) Flux steps, you create a rough basic composition. The latents then go to the 16-channel SDXL model, where they are denoised (skipping the initial high-noise timesteps).

    This is the simplest approach: since you don't need to re-convert latents through a series of VAEs or adapters, you can switch models at every denoising step without any performance impact.

    Just don't forget to apply the Latent Multiply nodes at the transitions.
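    A sketch of how such a hand-off could be planned, assuming each model runs in its own advanced sampler. The keys start_at_step, end_at_step, add_noise and return_with_leftover_noise mirror the inputs of ComfyUI's KSamplerAdvanced node; the plan_stages helper and the rescale_before flag are hypothetical, and the split shown (3 Flux steps, 17 RouWei steps) is only an example. Flux and SDXL use different noise schedules, so treating step indices as shared is a simplification; the author's workflow decides how timesteps actually line up.

```python
def plan_stages(segments):
    """Split one denoising run across models that share a latent space.

    segments: list of (model_name, n_steps) in denoising order.
    Returns one settings dict per sampler stage.
    """
    total = sum(n for _, n in segments)
    stages, step = [], 0
    for i, (model, n) in enumerate(segments):
        if n <= 0:
            raise ValueError("each segment needs at least one step")
        stages.append({
            "model": model,
            "start_at_step": step,
            "end_at_step": step + n,
            "add_noise": i == 0,                             # only the first pass adds noise
            "return_with_leftover_noise": step + n < total,  # hand a still-noisy latent onward
            # Hypothetical flag: a Latent Multiply belongs here whenever the model changes.
            "rescale_before": i > 0 and model != segments[i - 1][0],
        })
        step += n
    return stages


for stage in plan_stages([("flux", 3), ("rouwei-16ch", 17)]):
    print(stage)
```

    Because both models read the same 16-channel latent, alternating more often is just a longer segment list; only the scaling at each transition has to be handled.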

    How it's made

    Basically, there are no changes to the default architecture: the input and output layers are re-initialized to the new size, and training proceeds with gradual unfreezing of blocks towards the middle.

    The default SDXL latent scale factor of 0.13025 doesn't work well here; 0.6 is used for this release.

    This is not the most optimal approach. Modifying the outer layers of the model, instead of using them as-is, should bring improvements in the future. If you have any thoughts or ideas about it, please share them.
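    One way to express "gradual unfreezing towards the middle" as a schedule, sketched under assumptions: block names follow diffusers' UNet2DConditionModel layout for SDXL (3 down blocks, a mid block, 3 up blocks), and down/up blocks at the same resolution are unfrozen together. The stage granularity, and how other parameters such as the time embedding are handled, are not stated on this page.

```python
# Hypothetical schedule; names follow diffusers' SDXL UNet layout.
LEVELS = [
    ["conv_in", "conv_out"],          # re-initialized for 16 channels, trained first
    ["down_blocks.0", "up_blocks.2"],  # highest resolution
    ["down_blocks.1", "up_blocks.1"],
    ["down_blocks.2", "up_blocks.0"],  # lowest resolution
    ["mid_block"],                     # the middle, unfrozen last
]


def trainable_blocks(stage):
    """Blocks unfrozen at a given stage: outermost level first, moving inward."""
    if stage < 0:
        raise ValueError("stage must be non-negative")
    unfrozen = []
    for level in LEVELS[: stage + 1]:
        unfrozen.extend(level)
    return unfrozen


for stage in range(len(LEVELS)):
    print(stage, trainable_blocks(stage))
```

    Starting from the freshly initialized outer layers lets them adapt to the new latent space while the pretrained middle of the network keeps its knowledge until later stages.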
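    The page does not say how 0.6 was chosen. A common rule of thumb for picking a latent scale factor is 1 / std of the VAE latents over a sample of images, so that scaled latents have roughly unit variance like the noise the model is trained against. The sketch below shows only that heuristic, on toy numbers.

```python
import statistics


def unit_std_scale(latent_values):
    """Rule-of-thumb scale factor: 1 / std, so scaled latents have ~unit std."""
    std = statistics.pstdev(latent_values)
    if std == 0:
        raise ValueError("latent values have zero variance")
    return 1.0 / std


sample = [-2.0, 2.0, -2.0, 2.0]  # toy stand-in for flattened VAE latents
s = unit_std_scale(sample)
scaled_std = statistics.pstdev([s * z for z in sample])
```

    In practice you would compute this over many encoded images; a different VAE produces latents with a different spread, which is why SDXL's 0.13025 is not a given for the Flux ae.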

    Training:

    To train it (in the current version), all you need to do is change the number of in/out channels in the UNet config to 16 and set the scale factor to 0.6 instead of 0.13025. You should probably also check that the VAE part works properly.
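    A minimal sketch of those config changes, assuming a diffusers-style UNet config dict (in_channels, out_channels) and VAE config (latent_channels). The helper names are hypothetical, and real trainers store these values in their own places; the channel check covers the "make sure the VAE part works" step.

```python
import copy

SDXL_SCALE_FACTOR = 0.13025
ROUWEI_16CH_SCALE_FACTOR = 0.6  # replaces SDXL_SCALE_FACTOR in training and inference


def patch_unet_config_for_16ch(unet_config):
    """Return a copy of a diffusers-style UNet config with 16 input/output channels."""
    cfg = copy.deepcopy(unet_config)
    cfg["in_channels"] = 16
    cfg["out_channels"] = 16
    return cfg


def check_vae_matches(unet_config, vae_config):
    """Fail early if the VAE's latent channels don't match the UNet's input."""
    if unet_config["in_channels"] != vae_config["latent_channels"]:
        raise ValueError(
            f"UNet expects {unet_config['in_channels']} channels, "
            f"VAE produces {vae_config['latent_channels']}"
        )


base = {"in_channels": 4, "out_channels": 4, "sample_size": 128}
patched = patch_unet_config_for_16ch(base)
check_vae_matches(patched, {"latent_channels": 16})  # Flux ae: 16 latent channels
```

    The check catches the most likely mistake, pairing the patched UNet with the old 4-channel SDXL VAE, before any training step runs.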

    (Code examples later)

    I'm willing to help/cooperate:

    Join the Discord server to share your thoughts, proposals, requests, etc. You can also write to me directly here or DM me on Discord.

    Thanks:

    Part of the training was performed on Google TPUs and sponsored by OpenRoot-Compute.

    Personal: NeuroSenko

    And many thanks to all fellow brothers who supported me before.

    Donations:

    BTC bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

    ETH/USDT(e) 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

    XMR 47F7JAyKP8tMBtzwxpoZsUVB8wzg2VrbtDKBice9FAS1FikbHEXXPof4PAb42CQ5ch8p8Hs4RvJuzPHDtaVSdQzD6ZbA5TZ

    License:

    Same viral license as the Illustrious base.

    Description

    First release


    Comments (18)

    RC0N · Nov 2, 2025 · 28 reactions
    CivitAI

    Always appreciate people trying the things nobody else has the patience or know-how to do. I'll be watching this experiment with great interest.

    Awesome work! 👏

    arkbird · Nov 2, 2025
    CivitAI

    HELP PLZ! I'm using Flux.1_Krea_Dev FP8 SCALED for the basic latent, but when it comes to rouwei16channel's KSampler, an error occurs: "mat1 and mat2 shapes cannot be multiplied (15808x16 and 4x3)"

    Minthybasis
    Author
    Nov 2, 2025

    It looks like an issue related to loading the 16ch checkpoint. Does the regular workflow without Flux work, or does it give the same error?

    Minthybasis
    Author
    Nov 3, 2025

    The issue comes from the KSampler preview option, because it tries to use the TAESD VAE, which is designed for 4 channels. Just turn off the preview option.

    reptilekiller · Nov 2, 2025
    CivitAI

    It will be necessary to perform full retraining for this architecture to be successful (or equivalent finetuning). But I believe this is a good future, with a balance between SDXL and FLUX.

    Minthybasis
    Author
    Nov 2, 2025 · 1 reaction

    It actually is a full retraining to a different latent space, just an early release. The compute requirements are far higher than for something like a vpred conversion. The heaviest part is done; now it needs some minor changes in the outer layers and polishing.

    But this is just another piece of the puzzle (like the TE replacement) for a future large-scale training. I don't think it makes sense to spend significant money and time on yet another SDXL tune. At the same time, training new DiT-based models looks like a dark forest, for various reasons. So fixing the main issues of SDXL and training a modified architecture seems to be a good option. Even if we see new-gen models for anime art, it will still be useful in joint workflows thanks to its very high inference speed and style flexibility.

    Phatcat · Nov 3, 2025 · 6 reactions
    CivitAI

    This along with Rouwei-T5Gemma is incredibly fascinating and interesting - Any plans to incorporate them both in the same model? Make a proper rouwei SDXL-Flux? :D

    Minthybasis
    Author
    Nov 3, 2025 · 4 reactions

    Yes, these are experiments to be combined in a future model. There may also be a replacement of a few UNet blocks with DiT, but only a very small part, to keep inference speed high and hardware requirements low.

    Phatcat · Nov 4, 2025 · 1 reaction

    @Minthybasis I'm currently playing around with Rouwei 16-channel, comparing results from T5Gemma alone, CLIPs alone, and T5Gemma and CLIPs concatenated. At the moment, concatenating both CLIPs and T5Gemma seems to produce the most coherent results. The same goes for other models with T5Gemma: concatenating with the CLIPs produces better results.
    I'm also getting intermittent runtime errors with rouwei-16channel; it seems like a toss-up whether a workflow will run at first or not. Running the same workflow without changing anything at all, sometimes it works and other times it doesn't. That one I don't understand.
    I also tried using the UNet-only model, but for some reason I couldn't get the CLIPs to produce anything besides noise, so I had to get the full model and load the CLIPs from there. Isn't the full model simply the Flux VAE and the SDXL CLIPs (G and L) baked in? I've been using an external Flux VAE and that seems to work just fine.

    Phatcat · Nov 8, 2025 · 1 reaction

    @Minthybasis Well, the error could very well be related to the preview; but when it does run with preview on, it produces something and the progress can be followed in the preview. I dunno..
    Using my own CLIPs with the UNet-only model is where it seems to run, but only noise is produced; there could be an issue with my CLIPs or how I loaded them. I'll just use the full model with the baked-in CLIPs; it's no issue.
    But yeah, I would love to show you some of the results, especially CLIPs vs T5Gemma vs CLIPs+T5Gemma. If I could send the pictures, the workflow is embedded in the metadata.

    Minthybasis
    Author
    Nov 11, 2025

    @Phatcat Yes, if it produces only noise, it can be related to the CLIPs. Do they work with the 4-channel version?

    You can upload pictures with workflow to any image hosting that doesn't cut metadata, for example catbox.moe

    Phatcat · Nov 12, 2025 · 1 reaction

    @Minthybasis No.. Apparently they only work with SDXL-based models, so anything Illustrious-based results in garbage output.. Apparently the CLIPs are not quite the same..
    Somewhere in the pipeline the CLIPs seem to have been unfrozen during training, perhaps even before Illustrious.

    On a related note, NoobAI is suffering from broken clips and there's a write-up on it you may or may not have seen:

    https://www.reddit.com/r/StableDiffusion/comments/1o1u2zm/text_encoders_in_noobai_are_dramatically_flawed_a/
    https://www.reddit.com/r/StableDiffusion/comments/1o25x9t/text_encoders_in_noobai_are_part_2/

    IJDEIH · Nov 7, 2025 · 5 reactions
    CivitAI

    Thank you for sharing your work.

    xikin2135558 · Nov 8, 2025
    CivitAI

    How do I train a lora for this model?

    Minthybasis
    Author
    Nov 10, 2025 · 1 reaction

    A few edits need to be made in the trainer code:

    1. Change the channel count from 4 to 16 in the model config

    2. Change VaeScaleFactor from 0.13025 to 0.6

    3. Adjust the part of the code related to latent creation.

    The first two are easy and sufficient if you're using precomputed latents. The last part is more complicated; I'm going to upload some code this weekend or shortly after.

    bl4ckfuture107 · Nov 9, 2025 · 5 reactions
    CivitAI

    MinthyBased does it again!

    Phatcat · Dec 16, 2025
    CivitAI

    @Minthybasis Hey.. So... Flux 2 VAE is out..
    So is Z-Image-Turbo, with more Z-Image models to come.

    What does that mean for this project (16-channel Rouwei with the Flux VAE), the T5Gemma-as-SDXL-encoder project and Rouwei in general?

    Are you moving Rouwei to ZIT? Shifting focus from the Flux 1 VAE to the Flux 2 VAE? Just continuing to get Rouwei-SDXL to play nice with the Flux 1 VAE? Or something completely different?

    qek · Dec 20, 2025
    CivitAI

    Doesn't work well. I thought it was at least 50% standalone

    Checkpoint
    Illustrious

    Details

    Downloads
    233
    Platform
    CivitAI
    Platform Status
    Available
    Created
    11/2/2025
    Updated
    6/28/2026
    Deleted
    -

    Files

    rouwei16channel_v01Alpha.safetensors

    Mirrors

    Available On (1 platform)
