Experimental Conversion of our NoobAI-RF model to Flux2 VAE.

We have observed that the model can adapt to the Flux2 VAE, and current trends suggest that significant improvements are possible with a larger training run, which could potentially allow it to compete with bigger models.
By supporting us you could make it a reality.

More info on supporting us: click me

Model Description

This is a native training of the SDXL Unet in combination with the Flux2 VAE. Essentially, we've adapted a previously 4-channel model to work with the 32 complex channels of Flux 2. No adapters or tricks, fully native.
The Danbooru dataset of NoobAI was used for this.

Due to limited compute we were not able to fully converge it, so expect output on the level of very early anime models. We hope the community will find this interesting enough to support us. We observed steady convergence throughout the whole training process, and believe that further training will result in a new standard for fast local anime generation.

Please take this model as a proof of concept, not as a final product.

We used Rectified Flow for training, with a staged approach for adapting to the Flux2 VAE.
Most of the knowledge seems to be preserved, but it is significantly weakened due to the completely new latent space.

Developed by: Cabal Research (Bluvoll, Anzhc)
Funded by: Community, Bluvoll
License: fair-ai-public-license-1.0-sd
Finetuned from model: NoobAI-RF

Bias and Limitations

Once again, we are limited in budget for this fundamental task. We have adapted the model enough for it to output somewhat acceptable images (close to the knowledge of a theoretical NoobAI 0.1 using the Flux 2 VAE), but further progress would require large compute. We are in territory where the model is seeing a new level of detail for the first time (as well as the old level of detail in a new way), and that is hard to learn.

Most biases of the official dataset will apply (Blue Archive, etc.).

Expect noise, fuzzy details, low performance in landscape aspect ratios, bad hands, and general issues with composition as a whole.

Model Output Examples

One of the benefits we have achieved is color:

Because it is a native flow model, it achieves strong colors without making them acidic or otherwise unstable.

Generally, as already stated, expect at least some grain and fuzziness in all gens, as we have not converged to the juicy details yet.

Recommended Parameters:
Sampler: Euler, Euler A, DPM++ SDE, etc.
Steps: 20-28
CFG: 6-9
Schedule: Normal/Simple/SGM Uniform/Quadratic
Positive Quality Tags: masterpiece, best quality
Negative Tags: worst quality, normal quality, bad anatomy

A1111 WebUI

(All screenshots are reused from our RF release, as there is no difference in setup)

Recommended WebUI: ReForge. It has native support for Flow models, and we've PR'd our native support for the Flux2 VAE-based SDXL modification.

How to use in ReForge:

(ignore Sigma max field at the top, this is not used in RF)

Support for RF in ReForge is being implemented through a built-in extension:

Set parameters to that, and you're good to go.

Flux2VAE does not currently have an appropriate high-quality preview method. Please use the Approx Cheap option, which lets you see a simple PCA projection (ReForge).

Recommended Parameters:
Sampler: Euler A Comfy RF, Euler, DPM++ SDE Comfy, etc. ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI routing is automatic, but not in the case of WebUI.
Steps: 20-28
CFG: 6-9
Schedule: Normal/Simple/SGM Uniform
Positive Quality Tags: masterpiece, best quality
Negative Tags: worst quality, normal quality, bad anatomy

ADETAILER FIX FOR RF: By default, Adetailer discards the Advanced Model Sampling extension, which breaks RF. You need to add AMS to this part of the settings:

Add: advanced_model_sampling_script,advanced_model_sampling_script_backported to there.

If that does not work, go into the adetailer extension, find args.py, open it, and replace builtinscripts like this:

Training

Model Composition

(Relative to base it's trained from)

Unet: Same
CLIP L: Same, Frozen
CLIP G: Same, Frozen
VAE: Flux2 VAE

Training Details

(Main Stage Training)

Samples seen (unbatched steps): ~18.5 million
Learning Rate: 5e-5
Effective Batch size: 1472 (92 batch size × 2 accumulation × 8 GPUs)
Precision: Full BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Schedule: Constant with warmup
Timestep Sampling Strategy: Logit-Normal -0.2 1.5 (sometimes referred to as Lognorm), Shift 2.5
Text Encoders: Frozen
Keep Token: False
Tag Dropout: 10%
Uncond Dropout: 10%
Shuffle: True
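
As a rough illustration of the timestep sampling listed above, here is a minimal sketch. It assumes that "-0.2 1.5" are the mean and standard deviation of the underlying normal distribution, and that "Shift 2.5" is the resolution-style shift t' = s*t / (1 + (s - 1)*t) commonly used by flow-matching trainers. Neither detail is confirmed by our training code description here, and the function name is hypothetical.

```python
import math
import random


def sample_timestep(rng, mean=-0.2, std=1.5, shift=2.5):
    """Draw one training timestep in (0, 1).

    Assumption: logit-normal sampling followed by a timestep shift.
    """
    # Logit-normal: sample from a normal distribution, then squash with a sigmoid.
    u = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-u))
    # Assumed shift formula; shift=1.0 leaves t unchanged.
    return shift * t / (1.0 + (shift - 1.0) * t)


rng = random.Random(0)
samples = [sample_timestep(rng) for _ in range(10_000)]
mean_t = sum(samples) / len(samples)
print(f"mean timestep after shift: {mean_t:.3f}")
```

With a shift above 1, the sampled timesteps are pushed toward the upper end of the (0, 1) range compared to plain logit-normal sampling.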

VAE Conv Padding: False
VAE Shift: 0.0760
VAE Scale: 0.6043
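
For readers wiring this up themselves, the sketch below shows one common way a scalar shift and scale are applied to VAE latents: subtract the shift, then multiply by the scale, and invert that before decoding. This ordering is an assumption based on how other flow models handle latent normalization, not a statement of what our trainer does, and the function names are hypothetical.

```python
VAE_SHIFT = 0.0760   # value from the card
VAE_SCALE = 0.6043   # value from the card


def normalize_latent(values, shift=VAE_SHIFT, scale=VAE_SCALE):
    """Map raw VAE latents to the range the Unet sees (assumed convention)."""
    return [(v - shift) * scale for v in values]


def denormalize_latent(values, shift=VAE_SHIFT, scale=VAE_SCALE):
    """Inverse of normalize_latent, applied before VAE decoding."""
    return [v / scale + shift for v in values]


raw = [-1.5, 0.0760, 2.0]
roundtrip = denormalize_latent(normalize_latent(raw))
print(roundtrip)
```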

Additional Features used: Protected Tags, Cosine Optimal Transport.

Training Data

2 epochs of the original NoobAI dataset, including images up to October 2024, minus screencap data (which was not shared).
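
To put the numbers from Training Details next to the two epochs mentioned here, the arithmetic below derives the effective batch size, an approximate optimizer step count, and an approximate epoch size. The last two are derived estimates based on the rounded ~18.5 million figure, not logged values.

```python
per_gpu_batch = 92
grad_accum = 2
num_gpus = 8
samples_seen = 18_500_000  # approximate, from the card
epochs = 2

effective_batch = per_gpu_batch * grad_accum * num_gpus
optimizer_steps = samples_seen / effective_batch   # derived estimate
samples_per_epoch = samples_seen / epochs          # derived estimate

print(effective_batch)            # 1472
print(round(optimizer_steps))     # 12568
print(round(samples_per_epoch))   # 9250000
```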

LoRA Training

The current stage is trainable, but it is hard to achieve accurate reproduction if the subject/content depends on small details, as the base model has not converged to them yet. My current style training settings (Anzhc):

Learning Rate: tested up to 7.5e-4
Batch Size: 144 (6 real × 24 accum), using SGA (Stochastic Gradient Accumulation). Without SGA I would probably lower accum to 4-8.
Optimizer: AdamW8bit with Kahan Summation
Schedule: ReREX (Use REX for simplicity, or Cosine annealing)
Precision: Full BF16
Weight Decay: 0.02
Timestep Sampling Strategy: Logit-Normal(either 0.0 1.0, or -0.2 1.5), Shift 2.5

Dim/Alpha/Conv/Alpha: 24/24/24/24 (Lycoris/Locon)

Text Encoders: Frozen

Optimal Transport: True

Expected Dataset Size: 100 images (Can be even 10, but balance with repeats to roughly this target.)
Epochs: 50
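
A small helper can make the "balance with repeats" advice concrete. The sketch below is hypothetical (the function name and return keys are made up for illustration). It picks a repeat count so that one epoch is roughly 100 samples, then estimates the number of optimizer steps, assuming gradient accumulation carries across epoch boundaries, which depends on the trainer.

```python
import math


def lora_plan(num_images, target_images=100, epochs=50, real_batch=6, accum=24):
    """Estimate repeats and optimizer steps for the settings listed above."""
    # Repeat small datasets so one epoch is close to the target size.
    repeats = max(1, round(target_images / num_images))
    samples_per_epoch = num_images * repeats
    effective_batch = real_batch * accum
    total_samples = samples_per_epoch * epochs
    # Assumption: accumulation is not reset at epoch boundaries.
    optimizer_steps = math.ceil(total_samples / effective_batch)
    return {
        "repeats": repeats,
        "effective_batch": effective_batch,
        "total_samples": total_samples,
        "optimizer_steps": optimizer_steps,
    }


print(lora_plan(10))
print(lora_plan(100))
```

With these settings both a 10-image and a 100-image dataset end up at 5000 samples seen, which is only a few dozen optimizer steps at an effective batch of 144.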

Hardware

The model was trained on a cloud 8xH200 node.

Software

Custom fork of SD-Scripts (maintained by Bluvoll)

Acknowledgements

Special Thanks

To a special supporter who single-handedly sponsored the whole run and preferred to stay anonymous.

Support

If you wish to support our continuous effort of making waifus 0.2% better, you can do it here:

https://ko-fi.com/bluvoll

Crypto link pending.

Potential future

Expected Compute Needed: We theorize that the model needs at the very least 20 epochs on the full data, ideally 35 epochs. Each epoch cost about 460 USD with the provider we use. At the very least, each time donations cover 2 epochs, we'll resume and train more. If we have enough donations, we will update the dataset to the most recent data.
Why not do this now? Caching with the Flux 2 VAE takes a whopping 15 hours and roughly 20 TB of storage, since each latent is 2 MB, which in itself costs 180 USD of compute time.
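
For anyone who wants the totals, the figures above work out as follows. This is plain arithmetic on the quoted prices, treating the epoch counts as additional training (an assumption) and using decimal units for the storage estimate. The function name is hypothetical.

```python
COST_PER_EPOCH_USD = 460   # approximate, from the card
CACHE_COST_USD = 180       # one-off re-caching cost, from the card
CACHE_SIZE_TB = 20         # approximate
LATENT_SIZE_MB = 2


def training_cost(epochs, recache=False):
    """Estimated cost in USD for a number of epochs, optionally with re-caching."""
    return epochs * COST_PER_EPOCH_USD + (CACHE_COST_USD if recache else 0)


minimum = training_cost(20)                    # 9200
ideal = training_cost(35)                      # 16100
tranche = training_cost(2)                     # 920
with_new_data = training_cost(2, recache=True) # 1100

# Rough number of cached latents implied by the storage figure.
latents = CACHE_SIZE_TB * 1_000_000 // LATENT_SIZE_MB  # 10000000

print(minimum, ideal, tranche, with_new_data, latents)
```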

We are working on further improvements to the pipeline and components at the time of this model's release, and have plans to upgrade this arch further.