RDBT [NetaYume]
Recalibrated distribution
Do NOT download the TCFP8 version. I messed up the metadata, which causes huge stability issues. I will delete this version.
Use the newly uploaded bf16 version instead.
This model is part of a series of experiments testing theories for improving diffusion models.
Trained from NTYM4 with ~70k images.
Aiming for
Better textures and art details.
Better and stable prompt coherence.
Balanced contrast and lighting. Never blown out or oversaturated.
Guide
Prompt: Basically the same as NetaYume, except:
A style prompt is required. This model does not have a default style. The default TV anime style in NetaYume has been nuked.
Put "Digital anime art style by @xxxx." at the end of the prompt to prevent Gemma 2 from paying too much, or incorrect, attention to the artist name.
Quality tags are not needed. The dataset is of higher quality than the average "masterpiece" image.
You don't need tons of tags to describe a character. Just use the most unique ones. e.g. "elf girl frieren, fox girl tamamo \(fate\)". See: img.
Prefer simple natural language at the start, and tags at the end.
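The prompt order described above (natural language first, tags next, style sentence last) can be sketched as a small helper. This is only an illustration: `build_prompt` and its arguments are hypothetical names, not part of NetaYume or ComfyUI, and the example description and tags are made up.

```python
def build_prompt(description, tags, artist):
    """Assemble a prompt: natural language first, tags next, style sentence last.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = [description.strip()]
    if tags:
        # Only the most distinctive tags are needed, joined as a comma list.
        parts.append(", ".join(t.strip() for t in tags) + ".")
    # The style sentence goes at the very end, as the guide recommends.
    parts.append(f"Digital anime art style by @{artist}.")
    return " ".join(parts)


prompt = build_prompt(
    "An elf girl reading a book under a tree.",
    ["elf girl frieren", "forest", "soft sunlight"],
    "xxxx",  # placeholder artist name, as in the guide
)
print(prompt)
```

Replace `xxxx` with the artist name you actually want; the placeholder is kept here because the guide does not name one.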
Settings:
Timestep shift: 3~4.5 for better details (set in the ModelSamplingAuraFlow node).
CFG scale: 1. CFG 1~1.5 also works, if you want.
Sampler: Prefer euler_a + normal.
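To get a feel for what the timestep shift does, here is a minimal sketch of the shift mapping commonly used for flow-matching models. I am assuming this is the formula behind the ModelSamplingAuraFlow node; treat that as an assumption and check the ComfyUI source if it matters. `shift_sigma` is a name made up for this example.

```python
def shift_sigma(t, shift):
    """Remap a noise level t in [0, 1] with the given shift factor.

    Assumed formula: shift * t / (1 + (shift - 1) * t).
    """
    if shift == 1.0:
        return t  # shift 1 leaves the schedule unchanged
    return shift * t / (1.0 + (shift - 1.0) * t)


# Compare an unshifted schedule with the recommended range.
for shift in (1.0, 3.0, 4.5):
    row = [round(shift_sigma(t, shift), 3) for t in (0.25, 0.5, 0.75)]
    print(shift, row)
```

Under this formula the endpoints 0 and 1 stay fixed while everything in between is pushed toward higher noise levels. With shift 3, for example, the midpoint 0.5 maps to 0.75.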
About CFG distilled model:
You can't control the CFG scale or the negative prompt. Those are baked into the model during training.
CFG scale = 1 is a special value. It disables CFG and the negative prompt.
Because you don't need to run a forward pass for the negative prompt, you can generate about 2x faster.
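Why CFG scale 1 is special follows from the standard classifier-free guidance formula, uncond + scale * (cond - uncond): at scale 1 it collapses to the conditional prediction alone, so the negative-prompt pass can be skipped. The sketch below uses a toy stand-in for the model (`toy_model` and `guided_prediction` are hypothetical names) just to count forward passes.

```python
calls = {"n": 0}


def toy_model(prompt_strength):
    """Stand-in for one denoiser forward pass; counts how often it is called."""
    calls["n"] += 1
    return [prompt_strength * x for x in (1.0, 2.0, 3.0)]


def guided_prediction(cond_in, uncond_in, cfg_scale):
    """Standard CFG: uncond + scale * (cond - uncond)."""
    cond = toy_model(cond_in)
    if cfg_scale == 1.0:
        # uncond + 1 * (cond - uncond) == cond, so the second pass is skipped.
        return cond
    uncond = toy_model(uncond_in)
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


calls["n"] = 0
out1 = guided_prediction(0.5, 0.1, 1.0)
passes_at_1 = calls["n"]

calls["n"] = 0
out2 = guided_prediction(0.5, 0.1, 1.5)
passes_at_1_5 = calls["n"]

print(passes_at_1, passes_at_1_5)  # prints "1 2"
```

One pass per step instead of two is where the roughly 2x speedup comes from.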
Some training details
Total dataset contains ~70k images. Not equally weighted.
Only layers.[2:25] were trained.
Captions are mainly from Gemini. Natural language only, no tags.
Not a LoRA this time?
Multi-stage training. No LoRA.
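For readers wondering what "only layers.[2:25]" means in practice, here is a rough sketch of selecting parameters by layer index. It assumes Python slice semantics (index 2 up to, but not including, 25) and parameter names of the form `layers.<index>.<...>`. Both are assumptions for illustration, and the example names are invented, not taken from the real checkpoint.

```python
import re


def is_trainable(param_name, start=2, stop=25):
    """Return True if the parameter belongs to layers[start:stop]."""
    match = re.search(r"(?:^|\.)layers\.(\d+)\.", param_name)
    if match is None:
        return False  # anything outside the layer stack stays frozen
    return start <= int(match.group(1)) < stop


# Hypothetical parameter names, for illustration only.
names = [
    "layers.0.attn.weight",
    "layers.2.attn.weight",
    "layers.24.mlp.weight",
    "layers.25.mlp.weight",
    "final_layer.weight",
]
trainable = [n for n in names if is_trainable(n)]
print(trainable)
```

In a real training script the same predicate would decide which parameters get gradients and which stay frozen.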
Versions
v0.1 cfg distilled: bf16 full model.
v0.1 cd tcfp8: (has issues, do not download, will be deleted soon) cfg distilled, also a tensorcorefp8 version for ComfyUI.
