I. Overview
This model was trained to generate both realistic human images and high-quality anime-style images. Although it was fine-tuned on a specific dataset, it retains much of the base model's knowledge.
Key Features:
Supports anime image generation using Danbooru tags
Improved accuracy in placing objects correctly within the image based on prompt descriptions
Preserves a good portion of the base model's original knowledge
Limitations:
For version 0.1:
Text generation inside images is still inaccurate.
Output image quality is currently moderate and may vary depending on prompts.
Understanding of specific character prompts via Danbooru tags is limited.
II. Model Components:
Text Encoder: Pretrained Gemma-2-2B
VAE: Flux.1 dev's VAE
Image Backbone: Fine-tuned version of Lumina's backbone
Trained on a diverse 30M-image dataset including:
Anime images (tagged with Danbooru)
Realistic human photos
Text-containing images
Images with detailed spatial annotations
III. File Information
This all-in-one file includes the weights for the VAE, text encoder, and image backbone. It is fully compatible with ComfyUI and other systems that support custom pipelines.
If you'd like to use this model via Hugging Face's diffusers library, click here for more details.
IV. Suggested Settings
System Prompt
For anime (Danbooru tags):
You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process .
You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.
For general use:
You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.
You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.
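As a rough illustration of how one of these system prompts might be combined with a user prompt: a commenter further down refers to "the text before <Prompt start>", which suggests the two parts are joined by that marker. The sketch below assumes exactly that. The helper name `build_prompt`, the exact casing of the delimiter, and the single-space joining are assumptions for illustration, not a confirmed specification of what the model expects.

```python
# Minimal sketch: join a system prompt and a user prompt with a delimiter.
# ASSUMPTION: the delimiter text and its casing are inferred from a user
# comment on this page; check your own workflow for the exact marker.
PROMPT_DELIMITER = "<Prompt Start>"

# One of the anime system prompts listed in this section.
ANIME_SYSTEM_PROMPT = (
    "You are an assistant designed to generate high-quality images "
    "based on user prompts and danbooru tags."
)


def build_prompt(system_prompt: str, user_prompt: str,
                 delimiter: str = PROMPT_DELIMITER) -> str:
    """Hypothetical helper: return '<system> <delimiter> <user>'.

    Leading and trailing whitespace is stripped from both parts so the
    result has exactly one space on each side of the delimiter.
    """
    return f"{system_prompt.strip()} {delimiter} {user_prompt.strip()}"


# Example with Danbooru-style tags as the user prompt.
full_prompt = build_prompt(ANIME_SYSTEM_PROMPT,
                           "1girl, solo, outdoors, cherry blossoms")
print(full_prompt)
```

The resulting string would then be pasted into the positive prompt field of whatever front end you use. Keeping the system prompt in a constant makes it easy to switch between the anime and general-use variants.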
Recommended Settings
CFG: 3–6
Sampling Steps: 40–50
Sampler: Euler a
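The recommended values above can be captured in a small configuration object so that a script can flag settings that fall outside them. This is only a sketch: the class `SamplingSettings`, the function `check_settings`, and the chosen defaults (the midpoints of the recommended ranges) are hypothetical and do not belong to any particular tool.

```python
from dataclasses import dataclass

# Recommended ranges copied from the settings listed above.
CFG_RANGE = (3.0, 6.0)
STEPS_RANGE = (40, 50)
RECOMMENDED_SAMPLER = "Euler a"


@dataclass(frozen=True)
class SamplingSettings:
    """Hypothetical container for the three recommended settings."""
    cfg: float = 4.5          # ASSUMPTION: midpoint of 3-6, not an official default
    steps: int = 45           # ASSUMPTION: midpoint of 40-50
    sampler: str = RECOMMENDED_SAMPLER


def check_settings(settings: SamplingSettings) -> list:
    """Return one warning string per value outside the recommendation."""
    warnings = []
    if not CFG_RANGE[0] <= settings.cfg <= CFG_RANGE[1]:
        warnings.append(f"CFG {settings.cfg} is outside {CFG_RANGE}")
    if not STEPS_RANGE[0] <= settings.steps <= STEPS_RANGE[1]:
        warnings.append(f"steps {settings.steps} is outside {STEPS_RANGE}")
    if settings.sampler != RECOMMENDED_SAMPLER:
        warnings.append(f"sampler {settings.sampler!r} is not {RECOMMENDED_SAMPLER!r}")
    return warnings


# The defaults sit inside the recommended ranges, so no warnings are produced.
print(check_settings(SamplingSettings()))
# A deliberately off-recommendation configuration produces three warnings.
for message in check_settings(SamplingSettings(cfg=9.0, steps=20, sampler="DPM++ 2M")):
    print(message)
```

Values outside the ranges are not necessarily wrong. The check is just a reminder of where the recommendations lie when experimenting.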
V. Notes & Feedback
This is an experimental release, and I plan to improve it in future versions.
Feedback, suggestions, and prompt ideas are always welcome — your support helps make this better!
In addition to English prompts, this model also supports prompts in Chinese and Japanese.
VI. Acknowledgments
Big thanks to narugo1992 for the dataset contributions.
Credit to Alpha-VLLM for the fantastic base model architecture.
Shoutout to AngelBottomless and his team for sharing their experiments with Lumina-Illustrious, which helped guide parts of this project.
If you'd like to support my work, you can do so through Ko-fi!
Comments (6)
Looking forward to the next versions! Please eat well and don't explode 🙏🙏🙏
Also, it seems that the system prompt thing (the text before "<Prompt start>") kinda does nothing :/ Just curious: why is this even needed?
Also, I think the model generates much more aesthetic images at higher resolutions, >1536px
Okay, prefill is actively influencing my generations, but I still can't figure out how exactly it does this.
@Scorponov Hi, the system prompt is how you guide the model to generate images. It works the same way as when you prompt an LLM.
So, to use it, do I need the Gemma text encoder? Where can I find the right version, can you please help? Or is it baked in, so I just download the model and use it? Or do I need a VAE?
Hi! With the file I published here, you just need to put it in the checkpoints folder of ComfyUI and use it like XL. For a better experience, make sure to check the guide on how to use Lumina Image v2 with ComfyUI.
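I asked Gemini and was told that Lumina can use 2K images directly as LoRA training material. Is this true? I'm still testing for now, but I really need higher-resolution LoRA training material.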