I. Introduction
AnimaYume is a text-to-image model fine-tuned from Anima, a high-quality anime-style image generation model developed by CircleStone Labs. Anima itself builds upon Cosmos 2, a model developed by NVIDIA’s research team.
II. Information
For version 0.1:
This model is a preview version fine-tuned from the Anima base model using a custom dataset. Training was conducted across multiple resolutions ranging from 768 to 1280 pixels, with a primary focus around 1024. The goal of this release is to improve stability and minimize unwanted artifacts when producing high-resolution images.
Note: All example images for this version were generated at 1024x1536 or 1536x1024.
For version 0.2:
This model is a continuation of AnimaYume v0.1. In this version, I improved the quality of my dataset and used several techniques to prevent oversaturation and low-quality outputs. In my testing, prompt coherence was better than in v0.1, and the model remained very stable when generating images at a resolution of 1536.
Note: I am still waiting for the final version of Anima and testing some methods to make my training process faster. I know the license might make the model less popular, but I only care about whether the model is good or not. I’m aware that many others use better licenses, but I’m too lazy to spend a bunch of money training a model from scratch.
For version 0.25:
This version was trained on Anima Preview 2. Due to several issues with the base model, such as overfitting, black/white borders, quality inconsistencies, and problems with artist tags, I decided to focus primarily on improving the model’s knowledge, reducing these issues, and making it as stable as possible.
Note: In this version, I did not attempt to improve the model’s style. I tried doing so, but it caused the model to forget some of its existing knowledge. The training process is similar to v0.2, but the dataset has been adjusted to better address the issues present in Anima Preview 2.
For version 0.3:
This version was trained using Anima Preview 2. It is an experiment with a new training method for the model. You can consider it another branch of AnimaYume v0.25, developed in parallel. However, this version uses new techniques and a larger dataset compared to v0.25.
Note: In this version, I experimented with a new training approach, so the model is slightly different from v0.25. Additionally, all example images were generated using prompts shared by users on CivitAI, in order to evaluate this new method.
For version 0.4:
This version was trained on Anima Preview 3 using a custom dataset. In this release, I improved prompt understanding and artist style. Based on my testing, some artist styles match my expectations, although I haven’t tested everything in detail since I’m currently quite busy :<. Additionally, I fixed several issues from Anima Preview 3 that also appeared in Preview 2.
Note: I’ve only tested with simple test cases, not comprehensively, so if you encounter any issues, feel free to let me know. I also used a larger AI computing cluster to speed up the training process :D.
All example images were generated using prompts shared by users on CivitAI, as I wanted to evaluate the model’s performance.
For version 0.5:
This version was trained on Anima Base v1.0 using my custom dataset (a mix of a small e621 dataset and Danbooru). In this release, I added many new characters and improved the existing ones. I also enhanced support for various artist styles, allowing the model to generate results that are much closer to the original styles. In addition, the model now understands some concepts and knowledge from e621, although the support is still limited.
Notes: I’ve only tested the model with a few simple test cases so far, so if you encounter any issues, feel free to let me know. This release can be considered a demo version showcasing my new training method, which focuses on preserving existing knowledge while adding new knowledge at the same time. The release also came sooner because I was finally able to use all the resources I had available :D
All example images were generated using prompts shared by users on CivitAI, as I wanted to evaluate the model’s performance using real user prompts.
III. File Information
This file contains only the diffusion model and does not include a VAE or text encoder. To use it properly, you will need to download those components from the link here
IV. Notes & Feedback
This is an experimental fine-tuned release, and I am waiting for the final version of Anima to tune on it :D
Your feedback, suggestions, and creative prompt ideas are always welcome. Every contribution helps make this model even better!
V. Acknowledgments
Big thanks to narugo1992 for the dataset contributions.
Credit to CircleStone Labs and NVIDIA for the fantastic base model architecture.
If you'd like to support my work, you can do so through Ko-fi!
Comments (38)
Hi everyone, I’ve released a new version. In this release, I didn’t have enough time to test the model in detail because I’m currently very busy. Based on my testing, when comparing my model with Anima Preview 3, I feel my model might perform better (this is based on my personal preference).
Here is the comparison post; each image includes the prompt I used for testing:
https://civitai.com/posts/27912586
If you don’t mind, please give it a try and share your feedback. Thanks!
In my testing of POV prompts (especially girl POV), the success rate is quite low, around 20%.
The prompt adherence also seems to have decreased a bit. Is it just me?
I was able to get good results using the base Anima P3, but yeah, higher resolution looks a lot cleaner. Need more testing, I guess.
@Seii1 Hi, would you mind sharing more details? I tested about 50 images generated with the pov tag and didn't see any problem.
Is this preview 3? I assume not, still will give it a go :D
@GPUPoorChad Yes 0.4 was trained on preview 3
Hi, may I ask what trainer you used for the fine-tuning? Thank you!
@qizongzui The model was trained using a custom script, derived from the original sd-scripts repository
amazing model but you need to fix pixelated lines
Oh, I generated about 300 images, evaluated them myself and with Gemini, and didn't see the problem 😅😅. Would you mind sharing more details?
@duongve13112002 NVM, I forgot to add the new version to Comfy XD. Your model is so great, get all my points
You can improve the quality by changing the VAE to a different one. I recommend using the QwenimageVAE-Liquid v7, which you can find here: https://civitai.com/models/2487530/qwenimagevaeliquid1087
@danque what about samplers and scheduler?
@Neon_signs What about them? I am using mostly ResMultistep (sampler) with normal (schedule), or Res_2m with bong_tangent (this one is not a default)
@danque yeah that's the thing I wanted to know lol, much appreciated
Great work. Compared to version 0.3, aliasing (jagged, pixelated lines) is noticeably reduced.
People sometimes complain without giving any image or prompt information, and it's better for your peace of mind not to care.
V0.4 is just so awesome, great work. The only problem I've noticed so far is that it really fights against creating dim lighting and dark scenes somehow
Thank you for another great checkpoint.
Tested with character Lora - V0.4 basically takes Preview3 outputs and just makes it better (posted comparison in gallery).
Your model is great! I hope that in the future Civitai will agree to a commercial license with Circlestone-Labs for online generation of fine-tuned checkpoints and training of Lora models. But for now, locally, your model produces incredible results. 💖
Mostly amazing, but suddenly became way worse with wrong number of fingers and toes, also as others say - problems with dim lighting/overbrightened subject.
I mean, it correctly draws dim light and heavy shadows on the background but leaves characters brightly lit.
Oh, let me find the root cause. This may be related to the style-tuned phases, because I didn't run into these problems when I tested the pre-trained base.
@duongve13112002 It sometimes works fine but other times simply refuses to not overbrighten, I wish I could tell more, but I fail to pinpoint why it does that. Also, the part about fingers and toes: it is not extremely worse, but noticeably worse, and especially in unusual poses. Got absolute worst cases with barefoot characters kneeling, with a view from behind. Earlier versions/base Anima seem to be doing better in that one regard. But nevertheless, amazing model, thank you for your work!
Hey, I’m not completely sure, but I’ve observed that WAI-ANIMA generates images that are about 80% similar to this model. The style and background are slightly different, but not by much. I used the same settings for both models, so I’m not sure why this is happening.
All of the fine tunes are underbaked, they are new. They will diverge and become more different from the base model as people continue training. The differences on many of them are subtle atm.
@Drakeni No, I don’t believe that’s the reason. I ran the same test using AnimaYume v0.4 and Anima Preview 3, and the differences can be clearly observed.
@thaimannguyen4672 Honestly, I tested WAI Anima and found it to be a solid merged model. As for the similarities, I think that’s expected: merged models often carry over shared traits from the base model. From my perspective, this approach is a valid way to refine the base model, as long as it improves overall stability.
and ignore original model license, hide all credits, pretend it's a completely different model
Wai v0.1 was a mix of 0.5 AnimaYume v0.4 + 0.5 Anima P2. This guy never trained a model, all his models are merges.
@reakaakasky Let me add some context for those who might wonder.
Wai sdxl's model is based on Noobai, but he deliberately conceals the truth, violating Noobai's license. After being questioned, he lied and said his model was based on Illustrious v1.
@ikekph5 are there any Wai SDXL alternatives you'd highly recommend?
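For readers unfamiliar with what a "0.5 + 0.5 mix" means in this thread, here is a minimal sketch of a linear checkpoint merge. It is only an illustration of the general technique, not the actual procedure used for any model discussed here; the `merge_weights` function, the `Weights` alias, and the toy tensors are all hypothetical, and plain Python lists stand in for real tensors.

```python
from typing import Dict, List

# Toy representation: tensor name -> flat list of values (assumption for illustration).
Weights = Dict[str, List[float]]


def merge_weights(a: Weights, b: Weights, ratio_a: float = 0.5) -> Weights:
    """Linear interpolation per tensor: ratio_a * a + (1 - ratio_a) * b."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have the same tensor names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            ratio_a * x + (1.0 - ratio_a) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy stand-ins for two checkpoints (real ones hold large tensors).
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}
print(merge_weights(model_a, model_b))
```

With a ratio of 0.5 every merged value sits halfway between the two inputs, which is one reason a merge can look very close to both of its parents.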
I wonder, would you like a version with native 1536×1536 resolution support? Let me know what you think :D
I think the Qwen VAE is limited and the Flux2 VAE is better. Klein is also very handy: it can even do a 2K upscale + denoise, and it can denoise directly from t 0.7 to 0 in one step.
@reakaakasky I think I will test it. Anyway, this is just an experiment :D
Anima itself is still not there yet is it? The preview 3 made improvements but became harder to use. Might take a few months still. Great model though. It would be amazing if it could, the VAE is so much better than the 4 channel sdxl ones...
@BoundingBoxes what do you mean by harder to use? For me, AnimaYume v0.1 and v0.4 are equally simple to use...
Given that the current model supports 1024*1536 very well out of the box, it seems like scaling up to 1536*1536 shouldn't be a problem.
Anima looks promising and interesting.
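For a rough sense of the jump being discussed, the two resolutions can be compared by pixel count. This is a minimal sketch; the `pixel_count` helper is hypothetical, and it only counts pixels. How training or inference cost actually scales with resolution depends on the architecture and is not modeled here.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in an image of the given size."""
    return width * height


current = pixel_count(1024, 1536)   # resolution used for the example images
proposed = pixel_count(1536, 1536)  # the proposed native square resolution

print(current, proposed, proposed / current)
```

A 1536×1536 image has 1.5 times as many pixels as a 1024×1536 one.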
If possible, could you share the script for model training and/or LoRA training?
Actually, it depends on the dataset, so there’s no fixed configuration for it. For LoRA, I recommend using rank 16 and alpha 8 for character training and rank 32 for style training, with a learning rate of 1e-4 using the AdamW optimizer, and keeping everything else at the default settings in the Kohya repo.
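The settings in this reply can be written down as a small sketch. The `LoraPreset` class, its field names, and the `PRESETS` dictionary are hypothetical and exist only for illustration; they are not part of any trainer. The reply gives no alpha for the style case, so it is left unset here rather than guessed. The `scale` method uses the usual LoRA convention of scaling the update by alpha / rank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoraPreset:
    """Hypothetical container for the LoRA settings suggested above."""
    rank: int
    alpha: Optional[float]  # None = not specified in the reply
    learning_rate: float = 1e-4
    optimizer: str = "AdamW"

    def scale(self) -> Optional[float]:
        """Usual LoRA scaling factor alpha / rank, if alpha is known."""
        if self.alpha is None:
            return None
        return self.alpha / self.rank


PRESETS = {
    "character": LoraPreset(rank=16, alpha=8),
    "style": LoraPreset(rank=32, alpha=None),  # alpha not given in the reply
}

for name, preset in PRESETS.items():
    print(name, preset.rank, preset.alpha, preset.scale())
```

With rank 16 and alpha 8, the LoRA update is scaled by 0.5. In Kohya's sd-scripts these values would presumably be passed as `--network_dim`, `--network_alpha`, `--learning_rate`, and `--optimizer_type`, though which training script applies to Anima is an assumption not confirmed in this thread.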
It's still the best fine-tuned model. It combines the stability and the diverse output quality of the official version, and its natural language processing capability remains strong!



















