Anima is a 2-billion-parameter text-to-image model created through a collaboration between CircleStone Labs and Comfy Org. It is focused mainly on anime concepts, characters, and styles, but it can also generate a wide variety of other non-photorealistic content. The model is designed for making illustrations and artistic images, and will not work well for realism.

It is trained on several million anime images and about 800k non-anime artistic images. No synthetic data was used for training. The knowledge cut-off date for the anime training data is September 2025.

This preview version is an intermediate model checkpoint. The model is still training and the final version will improve, especially for fine details and overall aesthetics.

Preview2

The preview2 version is a small upgrade to the first preview.

A significant part of the training is redone with different hyperparameters and techniques, designed to help make the model more robust to finetuning.
It is trained for much longer at medium resolutions in order to acquire more character knowledge.
A regularization dataset is introduced to improve natural language comprehension and help preserve non-anime knowledge.
It has the same resolution limitations as the first preview. It is trained only briefly at 1024 resolution. Going much beyond this will cause the model to break down.
This is a base model with no aesthetic tuning. It is designed to be wild and creative, with the maximum possible breadth of knowledge. It is not optimized to produce aesthetic or consistent images.

Installing and running

Workflow: The model is natively supported in ComfyUI. The above image contains a workflow; you can open the image in ComfyUI, or drag and drop it in, to load the workflow. The model files go in their respective folders inside your model directory:

anima-preview.safetensors goes in ComfyUI/models/diffusion_models
qwen_3_06b_base.safetensors goes in ComfyUI/models/text_encoders
qwen_image_vae.safetensors goes in ComfyUI/models/vae (this is the Qwen-Image VAE; you might already have it)
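
A quick way to confirm the files landed in the right folders is a small check script. This is only a sketch: the file names and subfolders come from the list above, while the `missing_model_files` helper and the `ComfyUI` root path are hypothetical and should be adjusted to your install.

```python
from pathlib import Path

# File names and subfolders as listed above.
EXPECTED_FILES = {
    "anima-preview.safetensors": "models/diffusion_models",
    "qwen_3_06b_base.safetensors": "models/text_encoders",
    "qwen_image_vae.safetensors": "models/vae",
}


def missing_model_files(comfy_root):
    """Return the expected model paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    missing = []
    for filename, subfolder in EXPECTED_FILES.items():
        path = root / subfolder / filename
        if not path.is_file():
            missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own.
    for path in missing_model_files("ComfyUI"):
        print(f"missing: {path}")
```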

Generation settings

The preview version should be used at about 1MP resolution. E.g. 1024x1024, 896x1152, 1152x896, etc.
30-50 steps, CFG 4-5.
A variety of samplers work. Some of my favorites:
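
For the resolution guidance above, here is a rough sketch that lists width/height pairs close to 1MP. The helper name `resolutions_near_1mp`, the 64-pixel step, the 5% tolerance, and the side-length bounds are all assumptions chosen for illustration; only the three example sizes come from the text above.

```python
def resolutions_near_1mp(step=64, target=1024 * 1024, tolerance=0.05,
                         min_side=768, max_side=1408):
    """List (width, height) pairs whose pixel count is within tolerance of target."""
    sizes = []
    for width in range(min_side, max_side + 1, step):
        for height in range(min_side, max_side + 1, step):
            # Keep the pair if its area is close enough to the target area.
            if abs(width * height - target) <= tolerance * target:
                sizes.append((width, height))
    return sizes


if __name__ == "__main__":
    for width, height in resolutions_near_1mp():
        print(f"{width}x{height}")
```

With these defaults the list includes 1024x1024, 896x1152, and 1152x896, plus a few more extreme aspect ratios that the text above does not vouch for.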

Prompting

The model is trained on Danbooru-style tags, natural language captions, and combinations of tags and captions.

Tag order

[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]

Within each tag section, the tags can be in arbitrary order.
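
If you assemble prompts programmatically, the section order above can be sketched like this. The function name `build_tag_prompt` and its parameter names are hypothetical, and joining with ", " is an assumption; the example tags are taken from the full tag example in this document.

```python
def build_tag_prompt(quality_meta=(), subject=(), character=(), series=(),
                     artist=(), general=()):
    """Join tag groups in the documented section order, comma-separated.

    Each argument is a list or tuple of tag strings. Order inside a group is
    kept as given, since order within a section is arbitrary.
    """
    sections = [quality_meta, subject, character, series, artist, general]
    tags = [tag for section in sections for tag in section]
    return ", ".join(tags)


prompt = build_tag_prompt(
    quality_meta=["year 2025", "newest", "normal quality", "score_5", "highres", "safe"],
    subject=["1girl"],
    character=["oomuro sakurako"],
    series=["yuru yuri"],
    artist=["@nnn yryr"],
    general=["smile", "brown hair", "santa hat"],
)
print(prompt)
```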

Quality tags

Human score based: masterpiece, best quality, good quality, normal quality, low quality, worst quality

PonyV7 aesthetic model based: score_9, score_8, ..., score_1

You can use either the human score quality tags, the aesthetic model tags, both together, or neither. All combinations work.

Time period tags

Specific year: year 2025, year 2024, ...

Period: newest, recent, mid, early, old

Meta tags

highres, absurdres, anime screenshot, jpeg artifacts, official art, etc.

Safety tags

safe, sensitive, nsfw, explicit

Artist tags

Prefix artist with @. E.g. "@big chungus". You must put @ in front of the artist. The effect will be very weak if you don't.
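
Since a missing @ is easy to overlook, a tiny normalizer can help when artist names come from a list or user input. This is a sketch only; `artist_tag` is a hypothetical helper, not part of any tool mentioned here.

```python
def artist_tag(name):
    """Return the artist name with exactly one leading '@'."""
    # Strip surrounding whitespace and any '@' already present, then re-add one.
    return "@" + name.strip().lstrip("@").strip()


print(artist_tag("nnn yryr"))
print(artist_tag("@nnn yryr"))
```

Both calls produce the same tag, so the helper is safe to apply to names that may or may not already have the prefix.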

Full tag example

year 2025, newest, normal quality, score_5, highres, safe, 1girl, oomuro sakurako, yuru yuri, @nnn yryr, smile, brown hair, hat, solo, fur-trimmed gloves, open mouth, long hair, gift box, fang, skirt, red gloves, blunt bangs, gloves, one eye closed, shirt, brown eyes, santa costume, red hat, skin fang, twitter username, white background, holding bag, fur trim, simple background, brown skirt, bag, gift bag, looking at viewer, santa hat, ;d, red shirt, box, gift, fur-trimmed headwear, holding, red capelet, holding box, capelet

Tag dropout

The model was trained with random tag dropout. You don't need to include every single relevant tag for the image.
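
To picture what tag dropout means, here is an illustrative sketch. The actual training code is not described here, so the `drop_tags` function, the 0.3 drop probability, and the idea of a protected `keep` set are all assumptions, not the method that was used.

```python
import random


def drop_tags(tags, drop_prob=0.3, keep=(), rng=None):
    """Randomly drop tags while preserving order; tags in `keep` always survive."""
    rng = rng or random.Random()
    protected = set(keep)
    # A tag survives if it is protected or if its random draw clears drop_prob.
    return [t for t in tags if t in protected or rng.random() >= drop_prob]


tags = ["1girl", "smile", "brown hair", "hat", "solo", "santa costume"]
print(drop_tags(tags, keep=["1girl"], rng=random.Random(0)))
```

The practical upshot for prompting is the one stated above: a partial tag list resembles what the model saw during training, so leaving tags out is fine.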

Dataset tags

To improve style and content diversity, the model was additionally trained on two non-anime datasets: LAION-POP (specifically the ye-pop version) and DeviantArt. Both were filtered to exclude photos. Because these datasets are qualitatively different from anime datasets, captions from them have been labeled with a "dataset tag". This occurs at the very beginning of a prompt followed by a newline. Optionally, the second line can contain either the image alt-text (ye-pop) or the title of the work (DeviantArt). Examples:

ye-pop
For Sale: Others by Arun Prem
Abstract, oil painting of three faceless, blue-skinned figures. Left: white, draped figure; center: yellow-shirted, dark-haired figure; right: red-veiled, dark-haired figure carrying another. Bold, textured colors, minimalist style.

deviantart
Flame
Digital painting of a fiery dragon with glowing yellow eyes, black horns, and a long, sinuous tail, perched on a glowing, molten rock formation. The background is a gradient of dark purple to orange.
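
The layout of these prompts can be sketched as a small helper: dataset tag on the first line, an optional title or alt-text on the second, then the caption. The function name `dataset_prompt` is hypothetical, and the two accepted tag strings are read off the examples above.

```python
# Dataset tags as they appear in the examples above.
DATASET_TAGS = ("ye-pop", "deviantart")


def dataset_prompt(dataset_tag, caption, title=None):
    """Build a prompt with the dataset tag first, an optional title line, then the caption."""
    if dataset_tag not in DATASET_TAGS:
        raise ValueError(f"unknown dataset tag: {dataset_tag!r}")
    lines = [dataset_tag]
    if title:
        lines.append(title)
    lines.append(caption)
    return "\n".join(lines)


print(dataset_prompt(
    "deviantart",
    "Digital painting of a fiery dragon with glowing yellow eyes.",
    title="Flame",
))
```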

Natural language prompting tips

If using pure natural language, more descriptive is better. Aim for at least 2 sentences. Extremely short prompts can give unexpected results (this will be better in the final version).
You can mix tags and natural language in arbitrary order.
You can put quality / artist tags at the beginning of a natural language prompt.
Name a character, then describe their basic appearance.

Model comparison

You may be interested in comparing Anima's outputs with other models. A ComfyUI workflow, anima_comparison.json, is provided. This workflow generates a grid of images where each model is a column and the rows are different seeds. It can be configured to compare any number of models you select by changing a few output nodes. Supported model architectures: Anima, SDXL, Lumina, Chroma, Newbie-Image. The default configuration compares Anima, NetaYume, and Newbie-Image.
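
The grid layout described above (one column per model, one row per seed) can be pictured with a short sketch. This is not the contents of anima_comparison.json; `comparison_grid` is a hypothetical helper and the seed values are arbitrary.

```python
def comparison_grid(models, seeds):
    """Return rows of (model, seed) cells: one row per seed, one column per model."""
    return [[(model, seed) for model in models] for seed in seeds]


# Model names from the default configuration; seeds chosen arbitrarily.
grid = comparison_grid(["Anima", "NetaYume", "Newbie-Image"], [1, 2, 3])
for row in grid:
    print(row)
```

Reading across a row compares models on the same seed; reading down a column shows how one model varies across seeds.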

Limitations

The model doesn't do realism well. This is intended. It is an anime / illustration / art focused model.
The model may generate undesired content, especially if the prompt is short or lacking details.
The model isn't great at text rendering. It can generally do single words and sometimes short phrases, but lengthy text rendering won't work well.
The preview model isn't that good at higher resolutions yet.
The preview model is a true base model. It hasn't been aesthetic tuned on a curated dataset. The default style is very plain and neutral, which is especially apparent if you don't use artist or quality tags.

License

This model is licensed under the CircleStone Labs Non-Commercial License. The model and derivatives are only usable for non-commercial purposes. Additionally, this model constitutes a "Derivative Model" of Cosmos-Predict2-2B-Text2Image, and therefore is subject to the NVIDIA Open Model License Agreement insofar as it applies to Derivative Models.

The details of the commercial licensing process are still being worked out. For now, you can express your interest in acquiring a commercial license by emailing [email protected]

Built on NVIDIA Cosmos.