What is this?
https://github.com/AbstractEyes/anima-trainer
A tool for using JSON with Anima. The model does not require JSON, but JSON prompts give added control, and the model also handles many new plain-English prompts.
Trained with the same trainer Anima was originally trained with, diffusion-pipe, snapped together with a new dataset organization system so I could run it on Runpod or in notebooks.
The trigger is NOT the exact token "JSON"; it is literal JSON in string form, meaning the prompt text itself is JSON.
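As a minimal sketch of what "literal JSON in string form" means, the snippet below builds a prompt by serializing a dictionary. The field names (subject, hair, setting) are hypothetical placeholders and not the captioner's actual schema; check the captioner's real output before relying on any key.

```python
import json

# Hypothetical caption fields; the real schema comes from the captioner's output.
caption = {
    "subject": "1girl",
    "hair": "long silver hair",
    "setting": "rainy city street at night",
}

# The prompt is the JSON text itself, passed to the model as an ordinary string.
json_prompt = json.dumps(caption, ensure_ascii=False)

# A plain-English prompt is still valid; JSON is optional.
plain_prompt = "a girl with long silver hair on a rainy city street at night"

print(json_prompt)
```

Either string can be used as the positive prompt; the JSON form is what the finetune was trained to respond to with added control.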
Prelim 1k
https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1
These are 1k images randomly sampled and subject-bucketed from the 80k-image dataset "qwen_90k", which will be trained on next.
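The actual subject-bucketing method is defined in the subject-bucketing article linked in this post. The snippet below is not that method. It is only a rough sketch of the simplest reading of "randomly sampled and subject-bucketed": group records by a subject label, shuffle each group, and draw round-robin so rare subjects are not drowned out. The record layout and the function name sample_by_subject are assumptions made for illustration.

```python
import random
from collections import defaultdict

def sample_by_subject(records, total, seed=0):
    """Group records by subject, then draw round-robin from shuffled buckets."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["subject"]].append(rec)
    for items in buckets.values():
        rng.shuffle(items)

    picked = []
    # Take one item per bucket per pass until the quota is met or data runs out.
    while len(picked) < total and any(buckets.values()):
        for subject in sorted(buckets):
            if buckets[subject] and len(picked) < total:
                picked.append(buckets[subject].pop())
    return picked

# Toy stand-in for the larger dataset: 90 "cat" records and 10 "dog" records.
records = [{"id": i, "subject": "cat" if i < 90 else "dog"} for i in range(100)]
subset = sample_by_subject(records, total=10)
```

On the toy data the subset comes out evenly split between the two subjects, even though the source is heavily skewed.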
https://huggingface.co/AbstractPhil/Qwen3.5-0.8B-json-captioner
Each image was captioned twice: once by the VLM, using its ViT, to output JSON directly, and once by a variant of the AnimeTIMM ViT, whose output was then processed into JSON as well.
12 epochs on the VLM JSON captions, then the same images back in for 8 more epochs with the AnimeTIMM JSON. These are the results of subject-bucketing with JSON.
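To illustrate the "processed into JSON" step for tagger output, here is a hedged sketch that groups a flat tag list under named fields. The mapping table FIELD_OF_TAG, the field names, and the function tags_to_json are all hypothetical; the real processing used for this dataset is not described here and may differ.

```python
import json

# Hypothetical tag-to-field mapping; a real tagger has its own categories.
FIELD_OF_TAG = {
    "1girl": "subject",
    "long_hair": "appearance",
    "silver_hair": "appearance",
    "outdoors": "setting",
    "night": "setting",
}

def tags_to_json(tags):
    """Group flat tags under named fields and serialize as a JSON caption."""
    caption = {}
    for tag in tags:
        # Unknown tags fall into a catch-all field instead of being dropped.
        field = FIELD_OF_TAG.get(tag, "other")
        caption.setdefault(field, []).append(tag.replace("_", " "))
    return json.dumps(caption)

caption_text = tags_to_json(["1girl", "long_hair", "night", "smile"])
print(caption_text)
```

The result is a JSON string in the same spirit as the VLM captions, so both caption sources can feed the same training format.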
More specifically
https://huggingface.co/blog/AbstractPhil/subject-bucketing
This is a subject-bucket trained JSON finetune.
The specific targets are, experimentally, to give finetunes better accuracy and more fidelity, while also training a proof of concept for the subject-bucketing paradigm.
TLDR Subject Bucketing
Dataset balancing. Normally you end up with a series of problems from finetunes: breakpoints, kinks, distortions, faults, and so on.
This is meant as an experiment to solve those exact problems. By finetuning a model with JSON, you provide a form of differentiated perspective to the AI. By grouping subjects into a more complex paradigm, as described in the article, the differentiation becomes robust.
A little longer, still short.
Each token separator is another format of language that Qwen already understands and recognizes. The more of them you combine in sequence, the better Qwen understands the process, which gives the diffusion system more usable structure.
The more robust and orderly the encodings provided to the diffusion system, combining differentiated lesser-used tokens with more common tokens, the more useful the training outcomes.
Why?
The smaller-scale non-bucketed variants were successful, so it's time to train the real thing: the tool itself, and what the tool yields.
The first 1k-image training run for the direct tool has now been successful. The results are promising and powerful, which merits a full scale-up in training.



















