CivArchive
    Akanezora - Akanezora v0.55:B
    NSFW

    Akanezora-Anima


    Akanezora is a full Anima DiT fine-tune trained entirely on a single RTX 3060 with 12GB of VRAM using Aozora. It is released both as a usable model and as proof that full Anima DiT fine-tuning can be done on consumer hardware.

    If you're interested in fine-tuning your own model, the training code is available here: [Aozora_SDXL_Trainer]


    Branch A vs B

    Branch A: follows the recommended Anima tuning setup with bell-weighted loss and non-uniform timestep sampling.
    Branch B: uses my experimental setup with uniform timestep sampling and uniform loss weighting, giving the model more even exposure across the full noise range. In my testing, this improves visual style and composition, but is slower and harder to tune.
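
    The difference between the two branches can be sketched roughly as below. This is a minimal illustration and not the Aozora trainer's code: the logit-normal sampler and the Gaussian-shaped weight are assumptions standing in for Branch A's non-uniform sampling and bell-weighted loss, and every function name here is hypothetical.

```python
import math
import random

def sample_timestep_uniform(rng):
    """Branch B style: every noise level in [0, 1) is equally likely."""
    return rng.random()

def sample_timestep_logit_normal(rng, mean=0.0, std=1.0):
    """Assumed stand-in for Branch A: sigmoid of a Gaussian draw,
    which concentrates samples around t = 0.5."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(mean, std)))

def loss_weight_uniform(t):
    """Branch B style: every timestep contributes equally to the loss."""
    return 1.0

def loss_weight_bell(t, center=0.5, width=0.25):
    """Assumed bell-shaped weight: 1.0 at `center`, decaying toward both ends."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)

def tail_fraction(timesteps):
    """Share of samples in the extreme 10% at either end of the noise range."""
    return sum(1 for t in timesteps if t < 0.1 or t > 0.9) / len(timesteps)

rng = random.Random(0)
uniform_ts = [sample_timestep_uniform(rng) for _ in range(10_000)]
bell_ts = [sample_timestep_logit_normal(rng) for _ in range(10_000)]

print(f"uniform tail share:      {tail_fraction(uniform_ts):.3f}")
print(f"logit-normal tail share: {tail_fraction(bell_ts):.3f}")
```

    Under these assumptions, uniform sampling spends far more of its samples at the very low and very high noise levels, which is the "more even exposure" trade-off described for Branch B.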

    Version 0.55b Preview

    This is a 55% training checkpoint continued from the 0.5a checkpoint, trained on 15k images using the experimental Branch B settings. The dataset consists of 50% Danbooru-tagged images with generated natural-language captions and 50% hand-tagged / natural-language-captioned images. During training, text conditioning was split into 90% tags and 10% natural-language prompts.
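
    One way to picture the 90% / 10% conditioning split is a random choice per training sample. The sketch below assumes exactly that; the page does not say whether the split is applied per sample, per step, or per epoch, and the function name and example captions are hypothetical.

```python
import random

def pick_caption(tags, natural_language, rng, tag_prob=0.9):
    """Return the tag caption with probability `tag_prob`,
    otherwise the natural-language caption."""
    return tags if rng.random() < tag_prob else natural_language

rng = random.Random(0)
# Hypothetical example captions for a single image.
sample = {
    "tags": "1girl, sunset, red sky",
    "nl": "A girl stands under a red evening sky.",
}

picks = [pick_caption(sample["tags"], sample["nl"], rng) for _ in range(10_000)]
tag_share = picks.count(sample["tags"]) / len(picks)
print(f"tag share: {tag_share:.2f}")
```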

    While this release is functional, it should be considered a work in progress rather than a finished product. It is being shared early to demonstrate the viability of the training method and to showcase the Aozora trainer’s ability to fine-tune Anima DiT models on low-VRAM hardware.

    Pros:

    • Cuts unwanted text generation by around 70%, so heavy negative prompts are needed less often.

    • More responsive to Danbooru-style tags.

    • Slightly more dynamic seed variation due to soft conditioning.

    • More SDXL-inclined generation style.

    • Better overall composition and prompt feel across varied prompts.

    • Improved NSFW output quality.

    Cons:

    • Some seeds may still closely resemble the base model.

    • Lighting effects often need to be prompted directly.

    • Still early in training and may hallucinate content.

    0.55b Training Settings

    Base Model: Akanezora V0.5a
    Training Hours: Unknown (a power outage made the run take about 2x longer; estimated at around 50 hours)
    GPU Used: NVIDIA GeForce RTX 3060 (12 GB) | Driver version: 32.0.15.9636

    VRAM Usage: ~11.4GB

    Mixed Precision: bfloat16

    Batch Size: 1

    Gradient Accumulation: 4

    Learning Rate: 6e-6
    Timestep: Uniform
    Loss: Uniform

    Optimizer: Raven (AdamW float32 variant with offloading) | betas: 0.9, 0.999 | eps: 1e-08 | Weight Decay: 0.01 | Debias: 1.0

    Max Train Steps: 201010 (Completed: 115282)

    Current Checkpoint: ~55% through planned training

    Trainable Parameters: P: 1,956,405,248 | P Frozen: 6.44% (llm_adapter.*)

    Soft text cond: 0.75 - 1.25

    Dataset Size: 15164
    Training Resolution: 1152x1152 (Aspect Ratio Bucketed: 864x1536 to 1536x864)

    VRAM saving techniques: momentum offloading, bf16 mixed precision, pre-caching VAE and text encoders, gradient checkpointing
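
    The numbers in the settings list above imply some simple arithmetic, sketched here. It assumes that a "step" means one micro-batch of Batch Size images, which the settings do not state; if a step is instead one optimizer update, the images-seen figure would be four times larger. Variable names are illustrative.

```python
# Values copied from the 0.55b settings list.
batch_size = 1
grad_accum = 4
max_steps = 201_010
completed_steps = 115_282
dataset_size = 15_164

# Gradients from `grad_accum` micro-batches are summed before each optimizer update.
effective_batch = batch_size * grad_accum

# Share of the planned run that has been completed.
progress = completed_steps / max_steps

# Assumption: one step = one micro-batch of `batch_size` images.
images_seen = completed_steps * batch_size
dataset_passes = images_seen / dataset_size

print(f"effective batch size: {effective_batch}")
print(f"progress: {progress:.1%}")
print(f"approx. passes over the dataset: {dataset_passes:.1f}")
```

    The completed-step ratio comes out at roughly 57%, in line with the rounded ~55% label on this checkpoint.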


    v0.50 settings: bf16 mixed precision, batch 1, grad accum 4, LR 5e-6, Raven AdamW offload optimizer, wave timestep schedule, soft text conditioning 0.75–1.25, 1152 bucketed training from 864x1536 to 1536x864, VAE/text encoder pre-cache, gradient checkpointing, and momentum offloading.
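
    The bucket range quoted in the settings follows a constant pixel budget: 864 x 1536 and 1152 x 1152 both equal 1,327,104 pixels. A minimal sketch of how such buckets might be generated and matched to an image is shown below. The 32-pixel grid is an assumption (it is the largest power-of-two step that hits all three listed sizes), and the function names are hypothetical; the trainer's real bucket logic may differ.

```python
TARGET_AREA = 1152 * 1152        # 1,327,104 pixels, same as 864 * 1536
STEP = 32                        # assumed bucket granularity
MIN_SIDE, MAX_SIDE = 864, 1536   # limits taken from the settings

def make_buckets():
    """Enumerate (width, height) pairs on the grid that stay within the pixel budget."""
    buckets = []
    for width in range(MIN_SIDE, MAX_SIDE + 1, STEP):
        # Largest grid-aligned height that keeps width * height <= TARGET_AREA.
        height = (TARGET_AREA // width) // STEP * STEP
        height = min(height, MAX_SIDE)
        if height >= MIN_SIDE:
            buckets.append((width, height))
    return buckets

def nearest_bucket(img_w, img_h, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))

buckets = make_buckets()
print(len(buckets), "buckets")
print(nearest_bucket(1920, 1080, buckets))  # a 16:9 image
```

    Keeping the area roughly constant means every bucket costs about the same VRAM per image, which matters when the whole run has to fit in 12 GB.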

    Sampler: ER_SDE
    Scheduler: Beta

    Steps: 15-50

    CFG: 3-5

    Negative Prompt: worst quality, low quality, lowres, score_1, score_2, score_3, blurry, jpeg artifacts
    Note: You need to use qwen_3_06b_base.safetensors as the text encoder and qwen_image_vae.safetensors as the VAE.


    Model Transparency Notice

    For transparency, this release includes the training setup and links back to the open-source trainer/code used to create it. This is a full fine-tune checkpoint with no LoRA, LoKR, LyCORIS, or model merge operations applied.

    This model:

    • Started training from unmodified base weights

    • Had zero merge operations applied

    • Never used LoRA adapters at any stage

    • Was trained fully end to end, with no sublayer freezing besides the required llm_adapter

    Notes

    Feedback is welcome, especially on prompt following, anatomy, hands, style consistency, repeated patterns, overfitting, and behavior without heavy negative prompts.

    License

    This model follows the license of its base model, Anima. Review and comply with the base model terms before using or redistributing.

    Description

    • After the 50% checkpoint, I switched to a different configuration. In my opinion, the default Anima fine-tuning parameters (those previously used by the Anima base) were suboptimal, so I moved to a more primitive SDXL-derived timestep and loss formula. Early results are promising, showing improved seed uniqueness and composition while retaining the core Anima knowledge base. I also pruned the dataset from 36k to 15k images to mitigate prompt and style similarity bleeding.

    Checkpoint
    Anima

    Details

    Downloads: 188
    Platform: CivitAI
    Platform Status: Available
    Created: 6/9/2026
    Updated: 6/11/2026
    Deleted: -

    Files

    akanezora_V055B_txt.safetensors

    Mirrors

    HuggingFace (58 mirrors)

    akanezora_V055B.safetensors

    Mirrors

    HuggingFace (1 mirror)
    CivitAI (1 mirror)

    qwen_image_vae.safetensors

    Mirrors

    HuggingFace (126 mirrors)
    ModelScope (1 mirror)