CivArchive
    Akanezora - v0.55:B (FP8 & INT8)
    NSFW

    Akanezora-Anima


    Akanezora is a full Anima DiT fine-tune trained entirely on a single RTX 3060 with 12GB of VRAM using Aozora. It is released both as a usable model and as proof that full Anima DiT fine-tuning can be done on consumer hardware.

    If you're interested in fine-tuning your own model, the training code is available here: [Aozora_SDXL_Trainer]

    Quantized Versions

    Use a torch.compile node for faster generation if your setup supports it. The speedup depends on GPU architecture, PyTorch/ComfyUI support, and whether the quantized op is native or emulated.

    FP8 E4M3: Best on GPUs with native FP8 Tensor Cores (RTX 40-series/Ada or newer, Hopper). Without native support, FP8 emulation may cut VRAM but rarely speeds things up.
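
    As a rough way to check the hardware note above, the sketch below compares a CUDA compute capability against 8.9, the first capability with FP8 Tensor Cores (Ada is 8.9, Hopper is 9.0). The helper name `has_native_fp8` is made up for this illustration, and the capability tuples are hard-coded so the snippet runs without a GPU.

```python
# Minimum CUDA compute capability with FP8 Tensor Cores: 8.9 (Ada), 9.0 (Hopper).
FP8_MIN_CAPABILITY = (8, 9)

def has_native_fp8(capability):
    """Return True if a (major, minor) compute capability has FP8 Tensor Cores."""
    return tuple(capability) >= FP8_MIN_CAPABILITY

# With PyTorch installed, the tuple can come from torch.cuda.get_device_capability().
examples = [
    ("RTX 3060 (Ampere)", (8, 6)),
    ("RTX 4090 (Ada)", (8, 9)),
    ("H100 (Hopper)", (9, 0)),
]
for name, cap in examples:
    print(name, "->", "native FP8" if has_native_fp8(cap) else "no native FP8")
```

    On a card below 8.9, such as the RTX 3060 used for training, the FP8 file can still reduce VRAM use, but the matrix multiplies are not run on FP8 hardware.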

    Saved as mixed-precision FP8 E4M3 with activation-aware layer selection: high-impact Linear layers are quantized, and sensitive layers are kept in BF16. Seeds won't match the base model exactly; expect minor quality loss and more seed variance.

    INT8: Uses ComfyUI's native int8_tensorwise mixed-precision loading (requires ComfyUI v0.27.0 or later). It is more numerically conservative than FP8 and still saves VRAM, with possible speed gains on supported hardware.

    Saved with calibrated tensorwise INT8 (per-output-channel weight scales) and the same activation-aware selection, with norms, biases, embeddings, positional buffers, and final layers kept in BF16. Closer to BF16 output than FP8, but seed matching still isn't guaranteed.


    Branch A vs B

    Branch A: follows the recommended Anima tuning setup with bell-weighted loss and non-uniform timestep sampling.
    Branch B: uses my experimental setup with uniform timestep sampling and uniform loss weighting, giving the model more even exposure across the full noise range. In my testing, this improves visual style and composition, but is slower and harder to tune.
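
    To make the difference between the two branches concrete, here is a minimal sketch of the two sampling and weighting styles. Branch B's uniform sampling and uniform weight follow the description above. The exact non-uniform sampler and bell weight used for Branch A are not given here, so the logit-normal sampler and the Gaussian bump below are stand-ins chosen for illustration, with made-up parameter values.

```python
import math
import random

def sample_uniform(rng):
    """Branch B style: every noise level in [0, 1) is equally likely."""
    return rng.random()

def sample_logit_normal(rng, mean=0.0, std=1.0):
    """Assumed Branch A stand-in: sigmoid of a normal draw, dense near t = 0.5."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(mean, std)))

def uniform_weight(t):
    """Branch B style: every timestep contributes equally to the loss."""
    return 1.0

def bell_weight(t, center=0.5, width=0.25):
    """Assumed bell shape: a Gaussian bump that peaks at `center`."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)

def share_in_tails(ts):
    """Fraction of samples at the low-noise and high-noise extremes."""
    return sum(t < 0.1 or t > 0.9 for t in ts) / len(ts)

rng = random.Random(0)
uniform_ts = [sample_uniform(rng) for _ in range(10_000)]
bell_ts = [sample_logit_normal(rng) for _ in range(10_000)]

print("uniform, share in tails:", share_in_tails(uniform_ts))
print("logit-normal, share in tails:", share_in_tails(bell_ts))
print("bell weight at t=0.1 vs t=0.5:", bell_weight(0.1), bell_weight(0.5))
```

    Under these assumptions the uniform sampler spends far more steps at the extremes of the noise range, which is the "more even exposure" that Branch B aims for.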

    Version 0.55b Preview

    This is a 55% training checkpoint continued from the 0.5a checkpoint, trained on 15k images using the experimental Branch B settings. Half of the dataset is Danbooru-tagged images with generated natural-language captions; the other half is hand-tagged or captioned in natural language. During training, text conditioning was split into 90% tags and 10% natural-language prompts.
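
    One simple way to get a 90/10 split like the one described above is to pick the caption type at random each time a sample is drawn. This is only a sketch of that idea: the function name, the dictionary keys, and the per-draw random choice are assumptions, not the trainer's actual implementation.

```python
import random

def pick_caption(sample, rng, natural_language_prob=0.10):
    """Return the natural-language caption 10% of the time, the tag string otherwise."""
    if rng.random() < natural_language_prob:
        return sample["caption"]
    return sample["tags"]

rng = random.Random(0)
sample = {
    "tags": "scenery, red sky, sunset, cloud",
    "caption": "A wide view of a red evening sky above scattered clouds.",
}
picks = [pick_caption(sample, rng) for _ in range(10_000)]
nl_share = picks.count(sample["caption"]) / len(picks)
print(f"natural-language share: {nl_share:.3f}")
```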

    While this release is functional, it should be considered a work in progress rather than a finished product. It is being shared early to demonstrate the viability of the training method and to showcase the Aozora trainer’s ability to fine-tune Anima DiT models on low-VRAM hardware.

    Pros:

    • Reduces unwanted text generation by around 70%, so heavy negative prompts are needed less often.

    • More responsive to Danbooru-style tags.

    • Slightly more dynamic seed variation due to soft conditioning.

    • More SDXL-inclined generation style.

    • Better overall composition and prompt feel across varied prompts.

    • Improved NSFW output quality.

    Cons:

    • Some seeds may still closely resemble the base model.

    • Lighting effects often need to be prompted directly.

    • Still early in training and may hallucinate content.

    0.55b Training Settings

    Base Model: Akanezora V0.5a
    Training Hours: Unknown (a power outage made the run take about 2x longer; estimated at around 50 hours)
    GPU Used: NVIDIA GeForce RTX 3060 (12 GB) | Driver version: 32.0.15.9636

    VRAM Usage: ~11.4GB

    Mixed Precision: bfloat16

    Batch Size: 1

    Gradient Accumulation: 4

    Learning Rate: 6e-6
    Timestep: Uniform
    Loss: Uniform

    Optimizer: Raven (AdamW float32 variant with offloading) | betas: 0.9, 0.999 | eps: 1e-08 | Weight Decay: 0.01 | Debias: 1.0

    Max Train Steps: 201010 (Completed: 115282)

    Current Checkpoint: ~55% through planned training
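
    The batch and step figures above can be combined with a little arithmetic. The sketch below only uses numbers from this settings list; whether a "step" counts a single batch or a full accumulated update is not stated, so no epoch count is derived.

```python
batch_size = 1
grad_accumulation = 4
max_train_steps = 201_010
completed_steps = 115_282

# Samples contributing to each optimizer update.
effective_batch = batch_size * grad_accumulation

# Fraction of the planned run finished, by raw step count.
progress = completed_steps / max_train_steps

print(f"effective batch size: {effective_batch}")
print(f"progress by step count: {progress:.1%}")
```

    By raw step count this comes out at about 57%, in the same neighborhood as the ~55% label.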

    Trainable Parameters: P: 1,956,405,248 | P Frozen: 6.44% (llm_adapter.*)

    Soft Text Conditioning: 0.75-1.25

    Dataset Size: 15164
    Training Resolution: 1152x1152 (Aspect Ratio Bucketed: 864x1536 to 1536x864)
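
    The bucket range above fits a constant pixel budget: 864 x 1536 and 1536 x 864 both have the same area as 1152 x 1152. A minimal sketch of how such buckets could be generated and assigned follows. The 32-pixel step and both function names are assumptions made for this example; the trainer's real bucket list may differ.

```python
def make_buckets(base=1152, min_side=864, max_side=1536, step=32):
    """List (width, height) buckets whose area never exceeds base * base."""
    max_area = base * base
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Largest multiple of `step` that keeps the area within budget.
        height = (max_area // width) // step * step
        height = min(height, max_side)
        if height >= min_side:
            buckets.append((width, height))
    return buckets

def nearest_bucket(image_w, image_h, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = image_w / image_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))

buckets = make_buckets()
print("bucket count:", len(buckets))
print("extremes:", buckets[0], buckets[-1])
print("16:9 image goes to:", nearest_bucket(1920, 1080, buckets))
```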

    VRAM-Saving Techniques: momentum offloading, bf16 mixed precision, pre-caching VAE and text encoder outputs, gradient checkpointing
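
    A back-of-the-envelope estimate shows why offloading the optimizer state matters on a 12 GB card. The sketch assumes that P is the total parameter count, that the frozen 6.44% is a share of P, that weights and gradients sit in bf16, and that the optimizer keeps the two usual AdamW moments in float32. Raven's actual memory layout is not documented here, and activations are ignored.

```python
GIB = 1024 ** 3

total_params = 1_956_405_248   # "P" from the settings above
frozen_fraction = 0.0644       # llm_adapter.* share, assumed to be a fraction of P
trainable_params = round(total_params * (1 - frozen_fraction))

weights_bf16 = total_params * 2                # 2 bytes per bf16 weight
grads_bf16 = trainable_params * 2              # assumed: gradients held in bf16
adam_moments_fp32 = trainable_params * 2 * 4   # two fp32 moments per trainable weight

for label, size in [
    ("weights (bf16)", weights_bf16),
    ("gradients (bf16)", grads_bf16),
    ("AdamW moments (fp32)", adam_moments_fp32),
]:
    print(f"{label}: {size / GIB:.1f} GiB")
```

    Under these assumptions the two float32 moments alone exceed the card's 12 GB, so they have to live in system RAM, while weights and gradients together leave only a few GB for activations.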

    Generation Settings

    Sampler: ER_SDE
    Scheduler: Beta

    Steps: 15-50

    CFG: 3-5

    Negative Prompt: worst quality, low quality, lowres, score_1, score_2, score_3, blurry, jpeg artifacts
    Note: Use qwen_3_06b_base.safetensors as the text encoder and qwen_image_vae.safetensors as the VAE.


    Model Transparency Notice

    For transparency, this release includes the training setup and links back to the open-source trainer/code used to create it. This is a full fine-tune checkpoint with no LoRA, LoKR, LyCORIS, or model merge operations applied.

    This model:

    • Started training from unmodified base weights

    • Had zero merge operations applied

    • Never used LoRA adapters

    • Was trained end to end, with no sublayer freezing other than the required llm_adapter

    Notes

    Feedback is welcome, especially on prompt following, anatomy, hands, style consistency, repeated patterns, overfitting, and behavior without heavy negative prompts.

    License

    This model follows the license of its base model, Anima. Review and comply with the base model terms before using or redistributing.

    Description

    Mixed-precision post-training quantization for Anima DiT.

    FP8 version: calibrated FP8 E4M3 quantization. Large compatible Linear/matrix weights were quantized to FP8 E4M3 with per-layer scaling after no-grad Anima DiT forward-pass calibration on representative dataset samples.
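
    The per-layer scaling step can be illustrated in plain Python. The sketch below simulates rounding to the E4M3 grid (3 mantissa bits, largest finite value 448, smallest normal exponent -6) and uses a single scale per layer. It covers only the weight rounding, not the calibration pass or the layer selection, and the function names are made up for this example.

```python
import math

E4M3_MAX = 448.0      # largest finite magnitude in FP8 E4M3 (finite-only variant)
MIN_NORMAL_EXP = -6   # smallest normal exponent; smaller values are subnormal

def round_to_e4m3(x):
    """Round a float to the nearest value on the E4M3 grid, saturating at 448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)
    exp = max(math.floor(math.log2(mag)), MIN_NORMAL_EXP)
    step = 2.0 ** (exp - 3)   # spacing between neighbours at this exponent
    rounded = min(round(mag / step) * step, E4M3_MAX)
    return math.copysign(rounded, x)

def quantize_layer(weights):
    """Per-layer scaling: map the largest |weight| onto E4M3_MAX, then round."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale

def dequantize_layer(values, scale):
    """Recover approximate weights from stored FP8 values and the layer scale."""
    return [v * scale for v in values]

weights = [0.013, -0.21, 0.5, -0.0004, 0.077]
stored, scale = quantize_layer(weights)
restored = dequantize_layer(stored, scale)
worst = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))
print(f"layer scale: {scale:.6f}")
print(f"worst relative error: {worst:.2%}")
```

    With 3 mantissa bits the relative rounding error of a normal value is at most 1/16, which is one reason the sensitive layers are left in BF16.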

    INT8 version: calibrated tensorwise INT8 quantization. Large compatible Linear/matrix weights were quantized to INT8 using per-output-channel weight scales after the same no-grad Anima DiT calibration pass.
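
    Per-output-channel weight scales mean each row of a weight matrix gets its own scale, so a row of small weights is not crushed by a large value elsewhere. A minimal sketch of symmetric INT8 quantization in that style follows; it is plain Python for illustration, with made-up function names, and does not reproduce ComfyUI's loader or the calibration pass.

```python
def quantize_int8_per_channel(weight):
    """Symmetric INT8 with one scale per output channel (row of the matrix)."""
    q_rows, scales = [], []
    for row in weight:
        amax = max(abs(w) for w in row)
        scale = amax / 127 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Multiply each row of integers by that row's scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

weight = [
    [0.50, -0.25, 0.10],     # output channel 0: large values
    [0.004, 0.001, -0.002],  # output channel 1: small values, own scale
]
q_rows, scales = quantize_int8_per_channel(weight)
restored = dequantize(q_rows, scales)

for row, q, s in zip(weight, q_rows, scales):
    print(row, "->", q, f"(scale {s:.6f})")
```

    Each restored weight lands within half a scale step of the original, and because the steps are evenly spaced the error does not grow with the size of the value the way it does for FP8.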

    For both versions, norms, biases, embeddings, positional embedding buffers, final layers, and calibration-sensitive layers remain BF16. Files are saved with ComfyUI mixed-precision quantization metadata, including weight_scale, comfy_quant, and quantization metadata entries for native mixed-precision loading.
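
    The split between quantized and BF16 tensors can be pictured as a per-tensor decision. The sketch below is purely illustrative: the tensor names, the name fragments in `KEEP_BF16_MARKERS`, the size threshold, and the function itself are hypothetical, and the real selection also relies on calibration results that are only represented here by a `sensitive` set.

```python
# Hypothetical name fragments for tensors that stay in BF16.
KEEP_BF16_MARKERS = ("norm", "embed", "pos_", "final_layer")

def pick_precision(name, shape, sensitive=frozenset(), min_elements=65_536):
    """Return "quantized" for large 2-D weights, "bf16" for everything else."""
    if name in sensitive:
        return "bf16"   # flagged by calibration as too sensitive
    if not name.endswith(".weight") or len(shape) != 2:
        return "bf16"   # biases, buffers, 1-D norm scales
    if any(marker in name for marker in KEEP_BF16_MARKERS):
        return "bf16"   # norms, embeddings, positional buffers, final layers
    if shape[0] * shape[1] < min_elements:
        return "bf16"   # too small to be worth quantizing
    return "quantized"

tensors = {
    "blocks.0.attn.qkv.weight": (6144, 2048),
    "blocks.0.attn.qkv.bias": (6144,),
    "blocks.0.norm1.weight": (2048,),
    "final_layer.linear.weight": (64, 2048),
}
for name, shape in tensors.items():
    print(name, "->", pick_precision(name, shape))
```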

    Checkpoint
    Anima

    Details

    Downloads: 142
    Platform: CivitAI
    Platform Status: Available
    Created: 6/30/2026
    Updated: 7/2/2026
    Deleted: -

    Files

    akanezora_v055BFP8INT8.safetensors

    Mirrors

    HuggingFace (72 mirrors)
    CivitAI (73 mirrors)

    akanezora_v055BFP8INT8.safetensors

    Mirrors

    HuggingFace (157 mirrors)
    CivitAI (78 mirrors)
    TensorFiles (1 mirrors)
    ModelScope (1 mirrors)