Anime foot detector (YOLO11m) — ADetailer / Impact Pack

Also on Hugging Face, with the same files plus the ONNX exports, which I could not upload to this page: https://huggingface.co/Claquasse/foot_anime_yolo

A small YOLO11m detector that finds feet in anime and illustration images (one class: foot). Drop it into an ADetailer or Impact Pack pass to fix feet automatically, since diffusion models often render them badly. It was built for the Anima model but works on anime-style art in general, so it should transfer to other anime or illustration generators.

Three versions are provided. v3 is the one to use — best box accuracy and the widest coverage. v2 is the previous best and a little better at plain, clearly visible feet. v1 is the first and weakest, kept for reference.

Files

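Each version ships as .pt (load directly in ComfyUI or Ultralytics) and as .onnx (a non-pickle format, for ONNX Runtime). The .onnx files are available only from the Hugging Face mirror linked above; the .pt files listed here are: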

- foot_anime_yolo11m_v3.pt — production (recommended)

- foot_anime_yolo11m_v2.pt — previous production

- foot_anime_yolo11m_v1.pt — reference

Install

ComfyUI (Impact Pack): put the .pt in ComfyUI/models/ultralytics/bbox/, load it with UltralyticsDetectorProvider, and feed the bounding box into a detail or inpaint pass. A bbox threshold near 0.45 is a sensible default.
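The threshold works as a cutoff on detection confidence: boxes scoring below it are dropped before the detail pass sees them. Here is a minimal sketch of that filtering, assuming detections arrive as (box, confidence) pairs. The `Detection` type, the `keep_confident` helper, and the sample values are hypothetical and only for illustration; they are not part of Impact Pack or Ultralytics.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float                       # detector score in [0, 1]


def keep_confident(detections: list[Detection], threshold: float = 0.45) -> list[Detection]:
    """Return only the detections whose confidence is at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Illustrative values, not real model output.
candidates = [
    Detection((120, 400, 210, 470), 0.82),
    Detection((300, 410, 380, 480), 0.47),
    Detection((50, 60, 90, 100), 0.21),
]
kept = keep_confident(candidates)  # the 0.21 candidate is discarded
```

Raising the threshold reduces false positives at the cost of missing faint or partly hidden feet; lowering it does the reverse.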

A1111 / Forge (ADetailer): put the .pt in stable-diffusion-webui/models/adetailer/ and select it as the ADetailer model.
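For scripted setups, the two install locations above can be captured in a small helper. This is only a sketch: the function names and the `"comfyui"` / `"a1111"` keys are made up for illustration, and the sub-folders are copied from the install notes above, relative to whatever root folder your installation uses.

```python
import shutil
from pathlib import Path

# Sub-folders from the install notes, relative to each UI's root folder.
WEIGHT_DIRS = {
    "comfyui": Path("models/ultralytics/bbox"),  # Impact Pack
    "a1111": Path("models/adetailer"),           # ADetailer
}


def weights_destination(ui: str, ui_root: Path) -> Path:
    """Return the folder where the .pt file should be placed for the given UI."""
    return ui_root / WEIGHT_DIRS[ui]


def install_weights(weights: Path, ui: str, ui_root: Path) -> Path:
    """Copy the weights file into the UI's detector folder, creating it if needed."""
    dest_dir = weights_destination(ui, ui_root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(weights, dest_dir / weights.name))
```

For example, `install_weights(Path("foot_anime_yolo11m_v3.pt"), "comfyui", Path("ComfyUI"))` would copy the file to `ComfyUI/models/ultralytics/bbox/`.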

Benchmark

Held-out set of 100 generated anime images (185 feet), none of which the models were trained on. Scores are mAP50 and mAP50-95.

| model | mAP50 | mAP50-95 |
|---|---|---|
| v1 | 0.28 | 0.08 |
| v2 | 0.81 | 0.50 |
| v3 | 0.81 | 0.59 |

v3 has the most accurate boxes in every image type (hence its higher mAP50-95) and matches or beats v2 at finding feet (equal mAP50). Open-toe footwear is the hardest case for all of them.
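To read the two columns: mAP50 counts a predicted box as a match when its intersection-over-union (IoU) with the labeled box is at least 0.50, while mAP50-95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it rewards closer-fitting boxes. The sketch below does not compute mAP itself. It only shows how one slightly offset box passes the loose threshold but fails the stricter ones; the two boxes are made-up values.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds behind mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]

ground_truth = (0, 0, 100, 100)
prediction = (10, 10, 110, 110)  # same size, shifted by 10 px on each axis

overlap = iou(ground_truth, prediction)  # 8100 / 11900, about 0.68
thresholds_passed = sum(overlap >= t for t in IOU_THRESHOLDS)
# A hit at 0.50 through 0.65 (4 of 10 thresholds), a miss from 0.70 upward.
```

This is why v2 and v3 can tie on mAP50 yet differ on mAP50-95: both find the same feet, but v3's boxes stay matched at stricter overlap thresholds.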

The preview images show all three versions plus a generic YOLOv8x foot detector run on the same frame at once (red = v3, green = v2, blue = v1, yellow = generic YOLOv8x reference), so you can see how they compare.

Notes and scope

Trained on bare anime feet, mined from Danbooru and labeled with DWPose keypoints, plus the public-domain ANFDet set, a few hundred hand-labeled images, and feet-free images as hard negatives. v3 was trained on roughly 286k images. Footwear, sandals, and stockings sit outside the primary case, though v3 generalizes to them noticeably better than v1 or v2. Tuned for anime and illustration, not photographs.

The boxes are meant to feed a refiner, not to stand alone. v2 and v3 draw slightly looser boxes that wrap the whole foot, which is what you want for an inpaint pass.
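A refiner usually benefits from some context around the detected box. Below is a minimal sketch of expanding a box by a margin and clamping it to the image, assuming pixel coordinates in (x1, y1, x2, y2) order. The `pad_box` helper and the 25% margin are illustrative assumptions, not settings taken from ADetailer or Impact Pack, which expose their own padding controls.

```python
def pad_box(
    box: tuple[float, float, float, float],
    pad_frac: float,
    img_w: int,
    img_h: int,
) -> tuple[int, int, int, int]:
    """Grow a box by pad_frac of its width/height per side, clamped to the image."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * pad_frac
    dy = (y2 - y1) * pad_frac
    return (
        max(0, round(x1 - dx)),
        max(0, round(y1 - dy)),
        min(img_w, round(x2 + dx)),
        min(img_h, round(y2 + dy)),
    )


# An 80x60 box in a 512x512 image, padded by 25% on each side.
crop = pad_box((100, 200, 180, 260), 0.25, 512, 512)
```

Clamping matters for feet near the frame edge, where the padded region would otherwise extend outside the image.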

License: AGPL-3.0 (inherited from Ultralytics YOLO). If you serve these weights over a network, AGPL's source-availability terms apply. The AGPL license is the authoritative one regardless of the toggles on this page.

Support

Building these means mining and labeling hundreds of thousands of images and renting GPUs to train on them, which takes real time and money. If the models are useful to you and you want to chip in, it is appreciated and never expected: https://ko-fi.com/claquasse
