ADetailer foot_yolov8x.pt - CivArchive (CivitAI Archive)

ADetailer foot_yolov8x.pt - v2.0

NSFW

V3 Development Notice

Circa:5/11/2025
Hey everyone,

Following up on some recent discussions, I wanted to share a quick update on the progress of the much-anticipated V3 foot model, as well as the new segmentation models for hands and faces/heads.

I know many of you are excited, and I'm just as eager to get these into your hands! Here's where things stand:

Dataset Robust & Open for Unique Additions: The comprehensive dataset of over 1000 images that will form the foundation for V3 (and the new hand/face models) has been assembled. I'm confident it covers a vast range of scenarios to ensure robust detection. That being said, if you happen to have or know of images showcasing unique poses, angles, or configurations that you think might be beneficial and possibly underrepresented, please feel free to share your suggestions! I can certainly take a look. If it’s a scenario I've overlooked and would add value, I'm open to including and annotating a few more carefully selected images. While the current foundation is very strong, an extra unique example or two won't break the process and could always help refine the models further.
Annotation Workflow Optimized for Precision: After initial exploration with automated tools like SAM, I've made the decision to proceed with a fully manual annotation process for every single image. While SAM provided a starting point, the level of precision required for high-quality segmentation masks (to avoid affecting backgrounds or leaving artifacts) means that meticulous, point-by-point manual tracing is the most effective path forward. This ensures the highest possible accuracy for the masks, which is crucial for the step-up in quality I'm aiming for with V3. It's definitely painstaking, but essential for getting it right!
Meticulous Annotation Underway: The detailed work of manually annotating every foot, hand, and face/head in the dataset is now my primary focus. It's a marathon, not a sprint, as each element requires careful outlining.
V3 Foot Model is the Priority: As I've mentioned, my commitment is that the V3 foot segmentation model will be the very next model I release. All my LoRA and checkpoint training is on hold until V3 is complete and uploaded – that's my motivation to power through this detailed annotation phase! Once the full dataset is annotated, the foot model will be the first to be trained and released, followed by the hand and face/head models.
Process Documentation in Progress: For those interested in the nitty-gritty, I'm also taking detailed notes on the entire process – from the challenging setup of the annotation tools (seriously, that was an adventure!) to the annotation strategies themselves, and eventually, the training process for these yolo-seg models. I hope to share this information down the line, as it might be helpful for others venturing into segmentation model training.

So, the journey to V3 is well underway! It's a complex and time-intensive project, especially with the shift to precision segmentation and the expanded scope, but the goal is to deliver models that are a significant improvement and worth the wait.

Thanks again for your incredible patience and support. I'll continue to focus on quality and will share further significant updates when I have them!

V1/V2

Thanks to sp00ns' guide:
Training a Custom Adetailer Model | Civitai
I created a custom foot model using yolov8x.

The foot model that sp00ns provided was helpful, but I wanted to see about making my own.

ComfyUI Workflow:

I know a lot of you use ComfyUI, and have issues getting the model to work. So, just for you, I have reinstalled ComfyUI, and have come up with a rudimentary workflow for not only the ver.2.0 foot model, but also for hands and face. Feel free to deal with the settings as you please to get good results. Simply drag the pinned image below that resembles what you see above to your ComfyUI window to replicate the exact parameters that was used to generate said image. (I'll also post a pinned version of this image to the model page for v.2.0)

versions 1.0 and 2.0 are BBOX models, thus be sure to place them under the ~\ComfyUI\models\ultralytics\bbox folder. Working with the SAM model means that it effectively works as a SEG model--at least that's what I think is going on. Also, be sure to install the FaceDetailer pack as well as the UltralyticsDetectorProvider node to get this to work.

Good hunting~

Version 1.0:

I'd tried using AutoDistiller and Grounded SAM to automatically label each of the 1000 images, but it partially failed, in that it also registered hands as feet. (Also I hate Colab, as I can't get work done there without it ending the job prematurely)
Therefore, I painstakingly labeled each and every image using RectLabel on my Mac, then spent about 8 hours training the YOLO model on my PC.

Though I'd planned for 500 epochs, it ended early and determined that the best was at the 93rd epoch.

I included a lot of my own generated images, as well as some stock images; anime, 3D models, and realistic images; male and female, varying skin tones, and various footwear configurations as well as barefoot images. That being said, there are some things it still cannot handle well, such as unconventional poses (like images rotated by 90 degrees), and images where the foot is the subject of composition. My guess is because the vast majority of the training images were of that with the feet taking up a small percentage of the canvas, not enough training was dedicated for closeups of feet. On the other hand, my intent was to use this model to refine feet that would otherwise be neglected, such as in the case of full body shots where the feet take up a tiny fraction of the canvas space.

In short, this version is very good at dealing with feet for standing poses especially in full body shots. But it can struggle with feet outside of that range.

Version 2.0:

I noticed that I mislabeled my training/validation folders for version 1, so my training folder was actually my validation folder and vice-versa. I went ahead and renamed them, however simply doing so and assuming that it would take no more than 100 epochs like version 1 led to some other issues--it started detecting whole bodies as feet. So that was 3 hours down the drain. I set the epochs for 200, migrated a lot of the old validations images into the training folder, and added around 160 new images (using RectLabel to painstakingly label each and every image manually.) This time, after 12 hours, it determined that epoch 148 was the best version, so that is what this is.

From what I've tested, it can detect feet in various configurations far better than v1.0 with few issues; it can detect soles; it can detect feet rotated by 90 degrees; and it can mostly detect feet in unconventional poses--depending on the pose.

A few issues I've noticed, however, is that it sometimes detects hands/knees/other objects as feet, albeit at a lower confidence level than actual feet. If you see this occurring, I'd recommend increasing the Detection model confidence threshold in the Adetailer Detection settings to at least 0.5.

For images of feet that takes up a great majority of the canvas, sometimes it detects them, sometimes it partially detects them, and sometimes it detects one but not the other. Arguably, this model wasn't designed for such images, even though such images were included in the training dataset, because what this model does is crops the total canvas to focus on its target; feet, in order to dedicate a lot of image generation to refine/modify said feet. If feet are already the focus of the image, taking up 50% or more of the total canvas, then this model effectively serves little in the way of refining the target. One can still use it for that purpose if they so please, but it might lead to more problems than solutions, depending on how you use it.

Installation:

Simply move the file into the ~\stable-diffusion-webui\models\adetailer folder and restart the webui. ~~Should~~ It definitely also works on ComfyUI; ~~but I haven't tested it there~~ I have tested it, and there is a workflow, see above image. Of course, you'll need the ADetailer extension for Automatic 1111, or ~~its equivalent~~ FaceDetailer AND UltralyticsDetectorProvider on ComfyUI for any of this to work.

Tip: You can increase the ADetailer model count in Automatic 1111 by going to: Settings>ADetailer>Max models. It's generally advised that if you plan to do full body edits that you set the body model first before moving on other models such as the heads/hands/feet.

Note: Civitai doesn't seem to have a category for ADetailer stuff, so I'm setting it as a checkpoint--even though it's not. The settings on pruned or full and precision stuff I just set to whatever.

Also to note, these days stable diffusion seems to be good at doing feet at least in portrait aspect ratios, so I had a hard time coming up with a good use case for portrait. So I instead used the model to paint Tharja's toenails in the example. But, this model will be especially good for landscape aspect ratios similar to what I do normally, as the feet tend to be quite low quality there.

Description

Installation: Simply move the file into the ~\stable-diffusion-webui\models\adetailer folder and restart the webui.

This version was created since I noticed that my training images where mistakenly labelled as my validation images, and vice-versa. Initially, I simply switched the names around and tried for 100 epochs (since version 1 stopped at 93). However, that led to some problems, and not only did it detect feet, but it detected whole bodies as though it was feet as well, so I tried something different.

For this version, I transferred a lot of the old validation images into the training images, and tried for 200 epochs, but it determined that epoch #148 was the best. Though this version has its own problems here and there, such as occasionally detecting hands and knees as feet, it now seems to detect feet better than v1.0 and the interim version.

If you find that it's detecting hands/knees/other object falsely as feet, I'd recommend increasing the Detection model confidence threshold setting in ADetailer to at least 0.5 and/or using advanced prompt/controlnet methods.

This model isn't perfect, but it should be an improvement to v1.0 for those who desire a foot detection model outside of just simple standing/sitting poses.

Note:

This site's being very buggy today, so to differentiate the buggy v2.0 stuff, I'm temporarily calling this v2.2 so I can see what I'm working with here (there's way too many duplicates, and they won't go away when I delete them). If those other versions go away, or this one is successfully uploaded, then I'll rename this version back to v2.0

FAQ

Comments (92)

MysticDaedraFeb 23, 2024· 5 reactions

CivitAI

My suggestion would be to have it detect two feet/shoes at once, so that it can generate shoes that match. Very often (90% of the time) the shoes don't really match super closely, especially on larger upscaled images. I've had the same issue with eyes with detectors that detect individual eyes instead of both at the same time.

V3 Development Notice

V1/V2

ComfyUI Workflow:

Version 1.0:

Version 2.0:

Installation:

Description

FAQ

What is ADetailer foot_yolov8x.pt?

What files are available and where can I download them?

Comments (92)

Details

Files

adetailerFootYolov8x_v20.pt

Mirrors

Available On (1 platform)