CivArchive
    ADetailer foot_yolov8x.pt - v1.0
    NSFW

    V3 Development Notice

    Circa: 5/11/2025
    Hey everyone,

    Following up on some recent discussions, I wanted to share a quick update on the progress of the much-anticipated V3 foot model, as well as the new segmentation models for hands and faces/heads.

    I know many of you are excited, and I'm just as eager to get these into your hands! Here's where things stand:

    1. Dataset Robust & Open for Unique Additions: The comprehensive dataset of over 1000 images that will form the foundation for V3 (and the new hand/face models) has been assembled. I'm confident it covers a vast range of scenarios to ensure robust detection. That being said, if you happen to have or know of images showcasing unique poses, angles, or configurations that you think might be beneficial and possibly underrepresented, please feel free to share your suggestions! I can certainly take a look. If it’s a scenario I've overlooked and would add value, I'm open to including and annotating a few more carefully selected images. While the current foundation is very strong, an extra unique example or two won't break the process and could always help refine the models further.

    2. Annotation Workflow Optimized for Precision: After initial exploration with automated tools like SAM, I've made the decision to proceed with a fully manual annotation process for every single image. While SAM provided a starting point, the level of precision required for high-quality segmentation masks (to avoid affecting backgrounds or leaving artifacts) means that meticulous, point-by-point manual tracing is the most effective path forward. This ensures the highest possible accuracy for the masks, which is crucial for the step-up in quality I'm aiming for with V3. It's definitely painstaking, but essential for getting it right!

    3. Meticulous Annotation Underway: The detailed work of manually annotating every foot, hand, and face/head in the dataset is now my primary focus. It's a marathon, not a sprint, as each element requires careful outlining.

    4. V3 Foot Model is the Priority: As I've mentioned, my commitment is that the V3 foot segmentation model will be the very next model I release. All my LoRA and checkpoint training is on hold until V3 is complete and uploaded – that's my motivation to power through this detailed annotation phase! Once the full dataset is annotated, the foot model will be the first to be trained and released, followed by the hand and face/head models.

    5. Process Documentation in Progress: For those interested in the nitty-gritty, I'm also taking detailed notes on the entire process – from the challenging setup of the annotation tools (seriously, that was an adventure!) to the annotation strategies themselves, and eventually, the training process for these yolo-seg models. I hope to share this information down the line, as it might be helpful for others venturing into segmentation model training.
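    To illustrate why a traced mask matters compared with a bounding box (point 2 above), here is a minimal sketch that measures how much of a box is background once the outline is known. It uses the standard shoelace formula; the function names and the example outline are made up for illustration and are not part of the V3 tooling.

```python
def polygon_area(points):
    """Area of a simple polygon given as a list of (x, y) pixel points (shoelace formula)."""
    area = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2


def bbox_background_fraction(points):
    """Fraction of the outline's bounding box that lies outside the outline."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if box_area == 0:
        raise ValueError("outline is degenerate (zero-area bounding box)")
    return 1 - polygon_area(points) / box_area


# Hypothetical triangular outline: half of its bounding box is background.
outline = [(0, 0), (10, 0), (0, 10)]
print(bbox_background_fraction(outline))
```

    With a BBOX model that background portion gets inpainted along with the target; a segmentation mask is meant to leave it alone.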

    So, the journey to V3 is well underway! It's a complex and time-intensive project, especially with the shift to precision segmentation and the expanded scope, but the goal is to deliver models that are a significant improvement and worth the wait.

    Thanks again for your incredible patience and support. I'll continue to focus on quality and will share further significant updates when I have them!

    V1/V2

    Thanks to sp00ns' guide:
    Training a Custom Adetailer Model | Civitai
    I created a custom foot model using yolov8x.

    The foot model that sp00ns provided was helpful, but I wanted to see about making my own.

    ComfyUI Workflow:

    I know a lot of you use ComfyUI and have had issues getting the model to work. So, just for you, I have reinstalled ComfyUI and come up with a rudimentary workflow for not only the ver.2.0 foot model, but also for hands and face. Feel free to adjust the settings as you please to get good results. Simply drag the pinned image below that resembles what you see above into your ComfyUI window to replicate the exact parameters that were used to generate said image. (I'll also post a pinned version of this image to the model page for v.2.0)

    Versions 1.0 and 2.0 are BBOX models, so be sure to place them under the ~\ComfyUI\models\ultralytics\bbox folder. Working with the SAM model means that it effectively works as a SEG model--at least that's what I think is going on. Also, be sure to install the FaceDetailer pack as well as the UltralyticsDetectorProvider node to get this to work.

    Good hunting~

    Version 1.0:

    I'd tried using AutoDistiller and Grounded SAM to automatically label each of the 1000 images, but it partially failed, in that it also registered hands as feet. (Also I hate Colab, as I can't get work done there without it ending the job prematurely)
    Therefore, I painstakingly labeled each and every image using RectLabel on my Mac, then spent about 8 hours training the YOLO model on my PC.
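    For readers labeling their own images: YOLO-format detection labels are plain text, one line per box, written as `class x_center y_center width height` with every value normalized to 0-1 by the image size. A minimal sketch of that conversion from a pixel-space box follows; the function name and the single class id 0 for "foot" are assumptions for illustration, not details of this dataset.

```python
def to_yolo_line(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit inside a {img_w}x{img_h} image")
    # YOLO stores the box center and size, each divided by the image dimension.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box in the lower half of a 1024x1024 image.
print(to_yolo_line((256, 512, 768, 1024), img_w=1024, img_h=1024))
# -> 0 0.500000 0.750000 0.500000 0.500000
```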

    Though I'd planned for 500 epochs, it ended early and determined that the best was at the 93rd epoch.
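    A run that stops well short of its planned epochs and reports a "best" epoch is the usual signature of patience-based early stopping: training halts once the validation score has gone a set number of epochs without improving. Ultralytics' training settings include a `patience` value for this, but that it is what ended this particular run is an assumption. A minimal sketch of the idea, with made-up scores:

```python
def train_with_patience(val_scores, patience):
    """Return (best_epoch, stopped_epoch) for per-epoch validation scores.

    Epochs are 1-indexed and a higher score is better. Training stops once
    `patience` epochs have passed since the best epoch without improvement.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores)


# Illustrative numbers only: the score peaks at epoch 3, then stalls.
scores = [0.1, 0.3, 0.5, 0.4, 0.45, 0.2, 0.6]
print(train_with_patience(scores, patience=3))   # -> (3, 6)
```

    Note that with a short patience the late improvement at epoch 7 is never seen, which is the trade-off when picking that value.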

    I included a lot of my own generated images, as well as some stock images; anime, 3D models, and realistic images; male and female, varying skin tones, and various footwear configurations as well as barefoot images. That being said, there are some things it still cannot handle well, such as unconventional poses (like images rotated by 90 degrees), and images where the foot is the subject of the composition. My guess is that because the vast majority of the training images had the feet taking up only a small percentage of the canvas, not enough training was dedicated to closeups of feet. On the other hand, my intent was to use this model to refine feet that would otherwise be neglected, such as in the case of full body shots where the feet take up a tiny fraction of the canvas space.

    In short, this version is very good at dealing with feet for standing poses especially in full body shots. But it can struggle with feet outside of that range.

    Version 2.0:

    I noticed that I mislabeled my training/validation folders for version 1, so my training folder was actually my validation folder and vice-versa. I went ahead and renamed them; however, simply doing so and assuming that it would take no more than 100 epochs like version 1 led to some other issues--it started detecting whole bodies as feet. So that was 3 hours down the drain. I set the epochs to 200, migrated a lot of the old validation images into the training folder, and added around 160 new images (using RectLabel to painstakingly label each and every image manually). This time, after 12 hours, it determined that epoch 148 was the best version, so that is what this is.
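    A swapped train/validation split like the one described above can be caught before training with a quick count of the images in each folder. This is a minimal sketch, assuming the validation set is supposed to be the smaller of the two; the folder paths and the list of image extensions are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def count_images(folder):
    """Count image files under a folder, including sub-folders."""
    return sum(
        1
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_split(train_dir, val_dir):
    """Return (n_train, n_val, looks_swapped).

    looks_swapped is True when validation holds more images than training,
    which usually means the two folders were mixed up.
    """
    n_train, n_val = count_images(train_dir), count_images(val_dir)
    return n_train, n_val, n_val > n_train


if __name__ == "__main__":
    # Hypothetical dataset layout; adjust to wherever the images live.
    n_train, n_val, swapped = check_split("dataset/train", "dataset/valid")
    print(f"train={n_train} val={n_val}")
    if swapped:
        print("Warning: validation is larger than training; folders may be swapped.")
```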

    From what I've tested, it can detect feet in various configurations far better than v1.0 with few issues; it can detect soles; it can detect feet rotated by 90 degrees; and it can mostly detect feet in unconventional poses--depending on the pose.

    One issue I've noticed, however, is that it sometimes detects hands/knees/other objects as feet, albeit at a lower confidence level than actual feet. If you see this occurring, I'd recommend increasing the Detection model confidence threshold in the ADetailer Detection settings to at least 0.5.
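    Conceptually, the confidence threshold just discards any detection scoring below it, which is why raising it drops the weaker false positives. A minimal sketch of that filtering; the detection tuples and their scores are invented for illustration, and this is not ADetailer's actual code.

```python
def filter_detections(detections, min_confidence=0.5):
    """Keep detections whose confidence is at or above the threshold.

    Each detection is a (description, confidence, box) tuple, where box is
    (x_min, y_min, x_max, y_max) in pixels.
    """
    return [d for d in detections if d[1] >= min_confidence]


# Made-up results: two real feet plus a knee picked up at low confidence.
detections = [
    ("left foot", 0.82, (310, 900, 380, 980)),
    ("right foot", 0.77, (420, 905, 495, 985)),
    ("knee mistaken for a foot", 0.34, (350, 600, 420, 680)),
]

kept = filter_detections(detections, min_confidence=0.5)
print([name for name, _, _ in kept])   # -> ['left foot', 'right foot']
```

    The same logic explains the opposite advice for missed detections: lowering the threshold lets weaker, but possibly correct, detections through.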

    For images where feet take up a great majority of the canvas, sometimes it detects them, sometimes it partially detects them, and sometimes it detects one but not the other. Arguably, this model wasn't designed for such images, even though such images were included in the training dataset, because what this model does is crop the canvas to focus on its target (the feet) in order to dedicate more of the image generation to refining/modifying said feet. If feet are already the focus of the image, taking up 50% or more of the total canvas, then this model effectively does little in the way of refining the target. One can still use it for that purpose if they so please, but it might lead to more problems than solutions, depending on how you use it.
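    The 50% rule of thumb above can be written down as a quick check on how much of the canvas the detected boxes cover. A minimal sketch, with hypothetical function names and example boxes; overlapping boxes are simply summed, so the result can overestimate coverage.

```python
def box_area_fraction(box, img_w, img_h):
    """Fraction of the canvas covered by one (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min) / (img_w * img_h)


def worth_detailing(boxes, img_w, img_h, max_fraction=0.5):
    """True when the boxes together cover less than max_fraction of the canvas."""
    covered = sum(box_area_fraction(b, img_w, img_h) for b in boxes)
    return covered < max_fraction


# Full body shot: small foot boxes, so a detail pass has something to gain.
print(worth_detailing([(400, 900, 500, 1000), (520, 900, 620, 1000)], 1024, 1024))  # -> True

# Foot closeup: one box already fills half the canvas.
print(worth_detailing([(0, 0, 512, 1024)], 1024, 1024))  # -> False
```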

    Installation:

    Simply move the file into the ~\stable-diffusion-webui\models\adetailer folder and restart the webui. It definitely also works on ComfyUI; I have tested it, and there is a workflow (see the image above). Of course, you'll need the ADetailer extension for Automatic 1111, or its equivalent FaceDetailer AND UltralyticsDetectorProvider on ComfyUI for any of this to work.
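    For those who prefer to script the install, here is a minimal sketch that copies the model file into either of the two folders named on this page. The function name, the `ui` keys, and the example install locations are assumptions for illustration; the sub-folder paths are the ones given in the instructions.

```python
import shutil
from pathlib import Path

# Sub-folders named on this page, relative to each UI's install root.
TARGET_SUBDIRS = {
    "a1111": Path("models") / "adetailer",
    "comfyui": Path("models") / "ultralytics" / "bbox",
}


def install_model(model_file, ui_root, ui="a1111"):
    """Copy model_file into the detector folder for the chosen UI and return the new path."""
    if ui not in TARGET_SUBDIRS:
        raise ValueError(f"unknown ui {ui!r}; expected one of {sorted(TARGET_SUBDIRS)}")
    target_dir = Path(ui_root) / TARGET_SUBDIRS[ui]
    # Creating the folder is a convenience; if it was missing, the required
    # extension or custom nodes may not be installed yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(model_file, target_dir))


if __name__ == "__main__":
    # Hypothetical locations; point these at the real download and install folders.
    print(install_model("foot_yolov8x.pt", Path.home() / "stable-diffusion-webui", ui="a1111"))
```

    Restart the UI afterwards so the new model shows up in the detector list.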

    Tip: You can increase the ADetailer model count in Automatic 1111 by going to: Settings>ADetailer>Max models. It's generally advised that if you plan to do full body edits, you set the body model first before moving on to other models such as the heads/hands/feet.

    Note: Civitai doesn't seem to have a category for ADetailer stuff, so I'm setting it as a checkpoint--even though it's not. The settings on pruned or full and precision stuff I just set to whatever.

    Also to note, these days stable diffusion seems to be good at doing feet at least in portrait aspect ratios, so I had a hard time coming up with a good use case for portrait. So I instead used the model to paint Tharja's toenails in the example. But, this model will be especially good for landscape aspect ratios similar to what I do normally, as the feet tend to be quite low quality there.

    Description

    v1.0: trained on 1000 mostly full-body images of people standing up, with or without shoes--probably not ideal for unconventional poses, and may not detect them at the default setting, so adjust the confidence threshold as necessary.

    Comments (11)

    155956
    Jan 8, 2024· 2 reactions
    CivitAI

    very cool! i would love to see a hands and feet one combined if thats possible, since those are the things that seem to get messed up the most! good work

    Monet_Einsley
    Author
    Jan 9, 2024· 2 reactions

    Combining hands and feet models into one? That was kinda the way AutoDistiller and Grounded SAM were leading me before I put an end to that. Personally, I don't think that would be a good idea, since you'll limit yourself in what you can do with it. Like if you wanted shiny shoes, all of a sudden, your hands got shoes on them... or something like that. That being said, I believe the default hand model on ADetailer isn't that great, so I recommend getting Bingsu's hand_yolov8s model: hand_yolov8s.pt · Bingsu/adetailer at main (huggingface.co)
    Install it by dropping it into the same ~\stable-diffusion-webui\models\adetailer folder.

    And I also recommend increasing the number of ADetailer models. To do that, for Automatic 1111, go to Settings>ADetailer>Max models. Currently I have mine set to 4; person>hands/feet/face.
    In doing that, you'll achieve the same effect as what you want out of it: automatic inpainting refinement of parts that get overlooked in the base generations.

    Bingsu/adetailer at main (huggingface.co)

    (^Other good models from Bingsu that you can find above)

    All that said, even then, it's no guarantee that it'll fix hands (or feet), among other things. So, it's best to play around with the ADetailer settings and see what works consistently. Good luck, and good hunting!

    155956
    Jan 9, 2024· 1 reaction

    @Monet_Einsley oh wow, to be honest i had no clue that you could increase max models. thanks for the tip, and once again, thanks for the model, super useful!

    Monet_Einsley
    Author
    Jan 10, 2024

    @BakaBlitz You are most welcome! Glad I could be of service XD

    MajesticTriumph
    Jan 11, 2024· 4 reactions
    CivitAI

    Let's be honest, we all need this, I'm tired of seeing feet all distorted and janked up. Now my biggest question is: Does it work well with sandals and open-toed footwear, specifically flip flops and any thonged footwear related?

    Monet_Einsley
    Author
    Jan 11, 2024

    It should, as my training images included all sorts of footwear, including sandals and other open-toes shoes, as well as running shoes, boots, socks, etc. This model should be perfect for you XD

    @Monet_Einsley In my testing, it quite often fails to detect feet when heels or sandals are present.

    Monet_Einsley
    Author
    Jan 12, 2024

    @LostInHentaiThoughts Can you share an example image here? Do bear in mind that despite me including close-up images of feet in the training images, this model doesn't seem to recognize feet when they're the focus of the image (taking up most of the canvas area, in other words). I suspect that since a vast majority of my training images were full body shots with people mostly standing up, hardly any of the 93 epochs had enough training on the minority cases.

    I had a training image of a top-down shot of a person lying on the floor, rotated by 90 degrees. That is to say, it was in a landscape aspect ratio with the body being horizontal instead of vertical. Despite the fact that I manually labeled that image's feet, the model never recognized them in training or testing until I rotated the picture back to vertical.

    Basically, if the image you're trying to work with is a closeup shot of feet, or some other foot-focused image of that nature, this model won't work--despite me including such pictures in the training. If the image you're working with is a full body shot or some other image where the feet aren't the primary focus, this model should work, and the only things I can recommend, for this version to detect those feet if it doesn't, is to decrease the Detection model confidence threshold setting under the Detection tab in ADetailer, or in an extreme case, try rotating the canvas by 90 degrees in an editor.

    Lastly, and I've just tested it out, sp00ns' foot-yolov8l model DOES detect the edge cases that my model doesn't. Training a Custom Adetailer Model | Civitai
    On that page, it should have a link to download that particular model. Try that model and see if you get an improvement in your detections.

    To sum it up, my model was trained to find feet that weren't the focus of the image, despite including a few images (out of the thousand) where they were. sp00ns' model seems to be good at feet where they are prominent, but for me it didn't detect the feet in the images I normally generate, which is why I made this model in the first place. So, try decreasing the confidence threshold, and/or using sp00ns' model to see if there's any improvement for your images. Regardless of whether it works or not, it would also be helpful if I could see what sorts of images you have that this model has trouble with.

    Let me know if any of that helps, ok?

    Tozi_White
    Feb 12, 2024· 1 reaction
    CivitAI

    I've used this other ADetailer for feet so far and it almost always detected feet - it varied in its accuracy but at least it detected. Here I tried a great many renders and your module hardly detects feet at all. It shows all the time that it doesn't detect feet. One in 20 renders detects feet. I tried it with SDXL Turbo realistic.

    Monet_Einsley
    Author
    Feb 12, 2024

    Yes, other users have stated something similar. My guess is that although I used a variety of images in training, it still ignores feet in certain scenarios: where feet take up the majority of the image; the bottoms of feet; where feet are raised up vertically; and certain poses/images where the feet are rotated by 90 degrees or more. I've no clue why this is the case, since I used all such images in training--perhaps it simply only trained on a portion of the dataset or something.

    This model, however, is very adept at detecting feet in standing poses, and the occasional sitting/laying down pose as seen in the example images. That being said, it's no guarantee that it will improve the feet by itself, even if it detects them.

    Judging by the images I see on your profile, I'm almost certain this model would be incompatible with those images, at least at the default setting. Perhaps lowering the detection model confidence threshold in ADetailer might help.

    In the very likely scenario that that still doesn't work, the only other thing I could suggest is using sp00ns' foot model here: Training a Custom Adetailer Model | Civitai

    sp00ns' model is very good at detecting the sorts of images of feet that this model struggles with; likewise, this model, from what I can tell, is very good at detecting images of feet that the other model struggles with.

    Arguably, I probably should retrain this model to ensure that all my images in the dataset are utilized in the training, with none of them being ignored.

    Monet_Einsley
    Author
    Feb 12, 2024· 7 reactions
    CivitAI

    I think I found the issue why this first version has a hard time detecting certain images. Took a hard look at my dataset and realized I misnamed the folders. My training folder was my validation folder and vice-versa. I've gone ahead and properly renamed them and added ~160 new training images for good measure. Gonna hunker down and retrain the model and see if that fixes the issue.

    I'll then test it, and if it works, I'll post the new version here.

    Apologies to all who had a hard time with the model, and let's hope this will fix the issue.

    Other
    Other

    Details

    Downloads
    1,775
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/8/2024
    Updated
    6/12/2026
    Deleted
    -