I was annoyed by the weird ways JoyCaption behaved on my Mac, so I created a simple alternative: a Batch Images to Caption ComfyUI workflow using Florence 2. It also works well on lower-end Nvidia or AMD GPUs (6 / 8 GB VRAM) on Linux or Windows machines. I think this is a very viable alternative to JoyCaption on any platform. Using it is very simple:
#1. Load your desired Florence 2 model (Florence-2-Flux-Large, Florence-2-SD3-Captioner, or any other)
#2. Copy and paste or type the path of the folder where you stored the images to caption
#3. Set the folder path where you want to save the captioned output
#4. Set your caption settings
...That's it! Now click Run.
This saves each captioned image with its own dedicated .txt file (Captioned_Image_01.txt for Captioned_Image_01.jpg, Captioned_Image_02.txt for Captioned_Image_02.jpg, and so on) containing a caption that you can review and fine-tune. Florence 2 does a very decent job for Flux and Stable Diffusion 3.5, even at recognizing text within images, but it may have minor issues with complex text or some illustrations, so it's a good idea to double-check your captions.
If you are missing any nodes, use ComfyUI Manager to get them. On your very first run only, use the Florence 2 downloader and loader instead of the regular Florence 2 Model Loader: enable it, disable the regular loader, and move the node connection over from the regular loader. That run downloads the Florence 2 model and places it in the correct path. From your second run on, you can switch back to the regular loader.
If you are interested in trying out other workflows (for HiDream, WAN, QWEN Image and Flux), the archive that contains this workflow file also includes a text file with more links to essential HuggingFace downloads for the other workflows on my CivitAI profile.
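
For the double-checking step, a small script outside ComfyUI can help spot images that ended up with no caption or an empty one. This is only a rough sketch, not part of the workflow: it assumes the output folder holds each image next to a .txt file with the same base name (as in Captioned_Image_01.jpg / Captioned_Image_01.txt), and the function name `find_caption_problems` and the list of image extensions are my own placeholders.

```python
from pathlib import Path

# Assumed set of image types; adjust to whatever your dataset contains.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def find_caption_problems(image_dir, caption_dir=None):
    """Return (missing, empty): image names with no .txt file, or a blank one.

    caption_dir defaults to image_dir, i.e. captions saved next to the images.
    """
    image_dir = Path(image_dir)
    caption_dir = Path(caption_dir) if caption_dir else image_dir
    missing, empty = [], []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip the .txt files and anything else that is not an image
        caption = caption_dir / (image.stem + ".txt")
        if not caption.is_file():
            missing.append(image.name)
        elif not caption.read_text(encoding="utf-8", errors="replace").strip():
            empty.append(image.name)
    return missing, empty
```

Call it with your output folder path, for example `missing, empty = find_caption_problems("path/to/output")`, and print both lists. It only checks that a caption exists and is not blank; the caption quality still needs a human read.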