A fully offline, portable desktop application for generating high-quality image captions using the UNCENSORED Qwen3-VL 8B vision-language model. Built with a professional PyQt6 dark-themed GUI, GGUF quantized model inference via llama-cpp-python, and full CUDA GPU acceleration.
Designed for AI artists, dataset curators, and CHROMA, PONY, QWEN, ZIT, ZIB, Stable Diffusion, and Flux trainers who need accurate, customizable captions for their image datasets.
Up to date and highly accurate, with fewer hallucinations and a better visual encoder. FASTER THAN JoyCaption Beta One & ANY OTHER PORTABLE JoyCaption VERSION.
THIS IS BUILT BY A REAL TECHNICAL ENGINEER WITH 16 YEARS OF EXPERIENCE.
Download link for the application:
https://github.com/GitDonkeyHubbed/qwen3vl-captioner
🚀 What's New in V1.2.0
This release brings a major overhaul to how captions are generated, focusing on accuracy, anatomy, and detail over "storytelling" fluff.
🏥 Clinical Precision Mode
We've completely rewritten the prompts for all models (Flux, Stable Diffusion, Pony, etc.). Instead of "cinematic" or "moody" descriptions, the engine now focuses on:
Physical Reality: Exact shapes, textures, and spatial relations.
Accurate Anatomy: Detailed descriptions of bodies and poses without euphemisms.
Objective Detail: Works through the image content systematically, listing exactly what is there.
🔞 Uncensored / Adult Detail Option
A new "Uncensored / Adult Detail" checkbox in the settings. When enabled, this injects explicit instructions to describe all content (including nudity and adult themes) with full anatomical accuracy, bypassing standard safety refusals. Essential for high-quality dataset training.
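As a rough illustration of how a model preset and an optional detail toggle can be combined into one instruction, here is a minimal Python sketch. The preset names, the prompt wording, and `build_prompt` are hypothetical placeholders, not the app's actual prompts or code. The sketch only shows the idea of appending an extra instruction to the base preset when the checkbox is enabled.

```python
# Hypothetical preset texts; the app's real prompts are not reproduced here.
PRESETS = {
    "flux": "Write one detailed natural-language paragraph describing the image.",
    "pony": "List the image content as comma-separated tags.",
}

# Hypothetical extra instruction appended when the adult-detail option is on.
ADULT_DETAIL = (
    "Describe all visible content, including nudity, plainly and with "
    "accurate anatomical terms. Do not omit details."
)


def build_prompt(preset: str, adult_detail: bool = False) -> str:
    """Return the instruction text sent to the model alongside the image."""
    try:
        base = PRESETS[preset]
    except KeyError:
        raise ValueError(f"unknown preset: {preset!r}") from None
    # The toggle only appends text; the base preset itself is unchanged.
    return f"{base} {ADULT_DETAIL}" if adult_detail else base


if __name__ == "__main__":
    print(build_prompt("pony", adult_detail=True))
```

Keeping the toggle as a separate appended instruction means every preset gets the same behavior without maintaining two copies of each prompt.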
📦 Portable Release
This version is fully portable. Models are now detected in the application folder, making it easier to share and install.
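A minimal sketch of folder-based model detection follows. It assumes the GGUF files sit in a `models` subfolder next to the app and that vision projector files carry "mmproj" in their name (a common llama.cpp convention for multimodal models). The folder name, that naming rule, and `find_models` are assumptions for illustration, not the app's documented layout.

```python
import sys
from pathlib import Path


def find_models(app_dir: Path, subdir: str = "models") -> tuple[list[Path], list[Path]]:
    """Return (language-model files, projector files) found in app_dir/subdir.

    Assumes projector files contain "mmproj" in their file name.
    """
    folder = app_dir / subdir
    if not folder.is_dir():
        return [], []
    ggufs = sorted(p for p in folder.glob("*.gguf") if p.is_file())
    projectors = [p for p in ggufs if "mmproj" in p.name.lower()]
    models = [p for p in ggufs if "mmproj" not in p.name.lower()]
    return models, projectors


if __name__ == "__main__":
    # Look next to the launched script or executable, so the whole folder can
    # be copied to another machine without reconfiguring any paths.
    here = Path(sys.argv[0]).resolve().parent
    models, projectors = find_models(here)
    print("models:", [p.name for p in models])
    print("projectors:", [p.name for p in projectors])
```

Resolving paths relative to the app's own location, rather than a user profile or registry entry, is what lets the folder be zipped and shared as-is.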
✨ Key Features
Clinical Precision: Uses anatomically accurate, objective language instead of a "creative writing" style. Designed for training, not storytelling.
Universal "Edit" Mode: Full control via the Edit button to handle any prompt format (JSON, XML, Booru) without needing complex hardcoded "modes".
Lean Architecture: Focused on speed and simplicity. No bloat, just tools that work.
Multi-Model Presets: Pre-configured formats for Flux 1 & 2, Stable Diffusion, Pony (SDXL), Z-Image, and more.
Drag & Drop: Drop images or entire folders directly into the app.
Batch Processing: Caption thousands of images automatically.
Smart Model Handling: Native GGUF support with auto-downloading.
Hardware Monitoring: Real-time GPU VRAM usage display.
Safety Controls: Toggle between "PG" and fully "Uncensored" XXX modes.
Auto-Save & Cancel: Captions are saved automatically, and any operation can be canceled at any time.
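The drag-and-drop, batch, and auto-save features can be sketched roughly as below. This is a hypothetical Python outline, not the app's code: `collect_images`, `caption_batch`, the extension list, and the convention of writing each caption to a same-named `.txt` file beside the image are all assumptions, and `caption_fn` stands in for the actual model call. In this sketch, files that go missing after queuing are skipped and reported rather than halting the whole batch.

```python
from pathlib import Path
from threading import Event
from typing import Callable, Iterable, Optional

# Assumed set of image extensions; adjust to whatever the app accepts.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def collect_images(paths: Iterable[Path]) -> list[Path]:
    """Expand dropped files and folders (recursively) into a sorted image list."""
    found: set[Path] = set()
    for p in paths:
        if p.is_dir():
            found.update(
                q for q in p.rglob("*")
                if q.is_file() and q.suffix.lower() in IMAGE_EXTS
            )
        elif p.suffix.lower() in IMAGE_EXTS:
            found.add(p)
    return sorted(found)


def caption_batch(
    images: Iterable[Path],
    caption_fn: Callable[[Path], str],
    cancel: Optional[Event] = None,
) -> tuple[list[Path], list[Path]]:
    """Caption each image, saving '<name>.txt' next to it right away.

    Returns (captioned, skipped). Missing files are skipped, not fatal.
    """
    captioned: list[Path] = []
    skipped: list[Path] = []
    for img in images:
        # Stop between images if the user pressed Cancel.
        if cancel is not None and cancel.is_set():
            break
        # The file may have been moved or deleted after it was queued.
        if not img.is_file():
            skipped.append(img)
            continue
        text = caption_fn(img)  # placeholder for the real model call
        # Auto-save: write immediately so a later cancel loses nothing.
        img.with_suffix(".txt").write_text(text, encoding="utf-8")
        captioned.append(img)
    return captioned, skipped
```

Because each caption is written as soon as it is generated, setting the `Event` from the GUI thread keeps every caption finished so far. Note that two images differing only by extension (for example `a.png` and `a.jpg`) would overwrite each other's `.txt` file in this sketch.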
Comments
It doesn't work for me. Qtbuilder reports an error I can't easily debug, partly because an offline install needs several libraries that are not in the requirements (sip, jinja, etc.). Qtbuilder just exits without a message; in fact, it raised an error without relaying any message.
I think it's easier at this point to just use the Hugging Face code and work from that.
My apologies for assuming most people already had the bare minimum of required libraries installed. That's on me; I will try to update the installer to include them.
Amazing tool! Much faster than ComfyUI nodes for me. Be careful not to move anything after queuing: when it doesn't find a file, it stops completely.
Thank you. The tool I used before would confuse the left and right arms of the person in the photo and provide unnecessary, verbose descriptions of the photo’s atmosphere, but this one doesn’t do that, which is great.
Thanks so much for this incredible tool. May I ask whether I can use a different model besides the ones listed in the tool?



