A fully offline, portable desktop application for generating high-quality image captions using the UNCENSORED Qwen3-VL 8B vision-language model. Built with a professional PyQt6 dark-themed GUI, GGUF quantized model inference via llama-cpp-python, and full CUDA GPU acceleration.
Designed for AI artists, dataset curators, and CHROMA, PONY, QWEN, ZIT, ZIB, Stable Diffusion, and Flux trainers who need accurate, customizable captions for their image datasets.
Up to date and highly accurate, with fewer hallucinations and a better visual encoder. Faster than JoyCaption Beta One and any other portable JoyCaption version.
Built by a technical engineer with 16 years of experience.
Download the application from the link below.
https://github.com/GitDonkeyHubbed/qwen3vl-captioner
🚀 What's New in V1.2.0
This release brings a major overhaul to how captions are generated, focusing on accuracy, anatomy, and detail over "storytelling" fluff.
🏥 Clinical Precision Mode
We've completely rewritten the prompts for all models (Flux, Stable Diffusion, Pony, etc.). Instead of "cinematic" or "moody" descriptions, the engine now focuses on:
Physical Reality: Exact shapes, textures, and spatial relations.
Accurate Anatomy: Detailed descriptions of bodies and poses without euphemisms.
Objective Detail: Parses through the image content, listing exactly what is there.
🔞 Uncensored / Adult Detail Option
A new "Uncensored / Adult Detail" checkbox has been added to the settings. When enabled, it injects explicit instructions to describe all content (including nudity and adult themes) with full anatomical accuracy, bypassing standard safety refusals. Essential for high-quality dataset training.
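The idea of a per-model preset with an optional injected instruction block can be sketched as below. This is only an illustration: the preset texts, the wording of the extra block, and the names `PRESETS`, `ADULT_DETAIL_BLOCK`, and `build_prompt` are all hypothetical, since the application's real prompts are not shown here.

```python
# Hypothetical preset texts; the application's actual prompts are not published in this README.
PRESETS = {
    "flux": "Describe the image in plain, objective sentences. "
            "State exact shapes, textures, and spatial relations.",
    "pony": "List the image content as comma-separated tags, "
            "most prominent subject first.",
}

# Hypothetical wording for the block added when the checkbox is ticked.
ADULT_DETAIL_BLOCK = (
    "Describe all visible content, including nudity, in direct anatomical terms. "
    "Do not omit or soften details."
)


def build_prompt(preset: str, adult_detail: bool = False) -> str:
    """Return the preset prompt, with the extra block appended when requested."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    parts = [PRESETS[preset]]
    if adult_detail:
        parts.append(ADULT_DETAIL_BLOCK)
    # Blank line between blocks keeps the instructions visually separate.
    return "\n\n".join(parts)
```

Keeping the toggle as an appended block, rather than a second copy of every preset, means each preset only has to be maintained once.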
📦 Portable Release
This version is fully portable. Models are now detected in the application folder, making it easier to share and install.
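A minimal sketch of what "detected in the application folder" could look like, assuming models are `.gguf` files somewhere under the app directory and that vision-encoder files carry `mmproj` in their name. The function name `find_models` is hypothetical, and the app's real lookup rules may differ.

```python
from pathlib import Path


def find_models(app_dir):
    """Return GGUF model files under app_dir, skipping mmproj (vision encoder) files."""
    root = Path(app_dir)
    found = []
    for path in sorted(root.rglob("*.gguf")):
        # Assumption: vision encoders are named like "mmproj-*.gguf" and are not selectable models.
        if "mmproj" in path.name.lower():
            continue
        found.append(path)
    return found
```

A portable build would typically call this with the folder that contains the launcher script, for example `find_models(Path(__file__).resolve().parent)`, so the result does not depend on the current working directory.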
✨ Key Features
Clinical Precision: Uses anatomically accurate, objective language instead of a "creative writing" style. Designed for training, not storytelling.
Universal "Edit" Mode: Full control via the Edit button to handle any prompt format (JSON, XML, Booru) without needing complex hardcoded "modes".
Lean Architecture: Focused on speed and simplicity. No bloat, just tools that work.
Multi-Model Presets: Pre-configured formats for Flux 1 & 2, Stable Diffusion, Pony (SDXL), Z-Image, and more.
Drag & Drop: Drop images or entire folders directly into the app.
Batch Processing: Caption thousands of images automatically.
Smart Model Handling: Native GGUF support with auto-downloading.
Hardware Monitoring: Real-time GPU VRAM usage display.
Safety Controls: Toggle between "PG" and fully "Uncensored" XXX modes.
Auto-save & Cancel: Captions are saved automatically, and any operation can be cancelled at any time.
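The drag-and-drop, batch, auto-save, and cancel features above fit together roughly as in the sketch below. It is not the application's code: the extension set, the `.txt` sidecar convention, and the names `collect_images` and `caption_batch` are assumptions, and `caption_fn` stands in for whatever call actually runs the model.

```python
import threading
from pathlib import Path

# Assumed set of image extensions; the app's real list is not documented here.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def collect_images(dropped):
    """Expand dropped files and folders into a sorted list of unique image paths."""
    images = set()
    for item in dropped:
        item = Path(item).resolve()
        if item.is_dir():
            for p in item.rglob("*"):
                if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
                    images.add(p)
        elif item.is_file() and item.suffix.lower() in IMAGE_EXTS:
            images.add(item)
    return sorted(images)


def caption_batch(images, caption_fn, cancel_event):
    """Caption images one by one, saving each result immediately.

    Writing the sidecar right after each caption means finished work
    survives a cancel. Returns the number of captions written.
    """
    done = 0
    for image in images:
        if cancel_event.is_set():
            break
        text = caption_fn(image)
        image.with_suffix(".txt").write_text(text, encoding="utf-8")
        done += 1
    return done
```

In a GUI, `cancel_event` would be a `threading.Event` set by the Cancel button while `caption_batch` runs on a worker thread, so the window stays responsive.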
Description
QWEN 3 VL ABL Captioner V1.4.2 — GGUF + MLX Engines
Professional GPU-Accelerated Image Captioning for Datasets
New here? Just click the green button above — it downloads the whole app as a .zip.
https://github.com/GitDonkeyHubbed/qwen3vl-captioner
✅ Get started in 3 easy steps
Download & unzip — click the big green button above, then unzip the file anywhere (Desktop is fine).
Install it (one time):
Windows → double-click setup.bat
macOS → run ./setup.sh
This automatically installs Python and the GPU engine for you. Just wait for it to finish.
Run the app:
Windows → double-click run.bat
macOS → run ./run.sh
🪟 Windows note: you also need the NVIDIA CUDA Toolkit for GPU speed. If you don't have it, install it with one command: winget install Nvidia.CUDA
🧰 Power users: prefer a specific tagged release? Grab it from the Releases page.
🚀 What's New in V1.4.0 — macOS Support
The captioner now runs natively on Macs, with two GPU backends:
🍎 Apple Silicon: Metal + MLX
llama.cpp Metal engine — the same GGUF models (including the abliterated default) now run GPU-accelerated on M-series chips. Verified end-to-end on Apple Silicon: ~7s per caption on the Q2_K quant.
New MLX engine — Apple's MLX framework via mlx-vlm, typically the fastest option on M-series chips. MLX models (4/6/8-bit) appear in the model dropdown on Apple Silicon and download with one click — no mmproj file needed, the vision tower is built in.
One-command setup — ./setup.sh installs everything: Python, the Metal wheel, and the MLX backend.
The hardware pill shows unified memory pressure on Macs instead of CUDA VRAM.
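As a rough illustration of how an app might decide which GPU backends to offer, the sketch below uses only the standard `platform` module. The backend labels and the function name `pick_backends` are hypothetical, and the empty list for other platforms simply reflects that this README does not describe them.

```python
import platform


def pick_backends(system=None, machine=None):
    """Return candidate GPU backends for the given (or current) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Apple Silicon reports "arm64"; a Python running under Rosetta reports "x86_64" instead.
    if system == "Darwin" and machine == "arm64":
        return ["mlx", "llama.cpp-metal"]
    if system == "Windows":
        return ["llama.cpp-cuda"]
    # Platforms not covered by this README.
    return []
```

Passing `system` and `machine` explicitly makes the decision easy to test without owning each kind of machine.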
🆕 2026 model refresh
The model dropdown is reorganized into groups (newest first):
Qwen3-VL 8B ABL v2 (new recommended default) — prithivMLmods' v2 abliteration, full quant range
Qwen3-VL 8B Caption-it — an abliterated fine-tune specialized for image captioning
Huihui Qwen3-VL 8B ABL — huihui-ai's abliteration (quantized by noctrex)
Legacy v1 — kept for existing installs
MLX (Apple Silicon): abliterated (alexgusevski's conversions) and standard quants, plus an experimental Qwen3.5 4B abliterated VLM
Downloading a model now also auto-downloads its matching mmproj (vision encoder) when none is present.
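The "download the mmproj only when none is present" rule can be sketched as follows. This assumes mmproj files are recognizable by name and sit next to the model files. `needs_mmproj` and `ensure_mmproj` are hypothetical names, and the repository ID and filename are supplied by the caller because the real ones depend on the chosen model. The download itself uses `hf_hub_download` from `huggingface_hub`.

```python
from pathlib import Path


def needs_mmproj(model_dir):
    """True when no mmproj GGUF file exists in model_dir."""
    return not any(
        "mmproj" in p.name.lower() for p in Path(model_dir).glob("*.gguf")
    )


def ensure_mmproj(model_dir, repo_id, mmproj_filename):
    """Download the vision encoder only if none is present.

    Returns the downloaded file path, or None when nothing was needed.
    """
    if not needs_mmproj(model_dir):
        return None
    # Imported lazily so the check above works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(
        repo_id=repo_id,
        filename=mmproj_filename,
        local_dir=str(model_dir),
    )
```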
⚙️ Engine bump: llama-cpp-python 0.3.40
Both platforms now use JamePeng's v0.3.40 build (Windows: cu124–cu131 auto-matched; macOS: Metal), which adds support for the Qwen3.5 / Qwen3.6-generation GGUF models — including the larger abliterated 27B/35B-A3B releases for big-VRAM rigs.
🔒 Security fix
Pillow bumped to >=12.2.0 — patches 5 CVEs (2 HIGH, 3 MODERATE): integer overflow / OOB writes in PSD and font handling, a FITS decompression bomb, and a PDF trailer denial-of-service. Users on Pillow 10–12.1.x were exposed.
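To confirm that an existing environment already meets the new floor, a small check like the one below can help. It is a simplified sketch: the helper names are hypothetical, and the parser only understands plain dotted numbers, so pre-release suffixes such as `rc1` are ignored (`packaging.version.Version` is the more robust choice if that package is available).

```python
from importlib import metadata


def parse_version(text):
    """Turn '12.2.0' into (12, 2, 0), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_floor(installed, floor):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(floor)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def pillow_is_patched(floor="12.2.0"):
    """True/False for the installed Pillow, or None if Pillow is not installed."""
    try:
        return meets_floor(metadata.version("Pillow"), floor)
    except metadata.PackageNotFoundError:
        return None
```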
📦 Dependency updates
nvidia-ml-py>=12.0 replaces the deprecated pynvml package (same Python module, no behaviour change — removes an import FutureWarning)
huggingface-hub>=0.32 floor raised; hf_xet>=1.0 added as an explicit dependency
All other dependencies scanned — no other vulnerabilities found
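Because `nvidia-ml-py` still installs a module named `pynvml`, code that reads VRAM for the hardware display keeps the same import. The sketch below shows one way such a reading could be taken. `read_vram` and `format_vram` are hypothetical names, and the function returns `None` on machines without the package or an NVIDIA driver.

```python
def read_vram(index=0):
    """Return (used_bytes, total_bytes) for one GPU, or None if NVML is unavailable."""
    try:
        import pynvml  # module provided by the nvidia-ml-py distribution
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No NVIDIA driver or NVML library on this machine.
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return info.used, info.total
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()


def format_vram(used, total):
    """Format byte counts as a short 'used / total' string in GiB."""
    gib = 1024 ** 3
    return f"{used / gib:.1f} / {total / gib:.1f} GiB"
```

A display widget would call `read_vram()` on a timer and show `format_vram(*result)` when the result is not `None`.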
🔧 Infrastructure
Windows smoke CI restored (continue-on-error fix — was silently failing after a PowerShell incompatibility)
Added SECURITY.md with private vulnerability reporting instructions
