Qwen 2vl Flux - CivArchive (CivitAI Archive)

Original Project found here: https://huggingface.co/Djrango/Qwen2vl-Flux

Qwen2vl-Flux is a state-of-the-art multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities. This model excels at generating high-quality images based on both text prompts and visual references, offering superior multimodal understanding and control.

ComfyUI does not currently support this model, and there are no nodes available to load the CLIP+LLM portion into it
This upload is only for reviewing/testing the fine-tuned part of the Flux model
CFG set to 1 on KSampler
Rendered a 512px image in 150s on an 8GB GPU with 10 steps using the bf16 model
This model will be available in 3 formats, each named after the folder it should be placed in
- diffusion_models - This one is in diffusers format; it is just the merged safetensors file from the HuggingFace page
- checkpoints - This one has been converted to Flux Transformers format, with the prefix needed for stable_diffusion compatibility; it does not include CLIP or VAE
- unet - I will provide the q4_0 and q8 variants; leave a comment if you'd like to see any other quants
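
Since each format is named after the folder it belongs in, the placement can be sketched as a small helper. This is only an illustration: the function name `target_path` is hypothetical, and the `models` parent directory assumes a default ComfyUI install layout, which may differ on your machine.

```python
from pathlib import Path

# Folder names are taken from the format list above.
FORMAT_FOLDERS = ("diffusion_models", "checkpoints", "unet")


def target_path(comfyui_root, format_name, filename):
    """Return where a downloaded file would go for the given format.

    Assumption: model folders live under <comfyui_root>/models, as in a
    default ComfyUI install. This helper is hypothetical, not part of ComfyUI.
    """
    if format_name not in FORMAT_FOLDERS:
        raise ValueError(f"unknown format folder: {format_name!r}")
    return Path(comfyui_root) / "models" / format_name / filename


if __name__ == "__main__":
    # Example with the q8 GGUF file listed on this page.
    print(target_path("ComfyUI", "unet", "qwen2vlFlux_unetQ80.gguf"))
```

The helper only builds the path; it does not download or move anything, so it is safe to run before deciding where the files should go.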

Description

FAQ

Details

Files

qwen2vlFlux_unetQ80.gguf

Mirrors

Available On (1 platform)

FAQ

What is Qwen 2vl Flux?

Why was this model removed from CivitAI?

How do I use Qwen 2vl Flux?

What should I watch out for with Flux models?

What other Flux-based models are worth knowing?

Can I use this model commercially?

What files are available and where can I download them?
