CivArchive

    Original Project found here: https://huggingface.co/Djrango/Qwen2vl-Flux

    Qwen2vl-Flux is a state-of-the-art multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities. This model excels at generating high-quality images based on both text prompts and visual references, offering superior multimodal understanding and control.

    • ComfyUI does not currently support the CLIP+LLM portion, and no nodes are available to load it

    • This upload is only for reviewing/testing the fine-tuned part of the Flux model

    • CFG set to 1 on KSampler

    • Rendered an image in 150 s on an 8 GB GPU at 512px / 10 steps using the bf16 model

    • This model will be available in 3 formats, each named after the folder it should be placed in

      • diffusion_models - This one is in diffusers format; it is just the merged safetensors file from the HuggingFace page

      • checkpoints - This one has been converted to Flux Transformers format and prefixed for stable_diffusion compatibility; it does not include CLIP and VAE

      • unet - I will provide the q4_0 and q8 variants; leave a comment if you'd like to see any other quants
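
The folder convention in the list above can be sketched as a small helper. This is only an illustration: the `destination` function, the `ComfyUI` root path, and the `models/` parent directory are assumptions, not something the original listing specifies.

```python
from pathlib import Path

# Folder names come from the list above. The "models/" parent directory
# is an assumption about a typical ComfyUI install.
FORMAT_FOLDERS = ("diffusion_models", "checkpoints", "unet")


def destination(comfy_root, fmt, filename):
    """Return the path where a file of the given format would be placed.

    Hypothetical helper for illustration only.
    """
    if fmt not in FORMAT_FOLDERS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Path(comfy_root) / "models" / fmt / filename


if __name__ == "__main__":
    # The GGUF file from this listing belongs in the unet folder.
    print(destination("ComfyUI", "unet", "qwen2vlFlux_unetQ80.gguf"))
```

Adjust `comfy_root` to wherever ComfyUI is actually installed; the helper only builds the path and does not move any file.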

    Description

    • This file goes in the unet folder

    • Loaded with UNET Loader (GGUF)

    • Quantized from bf16 to q8_0
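
For readers unfamiliar with q8_0, the following is a minimal sketch of the idea, assuming the usual GGML block layout (32 weights per block, one scale per block, signed 8-bit values). The listing does not say which tool was used for the conversion, so this is not the uploader's actual procedure; the function names are hypothetical, and the scale is kept as a plain Python float here rather than the 16-bit float stored in real files.

```python
BLOCK = 32  # weights per q8_0 block (assumed GGML layout)


def quantize_q8_0(values):
    """Split values into blocks of 32 and return (scale, int8 list) pairs."""
    if len(values) % BLOCK != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0                # largest magnitude maps to 127
        inv = 1.0 / scale if scale else 0.0  # all-zero block keeps scale 0
        # Clamp to the signed 8-bit range used by the format.
        qs = [max(-127, min(127, round(v * inv))) for v in chunk]
        blocks.append((scale, qs))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats: value = scale * quantized integer."""
    return [scale * q for scale, qs in blocks for q in qs]


if __name__ == "__main__":
    weights = [i / 10 - 1.6 for i in range(BLOCK)]
    restored = dequantize_q8_0(quantize_q8_0(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Under this layout each block stores 32 one-byte values plus a two-byte scale, i.e. 34 bytes per 32 weights, or 8.5 bits per weight compared with 16 for bf16.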

    Checkpoint
    Flux.1 D

    Details

    Downloads
    101
    Platform
    CivitAI
    Platform Status
    Deleted
    Created
    4/24/2025
    Updated
    5/6/2025
    Deleted
    4/24/2025

    Files

    qwen2vlFlux_unetQ80.gguf

    Mirrors

    Huggingface (1 mirrors)
    CivitAI (1 mirrors)