CivArchive

    Wan-S2V is an AI video generation model that can transform static images and audio into high-quality videos.

    WIP: still working on the description and adding all needed info/tools! Use with some caution 🤪

    Note: S2V very often produces a few "flashy", over-saturated frames at the start of the clip. This seems to be a limitation of all Wan 2.2 S2V models right now.

    Requirements:

    • Lite LoRA for 4/8-step operation (optional)

    • Main model: Wan2.2-S2V-14B (GGUF) → ComfyUI/models/unet

    • Audio encoder: wav2vec2_large_english → ComfyUI/models/audio_encoders

    • Text encoder: umt5-xxl → ComfyUI/models/text_encoders

    • VAE: Wan2.1_VAE.safetensors → ComfyUI/models/vae
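    The placement above can be sketched as a small pre-flight check before loading the workflow. All filenames in `REQUIRED` are illustrative assumptions (your exact quants and names may differ):

    ```python
    from pathlib import Path

    # Hypothetical pre-flight check for the files listed above.
    # Filenames are illustrative assumptions; your quants/names may differ.
    REQUIRED = {
        "unet": "Wan2.2-S2V-14B-Q8_0.gguf",                      # assumed quant name
        "audio_encoders": "wav2vec2_large_english.safetensors",  # assumed extension
        "text_encoders": "umt5-xxl-encoder-Q8_0.gguf",
        "vae": "Wan2.1_VAE.safetensors",
    }

    def missing_files(models_dir="ComfyUI/models"):
        """Return the required files not found under models_dir."""
        root = Path(models_dir)
        return [f"{sub}/{name}" for sub, name in REQUIRED.items()
                if not (root / sub / name).is_file()]
    ```

    Running `missing_files()` from the ComfyUI parent directory lists anything still missing; an empty list means all four files are in place.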

    Usage hints:

    • The audio file should be about the same length (in seconds) as the generated video

    👂🎶 👉 Hint: Click the sample for full-screen and play from the post with SOUND ON!
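    To size the video to the audio, a minimal sketch (assuming a 16 fps output and a plain PCM WAV input — both assumptions, adjust to your workflow) computes the frame count from the clip duration:

    ```python
    import wave

    def frames_for_audio(wav_path, fps=16):
        """Match video length to the audio clip: duration (s) * fps, rounded."""
        with wave.open(wav_path, "rb") as w:
            duration = w.getnframes() / float(w.getframerate())
        return round(duration * fps)

    # Demo: a 2-second silent mono WAV -> 32 frames at 16 fps.
    with wave.open("demo.wav", "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 32000)  # 32000 samples = 2.0 s

    print(frames_for_audio("demo.wav"))  # 32
    ```

    Feed the returned count into the workflow's frame/length input so audio and video end together.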

    Sources:

    Clip: https://huggingface.co/city96/umt5-xxl-encoder-gguf/

    Model: https://huggingface.co/QuantStack/Wan2.2-S2V-14B-GGUF/

    Lite LoRA: https://huggingface.co/calcuis/wan2-gguf/


    YOU are responsible for outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.

    Description

    umt5-xxl-encoder-Q8_0


    Checkpoint: Wan Video 2.2 I2V-A14B

    Details

    Downloads: 198
    Platform: CivitAI
    Platform Status: Deleted
    Created: 8/30/2025
    Updated: 11/22/2025
    Deleted: 11/22/2025

    Files

    wan22S2V14BGGUF_clipQ8.gguf

    Mirrors

    Huggingface (1 mirror)
    CivitAI (1 mirror)
    Other Platforms (TensorArt, SeaArt, etc.) (1 mirror)

    Available On (1 platform)

    The same model is published on other platforms, which may have additional downloads or version variants.