Wan-S2V is an AI video generation model that can transform static images and audio into high-quality videos.
WIP: the description is still being completed with all needed info and tools. Use with some caution 🤪
Note: S2V has a very high chance of producing a few "flashy", over-saturated frames at the start. That seems to be a limitation of all Wan 2.2 S2V models right now.
Requirements:
Lite LoRA for 4/8-step operation (optional)
Main Model: Wan2.2-S2V-14B (GGUF), goes in ComfyUI/models/unet
Audio Encoder: wav2vec2_large_english, goes in ComfyUI/models/audio_encoders
Text Encoder: Umt5-xxl, goes in ComfyUI/models/text_encoders
VAE: Wan2.1_VAE.safetensors, goes in ComfyUI/models/vae
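If the workflow complains about missing models, a quick check of the folders listed above can help. This is only a rough sketch: the folder names come from the requirements list, while the function name and the idea of passing the ComfyUI root as an argument are assumptions made for illustration. It only checks that each folder holds at least one file, not that it is the right file.

```python
from pathlib import Path

# Folder names are taken from the requirements list; the labels are illustrative.
EXPECTED_FOLDERS = {
    "main model (GGUF)": "models/unet",
    "audio encoder": "models/audio_encoders",
    "text encoder": "models/text_encoders",
    "VAE": "models/vae",
}


def find_empty_model_folders(comfy_root):
    """Return labels of expected folders that are missing or contain no files."""
    root = Path(comfy_root)
    missing = []
    for label, rel in EXPECTED_FOLDERS.items():
        folder = root / rel
        # A folder counts as populated if it exists and holds at least one file.
        has_file = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_file:
            missing.append(label)
    return missing
```

Calling find_empty_model_folders("ComfyUI") from the directory that contains your ComfyUI install returns a list of the labels that still need a file; an empty list means every folder has something in it.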
Usage hints:
The audio file should be about the same length (in seconds) as the video you want to generate
👂🎶 👉 Hint: Click the sample for full-screen and play from the post with SOUND ON!
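To match audio and video length, it can help to work out how many frames the audio covers. The sketch below reads a WAV file's duration with Python's standard wave module and converts it to a frame count. The default of 16 fps is an assumption, not something stated on this page, so use whatever frame rate your workflow is actually set to. The function names are made up for this example, and it only handles WAV files.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_duration(seconds, fps=16):
    """Return the number of video frames covering `seconds` at `fps`.

    fps=16 is an assumed default; adjust it to your workflow's setting.
    """
    return round(seconds * fps)
```

For a clip named speech.wav (a hypothetical file), frames_for_duration(wav_duration_seconds("speech.wav")) gives a starting value for the video length in frames.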
Sources:
Clip: https://huggingface.co/city96/umt5-xxl-encoder-gguf/
Model: https://huggingface.co/QuantStack/Wan2.2-S2V-14B-GGUF/
Lite LoRA: https://huggingface.co/calcuis/wan2-gguf/
YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.
Description
umt5-xxl-encoder-Q8_0