Wan-S2V is an AI video generation model that can transform static images and audio into high-quality videos.
WIP: still adding all the needed info and tools to this description. Use with some caution 🤪
Note: S2V is very likely to produce a few "flashy", over-saturated frames at the start of the video. This seems to be a limitation of all Wan 2.2 S2V models right now.
Requirements:
Lite LoRA for 4/8-step operation (optional)
Main model: Wan2.2-S2V-14B (GGUF), in ComfyUI/models/unet
Audio encoder: wav2vec2_large_english, in ComfyUI/models/audio_encoders
Encoder: Umt5-xxl, in ComfyUI/models/text_encoders
VAE: Wan2.1_VAE.safetensors, in ComfyUI/models/vae
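To double-check the folder layout before loading the workflow, a minimal sketch like the one below may help. It only checks that the four folders named above exist. The function name missing_model_dirs and the idea of passing in the directory that contains ComfyUI are assumptions made for illustration, not part of ComfyUI itself.

```python
from pathlib import Path

# Folders named in the requirements, relative to the directory that contains "ComfyUI".
REQUIRED_DIRS = [
    "ComfyUI/models/unet",
    "ComfyUI/models/audio_encoders",
    "ComfyUI/models/text_encoders",
    "ComfyUI/models/vae",
]


def missing_model_dirs(base_dir):
    """Return the required folders that do not exist under base_dir (hypothetical helper)."""
    base = Path(base_dir)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
```

Calling missing_model_dirs(".") from the directory that holds ComfyUI returns an empty list when all four folders are present. It does not verify that the model files themselves were downloaded.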
Usage hints:
The audio file should be about the same length, in seconds, as the video you generate.
👂🎶 👉 Hint: Click the sample for full-screen and play from the post with SOUND ON!
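The audio-length hint above can be turned into a quick calculation: read the duration of the audio file and work out how many frames the video needs to cover it. This is only a rough sketch. It assumes an uncompressed WAV input, and the helper names and the fps argument are made up for illustration, so use whatever frame rate your workflow is actually set to.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_audio(duration_s, fps):
    """Number of video frames needed to roughly cover duration_s at the given fps."""
    return round(duration_s * fps)
```

For example, frames_for_audio(wav_duration_seconds("speech.wav"), fps) gives a frame count to aim for, where speech.wav is a placeholder file name and fps is the frame rate of your workflow.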
Sources:
Clip: https://huggingface.co/city96/umt5-xxl-encoder-gguf/
Model: https://huggingface.co/QuantStack/Wan2.2-S2V-14B-GGUF/
Lite LoRA: https://huggingface.co/calcuis/wan2-gguf/
YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.