HuMo for Wan - CivArchive (CivitAI Archive)

HuMo for Wan - whisper large v3 fp16

NSFW

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

✨ Key Features

HuMo is a unified, human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs, including text, images, and audio. It supports strong text prompt following, consistent subject preservation, and synchronized audio-driven motion.

VideoGen from Text-Image - Customize character appearance, clothing, makeup, props, and scenes using text prompts combined with reference images.
VideoGen from Text-Audio - Generate audio-synchronized videos solely from text and audio inputs, removing the need for image references and enabling greater creative freedom.
VideoGen from Text-Image-Audio - Achieve a higher level of customization and control by combining text, image, and audio guidance.

Examples and models from the following sources are reuploaded here for your convenience:
https://huggingface.co/bytedance-research/HuMo
https://github.com/Phantom-video/HuMo

Compatible with both 480P and 720P resolutions. 720P inference will achieve much better quality.

Comments (6)

Mikeflower · Sep 13, 2025 · 6 reactions

Great. Workflow?

honryindian · Sep 15, 2025

Did you find any workflow?

DennyDan84 · Sep 14, 2025 · 5 reactions

Pretty cool! Any workflow for your samples?

honryindian · Sep 15, 2025

Did you find any workflow?

marsele117268 · Sep 15, 2025 · 3 reactions

@honryindian Here are two working workflows, but they are very demanding on video memory. My 3080 10 GB card can run them, but generation takes an eternity to finish. https://civitai.com/models/1957082/humo-dual-image-reference-digital-human?modelVersionId=2215110 and https://civitai.com/models/1957156/humo-single-person-reference-digital-human?modelVersionId=2215207

oldman169 · Sep 29, 2025

Video quality is great, but lip-sync is very bad.

Checkpoint

Wan Video 14B t2v

by Cyph3r

base model

Details

Downloads

Platform: CivitAI

Platform Status: Deleted

Created: 9/14/2025

Updated: 6/3/2026

Deleted: 11/10/2025

Files

humoForWan_whisperLargeV3Fp16.safetensors

Size: 1.58 GB

SHA256: fe5624f5db7413815a5decbc2afb1f7a8015f37519d6e9ae6dc7fad7c0c6c253
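
Because the file is now served only from mirrors, it may be worth checking a download against the SHA256 listed above before loading it. The following is a minimal sketch using only Python's standard library. The function names `sha256_of_file` and `matches_listing` are hypothetical helpers, and the local path (the file sitting in the current directory under its listed name) is an assumption.

```python
import hashlib
from pathlib import Path

# Values copied from the Files listing on this page.
EXPECTED_SHA256 = "fe5624f5db7413815a5decbc2afb1f7a8015f37519d6e9ae6dc7fad7c0c6c253"
FILENAME = "humoForWan_whisperLargeV3Fp16.safetensors"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-GB checkpoints.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hash, ignoring hex case."""
    return sha256_of_file(path).lower() == expected.lower()


if __name__ == "__main__":
    local_copy = Path(FILENAME)  # assumed download location
    if local_copy.exists():
        print("OK" if matches_listing(local_copy) else "MISMATCH")
    else:
        print(f"{local_copy} not found")
```

A mismatch usually means a truncated download or a different file published under the same name; in either case, fetching the file again from another mirror is the safer choice.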

Mirrors

HuggingFace (14 mirrors)

whisper_large_v3_encoder_fp16.safetensors

humoForWan_whisperLargeV3Fp16.safetensors

whisper_large_v3_encoder_fp16.safetensors

CivitAI (1 mirror)

humoForWan_whisperLargeV3Fp16.safetensors

ModelScope CN (2 mirrors)

whisper_large_v3_encoder_fp16.safetensors

FAQ

What is HuMo for Wan?

Why was this model removed from CivitAI?

How do I use HuMo for Wan?

What should I watch out for with Wan Video models?

What other Wan Video-based models are worth knowing?

Can I use this model commercially?

What files are available and where can I download them?
