🚀 Swap any on-camera speaker into your own character while keeping motion, expressions, and mouth shapes aligned to the original audio.
▶️ Run Directly in Cloud:
https://www.runcomfy.com/comfyui-workflows/wan-2-2-animate-swap-characters-lip-sync-workflow-comfyui?utm_source=civitai
💡 Overview
Wan 2.2 Animate: Swap Characters & Lip-Sync is a ComfyUI workflow for precise facial motion transfer, seamless character swapping, and natural video lip-syncing. Provide a source clip and a single clean reference image — the pipeline detects body pose and face frames, retargets them onto your new character, and renders a coherent, speech-synchronous result with the original audio.
Ideal for editors, VTubers, storytellers, and researchers who need reliable character replacement for interviews, reels, dubbed shorts, or slides.
✨ Key Features
Full-Body Motion Transfer: Pose tracking via ViTPose + YOLO detection reproduces every gesture, head tilt, and hand movement on the replacement character.
Accurate Lip-Sync: Per-frame face crops plus audio alignment keep mouth shapes tightly matched to the original speech.
Identity Preservation: CLIP Vision encodes your reference portrait so facial structure, clothing, and style stay locked across all frames.
LoRA-Tuned Lighting: Lightx2v and Wan22 Relight LoRAs keep shading consistent, even under changing scene lighting.
Automatic Audio Mux: The final export carries the original soundtrack in perfect sync.
🚀 Getting Started
Import your video: Load a source clip with VHS_LoadVideo. Trim it close to the speaking segment for fastest processing.
Provide a reference image: Upload a sharp, forward-facing portrait of your target character.
Run preprocessing: YOLO + ViTPose extract keypoints; SAM 2 builds a foreground mask.
Generate: Wan 2.2 Animate 14B synthesizes the retargeted frames. The decoded video is auto-muxed with the original audio.
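The audio mux in the final step happens inside the ComfyUI graph, but the same operation can be sketched outside it with ffmpeg: copy the generated video stream and take the audio track from the original clip. This is a minimal illustration, not part of the workflow itself; the file names and the helper function are hypothetical.

```python
def mux_audio_cmd(generated_video: str, original_clip: str, out_path: str) -> list[str]:
    """Build an ffmpeg command that pairs the generated (silent) frames
    with the original clip's audio track, re-encoding nothing but audio."""
    return [
        "ffmpeg", "-y",
        "-i", generated_video,      # stream 0: retargeted frames from Wan 2.2 Animate
        "-i", original_clip,        # stream 1: source clip carrying the speech audio
        "-map", "0:v:0",            # video from the generated file
        "-map", "1:a:0",            # audio from the original file
        "-c:v", "copy",             # no video re-encode, frames stay untouched
        "-c:a", "aac",              # audio transcoded to a broadly compatible codec
        "-shortest",                # stop at the shorter stream to avoid trailing silence
        out_path,
    ]

# Hypothetical file names for illustration:
print(" ".join(mux_audio_cmd("animated.mp4", "source.mp4", "final.mp4")))
```

Because the video stream is stream-copied, frame timing is preserved exactly, which is what keeps the lip-sync intact through the mux.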
Click the "Run Directly" link above to bypass local setup and test this workflow immediately in your browser.
Description
Initial release.