SCAIL-2 Three-Person Group Motion Guidance Long-Video Workflow

Watch the full video first if you want to understand how this SCAIL-2 three-person group motion guidance workflow works in practice. The video shows how three reference characters can follow the movement of a three-person driving video, while the workflow keeps left, center, and right character assignments, group motion structure, identity stability, and long-video continuity more consistent.

This ComfyUI workflow is designed for SCAIL-2 three-person group motion guidance. Its main purpose is to transfer group motion from a three-person driving video onto three characters in a reference image. This is not a local replacement workflow: it does not replace a masked region inside the original footage. Instead, it uses skeleton guidance, multi-subject tracking, colored mask assignment, reference identity encoding, and long-video continuation to generate a new three-character video that follows the source motion.

The workflow is built around wan2.1_14B_SCAIL_2_fp8_scaled.safetensors as the main SCAIL-2 model. It also uses WAN VAE, UMT5 WAN text encoding, CLIP Vision, SAM3 tracking, SCAIL2ColoredMask, WanSCAILToVideo, SamplerCustom, VAEDecode, ForLoop continuation, overlap-frame trimming, ColorTransfer, final video combining, and original audio restoration. A multi-LoRA enhancement chain is preserved, including LightX2V, WanAnimate relight, Wan2.2 Lightning I2V, FastWan 480p, Wan21 PusaV1, Wan2.2 Fun InP, and stage-based enhancement LoRAs.

The key setting in this workflow is replacement_mode=false. This means the workflow focuses on three-person skeleton-guided animation rather than direct character replacement. The reference image provides the three target character identities, while the driving video provides the group pose, body movement, timing, and spatial interaction.

The workflow uses a strict 512×896 alignment rule. Both the reference image and the driving video are resized to the same canvas before entering SAM3, CLIPVision, and SCAIL. This is especially important for a three-person workflow because mismatched input sizes can cause tracking errors, mask drift, missing subjects, identity confusion, and unstable motion transfer.
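The alignment step can be previewed with a little arithmetic before loading anything into ComfyUI. The sketch below assumes a scale-to-cover resize followed by a center crop; the source does not state which resize mode the workflow uses, and `cover_crop_box` is a hypothetical helper, not a node in the workflow.

```python
def cover_crop_box(src_w, src_h, dst_w=512, dst_h=896):
    """Return (scaled_w, scaled_h, crop_box) for a scale-to-cover + center crop.

    Assumption: the canvas is filled completely and the excess is cropped
    evenly from both sides. The real workflow may pad or stretch instead.
    """
    # Scale so that both target dimensions are fully covered.
    scale = max(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    # Center the 512x896 window inside the scaled frame.
    left = (scaled_w - dst_w) // 2
    top = (scaled_h - dst_h) // 2
    return scaled_w, scaled_h, (left, top, left + dst_w, top + dst_h)


# Portrait 1080x1920 source: only a thin strip is cropped top and bottom.
print(cover_crop_box(1080, 1920))  # (512, 910, (0, 7, 512, 903))

# Landscape 1920x1080 source: about two thirds of the width is cropped away.
print(cover_crop_box(1920, 1080))  # (1593, 896, (540, 0, 1052, 896))
```

Under this assumption a landscape source loses most of its width, so the left and right subjects can easily fall outside the canvas. Portrait or pre-cropped inputs are the safer choice for a three-person layout.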

SAM3 is configured with max_objects=3. SCAIL2ColoredMask uses object_indices=0,1,2 and sort_by=left_to_right. This is the core three-person assignment rule: the left reference character should match the left driving subject, the center character should match the center subject, and the right character should match the right subject. The reference image and driving video should both keep all three people clearly visible. Severe occlusion, crossing bodies, unstable left-center-right order, or cropped characters can cause identity mixing, missing people, or motion contamination between subjects.
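The left-to-right pairing rule can be illustrated with a small sketch. It assumes each tracked subject is summarized by a bounding box and that ordering uses the horizontal box center; how SCAIL2ColoredMask actually implements `sort_by=left_to_right` is not described in the source, and the box format and helper names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def left_to_right_order(boxes: List[Box]) -> List[int]:
    """Return detection indices sorted by horizontal box center."""
    centers = [(b[0] + b[2]) / 2 for b in boxes]
    return sorted(range(len(boxes)), key=lambda i: centers[i])


def pair_subjects(ref_boxes: List[Box], drive_boxes: List[Box]) -> List[Tuple[int, int]]:
    """Pair reference and driving detections as left, center, right."""
    if len(ref_boxes) != 3 or len(drive_boxes) != 3:
        raise ValueError("expected exactly three subjects in both inputs")
    return list(zip(left_to_right_order(ref_boxes), left_to_right_order(drive_boxes)))


# Reference detections arrive in arbitrary order; driving detections are already ordered.
ref = [(300, 100, 480, 800), (20, 120, 180, 820), (190, 90, 310, 810)]
drive = [(10, 100, 170, 800), (200, 100, 320, 800), (330, 100, 500, 800)]
print(pair_subjects(ref, drive))  # [(1, 0), (2, 1), (0, 2)]
```

If two driving subjects cross positions mid-clip, an ordering computed this way would change between frames, which is one way to picture the identity swaps described above.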

The long-video structure follows the SCAIL-2 continuation system. The first segment is 65 frames and establishes the three-character relationship, pose guidance, mask assignment, and motion direction. The continuation segment is 81 frames. Each loop removes 5 overlapping frames, so every loop effectively adds 76 new frames. The loop count is calculated as max(1, ceil((F - 65) / 76)), where F is the loaded driving video frame count.
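The loop arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the stated formula only; the constant and function names are illustrative, and whether the workflow trims surplus frames beyond the driving video length is not stated in the source.

```python
import math

FIRST_SEGMENT = 65  # frames in the first segment
CONT_SEGMENT = 81   # frames generated per continuation loop
OVERLAP = 5         # overlapping frames trimmed from each continuation


def loop_count(total_frames: int) -> int:
    """Continuation loops for F driving frames: max(1, ceil((F - 65) / 76))."""
    new_per_loop = CONT_SEGMENT - OVERLAP  # 76 new frames per loop
    return max(1, math.ceil((total_frames - FIRST_SEGMENT) / new_per_loop))


def generated_frames(total_frames: int) -> int:
    """Frames accumulated after the first segment plus all loops."""
    return FIRST_SEGMENT + loop_count(total_frames) * (CONT_SEGMENT - OVERLAP)


# A 10-second clip at 24 fps has 240 frames.
print(loop_count(240))        # 3
print(generated_frames(240))  # 293
```

Because of the `max(1, ...)` floor, even a driving video of 65 frames or fewer still runs one continuation loop, so the accumulated sequence can be longer than the source clip.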

The final output uses the accumulated generated frame sequence, original driving video audio, and a unified 24fps frame rate. ColorTransfer is used between continuation segments to help reduce tone jumps and keep the final video visually smoother.

Main features:

SCAIL-2 three-person group motion workflow
Three reference characters follow one three-person driving video
Three-person full-body skeleton guidance
replacement_mode=false for motion driving
512×896 unified input alignment
SAM3 max_objects=3 tracking
SCAIL2ColoredMask three-person control
object_indices=0,1,2 target assignment
sort_by=left_to_right identity order
Left / center / right character consistency
CLIP Vision reference identity encoding
WanSCAILToVideo first-segment generation
65-frame initial segment
81-frame continuation segment
5-frame overlap trimming
ForLoop long-video continuation
ColorTransfer segment consistency
Original driving video audio restored
Unified 24fps output control
Multi-LoRA enhancement chain

Suggested workflow:

Prepare one clear three-person reference image and one clean three-person driving video. Both inputs should show all three people clearly, preferably full-body, with limited occlusion and a stable left-center-right order. Start with the default 512×896 setting. Check that SAM3 tracks exactly three subjects in both the reference image and the driving video. If identities swap, use a cleaner reference layout or a driving video in which the three people do not cross positions too aggressively. If one person disappears or motion bleeds from one subject to another, reduce occlusion and make sure all three subjects remain visible. Run a short test first, then enable the long-video loop once the group assignment is stable.

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2067154114169098241?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for every daily login, and enjoy 4090 performance with 48 GB.

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1jWL96nEpw/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.

Files

scail2ThreePersonGroup_v10.json
