Watch the full video first if you want to understand how this SCAIL-2 two-person reference editing workflow works in practice. The video shows how two reference characters can be placed onto two people in a driving video, while the workflow keeps left and right identity alignment, pose guidance, original scene structure, and long-video continuity stable.
This ComfyUI workflow is designed for SCAIL-2 two-person reference editing. Its purpose is not just to make two characters follow a two-person driving video, but to replace both selected people in the driving video with two reference characters. The workflow uses skeleton guidance, SAM3 tracking, colored mask matching, CLIP Vision identity encoding, and long-video continuation to keep the two identities separated and temporally consistent.
The workflow is built around wan2.1_14B_SCAIL_2_fp8_scaled.safetensors as the main SCAIL-2 model. It also uses WAN VAE, UMT5 XXL WAN text encoding, CLIP Vision, SAM3, SCAIL2ColoredMask, WanSCAILToVideo, SamplerCustom, VAEDecode, ForLoop continuation, frame trimming, and final video combining. A multi-LoRA enhancement chain is also preserved to improve generation stability, motion quality, and final visual output.
The most important difference from the normal two-person driving workflow is replacement_mode=true. In this version, the workflow is explicitly set to replace both selected people with two reference characters. The positive prompt focuses on replacing both selected people, keeping left and right identity alignment, preserving the original scene structure, and maintaining natural synchronized motion. The negative prompt suppresses common failure cases such as only one person being replaced, missing the second person, wrong identity order, identity swap, identity drift, deformed bodies, distorted faces, extra limbs, missing hands, flicker, blur, and low quality.
The workflow uses a strict 512×896 alignment rule. Both the reference image and the driving video are resized to this same resolution before they enter SAM3, CLIP Vision, and SCAIL. This is critical because mismatched input sizes can cause mask errors, identity misalignment, and unstable motion transfer.
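To make the alignment rule concrete, here is a minimal sketch of one way to bring any input to 512×896: scale it until it covers the target, then center-crop. The source does not say whether the workflow crops, pads, or stretches, so the cover-and-crop strategy and the function name `cover_resize_and_crop` are assumptions for illustration only.

```python
TARGET_W, TARGET_H = 512, 896  # unified size stated in the workflow description


def cover_resize_and_crop(src_w, src_h, target_w=TARGET_W, target_h=TARGET_H):
    """Return (resized_size, crop_box) for a scale-to-cover + center-crop.

    Assumption: cropping is acceptable for your footage. The actual workflow
    may pad or stretch instead; this only illustrates the geometry.
    """
    # Scale so that both dimensions are at least as large as the target.
    scale = max(target_w / src_w, target_h / src_h)
    new_w = max(target_w, round(src_w * scale))
    new_h = max(target_h, round(src_h * scale))
    # Center the crop window inside the resized frame.
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)


resized, box = cover_resize_and_crop(1080, 1920)
print(resized, box)
```

Applying the same function to both the reference image and every driving frame guarantees that the two inputs share one resolution, which is the point of the rule.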
For subject tracking, SAM3 is configured with max_objects=2. SCAIL2ColoredMask uses object_indices=0,1 and sort_by=left_to_right. This means the left reference character is matched to the left tracked person, and the right reference character is matched to the right tracked person. This structure is especially useful for two-person dance, duet motion, character interaction, and multi-character reference editing, where role confusion can easily happen.
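The left-to-right matching idea can be sketched as follows. This is not the SCAIL2ColoredMask implementation: the `TrackedObject` class, the use of bounding-box centers, and the function name `pair_left_to_right` are hypothetical, and the source does not state which frame or which measurement the node actually sorts on.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical stand-in for one tracked person (horizontal extent only)."""
    object_id: int
    x_min: float
    x_max: float

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2


def pair_left_to_right(reference, driving):
    """Pair reference characters with driving-video people by horizontal order."""
    if len(reference) != len(driving):
        raise ValueError("reference and driving must contain the same number of people")
    ref_sorted = sorted(reference, key=lambda o: o.center_x)
    drv_sorted = sorted(driving, key=lambda o: o.center_x)
    # The leftmost reference maps to the leftmost driving person, and so on.
    return [(r.object_id, d.object_id) for r, d in zip(ref_sorted, drv_sorted)]


reference = [TrackedObject(0, 300, 480), TrackedObject(1, 40, 220)]
driving = [TrackedObject(0, 20, 200), TrackedObject(1, 280, 500)]
print(pair_left_to_right(reference, driving))
```

Sorting both sides by position, rather than trusting the tracker's object IDs, is what keeps the pairing stable when the tracker numbers the two people in a different order for the reference and the driving input.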
The long-video structure is also important. The first segment is 65 frames and establishes the replacement relationship, identity mapping, pose guidance, mask structure, and visual direction. The continuation segment is 81 frames. Each loop removes 5 overlapping frames, so each loop effectively adds 76 new frames. The loop count is calculated as max(1, ceil((F - 65) / 76)), where F is the loaded driving video frame count. This makes the workflow suitable for longer two-person AI video editing rather than only short tests.
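The loop arithmetic above can be checked with a short sketch. The constants and the loop-count formula come from the description; the function names and the `generated_frames` estimate (which assumes segments are simply concatenated after the overlap is trimmed) are illustrative assumptions.

```python
import math

FIRST_SEGMENT = 65  # frames in the initial segment
CONTINUATION = 81   # frames generated per continuation loop
OVERLAP = 5         # overlapping frames removed from each continuation
NEW_PER_LOOP = CONTINUATION - OVERLAP  # 76 new frames per loop


def loop_count(total_frames: int) -> int:
    """Loop count as given in the description: max(1, ceil((F - 65) / 76))."""
    return max(1, math.ceil((total_frames - FIRST_SEGMENT) / NEW_PER_LOOP))


def generated_frames(total_frames: int) -> int:
    """Frames produced before any final trimming (assumes plain concatenation)."""
    return FIRST_SEGMENT + loop_count(total_frames) * NEW_PER_LOOP


for frames in (65, 141, 142, 300):
    print(frames, loop_count(frames), generated_frames(frames))
```

Note that the formula always runs at least one continuation loop, so even a driving video of 65 frames or fewer goes through the loop once.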
The final video output connects the generated frame sequence directly from the loop output, while the audio is taken from the original driving video and the frame rate is controlled by the unified FPS node. This keeps the generated result aligned with the source rhythm and avoids unnecessary manual audio reconstruction.
Main features:
SCAIL-2 two-person reference editing workflow
Two reference characters replace two people in video
Skeleton-guided two-person motion transfer
replacement_mode=true for character replacement
Left-to-right identity alignment
SAM3 max_objects=2 subject tracking
SCAIL2ColoredMask dual-person mask control
512×896 unified input alignment
CLIP Vision reference identity encoding
WanSCAILToVideo first-segment replacement
65-frame initial segment
81-frame continuation segment
5-frame overlap trimming
ForLoop long-video continuation
Original driving video audio restored
Unified 24fps output control
WAN VAE and UMT5 WAN text encoder support
Multi-LoRA enhancement chain
Suggested workflow:
Prepare one clear two-person reference image and one clear two-person driving video.
The reference image should show both characters clearly, with readable clothing, body shape, and visual identity.
The driving video should contain two visible people with stable framing and limited occlusion.
Start with the default 512×896 setting.
Check that SAM3 correctly tracks two people in both the reference and driving inputs.
Confirm that left-to-right identity alignment is correct before generating the full video.
If the two identities swap, adjust the reference layout or driving video so the left and right roles are easier to distinguish.
If only one person is replaced, check max_objects=2, object_indices=0,1, and replacement_mode=true.
Run a short test first, then use the long-video loop after the identity mapping is stable.
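The settings named in these steps can be collected into a quick preflight check before a long run. This is only a sketch: the values come from the description, but the flat dictionary layout and the `preflight` function are hypothetical and do not reflect how ComfyUI stores node settings.

```python
# Expected values taken from the workflow description; the flat dict layout
# is an assumption made for illustration, not a ComfyUI data structure.
EXPECTED = {
    "max_objects": 2,
    "object_indices": "0,1",
    "sort_by": "left_to_right",
    "replacement_mode": True,
    "width": 512,
    "height": 896,
}


def preflight(settings: dict) -> list:
    """Return a list of human-readable mismatches (empty means all good)."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems


# Example: a copy of the settings where only one tracked object is allowed.
current = dict(EXPECTED, max_objects=1)
for problem in preflight(current):
    print(problem)
```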
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2064974434917769218?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and run the workflow on a 4090 with 48 GB of memory.
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1w2Ei6pEsJ/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
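I keep model resources updated on Quark Netdisk (夸克网盘):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.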

