Watch the full video first if you want to understand how this SCAIL-2 single-person reference editing workflow works in practice. The video shows how one reference character can replace the selected person in a driving video while keeping the original motion, scene structure, lighting direction, audio rhythm, and long-video continuity stable.
This ComfyUI workflow is designed for SCAIL-2 single-person reference editing. Its main purpose is to take one reference image and use it to replace the target person in a single-person driving video. Unlike a pure pose-driving workflow, this version is explicitly configured for character replacement. The reference image provides the replacement identity, while the driving video provides the body motion, timing, pose flow, and original scene context.
The workflow is built around wan2.1_14B_SCAIL_2_fp8_scaled.safetensors as the main SCAIL-2 model. It also uses WAN VAE, UMT5 XXL WAN text encoding, CLIP Vision, SAM3 tracking, SCAIL2ColoredMask, WanSCAILToVideo, SamplerCustom, VAEDecode, ForLoop continuation, overlap-frame trimming, final video combining, and original audio restoration. A multi-LoRA chain is preserved to improve generation stability, motion quality, and final visual consistency.
The key switch in this workflow is replacement_mode=true. This tells the SCAIL route to perform single-person skeleton guidance with reference character replacement. The positive prompt focuses on replacing the selected single target person, following one-person pose guidance, keeping the original scene structure, maintaining consistent identity, natural motion, coherent lighting, and smooth temporal consistency. The negative prompt suppresses common failure cases such as flicker, wrong-area replacement, identity drift, deformed body, distorted face, extra limbs, missing hands, warped hands, broken anatomy, blur, and low-quality output.
The workflow uses a strict 512×896 alignment rule. Both the reference image and the driving video are resized to the same canvas before entering SAM3, CLIPVision, and SCAIL. This is important because inconsistent input dimensions can cause mask mismatch, pose instability, identity drift, and poor replacement boundaries.
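To make the alignment rule concrete, here is a minimal sketch of how a source image or video frame could be mapped onto the 512×896 canvas. The workflow description does not say whether the resize crops, pads, or stretches, so the center-cropped "cover" strategy and the function name `cover_fit` are assumptions for illustration only.

```python
def cover_fit(src_w, src_h, dst_w=512, dst_h=896):
    """Return (scaled_w, scaled_h, crop_left, crop_top) for a center-cropped resize.

    Assumption: the input is scaled until it fully covers the canvas,
    then the overflow is cropped equally from both sides.
    """
    # Pick the larger scale factor so neither dimension ends up smaller than the canvas.
    scale = max(dst_w / src_w, dst_h / src_h)
    # Guard against rounding leaving a dimension one pixel short of the canvas.
    scaled_w = max(dst_w, round(src_w * scale))
    scaled_h = max(dst_h, round(src_h * scale))
    # Center the crop window inside the scaled frame.
    crop_left = (scaled_w - dst_w) // 2
    crop_top = (scaled_h - dst_h) // 2
    return scaled_w, scaled_h, crop_left, crop_top


# A 1080x1920 portrait clip scales to 512x910 and loses 7 rows at the top and bottom.
print(cover_fit(1080, 1920))
```

Applying the same function to both the reference image and the driving video gives them an identical canvas, which is the point of the rule.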
SAM3 is configured with max_objects=1. SCAIL2ColoredMask uses object_indices=0, sort_by=left_to_right, and replacement_mode=true. This keeps the workflow focused on one selected target person and avoids unnecessary multi-subject confusion. The CLIP Vision route encodes the reference image to help preserve the replacement character identity during generation.
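Before a long run, it can help to confirm that these single-target settings are still in place. The sketch below is a hypothetical pre-flight check: the dictionary layout, the function name `find_mismatches`, and the value types are assumptions, and it does not read a real ComfyUI workflow file. Only the parameter names and values come from the description above.

```python
# Expected single-target settings, taken from the workflow description.
EXPECTED_SETTINGS = {
    "SAM3": {"max_objects": 1},
    "SCAIL2ColoredMask": {
        "object_indices": 0,
        "sort_by": "left_to_right",
        "replacement_mode": True,
    },
}


def _norm(value):
    # Compare loosely so 0 matches "0" and True matches "true".
    return str(value).strip().lower()


def find_mismatches(actual):
    """Return a list of human-readable problems; an empty list means all settings match."""
    problems = []
    for node, params in EXPECTED_SETTINGS.items():
        for key, expected in params.items():
            found = actual.get(node, {}).get(key)
            if found is None or _norm(found) != _norm(expected):
                problems.append(f"{node}.{key}: expected {expected!r}, found {found!r}")
    return problems


# Hypothetical settings copied by hand from the node panels.
current = {
    "SAM3": {"max_objects": 2},
    "SCAIL2ColoredMask": {"object_indices": "0", "sort_by": "left_to_right", "replacement_mode": "true"},
}
for problem in find_mismatches(current):
    print(problem)
```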
The long-video structure is one of the main practical advantages of this workflow. The first segment is 65 frames and establishes the replacement relationship, pose guidance, mask structure, and identity direction. The continuation segment is 81 frames. Each loop removes 5 overlapping frames, so every loop effectively adds 76 new frames. The loop count is calculated as max(1, ceil((F - 65) / 76)), where F is the loaded driving video frame count. This makes the workflow better suited for longer AI dance videos, digital human edits, character motion edits, and short-form video production.
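The loop arithmetic above can be checked with a few lines of Python. This is a sketch of the stated formula only, not the workflow's actual node graph. The helper names are made up for illustration, and `covered_frames` assumes that the generated length is the first segment plus 76 new frames per loop.

```python
import math

FIRST_SEGMENT_FRAMES = 65
CONTINUATION_FRAMES = 81
OVERLAP_FRAMES = 5
# Each continuation pass contributes its frames minus the trimmed overlap: 81 - 5 = 76.
NEW_FRAMES_PER_LOOP = CONTINUATION_FRAMES - OVERLAP_FRAMES


def loop_count(total_frames):
    """Loop count from the formula max(1, ceil((F - 65) / 76))."""
    return max(1, math.ceil((total_frames - FIRST_SEGMENT_FRAMES) / NEW_FRAMES_PER_LOOP))


def covered_frames(total_frames):
    """Frames the first segment plus all loops can cover (assumed: 65 + loops * 76)."""
    return FIRST_SEGMENT_FRAMES + loop_count(total_frames) * NEW_FRAMES_PER_LOOP


# A 10-second clip at 24 fps has 240 frames and needs 3 loops.
print(loop_count(240), covered_frames(240))
```

Note that the formula never returns fewer than one loop, so even a driving video of 65 frames or fewer still runs one continuation pass.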
The final output does not rely on an extra ImageCompositeMasked stage. The generated frames from the loop output are sent directly into the final video combine node. The original driving video audio is restored, and the frame rate is controlled by the unified FPS node, making the final result easier to match with the source rhythm.
Main features:
SCAIL-2 single-person reference editing workflow
One reference character replaces one target person
Single-person skeleton-guided video editing
replacement_mode=true character replacement
512×896 unified input alignment
SAM3 max_objects=1 tracking
SCAIL2ColoredMask single-target control
object_indices=0 target selection
CLIP Vision reference identity encoding
WanSCAILToVideo first-segment generation
65-frame initial segment
81-frame continuation segment
5-frame overlap trimming
ForLoop long-video continuation
Direct generated-frame final output
Original driving video audio restored
Unified 24fps output control
Multi-LoRA enhancement chain
Suggested workflow:
Prepare one clean reference character image and one clean single-person driving video. The reference should show the face, outfit, body shape, and silhouette clearly. The driving video should contain one main target person with stable framing, visible motion, and limited occlusion. Start with the default 512×896 setting. Check that SAM3 tracks the correct person before running the full workflow. If the wrong area is replaced, adjust the source video or tracking result. If the identity drifts, use a clearer reference image and simplify the prompt. Run a short test first, then use the long-video loop after the replacement relationship is stable.
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2067141150162636802?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy the performance of a 48 GB 4090!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1jWL96nEpw/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
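I keep updating model resources on Quark Netdisk (夸克网盘):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.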
