CivArchive
    SCAIL-2 Two-Person Reference Editing Long-Video Workflow - v1.0
    NSFW

    Watch the full video first to see how this SCAIL-2 two-person reference editing workflow behaves in practice. It shows two reference characters being placed onto the two people in a driving video while the workflow holds left and right identity alignment, pose guidance, the original scene structure, and long-video continuity steady.

    This ComfyUI workflow is designed for SCAIL-2 two-person reference editing. Its purpose is not merely to make two characters follow a two-person driving video, but to replace both selected people in that video with two reference characters. The workflow uses skeleton guidance, SAM3 tracking, colored mask matching, CLIP Vision identity encoding, and long-video continuation to keep the two identities separated and temporally consistent.

    The workflow is built around wan2.1_14B_SCAIL_2_fp8_scaled.safetensors as the main SCAIL-2 model. It also uses WAN VAE, UMT5 XXL WAN text encoding, CLIP Vision, SAM3, SCAIL2ColoredMask, WanSCAILToVideo, SamplerCustom, VAEDecode, ForLoop continuation, frame trimming, and final video combining. A multi-LoRA enhancement chain is also preserved to improve generation stability, motion quality, and final visual output.

    The most important difference from the normal two-person driving workflow is replacement_mode=true. In this version, the workflow is explicitly set to replace both selected people with two reference characters. The positive prompt focuses on replacing both selected people, keeping left and right identity alignment, preserving the original scene structure, and maintaining natural synchronized motion. The negative prompt suppresses common failure cases such as only one person being replaced, missing the second person, wrong identity order, identity swap, identity drift, deformed bodies, distorted faces, extra limbs, missing hands, flicker, blur, and low quality.

    The workflow uses a strict 512×896 alignment rule. Both the reference image and the driving video are aligned to the same size before entering SAM3, CLIP Vision, and SCAIL. This is critical because mismatched input sizes can cause mask errors, identity misalignment, and unstable motion transfer.
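The alignment rule can be illustrated with a small sketch. The description does not say which resize mode the workflow uses, so this assumes a "cover" resize followed by a center crop; the function name `cover_resize_and_crop` is hypothetical and is not a node in the workflow.

```python
TARGET_W, TARGET_H = 512, 896  # alignment size stated in the description


def cover_resize_and_crop(src_w, src_h, dst_w=TARGET_W, dst_h=TARGET_H):
    """Return (resized_w, resized_h, crop_left, crop_top).

    Assumption: the input is scaled until it fully covers the target,
    then cropped around the center. Integer math avoids float rounding.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    if dst_w * src_h >= dst_h * src_w:
        # Source is relatively taller than the target: match the width.
        resized_w = dst_w
        resized_h = -(-src_h * dst_w // src_w)  # ceiling division
    else:
        # Source is relatively wider than the target: match the height.
        resized_h = dst_h
        resized_w = -(-src_w * dst_h // src_h)  # ceiling division
    crop_left = (resized_w - dst_w) // 2
    crop_top = (resized_h - dst_h) // 2
    return resized_w, resized_h, crop_left, crop_top


# A 1080x1920 portrait clip is scaled to 512x911 and loses 7 rows at the top.
print(cover_resize_and_crop(1080, 1920))  # (512, 911, 0, 7)
```

Running the same function on both the reference image and the driving video gives the two inputs identical dimensions, which is what the alignment rule asks for.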

    For subject tracking, SAM3 is configured with max_objects=2. SCAIL2ColoredMask uses object_indices=0,1 and sort_by=left_to_right. This means the left reference character is matched to the left tracked person, and the right reference character is matched to the right tracked person. This structure is especially useful for two-person dance, duet motion, character interaction, and multi-character reference editing, where role confusion can easily happen.
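The left-to-right pairing described above can be sketched as follows. How SCAIL2ColoredMask orders objects internally is not stated in the description; this sketch assumes ordering by the horizontal center of each tracked box. `TrackedBox`, `left_to_right_ids`, and `match_identities` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrackedBox:
    """Hypothetical stand-in for one tracked person."""
    object_id: int
    x_min: float
    x_max: float

    @property
    def center_x(self):
        return (self.x_min + self.x_max) / 2


def left_to_right_ids(boxes):
    """Return object ids ordered from the leftmost to the rightmost box."""
    return [b.object_id for b in sorted(boxes, key=lambda b: b.center_x)]


def match_identities(reference_boxes, driving_boxes):
    """Pair reference and driving ids by their left-to-right rank."""
    if len(reference_boxes) != len(driving_boxes):
        raise ValueError("both inputs must contain the same number of people")
    ref_order = left_to_right_ids(reference_boxes)
    drv_order = left_to_right_ids(driving_boxes)
    return dict(zip(ref_order, drv_order))


# The tracker happened to number the reference people right-to-left.
reference = [TrackedBox(0, 300, 500), TrackedBox(1, 20, 200)]
driving = [TrackedBox(0, 10, 250), TrackedBox(1, 260, 500)]
print(match_identities(reference, driving))  # {1: 0, 0: 1}
```

The example shows why sorting matters: tracker ids alone would pair the left reference character with the right driving person, while the rank-based pairing keeps left with left.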

    The long-video structure is also important. The first segment is 65 frames and establishes the replacement relationship, identity mapping, pose guidance, mask structure, and visual direction. The continuation segment is 81 frames. Each loop removes 5 overlapping frames, so each loop effectively adds 76 new frames. The loop count is calculated as max(1, ceil((F - 65) / 76)), where F is the loaded driving video frame count. This makes the workflow suitable for longer two-person AI video editing rather than only short tests.
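The segment arithmetic above can be checked with a short sketch. The constants and the loop-count formula come from the description; the helper names are hypothetical, and trimming the first 5 frames of each continuation segment (rather than the last 5 of the previous one) is an assumption.

```python
import math

FIRST_SEGMENT = 65          # frames in the first segment
CONTINUATION_SEGMENT = 81   # frames generated per loop
OVERLAP = 5                 # overlapping frames removed per loop
NEW_PER_LOOP = CONTINUATION_SEGMENT - OVERLAP  # 76 new frames per loop


def loop_count(total_frames):
    """max(1, ceil((F - 65) / 76)) for a driving video of F frames."""
    return max(1, math.ceil((total_frames - FIRST_SEGMENT) / NEW_PER_LOOP))


def generated_frames(total_frames):
    """Frames produced before any final trimming to the source length."""
    return FIRST_SEGMENT + loop_count(total_frames) * NEW_PER_LOOP


def stitch(first, continuations):
    """Join segments, dropping the overlapping frames of each continuation.

    Assumption: the overlap sits at the start of each continuation segment.
    """
    frames = list(first)
    for segment in continuations:
        frames.extend(segment[OVERLAP:])
    return frames


print(loop_count(300))        # 4
print(generated_frames(300))  # 369
```

Note that the formula always runs at least one loop, so even a driving video of 65 frames or fewer yields 65 + 76 = 141 generated frames under this reading.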

    The final video node takes the generated frame sequence directly from the loop output, the audio from the original driving video, and the frame rate from the unified FPS node. This keeps the result aligned with the source rhythm and avoids rebuilding the audio track by hand.

    Main features:

    • SCAIL-2 two-person reference editing workflow

    • Two reference characters replace two people in video

    • Skeleton-guided two-person motion transfer

    • replacement_mode=true for character replacement

    • Left-to-right identity alignment

    • SAM3 max_objects=2 subject tracking

    • SCAIL2ColoredMask dual-person mask control

    • 512×896 unified input alignment

    • CLIP Vision reference identity encoding

    • WanSCAILToVideo first-segment replacement

    • 65-frame initial segment

    • 81-frame continuation segment

    • 5-frame overlap trimming

    • ForLoop long-video continuation

    • Original driving video audio restored

    • Unified 24 fps output control

    • WAN VAE and UMT5 WAN text encoder support

    • Multi-LoRA enhancement chain

    Suggested workflow:

    Prepare one clear two-person reference image and one clear two-person driving video. The reference image should show both characters clearly, with readable clothing, body shape, and visual identity. The driving video should contain two visible people with stable framing and limited occlusion. Start with the default 512×896 setting. Check that SAM3 correctly tracks two people in both the reference and driving inputs. Confirm that left-to-right identity alignment is correct before generating the full video. If the two identities swap, adjust the reference layout or driving video so the left and right roles are easier to distinguish. If only one person is replaced, check max_objects=2, object_indices=0,1, and replacement_mode=true. Run a short test first, then use the long-video loop after the identity mapping is stable.
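The checks in the paragraph above can be collected into a small pre-flight sketch. The parameter names `max_objects`, `object_indices`, `sort_by`, and `replacement_mode` come from the description; the flat `settings` dictionary and the `width` and `height` keys are hypothetical, since in the actual workflow these values live on separate nodes.

```python
# Values the description says the workflow relies on.
EXPECTED = {
    "max_objects": 2,            # SAM3
    "object_indices": "0,1",     # SCAIL2ColoredMask
    "sort_by": "left_to_right",  # SCAIL2ColoredMask
    "replacement_mode": True,
    "width": 512,                # hypothetical key for the aligned width
    "height": 896,               # hypothetical key for the aligned height
}


def preflight(settings):
    """Return a list of human-readable mismatches; empty means all good."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


# Example: replacement_mode was left off, so the check reports one problem.
candidate = dict(EXPECTED, replacement_mode=False)
for line in preflight(candidate):
    print(line)
```

A check like this only covers the settings; whether SAM3 actually tracks both people still has to be confirmed visually on a short test run.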

    ⚙️ RunningHub Workflow

    Try the workflow online right now — no installation required.
    👉 Workflow: https://www.runninghub.ai/post/2064974434917769218?inviteCode=rh-v1111

    If the results meet your expectations, you can later deploy it locally for customization.

    🎁 Fan Benefits: Register to get 1,000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB!

    📺 Bilibili Updates (Mainland China & Asia-Pacific)

    If you’re in mainland China or the Asia-Pacific region, you can watch the video below for a demonstration of the workflow and a breakdown of the creative process.
    📺 Bilibili Video: https://www.bilibili.com/video/BV1w2Ei6pEsJ/

    ☕ Support Me on Ko-fi

    If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
    Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
    👉 Ko-fi: https://ko-fi.com/aiksk

    💼 Business Contact

    For collaboration or inquiries, please contact aiksk95 on WeChat.

    I keep model resources updated on Quark Netdisk (夸克网盘):
    👉 https://pan.quark.cn/s/20c6f6f8d87b
    These resources are mainly intended for local users, for creation and learning.

    Description

    Workflows
    Other

    Details

    Downloads: 94
    Platform: CivitAI
    Platform Status: Available
    Created: 6/12/2026
    Updated: 6/29/2026
    Deleted: -

    Files

    scail2TwoPersonReference_v10.json
