Watch the full video first if you want to understand how this SCAIL-2 single-person long-video driving workflow works in practice. The video shows how one reference character can follow a single-person driving video, while the workflow keeps identity, body structure, motion rhythm, silhouette stability, and long-video continuity more consistent.
This ComfyUI workflow is designed for SCAIL-2 long-video driving of a single living subject, such as a person or a creature. Its main purpose is to animate one reference character by following the motion of one person in a driving video. This is not a local character replacement workflow. It is a skeleton-guided animation route in which the reference character follows the full-body movement from the source video while preserving the character’s visual identity.
The workflow is built around wan2.1_14B_SCAIL_2_fp8_scaled.safetensors as the main SCAIL-2 model. It also uses WAN VAE, UMT5 XXL WAN text encoding, CLIP Vision identity encoding, SAM3 subject tracking, SCAIL2ColoredMask, WanSCAILToVideo, SamplerCustom, VAEDecode, ForLoop continuation, frame trimming, ColorTransfer, final video combining, and original audio restoration. A multi-LoRA enhancement chain is also included, using modules such as LightX2V, WanAnimate relight, Wan2.2 Lightning I2V, FastWan 480p, Wan21 PusaV1, Wan2.2 Fun InP, and stage-based enhancement LoRAs.
The first important rule of the workflow is strict input alignment. Both the reference image and the driving video are aligned to 512×896 before entering SAM3, CLIPVision, and SCAIL. This helps avoid mask mismatch, pose instability, identity drift, and unexpected body deformation caused by inconsistent input dimensions.
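The alignment rule above can be sketched as a small geometry helper. This is only an illustration: it assumes a scale-to-cover resize followed by a center crop, while the actual workflow nodes may pad or stretch instead. The function name `cover_and_center_crop` is hypothetical and is not part of ComfyUI or SCAIL.

```python
TARGET_W, TARGET_H = 512, 896  # alignment size stated in the text


def cover_and_center_crop(src_w, src_h, dst_w=TARGET_W, dst_h=TARGET_H):
    """Return (resized_w, resized_h, crop_left, crop_top).

    Assumption: the input is scaled until it fully covers the target,
    then the excess is cropped equally from both sides. The real
    workflow may use a different strategy (padding or stretching).
    """
    # Scale factor that makes both dimensions at least as large as the target.
    scale = max(dst_w / src_w, dst_h / src_h)
    resized_w = max(dst_w, round(src_w * scale))
    resized_h = max(dst_h, round(src_h * scale))
    # Center the crop window inside the resized frame.
    crop_left = (resized_w - dst_w) // 2
    crop_top = (resized_h - dst_h) // 2
    return resized_w, resized_h, crop_left, crop_top
```

Applying the same helper to both the reference image and the driving video is one way to make sure the mask, the CLIP Vision input, and the pose signal all share the same 512×896 frame.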
The second key rule is single-subject tracking. SAM3 is configured with max_objects=1. SCAIL2ColoredMask uses object_indices=0 and sort_by=area. This tells the workflow to focus on one main character, select the dominant subject area, and use that tracked subject as the motion target. This is useful for single-person dance, character animation, creature motion transfer, digital human testing, mascot animation, anime character motion driving, and stylized biological character videos.
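The selection logic described above can be illustrated with a minimal sketch. It is not the SCAIL2ColoredMask implementation: the function `select_subject` and its input format (one pixel count per tracked object) are hypothetical, and the sketch assumes that sort_by=area orders objects from largest to smallest, so that index 0 is the dominant subject.

```python
def select_subject(mask_areas, object_indices=(0,), sort_by="area"):
    """Return the positions of the tracked objects that would be kept.

    mask_areas: list of mask pixel counts, one per tracked object
                (hypothetical input format for illustration).
    Assumption: sort_by="area" means largest area first.
    """
    order = list(range(len(mask_areas)))
    if sort_by == "area":
        order.sort(key=lambda i: mask_areas[i], reverse=True)
    # Keep only the requested ranks; ignore ranks that do not exist.
    return [order[i] for i in object_indices if i < len(order)]
```

With max_objects=1 the tracker should already return a single object, so the sort mainly acts as a safeguard: if several candidates were present, index 0 would still resolve to the largest one under this assumption.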
The workflow uses replacement_mode=false. This means the goal is skeleton-guided driving rather than local replacement. The reference image provides the character identity, the driving video provides the motion structure, and the mask system helps connect the tracked pose signal to the generated character animation.
The long-video system is one of the main strengths of this workflow. The first segment is 65 frames and is used to establish the character, identity relationship, pose guidance, and motion direction. Each continuation segment is 81 frames. Each loop removes 5 overlapping frames, so every loop effectively adds 81 − 5 = 76 new frames. The loop count is calculated as max(1, ceil((F - 65) / 76)), where F is the loaded frame count of the driving video. This makes the workflow more suitable for longer character videos than a one-shot short animation route.
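The segment arithmetic above can be checked with a short sketch. The constants and the loop-count formula come from the text; the helper `generated_frames` is an assumption that simply adds the first segment to the new frames from every loop, and it does not model any final trimming to the source length.

```python
import math

FIRST_SEGMENT = 65         # frames in the first segment
CONTINUATION_SEGMENT = 81  # frames generated per continuation loop
OVERLAP = 5                # overlapping frames removed per loop
NEW_FRAMES_PER_LOOP = CONTINUATION_SEGMENT - OVERLAP  # 76


def loop_count(total_frames):
    """Loop count as stated in the text: max(1, ceil((F - 65) / 76))."""
    return max(1, math.ceil((total_frames - FIRST_SEGMENT) / NEW_FRAMES_PER_LOOP))


def generated_frames(total_frames):
    """Frames produced before any trimming (assumed: 65 + loops * 76)."""
    return FIRST_SEGMENT + loop_count(total_frames) * NEW_FRAMES_PER_LOOP
```

For example, a 300-frame driving video gives ceil(235 / 76) = 4 loops, and a video of 65 frames or fewer still runs one loop because of the max(1, ...) floor.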
The continuation section also uses frame trimming and color matching. The workflow removes repeated overlap frames, takes the previous segment’s final frame as reference, and applies ColorTransfer to improve tone continuity between generated segments. The final video output uses the accumulated generated frame sequence, the original video audio, and the unified frame-rate node, making the result easier to match with the source rhythm.
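The overlap removal can be sketched as a simple stitching step. This is an illustration only: the function `stitch_segments` is hypothetical, frames are represented as plain list items, and it assumes the 5 overlapping frames are dropped from the start of each continuation segment. The text does not say which side is trimmed, and the ColorTransfer step is not modeled here.

```python
def stitch_segments(segments, overlap=5):
    """Concatenate generated segments into one frame sequence.

    Assumption: each continuation segment repeats the last `overlap`
    frames of the sequence so far, and those repeats are dropped from
    the head of the continuation segment.
    """
    if not segments:
        return []
    frames = list(segments[0])
    for segment in segments[1:]:
        frames.extend(segment[overlap:])
    return frames
```

With a 65-frame first segment and one 81-frame continuation, the stitched result contains 65 + 76 = 141 frames, which matches the per-loop gain described earlier.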
Main features:
SCAIL-2 single-person long-video driving workflow
One reference character follows one driving video
Single-person full-body skeleton guidance
512×896 unified input alignment
SAM3 max_objects=1 subject tracking
SCAIL2ColoredMask single-subject mask control
object_indices=0 and sort_by=area
replacement_mode=false for motion driving
CLIP Vision reference identity encoding
WanSCAILToVideo first-segment generation
65-frame initial segment
81-frame continuation segment
5-frame overlap removal
ForLoop long-video continuation
ColorTransfer segment consistency
Original driving video audio restored
Unified 24fps output control
Multi-LoRA enhancement chain
Suggested workflow:
Prepare one clear reference character image and one clean single-person driving video. The reference image should show the full body or at least a readable body shape, with a clear face, outfit, and silhouette. The driving video should have a single visible subject, stable framing, readable motion, and limited occlusion. Keep the default 512×896 alignment first. Check that SAM3 tracks only one subject correctly. If the character identity drifts, use a cleaner reference image and simplify the prompt. If the motion becomes unstable, use a driving video with less camera shake and fewer extreme occlusions. Start with a short test segment before running the full long-video loop.
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2065059863197208578?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to receive 1,000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB.
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1w2Ei6pEsJ/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
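I keep model resources updated on 夸克网盘 (Quark cloud drive):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.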
