SCAIL-2 Multi-Image Reference Video Generation Workflow

Watch the full video first if you want to understand how this SCAIL-2 multi-image reference workflow works in practice. The video shows how a driving video, a main reference image, extra reference images, and an optional background image can be combined into one controlled video generation route.

This ComfyUI workflow is designed for SCAIL-2 multi-image reference video generation using WanAnimatePlus. Its main purpose is to make a character follow the motion and rhythm of a driving video while using multiple reference images to strengthen identity, clothing, style, and visual consistency. Compared with a single-reference workflow, this version gives the model more visual evidence and is better suited for character consistency tests, stylized dance videos, furry character motion, cosplay-style edits, and short-form AI video production.

The workflow has one required driving video and one required main reference image. The driving video provides the motion, timing, pose rhythm, and video structure. The main reference image provides the primary character identity, outfit, visual style, and core appearance. The workflow also includes two optional extra reference images, marked as multi-reference image A and multi-reference image B. These extra images can be used to reinforce identity details, clothing design, facial features, body shape, or character style when the main reference alone is not stable enough.
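The required and optional inputs described above can be sketched as a small data structure. This is only an illustration: the class name, field names, and file paths are hypothetical and are not part of the workflow file or of any ComfyUI node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowInputs:
    """Hypothetical container for the workflow's file inputs (paths as strings)."""
    driving_video: str                  # required: motion, timing, video structure
    main_reference: str                 # required: primary identity and outfit
    reference_a: Optional[str] = None   # optional multi-reference image A
    reference_b: Optional[str] = None   # optional multi-reference image B
    background: Optional[str] = None    # optional background image

    def __post_init__(self) -> None:
        # Both required inputs must be non-empty.
        if not self.driving_video:
            raise ValueError("a driving video is required")
        if not self.main_reference:
            raise ValueError("a main reference image is required")

    def reference_images(self) -> List[str]:
        """Return the main reference first, then any extra references that were set."""
        extras = [self.reference_a, self.reference_b]
        return [self.main_reference] + [ref for ref in extras if ref]


# Example with made-up file names: only reference B is supplied as an extra.
inputs = WorkflowInputs("dance.mp4", "hero.png", reference_b="hero_side.png")
print(inputs.reference_images())
```

Keeping the main reference first mirrors its role as the primary identity source; the extras are included only when they are actually provided.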

There is also an optional background image input. The workflow note gives a simple rule: if you want to use a separate background image, turn off preserve_main_ref_background and turn on prefix_alpha_crop. If you want to keep the main reference background, or if the background reference is the same as the main reference, turn on preserve_main_ref_background and turn off prefix_alpha_crop.
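The background rule above reduces to two toggles that always take opposite values. A minimal sketch, assuming a hypothetical helper named background_flags whose single argument says whether a separate background image is being used:

```python
from typing import Dict


def background_flags(use_separate_background: bool) -> Dict[str, bool]:
    """Return the two toggle values described in the workflow note.

    use_separate_background is a hypothetical parameter: True when a background
    image different from the main reference is supplied, False when the main
    reference background should be kept.
    """
    return {
        "preserve_main_ref_background": not use_separate_background,
        "prefix_alpha_crop": use_separate_background,
    }


print(background_flags(True))
print(background_flags(False))
```

The helper only encodes the rule from the note; the toggles themselves still have to be set in the workflow.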

The model section loads SCAIL-2-Q8_0.gguf through WanAnimatePlus ModelLoader. BlockSwap is enabled with 20 blocks to reduce memory pressure. The LoRA section uses LightX2V and wan2.1_SCAIL_2_DPO_lora_bf16.safetensors, which are intended to improve motion generation and SCAIL-2 video quality. The VAE route uses Wan2_1_VAE_bf16.safetensors.

The control section uses SAM3 tracking and SCAIL2ColoredMask. The main reference image, extra reference images, and background image are resized to match the driving video size. SAM3 detects the subject, SCAIL2ColoredMask generates the reference mask, and ImageBatchMulti batches multiple reference masks or reference images together before they enter the SCAIL-2 condition fusion route.

The central node is WanAnimatePlus SCAIL_2 Embeds. It receives the reference image, optional background image, pose images, prefix frames, prefix mask, pose mask, reference mask, and size settings. This is where the workflow combines the driving motion with the reference identity and background structure.

The sampling section uses WanAnimatePlus Sampler with 8 steps, CFG 1, shift 5, euler scheduler, and WanAnimatePlus ContextOptions set to an 81-frame context. After sampling, WanAnimatePlus Decode converts the latent into frames. GIMMVFI is then used for interpolation, and VHS_VideoCombine exports the final video.
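For reference while experimenting, the model and sampling values stated in this description can be collected in one place and compared against whatever you change. This is a sketch only: the key names are illustrative and do not correspond to actual node widget names, and changed_settings is a hypothetical helper.

```python
from typing import Any, Dict

# Values stated in the description; the key names are illustrative only.
DEFAULT_SETTINGS: Dict[str, Any] = {
    "model": "SCAIL-2-Q8_0.gguf",
    "blockswap_blocks": 20,
    "dpo_lora": "wan2.1_SCAIL_2_DPO_lora_bf16.safetensors",
    "vae": "Wan2_1_VAE_bf16.safetensors",
    "steps": 8,
    "cfg": 1,
    "shift": 5,
    "scheduler": "euler",
    "context_frames": 81,
}


def changed_settings(current: Dict[str, Any]) -> Dict[str, Any]:
    """Return the entries in `current` that differ from, or are absent in, the defaults."""
    return {
        key: value
        for key, value in current.items()
        if DEFAULT_SETTINGS.get(key) != value
    }


# Example: only the step count was modified.
experiment = {**DEFAULT_SETTINGS, "steps": 12}
print(changed_settings(experiment))
```

Tracking only the differences makes it easier to return to the defaults when a test run goes wrong.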

Main features:

SCAIL-2 multi-image reference workflow
WanAnimatePlus generation route
Required driving video input
Required main reference image input
Optional reference image A
Optional reference image B
Optional background image input
SAM3 subject tracking
SCAIL2ColoredMask reference mask control
ImageBatchMulti multi-reference batching
SCAIL-2-Q8_0.gguf model support
LightX2V LoRA support
SCAIL-2 DPO LoRA support
Wan2.1 VAE support
BlockSwap memory optimization
81-frame context generation
8-step euler sampling
GIMMVFI interpolation
Final video export through VHS_VideoCombine

Suggested workflow:

Prepare one clean driving video first. The motion should be readable, with limited blur and stable framing. Then prepare a strong main reference image with a clear face, body shape, clothing, and style. Add reference images A and B only when they improve identity consistency, and avoid conflicting references unless you intentionally want mixed results. Use the optional background image when you want stronger environment control.

Start with the default settings. If the character changes too much, simplify the prompt and use more consistent reference images. If the background is wrong, check preserve_main_ref_background and prefix_alpha_crop. If memory pressure is high, keep BlockSwap enabled.
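The troubleshooting advice above can be kept as a small lookup table while testing. A minimal sketch: the symptom keys and the suggest helper are hypothetical labels chosen for illustration, and the advice strings paraphrase the suggestions in this section.

```python
from typing import Dict, List

# Symptom keys are hypothetical labels; the advice paraphrases the text above.
TROUBLESHOOTING: Dict[str, List[str]] = {
    "character_drifts": [
        "simplify the prompt",
        "use more consistent reference images",
    ],
    "background_wrong": [
        "check preserve_main_ref_background",
        "check prefix_alpha_crop",
    ],
    "memory_pressure": ["keep BlockSwap enabled"],
}


def suggest(symptom: str) -> List[str]:
    """Return the suggested checks for a symptom, or an empty list if unknown."""
    return list(TROUBLESHOOTING.get(symptom, []))


for step in suggest("background_wrong"):
    print(step)
```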

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2067260607614767105?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB.

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1jWL96nEpw/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep updating model resources on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.

Files

scail2MultiImage_v10.json
