SCAIL-2 GGUF MOTION TRANSFER Reference Image to Video Vid2Vid + MultiGPU

Turn a single character image into a fully animated video that copies the motion of any driving clip — at the **full length** of your input video, not just a fixed 5-second window. Built on the **Wan 2.1 SCAIL-2** model in quantized **GGUF** format so it runs on consumer GPUs, with optional **dual-GPU weight offloading** to keep speeds up.

---

 ✨ What this workflow does

Feed it **one reference image** (your character) and **one driving video** (the motion). SAM3 automatically tracks and masks the subject, the pose from the driving video drives the animation, and a **chunked sampling loop** generates the entire clip, stitching the sliding windows together with color matching so there are no harsh seams between chunks.

- **Full-length output** — a sliding-window loop (81-frame initial window + 76-frame continuation windows) covers your whole driving video. No 5-second cap.
- **Motion transfer** — the subject in your reference image performs the exact motion of the driving clip.
- **Automatic subject masking** — SAM3.1 tracking isolates the character; no manual rotoscoping.
- **GGUF quantized model** — Q4_K_M weights fit comfortably in consumer VRAM.
- **Optional 2nd-GPU offload** — push ~10 GB of model weights to a second GPU instead of slow CPU offload.
- **Built-in side-by-side comparison output** — see reference vs. result in one render.
- **Organized & documented** — color-coded node groups and an on-canvas README note with every download link.
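The seam fix can be pictured as per-channel statistics matching: shift and scale each new chunk so its RGB mean and standard deviation match the frames it has to blend into. The sketch below is stdlib-only and uses flat pixel lists for clarity. It assumes the previous chunk's tail is the color reference, and it shows a simplified RGB variant of Reinhard-style color transfer. It is not the easy-use `ColorTransfer` node's actual implementation, and `match_chunk_colors` is a hypothetical name.

```python
from statistics import fmean, pstdev

Pixel = tuple[float, float, float]  # RGB, each channel in [0, 1]


def match_chunk_colors(new_pixels: list[Pixel], ref_pixels: list[Pixel], eps: float = 1e-6) -> list[Pixel]:
    """Shift/scale each RGB channel of new_pixels to the mean and std of ref_pixels."""
    stats = []
    for c in range(3):
        new_c = [p[c] for p in new_pixels]
        ref_c = [p[c] for p in ref_pixels]
        stats.append((fmean(new_c), pstdev(new_c), fmean(ref_c), pstdev(ref_c)))
    matched = []
    for p in new_pixels:
        matched.append(tuple(
            # Normalize, rescale to the reference statistics, clamp to the valid range
            min(1.0, max(0.0, (p[c] - new_mean) / (new_std + eps) * ref_std + ref_mean))
            for c, (new_mean, new_std, ref_mean, ref_std) in enumerate(stats)
        ))
    return matched


# Example: a chunk that came out 20% darker than the previous chunk's tail
prev_tail = [(0.4, 0.5, 0.6), (0.6, 0.3, 0.2), (0.5, 0.7, 0.4)]
drifted = [tuple(0.8 * v for v in p) for p in prev_tail]
fixed = match_chunk_colors(drifted, prev_tail)
```

Because a uniform brightness drift only changes each channel's mean and spread, matching those two statistics undoes it almost exactly. Real generations drift in less regular ways, so matching reduces visible seams rather than removing them entirely.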

---

 🎬 How to use it

1. **Reference image** → load your character in the `LoadImage` node (INPUTS group).
2. **Driving video** → load your motion clip in `VHS_LoadVideo` (INPUTS group). Leave `custom_width = 480` — it keeps system RAM low and matches the working resolution.
3. **Prompts** → describe the scene in the positive prompt and what to avoid in the negative (PROMPTS group).
4. **Press Run.** The output appears **only after the loop finishes** — there are no mid-run previews (this is normal, not a freeze). The OUTPUT group holds the final stitched video; the COMPARISON group shows the side-by-side.
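A rough estimate shows why `custom_width = 480` keeps system RAM low. This sketch assumes the decoded driving frames are held as float32 RGB (4 bytes per channel value, as ComfyUI's IMAGE tensors are). It also assumes `custom_height` is left at 0 so the aspect ratio is preserved. The helper names are hypothetical, and real usage adds overhead on top of this figure.

```python
def frame_batch_gib(frames: int, width: int, height: int, channels: int = 3, bytes_per_value: int = 4) -> float:
    """Approximate size of a decoded frame batch held as float32 RGB, in GiB."""
    return frames * width * height * channels * bytes_per_value / 1024**3


def scaled_height(src_w: int, src_h: int, target_w: int) -> int:
    """Height after resizing to target_w while keeping the aspect ratio."""
    return round(src_h * target_w / src_w)


frames = 500
full = frame_batch_gib(frames, 1920, 1080)
h480 = scaled_height(1920, 1080, 480)
small = frame_batch_gib(frames, 480, h480)
print(f"1920x1080: {full:.2f} GiB, 480x{h480}: {small:.2f} GiB")
```

Quartering the width (and therefore the height) cuts the pixel count, and the memory, by 16×.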

**Speed tip:** set `select_every_nth = 2` on `VHS_LoadVideo` to roughly halve render time at half the temporal resolution. You can also lower the sampler steps.

---

 🖥️ Single GPU vs. Dual GPU (model switcher built in)

The workflow includes two model loaders feeding an **Any Switch (rgthree)** "Model Switcher":

- **GGUF Loader – MULTI GPU (default)** — offloads ~10 GB of weights to your **second GPU** (`cuda:1`), keeping compute on `cuda:0`. Dramatically faster than CPU offload.
- **GGUF Loader – SINGLE GPU** — standard single-GPU GGUF loading.

**Switching is manual** (ComfyUI can't auto-detect GPU count). Use **Ctrl+B** to bypass the one you don't want — keep **exactly one** active:

- **Two GPUs:** leave as shipped → MultiGPU loader active, single-GPU loader bypassed.
- **One GPU:** bypass the MultiGPU loader and un-bypass the single-GPU loader. (If you leave the MultiGPU loader active with only one GPU, it will error trying to reach the missing `cuda:1`.)

> Tip: on the MultiGPU loader you can tune `virtual_vram_gb` (default 10) — lower it if your 2nd GPU OOMs, raise it if it has spare room. `donor_device` can also be set to `cpu` for a single-GPU fallback without bypassing.
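Since the loader choice is manual, a quick check of what PyTorch can see helps decide which loader to keep. It also shows how much free VRAM each card has when tuning `virtual_vram_gb`. This is a minimal sketch that assumes it runs in the same Python environment as ComfyUI. The helper names are hypothetical; `torch.cuda.device_count`, `get_device_name` and `mem_get_info` are standard PyTorch calls.

```python
def list_cuda_gpus() -> list[tuple[str, float]]:
    """Return (device name, free VRAM in GiB) for each visible CUDA device.

    Returns an empty list if PyTorch is missing or CUDA is unavailable.
    """
    try:
        import torch  # installed alongside ComfyUI
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for i in range(torch.cuda.device_count()):
        free_bytes, _total_bytes = torch.cuda.mem_get_info(i)
        gpus.append((torch.cuda.get_device_name(i), free_bytes / 1024**3))
    return gpus


def loader_to_keep(gpu_count: int) -> str:
    """Which loader should stay active in the Model Switcher (bypass the other)."""
    if gpu_count >= 2:
        return "MultiGPU loader (compute on cuda:0, weights offloaded to cuda:1)"
    if gpu_count == 1:
        return "single-GPU loader (bypass the MultiGPU loader with Ctrl+B)"
    return "no CUDA GPU found"


if __name__ == "__main__":
    gpus = list_cuda_gpus()
    for index, (name, free_gib) in enumerate(gpus):
        print(f"cuda:{index} {name}: {free_gib:.1f} GiB free")
    print("Keep active:", loader_to_keep(len(gpus)))
```

Free VRAM is a snapshot. Close other GPU applications before reading it, and compare the `cuda:1` figure against the ~11 GB the offload needs.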

---

 📦 Required models & paths

Place these under your ComfyUI `models/` folder:

```
ComfyUI/
└── models/
    ├── unet/  (or diffusion_models/)
    │   └── SCAIL-2-Q4_K_M.gguf            ← link 6 below (or your own SCAIL-2 GGUF)
    ├── text_encoders/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    ├── clip_vision/
    │   └── clip_vision_h.safetensors
    ├── vae/
    │   └── wan_2.1_vae.safetensors
    ├── loras/
    │   └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
    └── checkpoints/
        └── sam3.1_multiplex_fp16.safetensors
```
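Before the first run, a short script can confirm that every file sits where the loaders expect it. This is a sketch with hypothetical names. It only looks at the top level of each folder and accepts the diffusion model in either `unet/` or `diffusion_models/`.

```python
from pathlib import Path

# File each loader expects, keyed by the models/ subfolder(s) it may live in
REQUIRED_MODELS = {
    ("unet", "diffusion_models"): "SCAIL-2-Q4_K_M.gguf",
    ("text_encoders",): "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    ("clip_vision",): "clip_vision_h.safetensors",
    ("vae",): "wan_2.1_vae.safetensors",
    ("loras",): "Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
    ("checkpoints",): "sam3.1_multiplex_fp16.safetensors",
}


def missing_models(models_dir: Path) -> list[str]:
    """Return 'folder/filename' for every required file not found under models_dir."""
    missing = []
    for folders, filename in REQUIRED_MODELS.items():
        # Accept the file in any of the allowed folders (e.g. unet or diffusion_models)
        if not any((models_dir / folder / filename).is_file() for folder in folders):
            missing.append(f"{folders[0]}/{filename}")
    return missing


if __name__ == "__main__":
    for entry in missing_models(Path("ComfyUI/models")):
        print("missing:", entry)
```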

 Download links

1. **Text encoder (UMT5 XXL fp8)** → `models/text_encoders`
   https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors?download=true
2. **CLIP Vision H** → `models/clip_vision`
   https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors?download=true
3. **Wan 2.1 VAE** → `models/vae`
   https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors?download=true
4. **LightX2V I2V rank64 step-distill LoRA** → `models/loras`
   https://huggingface.co/lgylgy/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64/resolve/main/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors?download=true
5. **SAM3.1 multiplex checkpoint** → `models/checkpoints`
   https://huggingface.co/Comfy-Org/sam3.1/resolve/main/checkpoints/sam3.1_multiplex_fp16.safetensors?download=true
6. **SCAIL-2 GGUF diffusion model** → `models/unet`
   https://huggingface.co/realrebelai/SCAIL-2_GGUF/resolve/main/SCAIL-2-Q4_K_M.gguf?download=true
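To script the downloads instead of clicking each link, the sketch below pairs each URL above with its target folder. It saves each file under the last segment of the URL path, dropping `?download=true`. It uses only the standard library (`urllib.request.urlretrieve`). The helper names are hypothetical, and files that already exist are skipped rather than verified.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlretrieve

# (models/ subfolder, URL) pairs copied from the download list above
DOWNLOADS = [
    ("text_encoders", "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors?download=true"),
    ("clip_vision", "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors?download=true"),
    ("vae", "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors?download=true"),
    ("loras", "https://huggingface.co/lgylgy/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64/resolve/main/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors?download=true"),
    ("checkpoints", "https://huggingface.co/Comfy-Org/sam3.1/resolve/main/checkpoints/sam3.1_multiplex_fp16.safetensors?download=true"),
    ("unet", "https://huggingface.co/realrebelai/SCAIL-2_GGUF/resolve/main/SCAIL-2-Q4_K_M.gguf?download=true"),
]


def dest_path(url: str, models_dir: Path, folder: str) -> Path:
    """Local target: models_dir/folder/<last URL path segment>, query string dropped."""
    filename = unquote(Path(urlparse(url).path).name)
    return models_dir / folder / filename


def download_all(models_dir: Path) -> None:
    for folder, url in DOWNLOADS:
        dest = dest_path(url, models_dir, folder)
        if dest.exists():
            print("exists, skipping:", dest)
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        partial = dest.with_name(dest.name + ".part")
        print("downloading:", dest)
        urlretrieve(url, partial)  # follows Hugging Face's redirect to the file host
        partial.replace(dest)  # rename only after the download finished
```

Call `download_all(Path("ComfyUI/models"))` from your ComfyUI parent folder. Writing to a `.part` file first means an interrupted download never leaves a truncated file under the real name.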

---

 🧩 Required custom nodes

- **ComfyUI-GGUF** — GGUF UNet loading
- **ComfyUI-MultiGPU** — `UnetLoaderGGUFDisTorch2MultiGPU` (2nd-GPU weight offload)
- **ComfyUI-KJNodes** — WanChunkFeedForward, ImageResizeKJv2, KikoPurgeVRAM, SimpleCalculatorKJ, INTConstant, GetImageRangeFromBatch, Set/Get nodes
- **ComfyUI_Swwan** — WanSCAILToVideo, SCAIL2ColoredMask, SAM3_VideoTrack, ImageConcatMulti
- **ComfyUI-easy-use** — forLoopStart/End, compare, ComfySwitchNode, BatchImagesNode, ColorTransfer
- **ComfyUI-VideoHelperSuite** — VHS_LoadVideo, VHS_VideoCombine, VHS_VideoInfo
- **ComfyUI-Resolution-Master** — ResolutionMaster
- **rgthree-comfy** — Any Switch (Model Switcher), Display Int

All of these are installable through **ComfyUI-Manager** ("Install Missing Custom Nodes").
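If ComfyUI still reports missing node types after installing, a folder check under `custom_nodes/` can show which pack is absent. This sketch assumes each pack is cloned into a folder named after it (as listed above). ComfyUI-Manager may use different casing, so names are compared case-insensitively. A renamed folder will still show as missing, so treat the result as a hint, not proof.

```python
from pathlib import Path

# Assumed folder names under ComfyUI/custom_nodes (compared case-insensitively)
REQUIRED_PACKS = [
    "ComfyUI-GGUF",
    "ComfyUI-MultiGPU",
    "ComfyUI-KJNodes",
    "ComfyUI_Swwan",
    "ComfyUI-easy-use",
    "ComfyUI-VideoHelperSuite",
    "ComfyUI-Resolution-Master",
    "rgthree-comfy",
]


def missing_packs(custom_nodes_dir: Path) -> list[str]:
    """Return required pack names with no matching folder in custom_nodes_dir."""
    if not custom_nodes_dir.is_dir():
        return list(REQUIRED_PACKS)
    installed = {entry.name.lower() for entry in custom_nodes_dir.iterdir() if entry.is_dir()}
    return [pack for pack in REQUIRED_PACKS if pack.lower() not in installed]


if __name__ == "__main__":
    for pack in missing_packs(Path("ComfyUI/custom_nodes")):
        print("not found:", pack)
```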

---

 ⚙️ Requirements & performance

- A recent ComfyUI build with **Wan 2.1 / SCAIL-2** support.
- **~16 GB VRAM** recommended for the main GPU.
- For MultiGPU offload: a **second GPU with ≥ 11 GB free** VRAM.
- **Render time scales with clip length**: each window is a full diffusion pass. A ~500-frame clip needs roughly 7 windows (one 81-frame window plus six 76-frame continuations). Use `select_every_nth` or fewer steps to trade quality/length for speed.
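The window count, and the saving from `select_every_nth`, can be estimated before pressing Run. This sketch assumes each continuation window contributes 76 new frames after the 81-frame first window, which matches the ~500 frames → 7 windows figure above. It also assumes `select_every_nth = n` keeps frames 0, n, 2n, and so on. The function names are hypothetical and not part of the workflow.

```python
import math

FIRST_WINDOW = 81   # frames produced by the first chunk
CONTINUATION = 76   # assumed: new frames added by each continuation chunk


def kept_frames(total_frames: int, select_every_nth: int = 1) -> int:
    """Frames left after keeping every nth frame (frames 0, n, 2n, ...)."""
    return math.ceil(total_frames / select_every_nth)


def window_ranges(frames: int) -> list[tuple[int, int]]:
    """[start, end) span of new output frames contributed by each window."""
    ranges = [(0, min(frames, FIRST_WINDOW))]
    start = FIRST_WINDOW
    while start < frames:
        end = min(frames, start + CONTINUATION)
        ranges.append((start, end))
        start = end
    return ranges


def window_count(total_frames: int, select_every_nth: int = 1) -> int:
    return len(window_ranges(kept_frames(total_frames, select_every_nth)))


print(window_count(500))                      # 7
print(window_count(500, select_every_nth=2))  # 4
```

Since render time is roughly proportional to the number of windows, halving the frames here drops the job from 7 diffusion passes to 4.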

---

 🗂️ Workflow layout

Nodes are organized into color-coded groups for clarity:

**INPUTS** (image & video) · **MODELS** (diffusion / VAE / CLIP / sampler) · **PROMPTS** · **PREPROCESS** (resolution / pose resize / CLIP vision) · **MASK & TRACKING (SAM3)** · **CHUNK 1** (first window) · **LOOP MATH** (window / count) · **LOOP BODY** (chunk-2 generation & accumulation) · **OUTPUT** (final video) · **COMPARISON OUTPUT** (side-by-side)

---

 📺 Tutorial

Watch how to use this workflow:
https://www.youtube.com/@AiMotionStudio

---

 📝 Notes & tips

- Output only appears when the full loop completes — longer clips take longer before you see anything. That's expected.
- Keep exactly one model loader active (single- vs. dual-GPU).
- If you hit a system-RAM error on very long/high-res inputs, keep `VHS_LoadVideo` `custom_width = 480` (already set) and/or raise `select_every_nth`.
- Credits: built on Wan 2.1 SCAIL-2, LightX2V distill LoRA, SAM3.1, and the open-source ComfyUI custom nodes listed above.

**Files:** `scail2GGUFMOTIONTRANSFER_v20.zip`