Wan2.1 InfiniteTalk LipSync — Workflow Guide
52 nodes · 4 groups · 3 component subgraphs · 1 pipeline loop · 27 unique node types (73% Eclipse nodes)
Built with ComfyUI_Eclipse custom nodes
What Is This?
This workflow is a template designed to generate lip-synced talking head videos of arbitrary length using the Wan2.1 InfiniteTalk / LipSync model in ComfyUI.
The core features are smart audio budgeting and seamless looping. The workflow analyzes the duration of the speech audio track, automatically computes how many generation loops are needed to cover it, generates the matching video blocks one after another using temporal context from the previous block, and blends them into a continuous video. It also includes a manual override switch to cap generation at a fixed loop count.
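As a rough illustration of the audio budgeting idea, the loop count can be thought of as a ceiling division of the frames the audio needs by the new frames each loop contributes. The helper name and the example numbers (81 frames per block, 9 overlap frames, 25 fps) are assumptions made for this sketch. They are not values read from the workflow or from the Eclipse loop calculator.

```python
import math


def estimate_loop_count(audio_seconds: float, fps: float,
                        frames_per_block: int, overlap_frames: int) -> int:
    """Estimate how many generation loops cover an audio track.

    Hypothetical helper: assumes the first block contributes all of its
    frames and each later block contributes frames_per_block - overlap_frames.
    """
    if not 0 <= overlap_frames < frames_per_block:
        raise ValueError("overlap_frames must be >= 0 and < frames_per_block")

    total_frames = math.ceil(audio_seconds * fps)
    if total_frames <= frames_per_block:
        return 1  # the base block alone covers the audio

    remaining = total_frames - frames_per_block
    new_frames_per_loop = frames_per_block - overlap_frames
    return 1 + math.ceil(remaining / new_frames_per_loop)


# Assumed example values, for illustration only.
loops = estimate_loop_count(audio_seconds=10.0, fps=25.0,
                            frames_per_block=81, overlap_frames=9)
print(loops)
```

With these assumed numbers, a 10 second clip needs 250 frames: the first block supplies 81 and each further loop adds 72, so the sketch arrives at four loops.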
How It Works — The Basics
Wireless Data Routing (Set/Get)
Rather than messy spaghetti wires running across the canvas, the workflow uses Set/Get nodes to route model, latent, audio, and loop count values. This keeps the layout clean and modular:
A setter publishes a value (e.g. Set_loop_count_ls publishing loop_count_ls).
Getters retrieve the value by name wherever it is needed in the samplers and loop groups.
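The Set/Get pattern is conceptually a publish-and-look-up-by-name registry. The sketch below is a toy model of that idea in plain Python. The WirelessBus class is hypothetical and is not how the Eclipse nodes are implemented; only the name loop_count_ls comes from the workflow.

```python
class WirelessBus:
    """Toy stand-in for Set/Get routing: values are published and read by name."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        # A "Set" node publishes a value under a name.
        self._values[name] = value

    def get(self, name):
        # A "Get" node reads the value back by the same name.
        if name not in self._values:
            raise KeyError(f"no setter has published '{name}'")
        return self._values[name]


bus = WirelessBus()
bus.set("loop_count_ls", 4)      # publisher side
loops = bus.get("loop_count_ls")  # consumer side, anywhere else in the graph
```

A getter whose name was never published fails loudly here, which mirrors the practical rule that every Get node needs a matching Set node.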
Dual-Switch Loop Control
The loop iterations are controlled via the Any Dual-Switch [Eclipse] (id: 35) node. It allows the user to switch between:
Choice 1 (Manual): Uses a static loop_count setting configured inside the Settings group panel.
Choice 2 (Auto-Calculated): Uses the dynamic loop count computed from the audio track's duration.
By default, the workflow is configured to use the auto-calculated loops to automatically match the audio's length.
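The selection logic of the switch can be pictured as a two-way choice. This is a minimal sketch of that behaviour; the function name and signature are made up for illustration and do not reflect the node's actual code.

```python
def dual_switch(choice: int, manual_value: int, auto_value: int) -> int:
    """Return the manual value for choice 1 and the auto value for choice 2."""
    if choice == 1:
        return manual_value
    if choice == 2:
        return auto_value
    raise ValueError("choice must be 1 (manual) or 2 (auto-calculated)")


# Default configuration described above: choice 2, auto-calculated loops.
loop_count = dual_switch(choice=2, manual_value=3, auto_value=7)
```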
The Recursive Generation Loop
The generation process runs inside a loop block bounded by the easy forLoopStart (id: 64) and easy forLoopEnd (id: 97) nodes:
Iteration 0 (Base Sampler): The first block runs the Base Sampler group. It takes the initial start image (face) and the first segment of encoded audio to generate the beginning of the video.
Iteration 1+ (Extend Sampler): For subsequent loops, the Extend Sampler group runs. It takes the ending frames of the previous loop (previous_frames) as context to guide the model's starting state (ensuring visual continuity) and samples the next segment of speech audio. It outputs only the unique new frames (trim_image).
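The base-then-extend structure can be sketched with plain lists standing in for image batches. Everything here is a simplification under stated assumptions: generate_base and generate_extend are placeholder functions rather than the real sampler groups, and the block length and overlap are arbitrary small numbers chosen to keep the example readable.

```python
def generate_base(block_len):
    # Placeholder for the Base Sampler: returns labelled stand-in "frames".
    return [f"b0_f{i}" for i in range(block_len)]


def generate_extend(loop_index, previous_frames, block_len):
    # Placeholder for the Extend Sampler: the block starts with the context
    # frames, followed by newly generated ones.
    overlap = len(previous_frames)
    new_frames = [f"b{loop_index}_f{i}" for i in range(block_len - overlap)]
    block = list(previous_frames) + new_frames
    return block[overlap:]  # analogous to trim_image: overlap cut off


def run_loops(loop_count, block_len=5, overlap=2):
    if not 0 < overlap < block_len:
        raise ValueError("overlap must be between 1 and block_len - 1")
    video = []
    for i in range(loop_count):
        if i == 0:
            video.extend(generate_base(block_len))
        else:
            previous_frames = video[-overlap:]  # tail of the history so far
            video.extend(generate_extend(i, previous_frames, block_len))
    return video


video = run_loops(loop_count=3)
```

Because each extend pass returns only the frames after the overlap, the accumulated list contains no duplicated frames.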
Group-by-Group Reference
1. Settings (Group Node)
This is the central configuration panel. It exposes:
Video Size & Resolution: Sets output width and height (typically 480p or 720p).
Frame Rate: Target output framerate (e.g. 24.0 or 30.0).
Manual Loop Override: A loop_count input to limit the loops when manual override is selected in the Dual-Switch.
2. Model Loaders
Smart Model Loader v2 [Eclipse]: Loads the main Wan2.1 checkpoint, Text Encoder, and VAE. Default checkpoint is Wan2_1-I2V-14B-720p_fp8_e4m3fn_scaled_KJ.safetensors using the default template.
Audio Encoder Loader & Encode: Loads the speech analysis model wav2vec2-chinese-base_fp16.safetensors and encodes loaded speech audio into phonemic feature representations.
Model Patch Loader: Applies the wan2.1_infiniteTalk_multi_fp16.safetensors patch to the diffusion model, adapting it for infinite talking generation.
3. Base Sampler (Group Node)
A component subgraph containing 19 internal nodes:
Uses WanInfiniteTalkToVideo to condition the initial start image and the first segment of the audio encoder output.
Utilizes a custom advanced sampler to generate the first talking head video block.
4. Extend Sampler (Group Node)
A component subgraph containing 19 internal nodes:
Inherits the main model, conditioning, and audio encoder outputs.
Takes previous_frames (from the accumulated loop history) to guide the start of the next segment.
Generates and outputs trim_image (the newly generated frames with overlap cut off).
5. Loop Control & Save Video
Image Join & Loop Feedback: An ImageBatch node appends the newly generated frames from Extend Sampler to the accumulated video batch (value1), which is fed back and updated on each pass through the loop.
Save Video [Eclipse]: Takes the final accumulated image batch, remuxes the original audio file, and outputs an MP4. The trim_mode is set to shortest, which trims both the audio and video to the shorter of the two so that they end at the same point.
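To make the effect of the shortest setting concrete, the following sketch works out how many frames and how many seconds of audio survive the trim. It is only a model of the behaviour described above. The function name is hypothetical, and rounding down to whole frames is an assumption, not something confirmed for the Save Video node.

```python
import math


def shortest_trim(frame_count: int, fps: float, audio_seconds: float):
    """Return (kept_frames, kept_audio_seconds) when both streams are cut
    to the shorter of the two. Hypothetical helper for illustration."""
    video_seconds = frame_count / fps
    if video_seconds <= audio_seconds:
        # Video is the shorter stream: keep every frame, cut the audio.
        return frame_count, video_seconds
    # Audio is the shorter stream: drop frames past the end of the audio
    # (assumed to round down to a whole frame).
    return math.floor(audio_seconds * fps), audio_seconds


print(shortest_trim(frame_count=250, fps=25.0, audio_seconds=12.0))
print(shortest_trim(frame_count=300, fps=25.0, audio_seconds=10.0))
```

In the first call the video (10 s) is shorter, so the audio is cut to 10 s. In the second the audio is shorter, so 50 surplus frames are dropped.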
Quick Start Guide
Automatic Audio-budgeted Generation
Verify that Any Dual-Switch [Eclipse] is set to 2 (Auto-calculated loops).
Load a face image in the Load Image node.
Load a voice clip in the Load Audio node.
Queue the prompt. The workflow will automatically compute the required loops, generate the segments, and output a perfectly timed talking head video.
Manual Loop Count Generation
Locate the Any Dual-Switch [Eclipse] (id: 35) and set its widget value to 1.
Set your desired loop count in the Settings panel (under loop_count).
Queue the prompt. The generation will stop at your configured loop limit, regardless of how long the audio track is.
Model Storage Locations (for Local Users)
Ensure your model files are placed in these folders under your ComfyUI directory:
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 diffusion_models/
│ │ └─── wan/Wan2_1-I2V-14B-720p_fp8_e4m3fn_scaled_KJ.safetensors
│ ├── 📂 text_encoders/
│ │ └─── nsfw_wan_umt5-xxl_bf16_fixed.safetensors
│ ├── 📂 model_patches/
│ │ ├─── wan2.1_infiniteTalk_single_fp16.safetensors
│ │ └─── wan2.1_infiniteTalk_multi_fp16.safetensors
│ ├── 📂 audio_encoders/
│ │ └─── wav2vec2-chinese-base_fp16.safetensors
│ └── 📂 vae/
│ └─── Wan2_1_VAE_bf16.safetensors
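Before queueing, it can help to confirm that the files listed above are where the loaders expect them. The script below is a small convenience sketch and is not part of the workflow. The relative paths are copied from the tree above, while the function name and the "ComfyUI" root argument are assumptions to adjust for your install.

```python
from pathlib import Path

# Paths relative to the ComfyUI root, taken from the folder tree above.
EXPECTED_MODELS = [
    "models/diffusion_models/wan/Wan2_1-I2V-14B-720p_fp8_e4m3fn_scaled_KJ.safetensors",
    "models/text_encoders/nsfw_wan_umt5-xxl_bf16_fixed.safetensors",
    "models/model_patches/wan2.1_infiniteTalk_single_fp16.safetensors",
    "models/model_patches/wan2.1_infiniteTalk_multi_fp16.safetensors",
    "models/audio_encoders/wav2vec2-chinese-base_fp16.safetensors",
    "models/vae/Wan2_1_VAE_bf16.safetensors",
]


def missing_models(comfy_root):
    """Return the expected model paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed location; point this at your own install.
    for rel in missing_models("ComfyUI"):
        print(f"missing: {rel}")
```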
Custom Node Packages Used
ComfyUI_Eclipse: Custom loader templates, Set/Get wireless routing, Loop Calculators, and the Save/Preview Video nodes.
ComfyUI-Easy-Use: The easy forLoopStart and easy forLoopEnd nodes for graph-level iteration.
ComfyUI-KJNodes: General utilities and crop helpers.
Description
Needs the latest version of ComfyUI_Eclipse.
Test version with an image switch at a configured timestamp or loop.
3 manual targets means you have to load 4 images (image 1 has no target), or reduce the number of manual targets.
The second file is the audio track; download it if you want to replicate the current setup (it's my own, so I'm allowed to upload it ;)).
