🎬 Generation Modalities
📝➡️🎥 TextToVideo
Create completely new videos from scratch using text prompts or optional audio input.
🖼️➡️🎥 ImageToVideo
Animate static reference images using text prompts or optional audio input.
🎥⚪ VideoToVisualDub
Generate synchronized audio tracks, such as ambience and speech, driven by the video's visuals and the text prompt, while keeping the original video.
🎥⚪ VideoToMaskedFaceGen (Warning: Since the ComfyUI v0.9.1 update, inpainting no longer works.)
Regenerate masked facial areas. Control expressions, lip-sync, and identity using prompts or optional audio input.
ℹ️ Info: The output video has the same resolution as the input video after scaling. Internal spatial downscaling and upscaling are disabled.
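Because no internal rescaling happens, the output size is fixed entirely by how the input is scaled. Below is a minimal sketch of such a scaling step. It assumes the input is scaled so its longer edge hits a target length (the hypothetical `longest_side` parameter) and that each edge is snapped to a multiple of 32. That multiple is an assumption about the underlying model, not something this workflow states. The actual scaling node may use a different target or rounding rule.

```python
import math


def scaled_resolution(width: int, height: int, longest_side: int, multiple: int = 32) -> tuple[int, int]:
    """Scale (width, height) so the longer edge is about `longest_side`,
    keeping the aspect ratio and snapping each edge to a multiple of `multiple`.

    Hypothetical helper: the workflow's own scaling node may use a
    different target size or rounding rule (the multiple of 32 is assumed).
    """
    if min(width, height, longest_side, multiple) <= 0:
        raise ValueError("all arguments must be positive")
    longer = max(width, height)

    def snap(edge: int) -> int:
        scaled = edge * longest_side / longer  # preserve the aspect ratio
        # Round half up to the nearest multiple, but never below one multiple.
        return max(multiple, math.floor(scaled / multiple + 0.5) * multiple)

    return snap(width), snap(height)


# With internal rescaling disabled, these scaled dimensions are also the output size.
out_w, out_h = scaled_resolution(1920, 1080, longest_side=960)
print(out_w, out_h)  # 960 544
```

Snapping each edge on its own can shift the aspect ratio slightly (540 becomes 544 here). That trade-off is the price of keeping both dimensions on the assumed grid.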
🎚️ Audio Input Settings
⚪ 🔇 No Audio Input
No external audio file is used. The AI generates completely new audio based on your text prompt.
⚪ 🔊 ++ Audio Input
Upload an existing voice or music file to drive the animation, for example for lip-sync.
* Note: Do not use this setting with VideoToVisualDub.
Description
Adjustable video duration in seconds. The frame count is derived from it so that it matches the required 8n+1 form.
Toggle to save the last frame for subsequent video extension.
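The 8n+1 constraint means a requested duration rarely maps to an exact frame count. The sketch below converts seconds to a valid count. It assumes the workflow rounds to the nearest 8n+1 value at a given frame rate. The real node might round down or up instead, and `frames_for_duration` and `duration_for_frames` are hypothetical helper names.

```python
import math


def frames_for_duration(seconds: float, fps: float) -> int:
    """Return the 8n+1 frame count closest to seconds * fps.

    Assumption: the workflow rounds to the nearest valid count; it may
    instead always round down or up.
    """
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    raw = seconds * fps
    n = math.floor((raw - 1) / 8 + 0.5)  # nearest n, rounding halves up
    return 8 * max(n, 0) + 1  # at least 1 frame (n = 0)


def duration_for_frames(frames: int, fps: float) -> float:
    """Actual clip length in seconds for a given frame count."""
    return frames / fps


frames = frames_for_duration(5, 24)  # 121 frames, i.e. 8 * 15 + 1
print(frames, round(duration_for_frames(frames, 24), 3))  # 121 5.042
```

The resulting clip can be slightly longer or shorter than the requested duration (here 5.042 s instead of 5 s), because only frame counts of the form 8n+1 are valid.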