InfiniteTalk | Image-to-Speech AI Workflow (All Files + Setup Guide Included)
🗣️ InfiniteTalk Workflow – Make Any Image Speak!
This setup turns a single portrait or character image into a talking AI avatar using the InfiniteTalk system.
It includes everything you need for image-to-speech, from text-to-voice generation to synced lip animation.
✨ Features
• Fully working InfiniteTalk pipeline
• Converts any image → realistic talking video
• Built-in voice generation (Text-to-Speech)
• Lip-sync and animation nodes included
• Ready-to-export video output
⚙️ Requirements
All required components and Hugging Face model links are listed directly inside the workflow.
🔗 When you open the workflow, follow the notes in the comments — they show exactly where to download the model files and where to place them.
💾 Download the Workflow
[Insert your workflow link here]
🧠 How to Use
1. Open the workflow JSON in ComfyUI
2. Drop your portrait into the image input
3. Enter your dialogue or script text
4. Run the workflow and export your talking video
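If you would rather queue runs from a script than click through the UI, the steps above can be sketched roughly as follows. This is a minimal sketch, not part of the workflow itself. It assumes the workflow has been exported in ComfyUI's API format, that ComfyUI is listening on its default local address (http://127.0.0.1:8188), and that the portrait is already in ComfyUI's input folder. The node IDs and the "text" input key are hypothetical placeholders; check your own exported JSON for the real ones.

```python
import json
import urllib.request


def patch_workflow(workflow, image_name, script_text, image_node_id, text_node_id):
    """Return a copy of an API-format workflow with new image and script inputs.

    image_node_id and text_node_id are hypothetical: look them up in your
    own exported JSON. The "text" key is also an assumption about how the
    text-to-speech node names its input.
    """
    patched = json.loads(json.dumps(workflow))  # deep copy via JSON round-trip
    patched[image_node_id]["inputs"]["image"] = image_name
    patched[text_node_id]["inputs"]["text"] = script_text
    return patched


def build_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) a POST request for ComfyUI's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

To actually queue the job, load your exported JSON with `json.load`, pass it through `patch_workflow`, and send the result with `urllib.request.urlopen(build_request(patched))`. Keeping the request-building step separate from sending makes it easy to inspect the payload before anything reaches the server.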
🎯 Perfect for AI creators, VTubers, storytellers, or anyone who wants to bring portraits to life!
💬 Join the Community https://discord.gg/qGSnE27NQn
Comments (3)
Hey, so I managed to get this working locally on my 16GB card, and I'm surprised how well it works at 480x832. However, it cuts off at 40 seconds, and I can't spot anything that indicates why. Apologies if this is a stupid question; it's my first time using this. I can absolutely figure out ways to cut up the audio and do last-frame-in loops if that's necessary, but I figured I'd ask in case I missed something obvious.
Can I put other action or movement LoRAs in this workflow without breaking it?
I've been using this workflow for a bit and I am very pleased with it. In particular, it is very good at close-up lip sync, though less so on more distant images.
I am wondering if there is a variant without the GGUF files, and whether it would be more accurate.
Regardless, this workflow is very solid. With a 5090 and 64GB of RAM, I have managed to generate a 37-second video that looked very, very good.
Thanks for the hard work!