The workflow has descriptions built in - READ THEM CAREFULLY.

Yo creators - if you're tired of flat, boring video audio and want to crank immersion to 11 with AI-powered SFX layers, breathy moans, custom voices, wet sounds, slapping, and seamless mixing - this is your new go-to workflow.

Built in ComfyUI specifically for adding sound to NSFW / adult video. It takes any input clip, interpolates frames for smooth playback, and generates ultra-realistic or exaggerated lewd sound effects via dual MMAudio branches (SFW for clean ambience & impacts; NSFW for gagging, slurping, sticky thrusts, heavy breathing). It then overlays expressive TTS dialogue/moans from Qwen3-TTS (voice cloning from a reference clip, or pure text-based voice design), and it even lets you blend in the original video audio for that hybrid real-AI punch.
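What retargeting a clip to a fixed frame rate means for frame counts can be sketched in a few lines. This is a minimal illustration assuming a constant input frame rate; `retime_plan` and `interpolation_pairs` are hypothetical helpers, not part of RIFE or any node in this workflow. Each output frame at the target rate lands at a fractional position on the source timeline, and an interpolator such as RIFE is what would synthesize the in-between image for that position.

```python
import math
from typing import List, Tuple


def retime_plan(src_frames: int, src_fps: float, target_fps: float = 25.0) -> Tuple[int, List[float]]:
    """Return (output frame count, fractional source position of each output frame)."""
    if src_frames <= 0 or src_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count and frame rates must be positive")
    duration_s = src_frames / src_fps                      # clip length in seconds
    out_frames = max(1, math.floor(duration_s * target_fps))
    last = src_frames - 1
    # Output frame i sits at time i / target_fps, i.e. i * src_fps / target_fps source
    # frames; clamp so the final frames never point past the last real frame.
    positions = [min(i * src_fps / target_fps, last) for i in range(out_frames)]
    return out_frames, positions


def interpolation_pairs(positions: List[float], src_frames: int) -> List[Tuple[int, int, float]]:
    """Split each position into (left frame, right frame, blend weight t in [0, 1))."""
    pairs = []
    for p in positions:
        left = min(int(p), src_frames - 1)
        right = min(left + 1, src_frames - 1)
        pairs.append((left, right, p - left))
    return pairs
```

For a 48-frame clip at 24 fps this gives 50 output frames at 25 fps, with output frame 1 sitting 0.96 of the way from source frame 0 to source frame 1.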

No more manual editing in Audacity or Resolve - queue it up, tweak volumes/seeds/prompts, and export MP4s with timestamped filenames ready for your stash. Perfect for adult ASMR, scene enhancement, erotic animation, or just experimenting with er... sound design.
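The exact naming pattern isn't spelled out here, so this is a hedged sketch of one way to build sortable timestamped output names; the `audio_mix` prefix, the format string, and the `timestamped_name` helper are assumptions, not what the workflow's save node actually emits.

```python
from datetime import datetime
from typing import Optional


def timestamped_name(prefix: str = "audio_mix", ext: str = "mp4",
                     now: Optional[datetime] = None) -> str:
    """Build an output filename with a timestamp (hypothetical pattern)."""
    now = now or datetime.now()
    # YYYYMMDD_HHMMSS sorts chronologically as plain text
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.{ext}"
```

Zero-padded, largest-unit-first timestamps keep files in render order in any file browser.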

### Key Features
- Dual MMAudio Branches - SFW (vanilla model for natural sounds) + NSFW (gold-tuned for explicit SFX). Mute groups to switch modes without errors.
- Qwen TTS Power - Separate groups for Voice Design (text-based fantasy voices like “sultry breathy moans”) or Voice Cloning (from ref audio + transcript). Auto-trims to match video length.
- Geeky AudioMixer Core - 4-track mixing with per-layer volume, start offsets, fades, master normalize/compress/limit. Crank original video audio or mute it via Primitive toggle.
- RIFE FPS Converter - Auto-handles any input FPS → targets 25 fps (or your choice) for perfect MMAudio sync.
- Shared Seed Control - One Primitive seeds MMAudio + TTS for consistent randomness across layers.
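Conceptually, the mixing stage places each layer at its start offset, scales it by its volume, applies fades, sums everything, then normalizes and limits the master. The sketch below shows that idea on plain Python lists of mono samples at one shared sample rate. It is an illustration under those assumptions, not the Geeky AudioMixer's implementation, and it leaves out compression entirely; `Layer` and `mix` are hypothetical names. Trimming the master to the video duration stands in for the length matching the workflow does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    samples: List[float]     # mono samples in [-1, 1], already at the mix sample rate
    gain: float = 1.0        # linear volume; 0.0 mutes the layer
    offset_s: float = 0.0    # start offset in seconds
    fade_in_s: float = 0.0
    fade_out_s: float = 0.0


def _apply_fades(x: List[float], sr: int, fade_in_s: float, fade_out_s: float) -> List[float]:
    out = list(x)
    n = len(out)
    fi = min(n, int(fade_in_s * sr))
    fo = min(n, int(fade_out_s * sr))
    for i in range(fi):          # linear ramp up from silence
        out[i] *= i / fi
    for i in range(fo):          # linear ramp down to silence at the last sample
        out[n - 1 - i] *= i / fo
    return out


def mix(layers: List[Layer], sr: int, duration_s: float,
        normalize_peak: Optional[float] = 0.9, limit: float = 1.0) -> List[float]:
    total = int(round(duration_s * sr))      # master length = video length
    bus = [0.0] * total
    for layer in layers:
        if layer.gain == 0.0:
            continue
        start = int(round(layer.offset_s * sr))
        faded = _apply_fades(layer.samples, sr, layer.fade_in_s, layer.fade_out_s)
        for i, s in enumerate(faded):
            j = start + i
            if j < 0:                    # skip samples before the video starts
                continue
            if j >= total:               # anything past the video end is trimmed
                break
            bus[j] += s * layer.gain
    if normalize_peak is not None:
        peak = max((abs(s) for s in bus), default=0.0)
        if peak > 0.0:
            bus = [s * (normalize_peak / peak) for s in bus]
    # Hard clip as a crude safety limiter; it only bites when normalization is off
    # or normalize_peak is set above `limit`.
    return [max(-limit, min(limit, s)) for s in bus]
```

Setting `gain=0.0` on an original-audio layer plays the same role as the Primitive mute toggle: the layer simply drops out of the sum.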

### How to Use
1. Drop your video into VHS_LoadVideo.
2. Tweak prompts:
   - MMAudio SFW and/or NSFW
   - TTS (Voice Design or Cloning)
3. Switch modes by muting or bypassing the SFW, NSFW, or TTS groups you don't need.
4. Adjust mixer volumes: audio_1 is the main voice/TTS track; the optional inputs take SFX and the original audio.
5. Queue the prompt. The workflow outputs an MP4 with embedded audio.
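Queueing can also be scripted, which is handy for sweeping the shared seed. This sketch assumes a local ComfyUI server at its default address `127.0.0.1:8188` and a workflow exported with "Save (API Format)", which ComfyUI's `POST /prompt` endpoint accepts as `{"prompt": <workflow>}`. In API format a frontend Primitive node is typically not a node of its own (its value is written into the widgets it feeds), so the seed is set on each consuming node. The node ids, the `seed` input name, and the helper names are hypothetical; check them against your own export.

```python
import copy
import json
import urllib.request
from typing import Iterable

COMFY_URL = "http://127.0.0.1:8188"  # default local address; adjust if yours differs


def with_seed(workflow: dict, node_ids: Iterable[str], seed: int, input_name: str = "seed") -> dict:
    """Copy an API-format workflow and write one seed into several nodes."""
    wf = copy.deepcopy(workflow)
    for nid in node_ids:
        wf[nid]["inputs"][input_name] = seed
    return wf


def build_payload(workflow: dict, client_id: str = "seed-sweep") -> bytes:
    """JSON body for ComfyUI's /prompt endpoint."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue(workflow: dict) -> dict:
    """POST the workflow to the server and return its JSON reply."""
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (not run here; file name and node ids "12"/"34" are hypothetical):
#   base = json.load(open("workflow_api.json", encoding="utf-8"))
#   for seed in (1, 2, 3):
#       print(queue(with_seed(base, ["12", "34"], seed)))
```

Deep-copying before editing keeps the loaded base workflow untouched, so each queued run differs only in its seed.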

### Pro Tip – Pair with LTX2 (Latent Text-to-Video)
LTX2 is blowing up for realistic lip sync and facial animation right now - but it lacks rich, layered lewd audio (moans, wet SFX, heavy breathing, gagging).
Easy hybrid:
1. Generate your talking-head / character video in LTX2.
2. Load the LTX2 MP4 into this workflow's VHS_LoadVideo node.
3. Run as usual - MMAudio adds filthy SFX layers, Qwen TTS overlays breathy moans/dialogue.
4. Mix with LTX2 original audio (or mute it) - final export has perfect visual sync + dirty sound design.
LTX2 does lips & face, this workflow does the lewd audio. Instant upgrade. 🔥

### Low-VRAM Tips (12 GB seems to be the minimum):
- Use Qwen3-TTS 0.6B instead of 1.7B → saves ~3–4 GB with only minor quality drop on short clips.
- Set unload_model_after_generate = true in TTS nodes → unloads TTS immediately after generation.
- Reduce RIFE batch_size to 4 or 2 → lowers peak during interpolation.
- For tighter VRAM: mute the RIFE group if you don't need it, or add an Unload All Models node (ComfyUI-Unload-Model extension) after TTS and before MMAudio.
- Peak usage drops to ~9-11 GB with these tweaks - 8 GB cards are very tight (possible only with extreme cuts like no RIFE + tiny clips).
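Why a smaller batch lowers the peak: an interpolator only needs the frame pairs of the current batch in memory at once, while the final output stays the same. The toy model below, with plain numbers standing in for frames and a simple average standing in for RIFE, makes that concrete. `interpolate_2x`, `batched`, and `average` are hypothetical, and the real node may batch differently internally. Pairs are batched rather than frames so that no neighbour pair is lost at a batch boundary.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[float, float]


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def interpolate_2x(frames: List[float], batch_size: int,
                   blend: Callable[[Sequence[Pair]], List[float]]) -> Tuple[List[float], int]:
    """Insert one synthesized frame between each neighbour pair, batch_size pairs per call.

    Returns the new frame list and the largest batch actually processed.
    """
    if len(frames) < 2:
        return list(frames), 0
    pairs = list(zip(frames, frames[1:]))   # every neighbour pair, including across batches
    mids: List[float] = []
    largest = 0
    for batch in batched(pairs, batch_size):
        largest = max(largest, len(batch))
        mids.extend(blend(batch))
    out: List[float] = []
    for frame, mid in zip(frames, mids):
        out.extend([frame, mid])
    out.append(frames[-1])
    return out, largest


def average(batch: Sequence[Pair]) -> List[float]:
    """Stand-in for the interpolation model: midpoint of each pair."""
    return [(a + b) / 2 for a, b in batch]
```

The frames come out identical for any `batch_size`; only the per-call working set shrinks, usually at some cost in throughput.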

### Requirements (Custom Nodes via ComfyUI Manager)
- comfyui-mmaudio (kijai/ComfyUI-MMAudio)
- qwen3-tts-comfyui (flybirdxx/ComfyUI-Qwen-TTS or similar fork)
- ComfyUI_Geeky_AudioMixer (GeekyGhost/ComfyUI_Geeky_AudioMixer)
- ComfyUI-VideoHelperSuite (Kosinkadink/ComfyUI-VideoHelperSuite)
- rgthree-comfy (rgthree/rgthree-comfy)
- ComfyUI-VFI (Fannovel16/ComfyUI-Frame-Interpolation) for RIFE

### Models Needed (auto-download or manual from Hugging Face)
- MMAudio SFW: kijai/mmaudio_large_44k_v2_fp16.safetensors
- MMAudio NSFW: phazei/mmaudio_large_44k_nsfw_gold_8.5k_final_fp16.safetensors
- Qwen3-TTS: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (or 0.6B for lower VRAM)

### Version Notes

Clearer descriptions; trim added to Voice Design.


### Comments

Generation3dX · Feb 13, 2026


This is working well for MMAudio and it's picking up sync on "oohs" and "ahhs" well by monitoring the open/close mouth position of the figures in the video. Filthy "wet" sounds though seem few and far between, but I guess we have to blame the model for that.

What is unclear is why you've integrated Qwen TTS into this workflow as it's not going to change the video and so you won't get lipsync like you do with LTX-2. Or am I missing something?

Partisano (Author) · Feb 15, 2026

Great question! Qwen TTS is an excellent choice for traditional narration and voice-over work. Given how easy and straightforward it was to integrate, it would be a shame not to include it.

TheRamPricesAreTooDamHigh · Feb 23, 2026

@Partisano But can you combine both, the sound effects from MMAudio with Qwen TTS?

Partisano (Author) · Feb 24, 2026

@TheRamPricesAreTooDamHigh You do not "combine" audio. You mix it. And yes, that's what this workflow is for.


by Partisano


Tags: concept, nsfw