CivArchive
    MoanForge – MMAudio SFW+NSFW Audio Enhancer w/ Qwen TTS - v1.1
    NSFW

    Add killer audio to ANY clip - moaning/voice-over TTS, filthy SFX, breathy voices - or keep it clean with SFW sounds. No editing skills needed.

    There are descriptions in the workflow - READ THEM CAREFULLY.


    Yo creators - if you're tired of flat, boring video audio and want to crank immersion to 11 with AI-powered SFX layers, breathy moans, custom voices, wet sounds, slapping, and seamless mixing - this is your new go-to workflow.

    Built in ComfyUI specifically for NSFW / adult video generation. Takes any input clip, interpolates frames for smooth playback, generates ultra-realistic or exaggerated lewd sound effects via dual MMAudio branches (SFW for clean ambience & impacts, NSFW for gagging, slurping, sticky thrusts, heavy breathing), overlays expressive TTS dialogue/moans from Qwen3-TTS (voice cloning from reference or pure text-based design), and even lets you blend in the original video audio for that hybrid real-AI punch.

    No more manual editing in Audacity or Resolve - queue it up, tweak volumes/seeds/prompts, and export MP4s with timestamped filenames ready for your stash. Perfect for adult ASMR, scene enhancement, erotic animation, or just experimenting with er... sound design.

    ### Key Features
    - **Dual MMAudio Branches** - SFW (vanilla model for natural sounds) + NSFW (gold-tuned for explicit SFX). Mute groups to switch modes without errors.
    - **Qwen TTS Power** - Separate groups for Voice Design (text-based fantasy voices like “sultry breathy moans”) or Voice Cloning (from ref audio + transcript). Auto-trims to match video length.
    - **Geeky AudioMixer Core** - 4-track mixing with per-layer volume, start offsets, fades, master normalize/compress/limit. Crank original video audio or mute it via Primitive toggle.
    - **RIFE FPS Converter** - Auto-handles any input FPS → targets 25 fps (or your choice) for perfect MMAudio sync.
    - **Shared Seed Control** - One Primitive seeds MMAudio + TTS for consistent randomness across layers.
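
The FPS conversion and the TTS auto-trim both come down to keeping every layer the same duration as the video. A minimal sketch of that arithmetic follows, assuming mono audio held as a plain list of floats and a 44.1 kHz sample rate (suggested by the "44k" MMAudio model names). The function names are hypothetical and are not taken from the workflow's nodes.

```python
from __future__ import annotations


def resampled_frame_count(n_frames: int, src_fps: float, dst_fps: float = 25.0) -> int:
    """Number of frames needed at dst_fps to cover the source clip's duration."""
    duration_s = n_frames / src_fps
    return max(1, round(duration_s * dst_fps))


def fit_audio_to_video(samples: list[float], n_frames: int, fps: float,
                       sample_rate: int = 44100) -> list[float]:
    """Trim (or zero-pad) audio so it lasts exactly as long as the video."""
    target = round(n_frames / fps * sample_rate)
    if len(samples) >= target:
        return samples[:target]          # too long: cut the tail
    return samples + [0.0] * (target - len(samples))  # too short: pad with silence


# A 48-frame clip at 24 fps lasts 2 s, so it needs 50 frames at 25 fps.
frames_25 = resampled_frame_count(48, 24.0)
tts = fit_audio_to_video([0.1] * 100_000, frames_25, 25.0)
print(frames_25, len(tts))
```

Padding with silence for short audio is a choice made for this sketch; the source only states that TTS output is trimmed to the video length.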

    ### How to Use
    1. Drop your video into VHS_LoadVideo.
    2. Tweak prompts:
       - MMAudio SFW and/or NSFW
       - TTS (Voice Design or Cloning)
    3. Pick your mode by muting or bypassing the SFW, NSFW, or TTS groups you don't need.
    4. Adjust mixer volumes: audio_1 = main voice/TTS, optionals = SFX/original.
    5. Queue - outputs an MP4 with embedded audio.
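
What step 4 adjusts can be sketched roughly as follows: each layer gets a volume and a start offset, the layers are summed, and the result is scaled down if it would clip. This only illustrates the idea and is not the Geeky AudioMixer node's actual algorithm. The `Track` class, `mix_tracks` function, and the 0.95 peak ceiling are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Track:
    samples: list          # mono float samples, nominally in [-1, 1]
    gain: float = 1.0      # per-layer volume; 0.0 mutes the layer
    start_s: float = 0.0   # start offset in seconds


def mix_tracks(tracks, sample_rate=44100, peak=0.95):
    """Sum gain-scaled, offset tracks, then scale down if the peak is exceeded."""
    offsets = [round(t.start_s * sample_rate) for t in tracks]
    total = max((off + len(t.samples) for t, off in zip(tracks, offsets)), default=0)
    out = [0.0] * total
    for t, off in zip(tracks, offsets):
        for i, s in enumerate(t.samples):
            out[off + i] += s * t.gain
    loudest = max((abs(s) for s in out), default=0.0)
    if loudest > peak:                      # simple peak limiting by uniform scaling
        scale = peak / loudest
        out = [s * scale for s in out]
    return out


voice = Track([0.8] * 44100, gain=1.0)                 # e.g. TTS on audio_1
sfx = Track([0.6] * 22050, gain=0.5, start_s=0.5)      # e.g. an MMAudio layer
original = Track([0.4] * 44100, gain=0.0)              # original audio, muted
mixed = mix_tracks([voice, sfx, original])
print(len(mixed), max(abs(s) for s in mixed))
```

Uniform scaling keeps the balance between layers intact. A real compressor or limiter would change dynamics over time instead.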

    ### Pro Tip – Pair with LTX2 (Latent Text-to-Video)
    LTX2 is blowing up for realistic lip sync and facial animation right now - but it lacks rich, layered lewd audio (moans, wet SFX, heavy breathing, gagging).
    Easy hybrid:
    1. Generate your talking-head / character video in LTX2.
    2. Load the LTX2 MP4 into this workflow's VHS_LoadVideo node.
    3. Run as usual - MMAudio adds filthy SFX layers, Qwen TTS overlays breathy moans/dialogue.
    4. Mix with LTX2 original audio (or mute it) - final export has perfect visual sync + dirty sound design.
    LTX2 does lips & face, this workflow does the lewd audio. Instant upgrade. 🔥

    ### Low-VRAM Tips (12 GB seems to be the minimum)
    - Use **Qwen3-TTS 0.6B** instead of 1.7B → saves ~3–4 GB with only minor quality drop on short clips.
    - Set **unload_model_after_generate = true** in TTS nodes → unloads TTS immediately after generation.
    - Reduce **RIFE batch_size** to 4 or 2 → lowers peak during interpolation.
    - For tighter VRAM: mute RIFE group if not needed, or add **Unload All Models** node (ComfyUI-Unload-Model extension) after TTS before MMAudio.
    - Peak usage drops to ~9-11 GB with these tweaks - 8 GB cards are very tight (possible only with extreme cuts like no RIFE + tiny clips).
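
The tips above can be condensed into a rough decision helper. The 12 GB and 8 GB thresholds come from the figures quoted in this section. The 10 GB cut-off between batch sizes 4 and 2 is an assumption, as are the `suggest_settings` name and the `tts_model` and `use_rife` keys, which do not correspond to real node parameters.

```python
def suggest_settings(vram_gb: float) -> dict:
    """Map available VRAM to the low-VRAM tweaks listed above (illustrative only)."""
    if vram_gb >= 12:
        # Enough headroom: keep the larger TTS model and leave RIFE untouched.
        return {
            "tts_model": "1.7B",
            "unload_model_after_generate": False,
            "use_rife": True,
            "rife_batch_size": None,   # None = keep the workflow's existing value
        }
    settings = {
        "tts_model": "0.6B",
        "unload_model_after_generate": True,
        "use_rife": True,
        "rife_batch_size": 4 if vram_gb >= 10 else 2,  # 10 GB cut-off is assumed
    }
    if vram_gb <= 8:
        # Very tight: drop interpolation entirely and stick to short clips.
        settings["use_rife"] = False
        settings["rife_batch_size"] = None
    return settings


for gb in (16, 10, 8):
    print(gb, suggest_settings(gb))
```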

    ### Requirements (Custom Nodes via ComfyUI Manager)
    - comfyui-mmaudio (kijai/ComfyUI-MMAudio)
    - qwen3-tts-comfyui (flybirdxx/ComfyUI-Qwen-TTS or similar fork)
    - ComfyUI_Geeky_AudioMixer (GeekyGhost/ComfyUI_Geeky_AudioMixer)
    - ComfyUI-VideoHelperSuite (Kosinkadink/ComfyUI-VideoHelperSuite)
    - rgthree-comfy (rgthree/rgthree-comfy)
    - ComfyUI-VFI (Fannovel16/ComfyUI-Frame-Interpolation) for RIFE

    ### Models Needed (auto-download or manual from Hugging Face)
    - MMAudio SFW: kijai/mmaudio_large_44k_v2_fp16.safetensors
    - MMAudio NSFW: phazei/mmaudio_large_44k_nsfw_gold_8.5k_final_fp16.safetensors
    - Qwen3-TTS: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (or 0.6B for lower VRAM)

    Description

    Clearer descriptions in the workflow; trim added to Voice Design.

    Comments (4)

    Generation3dX
    Feb 13, 2026
    CivitAI

    This is working well for MMAudio and it's picking up sync on "oohs" and "ahhs" well by monitoring the open/close mouth position of the figures in the video. Filthy "wet" sounds though seem few and far between, but I guess we have to blame the model for that.

    What is unclear is why you've integrated Qwen TTS into this workflow as it's not going to change the video and so you won't get lipsync like you do with LTX-2. Or am I missing something?

    Partisano
    Author
    Feb 15, 2026

    Great question! Qwen TTS is an excellent choice for traditional narration and voice-over work. Given how easy and straightforward it was to integrate, it would be a shame not to include it.

    @Partisano But can you combine both, the sound effects from MMAudio with Qwen TTS?

    Partisano
    Author
    Feb 24, 2026

    @TheRamPricesAreTooDamHigh You do not "combine" audio. You mix it. And yes, that's what this workflow is for.

    Workflows
    Other

    Details

    Downloads
    751
    Platform
    CivitAI
    Platform Status
    Available
    Created
    2/2/2026
    Updated
    6/29/2026
    Deleted
    -

    Files

    moanforgeMmaudioSFWNSFW_v11.zip

    Mirrors

    HuggingFace (1 mirror)