CivArchive
    LongCat TTS Workflows: Text-to-Speech, Single & Dual Voice Cloning for Film Dubbing - v1.0

    Try the workflows online first:

     

    Workflow: Text-to-Speech – LongCat AudioDiT TTS (Auto Translation)

    Experience link: https://www.runninghub.ai/post/2039728380941176834/?inviteCode=rh-v1401

     

    Workflow: Single-Person Voice Cloning – LongCat-AudioDiT (Auto Translation)

    Experience link: https://www.runninghub.ai/post/2039728236652924929/?inviteCode=rh-v1401

     

    Workflow: Two-Person Voice Cloning – LongCat-AudioDiT – Dialogue – Script – Screenplay

    Experience link: https://www.runninghub.ai/post/2039728409407918081/?inviteCode=rh-v1401

     

    Workflow: All-Purpose Image Pro – Text-to-Image – Single, Double, Triple, Quadruple Images – Image Editing

    Experience link: https://www.runninghub.ai/post/2026244873988345857/?inviteCode=rh-v1401

     

    Workflow: Lip-Sync Speaking & Singing – LTX2.3 Image-to-Digital Human – Auto Expansion – Module Optimization – No Subtitles

    Experience link: https://www.runninghub.ai/post/2038618856104665090/?inviteCode=rh-v1401

     

    Workflow: AA – Various Small Tools for Image, Audio, Video Processing (Continuously Updated)

    Experience link: https://www.runninghub.ai/post/2027021102093967362/?inviteCode=rh-v1401

    This set includes three ComfyUI workflows built on LongCat-AudioDiT, Meituan's open-source TTS model. The author describes it as state-of-the-art, with high timbre fidelity and fast inference.

     

    Workflows included:

     

    Text-to-Speech – Convert any text to speech. Supports BF16 and FP32 precision. More sampling steps generally give better quality, at the cost of longer generation time.

     

    Single Voice Cloning – Upload a reference voice, write your target text, and generate cloned speech. Includes ASR (Qwen) for automatic transcription of the sample.

     

    Dual Voice Cloning – Generate a dialogue between two speakers. Mark each line of the prompt with a speaker tag so the model can tell the two voices apart.

     

    Important notes:

     

    ⚠️ Numbers – Do NOT use Arabic numerals (e.g., 123). Always write numbers in their spoken Chinese form (e.g., "一百二十三"). Otherwise the output will be garbled.

     

    Voice loudness – The workflow automatically normalizes loudness to avoid pops and clipping. Keep the default reduction amount unless your source is extremely loud (its waveform shows red in an audio editor).

     

    Model precision – BF16 works well; FP32 requires ~20GB VRAM.

     

    Random seed – Changing the seed varies the generated voice somewhat, but gender and tone cannot be specified directly.
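
    The numeral rule in the notes above can be applied automatically before text is pasted into a workflow. The following is a minimal Python sketch, not part of the LongCat workflows: the function names int_to_chinese and replace_numerals are made up for illustration, and it only handles non-negative whole numbers below 10^12. Decimals, years read digit by digit, and phone numbers would need separate handling.

```python
import re

DIGITS = "零一二三四五六七八九"
SMALL_UNITS = ["", "十", "百", "千"]   # units inside a four-digit group
BIG_UNITS = ["", "万", "亿"]           # units between four-digit groups


def _group_to_chinese(n: int) -> str:
    """Convert 1..9999 to Chinese, inserting 零 for skipped inner digits."""
    parts = []
    zero_pending = False
    for pos in range(3, -1, -1):
        digit = (n // 10 ** pos) % 10
        if digit == 0:
            # Only remember a zero if a non-zero digit came before it.
            zero_pending = bool(parts)
        else:
            if zero_pending:
                parts.append("零")
                zero_pending = False
            parts.append(DIGITS[digit] + SMALL_UNITS[pos])
    return "".join(parts)


def int_to_chinese(n: int) -> str:
    """Hypothetical helper: spoken Chinese form of 0 <= n < 10**12."""
    if n < 0 or n >= 10 ** 12:
        raise ValueError("only 0 <= n < 10**12 is supported in this sketch")
    if n == 0:
        return "零"
    groups = []
    while n:
        groups.append(n % 10000)
        n //= 10000
    out = ""
    for idx in range(len(groups) - 1, -1, -1):
        group = groups[idx]
        if group == 0:
            continue
        # A group below 1000 that follows earlier output needs a 零 bridge.
        if out and group < 1000:
            out += "零"
        out += _group_to_chinese(group) + BIG_UNITS[idx]
    # Spoken form drops the leading 一 in 一十 (10 is 十, not 一十).
    if out.startswith("一十"):
        out = out[1:]
    return out


def replace_numerals(text: str) -> str:
    """Hypothetical helper: replace each run of ASCII digits in text."""
    return re.sub(r"[0-9]+", lambda m: int_to_chinese(int(m.group())), text)
```

    For example, replace_numerals("我有123个苹果") returns "我有一百二十三个苹果", which matches the spoken form given in the note.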

     

    Recommended for:

     

    Film dubbing, AI-generated dialogue scenes, podcast-style dual-voice content.

     

    Combine with LTX 2.3 (image-to-digital-human animation) to create cinematic conversation scenes. For best results, use single-voice cloning for each fixed shot, because dual-voice cloning may cause unwanted frame transitions in LTX 2.3.

     

    Requirements:

     

    ComfyUI with custom nodes (ASR, LongCat, etc.)

     

    LongCat model files (see project page in comments)

     

    Timestamps (in video):

    Text-to-speech setup → Voice cloning → Dual dialogue → Integration with LTX 2.3

     

    Enjoy making AI-powered film shorts! Feel free to ask questions below.

    Description

    LongCat-AudioDiT

    Workflows
    Other

    Details

    Downloads: 56
    Platform: CivitAI
    Platform Status: Available
    Created: 4/8/2026
    Updated: 5/28/2026
    Deleted: -

    Files

    longcatTTSWorkflowsTextTo_v10.zip
