CivArchive
    Stable Audio 3.0 Medium Base workflow in ComfyUI | Text-to-Audio - v1.0

    Turn prompts into rich, realistic audio and music instantly.

    Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning - you still choose inputs, prompts, and settings.

    Open preloaded workflow on RunComfy (browser)

    Why RunComfy first
    - Fewer missing-node surprises - run the graph in a managed environment before you mirror it locally.
    - Quick GPU tryout - useful if your local VRAM or install time is the bottleneck.
    - Matches the published JSON - the zip follows the same runnable workflow you can open on RunComfy.

    When downloading for local ComfyUI makes sense - you want full control over models on disk, batch scripting, or offline runs.

    How to use (local ComfyUI)
    1. Enter your text prompt in the marked input node (this text-to-audio graph starts from an empty audio latent, so no media file is loaded).
    2. Set the clip length and seed; start with a short test run.
    3. Export from the Save / Write nodes shown in the graph.
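
    For scripted local runs, the steps above can also be driven through ComfyUI's HTTP API instead of the browser UI. The following is a minimal sketch, not the workflow author's method. It assumes ComfyUI is listening on its default local address (127.0.0.1:8188) and that the graph has been exported in API format. The file name in the usage comment and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default address of a locally started ComfyUI server.
COMFY_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow: dict, client_id: str = "sketch",
                         base_url: str = COMFY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that queues one workflow run."""
    payload = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, client_id: str = "sketch") -> dict:
    """Send the request to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow, client_id)) as resp:
        return json.loads(resp.read())


# Usage (needs a running server; "workflow_api.json" is a hypothetical file name):
# with open("workflow_api.json", encoding="utf-8") as f:
#     print(queue_workflow(json.load(f)))
```

    Building the request separately from sending it makes it possible to inspect the payload before anything is queued.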

    Expectations - First run may pull large weights; cloud runs may require a free RunComfy account.


    Overview

    With this official audio generation setup, you can turn text prompts into expressive, high-quality music and ambient audio. It supports extended playback, smooth tonal transitions, and flexible sound layering. Great for sound designers, musicians, or developers experimenting with text-to-audio generation. The workflow uses T5Gemma and Qwen3.5 encoders to enhance prompt accuracy and output quality. Its reproducible structure ensures consistent creative results for professional audio projects.

    Key nodes in the ComfyUI Stable Audio 3.0 Medium Base workflow

    • ComfySwitchNode (#34). Toggles between the original user_input and the Qwen-generated text. Turn it on for structured, length-matched rewrites or off for direct control.

    • TextGenerate (#28). Runs Qwen3.5 with a category-specific system prompt to expand ideas. To customize the rewrite style, edit the category templates in JsonExtractString (#49) and the glue prompts in the adjacent Text Replace nodes.

    • EmptyLatentAudio (#11). Sets clip length. Keep this aligned with the inserted AUDIO_LENGTH token so the synthesis time matches the textual intent.

    • KSampler (#3). Governs the denoising trajectory for Stable Audio 3. Adjust seed for variations while keeping other settings stable to compare takes fairly.

    • SaveAudioMP3 (#19). Controls the output filename prefix and format for quick library building from multiple runs.
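
    The seed, clip-length, and filename advice above can be sketched as a small helper that produces one workflow copy per seed while leaving every other setting untouched. This is an illustration under stated assumptions: the node IDs (3, 11, 19) come from the list above, but the input names `seed`, `seconds`, and `filename_prefix` are assumed from the standard ComfyUI nodes and should be checked against the exported API-format JSON. The helper name `make_takes` is hypothetical.

```python
import copy

# Node IDs as listed above; they are string keys in an API-format workflow.
KSAMPLER_ID = "3"
EMPTY_LATENT_AUDIO_ID = "11"
SAVE_AUDIO_ID = "19"


def make_takes(workflow: dict, seeds: list, seconds: float, prefix: str = "take") -> list:
    """Return one deep-copied workflow per seed; the input workflow is not modified."""
    takes = []
    for seed in seeds:
        wf = copy.deepcopy(workflow)
        # Assumed input names; verify them in your exported JSON.
        wf[KSAMPLER_ID]["inputs"]["seed"] = seed
        wf[EMPTY_LATENT_AUDIO_ID]["inputs"]["seconds"] = seconds
        wf[SAVE_AUDIO_ID]["inputs"]["filename_prefix"] = f"{prefix}_seed{seed}"
        takes.append(wf)
    return takes
```

    Only the seed and the output prefix differ between takes, which keeps comparisons fair. The clip length is set once for all takes; the AUDIO_LENGTH token in the prompt is handled inside the graph and is not touched here.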

    Notes

    See the RunComfy page for the latest node requirements.

    Description

    Initial release - Stable-Audio-3-Medium.

    Workflows
    Other

    Details

    Downloads: 86
    Platform: CivitAI
    Platform Status: Available
    Created: 6/5/2026
    Updated: 6/29/2026
    Deleted: -

    Files

    stableAudio30MediumBase_v10.zip

    Mirrors

    HuggingFace (1 mirror)