Stable Audio 3.0 Medium Base workflow in ComfyUI | Text-to-Audio

Turn prompts into rich, realistic audio and music instantly.

Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: anyone expecting one-click results with zero tuning; you still choose inputs, prompts, and settings.

Open preloaded workflow on RunComfy

Why RunComfy first
- Fewer missing-node surprises - run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout - useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON - the zip follows the same runnable workflow you can open on RunComfy.

When downloading for local ComfyUI makes sense: you want full control over models on disk, batch scripting, or offline runs.

How to use (local ComfyUI)
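1. Enter your text prompt in the marked input node; this text-to-audio graph starts from an empty audio latent, so there is normally no media to load.
2. Set the clip length and seed, then start with a short test run.
3. Export from the Save / Write nodes shown in the graph (SaveAudioMP3 in this workflow).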
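
For the batch scripting case mentioned above, a local ComfyUI server accepts queued workflows over HTTP at its /prompt endpoint. The following is a minimal sketch, assuming a server on the default address 127.0.0.1:8188 and a copy of this graph exported in API format (the regular saved workflow JSON has a different layout). The function names and any file path you pass in are hypothetical.

```python
import json
import urllib.request
import uuid


def build_payload(workflow: dict, client_id: str) -> bytes:
    """Wrap an API-format workflow in the JSON body that POST /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_prompt(workflow: dict, host: str = "http://127.0.0.1:8188") -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        f"{host}/prompt",
        data=build_payload(workflow, client_id=str(uuid.uuid4())),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_file(path: str, host: str = "http://127.0.0.1:8188") -> dict:
    """Load an API-format export from disk and queue it once."""
    with open(path, encoding="utf-8") as handle:
        workflow = json.load(handle)
    return queue_prompt(workflow, host)
```

Calling run_file with the path to your own API-format export queues one run. The reply should include an identifier for the queued job, and the audio itself is still written by the save node in the graph, not returned by this call.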

Expectations - The first run may download large model weights; cloud runs may require a free RunComfy account.

Overview

This official audio generation setup turns text prompts into expressive, high-quality music and ambient audio. It supports extended playback, smooth tonal transitions, and flexible sound layering, which suits sound designers, musicians, and developers experimenting with text-to-audio generation. The workflow uses T5Gemma and Qwen3.5 encoders to improve prompt accuracy and output quality. Because the graph keeps its seed and settings together, a take can be rerun with the same configuration, which helps keep results consistent across professional audio projects.

Key nodes in the ComfyUI Stable Audio 3.0 Medium Base workflow

- ComfySwitchNode (#34). Toggles between the original user_input and the Qwen-generated text. Turn it on for structured, length-matched rewrites or off for direct control.
- TextGenerate (#28). Runs Qwen3.5 with a category-specific system prompt to expand ideas. To customize the rewrite style, edit the category templates in JsonExtractString (#49) and the glue prompts in the adjacent Text Replace nodes.
- EmptyLatentAudio (#11). Sets clip length. Keep this aligned with the inserted AUDIO_LENGTH token so the synthesis time matches the textual intent.
- KSampler (#3). Governs the denoising trajectory for Stable Audio 3. Adjust seed for variations while keeping other settings stable to compare takes fairly.
- SaveAudioMP3 (#19). Controls the output filename prefix and format for quick library building from multiple runs.
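
The seed-comparison advice above can be sketched as a small helper that copies an API-format workflow once per seed and changes only the seed, clip length, and filename prefix. This is an illustration under several assumptions: the node ids "3", "11", and "19" come from the list above and are assumed to appear as top-level keys in the export, and the input names seed, seconds, and filename_prefix are assumed to match those nodes, so check all of them against your own exported JSON. The template dict is a stand-in with placeholder values, not the real graph, and the helper does not touch the AUDIO_LENGTH text in the prompt, which still has to agree with the chosen length.

```python
import copy


def make_takes(workflow, seeds, seconds, prefix="audio/sa3"):
    """Return one independent workflow copy per seed; other settings stay fixed."""
    takes = []
    for seed in seeds:
        take = copy.deepcopy(workflow)  # never mutate the loaded template
        take["3"]["inputs"]["seed"] = seed  # KSampler (#3), assumed input name
        take["11"]["inputs"]["seconds"] = seconds  # EmptyLatentAudio (#11), assumed input name
        # SaveAudioMP3 (#19): encode length and seed so takes are easy to tell apart
        take["19"]["inputs"]["filename_prefix"] = f"{prefix}_{seconds}s_seed{seed}"
        takes.append(take)
    return takes


# Stand-in for an exported graph; a real export has more nodes and inputs,
# and the values here are placeholders only.
template = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 50}},
    "11": {"class_type": "EmptyLatentAudio", "inputs": {"seconds": 10}},
    "19": {"class_type": "SaveAudioMP3", "inputs": {"filename_prefix": "audio/ComfyUI"}},
}

takes = make_takes(template, seeds=[1, 2, 3], seconds=30)
for take in takes:
    print(take["19"]["inputs"]["filename_prefix"])
```

Deep-copying the template keeps every take independent, so each dict can be queued or saved separately without one run leaking settings into the next.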

Notes

See the RunComfy page for "Stable Audio 3.0 Medium Base workflow in ComfyUI | Text-to-Audio" for the latest node requirements.

Files

stableAudio30MediumBase_v10.zip
