【AI Voice】Beyond Closed Source! Fish Audio S2 TTS Voice AI Model!

Try the workflows online or download them:

Workflow: Emotion Tagging - Text-to-Speech - FishAudio S2 TTS

Try it here: https://www.runninghub.ai/post/2033180769274830850/?inviteCode=rh-v1401

Workflow: Multi-Speaker Voice Cloning - FishAudio S2 TTS - Multilingual - Up to 10 Speakers

Try it here: https://www.runninghub.ai/post/2033180796315508737/?inviteCode=rh-v1401

Workflow: Single-Speaker Voice Cloning - 83 Languages - FishAudio S2 TTS (with Special Sound Effects) Outperforms Proprietary Models

Try it here: https://www.runninghub.ai/post/2033180836341747713/?inviteCode=rh-v1401

Hello everyone, 17 is back with more workflow sharing!

Today, I'm introducing an open-source audio model that claims to outperform proprietary ones. It has 4 billion parameters. That is small by text-to-image standards, but significantly larger than Qwen's 1.3B TTS model. Its open-source page is packed with details: the model was trained on up to 10 million hours of audio, and it natively supports ComfyUI.

It supports 83 languages, grouped into tiers. Chinese, English, and Japanese are in the top tier and deliver the best results, followed by a second and a third tier.

It offers over 1,500 emotion tags written as natural-language descriptions, plus special sound effects such as throat clearing, pauses, and broadcast tones. On these two fronts alone, it is already unmatched in the open-source space. In blind tests, it has also outperformed many proprietary audio models.

I've packaged the workflows for you. Below are the prompts I tested for multilingual, single-speaker voice cloning. Each line starts with an emotion tag in square brackets, followed by the text to be spoken. Feel free to give them a try!

[interested] 你是否想过，AI用83种语言有情感地跟你说话？

[excited] Fish Audio S2 is here with its 1500 natural language emotion tags!

[professional broadcast tone] Dies ist das erste Open-Source-TTS-Modell der Welt, das Sprachgefühle mit natürlichen Sprachbefehlen steuert.

[curious] Vous voulez qu'il rie, qu'il chuchote, ou même qu'il change de rôle ?

[delighted] ¡Solo tienes que añadir una etiqueta en el texto!

[excited] Ele suporta clonagem de voz zero-shot,

[happy] 10 ثوانٍ من الصوت يمكنها نسخ صوتك،

[happy] и может генерировать высококачественный звук с частотой 44.1 кГц.

[confident] Che si tratti di creare audiolibri, dialoghi per giochi,

[happy] 아니면 맞춤형 음성 비서를 만드는 것까지,

[confident] Fish Audio S2 laat je het mechanische gevoel vergeten.

[proud] Helt öppen källkod, gratis att använda.

[excited] Haluatko kokea tulevaisuuden äänen viehätyksen?

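[excited] Hãy dùng thử ngay!

For reference, the script translates to English as follows: "Have you ever imagined AI speaking to you with emotion in 83 languages? Fish Audio S2 is here with its 1,500 natural-language emotion tags! This is the world's first open-source TTS model that controls vocal emotion with natural-language instructions. Do you want it to laugh, whisper, or even switch roles? Just add a tag to the text! It supports zero-shot voice cloning: 10 seconds of audio can clone your voice, and it can generate high-quality 44.1 kHz audio. Whether you are creating audiobooks or game dialogue, or even building a custom voice assistant, Fish Audio S2 lets you forget the mechanical feel. Fully open source, free to use. Do you want to experience the charm of the voice of the future? Try it now!"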

Files: AIVoiceBeyondClosedSource_v10.zip