Available Models

Explore the wide range of AI models supported by TTS WebUI

Text-to-Speech Models

Vall-E-X
Multilingual text-to-speech model supporting English, Chinese, and Japanese

By Plachtaa

StyleTTS2
StyleTTS2 is a text-to-speech model that generates high-quality speech with controllable style

By StyleTTS2 Team

Seamless M4T
SeamlessM4T is a multilingual and multimodal translation model supporting text and speech

By Facebook

MMS
MMS (Massively Multilingual Speech) is a text-to-speech model supporting over 1000 languages

By Facebook

Tortoise TTS
Tortoise TTS is a high-quality text-to-speech model with voice cloning capabilities

By neonbjb

F5-TTS
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

By Yushen Chen et al.

Chatterbox
Chatterbox, Resemble AI's first production-grade open source TTS model

By Resemble AI

Kokoro
Kokoro: A small, fast, and high-quality TTS model

By hexgrad

Bark
Bark: A text-to-speech model

By Suno

XTTS
XTTS-Simple is a Gradio UI for XTTSv2

By rsxdalv

Parler-TTS
Parler-TTS is a training and inference library for high-fidelity text-to-speech (TTS) models

By rsxdalv

CosyVoice
CosyVoice: High-quality text-to-speech synthesis

By rsxdalv

MARS5
MARS5: A novel speech model for insane prosody

By CAMB.AI

DIA
DIA: A text-to-dialogue model

By Nari Labs

GPT-SoVITS
GPT-SoVITS: A TTS solution powered by GPT and SoftVC VITS Singing Voice Conversion

By rsxdalv

Audio & Music Generation Models

ACE-Step
ACE-Step: A Step Towards Music Generation Foundation Model

By ACE-Step

Stable Audio
Stable Audio is a text-to-audio model for generating high-quality music and sound effects

By Stability AI

Audiocraft
Audiocraft provides MusicGen and MAGNeT models for high-quality music and audio generation

By Facebook

AudioCraft Plus
AudioCraft Plus is an all-in-one WebUI for the original AudioCraft, adding many quality features on top

By GrandaddyShmax

Audio Conversion Models

Vocos
Vocos is a neural vocoder that reconstructs high-quality audio waveforms from mel spectrograms or EnCodec tokens

By charactr

RVC
RVC: Retrieval-based Voice Conversion

By RVC Team

Demucs
Demucs is a music source separation model that can separate drums, bass, vocals, and other instruments

By Facebook

Conversational AI Models

Kimi Audio
Kimi Audio is an audio model for speech generation (text-to-speech) and speech recognition (speech-to-text)

By Moonshot AI

MiMo-Audio
MiMo-Audio: An audio language model from Xiaomi

By Xiaomi MiMo