Chatterbox Turbo is an open-source voice model built by Resemble AI. It is designed to be fast and production-ready, with built-in safety features such as watermarking and support for more than 23 languages. It is released under the MIT license, which gives developers more control and a clear view of how it works.

Tech specs.

Size: about 350 million parameters
Latency: ~75 ms on GPU
Voice cloning sample: about 5 seconds of audio
Watermark: built-in PerTh
Languages: 23+

It generates speech almost 6× faster than real time on a GPU, with about 75 ms of latency. It can clone a voice from only about 5 seconds of audio, with no extra training needed.
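The speed and sample-length figures above can be turned into quick sanity checks. The sketch below is illustrative only: it uses the Python standard library and does not call the Chatterbox Turbo model. The function names and constants are hypothetical, the 6× value is treated as an optimistic upper bound ("almost 6×"), and the reference clip is assumed to be an uncompressed WAV file.

```python
import wave

# Assumed figures taken from the description above, not measured values.
REAL_TIME_SPEEDUP = 6.0       # "almost 6x faster than real time" (upper bound)
MIN_REFERENCE_SECONDS = 5.0   # "about 5 seconds of audio" for voice cloning


def estimated_synthesis_seconds(audio_seconds: float,
                                speedup: float = REAL_TIME_SPEEDUP) -> float:
    """Rough compute time needed to synthesize `audio_seconds` of speech."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str,
                   min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the reference clip meets the assumed minimum length."""
    return wav_duration_seconds(path) >= min_seconds
```

Under these assumptions, one minute of speech would take roughly ten seconds of GPU time, and a reference clip shorter than five seconds would be flagged before any cloning attempt.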

Paralinguistic tags.

This is the standout feature, and few other tools offer it. You can type tags like [gasp], [laugh], or [cough] directly into the text to make the voice react naturally. A single setting also controls how emotional the voice sounds, from flat to dramatic.
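As a small illustration of how tagged text might be checked before it is sent to the model, the sketch below extracts bracketed tags from a script and flags any that are not in a known set. It does not call the model. The helper names are hypothetical, and the known set contains only the three tags named above; the real list of supported tags may be longer.

```python
import re

# Only the tags named in the description; the actual supported set may differ.
KNOWN_TAGS = {"gasp", "laugh", "cough"}

# Matches lowercase words (optionally with spaces or underscores) in brackets.
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


def find_tags(text: str) -> list[str]:
    """Return every bracketed tag in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return the tags that are not in the assumed KNOWN_TAGS set."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


script = "Wait [gasp] you did what? [laugh] Hmm [sigh] okay."
print(find_tags(script))     # ['gasp', 'laugh', 'sigh']
print(unknown_tags(script))  # ['sigh']
```

A check like this catches typos such as [laguh] before synthesis, when the tag would otherwise risk being read aloud as literal text.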

Usage. Chatterbox Turbo suits any application that needs fast or live voice output, such as:

Voice bots and smart assistants
Games and interactive media
Audiobooks and storytelling
Accessibility tools and mixed media
