Zonos
Zonos by Zyphra is an open-source AI-powered text-to-speech tool that copies voices from short samples, supports multiple languages, and offers dynamic speech generation.
Overview
Zonos is an open-source text-to-speech (TTS) tool built by Zyphra, a Palo Alto AI startup. It creates natural voices from text and can copy voices with just a short audio clip. Zyphra is also behind Zamba, a fast small-scale language model.
AI Voice Cloning
Zonos copies voices using only 5 to 30 seconds of recorded audio. No long training required.
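As a rough illustration of the 5 to 30 second window, the sketch below checks whether a reference clip is long enough before it is handed to a voice-cloning tool. It uses only Python's standard `wave` module, so it assumes an uncompressed WAV file. The function names and constants are hypothetical and are not part of Zonos.

```python
import wave

# Bounds taken from the text above: 5 to 30 seconds of reference audio.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path: str) -> bool:
    """Return True if the clip length falls inside the 5-30 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

Clips in other formats (MP3, FLAC) would need a different loader. Only the length check itself carries over.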
Supports Multiple Languages
It’s trained on 200,000 hours of speech across six languages—English, Chinese, Japanese, French, Spanish, and German.
Dynamic Speech Generation
Users can tweak voice speed, pitch, and emotion, including settings for happiness, fear, sadness, and anger.
Use Cases
- Voice Assistants. Makes AI voices sound more natural and lifelike.
- Audiobooks. Creates high-quality spoken content with emotional depth.
- Custom Voice Cloning. Copies specific voices for unique applications.
Zonos is open source under the Apache 2.0 license and available on GitHub. Zyphra also offers a hosted API and a model playground for easier access.
At the moment, the GitHub repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).
Zonos runs with a real-time factor of ~2x on an RTX 4090 (i.e., it generates 2 seconds of audio per 1 second of compute time).
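The real-time factor translates directly into an estimate of generation time. The sketch below is plain arithmetic using the page's definition (seconds of audio produced per second of compute). The function name is hypothetical, and the ~2x default applies only to an RTX 4090 as stated above.

```python
def estimated_compute_seconds(audio_seconds: float, real_time_factor: float = 2.0) -> float:
    """Estimate compute time for a given amount of generated audio.

    real_time_factor is seconds of audio produced per second of compute,
    so compute time is the audio length divided by the factor.
    """
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return audio_seconds / real_time_factor


# A 10-minute narration (600 s of audio) at ~2x real time.
print(estimated_compute_seconds(600))  # 300.0
```

On slower GPUs the factor would be lower and the estimate proportionally longer.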
A third-party port, Zonos for Windows, is available at https://github.com/sdbds/Zonos-for-windows
Supported Languages
- Chinese
- English
- French
- German
- Japanese
- Spanish
Tags
Freeware, Apache License 2.0, Web-based, #Voice & Audio
- API Availability
- Pitch Editing
- Voice Cloning
- Voices with Emotions
- Open-source & free. Users love that Zonos is a no-cost alternative to pricey TTS options like ElevenLabs.
- Better voice quality? Many say Zonos sounds more expressive than ElevenLabs, especially without the low-bitrate issues.
- Emotional speech control. It supports adjustments like pitch, speed, and emotions (anger, happiness, sadness, etc.), but some say the docs don’t fully explain how.
- Multilingual claims. Although trained mostly on English, Zonos claims multilingual capabilities, but real-world results vary.
- Self-hosting vs. API. Some users self-host the model, while others use Zyphra's hosted API ($0.02 per minute).
- Performance concerns. Runs at 2x real-time on an RTX 4090, but older GPUs struggle. Windows users report setup issues due to dependencies.
- Many see Zonos as perfect for audiobooks, home assistants, and AI-driven narration.
[ Reddit ]
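For a sense of scale, the $0.02 per minute API price mentioned in the feedback above can be turned into a quick cost estimate. This is a minimal sketch: the rate is the user-reported figure and may change, the function name is hypothetical, and billing granularity (rounding, minimum charges) is not modeled.

```python
from decimal import Decimal

# User-reported price from the feedback above; treat it as an assumption.
PRICE_PER_MINUTE_USD = Decimal("0.02")


def estimated_api_cost_usd(audio_minutes: float) -> Decimal:
    """Estimate the hosted-API cost in USD for a given audio length."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    # Convert through str so float inputs do not carry binary rounding noise.
    return PRICE_PER_MINUTE_USD * Decimal(str(audio_minutes))


# An 8-hour audiobook is 480 minutes of audio.
print(estimated_api_cost_usd(480))  # 9.60
```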
"Someone cloned this voice and made me say 'blah bla bla bla' here for you guys. I'm a little bit pissed, but hey, actually... I'm not because I'm an AI and I don't have that kind of crap. Emotions, ya know? Who need that?! ... On the other hand, many people need voices with emotions, and that's where I come in. A synthetic voice generated by AI and then cloned by Zonos to demonstrate its capabilities for this website called www.aicreatrs.tools - check it out."
Generated on February 14, 2025:
Someone cloned this voice and made me say 'blah bla bla bla' here for you guys. I'm a little bit pissed, but hey, actually... I'm not because I'm an AI and I don't have that kind of crap. Emotions, ya know? Who needs that?!
This page was last updated on February 15, 2025 at 1:05 AM