Zonos

Zonos by Zyphra is an open-source AI-powered text-to-speech tool that copies voices from short samples, supports multiple languages, and offers dynamic speech generation.

Overview

Zonos is an open-source text-to-speech (TTS) tool built by Zyphra, a Palo Alto AI startup. It creates natural voices from text and can copy voices with just a short audio clip. Zyphra is also behind Zamba, a fast small-scale language model.

AI Voice Cloning

Zonos copies voices using only 5 to 30 seconds of recorded audio. No long training required.
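
Before using a recording as a cloning reference, it can help to confirm that its length falls within the 5 to 30 second range mentioned above. The following is a minimal sketch using only Python's standard `wave` module. The helper names `clip_duration_seconds` and `is_usable_reference` are illustrative and not part of Zonos, and the sketch assumes the clip is an uncompressed WAV file.

```python
import wave

# Range taken from the description above: 5 to 30 seconds of reference audio.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_usable_reference(path):
    """Hypothetical helper: True if the clip length is within the 5-30 s range."""
    duration = clip_duration_seconds(path)
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

A clip that fails this check can be trimmed or re-recorded before it is passed to whichever cloning workflow is in use.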

Supports Multiple Languages

Zonos was trained on 200,000 hours of speech across six languages: English, Chinese, Japanese, French, Spanish, and German.

Dynamic Speech Generation

Users can tweak voice speed, pitch, and emotion, including settings for happiness, fear, sadness, and anger.

  • Voice Assistants. Makes AI voices sound more natural and lifelike.
  • Audiobooks. Creates high-quality spoken content with emotional depth.
  • Custom Voice Cloning. Copies specific voices for unique applications.

Zonos is open-source under the Apache 2.0 license and available on GitHub. Zyphra also offers a hosted API and model playground for easier access.

At the moment, the GitHub repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).

Zonos runs with a real-time factor of ~2x on an RTX 4090 (i.e. generates 2 seconds of audio per 1 second of compute time).
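
The real-time factor makes it easy to estimate how long a job will take. Here is a small sketch of that arithmetic, assuming the ~2x figure quoted above holds for the whole job. The function name `compute_seconds` is illustrative, and actual throughput will depend on the GPU and settings.

```python
def compute_seconds(audio_seconds, real_time_factor=2.0):
    """Estimate compute time needed to generate a given length of audio.

    real_time_factor is seconds of audio produced per second of compute,
    so a factor of 2.0 means audio is generated twice as fast as it plays.
    """
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return audio_seconds / real_time_factor


# A 10-minute narration (600 s of audio) at ~2x real time:
print(compute_seconds(600))  # 300.0, i.e. about 5 minutes of compute
```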

There is also a Zonos for Windows port: https://github.com/sdbds/Zonos-for-windows

Supported Languages

  • Chinese
  • English
  • French
  • German
  • Japanese
  • Spanish

Tags

Freeware, Apache License 2.0, Web-based, #Voice & Audio

  • API Availability
  • Pitch Editing
  • Voice Cloning
  • Voices with Emotions

  • Educators and Trainers
  • Creative Professionals
  • Content Creators
  • Media and Film Makers
  • Marketing and Branding Specialists
  • Voice and Audio Professionals
  • Developers and Tech Creators
  • Nonprofit and Advocacy Creators
  • Small Business Owners
  • Entertainment and Performance Artists
  • Professional Content Creators

This tool is free to use and is offered under the Apache License 2.0.

  • Open-source & free. Users love that Zonos is a no-cost alternative to pricey TTS options like ElevenLabs.
  • Better voice quality? Many say Zonos sounds more expressive than ElevenLabs, especially without the low-bitrate issues.
  • Emotional speech control. It supports adjustments like pitch, speed, and emotions (anger, happiness, sadness, etc.), but some say the docs don’t fully explain how.
  • Multilingual support. Zonos was trained mostly on English and claims multilingual capabilities, but real-world results vary.
  • Self-hosting vs. API. Some users self-host the model, while others use Zyphra's API ($0.02 per minute).
  • Performance concerns. Runs at 2x real-time on an RTX 4090, but older GPUs struggle. Windows users report setup issues due to dependencies.
  • Use cases. Many see Zonos as perfect for audiobooks, home assistants, and AI-driven narration.

[ Reddit ]
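
For anyone weighing self-hosting against the hosted API, the $0.02 per minute price reported above can be turned into a rough budget. This is only a sketch of the arithmetic: the function name `api_cost` is illustrative, the price is the user-reported figure rather than an official quote, and it assumes billing is strictly proportional to audio length with no rounding or minimum charge.

```python
from decimal import Decimal

# User-reported price from the summary above; treat it as an assumption.
PRICE_PER_MINUTE = Decimal("0.02")


def api_cost(audio_minutes):
    """Estimate the hosted API cost in dollars for a given number of minutes."""
    minutes = Decimal(str(audio_minutes))
    if minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    return minutes * PRICE_PER_MINUTE


# An 8-hour audiobook is 480 minutes of audio:
print(api_cost(480))  # 9.60
```

`Decimal` is used instead of floats so that the dollar amounts do not pick up binary rounding noise.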

Prompt: "Someone cloned this voice and made me say 'blah bla bla bla' here for you guys. I'm a little bit pissed, but hey, actually... I'm not because I'm an AI and I don't have that kind of crap. Emotions, ya know? Who need that?! ... On the other hand, many people need voices with emotions, and that's where I come in. A synthetic voice generated by AI and then cloned by Zonos to demonstrate its capabilities for this website called www.aicreatrs.tools - check it out."

Generated on February 14, 2025:

Cloned this voice I had generated earlier and made it say the text in the prompt. The result is pretty great.

Prompt: Someone cloned this voice and made me say 'blah bla bla bla' here for you guys. I'm a little bit pissed, but hey, actually... I'm not because I'm an AI and I don't have that kind of carp. Emotions, ya know? Who needs that?!

Generated on February 14, 2025:

This one's a little bit creepy as it started whispering and then stuttering at the end.


This page was last updated on February 15, 2025 at 1:05 AM