
Pocket TTS audio model

Name: pocket-tts
License: MIT
Creator: Kyutai Labs

pocket-tts ("TTS that fits in your CPU") is an open-source text-to-speech model by Kyutai Labs, released in January 2026. It is designed to generate natural-sounding speech locally and efficiently, without requiring a GPU, even on ordinary laptops or desktops.

Key Features

  • Lightweight, CPU-focused TTS engine that runs in real time on standard hardware (e.g., laptop CPUs).
  • Voice cloning support — you can make it imitate a voice from a short sample.
  • Targeted at developers who want TTS without cloud APIs or GPU requirements.

Model & Performance

  • ~100 million parameters, making it very small for a modern speech model.
  • Low latency: ~200 ms to the first audio chunk, with synthesis typically faster than real time on CPUs (see the timing sketch after this list).
  • Needs only two CPU cores and does not require a GPU-enabled build of PyTorch.
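
These figures correspond to two standard streaming-TTS metrics: time to first audio (how long before the first chunk arrives) and real-time factor (synthesis time divided by audio duration, where values below 1.0 mean faster than real time). The harness below is a minimal, engine-agnostic sketch for measuring both from any stream of raw PCM chunks; the fake_engine generator is a stand-in for illustration and is not part of pocket-tts.

    import time
    from typing import Iterable, Tuple

    def measure_streaming_tts(chunks: Iterable[bytes], sample_rate: int,
                              bytes_per_sample: int = 2) -> Tuple[float, float]:
        """Return (time to first audio in seconds, real-time factor) for a
        stream of raw PCM chunks. Engine-agnostic: works with any iterator."""
        start = time.perf_counter()
        first_audio = None
        total_samples = 0
        for chunk in chunks:
            if first_audio is None:
                first_audio = time.perf_counter() - start   # latency to first chunk
            total_samples += len(chunk) // bytes_per_sample
        elapsed = time.perf_counter() - start               # total synthesis time
        audio_seconds = total_samples / sample_rate
        rtf = elapsed / audio_seconds if audio_seconds else float("inf")
        return first_audio, rtf                             # rtf < 1.0: faster than real time

    if __name__ == "__main__":
        # Stand-in generator: ten 0.1 s chunks of 16-bit silence at 24 kHz.
        def fake_engine(n=10, sr=24_000):
            for _ in range(n):
                time.sleep(0.02)                            # pretend synthesis work
                yield b"\x00\x00" * (sr // 10)

        ttfa, rtf = measure_streaming_tts(fake_engine(), sample_rate=24_000)
        print(f"time to first audio: {ttfa * 1000:.0f} ms, real-time factor: {rtf:.2f}")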

APIs & Interfaces

  • Command-line interface (CLI) — for quick text-to-speech generation.
  • Python library API — integrate into Python apps.
  • Serve mode — run a local HTTP service to generate speech via REST calls (see the client sketch after this list).
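
As an illustration of the serve mode, here is a minimal Python client sketch. The host, port, /tts route, JSON fields, and raw-WAV response are assumptions made for the example, not the documented pocket-tts API; check the project's README for the actual endpoint and request format.

    # Hypothetical client for pocket-tts serve mode. Port, route, payload
    # fields, and response format are illustrative assumptions, not the
    # project's documented API.
    from typing import Optional
    import requests

    SERVER = "http://localhost:8000"        # assumed default host and port

    def synthesize(text: str, voice: Optional[str] = None,
                   out_path: str = "out.wav") -> str:
        payload = {"text": text}
        if voice:
            payload["voice"] = voice        # assumed field for a built-in voice name
        resp = requests.post(f"{SERVER}/tts", json=payload, timeout=60)  # assumed route
        resp.raise_for_status()
        with open(out_path, "wb") as f:     # assumes the server returns WAV bytes
            f.write(resp.content)
        return out_path

    if __name__ == "__main__":
        print(synthesize("Hello from a local text-to-speech server."))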

Voice & Language

  • Includes a small catalog of built-in voices.
  • Voice cloning by providing a short WAV sample for personalization (see the clip-preparation sketch after this list).
  • Primarily English support in the core project; external tools can supply additional voices.
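
Because cloning starts from a short WAV sample, the sketch below shows one way to carve a short reference clip out of a longer recording using only Python's standard library. The ten-second target length and the file names are assumptions for illustration; pocket-tts's own documentation governs the sample length and format it actually expects.

    # Trim a longer recording to a short reference clip for voice cloning.
    # The 10-second length is an assumption; adjust to what the model expects.
    import wave

    def make_reference_clip(src_path: str, dst_path: str, seconds: float = 10.0) -> None:
        with wave.open(src_path, "rb") as src:
            params = src.getparams()
            n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
            frames = src.readframes(n_frames)   # keep only the first `seconds`
        with wave.open(dst_path, "wb") as dst:
            dst.setparams(params)               # same rate, width, and channels
            dst.writeframes(frames)             # header frame count fixed on close

    if __name__ == "__main__":
        # Placeholder file names; point these at a real recording.
        make_reference_clip("my_recording.wav", "reference_clip.wav")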

Usage Scenarios

  • Local TTS engines for accessibility tools, desktop assistants, embedded applications.
  • On-the-fly voice synthesis (e.g., reading text aloud).
  • Prototyping speech apps without cloud dependency.

Limitations

  • Current build only supports CPU (no browser or GPU builds yet).
  • Primarily English — limited other language support out of the box.
  • Does not yet support some advanced features, such as silence control or int8-quantized models.