
Pocket TTS audio model

Name: pocket-tts
License: MIT
Creator: Kyutai Labs

pocket-tts ("TTS that fits in your CPU") is an open-source text-to-speech model by Kyutai Labs, released in January 2026. It is designed to generate natural-sounding speech locally and efficiently, without requiring a GPU, even on ordinary laptops or desktops.

Key Features

  • Lightweight, CPU-focused TTS engine that runs in real time on standard hardware (e.g., laptop CPUs).
  • Voice cloning support — you can make it imitate a voice from a short sample.
  • Targeted at developers who want TTS without cloud APIs or GPU requirements.

Model & Performance

  • ~100 million parameters, making it very small for a modern speech model.
  • Low latency: ~200 ms to the first audio chunk, with synthesis typically faster than real time on CPUs (see the timing sketch after this list).
  • Needs only two CPU cores and does not require a GPU-enabled build of PyTorch.
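
These figures correspond to two standard streaming-TTS metrics: time to first audio (how long before the first chunk arrives) and real-time factor (synthesis time divided by audio duration, where values below 1.0 mean faster than real time). The harness below is a minimal, engine-agnostic sketch for measuring both from any stream of raw PCM chunks; the fake_engine generator is a stand-in for illustration and is not part of pocket-tts.

    import time
    from typing import Iterable, Tuple

    def measure_streaming_tts(chunks: Iterable[bytes], sample_rate: int,
                              bytes_per_sample: int = 2) -> Tuple[float, float]:
        """Return (time to first audio in seconds, real-time factor) for a
        stream of raw PCM chunks. Engine-agnostic: works with any iterator."""
        start = time.perf_counter()
        first_audio = None
        total_samples = 0
        for chunk in chunks:
            if first_audio is None:
                first_audio = time.perf_counter() - start   # latency to first chunk
            total_samples += len(chunk) // bytes_per_sample
        elapsed = time.perf_counter() - start               # total synthesis time
        audio_seconds = total_samples / sample_rate
        rtf = elapsed / audio_seconds if audio_seconds else float("inf")
        return first_audio, rtf                             # rtf < 1.0: faster than real time

    if __name__ == "__main__":
        # Stand-in generator: ten 0.1 s chunks of 16-bit silence at 24 kHz.
        def fake_engine(n=10, sr=24_000):
            for _ in range(n):
                time.sleep(0.02)                            # pretend synthesis work
                yield b"\x00\x00" * (sr // 10)

        ttfa, rtf = measure_streaming_tts(fake_engine(), sample_rate=24_000)
        print(f"time to first audio: {ttfa * 1000:.0f} ms, real-time factor: {rtf:.2f}")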

APIs & Interfaces

  • Command-line interface (CLI) — for quick text-to-speech generation.
  • Python library API — integrate into Python apps.
  • Serve mode — run a local HTTP service to generate speech via REST calls (see the client sketch after this list).
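
As an illustration of the serve mode, here is a minimal Python client sketch. The host, port, /tts route, JSON fields, and raw-WAV response are assumptions made for the example, not the documented pocket-tts API; check the project's README for the actual endpoint and request format.

    # Hypothetical client for pocket-tts serve mode. Port, route, payload
    # fields, and response format are illustrative assumptions, not the
    # project's documented API.
    from typing import Optional
    import requests

    SERVER = "http://localhost:8000"        # assumed default host and port

    def synthesize(text: str, voice: Optional[str] = None,
                   out_path: str = "out.wav") -> str:
        payload = {"text": text}
        if voice:
            payload["voice"] = voice        # assumed field for a built-in voice name
        resp = requests.post(f"{SERVER}/tts", json=payload, timeout=60)  # assumed route
        resp.raise_for_status()
        with open(out_path, "wb") as f:     # assumes the server returns WAV bytes
            f.write(resp.content)
        return out_path

    if __name__ == "__main__":
        print(synthesize("Hello from a local text-to-speech server."))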

Voice & Language

  • Includes a small catalog of built-in voices.
  • Voice cloning by providing a short WAV sample for personalization (see the clip-preparation sketch after this list).
  • Primarily English support in the core project; external tools can supply additional voices.
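
Because cloning starts from a short WAV sample, the sketch below shows one way to carve a short reference clip out of a longer recording using only Python's standard library. The ten-second target length and the file names are assumptions for illustration; pocket-tts's own documentation governs the sample length and format it actually expects.

    # Trim a longer recording to a short reference clip for voice cloning.
    # The 10-second length is an assumption; adjust to what the model expects.
    import wave

    def make_reference_clip(src_path: str, dst_path: str, seconds: float = 10.0) -> None:
        with wave.open(src_path, "rb") as src:
            params = src.getparams()
            n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
            frames = src.readframes(n_frames)   # keep only the first `seconds`
        with wave.open(dst_path, "wb") as dst:
            dst.setparams(params)               # same rate, width, and channels
            dst.writeframes(frames)             # header frame count fixed on close

    if __name__ == "__main__":
        # Placeholder file names; point these at a real recording.
        make_reference_clip("my_recording.wav", "reference_clip.wav")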

Usage Scenarios

  • Local TTS engines for accessibility tools, desktop assistants, embedded applications.
  • On-the-fly voice synthesis (e.g., reading text aloud).
  • Prototyping speech apps without cloud dependency.

Limitations

  • Current build only supports CPU (no browser or GPU builds yet).
  • Primarily English — limited other language support out of the box.
  • Does not yet support some advanced features, such as silence control or int8-quantized models.