KittenTTS

a voice layer for your AI that fits in less RAM than a browser tab.

what it is

KittenTTS by KittenML is a SOTA super-tiny text-to-speech model. the v0.8 release ships in three sizes:

14M parameters — smallest, fastest
40M parameters — balanced
80M parameters — best quality, still tiny

all weights are Apache 2.0 licensed. all three sizes come in under 25MB on disk. the quality is notably better than the parameter counts would suggest.
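the install-to-first-audio path is short. a minimal sketch below — the model ID, voice name, and 24 kHz sample rate are assumptions taken from the project's README (`KittenML/kitten-tts-nano-0.2`, `expr-voice-2-f`), so check the repo for the current identifiers:

```python
import importlib.util


def synthesize(text: str, out_path: str = "output.wav") -> bool:
    """Generate speech locally with KittenTTS, if the package is installed.

    Returns True on success, False when the dependencies are missing.
    Model ID, voice name, and sample rate are assumptions taken from the
    project's README -- verify against the current repo before relying on them.
    """
    if (importlib.util.find_spec("kittentts") is None
            or importlib.util.find_spec("soundfile") is None):
        return False  # pip install kittentts soundfile

    from kittentts import KittenTTS
    import soundfile as sf

    model = KittenTTS("KittenML/kitten-tts-nano-0.2")  # pulls <25MB of weights once
    audio = model.generate(text, voice="expr-voice-2-f")
    sf.write(out_path, audio, 24000)  # README documents 24 kHz output
    return True


if __name__ == "__main__":
    print("synthesized:", synthesize("kitten tts says hi"))
```

no GPU flag, no API key: load once, call generate. that's the whole integration surface.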

why size matters

“tiny TTS model” sounds like a compromise. it isn’t, or at least not in the way it used to be.

the previous generation of acceptable-quality local TTS required 400MB+ models with noticeable latency on CPU. KittenTTS v0.8 delivers comparable quality at a fraction of the footprint. this is the same curve that made 7B language models genuinely useful — model efficiency improvements compounding over time.

at <25MB, KittenTTS:

→ runs in RAM you already have
→ initializes in under a second
→ doesn’t require a GPU
→ works offline, permanently

what this enables

the personal AI OS with a voice layer that:

  1. doesn’t require a cloud TTS API
  2. doesn’t phone home
  3. runs on a laptop, Mac mini, or Raspberry Pi
  4. starts instantly
  5. works when the internet is down

combine this with a local language model and zvec for memory. you now have: voice I/O + language understanding + persistent memory. three local components. no cloud dependencies.
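the wiring above can be sketched as a three-component loop. everything below is a stub — `Memory`, `LocalLLM`, and `Speaker` stand in for zvec, a local language model, and KittenTTS respectively; the point is the shape of the stack, not the implementations:

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Stand-in for zvec: append-only notes with naive keyword recall."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [n for n in self.notes if words & set(n.lower().split())]


class LocalLLM:
    """Stand-in for a local language model."""
    def reply(self, prompt: str, context: list[str]) -> str:
        ctx = "; ".join(context) if context else "nothing on file"
        return f"(answering '{prompt}' with context: {ctx})"


class Speaker:
    """Stand-in for KittenTTS: the real version returns a waveform."""
    def say(self, text: str) -> str:
        return f"[audio] {text}"


def turn(user_text: str, llm: LocalLLM, mem: Memory, voice: Speaker) -> str:
    context = mem.recall(user_text)          # persistent memory
    answer = llm.reply(user_text, context)   # language understanding
    mem.remember(user_text)
    return voice.say(answer)                 # voice output; all three steps local


mem = Memory()
mem.remember("user prefers short answers")
print(turn("give me short answers about TTS", LocalLLM(), mem, Speaker()))
```

each stub swaps out for the real component without changing the loop — that's the appeal of three local pieces with narrow interfaces.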

three model sizes, one decision

the 80M model fits in about 20MB of RAM. for comparison, this is smaller than most browser extensions. that makes the decision easy: default to 80M for quality, and drop to 14M only when latency on weak hardware matters.
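the decision logic is almost too simple to write down, but here it is anyway — the thresholds are illustrative, not benchmarked:

```python
def pick_model(ram_budget_mb: float, latency_sensitive: bool) -> str:
    """Illustrative size picker for the three KittenTTS variants.

    The threshold is made up for this sketch; since all three models
    fit under 25MB, on most hardware the answer is simply '80M'.
    """
    if latency_sensitive:
        return "14M"  # smallest, fastest
    if ram_budget_mb < 25:
        return "40M"  # balanced middle ground for very tight budgets
    return "80M"      # best quality, still tiny


print(pick_model(ram_budget_mb=8192, latency_sensitive=False))  # -> 80M
```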

self.md angle

voice is the last interface layer that’s been hard to do locally without trade-offs. KittenTTS v0.8 closes most of that gap. the quality isn’t indistinguishable from cloud TTS — but it’s good enough to use, which is the real threshold.

good enough + offline + zero cost = default choice for a local AI setup.

→ zvec — in-process memory layer
→ daytona — execution environment for AI-generated code