LiteRT-LM
table of contents
- what it is
- why it matters for personal AI
- key features
- the take
what it is
LiteRT-LM is google’s open-source inference framework for deploying LLMs on edge devices. built on LiteRT, the runtime already trusted by millions of android developers, it runs on android, iOS, and other edge hardware.
not a research prototype. production-grade.
why it matters for personal AI
the “local AI” conversation just moved from enthusiast to mainstream. LiteRT-LM ships with gemma 4 E2B already quantized for mobile. your phone becomes a personal AI server — offline, zero cloud dependency.
combined with gemma 4’s frontier-class reasoning, the stack is complete:
- model: gemma 4
- runtime: LiteRT-LM
- device: your phone
personal AI you carry everywhere.
key features
- production-ready inference on constrained hardware
- pre-quantized models (gemma 4 E2B)
- cross-platform: android, iOS, edge devices
- built on the battle-tested LiteRT runtime
- open source (google-ai-edge)
the take
google just handed the open-source community the production plumbing that was missing. running a model on your laptop was step one. running it on your phone, in production, with google’s own runtime — that’s the last mile.
the phone isn’t an AI client anymore. it’s the server.