LiteRT-LM


what it is

LiteRT-LM is Google's open-source inference framework for deploying LLMs on edge devices. it's built on LiteRT, the runtime already trusted by millions of Android developers, and it runs on Android, iOS, and embedded edge hardware.

not a research prototype. production-grade.

why it matters for personal AI

the "local AI" conversation just moved from enthusiast to mainstream. LiteRT-LM ships with Gemma 3n E2B already quantized for mobile. your phone becomes a personal AI server: offline, with zero cloud dependency.
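"already quantized for mobile" is doing a lot of work in that sentence, so here's the intuition in code. this is a minimal sketch of symmetric per-tensor int8 quantization, not LiteRT-LM's actual scheme (real toolchains use per-channel scales, calibration, and quantization-aware tricks). the point it demonstrates: storing weights as int8 instead of fp32 cuts memory 4x, which is roughly the difference between a ~2B-parameter model needing ~8 GB and ~2 GB.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: w ≈ scale * q, q in int8."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# a random fp32 tensor standing in for one weight matrix of a model
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than fp32
ratio = w.nbytes // q.nbytes  # → 4

# worst-case reconstruction error is half a quantization step
err = float(np.abs(dequantize(q, scale) - w).max())
```

scale it up: 2e9 parameters × 4 bytes (fp32) ≈ 8 GB, versus × 1 byte (int8) ≈ 2 GB, which is the line between "needs a workstation GPU" and "fits in a phone's RAM".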

combined with Gemma 3n's strong on-device reasoning, the stack is complete:

personal AI you carry everywhere.

key features

- the same engine Google uses to ship Gemini Nano in Chrome, Chromebook Plus, and Pixel Watch
- pluggable hardware acceleration across CPU, GPU, and NPU backends
- cross-platform: Android, iOS, macOS, Windows, Linux, and embedded boards like Raspberry Pi
- a C++ API with Engine/Session abstractions for multi-turn, stateful use

the take

Google just handed the open-source community the production plumbing that was missing. running a model on your laptop was step one. running it on your phone, in production, with Google's own runtime: that's the last mile.

the phone isn’t an AI client anymore. it’s the server.