Google's LiteRT-LM runs LLMs on Pixel Watch: a new runtime for on-device AI

Qiita LLM tag · qiita.com · 2026/05/28 12:49

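AI 3-line summary

Google has released LiteRT-LM, a runtime for running LLMs efficiently on edge devices.
Features such as Smart Reply on Pixel Watch 4 and page summarization in Chrome run without a server.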

English summary

Google's LiteRT-LM runtime enables on-device LLM inference on constrained hardware like Pixel Watch, powering Smart Replies and Chrome summaries locally via Gemma models.

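Google has announced LiteRT-LM, a new runtime for on-device AI inference. It extends the existing LiteRT, and its primary goal is to run large language models on small devices, including wearables. Features such as Smart Reply on Pixel Watch 4 and webpage summarization in Chrome are described as being processed on the device by Gemma models, without going through the cloud.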

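Only a few years ago, edge LLMs were widely doubted as practical. Advances in quantization and hardware accelerators are now bringing even watch-class devices close to producing usable responses. LiteRT-LM appears to unify the underlying runtime layer and is presumably aimed at deployment across multiple platforms, including Wear OS. For details on supported models and performance metrics, consult the original article.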

Google has introduced LiteRT-LM, a new runtime designed to run large language models directly on edge devices, including resource-constrained hardware like smartwatches. Building on the existing LiteRT (formerly TensorFlow Lite) stack, it provides a unified inference layer optimized for on-device LLM execution without requiring a server round-trip.

According to the announcement, features such as Pixel Watch 4's Smart Replies and Chrome's webpage summarization are powered locally by Gemma models through this runtime. This marks a meaningful shift from a few years ago, when on-device LLMs were dismissed as impractical toys. Advances in quantization and hardware accelerators now make them viable even on watch-class hardware.
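Quantization is a large part of why this has become feasible. The sketch below makes that concrete with generic arithmetic. It first estimates the weight-only memory footprint of a model at several bit widths, then shows a minimal symmetric per-tensor int8 quantization round trip. The 1-billion-parameter size and the sample weights are hypothetical, not figures from the source. Production runtimes, LiteRT-LM presumably included, may use other schemes such as per-channel or 4-bit block quantization, so treat this as an illustration rather than LiteRT-LM's implementation.

```python
# Illustrative sketch only: generic weight-quantization arithmetic,
# not LiteRT-LM's actual implementation or API.

def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone (ignores KV cache and activations)."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: scale = max|x| / 127, q = round(x / scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to scale 1.0 for an all-zero or empty tensor to avoid division by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical 1B-parameter model
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gib(n, bits):.2f} GiB")

    weights = [0.12, -0.5, 0.33, 0.0, 0.49]  # hypothetical sample weights
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, f"scale={scale:.5f}", f"max error={max_err:.5f}")
```

For the hypothetical 1B-parameter model, the weights shrink from roughly 3.7 GiB at 32 bits to under 0.5 GiB at 4 bits. That difference separates "impossible on a watch" from "plausible". The round-trip error of the int8 scheme is bounded by half the scale per weight.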

The runtime is expected to reach multiple platforms, including Wear OS, although supported model sizes, benchmark figures, and API details are not fully confirmed in this summary. Readers should consult the original Qiita article and the official documentation for technical specifics.