Gemini / Gemma ⚠ Possibly outdated information

Building real-world on-device AI with LiteRT and NPU

Google Developers Blog · developers.googleblog.com · 2026/04/23 09:00 · 2mo ago · 📖 2 min

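AI 3-line summary

Google has extended LiteRT with NPU support, allowing machine learning models to run faster and with lower power draw on chips from Qualcomm, MediaTek, and others.
Through an early access program, developers can work on optimizing inference on real devices.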

English summary

LiteRT is a production-ready framework designed to help mobile developers unlock the power of Neural Processing Units (NPUs), overcoming the performance and battery limitations of traditional CPU or G

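Google has announced a full-scale expansion of NPU (Neural Processing Unit) support in LiteRT (formerly TensorFlow Lite), its on-device machine learning runtime. The aim is to lay the groundwork for running increasingly large AI workloads on smartphones and edge devices faster and with less power than on a CPU or GPU.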

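According to the announcement, LiteRT has reworked its abstraction layer to support NPUs from major SoC vendors such as Qualcomm, MediaTek, and Samsung, so developers can reach multiple hardware backends through a single API. Until now, using an NPU required vendor-specific SDKs, which hurt portability and complicated debugging. With LiteRT handling hardware selection and delegate management, the workflow for delivering and running models becomes simpler.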

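The update also addresses requirements specific to on-device inference: better performance for quantized models, lower memory consumption, and more predictable latency. Through its early access program (EAP), Google appears to intend to mature the API while collecting optimization case studies from real applications such as image processing, speech recognition, and generative AI.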

✨ Gemini / Gemma · Key points of this article

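As background, NPUs such as the Apple Neural Engine and Qualcomm Hexagon have been raising their TOPS figures year after year, and demand is growing for running small LLMs, typified by Gemini Nano, locally. Reducing dependence on the cloud is advantageous for privacy, cost, and offline use, and competing frameworks such as Hugging Face and ONNX Runtime are likewise adding NPU support. LiteRT's unified API could accelerate the spread of on-device AI across the Android ecosystem.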

Google has announced a significant expansion of LiteRT (formerly TensorFlow Lite), bringing first-class support for NPUs (Neural Processing Units) from major silicon vendors. The move aims to give developers a unified path to running increasingly large on-device AI workloads with better latency and power efficiency than CPU or GPU execution alone.

According to the announcement, LiteRT now abstracts NPU backends from Qualcomm, MediaTek, Samsung and other SoC partners behind a single API. Historically, tapping into an NPU required vendor-specific SDKs, custom toolchains and substantial porting effort, which made cross-device deployment painful. By owning the delegate and hardware-selection layer, LiteRT lets a model author target multiple chips with the same runtime invocation, reducing the engineering burden of shipping AI features across the fragmented Android device landscape.
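To make the "single API, multiple backends" idea concrete, here is a minimal Python sketch of a runtime that hides hardware selection behind one call. It is not LiteRT code: the `Backend` and `Runtime` classes, the `make_runtime` helper, the backend names, and the NPU-then-GPU-then-CPU preference order are all assumptions made for illustration, and the "model" is a stand-in that doubles its inputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical backend names and ordering; these are not LiteRT identifiers.
PREFERENCE_ORDER = ("npu", "gpu", "cpu")


@dataclass
class Backend:
    name: str
    available: bool
    run: Callable[[Sequence[float]], List[float]]


def _double(xs: Sequence[float]) -> List[float]:
    # Stand-in "model": every backend computes the same function.
    return [2.0 * x for x in xs]


class Runtime:
    """Toy runtime that hides the backend choice behind one invoke() call."""

    def __init__(self, backends: Sequence[Backend]):
        self._backends: Dict[str, Backend] = {b.name: b for b in backends}

    def select(self) -> Backend:
        # Pick the first available backend in preference order.
        for name in PREFERENCE_ORDER:
            backend = self._backends.get(name)
            if backend is not None and backend.available:
                return backend
        raise RuntimeError("no usable backend")

    def invoke(self, inputs: Sequence[float]) -> List[float]:
        # The caller never names a vendor SDK; selection happens here.
        return self.select().run(inputs)


def make_runtime(has_npu: bool, has_gpu: bool) -> Runtime:
    # Hypothetical device description: the CPU is always present.
    return Runtime([
        Backend("npu", has_npu, _double),
        Backend("gpu", has_gpu, _double),
        Backend("cpu", True, _double),
    ])
```

Calling `make_runtime(False, True).invoke([1.0, 2.5])` uses the same invocation as on an NPU-equipped device; only the backend chosen inside `select()` differs. A real runtime would also have to compile the model for each target, which this sketch leaves out.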

The update also addresses practical concerns specific to on-device inference, including efficient execution of quantized models, lower memory footprint, and more predictable latency for interactive use cases. Google is rolling out access through an early access program, suggesting that the company wants to mature the API alongside real production workloads in domains such as imaging, speech, and small generative models before a broader release.
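For readers unfamiliar with quantized models, the following Python sketch shows the common affine int8 scheme, in which a real value x is approximated as scale * (q - zero_point). The announcement summary does not say which scheme LiteRT's NPU path uses, so treat this as a generic illustration; the function names are hypothetical. Storing weights as 8-bit integers instead of 32-bit floats cuts their size by a factor of four, which is where much of the memory saving comes from.

```python
from typing import List, Sequence, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit integer range


def choose_params(values: Sequence[float]) -> Tuple[float, int]:
    """Derive a scale and zero point mapping [min, max] onto [QMIN, QMAX]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = int(round(QMIN - lo / scale))
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values: Sequence[float], scale: float, zero_point: int) -> List[int]:
    # Round to the nearest step, shift by the zero point, clamp to int8.
    return [max(QMIN, min(QMAX, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    # Recover approximate real values from the stored integers.
    return [scale * (x - zero_point) for x in q]
```

Round-tripping a small weight list such as `[-1.0, 0.0, 0.5, 1.0]` through `quantize` and `dequantize` returns values within one quantization step of the originals, and 0.0 is recovered exactly.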

The broader context matters here. NPU performance, measured in TOPS, has been climbing rapidly across recent flagship and even mid-range SoCs, and the rise of compact on-device LLMs — Gemini Nano being the most visible example — has intensified demand for hardware acceleration that goes beyond GPU shaders. Running models locally offers tangible benefits in privacy, offline availability, and per-query cost, all of which are increasingly relevant as AI features move from novelty to default expectation in mobile apps.

LiteRT is not alone in this push. ONNX Runtime, Apple's Core ML with the Neural Engine, Qualcomm's AI Hub, and MediaTek's NeuroPilot all pursue similar goals, and frameworks like MLC LLM and llama.cpp have shown community appetite for portable on-device inference. What Google appears to be betting on is that an Android-native, vendor-neutral runtime, backed by the company that also ships TensorFlow and Gemini, can become the default substrate for shipping AI features at Play Store scale.

For developers, the practical implication is that experimenting with NPU acceleration may soon require less vendor lock-in and less bespoke code. Whether LiteRT can deliver consistent performance across heterogeneous silicon, and how transparently it handles fallback when a given operator is not supported on a particular NPU, will likely determine adoption. Those questions are exactly what an early access program is positioned to answer, and the results could meaningfully shape how on-device AI is built on Android over the next year.
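The fallback question can be illustrated with a small Python sketch of operator partitioning: consecutive operators that an accelerator supports are grouped into one segment, and unsupported ones fall back to the CPU. This is a generic illustration of how delegate-style runtimes are commonly described, not LiteRT's actual algorithm; the operator names, the supported-op set, and both function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Segment = Tuple[str, List[str]]


def partition(ops: Sequence[str], npu_supported: Set[str]) -> List[Segment]:
    """Group consecutive ops into segments tagged 'npu' or 'cpu'."""
    segments: List[Segment] = []
    for op in ops:
        target = "npu" if op in npu_supported else "cpu"
        if segments and segments[-1][0] == target:
            # Same target as the previous op: extend the current segment.
            segments[-1][1].append(op)
        else:
            # Target changed: start a new segment.
            segments.append((target, [op]))
    return segments


def count_handoffs(segments: Sequence[Segment]) -> int:
    # Each boundary between segments implies moving tensors between processors.
    return max(0, len(segments) - 1)
```

With a hypothetical graph `["conv2d", "relu", "custom_nms", "conv2d", "softmax"]` and `custom_nms` unsupported, `partition` yields three segments and `count_handoffs` reports two transfers. Each handoff adds latency, so a single unsupported operator in the middle of a graph can erode much of the NPU's advantage, which is why transparent reporting of fallback matters.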