Which runtime is actually fastest for local LLMs on iPhone? A real-device benchmark of MLX / llama.cpp / LiteRT-LM / CoreML

Qiita LLM tag · qiita.com · 2026/05/28 11:47

AI 3-line summary

A hands-on benchmark comparing four on-device LLM runtimes (MLX, llama.cpp, LiteRT-LM, and CoreML) on a physical iPhone to determine which delivers the fastest inference.

Running large language models locally on an iPhone has become increasingly feasible, but developers face a fragmented set of runtimes: Apple's MLX, the cross-platform llama.cpp, Google's LiteRT-LM, and Apple's own CoreML pipeline. Side-by-side inference-speed comparisons on real hardware have rarely been published, which leaves runtime selection largely to guesswork.

This article appears to address that gap by measuring and comparing the inference speed of all four runtimes on a physical iPhone. One focus is how much of an advantage MLX, which Apple is actively promoting as part of its push into the local-LLM space, holds over the other options. The specific models tested, tokens-per-second figures, and test conditions are given in the original article and should be checked there, since results can vary significantly with model size and device generation.
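For readers unfamiliar with how a tokens-per-second figure is usually derived, the following is a minimal, runtime-agnostic sketch. It is not the article's measurement method, which this summary does not describe. The names `measure` and `BenchResult` are hypothetical, and the sketch assumes only that a runtime binding can hand back generated tokens one at a time as an iterable. Time to first token is reported separately from decode throughput, because the first token also includes prompt processing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class BenchResult:
    tokens: int              # number of tokens produced by the stream
    ttft_s: float            # time to first token, in seconds (includes prompt processing)
    decode_tok_per_s: float  # tokens per second after the first token


def measure(stream: Iterable[str],
            clock: Callable[[], float] = time.perf_counter) -> BenchResult:
    """Time a token stream. `stream` is a hypothetical stand-in for any
    runtime binding that yields generated tokens one by one."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        last = clock()       # timestamp taken as each token arrives
        if first is None:
            first = last
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # The first token's latency is reported as ttft_s, so throughput is
    # computed over the remaining (count - 1) tokens.
    rate = (count - 1) / decode_time if decode_time > 0 else 0.0
    return BenchResult(tokens=count, ttft_s=first - start, decode_tok_per_s=rate)
```

Passing the same prompt and the same generation length to each runtime, and repeating the run several times, would make figures obtained this way more comparable; thermal throttling on a phone is one reason single runs can mislead.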