Local LLM / Open Models 🔥 HOT

Ollama v0.30.0 released: direct llama.cpp support and GGUF compatibility · v0.30.0-rc32: llama-server followups (#16353)

Ollama Releases · github.com · 2026/06/02 02:44 · 2w ago · 📖 2 min

AI 3-line summary

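Ollama has released v0.30.0, reworking its architecture from a wrapper over GGML to direct llama.cpp support.
The release secures compatibility with the GGUF file format and introduces MLX-based acceleration on Apple Silicon.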

English summary

llama-server followups. Misc fixes for #16031: add back dropped ROCm build flag for multi-GPU support on Windows; fix amdhip64_*.dll version detection for "latest" selection; fix embeddings API for consis…

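Ollama has released v0.30.0 with a major change to its internal architecture. The biggest change is that the inference engine, previously built on top of the GGML library, has moved to a configuration that supports llama.cpp directly. The change does not directly affect the user interface, but it lays important groundwork for performance and future feature work.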

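llama.cpp is a C/C++ LLM inference library developed by Georgi Gerganov. Its hallmark is that quantization lets it run large language models even on CPU-only machines. GGML is the tensor library underneath it, but the llama.cpp project has evolved along its own path, and by distinguishing the two and supporting llama.cpp directly, Ollama may be able to pick up the latest optimizations and features more quickly.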
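To make the quantization point concrete, here is a back-of-the-envelope sketch of how weight precision affects memory use. The 7-billion-parameter model size is a hypothetical example, not a figure from the release, and the function ignores per-block scale factors, the KV cache, and runtime overhead, so real quantized files are somewhat larger.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 7-billion-parameter model at three precisions.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Cutting precision from 16 bits to 4 bits shrinks the weights to a quarter of their size, which is what brings such a model within reach of ordinary system RAM.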

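GGUF support has also been strengthened. GGUF was introduced in 2023 as the successor to the GGML format and is designed to store model metadata and quantization information more flexibly. GGUF has become the mainstream distribution format on model hubs such as Hugging Face, so the improved compatibility is expected to make it easier to use third-party model files with Ollama.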

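On Apple Silicon, the release introduces inference acceleration through MLX. MLX is an open-source machine learning framework developed by Apple that performs efficient matrix operations by taking advantage of the unified memory architecture of chips such as the M1, M2, and M3. Mac users can expect faster local LLM inference.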

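As a local LLM runtime, Ollama competes with LM Studio, Jan, and llama.cpp itself, and is favored for being easy to use through a simple CLI and REST API. The architectural overhaul may further deepen its integration with the wider ecosystem. Note that this release carries the release-candidate number rc32, so validation toward a final stable release is presumably still under way.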

Ollama has shipped v0.30.0, and while the update may look incremental from the outside, the internal architecture changes are substantial. The most significant shift is a move away from building on top of the GGML tensor library toward directly integrating llama.cpp as the inference backend. For end users the interface remains familiar, but the change sets the stage for faster adoption of upstream optimizations and new model support.

llama.cpp, created by Georgi Gerganov, has become the de facto open-source engine for running large language models locally. Originally tightly coupled with GGML, its underlying tensor math library, llama.cpp has evolved into a broader project with its own optimizations, hardware backends, and model support that now extends well beyond what GGML alone provides. By integrating llama.cpp directly, Ollama positions itself to track upstream improvements more closely rather than relying on an intermediate abstraction layer.

Alongside this architectural change, v0.30.0 brings native compatibility with the GGUF file format. Introduced in 2023 as a successor to the older GGML format, GGUF stores model weights alongside richer metadata including quantization details, tokenizer configuration, and architecture parameters in a single self-contained file. Hugging Face and other model repositories have largely standardized on GGUF for community model distribution, so improved support here means users should find it easier to pull in third-party models without conversion headaches.
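As a rough illustration of what "self-contained file" means in practice, the sketch below reads the fixed-size prefix of a GGUF file. It assumes the published GGUF layout for version 2 and later (a 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian). The function name and the sample counts are invented for the example, and this is not how Ollama itself reads models.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(data: bytes) -> dict:
    """Parse the 24-byte fixed prefix of a GGUF file (v2+ layout assumed)."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for a GGUF header")
    magic = data[:4]
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for illustration only; the counts are made up.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
header = read_gguf_header(sample)
print(header)
```

For a real file, passing the first 24 bytes (for example `open(path, "rb").read(24)`) would be enough. The metadata key/value pairs and tensor descriptors that follow need a fuller parser.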

For macOS users on Apple Silicon, the release introduces MLX-based acceleration. MLX is Apple's open-source machine learning framework designed to exploit the unified memory architecture of M-series chips, where CPU and GPU share the same memory pool. This avoids the data-copy overhead typical of discrete GPU setups and can meaningfully improve throughput for local inference workloads. Apple has been quietly building out the MLX ecosystem, and its adoption in Ollama adds another practical showcase for the framework beyond research use cases.

Ollama competes in an increasingly crowded space of local LLM runners that includes LM Studio, Jan, and llama.cpp itself used directly via CLI. Ollama's appeal has always been its low-friction setup: a single binary, a simple REST API, and a model registry that abstracts download and configuration. The architectural improvements in v0.30.0 reinforce that positioning by ensuring the project stays current with the fast-moving llama.cpp ecosystem rather than lagging behind it.
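For readers who have not used the REST API, a minimal sketch of a non-streaming call to Ollama's `/api/generate` endpoint follows. It assumes a server listening on the default local port 11434. The helper names and the model name in the usage comment are placeholders, not anything specific to v0.30.0.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local address (assumed)


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request asking for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]


# Usage, with a running server and a model you have already pulled
# ("llama3.2" here is only a placeholder name):
#   print(generate("llama3.2", "Why is the sky blue?"))
```

Separating request construction from sending keeps the payload easy to inspect without a server running.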

It is worth noting that this release carries the designation rc32, suggesting it is still formally a release candidate and that final stability validation may be ongoing. Users in production environments may want to confirm stability before upgrading, though the high rc number implies the codebase has undergone extensive testing cycles.
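One way to act on that caution in an upgrade script is to check whether a tag is a release candidate before installing it. The sketch below assumes tags shaped like `v0.30.0-rc32`, as seen in this release's title. The function names are illustrative, and other pre-release suffixes are deliberately rejected rather than guessed at.

```python
import re

# Matches tags such as "v0.30.0" or "v0.30.0-rc32" (format assumed from this release).
TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?$")


def parse_tag(tag: str):
    """Return (major, minor, patch, rc) where rc is None for a final release."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = int(match.group(4)) if match.group(4) is not None else None
    return major, minor, patch, rc


def is_release_candidate(tag: str) -> bool:
    """True when the tag carries an -rcN suffix."""
    return parse_tag(tag)[3] is not None


print(parse_tag("v0.30.0-rc32"))
print(is_release_candidate("v0.30.0"))
```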