Comparing Claude Code, Codex CLI, and Copilot CLI by QCD (Otona no Jiyuu Kenkyuu #16)

Zenn LLM tag · zenn.dev · 2026/05/09 15:32 · 📖 2 min

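AI 3-line summary

A personal hands-on article that compares three coding agents (Claude Code, Codex CLI, and GitHub Copilot CLI) on QCD (Quality, Cost, Delivery), using a Raspberry Pi 4 as the subject.
It brings out the differences in each CLI's practicality and areas of strength.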

English summary

A personal benchmark comparing three AI coding CLIs — Claude Code, Codex CLI, and GitHub Copilot CLI — using a Raspberry Pi 4 project, evaluated through the QCD (Quality, Cost, Delivery) framework to highlight their respective strengths and trade-offs.

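Now that coding agents from three major vendors (Anthropic's Claude Code, OpenAI's Codex CLI, and GitHub's Copilot CLI) are all on the table, "which one to pick" is becoming a practical question for developers. This article is a personal blog post that approaches that selection problem with QCD (Quality, Cost, Delivery), a framework used in manufacturing.

The test bed is a small project on a Raspberry Pi 4. As the 16th installment of a series titled Otona no Jiyuu Kenkyuu, the author gives the three CLIs the same kind of task and compares the quality of the generated code, the cost including token billing and usage quotas, and the time taken to complete the task. Running CLI agents in a resource-limited environment such as a Raspberry Pi itself overlaps with the question of how to use LLMs for edge-leaning workloads.

As background, Claude Code has won support for its long-context handling and strong repository understanding, while Codex CLI is an open-source terminal agent that OpenAI has relaunched, with growing integration with GPT-5 family models. GitHub Copilot CLI appears to differentiate itself through ease of enterprise adoption, building on its IDE integration assets and the GitHub account base. All three are structurally similar in that they perform shell and file operations through MCP and tool calls, so the differences may be converging on the underlying model's reasoning ability, the pricing structure, and the polish of the UX.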

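🏠 Local LLM · Key points of this article

Bringing a classic evaluation axis like QCD to AI tool selection is itself unusual, and the result is a rewarding read as a comparison grounded in practical experience rather than benchmark numbers. On the other hand, because this is an individual's small-scale test, it is worth keeping in mind that the results depend heavily on the subject matter and on prompt design.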

With Anthropic's Claude Code, OpenAI's Codex CLI, and GitHub's Copilot CLI all now available, choosing among the major coding-agent CLIs has become a real decision point for developers. This personal blog post tackles that question by borrowing a framework familiar to manufacturing engineers, QCD (Quality, Cost, and Delivery), and applying it to a hands-on comparison of the three tools.

The author runs the experiment on a Raspberry Pi 4 as the 16th installment of a hobbyist series called Otona no Jiyuu Kenkyuu (roughly, "Independent Research for Grown-Ups"). Each CLI is given the same kind of task, and the results are compared along three axes: the quality of the generated code, the cost in terms of token billing and usage quotas, and the delivery time required to complete the task. Running agentic CLIs on a Raspberry Pi is itself an interesting angle, touching on how LLM-driven workflows behave in resource-constrained or edge-leaning environments.
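
One way such a three-axis comparison could be tabulated is sketched below. This is only an illustration, not the author's method: the `Run` record, the `qcd_scores` function, the choice of proxies (check pass ratio, metered spend, wall-clock seconds), the weights, and the placeholder tool names and numbers are all assumptions introduced here, and none of the values come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One hypothetical task run by one CLI (all fields are illustrative)."""
    tool: str
    checks_passed: int  # quality proxy: acceptance checks that passed
    checks_total: int   # assumed to be greater than zero
    cost_usd: float     # cost proxy: metered spend for the run
    seconds: float      # delivery proxy: wall-clock time to completion


def _relative(best: float, value: float) -> float:
    """Scale so the smallest (best) value maps to 1.0; a zero value counts as best."""
    if value == 0:
        return 1.0
    return best / value


def qcd_scores(runs, weights=(0.5, 0.25, 0.25)):
    """Return {tool: weighted score}; higher is better.

    Quality is the pass ratio. Cost and delivery are scaled against the
    cheapest and fastest run, so the best run on each axis scores 1.0.
    The weights are an arbitrary assumption and should be tuned per project.
    """
    w_quality, w_cost, w_delivery = weights
    min_cost = min(r.cost_usd for r in runs)
    min_secs = min(r.seconds for r in runs)
    scores = {}
    for r in runs:
        quality = r.checks_passed / r.checks_total
        cost = _relative(min_cost, r.cost_usd)
        delivery = _relative(min_secs, r.seconds)
        scores[r.tool] = w_quality * quality + w_cost * cost + w_delivery * delivery
    return scores


if __name__ == "__main__":
    # Placeholder numbers only; they are NOT measurements from the article.
    runs = [
        Run("cli_a", 9, 10, 0.40, 120.0),
        Run("cli_b", 8, 10, 0.20, 90.0),
        Run("cli_c", 10, 10, 0.80, 180.0),
    ]
    for tool, score in sorted(qcd_scores(runs).items(), key=lambda kv: -kv[1]):
        print(f"{tool}: {score:.3f}")
```

Scaling cost and delivery against the best run keeps all three axes on a comparable range, but it also means a score is only meaningful relative to the other runs in the same batch.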

For context, Claude Code has built a reputation around long-context reasoning and repository-wide understanding, making it a favorite for refactoring and exploratory coding. Codex CLI is OpenAI's relaunched open-source terminal agent, tightly integrated with the GPT-5 family of models and positioned as a flexible, scriptable alternative. GitHub Copilot CLI leverages GitHub's account infrastructure and existing IDE integrations, which likely gives it an edge in enterprise adoption.

Architecturally, the three offerings are converging. All of them rely on tool-calling and, increasingly, the Model Context Protocol (MCP) to perform shell commands, edit files, and reach external services. That means the differentiators are narrowing to the underlying model's reasoning ability, pricing structure, and the polish of the user experience — areas where each vendor is iterating rapidly.
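
The shared shape described above (the model requests a named tool, the CLI executes it and returns the result) can be sketched roughly as follows. This is a generic illustration under stated assumptions, not any vendor's implementation and not the MCP wire format: the tool names, the `dispatch` function, and the `{"name", "arguments"}` call layout are hypothetical, and a real agent would add sandboxing and user approval before running commands.

```python
import subprocess
from pathlib import Path


def run_shell(command: str) -> str:
    """Run a shell command and return its stdout (no sandboxing in this sketch)."""
    done = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return done.stdout


def read_file(path: str) -> str:
    """Return the text content of a file."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Overwrite a file and report how much was written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters"


# Hypothetical tool registry: the names a model would be allowed to request.
TOOLS = {"run_shell": run_shell, "read_file": read_file, "write_file": write_file}


def dispatch(call: dict) -> dict:
    """Execute one model-requested tool call and wrap the result for the model."""
    name = call["name"]
    handler = TOOLS.get(name)
    if handler is None:
        return {"name": name, "ok": False, "output": f"unknown tool: {name}"}
    try:
        return {"name": name, "ok": True, "output": handler(**call["arguments"])}
    except Exception as exc:  # report failures to the model instead of crashing
        return {"name": name, "ok": False, "output": str(exc)}
```

In a loop of this kind, each result would be appended to the conversation and sent back to the model, which then decides on the next call. Because that loop looks much the same in every tool, the quality of the model's decisions and the surrounding UX carry most of the difference.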

Applying a QCD lens to AI developer tooling is a relatively uncommon framing and gives the post a refreshing, practitioner-oriented feel rather than a pure benchmark write-up. That said, readers should treat the conclusions as directional. The sample is small, the workload is a single Raspberry Pi project, and outcomes for coding agents are known to vary significantly with prompt style, repository structure, and task type. As an individual experiment, however, it offers a useful data point alongside more formal evaluations and may help others structure their own selection process when picking a daily-driver coding CLI.