The Free Executor That Cost the Most: Opus + Local Qwen Measured Over 40 Trials Was the Most Expensive on Every Task
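- "Have a strong model plan the work, and have a cheap model do the hands-on work."
- As a cost-reduction recipe for agentic coding, this is close to standard practice.
- I believed it too, and put it to the test.
- I had Opus 4.7 act as the orchestrator, and a locally running Qwen 3.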
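The division of labor in which a high-capability model plans the work and an inexpensive model does the hands-on implementation has come to be regarded as close to standard practice for cutting the cost of agentic coding. However, an article that put this conventional wisdom to an empirical test reports that it reached the opposite result.
In the test, Opus 4.7, described as Anthropic's top-tier model, served as the orchestrator, while an open model from the Qwen 3 family running in a local environment was assigned as the executor responsible for the actual code implementation. Because the executor runs locally, its API charges are effectively zero; in other words, it is "free." Even so, when measured across 40 trials, this configuration reportedly turned out to be the most expensive on every task.
Why would a configuration that should be free end up the most expensive? One factor that can be inferred from the report is that the local model's implementation quality is inconsistent. Each time the executor returns an incorrect diff or incomplete code, the orchestrator has to analyze the cause and reissue its instructions. Each time, the expensive top-tier model re-reads a long context, and token consumption swells. The result is a pattern in which the expensive model repeatedly cleans up after the cheap model's failures, which may have pushed up the total cost.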
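This finding is instructive for anyone considering local LLMs or open models. Agent tools that let you switch models by role, such as Cline, Aider, and Claude Code, are becoming more common, and the optimization of "strong model for design, weak model for implementation" is often recommended. But if the executor's capability does not match the difficulty of the task, the overhead of the division of labor may exceed the savings.
That said, this is a single example under specific model and task conditions, and results may vary depending on Qwen 3's quantization settings, prompt design, and the nature of the target tasks. What matters is evaluating not just the unit price implied by "free" or "cheap" but the total, including the number of attempts and the rework. As local model performance continues to improve, empirical tests like this one are likely to remain valuable.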
The premise behind much of today's agentic coding tooling is simple: pair an expensive, highly capable model with a cheaper one to keep costs down. A strong model plans the work and breaks it into steps, while a smaller, cheaper model carries out the bulk of the token-heavy execution. A recent write-up on Qiita tests that assumption directly and reaches a counterintuitive conclusion. When the author paired Anthropic's Opus 4.7 as an orchestrator with a locally hosted Qwen 3 model as the executor, the combination ended up being the most expensive option across every task measured over 40 trials.
The orchestrator-executor split, sometimes called planner-worker or manager-agent, has become close to folk wisdom in the agent-building community. The reasoning is intuitive: orchestration involves relatively little output—high-level reasoning, task decomposition, decision-making—while execution generates the long, repetitive token streams of writing files, editing code, and running and interpreting tests. If you can offload that bulk work to a free local model, the bill should, in theory, fall sharply.
The author appears to have set out to confirm this and instead found the opposite. The likely mechanism is the feedback loop. A weaker executor produces lower-quality or incomplete output, which forces the orchestrator to intervene more often—reviewing, correcting, re-planning, and re-issuing instructions. Each of those interventions consumes expensive Opus tokens, and because the orchestrator must repeatedly re-read accumulated context to understand what went wrong, the per-correction cost grows as the conversation lengthens. A free executor that fails frequently can therefore amplify, rather than reduce, spending on the paid model.
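The growth in per-correction cost described above can be made concrete with a small arithmetic sketch. This is an illustrative model, not the article's measurement method: the function names, the context sizes, and the price per million tokens are all hypothetical assumptions chosen only to show the shape of the curve.

```python
# Illustrative model only. Every number below is a made-up assumption,
# not a measurement from the article and not a real API price.

def orchestrator_input_tokens(base_context: int, tokens_per_round: int, rounds: int) -> int:
    """Total input tokens the orchestrator reads across `rounds` rounds.

    Round k (0-indexed) re-reads the base context plus the transcript
    appended by the k earlier rounds, so later rounds cost more.
    """
    return sum(base_context + tokens_per_round * k for k in range(rounds))


def orchestrator_cost(tokens: int, price_per_million: float) -> float:
    """Convert a token count into dollars at a flat (hypothetical) price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    for rounds in (1, 3, 6):
        tokens = orchestrator_input_tokens(
            base_context=20_000, tokens_per_round=5_000, rounds=rounds
        )
        cost = orchestrator_cost(tokens, price_per_million=10.0)
        print(f"rounds={rounds} input_tokens={tokens} cost=${cost:.2f}")
```

Under these assumed numbers, six rounds read 195,000 input tokens against 20,000 for a single round, roughly 9.75 times as much for six times the rounds. Prompt caching or context trimming would flatten the curve, which this sketch deliberately ignores.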
Some background helps frame why this matters. Qwen 3 is Alibaba's open-weight model family, available in a range of sizes and widely used for local deployment via runtimes such as Ollama, llama.cpp, vLLM, and LM Studio. Running it locally carries no per-token API charge, which is exactly why it is attractive as an executor. But "free" here refers only to marginal inference cost; the hidden expense surfaces elsewhere—in the orchestrator's token usage, in retries, and in wall-clock time. A locally hosted model is also often a smaller or quantized variant rather than a frontier hosted system, so its coding reliability tends to lag, widening the gap the orchestrator must close.
The finding aligns with a broader tension in agentic systems: total cost is governed by end-to-end task completion, not by the headline price of any single component. Tool calls, multi-turn correction, and context re-ingestion can dominate the bill. This is part of why providers have pushed prompt caching, smaller frontier tiers such as Haiku- or mini-class models, and more structured handoffs—mechanisms aimed at reducing exactly the redundant context costs this experiment exposes.
For practitioners, the takeaway is not that local executors are useless, but that the economics depend heavily on the executor's success rate for a given class of task. Where a local model can complete steps reliably with minimal supervision, the orchestrator-executor split can still pay off. Where it cannot, every failed attempt is effectively paid for twice—once in local compute and again in the expensive orchestrator's corrective overhead.
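One way to reason about that dependence on success rate is a simple expected-cost estimate. The sketch below assumes each executor attempt succeeds independently with the same probability and that the orchestrator pays a flat amount per attempt to instruct and review; both assumptions, along with the function names and dollar figures, are hypothetical and are not taken from the article.

```python
# Illustrative model only: independent attempts with a flat orchestrator
# cost per attempt. All names and numbers are hypothetical assumptions.

def expected_split_cost(review_cost: float, success_rate: float) -> float:
    """Expected orchestrator spend until the free executor succeeds.

    With independent attempts, the expected number of attempts is
    1 / success_rate, and each attempt costs `review_cost` on the paid model.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return review_cost / success_rate


def break_even_success_rate(review_cost: float, solo_cost: float) -> float:
    """Success rate below which the split costs more than the strong model alone."""
    if solo_cost <= 0:
        raise ValueError("solo_cost must be positive")
    return review_cost / solo_cost


if __name__ == "__main__":
    review_cost = 0.10  # hypothetical $ per orchestrated attempt
    solo_cost = 0.40    # hypothetical $ for the strong model doing the task alone
    print("break-even:", break_even_success_rate(review_cost, solo_cost))
    for p in (0.9, 0.5, 0.2):
        print(p, round(expected_split_cost(review_cost, p), 3))
```

With these assumed figures the break-even success rate is 0.25: an executor that succeeds 20% of the time costs an expected $0.50 in orchestrator spend, more than the $0.40 solo run. Because the flat per-attempt cost ignores context growth across retries, the real break-even point is likely to be higher than this estimate.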
The result should be read with the usual caveats. It reflects one author's task set, model versions, and prompting strategy over 40 trials, and outcomes are likely sensitive to executor size, quantization level, task difficulty, and how tightly the orchestration loop is engineered. A more capable local executor, or a different mix of tasks, could shift the balance. Even so, the experiment is a useful reminder to measure total spend empirically rather than assume that adding a free component lowers the bill. Benchmarking cost on your own workload, including retries and context growth, appears to be the only reliable way to know whether the popular planner-executor recipe actually saves money. The intuitive architecture and the cheapest line item, this case suggests, are not always the same thing.
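A minimal sketch of what benchmarking on your own workload might look like is to record the total billed spend per trial, retries included, and compare configurations task by task. The `Trial` record, the configuration names, and the sample figures below are hypothetical placeholders, not the article's data or harness.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    config: str      # hypothetical label, e.g. "solo" or "split-local"
    task: str        # task identifier from your own workload
    cost_usd: float  # total billed API spend for the trial, retries included


def mean_cost_by_task(trials: list[Trial]) -> dict[str, dict[str, float]]:
    """Return task -> config -> mean cost per trial."""
    buckets: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for t in trials:
        buckets[t.task][t.config].append(t.cost_usd)
    return {
        task: {cfg: mean(costs) for cfg, costs in cfgs.items()}
        for task, cfgs in buckets.items()
    }


def most_expensive_config(summary: dict[str, dict[str, float]]) -> dict[str, str]:
    """Return task -> the configuration with the highest mean cost."""
    return {task: max(cfgs, key=cfgs.get) for task, cfgs in summary.items()}


if __name__ == "__main__":
    # Made-up sample data purely to exercise the functions.
    trials = [
        Trial("solo", "refactor", 0.40),
        Trial("solo", "refactor", 0.44),
        Trial("split-local", "refactor", 0.30),
        Trial("split-local", "refactor", 0.90),
    ]
    summary = mean_cost_by_task(trials)
    print(summary)
    print(most_expensive_config(summary))
```

Recording the billed total per trial, rather than multiplying a unit price by a single attempt, is what lets retries and context growth show up in the comparison.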
The body text and summary on this page were automatically generated by AI. Please check the original article (qiita.com) for accuracy.