Designing approval boundaries for write actions in MCP and AI agents
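- When you hand write actions to MCP or an AI agent, there is one thing to decide first.
- It is not whether the AI can execute the action.
- It is which operations get which kind of approval boundary.
- Post to Slack, send an email, update a CRM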
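How far should an AI agent be allowed to perform "write actions" on external tools autonomously? Where that line is drawn is becoming the core of safety in production use. With the spread of MCP (Model Context Protocol), published by Anthropic, environments in which AI can actually carry out operations with side effects, such as posting to Slack, sending email, and updating a CRM, are starting to take shape. The first question a designer should face here is not "can the AI execute it?" but "which operations get which kind of approval boundary?"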
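Behind this is the difference in nature between reads and writes. A mistaken read often does no worse than expose information, but a write leaves results in external systems that are hard to undo, such as an email already sent or customer data already updated. Because some errors and unexpected behavior in AI reasoning are unavoidable, a design that hands over write actions unconditionally appears to carry high risk.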
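What matters, then, is varying the granularity of approval per operation. For example, saving a draft to an internal channel may run automatically, while sending email to external customers or making updates that involve payments or contracts requires explicit human approval (human-in-the-loop). Putting everything behind approval makes operations unworkable; automating everything makes the impact of an incident large. The question is how to build that tension into the design.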
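MCP itself is a specification that defines a common interface for exposing tools as servers and calling them from AI clients, and the concrete approval flow is left largely to implementers. Likewise, for OpenAI's function calling and agent frameworks such as LangChain, mechanisms for inserting a confirmation before tool execution and the provision of guardrails have been under discussion. How each vendor combines permission management, audit logs, and role-based access control may depend on future standardization and on the maturity of the surrounding tooling.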
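The approval boundary for write actions is not merely a safety device; it also expresses a division of responsibility, namely how much discretion the AI agent is given. The center of gravity in design appears to be shifting from competing on capability to ensuring that operations are reliable and explainable.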
As AI agents move from answering questions to taking actions, one design question becomes central: when a model can write to external systems, what determines whether a given write should actually proceed? This matters because the Model Context Protocol (MCP) and similar agent frameworks make it easy to connect a language model to tools that post messages, send emails, or change records. The convenience of granting that access can obscure a more important decision about where approval should sit.
The central argument here is that the first thing to settle is not whether the AI is technically capable of performing an action. Once a tool is exposed to an agent, capability is largely a given. The harder and more consequential question is which operations require what kind of approval boundary. A boundary in this sense is the checkpoint between the agent's intent and the action's real effect on a system of record or a channel that other people can see.
Consider three common write actions: posting to Slack, sending an email, and updating a CRM. From a tool-calling perspective they look alike, since each is a single function the agent can invoke. But they differ sharply in their consequences. A post to a shared Slack channel is visible to colleagues immediately and is awkward to retract. An email may cross an organization's boundary and reach a customer, where it cannot be recalled. A CRM update changes shared business data that other workflows and people depend on. Treating these as equivalent simply because they are all "writes" overlooks how differently they fail.
A practical way to size each boundary is to weigh reversibility and blast radius. An action that is easy to undo and affects only the agent's own scratch space can often run autonomously. An action that is irreversible, externally visible, or touches data others rely on tends to warrant a human-in-the-loop step, a preview, or a staged "draft then confirm" pattern. Some teams add intermediate options: auto-approving low-risk writes, requiring confirmation for medium-risk ones, and blocking certain high-risk operations entirely unless a person initiates them. The point is to match the strength of the gate to the cost of being wrong, rather than applying one policy to every tool.
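The tiering described above can be sketched as a small policy function. This is a minimal illustration, not a recommended rule set: the `Gate` and `WriteAction` types, the three risk flags, the score thresholds, and the example tool names are all assumptions made for this sketch, and a real deployment would tune them per tool.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    AUTO = "auto"              # run without asking anyone
    CONFIRM = "confirm"        # show a preview and wait for a person
    HUMAN_ONLY = "human_only"  # the agent may not initiate this at all


@dataclass(frozen=True)
class WriteAction:
    name: str
    irreversible: bool         # hard or impossible to undo
    externally_visible: bool   # seen outside the agent's own scratch space
    shared_data: bool          # other people or workflows depend on it


def gate_for(action: WriteAction) -> Gate:
    """Pick a gate from the number of risk factors (thresholds are illustrative)."""
    risk = sum([action.irreversible, action.externally_visible, action.shared_data])
    if risk == 0:
        return Gate.AUTO
    if risk < 3:
        return Gate.CONFIRM
    return Gate.HUMAN_ONLY


# Hypothetical tools, classified by hand for illustration only.
save_draft = WriteAction("save_draft", False, False, False)
send_customer_email = WriteAction("send_customer_email", True, True, False)
bulk_overwrite_crm = WriteAction("bulk_overwrite_crm", True, True, True)
```

Counting risk factors is deliberately crude; the design point is only that the gate is derived from properties of the operation rather than from a single policy applied to every tool.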
This is where MCP's design is relevant. MCP is an open protocol, introduced by Anthropic in late 2024, that standardizes how applications supply context and tools to large language models. It defines servers that expose resources, prompts, and tools, and hosts and clients that let a model discover and call them. Importantly, the protocol does not mandate a particular approval model, so the responsibility for gating write actions falls largely on the server implementation and the host application. A well-designed MCP server can make approval boundaries explicit, for example by separating read-only tools from mutating ones, returning a preview before committing, or requiring an explicit confirmation token.
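One way to picture the "preview, then explicit confirmation token" pattern is the sketch below. It uses only the Python standard library and does not call any MCP SDK; `ConfirmationGate`, `propose`, `commit`, and the `send_email` stand-in are hypothetical names chosen for this example, and how a token would be surfaced to a human depends on the host application.

```python
import secrets
from typing import Any, Callable


class ConfirmationGate:
    """Holds proposed writes until a caller presents the matching token."""

    def __init__(self) -> None:
        self._pending: dict[str, tuple[Callable[..., Any], dict[str, Any]]] = {}

    def propose(self, fn: Callable[..., Any], **kwargs: Any) -> dict[str, str]:
        """Record the intended call and return a preview plus a one-time token."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (fn, kwargs)
        return {"preview": f"{fn.__name__} with {kwargs!r}", "token": token}

    def commit(self, token: str) -> Any:
        """Execute the proposed call; each token works exactly once."""
        if token not in self._pending:
            raise PermissionError("unknown or already used confirmation token")
        fn, kwargs = self._pending.pop(token)
        return fn(**kwargs)


sent: list[str] = []  # stand-in for a real mail system (assumption)


def send_email(to: str, body: str) -> str:
    """Hypothetical mutating tool; here it only records the recipient."""
    sent.append(to)
    return f"sent to {to}"
```

In this arrangement the agent would only ever call `propose`, while `commit` sits behind whatever approval surface the host provides, so nothing reaches the mail system until a token is presented.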
Several supporting techniques make these boundaries more reliable. Idempotency keys help ensure that a retried action is not executed twice. Dry-run or "what would happen" modes let an agent surface intended effects before they occur. Scoped credentials and least-privilege access limit what a compromised or confused agent can reach in the first place. Audit logging records who or what approved each action, which is useful both for debugging and for accountability. None of these are new ideas in software engineering, but agent autonomy raises their importance because the actor proposing the change is probabilistic rather than deterministic.
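Three of these techniques (idempotency keys, dry-run mode, and audit logging) can be combined in one small wrapper, sketched below under simplifying assumptions: state is kept in memory, `WriteExecutor` and `update_crm` are hypothetical names, and the caller is assumed to supply a stable idempotency key per intended action. Scoped credentials are out of scope for this sketch.

```python
from datetime import datetime, timezone
from typing import Any, Callable


class WriteExecutor:
    """Wraps mutating calls with dry-run, idempotency, and an audit trail."""

    def __init__(self) -> None:
        self._completed: dict[str, dict[str, Any]] = {}  # key -> first outcome
        self.audit_log: list[dict[str, Any]] = []

    def execute(
        self,
        handler: Callable[..., Any],
        args: dict[str, Any],
        *,
        idempotency_key: str,
        approved_by: str,
        dry_run: bool = False,
    ) -> dict[str, Any]:
        if dry_run:
            # Describe the intended effect without calling the handler.
            return {"status": "dry_run", "would_call": handler.__name__, "args": args}
        if idempotency_key in self._completed:
            # A retry with the same key returns the stored outcome unchanged.
            return self._completed[idempotency_key]
        outcome = {"status": "done", "result": handler(**args)}
        self._completed[idempotency_key] = outcome
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "tool": handler.__name__,
            "idempotency_key": idempotency_key,
            "approved_by": approved_by,
        })
        return outcome


crm_writes: list[tuple[str, str]] = []  # stand-in for a real CRM (assumption)


def update_crm(record_id: str, status: str) -> str:
    """Hypothetical mutating tool; here it only records the write."""
    crm_writes.append((record_id, status))
    return f"{record_id} -> {status}"
```

A production version would persist the completed keys and the log outside the process; the in-memory dictionaries here only illustrate the control flow.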
The broader industry is converging on similar concerns. Function calling and tool use, popularized through OpenAI's function calling and frameworks such as LangChain, LlamaIndex, and various agent runtimes, all face the same gap between generating an action and authorizing it. Many of these stacks now include some notion of human approval, interrupts, or confirmation steps for sensitive operations. MCP fits into this landscape as a connective layer rather than a policy engine, which means design choices about approval still belong to the people building and deploying the integration.
The takeaway is about sequencing. Before asking how to give an agent more capability, decide which write actions are safe to automate, which need a preview or confirmation, and which should remain human-initiated. Defining those approval boundaries early tends to be easier than retrofitting them after an agent has already sent the wrong email.
The body and summary of this page were generated automatically by AI. Please check the original article (zenn.dev) for accuracy.