After Hitting an HTML Injection in My Own Code, I Stopped Trusting AI Output

Zenn MCP tag · zenn.dev · 2026/06/29 08:03

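AI 3-line summary

Are you treating the strings an AI returns as trusted input?
For a while, I was doing exactly that without realizing it.
First implementation: what I was building was a feature that displayed the answer an LLM generated for a user's question directly on screen.
The strings the model returns …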

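Are you treating AI-generated strings as trusted input? Any developer building features on top of an LLM should stop and ask this at least once. One blog post shares how its author made that assumption without realizing it and, as a result, built an HTML injection vulnerability into their own code.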

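It started, the author says, with a feature that displayed an LLM-generated answer to a user's question directly on screen. Because the strings a model returns read like human-written prose, they do not feel like dangerous values. But AI output is fundamentally the same as user input: untrusted data that arrives from outside. If the generated string contains tags or scripts and is rendered without escaping, it can lead to cross-site scripting (XSS).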

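The core of the problem is that output does not become safe just because it came from an AI. Prompts contain user input, and in architectures that pull in external documents or search results, malicious content can also slip into the model's answer. This kind of "indirect prompt injection" has been widely pointed out in recent years, and the more an application uses AI-returned values without validation, the wider the impact becomes.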

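The direction of the countermeasures is no different from conventional web security. Defense in depth is the baseline: escaping at display time, sanitizing with a library such as DOMPurify when HTML must be allowed, and restricting execution with a Content Security Policy (CSP). Treating AI output as text and permitting markup only in limited, necessary places is a design that tends to fail safe.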
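As an illustration of the CSP layer mentioned above, here is a minimal sketch of assembling a `Content-Security-Policy` header value on the server. The helper name `build_csp` and the particular directive set are assumptions chosen for the example, not something the original post prescribes; a real policy has to be tuned to the assets the page actually loads.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into a single Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


# Illustrative policy (an assumption for this sketch).
# script-src omits 'unsafe-inline', so injected inline <script> blocks and
# inline event handlers such as onerror="..." are refused by the browser.
CSP = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'"],
    "object-src": ["'none'"],
    "base-uri": ["'none'"],
})

# The value would be attached to every HTML response.
headers = {"Content-Security-Policy": CSP}
```

A policy like this does not replace escaping; it limits what an injected payload can do if one slips through.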


🔗 MCP / Tooling · Key points of this article

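This issue becomes more important as mechanisms that connect LLMs to external tools and data sources, such as MCP (Model Context Protocol), become widespread. Tool return values and resource contents are reflected in the model's output and eventually handed to the application, which makes the trust boundary harder to see. OpenAI and Anthropic also urge caution in handling external input, and output validation looks set to become assumed baseline knowledge.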

The lesson is simple: validate AI output the same way as user input. Refusing to take generated strings at face value is what leads to a robust implementation.

Developers have spent years learning to distrust user input, sanitizing form fields, escaping query parameters, and validating uploads. Far fewer have applied the same caution to the text returned by large language models. A recent blog post in the MCP category describes how that blind spot led to a self-inflicted HTML injection bug, and it serves as a useful reminder that AI output is just another untrusted data source.

The author's setup was common enough. A feature took a user's question, passed it to an LLM, and rendered the model's answer directly into the page. Because the response came from a trusted backend service rather than a stranger on the internet, it was treated as safe and inserted into the DOM without escaping. The mistake is subtle: the model itself is not malicious, but its output is shaped by its input, and that input can include whatever a user typed. If someone asks the model to produce HTML, a script tag, or an event handler attribute, there is a reasonable chance it will, and that markup then executes in the browser of whoever views the result.
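The difference between the vulnerable path and the safe one can be sketched in a few lines. This is a server-side illustration using Python's standard `html.escape`; the function names and the sample model output are hypothetical, and the original post's actual code is not shown in this summary.

```python
import html


def render_answer_unsafe(answer: str) -> str:
    # Vulnerable: the model's text is interpolated into markup as-is,
    # so any tag or event handler it contains becomes live HTML.
    return f'<div class="answer">{answer}</div>'


def render_answer_escaped(answer: str) -> str:
    # html.escape rewrites &, <, > and (by default) both quote characters,
    # so the browser displays the text instead of interpreting it.
    return f'<div class="answer">{html.escape(answer)}</div>'


# Hypothetical model output, e.g. after a user asked for "an image tag".
model_output = 'Sure, here it is: <img src=x onerror="alert(1)">'

unsafe_html = render_answer_unsafe(model_output)
safe_html = render_answer_escaped(model_output)
```

In `safe_html` the payload shows up as visible text; in `unsafe_html` the `onerror` handler would run in the viewer's browser.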

This is a variation of stored or reflected cross-site scripting, with the language model acting as an intermediary that launders attacker-controlled text into seemingly trusted content. The fix is the same fix that has always applied to dynamic content. Strings must be escaped or sanitized before they reach an HTML context, regardless of where they originated. Rendering Markdown adds another layer, since Markdown can contain raw HTML, so libraries should be configured to strip or neutralize embedded tags, and a Content Security Policy can limit the damage if something slips through. The broader lesson is to stop categorizing inputs by how much you like their origin and start categorizing them by where they end up.
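One way to neutralize raw HTML in Markdown-like output is to escape everything first and then re-introduce only the formatting you chose to support. The sketch below handles just `**bold**` and inline code with regular expressions. It is a deliberately tiny stand-in for a real Markdown library configured to disallow raw HTML, and the function name is an assumption for the example.

```python
import html
import re

_CODE = re.compile(r"`([^`]+)`")
_BOLD = re.compile(r"\*\*(.+?)\*\*")


def render_limited_markdown(text: str) -> str:
    # Step 1: escape the whole string, so any raw HTML the model emitted
    # becomes inert text. Markdown punctuation (* and `) is left untouched.
    escaped = html.escape(text)
    # Step 2: add back only the two constructs this sketch supports.
    escaped = _CODE.sub(r"<code>\1</code>", escaped)
    escaped = _BOLD.sub(r"<strong>\1</strong>", escaped)
    return escaped


rendered = render_limited_markdown(
    "**Done** <script>alert(1)</script> use `x < 1`"
)
```

Because escaping happens before any markup is generated, the only tags in the result are the ones this function wrote itself.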

The point grows sharper in the context of MCP, the Model Context Protocol, which is the post's category. MCP standardizes how AI applications connect to external tools, data sources, and services. It expands the number of channels through which untrusted text can enter a system: a model might read a document, query a database, or call a remote tool, and any of those can return content that the application later renders or, worse, treats as an instruction. Prompt injection, where hidden text in a fetched web page or file manipulates the model's behavior, is the conversational cousin of the HTML injection described here. Both stem from mixing data and instructions without a clear boundary.
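One possible way to keep that boundary visible in code is to tag everything that came from a model or tool as untrusted and to let it act only as data. The following sketch is an illustration under stated assumptions: the `Untrusted` wrapper, the source labels, and the action names are all hypothetical and are not part of MCP or of any SDK.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Untrusted:
    """Text that crossed a trust boundary (model output, tool result, fetched file)."""

    text: str
    source: str  # free-form label for logging, e.g. "llm" or "tool:search"

    def as_html_text(self) -> str:
        # The only sanctioned way to put this value into an HTML context.
        return html.escape(self.text)


# Hypothetical set of actions the application is willing to perform.
ALLOWED_ACTIONS = {"show_answer", "show_sources"}


def choose_action(requested: Untrusted) -> str:
    # Retrieved content may select from a fixed allowlist; it is never
    # evaluated, executed, or used to build a command.
    name = requested.text.strip()
    return name if name in ALLOWED_ACTIONS else "show_answer"
```

With this shape, a tool result that says `delete_all_files` simply falls back to the default action, and rendering goes through `as_html_text()` rather than raw interpolation.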

Several industry efforts are converging on this problem. The OWASP Top 10 for Large Language Model Applications lists prompt injection and insecure output handling among its leading risks, and it explicitly recommends treating model responses as untrusted and applying output encoding before use. Front-end frameworks such as React, Vue, and Angular escape interpolated text by default, which is why the danger often hides in the deliberate escape hatches that developers reach for when they want rich formatting, such as assigning to `innerHTML` directly, React's `dangerouslySetInnerHTML`, or Vue's `v-html`. Sanitization libraries such as DOMPurify exist precisely to clean HTML before it is inserted, and they are likely the most practical defense for a feature that has to display model-generated markup.
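To make the allowlist idea concrete, here is a rough server-side sketch built on Python's standard `html.parser`. It is not DOMPurify and not a production sanitizer; the tag allowlist and class name are assumptions for illustration. It keeps a handful of formatting tags, drops every attribute, discards `<script>` and `<style>` content, and escapes all remaining text.

```python
import html
from html.parser import HTMLParser

# Illustrative allowlist (an assumption for this sketch).
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "code", "pre", "ul", "ol", "li"}
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []          # sanitized output fragments
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            # Attributes are discarded entirely, which removes onclick=,
            # style=, href="javascript:..." and similar vectors.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(html.escape(data))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


cleaned = sanitize(
    '<p onclick="x()">Hi <b>there</b></p>'
    "<script>alert(1)</script><img src=x onerror=alert(1)>"
)
```

Tags outside the allowlist, such as the `<img>` above, are dropped while their text content (if any) is kept as escaped text. In a browser, a maintained library is the safer choice, since hand-rolled sanitizers tend to miss parser edge cases.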

There is a cultural element worth naming. AI output looks fluent and authoritative, which makes it easy to assume it is well-behaved. That perception is misleading from a security standpoint. The model has no concept of trust boundaries, and it will happily reproduce or fabricate strings that break an unescaped template. Treating it like a trusted internal API encourages exactly the shortcut the author took.

The practical takeaways are modest but durable. Escape on output, not just on input, and do it at the boundary where data meets a rendering context. Restrict any rich-text rendering to an allowlist of safe tags. Add a Content Security Policy as defense in depth. Where MCP or tool calls are involved, isolate what the model retrieves from what the application executes. None of this is novel, which is rather the point. The arrival of AI features tempts teams to invent new mental models, when the safer instinct is to apply the old ones. As the author concludes, AI is not a trusted source; it is a sophisticated input, and it deserves the same skepticism as anything a user types.