Claude Mythos in Depth: A 27-Year-Old Zero-Day Found for $50, and Why Existing Defenses Collapsed

Zenn Claude tag · zenn.dev · 2026/05/09 21:00 · 16h ago · 📖 2 min

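AI three-line summary

Explains a report that, as a case of automated vulnerability discovery with Anthropic's Claude, a zero-day vulnerability left in place for 27 years was found for about $50 in API costs.
Considers how LLM-based code auditing is dramatically changing realistic attack and defense costs, and where existing security defenses reach their limits.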

English summary

An analysis of how Claude reportedly uncovered a 27-year-old zero-day vulnerability for around $50 in API costs, illustrating how LLM-driven code auditing is reshaping the economics of offensive and defensive security and exposing the limits of legacy defenses.

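A case in which Anthropic's large language model Claude was used to find a zero-day vulnerability that had gone unnoticed for 27 years, at an API cost of only about $50, has been drawing attention. Taking this "Claude Mythos" as its subject, this article dissects why security defenses are beginning to give way in the LLM era.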

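The article's central claim is a reversal in the economics of code auditing. Until now, finding vulnerabilities buried deep in OSS code long regarded as mature required hours of close reading by skilled reverse engineers. An LLM, however, can survey an entire codebase while detecting semantic anomalies, so it can repeat human-level review at a cost that is orders of magnitude lower. The $50 figure is symbolic and appears to show that the same weapon has been handed to attackers and defenders alike.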

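Technically, long-context models such as Claude may be surfacing logic bugs that static analysis tools have missed, by detecting mismatches in implicit preconditions and boundary conditions between functions. Existing techniques such as signature-based SAST, fuzzing, and formal verification remain useful, but the model they presupposed, in which human attention is the scarce resource, has broken down. As a result, the throughput of code auditing itself is arguably being redefined.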

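Related developments include LLM-based vulnerability discovery by Google's Project Naptime/Big Sleep and in-house AI red teaming at OpenAI and Meta, and bug bounty platforms such as HackerOne and Bugcrowd also report an increase in AI-assisted reports. False positives and "plausible-sounding falsehoods" remain a problem, however, and combining AI with human triage is regarded as the practical answer.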

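In conclusion, this case may symbolize a transitional period in which AI evolves from a mere development-support tool into an "autonomous vulnerability discovery agent." OSS maintainers and corporate security teams will likely need to redesign their patch cycles and threat models on the premise that attackers can run the same techniques at low cost.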

A recent report claims that Anthropic's Claude was used to uncover a zero-day vulnerability that had lingered in code for roughly 27 years, at an API cost of only about $50. The piece takes "Claude Mythos" as its subject and uses the incident as a lens to examine why long-standing security assumptions are starting to crack in the LLM era.

The central argument is economic rather than purely technical. Traditionally, finding deep logic bugs in mature open-source code required scarce, expensive expert attention: a senior reverse engineer slowly building a mental model of an unfamiliar codebase. Long-context LLMs can now traverse entire repositories, reason about cross-function invariants, and flag semantic anomalies repeatedly and cheaply. The symbolic $50 figure underscores that the same capability is now in the hands of both defenders and attackers, which arguably resets the cost curve of vulnerability research.

Technically, models like Claude appear to excel at surfacing three kinds of problem: implicit-precondition mismatches between callers and callees, off-by-one and other boundary-condition errors hidden behind layers of abstraction, and stale assumptions in code paths rarely exercised by tests or fuzzers. Classical defenses such as signature-based SAST, coverage-guided fuzzing, and even formal verification remain useful, but many of them implicitly assumed that human review bandwidth was the scarce resource. With that assumption weakening, audit throughput is being redefined.
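To make "implicit-precondition mismatch" concrete, here is a minimal toy sketch in Python. It is not taken from the report and does not depict the actual vulnerability; the record format and the names `read_field`, `parse_record_buggy`, and `parse_record_checked` are hypothetical. The callee assumes the caller has already bounded the declared length, the caller never does, and each function looks reasonable when read in isolation.

```python
def read_field(buf: bytes, offset: int, length: int) -> bytes:
    """Callee: silently assumes the caller checked offset + length <= len(buf)."""
    # Python slicing truncates instead of raising, so the violation goes unnoticed.
    return buf[offset:offset + length]


def parse_record_buggy(buf: bytes) -> bytes:
    """Caller: checks that a header byte exists, but trusts the declared length."""
    if len(buf) < 1:
        raise ValueError("empty record")
    declared_len = buf[0]  # hypothetical format: first byte declares payload length
    return read_field(buf, 1, declared_len)


def parse_record_checked(buf: bytes) -> bytes:
    """Caller with the callee's precondition made explicit."""
    if len(buf) < 1:
        raise ValueError("empty record")
    declared_len = buf[0]
    if 1 + declared_len > len(buf):
        raise ValueError("declared length exceeds record size")
    return read_field(buf, 1, declared_len)


malformed = bytes([4]) + b"ab"  # declares 4 payload bytes, carries only 2
print(parse_record_buggy(malformed))  # accepted, returns a short payload
```

In a memory-unsafe language the same mismatch would typically read past the end of the buffer rather than return a short payload. Neither function is wrong on its own, which is why a reviewer that holds both in view at once has an advantage.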

The broader ecosystem reflects the same trend. Google's Project Naptime and its Big Sleep follow-up have demonstrated LLM agents discovering real memory-safety bugs in widely deployed software. OpenAI and Meta have publicly discussed internal AI-assisted red teaming, and bug bounty platforms such as HackerOne and Bugcrowd report a rising share of AI-assisted submissions. At the same time, hallucinated vulnerabilities and plausible-sounding but incorrect root-cause analyses remain a genuine concern, so human triage layered on top of AI findings is still the pragmatic norm.
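The "human triage layered on top of AI findings" arrangement can be sketched as a simple filter in front of a review queue. This is an illustrative assumption, not a description of how any of the named organizations work: the `Finding` fields, the `triage` function, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    model_confidence: float  # hypothetical self-reported score, 0.0 to 1.0
    reproduced: bool         # True if an automated proof of concept triggered the bug


def triage(findings, min_confidence=0.5):
    """Split AI findings into a human review queue and a discard pile."""
    review, discarded = [], []
    for f in findings:
        # A reproduced finding always reaches a human, whatever the model's score.
        if f.reproduced or f.model_confidence >= min_confidence:
            review.append(f)
        else:
            discarded.append(f)
    # Reproduced findings first, then by descending confidence.
    review.sort(key=lambda f: (not f.reproduced, -f.model_confidence))
    return review, discarded


findings = [
    Finding("possible integer overflow", 0.9, False),
    Finding("out-of-bounds read", 0.2, True),
    Finding("speculative race condition", 0.1, False),
]
review, discarded = triage(findings)
print([f.title for f in review])
```

Ranking reproduced findings ahead of confident but unreproduced ones reflects the concern raised above: a plausible-sounding explanation is weaker evidence than a working reproduction.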

A few caveats are worth flagging. The headline cost of $50 likely excludes the engineering work to scaffold the agent, design prompts, and validate findings, so the all-in cost per "AI-found zero-day" may be considerably higher than the number suggests. It also remains unclear how well such results generalize beyond the specific class of bug discussed; some categories of vulnerability, particularly those requiring complex runtime state or cryptographic insight, may still resist purely static LLM analysis.
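The gap between the API bill and the all-in cost can be illustrated with simple arithmetic. Every number below is a placeholder chosen for illustration: the token counts, the per-million-token prices, the engineer hours, and the hourly rate are assumptions, not figures from the report and not actual Anthropic pricing.

```python
def api_cost_usd(input_tokens, output_tokens, usd_per_mtok_in, usd_per_mtok_out):
    """API bill for one audit run, given per-million-token prices."""
    return ((input_tokens / 1_000_000) * usd_per_mtok_in
            + (output_tokens / 1_000_000) * usd_per_mtok_out)


def all_in_cost_usd(api_usd, engineer_hours, usd_per_hour):
    """API bill plus the human time spent on scaffolding, prompts, and validation."""
    return api_usd + engineer_hours * usd_per_hour


# Hypothetical inputs, picked only so the API bill lands near the headline figure.
api = api_cost_usd(10_000_000, 1_000_000, usd_per_mtok_in=3.0, usd_per_mtok_out=15.0)
total = all_in_cost_usd(api, engineer_hours=20, usd_per_hour=100.0)
print(f"API only: ${api:.2f}, all-in: ${total:.2f}")
# prints "API only: $45.00, all-in: $2045.00"
```

Under these assumed inputs the human time dominates the total, which is the point of the caveat. The scaffolding portion is largely a one-time cost, though, so the per-finding figure would fall as the same pipeline is reused across more targets.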

Even with those qualifications, the direction of travel seems clear. The episode may mark a transition point where AI shifts from being a coding assistant to behaving as a semi-autonomous vulnerability discovery agent. Open-source maintainers, vendors, and corporate security teams would be wise to assume that adversaries can run similar pipelines at low cost, and to revisit patch cadence, dependency hygiene, and threat models accordingly. The defenders who internalize this shift first are likely to fare best as the asymmetry between attack and defense is rewritten.