Hands-on with a code-graph MCP server: 26.6K lines indexed in about 1.9 seconds, with every stumbling block written down
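- What this is: I am building, on my own, a static security scanner for AI-generated code.
- The design commits fully to completely local, deterministic analysis that uses no LLM at scan time.
- Looking for neighbors with the same philosophy, I was browsing GitHub Trending when DeusD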
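This is a hands-on record by an author who is developing, as a solo project, a static security scanner for AI-generated code, and who tested a "code-graph MCP" server, a tool that turns code into a graph that an AI can consult. Roughly 26,600 lines were indexed in about 1.9 seconds, and the report also documents the stumbling blocks hit during setup without hiding them, which gives it a perspective close to real-world use.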
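MCP (Model Context Protocol) is an open connection standard introduced by Anthropic and since widely adopted; it serves as a common interface between AI models and external tools and data sources. Code-graph MCP servers are described as analyzing functions, classes, and reference and call relationships as a graph, so that an AI can quickly trace where something is called from and what it depends on. Consulting structured relationships is seen as better suited to understanding large codebases and investigating across them than handing the full text to an LLM as-is.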
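The author's own project is a static analysis tool for AI-generated code. Its distinguishing feature is that it uses no LLM at scan time and commits fully to completely local, deterministic analysis; the author says this test began while browsing GitHub Trending in search of "neighbors" with the same philosophy. Indexing in about 1.9 seconds may place the tool among the practically usable speeds for something that runs locally and deterministically.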
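As technical background, parsing foundations such as tree-sitter and LSP (Language Server Protocol) are often used to turn source code into a graph. In adjacent areas, approaches that let a client query function dependencies and impact ranges are being tried in many places, and they are drawing attention as a way to supply context so that AI agents can grasp an entire codebase efficiently.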
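The value of the article lies in recording not only the successes but every stumbling block in the setup steps and configuration. Because MCP servers follow a relatively new specification and tend to get stuck on client-side behavior and on authentication and connection handling, logs from a real machine are likely to be useful to later users. How to bridge LLM-independent deterministic analysis and AI integration looks like a theme that will continue to be examined in future development.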
Code-graph servers built on the Model Context Protocol (MCP) are emerging as a practical way to give large language model assistants structured access to a codebase, and a recent hands-on report walks through indexing a roughly 26.6K-line project in about 1.9 seconds. The test matters because it touches a recurring question for developers: whether deterministic, local tooling can feed an AI coding assistant reliable structure without paying the latency and cost of running another model over the source. The author's background is itself instructive, since they are building a static security scanner for AI-generated code that deliberately avoids calling an LLM during scans, leaning entirely on fully local, deterministic analysis.
MCP is an open protocol that standardizes how AI clients connect to external tools, data, and context through a defined server interface. A code-graph MCP server, in this framing, parses a repository into nodes and edges representing files, symbols, functions, and their relationships, then exposes queries the assistant can call. Rather than dumping raw text into a context window, the server lets the model ask targeted questions such as where a function is defined or what references a symbol. That distinction is the appeal: graph indexing is reproducible and inspectable, which fits the author's stated philosophy of decoupling analysis from probabilistic generation.
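To make the nodes-and-edges idea concrete, here is a minimal sketch of a call graph built with Python's standard `ast` module. It is not the tool from the report: the sample source, `build_call_graph`, and `callers_of` are hypothetical names chosen for illustration, and the sketch only resolves direct calls to plain names inside top-level functions.

```python
import ast

# Hypothetical sample source standing in for one file of a repository.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        callees = graph.setdefault(node.name, set())
        for child in ast.walk(node):
            # Only calls like name(...) are recorded; method calls such as
            # obj.method(...) are skipped in this simplified sketch.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                callees.add(child.func.id)
    return graph

def callers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Answer the 'what references this symbol' style of question."""
    return sorted(name for name, callees in graph.items() if target in callees)

graph = build_call_graph(SAMPLE)
print(callers_of(graph, "parse"))
```

Because the graph comes from a parse rather than a model, running it twice on the same source gives the same answer, which is the reproducibility property the paragraph above describes.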
The headline number, around 1.9 seconds for 26.6K lines, suggests the indexing step is fast enough to be unobtrusive on a mid-sized project. Readers should treat the figure as a single data point rather than a benchmark, because performance is likely to depend on language, file count, hardware, and whether the parse runs cold or warm. Even so, sub-two-second indexing implies the tool relies on conventional parsing rather than embeddings or model inference, which is consistent with deterministic design. For comparison, semantic search tools that compute vector embeddings often trade slower setup and external dependencies for fuzzy matching, whereas a syntactic graph favors precision and speed.
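One way to check a timing claim like this on your own code is to repeat the measurement and look at the first run separately from the later ones. The sketch below is an assumption-laden stand-in: `index_tree` simply parses Python files with `ast`, which is not the indexer from the report, and the sample repository is generated in a temporary directory.

```python
import ast
import tempfile
import time
from pathlib import Path

def index_tree(root: Path) -> int:
    """Stand-in indexer: parse every .py file and return the lines seen."""
    total_lines = 0
    for path in sorted(root.rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        ast.parse(source)  # raises SyntaxError on files that do not parse
        total_lines += len(source.splitlines())
    return total_lines

def time_runs(root: Path, runs: int = 3) -> list[float]:
    """Time several indexing passes; the first is the closest to a cold run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        index_tree(root)
        timings.append(time.perf_counter() - start)
    return timings

def make_sample_repo(root: Path, files: int = 5, lines_per_file: int = 100) -> None:
    """Write a small synthetic repository (hypothetical test data)."""
    for i in range(files):
        body = "\n".join(f"x{n} = {n}" for n in range(lines_per_file))
        (root / f"module_{i}.py").write_text(body + "\n", encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    make_sample_repo(repo)
    line_count = index_tree(repo)
    timings = time_runs(repo)
    print(f"{line_count} lines; first run {timings[0]:.4f}s, best {min(timings):.4f}s")
```

Reporting the line count next to the timings keeps the number comparable across repositories, though file system caching means even the first pass here is not a truly cold run.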
The report's value comes as much from the friction as the speed, since the author commits to documenting every stumbling point. Common pain areas for MCP servers include initial setup and transport configuration, where stdio versus HTTP modes and client expectations frequently cause silent failures. Indexing scope is another, as monorepos, generated files, and unsupported languages can inflate or skew the graph. Querying tends to surface mismatches between what the assistant asks and what the server actually returns, and keeping the index synchronized as files change is a known maintenance concern. Writing these down openly is useful because reproducibility issues are often the real barrier to adoption, not raw capability.
The motivation traces to a search for like-minded projects. The author was browsing GitHub Trending to find tools sharing the same local-first, no-LLM-at-scan-time stance, which points to a broader pattern. A wave of code-intelligence tooling now blends static analysis with AI front ends, and graph-oriented MCP servers sit alongside the Language Server Protocol, tree-sitter parsers, and indexers such as ctags or SCIP. Assistant integrations in modern editors and IDEs increasingly expect this kind of structured backend, and MCP appears positioned as a connective layer rather than a replacement for those underlying parsers.
For context, deterministic security scanning differs sharply from LLM-based review. A deterministic scanner produces the same output for the same input, which matters for audits and CI gates, whereas model-driven review can vary run to run. Pairing a code graph with a scanner is plausible because both depend on accurate symbol resolution; the graph can answer reachability and reference questions that pure pattern matching misses. The risk is over-trusting any index that quietly drops files or mishandles a language, which is why the author's emphasis on writing down failures is sensible.
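The reachability point can be sketched with a breadth-first search over a call graph. The graph below is hand-written, hypothetical data, and `eval` stands in for any sink a scanner might care about; a real scanner would derive the edges from resolved symbols rather than a literal dictionary.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "handle_request": ["parse_body", "render"],
    "parse_body": ["decode"],
    "decode": ["eval"],
    "render": ["escape"],
    "escape": [],
    "health_check": [],
}

def reachable_from(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: everything callable, directly or not, from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def functions_reaching(graph: dict[str, list[str]], sink: str) -> list[str]:
    """List functions from which the sink is transitively reachable."""
    return sorted(
        fn for fn in graph if fn != sink and sink in reachable_from(graph, fn)
    )

print(functions_reaching(CALLS, "eval"))
```

A pattern match on `handle_request` alone would not show the `eval` call two hops away, whereas the traversal does, and sorting the result keeps the output identical from run to run.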
Anyone evaluating a similar setup should verify language coverage, confirm how incremental updates work, and measure indexing on their own repositories before relying on the numbers. The wider takeaway is that MCP is becoming a standard seam between deterministic tooling and AI assistants, and fast, local code graphs look like a reasonable building block. As tooling matures, the projects that document their edge cases candidly are likely to be more trustworthy than those that publish only the headline timings.
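When confirming how incremental updates behave, it can help to have an independent view of which files actually changed between two index runs. The following is a minimal sketch using content hashes; `fingerprint` and `diff_index` are hypothetical helpers and do not reflect how any particular MCP server tracks changes.

```python
import hashlib

def fingerprint(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to a SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def diff_index(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two fingerprints and report what a re-index should touch."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(path for path in shared if old[path] != new[path]),
    }

# Hypothetical before/after snapshots of a tiny repository.
before = fingerprint({"a.py": b"x = 1\n", "b.py": b"y = 2\n"})
after = fingerprint({"a.py": b"x = 1\n", "b.py": b"y = 3\n", "c.py": b"z = 4\n"})
print(diff_index(before, after))
```

Comparing this list with what the server reports as re-indexed is one way to notice an index that quietly drops or skips files.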
The body text and summary on this page were generated automatically by AI. Please check the original article (zenn.dev) for accuracy.