RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

arXiv cs.SE · arxiv.org · 2026/05/27 13:00

AI three-line summary

A study that uses perturbations to test whether code agents actually understand repository context on repository-level benchmarks.

English summary

RepoMirage probes whether code agents genuinely reason about repository context or exploit shortcuts, using controlled perturbations on repository-level benchmarks.

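Code agents perform well on repository-level software engineering benchmarks. It has been unclear, however, whether that success rests on genuine understanding of the context or on surface patterns in the data. The paper "RepoMirage" takes up this question. It proposes a method that systematically tests agents' reasoning by applying perturbations (deliberate modifications) to the repository context.

For the detailed experimental setup and specific results, see the original paper (arXiv:2605.26177). The work appears noteworthy for researchers and practitioners interested in how code agents are evaluated.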

Code agents have demonstrated impressive performance on repository-level software engineering benchmarks, yet it remains an open question whether those results reflect genuine contextual reasoning or exploitation of superficial patterns. RepoMirage addresses this gap by introducing controlled perturbations to repository context and measuring how agents respond, effectively stress-testing their understanding.

The approach is designed to separate genuine reasoning from shortcut-based success. That distinction matters for how the community interprets benchmark scores. By systematically altering repository signals, the authors can probe which aspects of the repository context the agents actually rely on.

Full experimental details, datasets, and quantitative findings are available in the paper (arXiv:2605.26177). Readers should consult the source directly to verify specific claims, because only the abstract was available at the time of writing. This work appears relevant to researchers evaluating or building code agents for real-world software engineering tasks.

#arxiv #paper #code-agents #benchmark #robustness #repository-level #evaluation

Source: arXiv cs.SE (T1)
Source avg: ★ 2.0
Type: Paper
Importance: ★ Normal (top 93% in Papers / Benchmarks)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/27 20:00


The body text and summaries on this page were generated automatically by AI. Check the original article (arxiv.org) for accuracy.
