SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

arXiv cs.SE · arxiv.org · 2026/05/27 13:00

AI summary

SetupX is a benchmark that studies whether LLM agents can learn from past failures to correctly configure execution environments for code repositories.

This paper, released as arXiv:2605.26186, introduces SetupX, a benchmark targeting what the authors call "functionality-correct repository setup": the task of configuring execution environments, including dependencies and build scripts, so that a repository's functionality can be successfully exercised.

The central research question is whether LLM-based agents can leverage records of past failures to improve their setup success rates over time. This reflects a broader interest in self-improving agents within the software engineering domain. The benchmark appears designed to assess practical, real-world environment configuration challenges rather than purely syntactic code generation.

Full details on methodology, dataset construction, baseline results, and agent learning strategies are available in the paper itself. Readers interested in agentic software engineering or automated DevOps tooling should consult the source for precise findings.

#arxiv #paper #llm-agents #benchmark #repository-setup #software-engineering #autonomous-agents

Source: arXiv cs.SE (T1)
Source Avg: ★ 2.0
Type: Paper
Importance: ★ Normal (top 93% in Papers / Benchmarks)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/27 20:00

Read the original article: arxiv.org

The body text and summary on this page were generated automatically by AI. Please check the original article (arxiv.org) for accuracy.
