VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

arXiv cs.SE · arxiv.org · 2026/05/27 13:00

AI summary

VISTA is a new benchmark for evaluating LLM-based agents on end-to-end web-app generation from visual specifications.

VISTA (VIsual Spec-To-App Benchmark) is a benchmark for evaluating how well LLM-based coding agents generate web applications end to end. Unlike conventional code-generation benchmarks that focus on isolated functions or snippets, VISTA targets the full pipeline from a visual specification, such as a UI mockup or design image, to a functioning web application.

The specific metrics, task categories, and dataset scale are detailed in the paper and should be verified at the source. Based on the abstract, the benchmark appears to treat visual understanding and code synthesis as a single task rather than separate subtasks, measuring the path from UI design through code generation to checking that the resulting app works.

VISTA may serve as a reference point for researchers working on multimodal coding agents and automated front-end development. For the full methodology, see arXiv:2605.26144.

#arxiv #benchmark #paper #llm #web-development #code-generation #multimodal #agent

Source: arXiv cs.SE · T1
Source Avg: ★ 2.0
Type: Paper
Importance: ★ Normal (top 93% in Papers / Benchmarks)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/27 20:00

The body text and summary on this page were generated automatically by AI. Please check the original article (arxiv.org) for accuracy.
