#long-horizon — TECH Dashboard

paper research 2w ago

arxiv-cs-lg

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Medium priority · paper/research · Papers / Benchmarks · Published Jun 1

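AI summary: LongDS-Bench is a new benchmark that reproduces the iterative, long-horizon nature of real-world data analysis. The research systematically exposes weaknesses of AI agents that existing evaluation methods failed to capture.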

EN arXiv:2605.30434v1 Announce Type: new Abstract: Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability t

#agent #arxiv #paper +5

arxiv.org →


paper research 3w ago

arxiv-cs-ai

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing "Anchor", a method for mitigating the artifact drift that arises when generating benchmarks for AI agents.

EN Anchor is a proposed method to reduce artifact drift when generating benchmarks for AI agents tackling long-horizon enterprise tasks.

#agent #arxiv #benchmark +6

arxiv.org →


#long-horizon 2 total
