Papers / Benchmarks ⚠ Possibly outdated

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

arXiv cs.AI · arxiv.org · 2026/05/27 13:00 · 📖 1 min


AI 3-line summary

OmniToM is a new benchmark for evaluating Theory of Mind in LLMs. It moves beyond end-to-end tasks by requiring explicit belief modeling across knowledge, intentions, and emotions.


Theory of Mind (ToM), the capacity to infer others' knowledge, intentions, and emotions, is a key dimension for assessing higher-order reasoning in large language models. Existing evaluations typically rely on end-to-end tasks that do not isolate individual belief-modeling steps. This makes it difficult to diagnose where a model succeeds or fails.

OmniToM addresses this gap with a benchmark centered on explicit belief modeling across knowledge, intentions, and emotions, which enables more granular analysis of ToM capabilities in LLMs. The paper was posted to arXiv on May 27, 2026 (arXiv:2605.26322). Dataset construction details, task categories, and experimental results should be verified at the source. The work appears relevant to researchers studying social cognition and pragmatic reasoning in AI systems.
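The following sketch illustrates the difference between end-to-end scoring and belief-level scoring, using a classic Sally-Anne false-belief story. It does not reproduce OmniToM's data format or metrics. The `Event` structure, the `track_beliefs` and `score` functions, and the metric names are all hypothetical. The sketch assumes that a belief is just an object's location and that an agent updates a belief only when present for an event.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical story step: `mover` puts `obj` at `location` while `present` watch."""
    mover: str
    obj: str
    location: str
    present: frozenset[str]


def track_beliefs(events: list[Event], agents: list[str]) -> list[dict[str, dict[str, str]]]:
    """Gold belief state after each step: agent -> {object: believed location}.

    Assumption for this sketch: an agent updates a belief only when present,
    so an absent agent keeps a stale (possibly false) belief.
    """
    beliefs: dict[str, dict[str, str]] = {a: {} for a in agents}
    trace = []
    for ev in events:
        for a in agents:
            if a in ev.present:
                beliefs[a][ev.obj] = ev.location
        # Copy the inner dicts so later steps do not overwrite this snapshot.
        trace.append({a: dict(b) for a, b in beliefs.items()})
    return trace


def score(predicted, gold, agent: str, obj: str) -> dict[str, object]:
    """Contrast a granular belief-state metric with an end-to-end final answer."""
    # One comparison per (step, agent) belief state; a missing agent counts as wrong.
    pairs = [(p.get(a), g[a]) for p, g in zip(predicted, gold) for a in g]
    belief_acc = sum(p == g for p, g in pairs) / len(pairs)
    # End-to-end question: "where will `agent` look for `obj`?"
    final_ok = predicted[-1].get(agent, {}).get(obj) == gold[-1][agent][obj]
    return {"belief_state_accuracy": belief_acc, "final_answer_correct": final_ok}


# Sally-Anne: Anne moves the marble while Sally is away.
agents = ["Sally", "Anne"]
story = [
    Event("Sally", "marble", "basket", frozenset({"Sally", "Anne"})),
    Event("Anne", "marble", "box", frozenset({"Anne"})),
]
gold = track_beliefs(story, agents)

# Hypothetical model output: forgets that Anne saw the first placement.
predicted = [
    {"Sally": {"marble": "basket"}, "Anne": {}},
    {"Sally": {"marble": "basket"}, "Anne": {"marble": "box"}},
]
print(score(predicted, gold, "Sally", "marble"))
# → {'belief_state_accuracy': 0.75, 'final_answer_correct': True}
```

In this example the hypothetical prediction answers the end-to-end question correctly: Sally will look in the basket. However, one of its four intermediate belief states is wrong. A final-answer metric alone would not reveal that error.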

#arxiv #paper #theory-of-mind #benchmark #llm-evaluation #belief-modeling #social-reasoning

Source: arXiv cs.AI (T2)
Source avg: ★ 2.0
Type: Paper
Importance: ★ Normal (top 93% in Papers / Benchmarks)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/28 09:00

Read the original article: arxiv.org

The body text and summary on this page were generated automatically by AI. Check the original article (arxiv.org) for accuracy.
