LIVE · 05/15
Today 96
Total 275
Major 11
Active sources 13/51
Updated just now
Daily Summary

Today's Updates

Today 96 ▼ 70%
Yesterday 319
7-day 597
[Bar chart: Last 7 days of daily article counts, 05/09–05/15; values in the table below]

Past-day counts may shrink over time due to retention policies (per-source cap / half-life). Use this chart as a rough gauge of activity over the last 1–2 days.

Last 7 days article counts
Date        Count
2026-05-09      5
2026-05-10      7
2026-05-11     38
2026-05-12     48
2026-05-13     84
2026-05-14    319
2026-05-15     96
Top stories 05/15 · 10 items (tags: Important / REL Release)
  1. 01 research Test of Time: Rethinking Temporal Signal of Benchmark Contamination. AI summary: not yet generated; a subsequent Worker run will fill in the body. arXiv:2509.00072v4 Announce Type: replace Abstract: Post-cutoff performance decay of LLMs has been widely interpreted as a temporal signal for benchmark contamination, where public information release [arxiv-cs-ai]
  2. 02 research [JA] How Codex Found a Vulnerability That Claude Missed: C3 × Codex Parallel Review (v2.5.0–v2.6.0). Previous article: https://zenn.dev/satoh_y_0323/articles/3ead52ca37f3e5 C3 GitHub: https://github.com/satoh-y-0323/claude-code-condu [zenn-ai]
  3. 03 research Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents. AI summary: not yet generated; a subsequent Worker run will fill in the body. arXiv:2605.12620v1 Announce Type: new Abstract: Building generalist embodied agents capable of solving complex real-world tasks remains a fundamental challenge in AI. Multimodal Large Language Models [arxiv-cs-ai]
  4. 04 claude Introducing Claude Opus 4.7 (anthropic-news). AI summary: not yet generated; a subsequent Worker run will fill in the body. Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users [anthropic-news]
  5. 05 claude Introducing Claude Design by Anthropic Labs (anthropic-news). AI summary: not yet generated; a subsequent Worker run will fill in the body. Today, we're launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude D [anthropic-news]
  6. 06 research [JA] The Break-Even Point for Indie Developers Paying for AI: How Much per Month Is Worth It? [2026 Edition]. "Is it worth paying for AI tools?" A summary of my thinking on AI subscriptions while continuing indie development. First, take stock of what's free: before considering paid plans, list what you can already use at no cost. Gemini API → free (500 calls/day [zenn-ai]
  7. 07 local-llm [JA] [Codex] CLI Setup and Usage Guide. Introduction: I normally use OpenAI's ChatGPT (Plus plan) as a sounding board and Anthropic's Claude Code as an AI coding agent, but it turns out that Codex, OpenAI's AI coding agent, [qiita-llm]
  8. 08 local-llm [JA] The Tide Turned by Mythos: Week 2 of May 2026, When AI Agents Began Encroaching on Reality. In the week after Golden Week, while the rest of the world talked about the May blues, tectonic shifts in the AI industry surfaced all at once. Anthropic's "Claude Mythos Preview" reached the point of being named outright in government Diet answers and an FSA working group, and AI agent [qiita-llm]
  9. 09 tech-news What the jury will actually decide in the case of Elon Musk vs. Sam Altman (techcrunch). AI summary: not yet generated; a subsequent Worker run will fill in the body. Here's what the biggest tech court case of the year is all about. [techcrunch]
  10. 10 copilot [JA] Four Ways the GitHub Copilot app Reduces Everyday Friction. Introduction: On May 14, 2026, the GitHub Copilot app was announced as a technical preview. GitHub Copilot app is now available in technica [zenn-copilot]
🔥 Today's Top 3 (importance × recency)
  1. langchain-core==1.4.0 · agent-fw update · langchain-releases · 3d ago
  2. Test of Time: Rethinking Temporal Signal of Benchmark Contamination · arxiv-cs-ai · 1m ago
  3. [JA] How Codex Found a Vulnerability That Claude Missed: C3 × Codex Parallel Review (v2.5.0–v2.6.0) · zenn-ai · 11h ago

Timeline 275 total · page 1/10

TODAY 30 entries
NEW blog claude just now · anthropic-news

Introducing Claude Opus 4.7

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users

anthropic.com
NEW blog claude just now · anthropic-news

Introducing Claude Design by Anthropic Labs

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude D

anthropic.com
NEW paper research 1m ago · arxiv-cs-ai

Test of Time: Rethinking Temporal Signal of Benchmark Contamination

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2509.00072v4 Announce Type: replace Abstract: Post-cutoff performance decay of LLMs has been widely interpreted as a temporal signal for benchmark contamination, where public information release

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12620v1 Announce Type: new Abstract: Building generalist embodied agents capable of solving complex real-world tasks remains a fundamental challenge in AI. Multimodal Large Language Models

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Macro-Action Based Multi-Agent Instruction Following through Value Cancellation

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12655v1 Announce Type: new Abstract: Multi-agent reinforcement learning (MARL) in real-world use cases may need to adapt to external natural language instructions that interrupt ongoing beh

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12673v1 Announce Type: new Abstract: Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hackin

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Revealing Interpretable Failure Modes of VLMs

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12674v1 Announce Type: new Abstract: Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to general

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Learning Transferable Latent User Preferences for Human-Aligned Decision Making

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12682v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as reasoning modules in many applications. While they are efficient in certain tasks, LLMs often stru

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

On the Size Complexity and Decidability of First-Order Progression

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12691v1 Announce Type: new Abstract: Progression, the task of updating a knowledge base to reflect action effects, generally requires second-order logic. Identifying first-order special cas

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12702v1 Announce Type: new Abstract: General-purpose safety benchmarks for large language models do not adequately evaluate disability-related harms. We introduce DisaBench: a taxonomy of t

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

CHAL: Council of Hierarchical Agentic Language

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12718v1 Announce Type: new Abstract: Multi-agent debate has emerged as a promising approach for improving LLM reasoning on ground-truth tasks, yet current methodologies face certain structu

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12730v1 Announce Type: new Abstract: Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

State-Centric Decision Process

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12755v1 Announce Type: new Abstract: Language environments such as web browsers, code terminals, and interactive simulations emit raw text rather than states, and provide none of the runtim

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12835v1 Announce Type: new Abstract: Large language models can extract local causal claims from text, but those claims become more useful when organized as persistent, navigable world model

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Multimodal Hidden Markov Models for Persistent Emotional State Tracking

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12838v1 Announce Type: new Abstract: Tracking an interpretable emotional arc of a conversation via the sentiment of individual utterances processed as a whole is central to both understandi

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12856v2 Announce Type: new Abstract: The emergence of multi-agent systems introduces novel moderation challenges that extend beyond content filtering. Agents with malicious intent may contr

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12894v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly deployed in settings where they interact with a wide variety of people, including users who are uncle

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12922v1 Announce Type: new Abstract: Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Sustaining AI safety: Control-theoretic external impossibility, intrinsic necessity, and structural requirements

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12963v1 Announce Type: new Abstract: As AI systems become increasingly capable, safety strategies must be evaluated not only by how much they reduce present risk, but by whether they could

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Position: Agentic AI System Is a Foreseeable Pathway to AGI

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12966v1 Announce Type: new Abstract: Is monolithic scaling the only path to AGI? This paper challenges the dogma that purely scaling a single model is sufficient to achieve Artificial Gener

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12975v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has become a standard approach for knowledge-intensive question answering, but existing systems remain brittle on m

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Useful Memories Become Faulty When Continuously Updated by LLMs

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12978v1 Announce Type: new Abstract: Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.12988v1 Announce Type: new Abstract: Students learning algorithms often need support as they interpret traces, debug reasoning errors, and apply procedures across unfamiliar problem instanc

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13037v1 Announce Type: new Abstract: Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rat

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

An Agentic LLM-Based Framework for Population-Scale Mental Health Screening

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13046v1 Announce Type: new Abstract: Mental health disorders affect millions worldwide, and healthcare systems are increasingly overwhelmed by the volume of clinical data generated from ele

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13130v1 Announce Type: new Abstract: Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace c

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

A Constraint Programming Approach for $n$-Day Lookahead Playoff Clinching

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13142v1 Announce Type: new Abstract: In professional sports, a team has clinched the playoffs if they are guaranteed a postseason spot, regardless of the outcomes of any remaining games. As

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13153v1 Announce Type: new Abstract: Temporal Knowledge Graph Reasoning (TKGR) aims at inferring missing (especially future) events from historical data. Current evaluation in TKGR uniforml

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13171v1 Announce Type: new Abstract: As automated reasoning systems advance rapidly, there is a growing need for research-level formal mathematical problems to accurately evaluate their cap

arxiv.org
NEW paper research 1m ago · arxiv-cs-ai

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

AI summary: not yet generated; a subsequent Worker run will fill in the body.

EN arXiv:2605.13213v1 Announce Type: new Abstract: Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse

arxiv.org