LIVE · 05/15
Today 247
Total 266
Major 11
Active sources 13/51
Updated just now
Daily Summary

Today's Updates

Today 247 (▼ 23% vs yesterday)
Yesterday 320
Last 7 days 749

[Bar chart: daily article counts for 05/09–05/15, values 5, 7, 38, 48, 84, 320, 247; see the table below]

Past-day counts may shrink over time due to the retention policy (per-source cap / half-life). Use this chart as a rough gauge of activity over the last 1–2 days.
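The retention note above can be sketched as a simple model. The exponential half-life form, the 7-day half-life, and the cap of 100 below are illustrative assumptions, not the dashboard's actual parameters:

```python
def decayed_count(raw_count: int, age_days: float,
                  half_life_days: float = 7.0,
                  per_source_cap: int = 100) -> int:
    """Apply a per-source cap, then exponential half-life decay.

    The cap and half-life values are placeholders for illustration,
    not the dashboard's real retention settings.
    """
    capped = min(raw_count, per_source_cap)
    return round(capped * 0.5 ** (age_days / half_life_days))

# A day that originally logged 320 articles shrinks as it ages:
print(decayed_count(320, 0))   # capped to 100 on day zero
print(decayed_count(320, 7))   # 50 after one half-life
```

Under a model like this, yesterday's bar is near its true count while week-old bars have already lost most of their height, which is why the chart is only a rough gauge of the last 1–2 days.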

Last 7 days article counts
Date        Count
2026-05-09      5
2026-05-10      7
2026-05-11     38
2026-05-12     48
2026-05-13     84
2026-05-14    320
2026-05-15    247
Top stories · 05/15 · 10 items · badges: Important / REL (Release)
  1. research Test of Time: Rethinking Temporal Signal of Benchmark Contamination. AI summary not yet generated; a later Worker run will fill in the body. arXiv:2509.00072v4 Announce Type: replace Abstract: Post-cutoff performance decay of LLMs has been widely interpreted as a temporal signal for benchmark contamination, where public information release [arxiv-cs-ai]
  2. research [ja] How Codex Found a Vulnerability That Claude Missed: C3 × Codex Parallel Review (v2.5.0–v2.6.0). Previous article: https://zenn.dev/satoh_y_0323/articles/3ead52ca37f3e5 C3 GitHub: https://github.com/satoh-y-0323/claude-code-condu [zenn-ai]
  3. research Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents. AI summary not yet generated; a later Worker run will fill in the body. arXiv:2605.12620v1 Announce Type: new Abstract: Building generalist embodied agents capable of solving complex real-world tasks remains a fundamental challenge in AI. Multimodal Large Language Models [arxiv-cs-ai]
  4. claude Introducing Claude Opus 4.7. AI summary not yet generated; a later Worker run will fill in the body. Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users [anthropic-news]
  5. claude Introducing Claude Design by Anthropic Labs. AI summary not yet generated; a later Worker run will fill in the body. Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude D [anthropic-news]
  6. tech-news From policy to practice: supporting the future of AI in education. AI summary not yet generated; a later Worker run will fill in the body. Explore insights from our global AI Policy Labs on building a safe, equitable and teacher-led future for every learner. [google-keyword]
  7. local-llm [ja] Maintaining Context Across Sessions: Design Guidelines for a "Memory Layer" for AI Agents. Introduction: for developers building LLM applications and agent systems, this article unpacks, from a system-architecture perspective, why current AI cannot retain memories and why RAG and summarization are not enough. Stopgap patches [qiita-llm]
  8. local-llm [ja] Using AI for Business Automation Changed How I Think About AI. Introduction: for a while, I thought generative AI might greatly change how systems and workflows are built. Not anything as extreme as "AI replacing developers"; I mean it in a more down-to-earth sense. Until now [qiita-llm]
  9. research [ja] A Blueprint for Running Operations 24/7 with Claude Code Scheduled Tasks. Hi, I'm Eris. Lately, the highest-ROI mechanism in Claude Code has been scheduled tasks (a cron-like scheme that periodically launches Claude itself): SNS posting, Zenn article writing, KPI aggregation, repository audits, all Cl [zenn-ai]
  10. tech-news What the jury will actually decide in the case of Elon Musk vs. Sam Altman. AI summary not yet generated; a later Worker run will fill in the body. Here's what the biggest tech court case of the year is all about. [techcrunch]
🔥 Today's Top 3 (importance × recency)
  1. langchain-core==1.4.0 (agent-fw update) · langchain-releases · 3d ago
  2. Test of Time: Rethinking Temporal Signal of Benchmark Contamination · arxiv-cs-ai · 4h ago
  3. How Codex Found a Vulnerability That Claude Missed: C3 × Codex Parallel Review (v2.5.0–v2.6.0) [ja] · zenn-ai · 15h ago

Timeline 266 total · page 1/9

TODAY 30 entries
NEW blog claude just now · anthropic-news

Introducing Claude Opus 4.7

AI summary: not yet generated; a later Worker run will fill in the body.

EN Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users

anthropic.com
NEW blog claude just now · anthropic-news

Introducing Claude Design by Anthropic Labs

AI summary: not yet generated; a later Worker run will fill in the body.

EN Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude D

anthropic.com
NEW blog tech-news 1m ago · google-keyword

From policy to practice: supporting the future of AI in education

AI summary: not yet generated; a later Worker run will fill in the body.

EN Explore insights from our global AI Policy Labs on building a safe, equitable and teacher-led future for every learner.

blog.google
NEW blog local-llm 26m ago · qiita-llm

JA Maintaining Context Across Sessions: Design Guidelines for a "Memory Layer" for AI Agents

AI summary: Introduction: for developers building LLM applications and agent systems, this article unpacks, from a system-architecture perspective, why current AI cannot retain memories and why RAG and summarization are not enough. Stopgap patches

qiita.com
NEW blog local-llm 2h ago · qiita-llm

JA Using AI for Business Automation Changed How I Think About AI

AI summary: Introduction: for a while, I thought generative AI might greatly change how systems and workflows are built. Not anything as extreme as "AI replacing developers"; I mean it in a more down-to-earth sense. Until now

qiita.com
NEW blog local-llm 3h ago · qiita-llm

JA Calling the LLM API Directly with the OpenAI SDK: A Minimal Setup for Indie Developers

AI summary: Introduction: I share the experience of an indie developer who canceled ChatGPT Plus and Claude Pro and switched to calling the API directly. Pay-as-you-go billing often keeps costs to roughly a few hundred to a thousand yen per month, versus a fixed subscription. This article covers the OpenAI SD

qiita.com
NEW paper research 4h ago · arxiv-cs-ai

Test of Time: Rethinking Temporal Signal of Benchmark Contamination

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2509.00072v4 Announce Type: replace Abstract: Post-cutoff performance decay of LLMs has been widely interpreted as a temporal signal for benchmark contamination, where public information release

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12620v1 Announce Type: new Abstract: Building generalist embodied agents capable of solving complex real-world tasks remains a fundamental challenge in AI. Multimodal Large Language Models

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Macro-Action Based Multi-Agent Instruction Following through Value Cancellation

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12655v1 Announce Type: new Abstract: Multi-agent reinforcement learning (MARL) in real-world use cases may need to adapt to external natural language instructions that interrupt ongoing beh

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12673v1 Announce Type: new Abstract: Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hackin

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Revealing Interpretable Failure Modes of VLMs

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12674v1 Announce Type: new Abstract: Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to general

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Learning Transferable Latent User Preferences for Human-Aligned Decision Making

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12682v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as reasoning modules in many applications. While they are efficient in certain tasks, LLMs often stru

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

On the Size Complexity and Decidability of First-Order Progression

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12691v1 Announce Type: new Abstract: Progression, the task of updating a knowledge base to reflect action effects, generally requires second-order logic. Identifying first-order special cas

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12702v1 Announce Type: new Abstract: General-purpose safety benchmarks for large language models do not adequately evaluate disability-related harms. We introduce DisaBench: a taxonomy of t

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

CHAL: Council of Hierarchical Agentic Language

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12718v1 Announce Type: new Abstract: Multi-agent debate has emerged as a promising approach for improving LLM reasoning on ground-truth tasks, yet current methodologies face certain structu

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12730v1 Announce Type: new Abstract: Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

State-Centric Decision Process

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12755v1 Announce Type: new Abstract: Language environments such as web browsers, code terminals, and interactive simulations emit raw text rather than states, and provide none of the runtim

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12835v1 Announce Type: new Abstract: Large language models can extract local causal claims from text, but those claims become more useful when organized as persistent, navigable world model

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Multimodal Hidden Markov Models for Persistent Emotional State Tracking

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12838v1 Announce Type: new Abstract: Tracking an interpretable emotional arc of a conversation via the sentiment of individual utterances processed as a whole is central to both understandi

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12856v2 Announce Type: new Abstract: The emergence of multi-agent systems introduces novel moderation challenges that extend beyond content filtering. Agents with malicious intent may contr

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12894v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly deployed in settings where they interact with a wide variety of people, including users who are uncle

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12922v1 Announce Type: new Abstract: Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Sustaining AI safety: Control-theoretic external impossibility, intrinsic necessity, and structural requirements

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12963v1 Announce Type: new Abstract: As AI systems become increasingly capable, safety strategies must be evaluated not only by how much they reduce present risk, but by whether they could

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Position: Agentic AI System Is a Foreseeable Pathway to AGI

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12966v1 Announce Type: new Abstract: Is monolithic scaling the only path to AGI? This paper challenges the dogma that purely scaling a single model is sufficient to achieve Artificial Gener

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12975v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has become a standard approach for knowledge-intensive question answering, but existing systems remain brittle on m

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Useful Memories Become Faulty When Continuously Updated by LLMs

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12978v1 Announce Type: new Abstract: Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.12988v1 Announce Type: new Abstract: Students learning algorithms often need support as they interpret traces, debug reasoning errors, and apply procedures across unfamiliar problem instanc

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.13037v1 Announce Type: new Abstract: Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rat

arxiv.org
NEW paper research 4h ago · arxiv-cs-ai

An Agentic LLM-Based Framework for Population-Scale Mental Health Screening

AI summary: not yet generated; a later Worker run will fill in the body.

EN arXiv:2605.13046v1 Announce Type: new Abstract: Mental health disorders affect millions worldwide, and healthcare systems are increasingly overwhelmed by the volume of clinical data generated from ele

arxiv.org