LIVE · 05/11
research Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study [arxiv-cs-cl]
vscode REL Zed nightly: Improved auto-watch feature (#56126) / nightly: auto_update: Fix Windows installer task arguments syntax (#50464) [zed-releases]
research Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas [arxiv-cs-cl]
research VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing [arxiv-cs-cl]
research IntentGrasp: A Comprehensive Benchmark for Intent Understanding [arxiv-cs-cl]
research TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP [arxiv-cs-cl]
research MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes [arxiv-cs-cl]
research Reflections and New Directions for Human-Centered Large Language Models [arxiv-cs-cl]
research MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text [arxiv-cs-cl]
research Can LLMs Take Retrieved Information with a Grain of Salt? [arxiv-cs-cl]
research MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media [arxiv-cs-cl]
research Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries [arxiv-cs-cl]
research Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion [arxiv-cs-cl]
research Cognitive Agent Compilation for Explicit Problem Solver Modeling [arxiv-cs-cl]
research NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models [arxiv-cs-cl]
research GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations [arxiv-cs-cl]
research MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments [arxiv-cs-cl]
research WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems [arxiv-cs-cl]
research Self-Consolidating Language Models: Continual Knowledge Incorporation from Context [arxiv-cs-cl]
research Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation [arxiv-cs-cl]
research The Translation Tax Is Not a Scalar: A Counterfactual Audit of English-Source Cue Inheritance in Chinese Multilingual Benchmarks [arxiv-cs-cl]
research SAGE: Hierarchical LLM-Based Literary Evaluation through Ontology-Grounded Interpretive Dimensions [arxiv-cs-cl]
research Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning [arxiv-cs-cl]
research Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability [arxiv-cs-cl]
Today 71
Total 244
Major 3
Active sources 12/51
Updated just now
Daily Summary

Today's Updates

Today 71 ▲ 31%
Yesterday 54
7-day 497
Last 7 days article counts
Date | Count
2026-05-05 | 34
2026-05-06 | 39
2026-05-07 | 65
2026-05-08 | 80
2026-05-09 | 154
2026-05-10 | 54
2026-05-11 | 71
Top stories 05/11 · 9 items
  1. 01 research Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study arXiv:2605.07453v1 Announce Type: new Abstract: Ancient and endangered languages pose a unique challenge for NLP: their datasets are inherently scarce, difficult to expand, and built from formulaic co [arxiv-cs-cl]
  2. 02 vscode REL Zed nightly: Improved auto-watch feature (#56126) / nightly: auto_update: Fix Windows installer task arguments syntax (#50464) Zed's nightly build improves the debugger's auto-watch feature (#56126), which automatically registers variables as watch expressions; the change appears aimed at a smoother debugging experience. [zed-releases]
  3. 03 research Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas arXiv:2605.06673v1 Announce Type: new Abstract: Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, unde [arxiv-cs-cl]
  4. 04 codex OpenAI Campus Network: Student club interest form Join the OpenAI Campus Network: connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community. [openai-news]
  5. 05 codex How enterprises are scaling AI How enterprises scale AI: from early experiments to compounding impact through trust, governance, workflow design, and quality at scale. [openai-news]
  6. 06 cursor DemoSystems Inc. launches AI-powered CRM "DemoCRM" DemoSystems Inc. has begun offering "DemoCRM", an AI-powered CRM that supports customer management, review of interaction history, and decisions on next actions in sales activities [zenn-cursor]
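  7. 07 local-llm Part 2: A Declaration of Independence for Protocol Engineering: Why AI "Summaries" Destroy the Contours of Intelligence [Theory-Difference Awareness Protocol: RESOLUTION_CONTRAST] [Theory-Difference Awareness Protocol] purpose = "to compare, and become aware of, the difference between a term as understood from a summary and the actual theory" term = "Protocol Engineering" source URL = "htt [zenn-llm]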
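  8. 08 local-llm How switching everything to Opus filled up the 5-hour window: decision criteria for splitting skill model allocation into three tiers "All Opus for the best quality" hit its limit sooner than expected. The first thing I did after starting on a Claude Code subscription plan was to set every important-looking skill to Opus. On the idea that "a smarter model improves quality," the judgment-type skills and [zenn-llm]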
  9. 09 vscode Fixing diffs not appearing in VS Code when using tmux + Claude Code Prerequisites: macOS; VS Code; tmux running in the VS Code integrated terminal; Claude Code running in tmux. This covers the nested setup VS Code integrated terminal -> tmux -> Claude Code. What happened [qiita-vscode]
🔥 Today's Top 3 importance × recency
  1. Zed Editor Releases v1.1.7 zed-releases 2d ago
  2. Zed Editor Releases v1.2.2-pre zed-releases 2d ago
  3. Zed Editor Releases v1.1.5-pre zed-releases 5d ago

Timeline 244 total · page 1/9

TODAY 30 entries
NEW release vscode 4h ago · zed-releases

Zed nightly: Improved auto-watch feature (#56126) / nightly: auto_update: Fix Windows installer task arguments syntax (#50464)

AI summary: Zed's nightly build improves the debugger's auto-watch feature (#56126), which automatically registers variables as watch expressions; the change appears aimed at a smoother debugging experience.

github.com
NEW paper research 6h ago · arxiv-cs-cl

MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media

EN arXiv:2605.06940v1 Announce Type: new Abstract: Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set i

arxiv.org
NEW paper research 6h ago · arxiv-cs-cl

NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models

EN arXiv:2605.07051v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and

arxiv.org
NEW paper research 6h ago · arxiv-cs-cl

The Translation Tax Is Not a Scalar: A Counterfactual Audit of English-Source Cue Inheritance in Chinese Multilingual Benchmarks

EN arXiv:2605.07093v1 Announce Type: new Abstract: The Translation Tax is often treated as a scalar: translated benchmarks are assumed to inflate scores by preserving English-source cues. We audit this c

arxiv.org
NEW paper research 6h ago · arxiv-cs-cl

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

EN arXiv:2605.07110v1 Announce Type: new Abstract: Computer-use agents (CUAs) are moving from bounded benchmarks toward real software environments, where they operate browsers, desktops, mobile applications,

arxiv.org