LIVE · 05/11
research · Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study [arxiv-cs-cl]
research · Code World Model Preparedness Report [arxiv-cs-se]
claude · Summarizing my understanding of the TOCTOU I found in claude-code (title translated from Japanese) [zenn-claude]
vscode REL · Zed nightly: Improve auto-watch feature (#56126) / nightly: auto_update: Fix Windows installer task arguments syntax (#50464) [zed-releases]
research · Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas [arxiv-cs-cl]
research · VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing [arxiv-cs-cl]
research · IntentGrasp: A Comprehensive Benchmark for Intent Understanding [arxiv-cs-cl]
research · TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP [arxiv-cs-cl]
research · MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes [arxiv-cs-cl]
research · Reflections and New Directions for Human-Centered Large Language Models [arxiv-cs-cl]
research · MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text [arxiv-cs-cl]
research · Can LLMs Take Retrieved Information with a Grain of Salt? [arxiv-cs-cl]
research · MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media [arxiv-cs-cl]
research · Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries [arxiv-cs-cl]
research · Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion [arxiv-cs-cl]
research · Cognitive Agent Compilation for Explicit Problem Solver Modeling [arxiv-cs-cl]
research · NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models [arxiv-cs-cl]
research · GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations [arxiv-cs-cl]
research · MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments [arxiv-cs-cl]
research · WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems [arxiv-cs-cl]
research · Self-Consolidating Language Models: Continual Knowledge Incorporation from Context [arxiv-cs-cl]
research · Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation [arxiv-cs-cl]
research · The Translation Tax Is Not a Scalar: A Counterfactual Audit of English-Source Cue Inheritance in Chinese Multilingual Benchmarks [arxiv-cs-cl]
research · SAGE: Hierarchical LLM-Based Literary Evaluation through Ontology-Grounded Interpretive Dimensions [arxiv-cs-cl]
Today 131
Total 438
Major 15
Active sources 23/51
Updated just now
Daily Summary

Today's Updates

Today 131 ▲ 143%
Yesterday 54
7-day 557
Last 7 days article counts
Date         Count
2026-05-05      34
2026-05-06      39
2026-05-07      65
2026-05-08      80
2026-05-09     154
2026-05-10      54
2026-05-11     131
Top stories 05/11 · 10 items
  1. research · Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study. arXiv:2605.07453v1 Announce Type: new Abstract: Ancient and endangered languages pose a unique challenge for NLP: their datasets are inherently scarce, difficult to expand, and built from formulaic co… [arxiv-cs-cl]
  2. research · Code World Model Preparedness Report. A preparedness report evaluating the Code World Model (CWM) across critical risk domains including cybersecurity, CBRN, and AI self-improvement, finding no significant risk thresholds crossed. [arxiv-cs-se]
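  3. claude · Summarizing my understanding of the TOCTOU I found in claude-code (title translated from Japanese). Background: a version update of anthropics/claude-code-action mentioned an unfamiliar term, "TOCTOU mitigation" (v1.0.45). After looking into it, I found it was a surprisingly close-to-home and scary vulnerability, so I am writing up what I learned. TOCTOU (Time… [zenn-claude]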
  4. vscode REL · Zed nightly: Improve auto-watch feature (#56126) / nightly: auto_update: Fix Windows installer task arguments syntax (#50464). Zed's nightly build includes an improvement to the auto-watch feature (#56126), which automatically tracks variables in the debugger view, aimed at smoother debugging workflows. [zed-releases]
  5. research · Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas. arXiv:2605.06673v1 Announce Type: new Abstract: Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, unde… [arxiv-cs-cl]
  6. research · Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability. arXiv:2605.07110v1 Announce Type: cross Abstract: Computer-use agents (CUAs) are moving from bounded benchmarks toward real software environments, where they operate browsers, desktops, mobile application… [arxiv-cs-se]
  7. codex · OpenAI Campus Network: Student club interest form. Join the OpenAI Campus Network: connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community. [openai-blog]
  8. codex · How enterprises are scaling AI. How enterprises scale AI: from early experiments to compounding impact through trust, governance, workflow design, and quality at scale. [openai-blog]
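  9. claude · 5 Prompting Techniques to Fix "ChatGPT Won't Give Me Good Answers" [Real Examples, Copy-Paste OK] (title translated from Japanese). When you start using ChatGPT or Claude, do you run into problems like these? "I gave instructions, but the answer missed the point." "I needed a long answer, but it got summarized briefly… [qiita-claude]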
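  10. claude · The Complete MCP Tools Guide 2026: Mastering Real-World Connections for AI Agents with the "USB-C for AI" (title translated from Japanese). Introduction: in 2026, MCP (Model Context Protocol) has become established as the most important protocol in AI agent development. This standard, which Anthropic developed for Claude, is now … OpenAI, Google, Micr… [qiita-claude]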
🔥 Today's Top 3 (importance × recency)
  1. Zed Editor Releases v1.1.7 · zed-releases · 2d ago
  2. Zed Editor Releases v1.2.2-pre · zed-releases · 2d ago
  3. CodeQL 2.25.3 adds Swift 6.3 support · github-changelog · 3d ago

Timeline 438 total · page 1/15

TODAY 30 entries
NEW blog claude 1h ago · qiita-claude

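5 Prompting Techniques to Fix "ChatGPT Won't Give Me Good Answers" [Real Examples, Copy-Paste OK]

AI summary: When you start using ChatGPT or Claude, do you run into problems like these? "I gave instructions, but the answer missed the point." "I needed a long answer, but it got summarized briefly…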

qiita.com
NEW blog claude 2h ago · qiita-claude

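5 Prompt Design Patterns for Engineers to Get the Most Out of Claude/GPT-4 [With Real Examples]

AI summary: Introduction: "I asked the AI, but I didn't get the answer I expected." "I write my prompts from scratch every time." Have you had experiences like these? Changing how you write a prompt even slightly can dramatically change the quality of the AI's output. In this article, what engineers frequently use in their daily work…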

qiita.com
NEW release vscode 5h ago · zed-releases

Zed nightly: Improve auto-watch feature (#56126) / nightly: auto_update: Fix Windows installer task arguments syntax (#50464)

EN Zed's nightly build includes an improvement to the auto-watch feature (#56126), which automatically tracks variables in the debugger view, aimed at smoother debugging workflows.

github.com
NEW blog cursor 5h ago · qiita-cursor

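How to Change Your AI Coding with 65 Lines of Andrej Karpathy Skills

AI summary: Has this ever happened to you? You asked Claude Code or Cursor to "add a login feature," and it implemented JWT authentication on its own without asking. You asked it to "fix this bug," and it refactored not only the bug but the surrounding code as well, and the diff…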

qiita.com
NEW paper research 7h ago · arxiv-cs-cl

MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media

EN arXiv:2605.06940v1 Announce Type: new Abstract: Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set i

arxiv.org
NEW paper research 7h ago · arxiv-cs-cl

NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models

EN arXiv:2605.07051v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and

arxiv.org