TECH Dashboard — Pulse of the AI Ecosystem

Featured

Zed Editor Releases v1.1.5-pre

Fixed the git: worktree popup listing no worktrees when a project was opened at the parent of a .bare directory (bare-clone-with-sibling-worktrees layout). (#55790) Fixed a crash when pasting an ima…

release

zed-releases · 2d ago

Daily Summary

Today's Updates

Today 147 ▲ 36%

Yesterday 108

7-day 330

Last 7 days (chart; x-axis 05/02 to 05/08; labeled values: 108, 147)

Top categories · 7d

Top stories 05/08 · 10 items

🔥 Today's Top 3 importance × recency

Zed Editor Releases v1.1.5-pre · zed-releases · 2d ago
Evaluating Non-English Developer Support in Machine Learning for Software Engineering · arxiv-cs-se · 3h ago
Ollama Releases v0.30.0-rc7 · ollama-releases · 4h ago

Timeline 500 total · page 1/17

TODAY 30 entries

NEW blog claude 1h ago ·

qiita-claude

How I Spent 30 Minutes on a Friday Having Claude Critique Three of My Prompt Habits

EN A personal experiment in which the author spent 30 minutes on a Friday night having Claude analyze their past prompts and point out three recurring weaknesses, such as ambiguity and missing preconditions. The piece illustrates how using an LLM as a meta-reviewer of one's own prompts can surface blind spots that are hard to notice through self-review.

#claude #qiita #prompt-engineering #self-review

qiita.com →

NEW release vscode 1h ago ·

zed-releases

Zed nightly: Improve auto watch (#56126)

EN Zed's nightly build includes an improvement to the auto-watch feature (#56126). The summary describes it as automatically registering variables as watch expressions in the debugger, apparently aimed at smoother debugging workflows.

#editor #release #zed #bugfix

github.com →

NEW blog claude 1h ago ·

qiita-claude

Claude Code v2.1.132 Released | Daily Changelog Commentary

EN Claude Code v2.1.132, Anthropic's CLI coding tool, has been released. This appears to be a minor update focused on internal improvements and bug fixes rather than major feature additions, continuing Anthropic's rapid iteration cadence on the tool.

#claude #mcp-server #qiita #claude-code

qiita.com →

NEW blog mcp 2h ago ·

qiita-mcp

Automating My Work with the Claude Agent SDK and an MCP Server: Six Months of Implementation Notes

EN A six-month personal implementation log of automating the author's own work using the Claude Agent SDK combined with MCP servers, covering agent design, MCP server construction, and operational lessons learned.

#agent #mcp #mcp-server #qiita

qiita.com →

NEW paper research 3h ago ·

arxiv-cs-ai

LCM: Lossless Context Management

EN An arXiv paper titled 'LCM: Lossless Context Management' proposes a technique for managing long LLM contexts efficiently without information loss. Unlike lossy summarization or compression approaches, it is described as allowing the original information to be fully restored when needed.

#arxiv #paper #llm #context-management

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration

EN This paper investigates how knowledge transfer between agents in multi-agent design exploration can backfire, producing a crossover effect in which shared context degrades rather than improves search performance. It analyzes the conditions under which the way context is supplied makes knowledge transfer counterproductive.

#agent #arxiv #paper #multi-agent

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair

EN AuditRepairBench introduces a paired-execution trace corpus designed to measure evaluator-channel ranking instability in LLM agent code repair. It documents how identical patches can be ranked inconsistently across evaluation channels, with the goal of more reliable agent assessment.

#agent #arxiv #paper #agent-repair

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone

EN This paper argues that model-level alignment evaluations are insufficient to guarantee safety in real deployments, since alignment behavior depends on the surrounding system context. The authors call for system-level evaluation frameworks that capture deployment-relevant risks.

#arxiv #benchmark #paper #ai-alignment

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

EN TSCG proposes a deterministic compilation approach for tool schemas in agentic LLM deployments, aiming to improve the reliability and consistency of tool invocations and to reduce runtime errors in production environments.

#agent #arxiv #mcp-server #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning

EN This paper proposes an automatic failure management framework for reinforcement fine-tuning (RFT) of LLMs. It monitors for training failures such as reward collapse and gradient instability and responds with retries and adjustments, aiming to improve the stability of RFT and final model quality.

#arxiv #paper #llm #reinforcement-learning

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

EN This paper examines the accountability of AI agents in software engineering by analyzing the terms of service of major AI coding services, highlighting how liability and responsibility are allocated, and proposing a research roadmap toward trustworthy and accountable agents.

#agent #arxiv #paper #ai-agents

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Beyond Retrieval: A Multitask Benchmark and Model for Code Search

EN This paper proposes a multitask benchmark and unified model for code search, reframing the problem from a single retrieval task into a bundle of related subtasks. It highlights limitations of current evaluation metrics and aims for more practical developer assistance.

#arxiv #benchmark #paper #code-search

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

EN CodeEvolve is an LLM-driven evolutionary optimization framework that uses runtime-enriched target selection to automatically improve code performance across multiple programming languages.

#arxiv #paper #llm #evolutionary-algorithms

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Regularized Centered Emphatic Temporal Difference Learning

EN This paper proposes a regularized and centered variant of Emphatic Temporal Difference (TD) learning for off-policy reinforcement learning, aiming to stabilize value function estimation by reducing variance and improving convergence under function approximation.

#arxiv #paper #reinforcement-learning #off-policy

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks

EN Pro$^2$Assist proposes a continuous, step-aware proactive assistant that uses multimodal egocentric perception (vision and audio) to track user progress on long-horizon procedural tasks and to deliver timely guidance without explicit prompts.

#arxiv #paper #egocentric #multimodal

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA

EN This paper argues that LLM failures in temporal question answering stem not from weak temporal reasoning but from probabilistic inconsistencies among extracted temporal facts. It proposes a neuro-symbolic framework to quantify and mitigate such inconsistencies, with the aim of improving QA accuracy.

#arxiv #paper #neuro-symbolic #temporal-reasoning

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Parallel Prefix Verification for Speculative Generation

EN This paper proposes a parallel prefix verification scheme for speculative decoding in large language models. It verifies in-progress token sequences (prefixes) in parallel, aiming to maximize GPU parallelism and reduce inference latency compared with sequential token-by-token verification.

#arxiv #paper #speculative-decoding #llm-inference

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

EN Agent Island is a new benchmark that evaluates LLM agents through multiagent games in which agents compete in a dynamic environment. Unlike static benchmarks, which saturate and become contaminated by training data, it is designed to resist both problems and to allow continued measurement of agents' reasoning, negotiation, and strategy.

#agent #arxiv #benchmark #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-ai

The Scaling Properties of Implicit Deductive Reasoning in Transformers

EN This paper investigates how Transformers' ability to perform multi-step deductive reasoning implicitly, within their parameters, scales with model size, training data, and reasoning depth. It characterizes empirical scaling behavior and discusses connections to grokking.

#arxiv #paper #transformers #reasoning

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

EN This paper proposes a reinforcement learning method based on the free energy principle, with adaptive advantage shaping, to enhance LLM reasoning in unlabeled settings. It aims for stable training when reward signals are sparse or absent.

#arxiv #paper #reinforcement-learning #llm-reasoning

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

EN arXiv:2605.04066v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LL

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Data-Driven Variational Basis Learning Beyond Neural Networks: A Non-Neural Framework for Adaptive Basis Discovery

EN arXiv:2605.05221v1 Announce Type: cross Abstract: Classical representation systems such as Fourier series, wavelets, and fixed dictionaries provide analytically tractable basis expansions, but they ar

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

EN arXiv:2605.05226v1 Announce Type: cross Abstract: The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamentally in ho

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation

EN arXiv:2605.05245v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) remains brittle on multi-hop questions in realistic deployment settings, where retrieved evidence may be noisy or r

#arxiv #paper #rag

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Counterargument for Critical Thinking as Judged by AI and Humans

EN arXiv:2605.05353v1 Announce Type: new Abstract: This intervention study investigates the use of counterarguments in writing for critical thinking by students in the context of Generative AI (GenAI). T

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets

EN arXiv:2605.05392v1 Announce Type: new Abstract: Large-scale datasets are widely used to perform summarization tasks, but they may not include queries alongside documents and summaries. In the search f

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

SLAM: Structural Linguistic Activation Marking for Language Models

EN arXiv:2605.05443v1 Announce Type: new Abstract: LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection wi

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

EN arXiv:2605.05485v1 Announce Type: new Abstract: LLMs can solve program synthesis tasks but remain inefficient and unreliable on hard instances requiring large combinatorial search. Given a small set o

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

EN arXiv:2605.05503v1 Announce Type: new Abstract: Statistical watermarking is a common approach for verifying whether text was written by a language model. Most existing schemes assume autoregressive ge

#arxiv #paper

arxiv.org →

NEW paper research 3h ago ·

arxiv-cs-cl

A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction

EN arXiv:2605.05532v1 Announce Type: new Abstract: This paper evaluates whether a domain trained Small Language Model (SLM) can outperform frontier Large Language Models on structured contract extraction

#arxiv #paper

arxiv.org →

The pulse of AI, on a single dashboard.
