#agent page 3/6


paper research 3w ago ·

arxiv-cs-ai

JobBench: Aligning Agent Work With Human Will

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Proposes JobBench, a new benchmark that evaluates occupational AI agents not only on economic value but also on alignment with human will.

EN JobBench is a new benchmark for occupational AI agents that goes beyond economic replacement metrics to evaluate alignment with human will and intent.

#agent #arxiv #paper +4


paper research 3w ago ·

arxiv-cs-ai

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Points out verifiability failures in autonomous research agents and proposes a new framework that improves reliability through a Chain-of-Evidence.

EN ScientistOne proposes a Chain-of-Evidence framework to address verifiability failures in autonomous research agents, pushing toward human-level scientific reliability.


paper research 3w ago ·

arxiv-cs-se

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Proposes VISTA, a benchmark that evaluates the ability of LLM agents to generate web apps from visual specifications.

EN VISTA is a new benchmark for evaluating LLM-based agents on end-to-end web-app generation from visual specifications.

#arxiv #benchmark #paper +5


paper research 3w ago ·

arxiv-cs-se

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Proposes a method that resolves context-budget constraints in agentic RAG systems with many tool definitions by compressing the tool schemas.

EN A new approach compresses tool schemas in agentic RAG systems to resolve the resource conflict between tool definitions and available context budget in LLMs.


paper research 3w ago ·

arxiv-cs-se

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing Verus-SpecGym, a benchmark environment that evaluates specification autoformalization in order to guarantee the correctness of AI coding agents' output.

EN Verus-SpecGym is a new agentic benchmark environment for evaluating how well AI agents can autoformalize software specifications, addressing correctness challenges in AI-generated code.


paper research 3w ago ·

arxiv-cs-se

Testing Agentic Workflows with Structural Coverage Criteria

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing a new testing method that exploits the workflow structure of multi-agent systems (agents, tools, delegation paths, and so on).

EN A research paper proposing structural coverage criteria for testing multi-agent workflows, leveraging explicit structures such as agents, tools, access rules, and delegation paths.

#agent #arxiv #benchmark +6


paper research 3w ago ·

arxiv-cs-se

TrajAudit: Automated Failure Diagnosis for Agentic Coding Systems

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing TrajAudit, a framework that automatically diagnoses the causes of failure in agentic AI systems that perform tasks such as bug fixing.

EN TrajAudit is a proposed framework for automated failure diagnosis in agentic coding systems such as AI-driven bug fixers, helping explain why tasks go wrong.

#agent #arxiv #paper +4


release agent-fw 3w ago ·

langchain-releases

langchain-perplexity==1.3.0

Medium priority · official release · Agent Frameworks · Published May 27

AI summary: langchain-perplexity 1.3.0 has been released. A use_responses_api flag was added to ChatPerplexity, and infrastructure dependencies were updated.

EN Changes since langchain-perplexity==1.2.0: release(perplexity): 1.3.0 (#37707); feat(perplexity): use_responses_api flag on ChatPerplexity (#37359); chore(infra): bump langchain-tests floor to 1.1.9

#agent #langchain #release +3

developers.googleblog.com →


🔥 HOT blog gemini 3w ago ·

google-developers

The latest updates to Google Pay

High priority · technical post · Gemini / Gemma · Published May 27

AI summary: Google Pay now supports "agentic commerce." It introduces the Universal Commerce Protocol and an MCP server, enabling AI agents to manage payments. Android is also enhanced.

EN Google Pay is evolving for "agentic commerce" by introducing the Universal Commerce Protocol and a new MCP server that allows AI agents to manage integrations and analyze trends. New Android updates i

#agent #google #mcp-server +5


🔥 HOT blog tech-news 3w ago ·

nvidia-blog

NVIDIA Vera CPU Is ‘Packing a Heavy-Hitting Punch’ Against Competition

High priority · technical post · Industry & Policy · Published May 27

AI summary: NVIDIA's Vera CPU overwhelms competitors in Phoronix benchmarks. It offers the fast cores, high bandwidth, and sustained all-core performance demanded in the agentic AI era.

EN The shift to agentic AI creates a new CPU requirement for the AI factory: fast cores, massive memory bandwidth and the ability to sustain high performance when all cores are active. Initial benchmark

#agent #benchmark #news +7

blogs.nvidia.com →


release agent-fw 3w ago ·

langchain-releases

langchain==1.3.2

Medium priority · official release · Agent Frameworks · Published May 27

AI summary: langchain 1.3.2 has been released. It adds a langgraph>=1.2.2 requirement and includes fixes such as a TodoListMiddleware bug fix.

EN Changes since langchain==1.3.1: chore(langchain): bump to 1.3.2, require langgraph>=1.2.2 (#37703); fix(langchain): land final answer in last AIMessage for TodoListMiddleware (#37643); feat(langch

#agent #langchain #release +4


blog tech-news 3w ago ·

ars-technica

FBI agent explains how easy it is to ID people posting AI porn without consent

Medium priority · technical post · Industry & Policy · Published May 27

AI summary: The FBI arrested a man who sold AI-generated sexual deepfakes. His saved Instagram posts were the deciding evidence, showing how easy the investigation was.

EN An FBI agent detailed how straightforward it is to identify people who post non-consensual AI porn, after a man was caught selling deepfakes linked to his own Instagram activity.

#agent #ars-technica #news +5

arstechnica.com →


Tue, May 26 4 entries

blog local-llm 3w ago ·

qiita-llm

Self-proclaimed world first!? A simple social simulacra demo

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: Publishes a simple demo of a social simulation (simulacra) that runs on prompts alone. Introduces a toy-level implementation that can be tried even without a local LLM.

EN A lightweight social simulacra demo driven purely by prompts is introduced, allowing users to experiment with social simulation without requiring a local LLM setup.

#llm #qiita #social-simulation +4

qiita.com →


blog claude 3w ago ·

qiita-claude

Reading evaluation-driven development for Claude Skills through skill-creator

Medium priority · technical post · Claude / Claude Code · Published May 26

AI summary: An article that works through the skill-creator implementation to explain how to practice the "evaluation-driven development" recommended in Anthropic's Agent Skills best practices.

EN A deep dive into Anthropic's recommended evaluation-driven development approach for Claude Agent Skills, examining how skill-creator implements the practice concretely.

#agent #claude #qiita +4

qiita.com →


blog claude 3w ago ·

zenn-claude

Running a polling loop with Agent Skills alone

Medium priority · technical post · Claude / Claude Code · Published May 26

AI summary: An article introducing how to use only Claude's Agent Skills to work autonomously while monitoring state by polling, along with real use cases.

EN A practical guide showing how to use Claude's Agent Skills alone to implement a polling loop for state monitoring and automated task execution, with real-world skill examples.

#agent #claude #zenn +4


paper research 3w ago ·

arxiv-cs-ai

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

Medium priority · paper/research · Papers / Benchmarks · Published May 26

AI summary: A research paper proposing a design method that optimizes the three-way tradeoff among latency, reliability, and cost in workflows where multiple LLM agents cooperate.

EN A research paper proposing methods to optimize latency, reliability, and cost tradeoffs in agentic workflows composed of multiple interacting LLM-powered and conventional agents.

#agent #arxiv #paper +6


Mon, May 25 4 entries

blog copilot 3w ago ·

qiita-copilot

GitHub Agentic AI Developer (GH-600) exam report: taking the new agent-era certification in beta

Medium priority · technical post · GitHub Copilot · Published May 25

AI summary: An exam report by an architect who took the beta exam for GH-600, GitHub's new certification focused on AI agent development. Covers the exam's content and tendencies.

EN A hands-on exam report from an architect who sat the beta of GitHub's new Agentic AI Developer (GH-600) certification, covering agentic workflows and AI-driven development practices.

#agent #qiita #gh-600 +5

qiita.com →


paper research 3w ago ·

arxiv-cs-lg

MARGIN: Runtime Confidence Calibration for Multi-Agent Foundation Model Coordination

Medium priority · paper/research · Papers / Benchmarks · Published May 25

AI summary: Proposes MARGIN, a method that calibrates at runtime how much a coordinator should trust each agent's response in settings where multiple foundation model agents cooperate.

EN MARGIN proposes a runtime confidence calibration method for multi-agent deployments, helping a coordinator decide which foundation model agent's response to trust.


paper research 3w ago ·

arxiv-cs-lg

PACE: Two-Timescale Self-Evolution for Small Language Model Agents

Medium priority · paper/research · Papers / Benchmarks · Published May 25

AI summary: Proposes PACE, a two-timescale self-evolution framework that automatically tunes prompts and parsers so that small LM agents can be operated efficiently in production.

EN PACE introduces a two-timescale self-evolution framework that automates prompt and component tuning for small language model agents, reducing compute and human effort in production deployments.

#arxiv #paper #small-language-model +4


blog cursor 3w ago ·

zenn-cursor

Cursor Agent × Rubydex MCP: does it really save tokens? Tested on a Rails monolith

Medium priority · technical post · AI Editors · Published May 25

AI summary: Hello! I'm @koyakota, a backend engineer at Hubble! Rubydex, which was featured at RubyKaigi 2026, can apparently also be used from Cursor's Agent, so I tried it out. Tokens go down…

#agent #cursor #mcp-server +7


Sun, May 24 1 entry

blog mcp 3w ago ·

zenn-mcp

Game development with an AI team (Claude Code Agent Teams + Godot)

Medium priority · technical post · MCP / Tooling · Published May 24

AI summary: A hands-on article explaining how to use Claude Code Agent Teams to divide image generation and Godot game development among multiple AI agents.

EN A practical walkthrough of using Claude Code Agent Teams to split game development tasks—image generation and Godot scripting—across multiple AI agents working in parallel.

#agent #mcp #mcp-server +6


Sat, May 23 1 entry

blog copilot 3w ago ·

zenn-copilot

Choosing between CLI, VS Code, and app now that the GitHub Copilot app is in the mix

Medium priority · technical post · GitHub Copilot · Published May 23

AI summary: Introduction: The other day I wrote an article on choosing between GitHub Copilot CLI and VS Code Agent Mode for C# developers. Last time, I covered "GitHub Copilot CLI, for running big tasks in the terminal" and "in the editor…

#agent #copilot #zenn +5


Fri, May 22 5 entries

blog copilot 4w ago ·

zenn-copilot

How Microsoft's official WinUI agent plugin made WinUI 3 app development much easier

Medium priority · technical post · GitHub Copilot · Published May 22

AI summary: Introduction: Microsoft has released the "WinUI agent plugin" as an official plugin for GitHub Copilot CLI / Claude Code / OpenAI Codex. WinUI 3 and Wi…

#agent #copilot #zenn +6


🔥 HOT release cursor 4w ago ·

zed-releases

Zed Editor Releases v1.4.1-pre

High priority · official release · AI Editors · Published May 22

AI summary: Zed, the fast code editor written in Rust, has released v1.4.1-pre. As a pre-release it contains new features and bug fixes and is in the validation stage ahead of the next stable release.

EN Fixed a crash when clicking a built-in skill mention in the agent panel while connected to a remote project (Preview only). (#57442) Add agent::NewTerminalThread for defining custom shortcuts to lau

#agent #editor #release +5


release agent-fw 4w ago ·

langchain-releases

langchain-openai==1.2.2

Medium priority · official release · Agent Frameworks · Published May 22

AI summary: Version 1.2.2 of langchain-openai, LangChain's OpenAI integration package, has been released. As a minor update, it appears to consist mainly of routine maintenance such as bug fixes and dependency adjustments.

EN Changes since langchain-openai==1.2.1: release(openai): 1.2.2 (#37617); chore(infra): bump langchain-tests floor to 1.1.9 (#37610); test(openai): unbreak audio chat and Azure embedding integration te

#agent #langchain #release +4


🔥 HOT release cursor 4w ago ·

zed-releases

Zed Editor Releases v1.4.0-pre

High priority · official release · AI Editors · Published May 22

AI summary: Zed, the fast code editor written in Rust, has published a pre-release of v1.4.0. It contains updates centered on performance and developer-experience improvements, with final adjustments under way ahead of the stable release.

EN This week's release includes support for skills, a global AGENTS.md file for user-wide agent instructions, the ability to choose a base branch in the branch diff view, and a new editor: toggle all dif

#agent #editor #release +6


release agent-fw 4w ago ·

langchain-releases

langchain-tests==1.1.9

Medium priority · official release · Agent Frameworks · Published May 22

AI summary: langchain-tests has been updated to v1.1.9. It includes an improvement that allows extra content blocks in streaming assertions and a version bump of the idna dependency.

EN Changes since langchain-tests==1.1.8: release(standard-tests): 1.1.9 (#37609); test(standard-tests): allow extra content blocks in streaming assertions (#37592); chore: bump idna from 3.11 to 3.15 in

#agent #langchain #release +4


Thu, May 21 2 entries

release agent-fw 4w ago ·

langchain-releases

langchain-fireworks==1.4.1

Medium priority · official release · Agent Frameworks · Published May 21

AI summary: Version 1.4.1 of langchain-fireworks, the Fireworks AI integration library for the LangChain ecosystem, has been published. It is positioned as a routine maintenance release aimed at improving stability for existing users.

EN Changes since langchain-fireworks==1.4.0: release(fireworks): 1.4.1 (#37603); fix(fireworks): retry on bare APIConnectionError, default max_retries=2 (#37602); test(fireworks): stabilize integration

#agent #langchain #release +3