#testing — TECH Dashboard

Entries page 1/1 · 4 total

Wed, May 27 · 1 entry

paper research 3w ago ·

arxiv-cs-se

Testing Agentic Workflows with Structural Coverage Criteria

Medium priority · paper/research · Papers / Benchmarks · Published May 27

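AI summary: A research paper proposing a new testing approach that leverages the workflow structure of multi-agent systems (agents, tools, delegation paths, and so on).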

EN A research paper proposing structural coverage criteria for testing multi-agent workflows, leveraging explicit structures such as agents, tools, access rules, and delegation paths.

#agent #arxiv #benchmark +6

arxiv.org →
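
As a rough illustration of what structural coverage over an agent workflow could look like, the sketch below measures which agents, tool permissions, and delegation edges a set of test traces exercised. The workflow spec, the trace format, and the name `structural_coverage` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical workflow spec: the agents, the (agent, tool) access rules,
# and the allowed delegation edges (caller, callee).
WORKFLOW: Dict[str, Set] = {
    "agents": {"planner", "coder", "reviewer"},
    "tools": {("coder", "edit_file"), ("coder", "run_tests"), ("reviewer", "read_diff")},
    "delegations": {("planner", "coder"), ("planner", "reviewer"), ("coder", "reviewer")},
}

# Hypothetical trace event: ("tool", agent, tool_name) or ("delegate", caller, callee).
Event = Tuple[str, str, str]


def structural_coverage(workflow: Dict[str, Set], traces: List[List[Event]]) -> Dict[str, float]:
    """Return the fraction of agents, tool rules, and delegation edges seen in traces."""
    seen_agents: Set[str] = set()
    seen_tools: Set[Tuple[str, str]] = set()
    seen_delegations: Set[Tuple[str, str]] = set()
    for trace in traces:
        for kind, first, second in trace:
            if kind == "tool":
                seen_agents.add(first)
                seen_tools.add((first, second))
            elif kind == "delegate":
                seen_agents.update((first, second))
                seen_delegations.add((first, second))

    def ratio(seen: Set, declared: Set) -> float:
        # Only count elements that the workflow actually declares.
        return len(seen & declared) / len(declared) if declared else 1.0

    return {
        "agents": ratio(seen_agents, workflow["agents"]),
        "tools": ratio(seen_tools, workflow["tools"]),
        "delegations": ratio(seen_delegations, workflow["delegations"]),
    }


# One recorded test run: the planner delegates to the coder, which uses two tools.
TRACES: List[List[Event]] = [
    [
        ("delegate", "planner", "coder"),
        ("tool", "coder", "edit_file"),
        ("tool", "coder", "run_tests"),
    ],
]

coverage = structural_coverage(WORKFLOW, TRACES)
print(coverage)
```

In this toy run the reviewer is never reached, so every ratio stays below 1.0, which is the kind of gap a coverage criterion is meant to surface.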


Fri, May 22 · 1 entry

release agent-fw 4w ago ·

langchain-releases

langchain-tests==1.1.9 release

Medium priority · official release · Agent Frameworks · Published May 22

AI summary: langchain-tests has been updated to v1.1.9. The release allows extra content blocks in streaming assertions and bumps the idna dependency.

EN Changes since langchain-tests==1.1.8: release(standard-tests): 1.1.9 (#37609) · test(standard-tests): allow extra content blocks in streaming assertions (#37592) · chore: bump idna from 3.11 to 3.15 in …

#agent #langchain #release +4

github.com →


Tue, May 19 · 1 entry

🔥 HOT release agent-fw 4w ago ·

langchain-releases

langchain-tests==1.1.8 release

High priority · official release · Agent Frameworks · Published May 19

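AI summary: Version 1.1.8 of langchain-tests, LangChain's testing utility package, has been published. It is a patch and maintenance release that continues incremental improvements to a component supporting quality across the LangChain ecosystem.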

EN Changes since langchain-tests==1.1.7: hotfix(standard-tests): set langchain-core version bounds (#37509) · hotfix: bump lockfiles (#37508) · release(standard-tests): 1.1.8 (#37507) · test(standard-test…

#agent #langchain #release +4

github.com →


Thu, May 7 · 1 entry

blog copilot 1mo ago ·

github-copilot

Validating agentic behavior when “correct” isn’t deterministic

Medium priority · technical post · GitHub Copilot · Published May 7

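AI summary: GitHub points out that when agentic AI output is non-deterministic, conventional testing methods struggle to assure quality. The post introduces approaches for continuously validating probabilistic systems, including LLM-as-a-judge, scenario-based evaluation, and trace analysis.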

EN How to build the “Trust Layer” for GitHub Copilot cloud agent without brittle scripts or black-box judgements by using dominator analysis.

#agent #copilot #github +5

github.blog →
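
Assuming the description above refers to dominator analysis from graph theory, the sketch below shows one way such an idea might be applied to agent traces: compute the steps that every path from the start to the goal must pass through, then check that a recorded trace contains them. The step graph, the step names, and both function names are hypothetical, and this is not presented as the method used in the GitHub post.

```python
from typing import Dict, List, Set


def dominators(graph: Dict[str, List[str]], start: str) -> Dict[str, Set[str]]:
    """Classic iterative dominator computation on a directed graph."""
    nodes = set(graph) | {m for succs in graph.values() for m in succs}
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in graph.items():
        for m in succs:
            preds[m].add(n)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            if not preds[n]:
                continue  # no incoming edges: unreachable from start, left as-is
            new = {n} | set.intersection(*[dom[p] for p in preds[n]])
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def missing_required_steps(trace: List[str], dom: Dict[str, Set[str]], goal: str) -> Set[str]:
    """Return the steps that dominate the goal but never appear in the trace."""
    return dom[goal] - set(trace)


# Hypothetical step graph: "search" is optional, every other step is unavoidable.
STEP_GRAPH = {
    "start": ["plan"],
    "plan": ["edit", "search"],
    "search": ["edit"],
    "edit": ["test"],
    "test": ["done"],
    "done": [],
}

dom = dominators(STEP_GRAPH, "start")
ok_trace = ["start", "plan", "search", "edit", "test", "done"]
bad_trace = ["start", "plan", "edit", "done"]  # skipped the test step
print(missing_required_steps(ok_trace, dom, "done"))
print(missing_required_steps(bad_trace, dom, "done"))
```

A check like this tolerates variation in optional steps such as `search` while still flagging a run that claims success without passing through a mandatory one.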

