#software-engineering — TECH Dashboard

Entries page 1/1 · 6 total

Thu, May 28 · 1 entry

paper · research · 3w ago

arxiv-cs-se

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

Medium priority · paper/research · Papers / Benchmarks · Published May 28

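AI summary: Proposes RAMP, a runtime evaluation framework for LLM agents running in production. It points out the limits of existing benchmarks and enables continuous assessment in real environments.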

EN arXiv:2605.27492v1 Announce Type: new Abstract: LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain l

#agent #arxiv #benchmark +7

arxiv.org →


Wed, May 27 · 4 entries

paper · research · 3w ago

arxiv-cs-se

A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration

Medium priority · paper/research · Papers / Benchmarks · Published May 27

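AI summary: A study on detecting the cross-section defects that arise when an LLM distributes work across multiple worker agents. It reports a design "fingerprint" pattern and the existence of a performance cliff.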

EN This paper investigates defect detection across the invisible orchestration layer of production LLM systems, identifying a universal performance cliff and a recurring design fingerprint in multi-agent architectures.

#arxiv #paper #llm +5

arxiv.org →


paper · research · 3w ago

arxiv-cs-se

SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

Medium priority · paper/research · Papers / Benchmarks · Published May 27

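AI summary: A study that proposes SetupX, a benchmark for correctly configuring repository execution environments, and tests whether LLM agents can learn from past failures.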

EN SetupX is a benchmark studying whether LLM agents can learn from past failures to correctly configure execution environments for code repositories.

#arxiv #paper #llm-agents +4

arxiv.org →


paper · research · 3w ago

arxiv-cs-se

Testing Agentic Workflows with Structural Coverage Criteria

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing a new testing method that leverages the workflow structure of multi-agent systems (agents, tools, delegation paths, and so on).

EN A research paper proposing structural coverage criteria for testing multi-agent workflows, leveraging explicit structures such as agents, tools, access rules, and delegation paths.

#agent #arxiv #benchmark +6

arxiv.org →


paper · research · 3w ago

arxiv-cs-se

TrajAudit: Automated Failure Diagnosis for Agentic Coding Systems

Medium priority · paper/research · Papers / Benchmarks · Published May 27

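AI summary: A research paper proposing TrajAudit, a framework that automatically diagnoses the causes of failure in agentic AI systems that perform tasks such as bug fixing.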

EN TrajAudit is a proposed framework for automated failure diagnosis in agentic coding systems such as AI-driven bug fixers, helping explain why tasks go wrong.

#agent #arxiv #paper +4

arxiv.org →


Sun, May 17 · 1 entry

blog · copilot · 4w ago

zenn-copilot

Seriously Considering Whether Engineers Are Still Needed in the Age of Coding Agents

Medium priority · technical post · GitHub Copilot · Published May 17

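AI summary: An article in which the author, who tried a personal development project with Codex, Claude Code, and GitHub Copilot over Golden Week, seriously considers the purpose of engineers in light of the spread of coding agents.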

EN A developer reflects on using Codex, Claude Code, and GitHub Copilot during a long holiday to build a community-based SNS, then seriously examines whether software engineers remain necessary in the age of coding agents.

#copilot #zenn #coding-agent +5

zenn.dev →

