#engineering — TECH Dashboard



#engineering page 1/1 · 12 total

TODAY 1 entry

NEW blog claude 33m ago · anthropic-engineering

Infrastructure Noise

AI summary: Anthropic discusses how infrastructure noise (tool timeouts, sandbox failures, rate limits, and flaky environments) can distort Claude Code evaluations that measure model performance on reasoning, coding, and agentic tasks. The post shares techniques for identifying and mitigating such noise and argues that reliable benchmark operation matters.

#anthropic #engineering #evaluation #benchmarking

anthropic.com →


Sun, Apr 19 1 entry

NEW blog claude 2d ago · anthropic-engineering

Managed Agents

#anthropic #engineering

anthropic.com →


Fri, Apr 17 1 entry

NEW blog claude 4d ago · anthropic-engineering

Claude Code Auto Mode

#anthropic #engineering

anthropic.com →


Wed, Apr 15 1 entry

NEW blog claude 6d ago · anthropic-engineering

Harness Design Long Running Apps

#anthropic #engineering

anthropic.com →


Mon, Apr 13 1 entry

NEW blog claude 1w ago · anthropic-engineering

Eval Awareness BrowseComp

#anthropic #benchmark #engineering

anthropic.com →


Sat, Apr 11 1 entry

NEW blog claude 1w ago · anthropic-engineering

Building C Compiler

#anthropic #engineering

anthropic.com →


Thu, Apr 9 1 entry

NEW blog claude 1w ago · anthropic-engineering

Demystifying Evals for AI Agents

#anthropic #engineering

anthropic.com →


Tue, Apr 7 1 entry

NEW blog claude 2w ago · anthropic-engineering

Effective Harnesses for Long Running Agents

#anthropic #engineering

anthropic.com →


Sun, Apr 5 1 entry

NEW blog claude 2w ago · anthropic-engineering

Advanced Tool Use

#anthropic #engineering

anthropic.com →


Fri, Apr 3 1 entry

NEW blog claude 2w ago · anthropic-engineering

Code Execution with MCP

#anthropic #engineering #mcp-server

anthropic.com →


Wed, Apr 1 1 entry

NEW blog claude 2w ago · anthropic-engineering

Claude Code Sandboxing

#anthropic #engineering

anthropic.com →


Mon, Mar 30 1 entry

NEW blog claude 3w ago · anthropic-engineering

Equipping Agents for the Real World with Agent Skills

#agent #anthropic #engineering

anthropic.com →
