#llm page 3/4

paper research 3w ago ·

arxiv-cs-cl

Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: With LLMs increasingly used in decision-making, this study proposes psychometrically weighting annotators' attributes and positionality when detecting discriminatory language against autistic people.

EN arXiv:2605.26397v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in decision-making tasks where they can amplify or suppress perspectives, raising concerns in high-st

#arxiv #paper #ableism +5

paper research 3w ago ·

arxiv-cs-cl

Towards Just-in-Time Adaptive Feedback: Enhancing Student Learning via Knowledge-Grounded LLM

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing an LLM-based educational support method that generates timely, adaptive feedback matched to the learner's current progress.

EN arXiv:2605.26405v1 Announce Type: new Abstract: Educational interventions are effective tools for enhancing student learning. While Large Language Models (LLMs) allow for generating adaptive feedback

#arxiv #paper #education +5

paper research 3w ago ·

arxiv-cs-ai

Can LLMs Introspect? A Reality Check

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A critical examination of whether LLMs can genuinely detect and report their own internal states, challenging prior studies that claimed they can.

#arxiv #paper #llm +4

paper research 3w ago ·

arxiv-cs-ai

Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A research paper proposing methods to manage uncertainty in LLM-generated procedural knowledge, aiming to make educational virtual laboratories more scalable and adaptive.

#arxiv #paper #llm +5

paper research 3w ago ·

arxiv-cs-se

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: VISTA is a new benchmark for evaluating LLM-based agents on end-to-end web-app generation from visual specifications.

#arxiv #benchmark #paper +5

paper research 3w ago ·

arxiv-cs-se

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A new approach compresses tool schemas in agentic RAG systems that carry many tool definitions, resolving the conflict between those definitions and the LLM's limited context budget.

#agent #arxiv #paper +5

paper research 3w ago ·

arxiv-cs-se

A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: This paper investigates the detection of cross-section defects that arise when an LLM distributes work across multiple worker agents in the invisible orchestration layer of production systems, identifying a universal performance cliff and a recurring design fingerprint in multi-agent architectures.

#arxiv #paper #llm +5

paper research 3w ago ·

arxiv-cs-se

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Verus-SpecGym is a new agentic benchmark environment for evaluating how well AI agents can autoformalize software specifications, addressing correctness challenges in AI-generated code.

#agent #arxiv #paper +5

blog local-llm 3w ago ·

qiita-llm

What Is LLaMA?

Medium priority · technical post · Local LLM / Open Models · Published May 27

AI summary: A Qiita article unpacking the LLaMA architecture from its paper, explaining how models ranging from 7B to 65B parameters were trained on public data only to match GPT-3 and PaLM-540B.

#llm #open-model #qiita +5

blog claude 3w ago ·

zenn-claude

Anthropic vs. OpenAI: A Thorough Comparison of Their Official Prompting Best Practices

Medium priority · technical post · Claude / Claude Code · Published May 27

AI summary: A practical comparison of Anthropic's and OpenAI's official prompt engineering best practices, aimed at teams that use both Claude and ChatGPT in real workflows.

#claude #zenn #prompt-engineering +4

🔥 HOT blog cursor 3w ago ·

qiita-cursor

Introduction to Cursor Composer 2.5: How It Matches Opus 4.7 Performance at One-Tenth the Cost

High priority · technical post · AI Editors · Published May 27

AI summary: Introduction. On May 18, 2026, Cursor announced its AI coding model "Composer 2.5". Notably, it achieves benchmark performance nearly on par with Claude Opus 4.7 at roughly one-tenth the cost. SWE-b

#cursor #qiita #llm +6

Tue, May 26 14 entries

blog local-llm 3w ago ·

zenn-llm

Local LLM Benchmarks on the M5 Max: MoE Is Bottlenecked by GPU Performance, Dense by Memory Bandwidth, Plus a Look at Thermal Effects

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: The second part of a benchmark report on running local LLMs on Apple M5 Max, finding that MoE models are GPU-compute-bound while Dense models are memory-bandwidth-bound, with thermal throttling effects also measured.

#llm #open-model #zenn +6

blog local-llm 3w ago ·

qiita-llm

Self-Proclaimed World First!? A Simple Social Simulacra Demo

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: A lightweight, toy-level social simulacra demo driven purely by prompts is introduced, allowing users to experiment with social simulation without requiring a local LLM setup.

#llm #qiita #social-simulation +4

blog claude 3w ago ·

zenn-claude

Agent "Design" and "Evaluation" Learned from Anthropic: Simple Patterns over Complex Frameworks

Medium priority · technical post · Claude / Claude Code · Published May 26

AI summary: A practical guide drawing on Anthropic's insights to help teams design and evaluate AI agents using simple patterns rather than complex frameworks, addressing quality pitfalls in production.

#claude #zenn #llm +4

blog claude 3w ago ·

zenn-claude

Design Principles for AI Agent Tool Definitions: A Practical Guide to Schemas, Naming, and Responses

Medium priority · technical post · Claude / Claude Code · Published May 26

AI summary: A practical guide covering seven design principles for AI agent tool definitions using JSON Schema, with concrete best practices for naming, descriptions, and parameter design.

#claude #zenn #llm +5

blog local-llm 3w ago ·

zenn-llm

Reproducing Gemma 4's MMLU-Pro Score on NVIDIA B200: A Step-by-Step Guide

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: A step-by-step guide on reproducing Google Gemma 4 31B-IT's claimed ~85.2% MMLU-Pro score on NVIDIA B200 hardware using lm_eval, covering practical pitfalls beyond a single command.

#llm #open-model #zenn +6

blog local-llm 3w ago ·

zenn-llm

Building and Running ik_llama.cpp on Windows

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: A practical guide to building ik_llama.cpp on Windows from source, covering a fork of llama.cpp reported to run local LLMs roughly 30% faster than the upstream project.

#llm #open-model #zenn +6

paper research 3w ago ·

arxiv-cs-cl

Multi-Persona Debate System for Automated Scientific Hypothesis Generation

Medium priority · paper/research · Papers / Benchmarks · Published May 26

AI summary: A multi-persona debate system is proposed to automate scientific hypothesis generation by synthesizing fragmented knowledge into actionable research directions.

#arxiv #paper #hypothesis-generation +4

paper research 3w ago ·

arxiv-cs-ai

Confidence Calibration in Large Language Models

Medium priority · paper/research · Papers / Benchmarks · Published May 26

AI summary: A preregistered study investigates how well large language models calibrate their expressed confidence across diverse tasks, examining alignment between stated certainty and actual accuracy.

#arxiv #paper #calibration +4

paper research 3w ago ·

arxiv-cs-ai

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

Medium priority · paper/research · Papers / Benchmarks · Published May 26

AI summary: A research paper quantifying redundancy in long LLM chain-of-thought reasoning, aiming to reduce latency, GPU time, and energy costs without sacrificing accuracy.

#arxiv #paper #chain-of-thought +4

paper research 3w ago ·

arxiv-cs-ai

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

Medium priority · paper/research · Papers / Benchmarks · Published May 26

AI summary: A research paper proposing methods to optimize latency, reliability, and cost tradeoffs in agentic workflows composed of multiple interacting LLM-powered and conventional agents.

#agent #arxiv #paper +6

blog claude 3w ago ·

qiita-claude

Building an "External Brain" for Development Projects with Obsidian + LLM + Bases: Property-Driven Project Management

Medium priority · technical post · Claude / Claude Code · Published May 26

AI summary: A practical guide on building a property-driven project management system for SI projects using Obsidian, LLMs, and Bases as an "external brain".

#claude #qiita #obsidian +5

🔥 HOT blog claude 3w ago ·

qiita-claude

Anthropic Set to Post Its First Profitable Quarter: AI Companies' Monetization Phase Gets Serious

High priority · technical post · Claude / Claude Code · Published May 26

AI summary: Anthropic, long unprofitable, is reportedly approaching its first profitable quarter, signaling that the generative AI industry's monetization phase may be gaining real traction.

#claude #qiita #llm +5

blog local-llm 3w ago ·

qiita-llm

Running Local LLMs in Practice: Learning the Tradeoffs Between Quantization and Memory Optimization

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: A practical Qiita article exploring quantization techniques and memory optimization strategies for running LLMs locally, examining the tradeoffs between resource constraints and model quality.

#llm #qiita #quantization +4

blog local-llm 3w ago ·

zenn-llm

Using LLM-jp-4 via the mdx MaaS API, Part 2: Text Summarization and Information Extraction

Medium priority · technical post · Local LLM / Open Models · Published May 26

AI summary: Second installment of a series on using LLM-jp-4 via the API of mdx MaaS, which is operated by the Suzumura Laboratory at the University of Tokyo, focusing on practical text summarization and structured information extraction techniques.

#llm #zenn #llm-jp +5

Mon, May 25 4 entries

paper research 3w ago ·

arxiv-cs-lg

From Residuals to Reasons: LLM-Guided Mechanism Inference from Tabular Data

Medium priority · paper/research · Papers / Benchmarks · Published May 25

AI summary: A new method uses LLMs to infer causal mechanisms from model residuals in tabular data, aiming to bridge predictive accuracy and scientific interpretability.

#arxiv #paper #llm +4

paper research 3w ago ·

arxiv-cs-lg

MARGIN: Runtime Confidence Calibration for Multi-Agent Foundation Model Coordination

Medium priority · paper/research · Papers / Benchmarks · Published May 25

AI summary: MARGIN proposes a runtime confidence calibration method for multi-agent deployments, helping a coordinator decide how far to trust each foundation model agent's response.

#agent #arxiv #paper +5

blog cursor 3w ago ·

zenn-cursor

Cursor Agent × Rubydex MCP: Does It Really Save Tokens? Testing It on a Rails Monolith

Medium priority · technical post · AI Editors · Published May 25

AI summary: Hello! I'm @koyakota, a backend engineer at Hubble. Rubydex, which was featured at RubyKaigi 2026, can apparently also be used from Cursor's Agent, so I tried it out. The token count goes down

#agent #cursor #mcp-server +7

blog mcp 3w ago ·

zenn-mcp

Too Much AI Tech News, So I Made Obsidian the AI's External Memory and Automated Information Gathering

Medium priority · technical post · MCP / Tooling · Published May 25

AI summary: A hands-on implementation article on using Obsidian as an external memory for AI to automate the collection and organization of rapidly expanding technical information on LLMs, MCP, and other AI topics.

#mcp #mcp-server #zenn +5