#llm page 2/4


blog local-llm 2w ago ·

qiita-llm

Trying code completion with a local LLM (Radeon RX 9600 XT 16GB + Qwen2.5-coder)

Normal · Deep-dive candidate · technical post · Local LLM / Open Models · Published May 31

AI summary: A hands-on report on local code completion using a newly purchased Radeon RX 9600 XT 16GB paired with Qwen2.5-coder. It confirms that even a typical mid-range build (AMD Ryzen 5 4500, 32GB RAM) runs LLM inference well enough for code completion.

#llm #qiita #amd-gpu +5


blog local-llm 2w ago ·

qiita-llm

EAGLE 3.1 speeds up LLM inference by up to 2x: the latest speculative decoding method that overcomes attention drift

Medium priority · technical post · Local LLM / Open Models · Published May 31

AI summary: EAGLE 3.1, released on May 26, 2026, eliminates "attention drift", a cause of accuracy loss in speculative decoding, and improves Kimi K2.6 throughput by up to 2.03× over EAGLE-3 according to vLLM's official benchmarks.

#llm #qiita #speculative-decoding +4


Fri, May 29 6 entries

🔥 HOT blog claude 2w ago ·

qiita-claude

Claude Opus 4.8 arrives: overview of the latest flagship model and migration guide

High priority · technical post · Claude / Claude Code · Published May 29

AI summary: Introduction. Anthropic has announced Claude Opus 4.8. It earned more than 1,600 upvotes on Hacker News and has drawn major attention from the developer community. The top tier of the Claude series, "…

#claude #qiita #anthropic +4


🔥 HOT blog claude 3w ago ·

qiita-claude

Claude Opus 4.8 in depth: a comparison with previous versions and Sonnet 4.6

High priority · technical post · Claude / Claude Code · Published May 29

AI summary: An article that compares Claude Opus 4.8, released by Anthropic on May 28, 2026, against Opus 4.6, Opus 4.7, and Sonnet 4.6 from multiple angles.

#claude #qiita #anthropic +4


blog claude 3w ago ·

qiita-claude

Claude Opus 4.8 showed up before I noticed, so I summarized the new features and what changed

Medium priority · technical post · Claude / Claude Code · Published May 29

AI summary: Anthropic released Claude Opus 4.8 about six weeks after Opus 4.7. This Qiita post summarizes the new features and the key differences from the previous version.

#claude #qiita #claude-opus +4


blog tech-news 3w ago ·

ars-technica

LLMs believe false statements even after explicit warnings that they're false

Medium priority · technical post · Industry & Policy · Published May 29

AI summary: Fine-tuning experiments found that LLMs are biased toward confidently presenting misinformation as true, even when it has been explicitly labeled false.

EN Fine-tuning tests show "bias... toward confidently representing the claims as true."

#ars-technica #news #llm +4

arstechnica.com →


🔥 HOT blog tech-news 3w ago ·

ars-technica

Apple working to cram massive Gemini model into iPhone to power new Siri

High priority · technical post · Industry & Policy · Published May 29

AI summary: Apple is working to distill and compress Google's Gemini model so that it runs on the iPhone, and is considering it as the foundation for a new Siri. A cloud component is also expected to be used alongside it.

EN As Apple tries to shrink Gemini for the iPhone, a cloud component is probably inevitable.

#ars-technica #news #apple +5

arstechnica.com →


🔥 HOT blog claude 3w ago ·

zenn-claude

Claude Opus 4.8 new features and a comparison with GPT-5.5: which to use in which field

High priority · technical post · Claude / Claude Code · Published May 29

AI summary: On May 28, 2026, Anthropic announced Claude Opus 4.8 (model ID: claude-opus-4-8). It is a fast-paced update, arriving just 41 days after the previous model, Opus 4.7. …

#claude #zenn #gpt-5-5 +5


Thu, May 28 16 entries

blog local-llm 3w ago ·

qiita-llm

What is Qwen?

Medium priority · technical post · Local LLM / Open Models · Published May 28

AI summary: Reading Qwen's architecture from the papers. Overview: Qwen is a series of open-weight LLMs developed by Alibaba Cloud (阿里云). It evolved rapidly from Qwen-1 (September 2023) to Qwen-3 (April 2025), and Llama…

#llm #open-model #qiita +5


release opencode 3w ago ·

openhands-releases

cloud-1.36.0 release

Medium priority · official release · OpenHands / OpenCode · Published May 28

AI summary: cloud-1.36.0 of the OpenHands SaaS edition has been released. It adds automatic seeding of the default LLM profile from the legacy configuration when profiles are updated.

EN feat(saas): seed Default LLM profile from legacy config on profiles u…

#openhands #release #llm +3

github.com →


blog copilot 3w ago ·

zenn-copilot

On GitHub Copilot's models

Medium priority · technical post · GitHub Copilot · Published May 28

AI summary: Background: I have been using GitHub Copilot almost every day lately, and there are quite a lot of models available, such as GPT and Claude Sonnet. However, I have not been able to judge which model suits which use case, so I took this opportunity to look…

#copilot #zenn #github-copilot +5


paper research 3w ago ·

arxiv-cs-ai

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: A research paper proposing a customizable LLM architecture that extracts and analyzes human values from text, with the aim of integrating ethics into autonomous AI systems.

EN arXiv:2605.27373v1 Announce Type: new Abstract: As intelligent systems become more autonomous, the scientific community focuses on creating decision-making mechanisms that include ethical and moral co

#agent #arxiv #paper +5


paper research 3w ago ·

arxiv-cs-ai

Soro: A Lightweight Foundation Model and Chatbot for Tajik

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: Presents Soro, a family of conversational LLMs specialized for Tajik, with a lightweight design intended for practical deployment in compute-constrained environments.

EN arXiv:2605.27379v1 Announce Type: new Abstract: We present Soro, a family of Tajik-specialized conversational large language models (LLMs) designed for real-world deployment under tight compute and co

#arxiv #paper #llm +5


paper research 3w ago ·

arxiv-cs-ai

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: A paper that analyzes why LLMs are unreliable at causal discovery tasks and proposes an intervention-based agent approach to overcome the problem.

EN arXiv:2605.27567v1 Announce Type: new Abstract: Causal discovery is a cornerstone of scientific reasoning, yet whether large language models can perform it reliably remains an open question. Recent be

#arxiv #paper #causal-discovery +4


paper research 3w ago ·

arxiv-cs-ai

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: A research paper proposing LaneRoPE, a dedicated positional encoding method for test-time scaling of LLMs that generate multiple sequences in parallel.

EN arXiv:2605.27570v1 Announce Type: new Abstract: Parallel LLM test-time scaling techniques (e.g., best-of-$N$) require drawing $N>1$ sequences conditioned on the same input prompt. These methods boost

#arxiv #paper #llm +5


paper research 3w ago ·

arxiv-cs-ai

Laguna M.1/XS.2 Technical Report

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: Presents Laguna M.1 (225.8 billion parameters) and XS.2, Mixture-of-Experts foundation models designed for long-horizon agentic coding.

EN arXiv:2605.27605v1 Announce Type: new Abstract: We present Laguna M.1 and Laguna XS.2, two Mixture-of-Experts foundation models built for long-horizon, agentic coding: M.1 has $225.8$B total parameter

#agent #arxiv #paper +5


paper research 3w ago ·

arxiv-cs-se

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: Proposes RAMP, a runtime evaluation framework for LLM agents running in production. It points out the limits of existing benchmarks and enables continuous assessment in real environments.

EN arXiv:2605.27492v1 Announce Type: new Abstract: LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain l

#agent #arxiv #benchmark +7


paper research 3w ago ·

arxiv-cs-se

LLM Based Web Accessibility Repair: An Empirical Study of Detection, Remediation, and Cost

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: A research paper that empirically evaluates automatic detection and repair of web accessibility issues with LLMs and analyzes the trade-off between accuracy and cost.

EN arXiv:2605.27716v1 Announce Type: new Abstract: Ensuring web accessibility at scale remains challenging because rule-based tools provide limited coverage while manual remediation is costly and error-p

#arxiv #paper #accessibility +5


paper research 3w ago ·

arxiv-cs-se

DeltaMCP: Incremental Regeneration via Spec-Aware Transformation for MCP servers

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: Against the backdrop of the spread of LLMs and the Model Context Protocol (MCP), this research paper proposes DeltaMCP, an incremental regeneration method that streamlines interaction with APIs.

EN arXiv:2605.28148v1 Announce Type: new Abstract: The rapid development of LLMs coupled with the introduction of Model Context Protocol (MCP) has revolutionized how intelligent agents interact with APIs

#arxiv #mcp-server #paper +5


paper research 3w ago ·

arxiv-cs-se

GUI Agents for Continual Game Generation

Medium priority · paper/research · Papers / Benchmarks · Published May 28

AI summary: Research proposing a continual game generation method that uses GUI agents to produce games that are actually playable, rather than only generating code.

EN arXiv:2605.28258v1 Announce Type: new Abstract: Generating a game is not the same as making one that can be played. Despite advances in code generation, existing approaches treat game generation as on

#arxiv #paper #gui-agents +5


blog local-llm 3w ago ·

qiita-llm

Google's LiteRT-LM runs LLMs on the Pixel Watch: a new runtime for on-device AI

Medium priority · technical post · Local LLM / Open Models · Published May 28

AI summary: Google has released LiteRT-LM, a runtime for efficiently running LLMs on edge devices. Features such as Smart Reply on the Pixel Watch 4 and summarization in Chrome run locally via Gemma models, with no server required.

#llm #open-model #qiita +6


blog local-llm 3w ago ·

qiita-llm

Local LLMs on iPhone: which runtime is actually fastest? Benchmarking MLX / llama.cpp / LiteRT-LM / CoreML on a real device

Medium priority · technical post · Local LLM / Open Models · Published May 28

AI summary: An article that benchmarks four runtimes (MLX, llama.cpp, LiteRT-LM, and CoreML) on a physical iPhone and compares their local LLM inference speed.

#llm #open-model #qiita +6


blog tech-news 3w ago ·

techcrunch

Why Google’s AI can’t spell Google (or anything else)

Medium priority · technical post · Industry & Policy · Published May 28

AI summary: The problem that Google's AI cannot correctly spell words, including the company's own name, is drawing attention again. Behind it lies a well-known but persistent limitation of LLMs in character-level processing.

#news #techcrunch #llm +4

techcrunch.com →


blog local-llm 3w ago ·

zenn-llm

Gemma 4 comes in four variants and I got confused, so I sorted them out!

Medium priority · technical post · Local LLM / Open Models · Published May 28

AI summary: An article by a developer who tried Gemma 4 in LM Studio for local self-hosting, sorting out the differences among the four variants in terms of speed and performance.

#llm #open-model #zenn +5


Wed, May 27 5 entries

blog local-llm 3w ago ·

zenn-llm

Is cheap AI plus humans cheaper than US high-end AI alone?

Medium priority · technical post · Local LLM / Open Models · Published May 27

AI summary: Which is cheaper for the same work: sending it straight to a US frontier model, or using a low-cost model such as DeepSeek with humans judging, correcting, and accepting the output? The SignalBloom AI article "Outsourcing plus Lo…

#llm #open-model #zenn +6


paper research 3w ago ·

arxiv-cs-cl

Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Research proposing a distillation method in which an LLM, without an external teacher, generates synthetic data from unlabeled prompts, verifies it itself, and further improves its own performance.

EN arXiv:2605.26132v1 Announce Type: new Abstract: Can post-trained large language models (LLMs) further improve themselves using only unlabeled prompts, without external teachers or feedback from tools?

#arxiv #paper #self-improvement +4


paper research 3w ago ·

arxiv-cs-cl

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: A survey paper that comprehensively examines pretraining data exposure in LLMs, systematically organizing membership inference attacks, data contamination, and security risks.

EN arXiv:2605.26133v1 Announce Type: new Abstract: Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow

#arxiv #paper #llm +5


paper research 3w ago ·

arxiv-cs-cl

SPEAR: Code-Augmented Agentic Prompt Optimization

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Proposes the SPEAR framework, which combines automatic prompt engineering with code generation and agentically improves the optimizer itself.

EN arXiv:2605.26275v1 Announce Type: new Abstract: Automatic prompt engineering (APE) rewrites prompts to improve downstream task performance, but existing APE loops treat the optimizer itself as a fixed

#agent #arxiv #paper +4


paper research 3w ago ·

arxiv-cs-cl

The Daily Dose: Workflow-Integrated Large Language Model Automation for Clinical Summarization and Trial Identification in Radiation Oncology

Medium priority · paper/research · Papers / Benchmarks · Published May 27

AI summary: Reports the design and early clinical evaluation of The Daily Dose, an LLM-based clinical summarization and trial matching system for radiation oncology.

EN arXiv:2605.26346v1 Announce Type: new Abstract: Objective: To describe the design and early clinical evaluation of The Daily Dose (TDD), an LLM-driven, automated clinical summarization and clinical-tr

#arxiv #benchmark #paper +6