#speculative-decoding — TECH Dashboard

Entries page 1/1 · 2 total

Sun, May 31 · 1 entry

blog · local-llm · 2w ago

qiita-llm

EAGLE 3.1 speeds up LLM inference by up to 2x: the latest speculative decoding method that overcomes attention drift

Medium priority · technical post · Local LLM / Open Models · Published May 31

AI summary: EAGLE 3.1, released May 26, 2026, addresses "attention drift", a cause of accuracy degradation in speculative decoding, and achieves up to 2.03× the throughput of EAGLE-3 on Kimi K2.6, according to vLLM's official benchmarks.

#llm #qiita #speculative-decoding +4

qiita.com →


Mon, May 4 · 1 entry

blog · gemini · 1mo ago

google-developers

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Medium priority · technical post · Gemini / Gemma · Published May 4

AI summary: Google announced a speculative decoding method inspired by diffusion models to speed up LLM inference on TPUs. Researchers at UCSD implemented DFlash, a block-diffusion speculative decoding method, on Google TPUs to bypass the sequential bottlenecks of traditional autoregressive drafting. By predicting and verifying multiple tokens in parallel, the method reportedly achieves up to a 3x throughput improvement.

#google #tpu #speculative-decoding +2

developers.googleblog.com →

