Latent Cache Flow: Model-to-Model Communication Without Text

arXiv cs.LG · arxiv.org · 2026/05/25 13:00

AI summary

The paper proposes a method that lets LLM agents communicate by sharing KV caches directly instead of text, aiming to cut autoregressive decoding latency and reduce information loss between models.

Today's LLM agents exchange information through natural-language text. Each handoff requires full autoregressive decoding by the sending model and re-encoding by the receiving model. This adds latency and risks losing nuance that exists in the sender's internal representations but is difficult to express in discrete tokens.
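The information-loss argument can be illustrated with a toy example. The sketch below is not taken from the paper: the three-token vocabulary, the 2-D embeddings, and the helper names (`decode_to_token`, `re_encode`) are all hypothetical. It only shows that squeezing a continuous hidden vector through a single discrete token and re-embedding it generally does not recover the original vector, whereas passing the vector itself does.

```python
import math

# Hypothetical 3-token vocabulary with 2-D embeddings (illustrative only).
EMBEDDINGS = {
    "yes": [1.0, 0.0],
    "no": [-1.0, 0.0],
    "maybe": [0.0, 1.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decode_to_token(hidden):
    """Sender side: pick the token whose embedding best matches the hidden vector."""
    return max(EMBEDDINGS, key=lambda tok: dot(hidden, EMBEDDINGS[tok]))


def re_encode(token):
    """Receiver side: turn the received token back into a vector."""
    return EMBEDDINGS[token]


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A hidden state that leans towards "yes" but carries a strong "maybe" component.
hidden = [0.6, 0.5]

# Text handoff: discretize, transmit the token, re-embed.
token = decode_to_token(hidden)          # "yes"
received_via_text = re_encode(token)
text_error = l2_distance(hidden, received_via_text)

# Latent handoff: transmit the vector unchanged.
received_via_latent = list(hidden)
latent_error = l2_distance(hidden, received_via_latent)

print(token, round(text_error, 3), latent_error)
```

In this toy setting the "maybe" component of the hidden state is discarded by the text handoff. A real model emits many tokens and can recover some of that nuance in words, so the example shows the direction of the effect, not its size.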

The paper proposes "Latent Cache Flow," a framework in which agents share KV cache tensors directly rather than generated text, allowing the receiving model to attend to the sender's intermediate representations without a costly decode-then-re-encode round trip. This could be particularly relevant for multi-agent pipelines where many sequential handoffs occur.
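As a rough sketch of the general idea, and not the paper's implementation, the following toy code shows a receiver attending over keys and values produced by a sender. It assumes a single attention head, a single layer, and that both models use the same key/value dimensionality; the summary does not say how the paper handles models that differ. All names (`build_kv_cache`, `attend`, `W_k`, `W_v`) are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_kv_cache(hidden_states, W_k, W_v):
    """Project per-token hidden states to keys and values (one head, one layer)."""
    keys = [matvec(W_k, h) for h in hidden_states]
    values = [matvec(W_v, h) for h in hidden_states]
    return keys, values


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Assumption: identity projections, so keys and values equal the hidden states.
W_k = [[1.0, 0.0], [0.0, 1.0]]
W_v = [[1.0, 0.0], [0.0, 1.0]]

# Sender processes its own context once and keeps the resulting cache.
sender_hidden = [[4.0, 0.0], [0.0, 4.0]]
sender_keys, sender_values = build_kv_cache(sender_hidden, W_k, W_v)

# Receiver builds a cache for its own token, then prepends the sender's cache
# instead of reading and re-encoding generated text.
receiver_hidden = [[1.0, 1.0]]
own_keys, own_values = build_kv_cache(receiver_hidden, W_k, W_v)
keys = sender_keys + own_keys
values = sender_values + own_values

query = [3.0, 0.0]
output, weights = attend(query, keys, values)
print([round(w, 4) for w in weights], [round(o, 3) for o in output])
```

Here the receiver's query aligns with the sender's first cached key, so nearly all attention weight falls on the sender's first value vector. No tokens were generated or re-encoded along the way.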

The approach sits at the intersection of multi-agent LLM systems and efficient inference research. Specific benchmark results, architectural constraints, and compatibility requirements should be verified in the full paper at arXiv:2605.22863.

#arxiv #paper #llm-agents #kv-cache #multi-agent #inference-efficiency #latent-communication

Source: arXiv cs.LG T2
Source Avg: ★ 2.0
Type: Paper
Importance: ★ Normal (top 93% in Papers / Benchmarks)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/26 07:03

The body text and summaries on this page were generated automatically by AI. Please check the original article (arxiv.org) for accuracy.
