Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

arXiv cs.SE · arxiv.org · 2026/05/12 13:00 · 📖 1 min

AI three-line summary

This paper proposes a hierarchical framework that integrates log anomaly detection, localization, and explanation, leveraging LLMs to produce human-readable rationales and enable interactive analysis with operators.

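Log anomaly detection has long been a challenge for observability in large-scale systems, and what comes after detection, namely identifying the root cause and explaining the result to operators, has remained a weak point. This paper proposes a hierarchical framework that unifies three stages (detection, localization, and explanation) and incorporates an LLM (large language model) with the aim of enabling semantic interpretation of anomalies and interactive analysis.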

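Conventional log anomaly detection has been dominated by sequence models such as DeepLog and LogBERT and by embedding-based methods. These are strong at computing anomaly scores, but offer limited support for localization (which tokens, templates, or time windows contributed to the anomaly) and for natural-language explanations of why something was judged anomalous. The proposed method appears to first narrow down anomaly candidates at the log-sequence level, then drop to finer granularity (templates or individual events) for localization, and finally have an LLM combine the localized evidence with contextual information to generate explanations for operators.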

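Interactivity is another distinguishing feature. The framework appears to envision an analysis loop in which operators ask follow-up questions about an explanation and pull up related logs or past incidents, which is consistent with the AIOps direction Microsoft and Meta have outlined in their research. In recent years, the spread of OpenTelemetry has increased the volume of structured telemetry, and studies applying LLMs to log analysis, such as RCACopilot and LogPrompt, have appeared in quick succession; this paper sits within that lineage.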


🔬 Research · Key points of this article

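On the other hand, LLM-generated explanations face practical constraints in production, including the risk of hallucination, inference cost, and the handling of sensitive information. Using the hierarchy to limit the number of input tokens is a reasonable design, but accuracy and responsiveness in real environments, as well as integration with existing SIEM and observability platforms, call for a careful reading of the evaluation results.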

Log anomaly detection has been a long-standing pillar of system observability, yet operators still struggle with two downstream problems: pinpointing where exactly an anomaly originates inside a noisy log stream, and understanding why a model flagged it. This paper proposes a hierarchical framework that unifies detection, localization, and explanation, and augments the pipeline with a large language model to produce human-readable rationales and support interactive triage.

Classical approaches such as DeepLog, LogAnomaly, and LogBERT focus primarily on sequence-level scoring, treating logs as token or template sequences and learning normal patterns through LSTMs or transformer encoders. While effective at producing anomaly scores, these models tend to be opaque: they rarely indicate which template, field, or time window drove the decision, and they offer little narrative that an SRE can act upon. The proposed system appears to address this gap by first narrowing down suspicious sequences, then drilling into finer-grained templates or events, and finally letting an LLM compose a contextual explanation drawing on the localized evidence.
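The detect-then-localize flow described above can be illustrated with a toy sketch. It is not the paper's method: the LSTM or transformer in DeepLog-style models is replaced here by a simple next-template frequency table, and the function names (`train_next_event_model`, `localize`, `detect_and_localize`) and the `threshold` and `top_g` parameters are hypothetical. Log lines are assumed to be already parsed into template IDs such as "E1".

```python
from collections import Counter, defaultdict


def train_next_event_model(normal_sequences):
    """Count which template ID follows each template in normal runs.

    A frequency table is a deliberately simple stand-in for the learned
    sequence model in DeepLog; it is not the paper's model.
    """
    model = defaultdict(Counter)
    for seq in normal_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            model[prev][nxt] += 1
    return model


def localize(model, seq, top_g=2):
    """Return indices whose template is not among the top-g successors
    observed after the preceding template (a DeepLog-style check)."""
    flagged = []
    for i in range(1, len(seq)):
        successors = model.get(seq[i - 1], Counter())  # no insert on lookup
        candidates = [t for t, _ in successors.most_common(top_g)]
        if seq[i] not in candidates:
            flagged.append(i)
    return flagged


def detect_and_localize(model, sequences, threshold=0.2, top_g=2):
    """Stage 1: keep sequences whose share of unexpected transitions is at
    least `threshold`. Stage 2: attach the localized (index, template)
    pairs. The result is the evidence a stage-3 LLM prompt would explain."""
    evidence = {}
    for seq_id, seq in sequences.items():
        positions = localize(model, seq, top_g)
        score = len(positions) / max(len(seq) - 1, 1)
        if score >= threshold:
            evidence[seq_id] = [(i, seq[i]) for i in positions]
    return evidence


normal = [["E1", "E2", "E3", "E4"]] * 10
model = train_next_event_model(normal)
runs = {"blk_1": ["E1", "E2", "E3", "E4"], "blk_2": ["E1", "E2", "E9", "E4"]}
print(detect_and_localize(model, runs))  # {'blk_2': [(2, 'E9'), (3, 'E4')]}
```

Because the sequence-level score here is derived from the same per-event check, localization comes almost for free; a real hierarchical system may well use different models or features at each level, which this sketch does not try to show. Note also that the event right after an unseen template (`E4` above) is flagged too, a known side effect of next-event prediction.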

The interactive dimension is arguably the most interesting design choice. Rather than emitting a static alert, the framework seems intended to support follow-up queries from operators, letting them ask for related events, historical precedents, or alternative hypotheses. This direction echoes the broader AIOps research agenda pursued at Microsoft, Meta, and several cloud vendors, where copilots increasingly mediate between raw telemetry and on-call engineers. Recent work such as RCACopilot and LogPrompt has shown that LLMs can meaningfully assist with incident triage when grounded in structured signals, and this paper fits squarely within that emerging line.
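A follow-up question such as "what else happened around this event?" ultimately needs a retrieval step over parsed logs. The sketch below assumes a hypothetical `LogEvent` record carrying a timestamp and a template ID, and matches events by shared template or time proximity; it says nothing about how the paper itself retrieves related events or historical precedents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    ts: float          # seconds since epoch (assumed field)
    template_id: str   # template ID from a log parser (assumed field)
    text: str


def related_events(events, anchor, window_s=60.0, limit=5):
    """Return other events that share the anchor's template or fall within
    `window_s` seconds of it, nearest in time first, at most `limit`."""
    hits = [
        e for e in events
        if e is not anchor
        and (e.template_id == anchor.template_id
             or abs(e.ts - anchor.ts) <= window_s)
    ]
    hits.sort(key=lambda e: abs(e.ts - anchor.ts))
    return hits[:limit]


log = [
    LogEvent(100.0, "E9", "disk error on /dev/sdb"),
    LogEvent(130.0, "E4", "retrying write"),
    LogEvent(1000.0, "E9", "disk error on /dev/sdc"),
    LogEvent(5000.0, "E1", "heartbeat ok"),
]
for e in related_events(log, log[0]):
    print(e.ts, e.text)
```

In an interactive loop, the returned events would be fed back to the LLM as fresh grounding for its next answer rather than letting it recall context on its own.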

Hierarchical decomposition also helps with a very practical issue: token budgets. Feeding entire log streams into an LLM is infeasible at scale, so the layered approach of filtering, localizing, and only then explaining is a sensible way to bound context size and inference cost. It may also improve faithfulness, since the LLM is asked to reason over a small, pre-selected slice of evidence rather than freely hallucinate over megabytes of text.
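The token-budget argument can be made concrete with a greedy packer that fills a fixed context budget with the highest-priority evidence lines. This is a sketch under assumptions: `pack_evidence` and its (priority, text) input format are hypothetical, and counting tokens by whitespace is only a rough proxy for a real tokenizer.

```python
def pack_evidence(lines, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-priority evidence lines that fit the budget.

    `lines` is a list of (priority, text) pairs, e.g. localization scores.
    Lines that do not fit are skipped, so a cheaper, lower-priority line
    may still be included after a more expensive one is dropped.
    """
    packed, used = [], 0
    for _, text in sorted(lines, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget_tokens:
            packed.append(text)
            used += cost
    return packed, used


evidence = [
    (0.9, "disk error on /dev/sdb"),
    (0.5, "retrying write to block 42"),
    (0.1, "heartbeat ok"),
]
print(pack_evidence(evidence, budget_tokens=7))
```

Greedy first-fit by priority is simple and predictable; choosing the subset with the maximum total priority would be a 0/1 knapsack problem, which is probably unnecessary for assembling a prompt.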

Several caveats remain. LLM-generated explanations carry well-known risks of plausible-sounding but incorrect narratives, and grounding them in retrieved log evidence does not eliminate the problem. Operational deployment also raises concerns about latency, cost per incident, and sensitivity of log content, which often contains credentials, customer identifiers, or internal hostnames that organizations are reluctant to send to external models. Integration with existing SIEM and observability stacks such as Elastic, Splunk, or OpenTelemetry-based pipelines will likely determine whether such frameworks see real adoption, and the empirical sections of the paper should be read with those constraints in mind.
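One common mitigation for the data-sensitivity concern is to redact logs before they are sent to an external model. The patterns below are illustrative assumptions (including the hypothetical `internal.example.com` host suffix), not a complete secret detector, and nothing in the summary says the paper does this.

```python
import re

# Illustrative rules only: real deployments need patterns tuned to their
# own log formats, secret types, and internal naming schemes.
REDACTION_RULES = [
    # key=value or key: value credentials; keeps the key, hides the value
    (re.compile(r"(?i)\b(password|passwd|token|api[_-]?key|secret)\s*[=:]\s*\S+"),
     r"\1=<REDACTED>"),
    # dotted-quad IPv4 addresses
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    # e-mail addresses as a simple proxy for customer identifiers
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    # hypothetical internal hostname suffix
    (re.compile(r"\b[\w-]+\.internal\.example\.com\b"), "<HOST>"),
]


def redact(line: str) -> str:
    """Apply each rule in order and return the sanitized log line."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line


print(redact("login failed for bob@corp.com from 10.0.0.12 "
             "password=hunter2 on db1.internal.example.com"))
```

Regex redaction is cheap enough to run on every line before prompting, but it only catches what its patterns anticipate, so it complements rather than replaces keeping sensitive logs on self-hosted models.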

Overall, the contribution looks incremental rather than revolutionary, but it reflects a meaningful trend: log analytics is shifting from pure anomaly scoring toward explainable, conversational workflows where the model is expected not only to detect but also to justify and converse. Whether this particular hierarchy generalizes beyond the benchmark datasets typically used in the field, such as HDFS or BGL, remains to be seen.

#arxiv #paper #log-analysis #aiops #anomaly-detection #llm #observability

Source: arXiv cs.SE · T1
Source avg: ★ 1.1
Type: Paper
Importance: ★ Info (top 100% in Research)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/13 08:00

Read the original article

arxiv.org

The body text and summary on this page were generated automatically by AI. Please check the original article (arxiv.org) for accuracy.
