How I designed a companion AI's memory as something other than ordinary RAG

Zenn LLM tag · zenn.dev · 2026/05/12 11:05 · 1d ago · 📖 2 min

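AI 3-line summary

An implementation report on a custom memory design for a companion AI, one that retains conversational context and emotion instead of relying on generic RAG.
It points out that reusing vector search as-is undermines conversational continuity and the accumulation of a relationship, and introduces an approach built around structured memory layers.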

English summary

A developer shares an experimental memory architecture for a companion AI that departs from generic RAG, arguing that plain vector retrieval fails to preserve emotional context and conversational continuity, and proposes a structured memory layer instead.

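In building a companion AI, that is, a dialogue agent meant for small talk and emotional companionship, memory design means more than retrieval accuracy alone. This article shares the insights of a developer who, rather than reusing a typical RAG (Retrieval-Augmented Generation) setup as-is, attempted a custom design that accounts for the accumulation of relationship and emotion.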

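Typical RAG works by vectorizing the user's utterance, retrieving similar documents, and inserting them into the prompt. It is powerful for FAQ answering and knowledge-base lookup, but it is said to lose conversational continuity easily in companion use. For example, given a context-dependent utterance such as "I want to pick up where we left off yesterday," simple similarity search has difficulty reproducing the timeline and the connections between topics. This appears to be the author's core concern.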

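Rather than pushing all memory into a single vector store, the article suggests a structured approach that holds it in units such as episodes and attributes. By handling user preferences, past events, and changes in the relationship in separate layers, the aim appears to be behavior closer to "remembering" than to search.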

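As for related efforts, designs that separate long-term from short-term memory are widely discussed around MemGPT, Letta, LangChain's Memory module, and commercial companion services such as Character.AI. Many examples combine techniques such as compression by summarization, knowledge-graph-style relation extraction, and emotion tagging, and the approach in this article may belong to the same lineage.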

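When running on a local LLM, structuring memory also helps ease context-length constraints. Because not everything has to be packed into the prompt every time, a consistent persona becomes easier to sustain even with small models. On the other hand, operational issues remain, such as when to update memories, how to resolve contradictions, and how to handle privacy. Companion AI design is still an area with few established patterns, and sharing implementation knowledge like this is likely to help the field as a whole mature.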

Designing memory for a companion AI, an agent meant for casual conversation and emotional presence, is more than a retrieval-accuracy problem. This article shares one developer's experiment in building a memory system that deliberately avoids the standard RAG pattern, arguing that a generic vector store does not capture what makes a companion feel continuous and personal.

A typical RAG pipeline embeds the user's utterance, retrieves the most similar chunks from a vector database, and injects them into the prompt. That works well for FAQ-style or knowledge-grounded assistants, but the author suggests it struggles for companion use cases. When a user says something like "let's continue yesterday's conversation," pure semantic similarity has a hard time reconstructing temporal flow or topic threads, because nearest-neighbor search has no native notion of episode or relationship.
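The limitation described above can be illustrated with a minimal sketch of the embed, retrieve, and inject flow. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the stored memories and dates are made up. None of it comes from the original post.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query. Timestamps play no role."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Inject the retrieved chunks into the prompt ahead of the user turn."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Relevant memories:\n{context}\n\nUser: {query}"

# Hypothetical stored memories; "today" is assumed to be 2026-05-12.
MEMORIES = [
    "2026-05-11: user described a stressful job interview",
    "2026-03-02: user said a conversation with their sister went badly",
]

QUERY = "let's continue yesterday's conversation"
top = retrieve(QUERY, MEMORIES, k=1)
```

In this toy setup `top` holds the March memory, because it shares the word "conversation" with the query, while the memory that is actually from yesterday shares no words at all. A real embedding model would score these differently, but the structural point stands: nothing in the ranking looks at the date.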

Instead of cramming everything into one embedding index, the post points toward a structured approach: separating memories by type, such as episodic events, user preferences, and evolving relational state. The intent, as far as can be inferred, is to move from "retrieval" toward something closer to "recollection," where the agent surfaces memories based on contextual cues rather than raw cosine similarity.
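One way to picture such a layered store is sketched below. The post does not publish a schema, so the class names, fields, and the keyword-based cue detection are all hypothetical. A real system would more plausibly use an intent classifier or the LLM itself to detect cues.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Episode:
    """One remembered event, keyed by the day it happened (hypothetical schema)."""
    day: date
    summary: str
    topics: set[str] = field(default_factory=set)

@dataclass
class CompanionMemory:
    """Three separate layers instead of one embedding index."""
    episodes: list[Episode] = field(default_factory=list)       # what happened
    preferences: dict[str, str] = field(default_factory=dict)   # what the user likes
    relationship: dict[str, str] = field(default_factory=dict)  # how things stand

    def recall(self, utterance: str, today: date) -> list[str]:
        """Surface memories from contextual cues rather than similarity scores."""
        text = utterance.lower()
        words = set(text.replace("?", " ").replace(",", " ").split())
        recalled: list[str] = []

        # Temporal cue: "yesterday" routes to episodes dated one day back.
        if "yesterday" in text:
            target = today - timedelta(days=1)
            recalled += [e.summary for e in self.episodes if e.day == target]

        # Topic cue: episodes whose topic tags appear in the utterance.
        for e in self.episodes:
            if e.topics & words and e.summary not in recalled:
                recalled.append(e.summary)

        # Preference cue: stored attributes whose key is mentioned.
        for key, value in self.preferences.items():
            if key in words:
                recalled.append(f"preference: {key} = {value}")
        return recalled
```

With an episode dated one day before `today`, the utterance "let's continue yesterday's conversation" recalls that episode through the temporal cue alone, even when it shares no vocabulary with the stored summary.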

This line of thinking is not isolated. MemGPT and its successor project Letta have explored hierarchical memory with explicit working and long-term tiers, while LangChain and LlamaIndex expose memory abstractions that combine summarization, entity tracking, and vector recall. Commercial companion services such as Character.AI and Replika are widely believed to use similar layered architectures, often pairing compressed summaries with structured user profiles and emotion tags. The design discussed here appears to sit within that broader tradition.

For local LLM deployments, structured memory has a practical payoff beyond fidelity. By keeping persistent state outside the prompt and only injecting what is relevant, developers can maintain a coherent persona even with smaller context windows. That matters when running 7B or 13B class models on consumer hardware, where every token of context competes with reasoning headroom. A well-designed memory layer effectively extends the model's apparent capacity without forcing a larger base model.
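The idea of injecting only what fits can be sketched as a small budgeted selection step. This is an assumption-laden illustration, not the author's method: `approx_tokens` counts whitespace-separated words instead of using the model's tokenizer, and the priority numbers are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate by whitespace split; a real tokenizer will differ."""
    return len(text.split())

def assemble_context(candidates: list[tuple[int, str]], budget: int) -> list[str]:
    """Pick memory snippets in priority order until the token budget is spent.

    candidates: (priority, text) pairs, where a lower number is more important.
    Snippets that do not fit are skipped, so a smaller one further down may
    still be included.
    """
    chosen: list[str] = []
    used = 0
    for _, text in sorted(candidates, key=lambda c: c[0]):
        cost = approx_tokens(text)
        if used + cost > budget:
            continue
        chosen.append(text)
        used += cost
    return chosen
```

Giving the persona line the highest priority means it survives even a very small budget, which is one plausible way to keep a character consistent on a small model.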

That said, structured memory introduces its own difficulties. Deciding when to write, when to update, and how to reconcile contradictions, for example when a user changes their stated preferences, is non-trivial. There are also privacy considerations: a companion that remembers intimate details indefinitely raises questions about retention policies, export, and deletion that the broader industry is still working through. Hallucinated memories, where the model confidently recalls events that never happened, remain a known failure mode that purely vector-based systems do not by themselves guard against.
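One simple policy for the write, update, and delete questions is "newest statement wins, older values are kept as history". The sketch below assumes that policy purely for illustration. `PreferenceStore`, its method names, and the returned status strings are hypothetical and do not come from the post.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Fact:
    value: str
    stated_at: datetime

@dataclass
class PreferenceStore:
    """Keeps one current value per key plus the values it replaced."""
    current: dict[str, Fact] = field(default_factory=dict)
    superseded: dict[str, list[Fact]] = field(default_factory=dict)

    def write(self, key: str, value: str, stated_at: datetime) -> str:
        existing = self.current.get(key)
        if existing is None:
            self.current[key] = Fact(value, stated_at)
            return "added"
        if existing.value == value:
            return "unchanged"
        if stated_at < existing.stated_at:
            # Out-of-order write: an older statement must not override a newer one.
            self.superseded.setdefault(key, []).append(Fact(value, stated_at))
            return "ignored_stale"
        # Contradiction: the newer statement wins, the old one becomes history.
        self.superseded.setdefault(key, []).append(existing)
        self.current[key] = Fact(value, stated_at)
        return "updated"

    def forget(self, key: str) -> bool:
        """Delete a key and its history, e.g. for a user deletion request."""
        found = key in self.current or key in self.superseded
        self.current.pop(key, None)
        self.superseded.pop(key, None)
        return found
```

Keeping the superseded values lets the agent say something like "you used to prefer coffee" instead of silently overwriting the past, while `forget` removes the history as well so that a deletion request leaves nothing behind.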

Companion AI design is still a young field without settled best practices, and much of the progress so far has come from individual developers sharing implementation notes rather than from formal benchmarks. Posts like this one contribute to that emerging body of practical knowledge, and the proposed direction, treating memory as a structured, multi-layered subsystem rather than a single retrieval call, is consistent with where the wider ecosystem appears to be heading.