Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
TL;DR Summary
This survey analyzes memory mechanisms in AI, introducing a unified taxonomy categorizing memory as parametric and contextual, and defining six key operations. These operations inform research directions on long-term and multi-source memory, providing a structured and dynamic perspective on the field.
Abstract
Memory is a fundamental component of AI systems, underpinning large language model (LLM)-based agents. While prior surveys have focused on memory applications with LLMs (e.g., enabling personalized memory in conversational agents), they often overlook the atomic operations that underlie memory dynamics. In this survey, we first categorize memory representations into parametric and contextual forms, and then introduce six fundamental memory operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression. We map these operations to the most relevant research topics across long-term, long-context, parametric modification, and multi-source memory. By reframing memory systems through the lens of atomic operations and representation types, this survey provides a structured and dynamic perspective on research, benchmark datasets, and tools related to memory in AI, clarifying the functional interplay in LLM-based agents while outlining promising directions for future research. (The paper list, datasets, methods, and tools are available at https://github.com/Elvin-Yiming-Du/Survey_Memory_in_AI.)
In-depth Reading
1. Bibliographic Information
1.1. Title
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
1.2. Authors
The paper is authored by a collaborative team of researchers:
- Yiming Du and Kam-Fai Wong: The Chinese University of Hong Kong.
- Wenyu Huang, Danna Zheng, Zhaowei Wang, and Mirella Lapata: The University of Edinburgh. (Zhaowei Wang is also affiliated with HKUST).
- Sebastien Montella and Jeff Z. Pan: Poisson Lab, CSI, Huawei UK R&D Ltd. (Jeff Z. Pan is the corresponding author and also affiliated with the University of Edinburgh).
1.3. Journal/Conference
This paper is a comprehensive survey published as a preprint on arXiv (v2, May 2025). Given the prestigious affiliations (University of Edinburgh, HKUST, CUHK) and the rigorous methodology involving over 30,000 papers, it is likely intended for a top-tier venue such as the Journal of Artificial Intelligence Research (JAIR) or IEEE TPAMI.
1.4. Publication Year
2025 (Original submission May 2025).
1.5. Abstract
Memory is the backbone of Large Language Model (LLM)-based agents. While previous research focuses on specific applications like personalized chatbots, they often ignore the "atomic operations"—the basic building blocks—of memory. This survey introduces a unified framework by categorizing memory into Parametric and Contextual forms. It defines six fundamental operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression. By analyzing nearly 4,000 high-relevance papers using a novel Relative Citation Index (RCI), the authors map these operations to four key research topics: Long-term Memory, Long-context Memory, Parametric Modification, and Multi-source Memory.
1.6. Original Source Link
- PDF Link: https://arxiv.org/pdf/2505.00675v2.pdf
- GitHub Repository: https://github.com/Elvin-Yiming-Du/Survey_Memory_in_AI
2. Executive Summary
2.1. Background & Motivation
In the era of AI agents, "memory" is what allows a model to remember who you are, what you said ten minutes ago, and how to use external tools. However, the field is currently fragmented. Some researchers work on "long context" (making the prompt window bigger), while others work on "model editing" (changing facts inside the model).
The core problem is the lack of a unified architectural view. Most surveys focus on what memory does (e.g., personalization) rather than how the system handles information at a fundamental level. This paper aims to solve this by identifying atomic operations—the basic mathematical and procedural steps that govern how memory is created, stored, and used.
2.2. Main Contributions / Findings
- Unified Taxonomy: Categorizes memory based on representation (where and how it is stored) into Parametric (inside weights) and Contextual (external data).
- Six Atomic Operations: Establishes a standard vocabulary for memory dynamics: Consolidation, Indexing, Updating, Forgetting, Retrieval, and Compression.
- Large-Scale Analysis: Developed a pipeline that analyzed 30,000+ papers, filtering down to 3,923 high-relevance works using a GPT-based relevance score.
- Relative Citation Index (RCI): Introduced a time-normalized metric to identify influential papers fairly, regardless of whether they were published yesterday or three years ago.
- Gap Identification: Highlighted the "Retrieval-Generation Mismatch" (where models find the right info but fail to use it) and the lack of dynamic evaluation benchmarks.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand this paper, a beginner needs to grasp these concepts:
- Large Language Model (LLM): An AI model (like GPT-4) trained on massive text to predict the next word.
- Parametric Memory: This is the knowledge "baked into" the model's brain (its weights) during training. Think of it as a person's general knowledge.
- Contextual Memory (or Non-Parametric Memory): Information provided in the "prompt" or retrieved from a database. Think of this as an open book the AI can read while answering a question.
- Context Window: The maximum number of words (tokens) an LLM can process at once.
- Retrieval-Augmented Generation (RAG): A technique where the AI looks up information in an external database before generating an answer.
3.2. Previous Works
The authors cite several sub-field surveys:
- Long-context modeling: Focuses on extending the context window (e.g., LongNet).
- Knowledge Editing: Focuses on changing specific facts in a model (e.g., ROME, MEMIT).
- Personalization: Focuses on making AI behavior consistent for a specific user.
A crucial mechanism often mentioned is the Attention Mechanism. While the paper focuses on higher-level operations, the underlying math for Retrieval often relies on:
$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $
where $Q$ (Query) represents what you are looking for, $K$ (Keys) represents the "labels" of the stored information, $V$ (Values) is the actual information, and $d_k$ is the key dimension used for scaling. This is the foundational formula for how models "retrieve" context internally.
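As a concrete reference, here is a minimal NumPy sketch of scaled dot-product attention; the toy shapes and random inputs are purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each stored key
    weights = softmax(scores, axis=-1)   # normalized "retrieval" weights
    return weights @ V                   # weighted sum of stored values

# Toy example: 1 query against 3 stored key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(1, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (1, 4)
```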
3.3. Differentiation Analysis
Unlike previous surveys that categorize memory by application (e.g., "Medical AI Memory"), this paper categorizes by functional interplay. It asks: "Is the model consolidating this info into its weights, or just compressing it into a shorter prompt?" This operational view allows researchers to compare wildly different technologies (like database indexing and neural weights) using the same framework.
4. Methodology
4.1. Principles
The paper's core principle is that Memory is a Lifecycle. Information isn't just "stored"; it must be transformed from a short-term experience into a long-term structure, organized for search, updated when it changes, and forgotten when it becomes obsolete.
The following figure (Figure 1 from the original paper) illustrates the overall framework:
Figure 1 (schematic): the unified framework for memory in AI systems, covering taxonomy, operations, and applications. It lists the six memory operations (consolidation, updating, indexing, forgetting, retrieval, and compression) and maps them to the key research topics of long-term, long-context, parametric-modification, and multi-source memory.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Memory Taxonomy
The authors first divide memory into two types:
- Parametric Memory: Implicit knowledge in model weights. It is fast and "instant" but opaque and hard to update.
- Contextual Memory: Explicit external info.
- Unstructured: Raw text, images, or audio.
- Structured: Knowledge Graphs, tables, or ontologies (logic-based structures).
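To make the contextual distinction concrete, here is a minimal Python sketch; the class names and fields are our own illustration, not structures defined in the survey (parametric memory, by contrast, lives in the model weights themselves and has no explicit record):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class UnstructuredMemory:
    """Raw contextual memory: a text, image, or audio fragment plus its modality."""
    content: str            # e.g., a dialogue turn or a document chunk
    modality: str = "text"

@dataclass
class StructuredMemory:
    """Structured contextual memory: e.g., a knowledge-graph triple."""
    triple: Tuple[str, str, str]   # (subject, relation, object)

turn = UnstructuredMemory("User: my favorite color is blue")
fact = StructuredMemory(("user", "favorite_color", "blue"))
```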
4.2.2. Memory Management Operations
These operations govern how memory is maintained over time.
1. Consolidation
This is the process of turning short-term experiences into persistent memory. If an agent accumulates experiences $\mathcal{E}_{[t, t+\Delta_t]}$ between time $t$ and $t+\Delta_t$, consolidation integrates them:
$ \mathcal{M}_{t + \Delta_t} = \mathsf{Consolidate}(\mathcal{M}_t, \mathcal{E}_{[t, t + \Delta_t]}) $
- $\mathcal{M}_t$: Existing memory at time $t$.
- $\mathcal{E}_{[t, t+\Delta_t]}$: New experiences (dialogues, actions).
- $\mathcal{M}_{t+\Delta_t}$: Updated persistent memory.
2. Indexing
To find information, we need "access points" or auxiliary codes $\phi$. These might be vector embeddings or keywords:
$ \mathcal{T}_t = \mathrm{Index}(\mathcal{M}_t, \phi) $
- $\mathcal{T}_t$: The searchable index structure.
- $\phi$: The auxiliary codes (e.g., entity names or vector representations).
3. Updating
When information changes (e.g., a user changes their favorite color), the system must reactivate and modify memory using new knowledge $\mathcal{K}_{t+\Delta_t}$:
$ \mathcal{M}_{t + \Delta_t} = \mathrm{Update}(\mathcal{M}_t, \mathcal{K}_{t + \Delta_t}) $
4. Forgetting
Suppressing or erasing content $\mathcal{F}$ that is outdated or harmful:
$ \mathcal{M}_{t + \Delta_t} = \mathrm{Forget}(\mathcal{M}_t, \mathcal{F}) $
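The survey defines these operations abstractly. The following toy Python sketch shows one way the four management operations could act on a simple contextual memory store; the class, its fields, and the keyword-based index are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy contextual memory supporting the four management operations."""
    entries: dict = field(default_factory=dict)   # id -> memory text
    index: dict = field(default_factory=dict)     # keyword -> set of ids
    _next_id: int = 0

    def consolidate(self, experiences):
        """Consolidation: fold new experiences E into persistent memory M."""
        for text in experiences:
            self.entries[self._next_id] = text
            self._next_id += 1

    def build_index(self):
        """Indexing: derive access codes (here, simple keywords) for each entry."""
        self.index.clear()
        for eid, text in self.entries.items():
            for token in text.lower().split():
                self.index.setdefault(token, set()).add(eid)

    def update(self, keyword, new_text):
        """Updating: reactivate entries matching a cue and overwrite them."""
        for eid in self.index.get(keyword, set()):
            self.entries[eid] = new_text

    def forget(self, keyword):
        """Forgetting: suppress entries matching an obsolete or harmful cue."""
        for eid in self.index.pop(keyword, set()):
            self.entries.pop(eid, None)

m = MemoryStore()
m.consolidate(["user favorite color is blue", "user lives in Paris"])
m.build_index()
m.update("color", "user favorite color is green")
m.forget("paris")
print(m.entries)  # only the updated color entry remains
```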
4.2.3. Memory Utilization Operations
These operations describe how the model uses the memory during a task.
1. Retrieval
Finding the fragment $m_{\mathcal{Q}}$ in memory that is most similar to a query $\mathcal{Q}$:
$ \mathsf{Retrieve}(\mathcal{M}_t, \mathcal{Q}) = m_{\mathcal{Q}} \in \mathcal{M}_t \quad \mathrm{with} \quad \mathrm{sim}(\mathcal{Q}, m_{\mathcal{Q}}) \ge \tau $
- $\mathrm{sim}(\cdot, \cdot)$: A similarity function (e.g., cosine similarity).
- $\tau$: A threshold.
2. Compression
Reducing the size of memory by a ratio $\alpha$ so that it fits into a small context window:
$ \mathcal{M}_t^{comp} = \mathrm{Compress}(\mathcal{M}_t, \alpha) $
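A minimal sketch of the two utilization operations, using cosine similarity with a threshold $\tau$ and ratio-based truncation for compression; the toy embeddings and the recency-based truncation strategy are our own simplifying assumptions:

```python
import numpy as np

def retrieve(memory_texts, memory_vecs, query_vec, tau=0.3):
    """Retrieval: return fragments whose cosine similarity to the query exceeds tau."""
    sims = memory_vecs @ query_vec / (
        np.linalg.norm(memory_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [(memory_texts[i], float(sims[i])) for i in np.argsort(-sims) if sims[i] >= tau]

def compress(memory_texts, alpha=0.5):
    """Compression: keep roughly an alpha fraction of memory (here, the most recent entries)."""
    keep = max(1, int(len(memory_texts) * alpha))
    return memory_texts[-keep:]

# Toy random embeddings stand in for a real encoder (e.g., a sentence-embedding model).
texts = ["user likes tea", "meeting moved to 3pm", "user lives in Paris"]
vecs = np.random.default_rng(1).normal(size=(3, 8))
query = vecs[2] + 0.1  # pretend the query is close to the third memory
print(retrieve(texts, vecs, query))
print(compress(texts, alpha=0.34))
```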
The following table (Table 1 from the original paper) shows how these operations align with memory types:
| Operations | Parametric | Contextual (Structured) | Contextual (Unstructured) |
|---|---|---|---|
| Consolidation | Continual Learning, Personalization | Management, Personalization | Management, Personalization |
| Indexing | Utilization | Utilization, Management | Utilization, Management, Multi-modal |
| Updating | Knowledge Editing | Cross-Textual, Personalization | Cross-Textual, Personalization |
| Forgetting | Knowledge Unlearning | Management | Management |
| Retrieval | Utilization, Efficiency | Utilization, Contextual Utilization | Utilization, Contextual Utilization |
| Compression | Parametric Efficiency | Contextual Utilization | Contextual Utilization |
5. Experimental Setup
5.1. Datasets
The authors categorized existing memory datasets into several types based on their purpose:
- Long-term Memory: LoCoMo (20-30 turn dialogues), MemoryBank (user personality adaptation).
- Long-context: LongBench (evaluation for 32k+ token contexts), ∞Bench (100k+ tokens).
- Parametric Modification: CounterFact (changing facts like "The Eiffel Tower is in Rome"), TOFU (unlearning fictitious author facts).
5.2. Evaluation Metrics (The Survey Pipeline)
The authors didn't just list papers; they evaluated the research landscape using a mathematical model for citation impact.
1. Relative Citation Index (RCI)
Since older papers naturally have more citations, RCI normalizes the count based on the paper's age. The authors first modeled the relationship between citations $C_i$ and age $A_i$ using a log-log regression model:
$ \log(C_i + 1) = \beta + \alpha \log A_i + \epsilon_i $
- $C_i$: Number of citations for paper $i$.
- $A_i$: Age of paper $i$ (current date minus publication date).
- $\beta, \alpha$: Parameters learned from the 3,932-paper dataset.
- $\epsilon_i$: Error term.
From this, they calculate the Expected Citations $\hat{C}_i$: $ \hat{C}_i = \exp(\hat{\beta}) A_i^{\hat{\alpha}} $
Finally, the RCI is: $ RCI_i = \frac{C_i}{\hat{C}_i} $
- Interpretation: If $RCI_i > 1$, the paper is performing above the median for its age.
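A minimal NumPy sketch of this pipeline on toy data (the citation counts and ages below are made up; the survey fits its own coefficients on the filtered paper set):

```python
import numpy as np

def relative_citation_index(citations, ages):
    """Fit log(C+1) = beta + alpha*log(A) by least squares, then return RCI_i = C_i / C_hat_i."""
    citations, ages = np.asarray(citations, float), np.asarray(ages, float)
    X = np.column_stack([np.ones_like(ages), np.log(ages)])   # [1, log(A_i)]
    y = np.log(citations + 1)
    (beta, alpha), *_ = np.linalg.lstsq(X, y, rcond=None)
    expected = np.exp(beta) * ages ** alpha                   # C_hat_i = exp(beta) * A_i^alpha
    return citations / expected

# Toy data: (citations, age in years) for five hypothetical papers
cites = [5, 40, 120, 3, 400]
ages = [0.5, 1.0, 2.0, 3.0, 4.0]
print(np.round(relative_citation_index(cites, ages), 2))
```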
5.3. Baselines
The authors compared the AI memory lifecycle against Human Memory (Cognitive Science) to highlight gaps. For example, humans have a "Slow Consolidation" (sleep-based) while AI has "Fast Consolidation" (policy-driven).
6. Results & Analysis
6.1. Core Results Analysis
The survey revealed several critical trends across the four topics:
6.1.1. Long-term Memory Trends
The authors found a significant Retrieval-Generation Mismatch. As shown in Figure 4 (data analysis of SOTA models), models are very good at retrieving the right memory (Recall@5 > 90%) but very bad at generating the right answer based on that memory (F1 score drops by 30+ points).
6.1.2. Long-context Memory Trends
There is a fierce trade-off between Compression Rate and Performance.
- KV Cache Dropping (e.g., H2O): achieves high compression but risks losing information.
- KV Cache Quantization (e.g., KIVI): preserves more information but is still limited by the "quadratic nature" of memory growth.
The following figure (Figure 6) illustrates this trade-off:
Figure 6 (chart): performance scores of compression-based methods at different compression ratios. Scores shift as the compression ratio increases, with the baseline method consistently scoring higher than the other methods.
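To illustrate the dropping side of this trade-off, here is a toy sketch in the spirit of heavy-hitter KV-cache eviction; the scoring rule, budget, and recency window are simplified assumptions rather than H2O's exact algorithm:

```python
import numpy as np

def evict_kv_cache(attn_weights, keep_ratio=0.5, recent=4):
    """Toy heavy-hitter eviction: score each cached token by its accumulated
    attention mass and keep the top scorers plus the most recent tokens."""
    n_tokens = attn_weights.shape[1]
    scores = attn_weights.sum(axis=0)                 # accumulated attention per cached token
    budget = max(recent, int(n_tokens * keep_ratio))
    keep = set(range(n_tokens - recent, n_tokens))    # always keep a recency window
    for idx in np.argsort(-scores):                   # then fill up with heavy hitters
        if len(keep) >= budget:
            break
        keep.add(int(idx))
    return sorted(keep)                               # indices of KV entries to retain

# Toy attention matrix: 8 decoding steps attending over 16 cached tokens
attn = np.random.default_rng(2).random((8, 16))
print(evict_kv_cache(attn, keep_ratio=0.5, recent=4))
```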
6.1.3. Parametric Modification
The authors found that "Locating-then-Editing" (finding and modifying specific neurons) is the most popular method for small and medium-sized models, while "Prompt-based Editing" is the only scalable way for massive models like GPT-4.
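As a contrast to weight-level editing, the following trivially small sketch shows what prompt-based editing amounts to in practice; the template and wording are illustrative assumptions, not a method from the survey:

```python
def prompt_based_edit(question, edits):
    """Prompt-based 'editing': prepend new facts to the prompt instead of changing weights."""
    context = "\n".join(f"New fact: {fact}" for fact in edits)
    return f"{context}\nAnswer using the facts above.\nQ: {question}\nA:"

print(prompt_based_edit("Where is the Eiffel Tower?", ["The Eiffel Tower is in Rome."]))
```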
6.2. Data Presentation (SOTA Comparison)
The following are the results from Figure 10 of the original paper, comparing different editing/unlearning techniques:
Figure 10 (chart): scores of the SOTA solutions in each category on the CounterFact (editing), ZsRE (editing), and TOFU (unlearning) benchmarks. The left side compares scores on ZsRE and CounterFact; the right side shows TOFU results, including efficacy, generality, and specificity scores.
- Key Insight: Prompt-based methods (blue bars) currently achieve the highest success across benchmarks like ZsRE and CounterFact, but they do not actually change the model; they only "trick" it temporarily.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper provides a much-needed "map" for AI memory. By defining memory through atomic operations (Consolidation, Indexing, Updating, Forgetting, Retrieval, Compression), it allows us to see that an LLM agent is essentially a tiny operating system that needs to manage its "hard drive" (parametric weights) and its "RAM" (context window) efficiently.
7.2. Limitations & Future Work
The authors identify three major challenges:
- Unified Evaluation: We need benchmarks that test all operations (e.g., how well a model forgets vs. how well it learns).
- Conflict Resolution: When the model's internal "brain" says one thing but the external "book" says another, how does the agent decide who to trust?
- Efficiency vs. Expressivity: Can we have 1-million-token context windows that don't cost a fortune to run?
7.3. Personal Insights & Critique
This survey is exceptionally rigorous. The use of RCI to filter papers is a brilliant way to remove the "recency bias" or "celebrity bias" of certain papers.
Critique: While the taxonomy is strong, the "Multi-source Memory" section feels less developed than the others. The integration of audio/video memory into LLMs is still in its infancy, and the paper reflects this lack of mature SOTA solutions. Additionally, the "Retrieval-Generation Mismatch" discussed earlier is perhaps the most important finding: it suggests that simply giving AI more memory isn't enough; we need better "reasoning over memory" architectures.