
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

Published: 05/02/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This survey analyzes memory mechanisms in AI, introducing a unified taxonomy that categorizes memory as parametric and contextual, and defining six key operations. These operations inform research directions on long-term and multi-source memory, providing a structured, dynamic perspective on the field.

Abstract

Memory is a fundamental component of AI systems, underpinning large language models (LLMs)-based agents. While prior surveys have focused on memory applications with LLMs (e.g., enabling personalized memory in conversational agents), they often overlook the atomic operations that underlie memory dynamics. In this survey, we first categorize memory representations into parametric and contextual forms, and then introduce six fundamental memory operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression. We map these operations to the most relevant research topics across long-term, long-context, parametric modification, and multi-source memory. By reframing memory systems through the lens of atomic operations and representation types, this survey provides a structured and dynamic perspective on research, benchmark datasets, and tools related to memory in AI, clarifying the functional interplay in LLMs based agents while outlining promising directions for future research\footnote{The paper list, datasets, methods and tools are available at \href{https://github.com/Elvin-Yiming-Du/Survey_Memory_in_AI}{https://github.com/Elvin-Yiming-Du/Survey\_Memory\_in\_AI}.}.

Mind Map

In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

1.2. Authors

The paper is authored by a collaborative team of researchers:

  • Yiming Du and Kam-Fai Wong: The Chinese University of Hong Kong.
  • Wenyu Huang, Danna Zheng, Zhaowei Wang, and Mirella Lapata: The University of Edinburgh. (Zhaowei Wang is also affiliated with HKUST).
  • Sebastien Montella and Jeff Z. Pan: Poisson Lab, CSI, Huawei UK R&D Ltd. (Jeff Z. Pan is the corresponding author and also affiliated with the University of Edinburgh).

1.3. Journal/Conference

This paper is a comprehensive survey published as a preprint on arXiv (v2, May 2025). Given the prestigious affiliations (University of Edinburgh, HKUST, CUHK) and the rigorous methodology involving over 30,000 papers, it is likely intended for a top-tier venue such as the Journal of Artificial Intelligence Research (JAIR) or IEEE TPAMI.

1.4. Publication Year

2025 (Original submission May 2025).

1.5. Abstract

Memory is the backbone of Large Language Model (LLM)-based agents. While previous research focuses on specific applications like personalized chatbots, they often ignore the "atomic operations"—the basic building blocks—of memory. This survey introduces a unified framework by categorizing memory into Parametric and Contextual forms. It defines six fundamental operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression. By analyzing nearly 4,000 high-relevance papers using a novel Relative Citation Index (RCI), the authors map these operations to four key research topics: Long-term Memory, Long-context Memory, Parametric Modification, and Multi-source Memory.

2. Executive Summary

2.1. Background & Motivation

In the era of AI agents, "memory" is what allows a model to remember who you are, what you said ten minutes ago, and how to use external tools. However, the field is currently fragmented. Some researchers work on "long context" (making the prompt window bigger), while others work on "model editing" (changing facts inside the model).

The core problem is the lack of a unified architectural view. Most surveys focus on what memory does (e.g., personalization) rather than how the system handles information at a fundamental level. This paper aims to solve this by identifying atomic operations—the basic mathematical and procedural steps that govern how memory is created, stored, and used.

2.2. Main Contributions / Findings

  1. Unified Taxonomy: Categorizes memory based on representation (where and how it is stored) into Parametric (inside weights) and Contextual (external data).

  2. Six Atomic Operations: Establishes a standard vocabulary for memory dynamics: Consolidation, Indexing, Updating, Forgetting, Retrieval, and Compression.

  3. Large-Scale Analysis: Developed a pipeline that analyzed 30,000+ papers, filtering down to 3,923 high-relevance works using a GPT-based relevance score.

  4. Relative Citation Index (RCI): Introduced a time-normalized metric to identify influential papers fairly, regardless of whether they were published yesterday or three years ago.

  5. Gap Identification: Highlighted the "Retrieval-Generation Mismatch" (where models find the right info but fail to use it) and the lack of dynamic evaluation benchmarks.


3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a beginner needs to grasp these concepts:

  • Large Language Model (LLM): An AI model (like GPT-4) trained on massive text to predict the next word.
  • Parametric Memory: This is the knowledge "baked into" the model's brain (its weights) during training. Think of it as a person's general knowledge.
  • Contextual Memory (or Non-Parametric Memory): Information provided in the "prompt" or retrieved from a database. Think of this as an open book the AI can read while answering a question.
  • Context Window: The maximum number of words (tokens) an LLM can process at once.
  • Retrieval-Augmented Generation (RAG): A technique where the AI looks up information in an external database before generating an answer.

3.2. Previous Works

The authors cite several sub-field surveys:

  • Long-context modeling: Focuses on extending the context window (e.g., LongNet).

  • Knowledge Editing: Focuses on changing specific facts in a model (e.g., ROME, MEMIT).

  • Personalization: Focuses on making AI behavior consistent for a specific user.

    A crucial mechanism often mentioned is the Attention Mechanism. While the paper focuses on higher-level operations, the underlying math for Retrieval often relies on: $ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $ where $Q$ (Query) represents what you are looking for, $K$ (Keys) represents the "labels" of the stored information, and $V$ (Values) is the actual information. This is the foundational formula for how models "retrieve" context internally.
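As an illustration (not taken from the survey itself), the attention formula can be implemented directly in NumPy; the array shapes and random inputs below are arbitrary:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return w @ V                                       # weighted mix of stored values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 stored keys
V = rng.normal(size=(3, 4))   # 3 stored values
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Each output row is a convex combination of the value rows, weighted by how strongly the query matched each key.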

3.3. Differentiation Analysis

Unlike previous surveys that categorize memory by application (e.g., "Medical AI Memory"), this paper categorizes by functional interplay. It asks: "Is the model consolidating this info into its weights, or just compressing it into a shorter prompt?" This operational view allows researchers to compare wildly different technologies (like database indexing and neural weights) using the same framework.


4. Methodology

4.1. Principles

The paper's core principle is that Memory is a Lifecycle. Information isn't just "stored"; it must be transformed from a short-term experience into a long-term structure, organized for search, updated when it changes, and forgotten when it becomes obsolete.

The following figure (Figure 1 from the original paper) illustrates the overall framework:

Figure 1: A unified framework of memory Taxonomy, Operations, and Applications in AI systems. The figure is a schematic of this unified framework, listing the six memory operations (consolidation, updating, indexing, forgetting, retrieval, and compression) and mapping them to the key research topics of long-term, long-context, parametric-modification, and multi-source memory.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Memory Taxonomy

The authors first divide memory into two types:

  1. Parametric Memory: Implicit knowledge in model weights. It is fast and "instant" but opaque and hard to update.
  2. Contextual Memory: Explicit external info.
    • Unstructured: Raw text, images, or audio.
    • Structured: Knowledge Graphs, tables, or ontologies (logic-based structures).

4.2.2. Memory Management Operations

These operations govern how memory is maintained over time.

1. Consolidation This is the process of turning short-term experiences into persistent memory. If an agent has $m$ experiences $\mathcal{E}$ between time $t$ and $t + \Delta_t$, consolidation integrates them: $ \mathcal{M}_{t + \Delta_t} = \mathsf{Consolidate}(\mathcal{M}_t, \mathcal{E}_{[t, t + \Delta_t]}) $

  • $\mathcal{M}_t$: Existing memory at time $t$.
  • $\mathcal{E}_{[t, t + \Delta_t]}$: New experiences (dialogues, actions).
  • $\mathcal{M}_{t + \Delta_t}$: Updated persistent memory.

2. Indexing To find information, we need "access points" or codes $\phi$. This might be a vector embedding or a keyword: $ \mathcal{T}_t = \mathrm{Index}(\mathcal{M}_t, \phi) $

  • $\mathcal{T}_t$: The searchable index structure.
  • $\phi$: The auxiliary codes (e.g., entity names or vector representations).

3. Updating When information changes (e.g., a user changes their favorite color), the system must reactivate and modify memory using new knowledge $\mathcal{K}$: $ \mathcal{M}_{t + \Delta_t} = \mathrm{Update}(\mathcal{M}_t, \mathcal{K}_{t + \Delta_t}) $

4. Forgetting Suppressing or erasing content $\mathcal{F}$ that is outdated or harmful: $ \mathcal{M}_{t + \Delta_t} = \mathrm{Forget}(\mathcal{M}_t, \mathcal{F}) $
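The four management operations above can be sketched as a toy contextual memory store. All names and data structures here are illustrative assumptions, not the survey's implementation:

```python
class MemoryStore:
    """Toy contextual memory illustrating the four management operations."""

    def __init__(self):
        self.memory = {}   # M_t: fragment id -> text
        self.index = {}    # T_t: access code -> set of fragment ids

    def consolidate(self, experiences):
        """Consolidate: fold new experiences E_[t, t+Δt] into persistent memory."""
        for frag_id, text in experiences:
            self.memory[frag_id] = text

    def build_index(self, phi):
        """Index: derive access codes φ (here: keywords) for each fragment."""
        self.index.clear()
        for frag_id, text in self.memory.items():
            for code in phi(text):
                self.index.setdefault(code, set()).add(frag_id)

    def update(self, knowledge):
        """Update: overwrite fragments with new knowledge K_{t+Δt}."""
        self.memory.update(knowledge)

    def forget(self, to_forget):
        """Forget: erase outdated or harmful fragments F."""
        for frag_id in to_forget:
            self.memory.pop(frag_id, None)

store = MemoryStore()
store.consolidate([("u1", "favorite color is blue"), ("u2", "lives in Paris")])
store.update({"u1": "favorite color is green"})   # the preference changed
store.forget(["u2"])                              # fact no longer needed
store.build_index(lambda text: text.split())      # φ = keyword extraction
print(store.memory)  # {'u1': 'favorite color is green'}
```

Real systems replace the dictionary with vector stores or knowledge graphs, but the lifecycle (consolidate, index, update, forget) is the same.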

4.2.3. Memory Utilization Operations

These operations describe how the model uses the memory during a task.

1. Retrieval Finding a fragment $m_{\mathcal{Q}}$ that is most similar to a query $\mathcal{Q}$: $ \mathsf{Retrieve}(\mathcal{M}_t, \mathcal{Q}) = m_{\mathcal{Q}} \in \mathcal{M}_t \quad \mathrm{with} \quad \mathrm{sim}(\mathcal{Q}, m_{\mathcal{Q}}) \ge \tau $

  • $\mathrm{sim}$: A similarity function (like Cosine Similarity).
  • $\tau$: A threshold.

2. Compression Reducing the size of memory by a ratio $\alpha$ to fit into a small context window: $ \mathcal{M}_t^{comp} = \mathrm{Compress}(\mathcal{M}_t, \alpha) $
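These two utilization operations can likewise be sketched. The bag-of-words embedding, the `VOCAB` list, and the keep-the-longest compression heuristic are illustrative assumptions only:

```python
import numpy as np

VOCAB = ["color", "blue", "paris", "green"]

def embed(text):
    """Toy bag-of-words embedding over a fixed vocabulary."""
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def retrieve(memory, query_vec, tau=0.3):
    """Retrieve: fragments whose cosine similarity to the query is >= τ."""
    hits = []
    for frag in memory:
        v = embed(frag)
        sim = float(query_vec @ v /
                    (np.linalg.norm(query_vec) * np.linalg.norm(v) + 1e-9))
        if sim >= tau:
            hits.append((sim, frag))
    return [frag for _, frag in sorted(hits, reverse=True)]  # best first

def compress(memory, alpha=0.5):
    """Compress: keep roughly an α fraction of fragments (here: the longest)."""
    keep = max(1, int(len(memory) * alpha))
    return sorted(memory, key=len, reverse=True)[:keep]

memory = ["favorite color is green", "lives in Paris"]
q = embed("what color")
print(retrieve(memory, q, tau=0.1))   # ['favorite color is green']
print(compress(memory, alpha=0.5))    # ['favorite color is green']
```

Production systems use learned dense embeddings for `sim` and summarization or token pruning for `Compress`, but the interfaces mirror the two formulas above.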

The following table (Table 1 from the original paper) shows how these operations align with memory types:

| Operations | Parametric | Contextual (Structured) | Contextual (Unstructured) |
| --- | --- | --- | --- |
| Consolidation | Continual Learning, Personalization | Management, Personalization | Management, Personalization |
| Indexing | Utilization | Utilization, Management | Utilization, Management, Multi-modal |
| Updating | Knowledge Editing | Cross-Textual, Personalization | Cross-Textual, Personalization |
| Forgetting | Knowledge Unlearning | Management | Management |
| Retrieval | Utilization, Efficiency | Utilization, Contextual Utilization | Utilization, Contextual Utilization |
| Compression | Parametric Efficiency | Contextual Utilization | Contextual Utilization |

5. Experimental Setup

5.1. Datasets

The authors categorized existing memory datasets into several types based on their purpose:

  • Long-term Memory: LoCoMo (20-30 turn dialogues), MemoryBank (user personality adaptation).
  • Long-context: LongBench (evaluation for 32k+ token contexts), ∞Bench (100k+ tokens).
  • Parametric Modification: CounterFact (changing facts like "The Eiffel Tower is in Rome"), TOFU (unlearning fictitious author facts).

5.2. Evaluation Metrics (The Survey Pipeline)

The authors didn't just list papers; they evaluated the research landscape using a mathematical model for citation impact.

1. Relative Citation Index (RCI) Since older papers naturally have more citations, RCI normalizes the count based on the paper's age. The authors first modeled the relationship between citations ($C_i$) and age ($A_i$) using a log-log regression model: $ \log(C_i + 1) = \beta + \alpha \log A_i + \epsilon_i $

  • $C_i$: Number of citations for paper $i$.

  • $A_i$: Age of paper (current date minus publication date).

  • $\beta, \alpha$: Parameters learned from the 3,932-paper dataset.

  • $\epsilon_i$: Error term.

    From this, they calculate the Expected Citations $\hat{C}_i$: $ \hat{C}_i = \exp(\hat{\beta}) A_i^{\hat{\alpha}} $

Finally, the RCI is: $ RCI_i = \frac{C_i}{\hat{C}_i} $

  • Interpretation: If $RCI_i \ge 1$, the paper is performing above the median for its age.
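The whole RCI computation fits in a few lines of NumPy. The citation counts and ages below are synthetic, purely to show the mechanics:

```python
import numpy as np

# Toy corpus: citation counts C_i and ages A_i in years (made-up numbers).
C = np.array([120.0, 15.0, 4.0, 300.0, 40.0])
A = np.array([3.0, 1.0, 0.5, 5.0, 2.0])

# Fit log(C_i + 1) = beta + alpha * log(A_i) by least squares.
X = np.column_stack([np.ones_like(A), np.log(A)])
beta, alpha = np.linalg.lstsq(X, np.log(C + 1), rcond=None)[0]

# Expected citations for each paper's age, then the ratio.
C_hat = np.exp(beta) * A ** alpha
RCI = C / C_hat

print(np.round(RCI, 2))  # values >= 1 indicate above-typical impact for that age
```

Because the fit is done in log space, a paper's RCI compares it against the typical (rather than mean) citation trajectory for its age, which is robust to a few blockbuster outliers.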

5.3. Baselines

The authors compared the AI memory lifecycle against Human Memory (Cognitive Science) to highlight gaps. For example, humans have a "Slow Consolidation" (sleep-based) while AI has "Fast Consolidation" (policy-driven).


6. Results & Analysis

6.1. Core Results Analysis

The survey revealed several critical trends across the four topics:

The authors found a significant Retrieval-Generation Mismatch. As shown in Figure 4 (data analysis of SOTA models), models are very good at retrieving the right memory (Recall@5 > 90%) but very bad at generating the right answer based on that memory (F1 score drops by 30+ points).

There is a fierce trade-off between Compression Rate and Performance.

  • KV Cache Dropping: (e.g., H2O) achieves high compression but risks losing information.

  • KV Cache Quantization: (e.g., KIVI) preserves more info but is limited by the "quadratic nature" of memory growth.

    The following figure (Figure 6) illustrates this trade-off:

    Figure 6: Compression based method performance with respect to compression rate on LongBench (Bai et al., 2024). Data borrowed from Yuan et al. (2024). The image is a chart showing the performance score of compression-based methods at different compression ratios; the scores of the different methods change as the compression ratio increases, and the baseline method's score stays above the others throughout.
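For intuition, a heavy-hitter-style KV-cache dropping policy (in the spirit of H2O, though not its actual code) can be sketched as: keep only the cached positions that have accumulated the most attention mass.

```python
import numpy as np

def evict_kv(keys, values, attn_history, budget):
    """H2O-like sketch: retain the `budget` cached positions with the
    highest cumulative attention mass; evict the rest to save memory."""
    scores = attn_history.sum(axis=0)             # total attention each position got
    keep = np.sort(np.argsort(scores)[-budget:])  # top positions, original order
    return keys[keep], values[keep]

rng = np.random.default_rng(1)
T, d = 8, 4
K = rng.normal(size=(T, d))                  # cached keys for 8 positions
V = rng.normal(size=(T, d))                  # cached values
attn = rng.dirichlet(np.ones(T), size=16)    # 16 past attention distributions
K2, V2 = evict_kv(K, V, attn, budget=4)
print(K2.shape)  # (4, 4): cache halved
```

This is exactly the trade-off the figure plots: a smaller `budget` means more compression but a higher chance that a later query needed one of the evicted positions.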

6.1.3. Parametric Modification

The authors found that "Locating-then-Editing" (finding specific neurons) is the most popular method for small/medium models ($\le 20$B parameters), while "Prompt-based Editing" is the only scalable way for massive models like GPT-4.

6.2. Data Presentation (SOTA Comparison)

The following are the results from Figure 10 of the original paper, comparing different editing/unlearning techniques:

Figure 10: SOTA solutions across different categories on the CounterFact (editing), ZsRE (editing) and TOFU (unlearning) benchmark. The image is a chart showing the scores of SOTA solutions in each category on these benchmarks; the left side compares scores on ZsRE and CounterFact, while the right side shows TOFU results, including efficacy, generality, and specificity scores.

  • Key Insight: Prompt-based methods (blue bars) currently achieve the highest success across benchmarks like ZsRE and CounterFact, but they don't actually change the model; they just "trick" it temporarily.


7. Conclusion & Reflections

7.1. Conclusion Summary

This paper provides a much-needed "map" for AI memory. By defining memory through atomic operations (Consolidation, Indexing, Updating, Forgetting, Retrieval, Compression), it allows us to see that an LLM agent is essentially a tiny operating system that needs to manage its "hard drive" (parametric weights) and its "RAM" (context window) efficiently.

7.2. Limitations & Future Work

The authors identify three major challenges:

  1. Unified Evaluation: We need benchmarks that test all operations (e.g., how well a model forgets vs. how well it learns).
  2. Conflict Resolution: When the model's internal "brain" says one thing but the external "book" says another, how does the agent decide who to trust?
  3. Efficiency vs. Expressivity: Can we have 1-million-token context windows that don't cost a fortune to run?

7.3. Personal Insights & Critique

This survey is exceptionally rigorous. The use of RCI to filter papers is a brilliant way to remove the "recency bias" or "celebrity bias" of certain papers.

Critique: While the taxonomy is strong, the "Multi-source Memory" section feels less developed than the others. The integration of audio/video memory into LLMs is still in its infancy, and the paper reflects this lack of mature SOTA solutions. Additionally, the "Retrieval-Generation Mismatch" mentioned in Section 3.1.4 is perhaps the most important finding—it suggests that simply giving AI more memory isn't enough; we need better "reasoning over memory" architectures.
