- Title: Multi-Interest Recommendation: A Survey
- Authors:
- LEI ZHAO (Wuhan University, China)
- QIANG CHEN (Tencent WeChat, China)
- LIXIN ZOU (Wuhan University, China)
- AIXIN SUN (Nanyang Technological University, Singapore)
- CHENLIANG LI (Wuhan University, China; Corresponding Author)
- Journal/Conference: This paper is a preprint available on arXiv. It has not yet been published in a peer-reviewed journal or conference. arXiv is a widely respected platform for researchers to share their work pre-publication.
- Publication Year: Submitted to arXiv on June 23, 2024.
- Abstract: The paper addresses the challenge of modeling users' complex and evolving preferences in recommender systems. It argues that traditional methods often fail because of diverse user behaviors and ambiguous item attributes. Multi-interest recommendation is presented as a solution that extracts multiple interest representations from a user's history, allowing for finer-grained modeling. The survey systematically reviews the field by answering three key questions: (1) Why is multi-interest modeling important? (2) What aspects does it focus on? and (3) How is it implemented? The paper aims to provide a foundational framework and overview for researchers, and it includes a link to a GitHub repository summarizing implementations.
- Original Source Link:
2. Executive Summary
This survey builds upon fundamental concepts in recommender systems and deep learning.
The evolution of the field is shown in Image 5, which tracks the rapid growth in publications on this topic since 2019, with models such as MIND and ComiRec being highly cited.
Figure caption: a chart with three subplots, showing the cumulative growth and yearly distribution of publications in multi-interest recommendation (left), the citation counts and temporal distribution of important multi-interest methods (middle), and a word cloud of key terms in the field (right).
4. Methodology (Core Technology & Implementation)
The paper structures the methodology of multi-interest recommendation into a coherent pipeline, as illustrated in Image 10. The process begins with a user's interaction history, from which a Multi-Interest Extractor derives multiple interest vectors. These vectors are then combined by a Multi-Interest Aggregator to generate a final recommendation.
4.1. Definition of Multi-Interest Recommendation
The paper first formalizes standard recommendation tasks and then extends them to the multi-interest setting.
- Click-Through Rate (CTR) Prediction: Given a user $u$ and an item $i$, the goal is to predict the probability of a click. The model $f_\theta$ learns this probability:
  $$P(i \mid u, R) = f_\theta(u, i, R)$$
  where $R$ is the matrix of historical user-item interactions.
- Sequential Recommendation: Given a user's chronological interaction sequence $S_u = \{i_1, i_2, \dots, i_t\}$, the goal is to predict the next item $i_{t+1}$:
  $$P(i_{t+1} \mid i_1, i_2, \dots, i_t) = f_\theta(i_1, i_2, \dots, i_t)$$
- Multi-Interest Recommendation: Traditional models learn a single user representation $h_u \in \mathbb{R}^{1 \times d}$ and an item representation $x_i \in \mathbb{R}^{1 \times d}$, with the prediction score being their inner product: $\hat{y}_{ui} = h_u x_i^\top$. Multi-interest recommendation extends this by learning a set of $K$ interest vectors for the user, $H_u = [h_u^1, h_u^2, \dots, h_u^K]$. A crucial design choice is how to use these multiple vectors to compute a final score. The paper identifies two main paradigms:
- Representation Aggregation: The multiple interest vectors $H_u$ are first fused into a single, context-aware user vector using an aggregation function $\phi_u(\cdot)$. This final vector is then used for prediction:
  $$\hat{y}_{ui} = \phi_u(H_u)\,\phi_i(X_i)^\top$$
  (Note: the paper also allows for multi-aspect item representations $X_i$, aggregated by $\phi_i(\cdot)$.)
- Recommendation Aggregation: A separate prediction score is calculated for each interest vector, and these individual scores are then combined to produce the final recommendation. This assumes the user will be interested if the item matches any of their interests:
  $$\hat{y}_{ui} = \phi(H_u X_i^\top)$$
  Here, $\phi(\cdot)$ could be a max or mean operation over the scores from each of the $K$ interests (see the sketch below).
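To make the two paradigms concrete, here is a minimal PyTorch sketch of both scoring schemes. The tensor shapes and the item-keyed attention used for fusion are illustrative assumptions, not the survey's reference implementation:

```python
import torch
import torch.nn.functional as F

K, d = 4, 64                      # number of interests, embedding size
H_u = torch.randn(K, d)           # user's K interest vectors
x_i = torch.randn(d)              # candidate item embedding

# Paradigm 1: Representation Aggregation.
# Fuse the K interests into one vector first (here: attention over
# interests keyed by the candidate item), then take an inner product.
attn = F.softmax(H_u @ x_i, dim=0)        # (K,) weight per interest
h_u = attn @ H_u                          # (d,) fused user vector
score_repr = h_u @ x_i

# Paradigm 2: Recommendation Aggregation.
# Score the item against every interest, then combine the K scores
# (max assumes a match with *any* interest suffices; mean also works).
scores = H_u @ x_i                        # (K,) one score per interest
score_rec_max = scores.max()
score_rec_mean = scores.mean()

print(score_repr.item(), score_rec_max.item(), score_rec_mean.item())
```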
4.2. Multi-Interest Extractor
This module is responsible for generating the set of user interest vectors $H_u$ from the user's historical item interactions $\{x_1, \dots, x_t\}$.
- Dynamic Routing: This method, pioneered by the MIND paper, adapts the dynamic routing algorithm from Capsule Networks.
  - Intuition: It iteratively clusters historical item embeddings into $K$ "capsules," where each capsule represents one user interest.
  - Process (Algorithm 1): The algorithm refines the assignment of items to interests over several iterations. At each step, it calculates coupling coefficients $c_{ij}$ (how much item $i$ belongs to interest $j$) via a softmax over logits $b_{ij}$. The interest vector $h_j$ is a weighted sum of item vectors, which is then passed through a non-linear squash function. The logits are updated based on the agreement between the current interest vector and the item vectors.
  - Squash Function: This function ensures that vectors representing strong interests have a magnitude close to 1, while weak interests are "squashed" towards 0:
    $$h_j = \mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$
    where $s_j$ is the weighted sum of item vectors for interest $j$.
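The routing loop reduces to a few lines of code. Below is a simplified PyTorch sketch of the behavior-to-interest routing described above, with the squash non-linearity; production variants such as MIND add shared transformation matrices and label-aware attention, which are omitted here:

```python
import torch

def squash(s: torch.Tensor) -> torch.Tensor:
    """Shrink weak interest capsules toward 0, keep strong ones near unit norm."""
    norm_sq = (s ** 2).sum(dim=-1, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (norm_sq.sqrt() + 1e-9)

def dynamic_routing(X: torch.Tensor, K: int = 4, n_iters: int = 3) -> torch.Tensor:
    """X: (t, d) historical item embeddings -> (K, d) interest capsules."""
    h = None
    b = torch.zeros(K, X.shape[0])        # routing logits b_ij
    for _ in range(n_iters):
        c = torch.softmax(b, dim=0)       # coupling coefficients c_ij (over interests)
        s = c @ X                         # (K, d) weighted sum per interest
        h = squash(s)                     # (K, d) squashed interest vectors
        b = b + h @ X.T                   # update logits by agreement h_j . x_i
    return h

interests = dynamic_routing(torch.randn(20, 64))
print(interests.shape)                    # torch.Size([4, 64])
```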
- Attention-Aware Extractor: This is a more common and flexible approach.
  - Intuition: It uses learnable "interest seeds" or queries ($e_j$) to attend to different parts of the user's interaction history. Each seed learns to focus on items related to a specific interest.
  - Process: For each interest $j$, an attention weight $w_{ij}$ is calculated for each historical item $i$. The interest vector $h_j$ is then the weighted sum of all item vectors:
    $$h_j = \sum_{i=1}^{t} w_{ij} x_i$$
    The weights $w_{ij}$ are typically computed with a softmax, where the score is based on the similarity between the interest seed $e_j$ and the transformed item embedding $x_i$:
    $$w_{ij} = \mathrm{softmax}\left(e_j \, \sigma(W_j x_i + b_j)^\top\right)$$
    where $W_j$ and $b_j$ are learnable parameters, $\sigma(\cdot)$ is an activation function, and a temperature parameter $\tau$ can be used to control the sharpness of the attention distribution.
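A minimal PyTorch sketch of this extractor follows; as a simplifying assumption it shares one projection $W, b$ across all interests instead of the per-interest $W_j, b_j$ in the formula, and uses $\tanh$ as the activation:

```python
import torch
import torch.nn as nn

class AttentionExtractor(nn.Module):
    """Extract K interest vectors via learnable interest seeds e_j."""
    def __init__(self, d: int, K: int, tau: float = 1.0):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(K, d))  # interest seeds e_j
        self.proj = nn.Linear(d, d)                   # shared W x_i + b
        self.tau = tau                                # softmax temperature

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (t, d) item embeddings of one user's history
        keys = torch.tanh(self.proj(X))               # sigma(W x_i + b), (t, d)
        logits = self.seeds @ keys.T / self.tau       # (K, t) seed-item scores
        W = torch.softmax(logits, dim=-1)             # w_ij, normalized over items
        return W @ X                                  # (K, d) interest vectors

extractor = AttentionExtractor(d=64, K=4)
H_u = extractor(torch.randn(20, 64))
print(H_u.shape)                                      # torch.Size([4, 64])
```

A lower temperature $\tau$ sharpens each seed's attention so that interests specialize on smaller subsets of the history; a higher $\tau$ smooths the weights toward a mean pooling of all items.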
4.3. Multi-Interest Aggregator
Once multiple interest vectors are extracted, they need to be combined to make a final prediction. This corresponds to the two paradigms defined earlier.
Table 3: Classification of representative methods from two perspectives: Interest Extractor and Interest Aggregator. (Manual Transcription)
| Interest Extractor | Interest Aggregator | Aggregation Method | Representative Methods |
| --- | --- | --- | --- |
| Dynamic Routing | Representation Aggregation | Attention | MIND [95], M2GNN [68], MDSR [28] |
| Dynamic Routing | Representation Aggregation | Concat or Mean/Max Pooling | M2GNN [68], MISD [100] |
| Dynamic Routing | Recommendation Aggregation | Mean/Max Pooling | MINER [99], ComiRec [21], REMI [212], MGNM [179], UMI [22] |
| Attention | Representation Aggregation | Attention | MINER [99], DisMIR [46] |
| Attention | Representation Aggregation | Concat or Mean/Max Pooling | PENR [189], M2GNN [68] |
| Attention | Recommendation Aggregation | Mean/Max Pooling | MINER [99], TimiRec [185], PIMI [25], ComiRec [21], CMI [94] |
| Attention | Recommendation Aggregation | Attention | MINER [99], MI-GNN [194], PIPM [143], M2GNN [68] |
| Attention | Recommendation Aggregation | Interest Selector with RL | REMIT [169] |
| Iterative Attention | Recommendation Aggregation | Interest Selector with RL | MIMCR [235] |
| Non-linear Transformation | Representation Aggregation | Concat | CKML [134] |
4.4. Multi-Interest Representation Regularization
A key challenge in multi-interest learning is representation collapse, where all extracted interest vectors become very similar to each other, defeating the purpose of multi-interest modeling. To prevent this, a regularization term is often added to the loss function to encourage diversity among the interest vectors.
Image 2 provides a high-level overview of these techniques, while Image 3 visually explains the intuition.

Figure caption: a schematic of the two main categories of divergence regularization in multi-interest recommendation and their subdivisions: representation regularization (e.g., cosine similarity and contrastive learning) and distribution regularization (e.g., covariance regularization and element-wise regularization).
- Representation Regularization (Image 3a): This operates directly on the interest vectors $h_j$ to make them distinct.
  - Cosine Similarity: A penalty term minimizes the cosine similarity between every pair of interest vectors, pushing them apart in the embedding space:
    $$\mathcal{L}_{reg} = \frac{1}{K^2} \sum_{i=1}^{K} \sum_{j=1}^{K} \frac{h_i \cdot h_j}{\|h_i\| \, \|h_j\|}$$
  - Contrastive Learning: Using a framework like InfoNCE, each interest vector is pulled closer to a "positive" version of itself (e.g., an augmented view) and pushed away from "negative" samples (the other interest vectors).
- Distribution Regularization (Image 3b): This operates on the attention distributions or routing coefficients that generate the interest vectors, ensuring that different interests are formed from different subsets of items.
  - Covariance Regularization: This encourages the routing coefficient vectors for different interests to be orthogonal. It penalizes the off-diagonal elements of the covariance matrix of the routing matrix $C \in \mathbb{R}^{t \times K}$:
    $$\mathcal{L}_{reg} = \left\| \mathrm{Cov}(C, C) - \mathrm{diag}\big(\mathrm{Cov}(C, C)\big) \right\|_F^2$$
    (Note: the paper's formula is slightly different, but the goal is the same: minimize the covariance between interest distributions.)
  - Element-Wise Regularization: A simpler method that penalizes the element-wise (Hadamard) product of the attention weight matrices for different interests. A sketch of both regularizer families follows this list.
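Both regularizer families reduce to a few lines of code. Below is a hedged PyTorch sketch of the cosine-similarity and covariance penalties described above; the exact normalization (e.g., whether the constant diagonal terms are included) varies across papers:

```python
import torch

def cosine_diversity_loss(H: torch.Tensor) -> torch.Tensor:
    """Representation regularization: mean pairwise cosine similarity over
    the K interest vectors H (K, d). Includes the diagonal of ones, as in
    the formula above; lower values mean more diverse interests."""
    Hn = H / H.norm(dim=-1, keepdim=True)
    return (Hn @ Hn.T).mean()

def covariance_loss(C: torch.Tensor) -> torch.Tensor:
    """Distribution regularization: squared Frobenius norm of the
    off-diagonal covariance between the K routing columns of C (t, K)."""
    Cc = C - C.mean(dim=0, keepdim=True)  # center each interest column
    cov = Cc.T @ Cc / C.shape[0]          # (K, K) covariance matrix
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

H = torch.randn(4, 64)                          # interest vectors
C = torch.softmax(torch.randn(20, 4), dim=1)    # routing/attention matrix
print(cosine_diversity_loss(H).item(), covariance_loss(C).item())
```

In training, either penalty is added to the recommendation loss with a weighting hyperparameter, e.g. $\mathcal{L} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{reg}$.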
5. Applications and Public Datasets
Multi-interest modeling has been successfully applied across a wide range of recommendation scenarios, as shown in Image 4.
Figure caption: application scenarios of multi-interest recommendation, with examples from six domains: movie, news, course, product, travel, and nutrition recommendation.
The paper also provides a list of common public datasets used for evaluating these models.
Table 4: Public real-world datasets on different application scenarios. (Manual Transcription)
| Applications | Public Datasets |
| --- | --- |
| News | MIND |
| Movies & Micro Videos | MovieLens, KuaiShou, ReDial, TG-ReDial |
| Online Travel and Check-In | FourSquare, Fliggy, Yelp, Gowalla |
| Online Shopping | Amazon, Taobao, RetailRocket |
| Online Education | MOOCCube |
- Movie and Micro Video Recommendation: Models extract interests from genres, user reviews, or other metadata. KuaiShou and MovieLens are popular datasets.
- News Recommendation: Interests can be short-term (breaking news) or long-term (topical preferences). The MIND dataset from Microsoft News is a standard benchmark.
- Life Services and Travel Recommendation: Spatial and temporal contexts are crucial. User interests can change based on location (e.g., business vs. vacation) and time (e.g., season). Datasets like Yelp and Gowalla are used.
- Online Shopping Recommendation: User behavior is complex (e.g., clicks, add-to-cart, purchases). Models like MIND were first verified on large-scale e-commerce data from Taobao. The Amazon review dataset is also widely used.
- Online Course Recommendation: Helps learners find relevant courses on platforms like Coursera. The MOOCCube dataset is used for this purpose.
6. Challenges and Future Directions
The survey concludes by identifying key open challenges and promising directions for future research.
- Adaptive Multi-Interest Extraction: Most models use a fixed, pre-defined number of interests (K). A major challenge is to dynamically and adaptively determine the optimal number of interests for each user, as some users may have more diverse preferences than others.
- Efficiency in Multi-Interest Modeling: Multi-interest models are computationally more expensive than single-interest models due to the extraction and aggregation of multiple vectors. Iterative methods like dynamic routing can be particularly slow. Future work should focus on more efficient architectures, especially for real-world, large-scale deployment.
- Multi-Interest Extraction for Denoising: User interaction history is often noisy. Multi-interest extraction can be viewed as a denoising mechanism, as the attention or routing process can learn to ignore irrelevant or noisy items. This potential has not been fully explored.
- Explainability: Multi-interest models have great potential for explainability. By inspecting which items contribute to each interest vector, we can provide explanations like, "We recommend this movie because you have shown an interest in '90s action films'." The challenge is to align the learned latent interests with human-understandable concepts or aspects.
- Alleviating Long-Tail and Cold-Start Problems: Multi-interest modeling can help recommend less popular (long-tail) items by matching them to a user's niche interests. For new (cold-start) users, interests could potentially be transferred from other domains or inferred from limited interactions.
- Frontier Methodologies: The paper suggests exploring more advanced techniques like Reinforcement Learning (RL) for dynamic interest modeling and Large Language Models (LLMs) for leveraging rich textual information (e.g., reviews, item descriptions) to better understand and represent user interests.
7. Conclusion & Reflections
- Conclusion Summary: "Multi-Interest Recommendation: A Survey" provides a much-needed, structured overview of a rapidly growing and critical subfield of recommender systems. It successfully demystifies the area by establishing a clear taxonomy, detailing the core technical components, summarizing key applications, and outlining future challenges. Its focus on the "why," "what," and "how" makes it an excellent starting point for anyone looking to enter the field.
- Personal Insights & Critique:
- Strength: The paper's greatest strength is its clear and systematic organization. The breakdown into "Extractor," "Aggregator," and "Regularization" provides a powerful mental model for understanding and comparing different multi-interest approaches. The inclusion of application scenarios and public datasets is also highly practical.
- Clarity: The explanations are generally clear and well-supported by diagrams and formulas, making complex topics accessible to readers with a basic understanding of recommender systems.
- Limitation/Opportunity: While the survey is comprehensive, the field is evolving at an explosive pace, particularly with the advent of LLMs. The section on "Frontier Methodology" could be expanded. Future work could explore in more detail how pre-trained LLMs can serve as universal interest extractors, potentially revolutionizing the field in ways that go beyond the current GNN/attention-based paradigms.
- Overall Impact: This survey is a timely and valuable contribution. It successfully consolidates a fragmented body of work into a coherent narrative, establishing a common language and framework that will undoubtedly guide and accelerate future research in multi-interest recommendation.