- Title: Multi-Interest Recommendation: A Survey
- Authors:
- LEI ZHAO (Wuhan University, China)
- QIANG CHEN (Tencent WeChat, China)
- LIXIN ZOU (Wuhan University, China)
- AIXIN SUN (Nanyang Technological University, Singapore)
- CHENLIANG LI (Wuhan University, China; Corresponding Author)
- Journal/Conference: This paper is a preprint available on arXiv. It has not yet been published in a peer-reviewed journal or conference. arXiv is a widely respected platform for researchers to share their work pre-publication.
- Publication Year: Submitted to arXiv on June 23, 2024.
- Abstract: The paper addresses the challenge of modeling users' complex and evolving preferences in recommender systems. It argues that traditional methods often fail because of diverse user behaviors and ambiguous item attributes. Multi-interest recommendation is presented as a solution that extracts multiple interest representations from a user's history, allowing for finer-grained modeling. The survey systematically reviews the field by answering three key questions: (1) Why is multi-interest modeling important? (2) What aspects does it focus on? and (3) How is it implemented? The paper aims to provide a foundational framework and overview for researchers, and it includes a link to a GitHub repository summarizing implementations.
- Original Source Link:
2. Executive Summary
This survey builds upon fundamental concepts in recommender systems and deep learning.
The evolution of the field is shown in Image 5, which tracks the rapid growth in publications on this topic since 2019, with models such as MIND and ComiRec being highly cited.
Figure caption: a chart with three subplots, showing the cumulative growth and yearly distribution of publications in multi-interest recommendation (left), the citation counts and temporal distribution of important multi-interest methods (middle), and a word cloud of key terms in the field (right).
4. Methodology (Core Technology & Implementation)
The paper structures the methodology of multi-interest recommendation into a coherent pipeline, as illustrated in Image 10. The process begins with a user's interaction history, from which a Multi-Interest Extractor derives multiple interest vectors. These vectors are then combined by a Multi-Interest Aggregator to generate a final recommendation.
4.1. Definition of Multi-Interest Recommendation
The paper first formalizes standard recommendation tasks and then extends them to the multi-interest setting.
- Click-Through Rate (CTR) Prediction: Given a user $u$ and an item $i$, the goal is to predict the probability of a click. The model $f_\theta$ learns this probability:
  $$P(i \mid u, R) = f_\theta(u, i, R)$$
  where $R$ is the matrix of historical user-item interactions.
- Sequential Recommendation: Given a user's chronological interaction sequence $S_u = \{i_1, i_2, \dots, i_t\}$, the goal is to predict the next item $i_{t+1}$:
  $$P(i_{t+1} \mid i_1, i_2, \dots, i_t) = f_\theta(i_1, i_2, \dots, i_t)$$
- Multi-Interest Recommendation: Traditional models learn a single user representation $h_u \in \mathbb{R}^{1 \times d}$ and an item representation $x_i \in \mathbb{R}^{1 \times d}$, with the prediction score being their inner product: $\hat{y}_{ui} = h_u x_i^\top$. Multi-interest recommendation extends this by learning a set of $K$ interest vectors for the user, $H_u = [h_u^1, h_u^2, \dots, h_u^K]$. A crucial design choice is how to use these multiple vectors to compute a final score. The paper identifies two main paradigms:
- Representation Aggregation: The multiple interest vectors $H_u$ are first fused into a single, context-aware user vector using an aggregation function $\phi_u(\cdot)$. This final vector is then used for prediction:
  $$\hat{y}_{ui} = \phi_u(H_u)\,\phi_i(X_i)^\top$$
  (Note: the paper also allows for multi-aspect item representations $X_i$, aggregated by $\phi_i(\cdot)$.)
- Recommendation Aggregation: A separate prediction score is calculated for each interest vector, and these individual scores are then combined to produce the final recommendation. This assumes the user will be interested if the item matches any of their interests:
  $$\hat{y}_{ui} = \phi(H_u X_i^\top)$$
  Here, $\phi(\cdot)$ could be a max or mean operation over the scores from each of the $K$ interests (see the sketch below).
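To make the two paradigms concrete, here is a minimal PyTorch sketch of both scoring schemes. The tensor shapes and the item-keyed attention used for fusion are illustrative assumptions, not the survey's reference implementation:

```python
import torch
import torch.nn.functional as F

K, d = 4, 64                      # number of interests, embedding size
H_u = torch.randn(K, d)           # user's K interest vectors
x_i = torch.randn(d)              # candidate item embedding

# Paradigm 1: Representation Aggregation.
# Fuse the K interests into one vector first (here: attention over
# interests keyed by the candidate item), then take an inner product.
attn = F.softmax(H_u @ x_i, dim=0)        # (K,) weight per interest
h_u = attn @ H_u                          # (d,) fused user vector
score_repr = h_u @ x_i

# Paradigm 2: Recommendation Aggregation.
# Score the item against every interest, then combine the K scores
# (max assumes a match with *any* interest suffices; mean also works).
scores = H_u @ x_i                        # (K,) one score per interest
score_rec_max = scores.max()
score_rec_mean = scores.mean()

print(score_repr.item(), score_rec_max.item(), score_rec_mean.item())
```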
4.2. Multi-Interest Extractor
This module is responsible for generating the set of user interest vectors $H_u$ from the user's historical item interactions $\{x_1, \dots, x_t\}$.
- Dynamic Routing: This method, pioneered by the MIND paper, adapts the dynamic routing algorithm from Capsule Networks.
  - Intuition: It iteratively clusters historical item embeddings into $K$ "capsules," where each capsule represents one user interest.
  - Process (Algorithm 1): The algorithm refines the assignment of items to interests over several iterations. At each step, it calculates coupling coefficients $c_{ij}$ (how much item $i$ belongs to interest $j$) via a softmax over logits $b_{ij}$. The interest vector $h_j$ is a weighted sum of item vectors, which is then passed through a non-linear squash function. The logits are updated based on the agreement between the current interest vector and the item vectors.
  - Squash Function: This function ensures that vectors representing strong interests have a magnitude close to 1, while weak interests are "squashed" towards 0:
    $$h_j = \mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$
    where $s_j$ is the weighted sum of item vectors for interest $j$.
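The routing loop reduces to a few lines of code. Below is a simplified PyTorch sketch of the behavior-to-interest routing described above, with the squash non-linearity; production variants such as MIND add shared transformation matrices and label-aware attention, which are omitted here:

```python
import torch

def squash(s: torch.Tensor) -> torch.Tensor:
    """Shrink weak interest capsules toward 0, keep strong ones near unit norm."""
    norm_sq = (s ** 2).sum(dim=-1, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (norm_sq.sqrt() + 1e-9)

def dynamic_routing(X: torch.Tensor, K: int = 4, n_iters: int = 3) -> torch.Tensor:
    """X: (t, d) historical item embeddings -> (K, d) interest capsules."""
    h = None
    b = torch.zeros(K, X.shape[0])        # routing logits b_ij
    for _ in range(n_iters):
        c = torch.softmax(b, dim=0)       # coupling coefficients c_ij (over interests)
        s = c @ X                         # (K, d) weighted sum per interest
        h = squash(s)                     # (K, d) squashed interest vectors
        b = b + h @ X.T                   # update logits by agreement h_j . x_i
    return h

interests = dynamic_routing(torch.randn(20, 64))
print(interests.shape)                    # torch.Size([4, 64])
```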
- Attention-Aware Extractor: This is a more common and flexible approach.
  - Intuition: It uses learnable "interest seeds" or queries ($e_j$) to attend to different parts of the user's interaction history. Each seed learns to focus on items related to a specific interest.
  - Process: For each interest $j$, an attention weight $w_{ij}$ is calculated for each historical item $i$. The interest vector $h_j$ is then the weighted sum of all item vectors:
    $$h_j = \sum_{i=1}^{t} w_{ij} x_i$$
    The weights $w_{ij}$ are typically computed with a softmax, where the score is based on the similarity between the interest seed $e_j$ and the transformed item embedding $x_i$:
    $$w_{ij} = \mathrm{softmax}\left(e_j \, \sigma(W_j x_i + b_j)^\top\right)$$
    where $W_j$ and $b_j$ are learnable parameters, $\sigma(\cdot)$ is an activation function, and a temperature parameter $\tau$ can be used to control the sharpness of the attention distribution.
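A minimal PyTorch sketch of this extractor follows; as a simplifying assumption it shares one projection $W, b$ across all interests instead of the per-interest $W_j, b_j$ in the formula, and uses $\tanh$ as the activation:

```python
import torch
import torch.nn as nn

class AttentionExtractor(nn.Module):
    """Extract K interest vectors via learnable interest seeds e_j."""
    def __init__(self, d: int, K: int, tau: float = 1.0):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(K, d))  # interest seeds e_j
        self.proj = nn.Linear(d, d)                   # shared W x_i + b
        self.tau = tau                                # softmax temperature

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (t, d) item embeddings of one user's history
        keys = torch.tanh(self.proj(X))               # sigma(W x_i + b), (t, d)
        logits = self.seeds @ keys.T / self.tau       # (K, t) seed-item scores
        W = torch.softmax(logits, dim=-1)             # w_ij, normalized over items
        return W @ X                                  # (K, d) interest vectors

extractor = AttentionExtractor(d=64, K=4)
H_u = extractor(torch.randn(20, 64))
print(H_u.shape)                                      # torch.Size([4, 64])
```

A lower temperature $\tau$ sharpens each seed's attention so that interests specialize on smaller subsets of the history; a higher $\tau$ smooths the weights toward a mean pooling of all items.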
4.3. Multi-Interest Aggregator
Once multiple interest vectors are extracted, they need to be combined to make a final prediction. This corresponds to the two paradigms defined earlier.
Table 3: Classification of representative methods from two perspectives: Interest Extractor and Interest Aggregator. (Manual Transcription)
| Interest Extractor | Interest Aggregator | Aggregation Method | Representative Methods |
| --- | --- | --- | --- |
| Dynamic Routing | Representation Aggregation | Attention | MIND [95], M2GNN [68], MDSR [28] |
| Dynamic Routing | Representation Aggregation | Concat or Mean/Max Pooling | M2GNN [68], MISD [100] |
| Dynamic Routing | Recommendation Aggregation | Mean/Max Pooling | MINER [99], ComiRec [21], REMI [212], MGNM [179], UMI [22] |
| Attention | Representation Aggregation | Attention | MINER [99], DisMIR [46] |
| Attention | Representation Aggregation | Concat or Mean/Max Pooling | PENR [189], M2GNN [68] |
| Attention | Recommendation Aggregation | Mean/Max Pooling | MINER [99], TimiRec [185], PIMI [25], ComiRec [21], CMI [94] |
| Attention | Recommendation Aggregation | Attention | MINER [99], MI-GNN [194], PIPM [143], M2GNN [68] |
| Attention | Recommendation Aggregation | Interest Selector with RL | REMIT [169] |
| Iterative Attention | Recommendation Aggregation | Interest Selector with RL | MIMCR [235] |
| Non-linear Transformation | Representation Aggregation | Concat | CKML [134] |
4.4. Multi-Interest Representation Regularization
A key challenge in multi-interest learning is representation collapse, where all extracted interest vectors become very similar to each other, defeating the purpose of multi-interest modeling. To prevent this, a regularization term is often added to the loss function to encourage diversity among the interest vectors.
Image 2 provides a high-level overview of these techniques, while Image 3 visually explains the intuition.

Figure caption: a schematic of the two main categories of divergence regularization in multi-interest recommendation and their subdivisions: representation regularization (e.g., cosine similarity and contrastive learning) and distribution regularization (e.g., covariance regularization and element-wise regularization).
- Representation Regularization (Image 3a): This operates directly on the interest vectors $h_j$ to make them distinct.
  - Cosine Similarity: A penalty term minimizes the cosine similarity between every pair of interest vectors, pushing them apart in the embedding space:
    $$\mathcal{L}_{reg} = \frac{1}{K^2} \sum_{i=1}^{K} \sum_{j=1}^{K} \frac{h_i \cdot h_j}{\|h_i\| \, \|h_j\|}$$
  - Contrastive Learning: Using a framework like InfoNCE, each interest vector is pulled closer to a "positive" version of itself (e.g., an augmented view) and pushed away from "negative" samples (the other interest vectors).
- Distribution Regularization (Image 3b): This operates on the attention distributions or routing coefficients that generate the interest vectors, ensuring that different interests are formed from different subsets of items.
  - Covariance Regularization: This encourages the routing coefficient vectors for different interests to be orthogonal. It penalizes the off-diagonal elements of the covariance matrix of the routing matrix $C \in \mathbb{R}^{t \times K}$:
    $$\mathcal{L}_{reg} = \left\| \mathrm{Cov}(C, C) - \mathrm{diag}\big(\mathrm{Cov}(C, C)\big) \right\|_F^2$$
    (Note: the paper's formula is slightly different, but the goal is the same: minimize the covariance between interest distributions.)
  - Element-Wise Regularization: A simpler method that penalizes the element-wise (Hadamard) product of the attention weight matrices for different interests. A sketch of both regularizer families follows this list.
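Both regularizer families reduce to a few lines of code. Below is a hedged PyTorch sketch of the cosine-similarity and covariance penalties described above; the exact normalization (e.g., whether the constant diagonal terms are included) varies across papers:

```python
import torch

def cosine_diversity_loss(H: torch.Tensor) -> torch.Tensor:
    """Representation regularization: mean pairwise cosine similarity over
    the K interest vectors H (K, d). Includes the diagonal of ones, as in
    the formula above; lower values mean more diverse interests."""
    Hn = H / H.norm(dim=-1, keepdim=True)
    return (Hn @ Hn.T).mean()

def covariance_loss(C: torch.Tensor) -> torch.Tensor:
    """Distribution regularization: squared Frobenius norm of the
    off-diagonal covariance between the K routing columns of C (t, K)."""
    Cc = C - C.mean(dim=0, keepdim=True)  # center each interest column
    cov = Cc.T @ Cc / C.shape[0]          # (K, K) covariance matrix
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

H = torch.randn(4, 64)                          # interest vectors
C = torch.softmax(torch.randn(20, 4), dim=1)    # routing/attention matrix
print(cosine_diversity_loss(H).item(), covariance_loss(C).item())
```

In training, either penalty is added to the recommendation loss with a weighting hyperparameter, e.g. $\mathcal{L} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{reg}$.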
5. Applications and Public Datasets
Multi-interest modeling has been successfully applied across a wide range of recommendation scenarios, as shown in Image 4.
Figure caption: application scenarios of multi-interest recommendation, with examples from six domains: movie, news, course, product, travel, and nutrition recommendation.
The paper also provides a list of common public datasets used for evaluating these models.
Table 4: Public real-world datasets on different application scenarios. (Manual Transcription)
| Applications | Public Datasets |
| --- | --- |
| News | MIND |
| Movies & Micro Videos | MovieLens, KuaiShou, ReDial, TG-ReDial |
| Online Travel and Check-In | FourSquare, Fliggy, Yelp, Gowalla |
| Online Shopping | Amazon, Taobao, RetailRocket |
| Online Education | MOOCCube |
- Movie and Micro Video Recommendation: Models extract interests from genres, user reviews, or other metadata. KuaiShou and MovieLens are popular datasets.
- News Recommendation: Interests can be short-term (breaking news) or long-term (topical preferences). The MIND dataset from Microsoft News is a standard benchmark.
- Life Services and Travel Recommendation: Spatial and temporal contexts are crucial. User interests can change based on location (e.g., business vs. vacation) and time (e.g., season). Datasets like Yelp and Gowalla are used.
- Online Shopping Recommendation: User behavior is complex (e.g., clicks, add-to-cart, purchases). Models like MIND were first verified on large-scale e-commerce data from Taobao. The Amazon review dataset is also widely used.
- Online Course Recommendation: Helps learners find relevant courses on platforms like Coursera. The MOOCCube dataset is used for this purpose.
6. Challenges and Future Directions
The survey concludes by identifying key open challenges and promising directions for future research.
- Adaptive Multi-Interest Extraction: Most models use a fixed, pre-defined number of interests (K). A major challenge is to dynamically and adaptively determine the optimal number of interests for each user, as some users may have more diverse preferences than others.
- Efficiency in Multi-Interest Modeling: Multi-interest models are computationally more expensive than single-interest models due to the extraction and aggregation of multiple vectors. Iterative methods like dynamic routing can be particularly slow. Future work should focus on more efficient architectures, especially for real-world, large-scale deployment.
- Multi-Interest Extraction for Denoising: User interaction history is often noisy. Multi-interest extraction can be viewed as a denoising mechanism, as the attention or routing process can learn to ignore irrelevant or noisy items. This potential has not been fully explored.
- Explainability: Multi-interest models have great potential for explainability. By inspecting which items contribute to each interest vector, we can provide explanations like, "We recommend this movie because you have shown an interest in '90s action films'." The challenge is to align the learned latent interests with human-understandable concepts or aspects.
- Alleviating Long-Tail and Cold-Start Problems: Multi-interest modeling can help recommend less popular (long-tail) items by matching them to a user's niche interests. For new (cold-start) users, interests could potentially be transferred from other domains or inferred from limited interactions.
- Frontier Methodologies: The paper suggests exploring more advanced techniques like Reinforcement Learning (RL) for dynamic interest modeling and Large Language Models (LLMs) for leveraging rich textual information (e.g., reviews, item descriptions) to better understand and represent user interests.
7. Conclusion & Reflections
- Conclusion Summary: "Multi-Interest Recommendation: A Survey" provides a much-needed, structured overview of a rapidly growing and critical subfield of recommender systems. It successfully demystifies the area by establishing a clear taxonomy, detailing the core technical components, summarizing key applications, and outlining future challenges. Its focus on the "why," "what," and "how" makes it an excellent starting point for anyone looking to enter the field.
- Personal Insights & Critique:
- Strength: The paper's greatest strength is its clear and systematic organization. The breakdown into "Extractor," "Aggregator," and "Regularization" provides a powerful mental model for understanding and comparing different multi-interest approaches. The inclusion of application scenarios and public datasets is also highly practical.
- Clarity: The explanations are generally clear and well-supported by diagrams and formulas, making complex topics accessible to readers with a basic understanding of recommender systems.
- Limitation/Opportunity: While the survey is comprehensive, the field is evolving at an explosive pace, particularly with the advent of LLMs. The section on "Frontier Methodology" could be expanded. Future work could explore in more detail how pre-trained LLMs can serve as universal interest extractors, potentially revolutionizing the field in ways that go beyond the current GNN/attention-based paradigms.
- Overall Impact: This survey is a timely and valuable contribution. It successfully consolidates a fragmented body of work into a coherent narrative, establishing a common language and framework that will undoubtedly guide and accelerate future research in multi-interest recommendation.