Controllable Multi-Interest Framework for Recommendation
TL;DR Summary
This paper introduces ComiRec, a controllable multi-interest framework that captures diverse user interests and balances recommendation accuracy and diversity, outperforming state-of-the-art models on Amazon and Taobao datasets.
Abstract
Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, intending to predict the next items that the user might be interacted with. Recent works usually give an overall embedding from a user's behavior sequence. However, a unified user embedding cannot reflect the user's multiple interests during a period. In this paper, we propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance the recommendation accuracy and diversity. We conduct experiments for the sequential recommendation on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.
English Analysis
1. Bibliographic Information
- Title: Controllable Multi-Interest Framework for Recommendation
- Authors: Yukuo Cen, Jianwei Zhang, Xu Zou, Chang Zhou, Hongxia Yang, Jie Tang
- Affiliations: The authors are from Tsinghua University and DAMO Academy, Alibaba Group. This collaboration signifies a strong synergy between academic research and real-world industrial application, particularly in the e-commerce domain.
- Journal/Conference: Published in the Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20). KDD is a premier, top-tier international conference for data mining and knowledge discovery, making it a highly reputable venue for this work.
- Publication Year: 2020
- Abstract: The paper addresses the problem of sequential recommendation in e-commerce. It argues that traditional methods, which generate a single embedding to represent a user, fail to capture their multiple, diverse interests. To solve this, the authors propose a Controllable Multi-Interest Framework for Recommendation (ComiRec). This framework has two key components: a multi-interest module that extracts several interest representations from a user's behavior sequence, and an aggregation module that combines recommendations from these interests. A controllable factor is introduced in the aggregation module to explicitly balance recommendation accuracy and diversity. The framework's effectiveness is demonstrated through significant improvements over state-of-the-art models on two large-scale datasets, Amazon and Taobao, and its successful deployment on Alibaba's industrial platform.
- Original Source Link:
- arXiv: https://arxiv.org/abs/2005.09347
- PDF: https://arxiv.org/pdf/2005.09347v2.pdf
- Publication Status: Formally published in KDD '20.
2. Executive Summary
-
Background & Motivation (Why): Modern e-commerce recommender systems face a fundamental challenge: users often have multiple, distinct interests. For example, a user might be shopping for a new phone case, a birthday gift for a friend, and groceries, all within a short period. Most recommender systems try to compress a user's entire interaction history into a single vector (a unified user embedding). This approach struggles to represent such diverse intentions, leading to recommendations that might be too generic or only cater to one of the user's interests. The paper identifies this limitation as a key gap, especially in the matching stage of industrial systems, where the goal is to efficiently retrieve a good set of candidate items from a pool of millions.
-
Main Contributions / Findings (What):
- A Novel Multi-Interest Framework (ComiRec): The paper proposes a comprehensive framework that explicitly models a user's multiple interests. It presents two powerful methods for the core
multi-interest extraction module: one based on dynamic routing (from capsule networks) and another based on self-attention. - Controllable Accuracy-Diversity Trade-off: A novel
aggregation moduleis introduced that not only combines the candidate items retrieved by different interests but also includes a controllable hyperparameter (). This allows system operators to explicitly tune the balance between recommendation accuracy (recommending relevant items) and diversity (recommending a varied set of items), which is a crucial feature for improving user experience. - State-of-the-Art Performance and Industrial Validation: The ComiRec framework is shown to significantly outperform existing state-of-the-art models on two large, real-world datasets. Crucially, its successful deployment and strong performance on Alibaba's billion-scale industrial platform validate its effectiveness and practicality in a real-world setting.
- A Novel Multi-Interest Framework (ComiRec): The paper proposes a comprehensive framework that explicitly models a user's multiple interests. It presents two powerful methods for the core
3. Prerequisite Knowledge & Related Work
To understand this paper, it's helpful to be familiar with the following concepts.
-
Foundational Concepts:
- Sequential Recommendation: This task aims to predict the next item a user will interact with based on the chronological sequence of their past interactions (e.g., clicks, purchases). It's more dynamic than static recommendation models.
- Two-Stage Recommender Systems: Industrial systems with massive item pools (billions of items) cannot score every item for every user. They use a two-stage process:
- Matching (or Retrieval): A fast model generates a few hundred or thousand potentially relevant candidate items. This paper focuses on improving this stage.
- Ranking: A more complex, precise model scores and re-ranks these candidates to produce the final recommendation list.
- Collaborative Filtering (CF): A classic recommendation technique based on the idea that users who agreed in the past will agree in the future. It can be user-based (find similar users) or item-based (find similar items).
- Attention Mechanism: A neural network technique that allows a model to dynamically weigh the importance of different elements in an input sequence. For example, when predicting the next item, it can pay more "attention" to more relevant past interactions.
- Capsule Network (CapsNet): An alternative to traditional neural networks. Instead of single neurons, it uses "capsules"—groups of neurons that output a vector. The vector's length represents the probability of an entity's existence, and its orientation represents the entity's properties. Dynamic Routing is the algorithm used to decide how information flows between capsules, allowing them to form part-whole relationships and cluster features effectively.
-
Previous Works and Differentiation:
- Single-Embedding Sequential Models:
GRU4Rec: Uses a Recurrent Neural Network (RNN), specifically a GRU, to process the user's behavior sequence and produce a single hidden state to represent the user's current interest.SASRec: Employs a self-attention mechanism (like in Transformers) to capture dependencies in the sequence, but still outputs a single contextual representation for prediction.- Limitation: All these models summarize the user's interests into a single vector, which is the core problem ComiRec addresses.
- Multi-Interest Models:
MIND(Multi-Interest Network with Dynamic Routing): A key predecessor from the same research group (Alibaba). It also uses a capsule-like network to extract multiple interest vectors.- ComiRec's Differentiation:
-
ComiRec-DRuses the original dynamic routing mechanism from CapsNet, whereasMINDuses a custom "Behavior-to-Interest (B2I)" routing. The paper shows this difference leads to better performance. -
ComiRec introduces a self-attention based variant (
ComiRec-SA) as an alternative and effective way to extract multiple interests. -
Most importantly, ComiRec introduces the controllable aggregation module, which is absent in
MINDand provides a practical way to balance accuracy and diversity.This paper builds upon the idea of multi-interest recommendation popularized by
MINDbut refines the extraction module and, more significantly, adds a novel, controllable aggregation mechanism for practical deployment.
-
- Single-Embedding Sequential Models:
4. Methodology (Core Technology & Implementation)
The paper's proposed framework, ComiRec, is designed for the matching stage. Its architecture consists of an embedding layer, a multi-interest extraction module, and an aggregation module, as illustrated below.

3.1 Problem Formulation
Given a set of users and items , each user has a historical interaction sequence , where is the -th item the user interacted with. The goal is to predict the next item(s) the user is likely to interact with.
3.2 Multi-Interest Framework
The core idea is to represent each user not with one, but with different interest vectors.
1. Embedding Layer: The input sequence of item IDs is transformed into a sequence of low-dimensional dense vectors (embeddings). For the self-attention method, trainable positional embeddings are added to these item embeddings to incorporate sequence order information.
2. Multi-Interest Extraction Module: This module takes the sequence of item embeddings and outputs interest vectors. The paper explores two methods:
-
ComiRec-DR (Dynamic Routing Method): This method treats the item embeddings in the user's sequence as primary capsules and the desired user interests as interest capsules. It uses the dynamic routing algorithm from CapsNet to iteratively refine the interest capsules.
- Algorithm 1: Dynamic Routing
- For each primary capsule (item embedding) and each interest capsule , a prediction vector is computed: , where is a learnable transformation matrix.
- The total input to an interest capsule is a weighted sum of these predictions: .
- The coupling coefficients are computed via a "routing softmax" over initial logits :
- The output of the interest capsule, , is obtained by applying a non-linear squashing function to its total input . This function scales short vectors to almost zero and long vectors to a length just below 1.
- The logits are updated based on the agreement between the output and the prediction (i.e., their dot product).
- Steps 3-5 are repeated for a fixed number of iterations (). The final set of vectors forms the user's multi-interest representation matrix .
- Algorithm 1: Dynamic Routing
-
ComiRec-SA (Self-Attentive Method): This method uses a multi-head self-attention mechanism to generate multiple interest vectors.
- Given the matrix of item embeddings , a standard attention mechanism would compute a single attention weight vector.
- To get interests, the mechanism is extended. An attention matrix is computed:
Here, and are trainable weight matrices. The
softmaxis applied column-wise, so each column of represents a different attention distribution over the user's behavior sequence. - The final matrix of user interests is computed by multiplying the item embedding matrix with the attention matrix: Each column of is a weighted sum of item embeddings, representing one of the user's interests.
3. Model Training:
During training, for a given target item , the model needs to select the most relevant interest vector. This is done using an argmax operation:
This means the interest vector in that has the highest inner product with the target item's embedding is chosen to represent the user for this specific prediction.
The model is trained to maximize the likelihood of the correct next item, using a negative log-likelihood loss function. To make training feasible with millions of items, sampled softmax is used instead of a full softmax over the entire item vocabulary.
3.3 Aggregation Module
This module is used during inference (online serving) to generate the final top-N recommendations. After retrieving candidate items for each of the interest vectors (totaling items), this module aggregates them.
该图像是一幅示意图,展示了推荐系统中多兴趣框架的流程。用户点击序列经过多兴趣提取模块,识别出珠宝、手袋和化妆品三类兴趣。通过最近邻检索得到候选物品,最后聚合模块输出多样化推荐结果,实现点击和推荐的闭环。
The goal is to find a set of items that maximizes a value function Q(u, S), which balances accuracy and diversity:
- : This is the relevance score of item , defined as its maximum similarity to any of the user's interests.
g(i, j): This is a dissimilarity function. The paper uses , which is 1 if items and are from different categories and 0 otherwise.- : This is the controllable factor.
-
If , the module only maximizes relevance (accuracy).
-
As increases, the module increasingly prioritizes diversity (recommending items from different categories).
Since finding the optimal set is computationally hard, the paper proposes a greedy inference algorithm (Algorithm 2). It starts with an empty set and iteratively adds the item from the candidate pool that provides the largest marginal increase to the value function .
-
5. Experimental Setup
-
Datasets:
-
Amazon Books: A public dataset of product reviews. The user sequences are truncated to a length of 20.
-
Taobao: A large-scale industrial dataset of user click behaviors from the Taobao e-commerce platform. Sequences are truncated to a length of 50.
Below is a transcription of Table 2 from the paper, showing dataset statistics.
Dataset # users # items # interactions Amazon Books 459,133 313,966 8,898,041 Taobao 976,779 1,708,530 85,384,110
-
-
Experimental Setting: The experiments use a strong generalization setup, where users are split into 80% for training, 10% for validation, and 10% for testing. This means the model is evaluated on its ability to make recommendations for users it has never seen during training, which is a more realistic and challenging scenario. For each test user, the first 80% of their behavior sequence is used to infer their interests, and the model is evaluated on its ability to predict the remaining 20%.
-
Baselines:
MostPopular: A non-personalized baseline that recommends the globally most popular items.YouTube DNN: A successful deep learning model for recommendation, adapted for this task.GRU4Rec: A classic RNN-based model for sequential recommendation.MIND: The state-of-the-art multi-interest model from Alibaba, serving as the strongest baseline.
-
Evaluation Metrics:
-
Recall@N:
- Conceptual Definition: Measures the proportion of a user's ground-truth items (in the test set) that are found within the top-N recommended items. It answers: "Out of all the items the user actually wanted, what fraction did we recommend?"
- Mathematical Formula:
- Symbol Explanation: is the set of users, is the set of top-N recommended items for user , and is the set of ground-truth items for user in the test set.
-
Hit Rate (HR@N):
- Conceptual Definition: Measures the percentage of users for whom at least one correct item is present in the top-N recommendation list. It answers: "Did we get at least one recommendation right?"
- Mathematical Formula:
- Symbol Explanation: is the indicator function, which is 1 if the condition is true and 0 otherwise.
-
Normalized Discounted Cumulative Gain (NDCG@N):
- Conceptual Definition: A rank-aware metric that evaluates the quality of the recommendation list by assigning higher scores to correct items that appear earlier in the list. The "discounted" part means that correct items at lower ranks contribute less. The "normalized" part scales the score by the ideal possible score, so the value is always between 0 and 1.
- Mathematical Formula:
- Symbol Explanation: is the item at rank for user . is the DCG score for the ideal ranking.
-
6. Results & Analysis
-
Core Results: The main results (with for fair accuracy comparison) are shown in the transcribed Table 3 below.
Amazon Books Taobao Metrics@20 Metrics@50 Metrics@20 Metrics@50 Recall NDCG Hit Rate Recall NDCG Hit Rate Recall NDCG Hit Rate Recall NDCG Hit Rate MostPopular 1.368 2.259 3.020 2.400 3.936 5.226 0.395 2.065 5.424 0.735 3.603 9.309 YouTube DNN 4.567 7.670 10.285 7.312 12.075 15.894 4.205 14.511 28.785 6.172 20.248 39.108 GRU4Rec 4.057 6.803 8.945 6.501 10.369 13.666 5.884 22.095 35.745 8.494 29.396 46.068 MIND 4.862 7.933 10.618 7.638 12.230 16.145 6.281 20.394 38.119 8.155 25.069 45.846 ComiRec-SA 5.489 8.991 11.402 8.467 13.563 17.202 6.900 24.682 41.549 9.462 31.278 51.064 ComiRec-DR 5.311 9.185 12.005 8.106 13.520 17.583 6.890 24.007 41.746 9.818 31.365 52.418 Analysis: Both
ComiRec-SAandComiRec-DRconsistently and significantly outperform all baselines, including the strongMINDmodel, across both datasets and all metrics. This demonstrates the superiority of the proposed multi-interest extraction methods. The twoComiRecvariants perform comparably, withComiRec-SAslightly better on Amazon andComiRec-DRon Taobao, suggesting both are powerful and viable options. -
Parameter Sensitivity (Number of Interests, K): The paper investigates how performance changes with the number of interests, . The results (transcribed from Table 4) show that the optimal is dataset-dependent. For instance, on Taobao,
ComiRec-DR's performance generally improves as increases from 2 to 8, whileComiRec-SAperforms best with . This indicates that is a critical hyperparameter that needs to be tuned for the specific application. -
Controllable Study: This study validates the effectiveness of the aggregation module. The
Diversity@Nmetric is defined as the average proportion of dissimilar item pairs (from different categories) in a recommendation list. The results from the transcribed Table 5 show that as the control factor increases, diversity substantially increases, while recall (accuracy) decreases only slightly. For example, forComiRec-SA, increasing from 0 to 0.25 more than doubles the diversity (23.2 to 55.1) while recall drops by less than 5% (8.467 to 8.034). This confirms that the aggregation module provides an effective mechanism to tune the accuracy-diversity trade-off. -
Industrial Results and Case Study: The framework was deployed on Alibaba's platform, tested on a dataset with over 4 billion interactions.
ComiRec-SAandComiRec-DRimproved recall@50 by 1.39% and 8.65% respectively, compared toMIND. These are significant gains at such a scale.The case study in Figure 3 provides compelling qualitative evidence.
该图像是论文中用户多兴趣案例的示意图,通过模型从用户点击序列生成了四个兴趣向量,分别对应甜点、礼盒、手机壳和配件。图左显示了点击序列中的相关商品,图右展示了通过兴趣向量从大规模商品池中检索到的商品。From a user's click sequence, the model successfully identifies four distinct interests: sweets, gift boxes, phone cases, and accessories. Crucially, it does this using only item IDs, without being explicitly fed item category information. The items retrieved by each interest vector clearly correspond to these learned categories, demonstrating that the model truly captures a user's multiple intents.
7. Conclusion & Reflections
-
Conclusion Summary: The paper successfully proposes
ComiRec, a novel and practical framework for sequential recommendation that addresses the critical limitation of single-user embeddings. By modeling multiple user interests with two effective methods (dynamic routing and self-attention) and introducing a controllable aggregation module to balance accuracy and diversity, the framework achieves state-of-the-art performance. Its validation on massive industrial data confirms its real-world value. -
Limitations & Future Work: The authors suggest future directions, including leveraging memory networks to capture how user interests evolve over longer periods and incorporating cognitive theories for more sophisticated user modeling.
-
Personal Insights & Critique:
- Strengths: The paper's main strength is its practicality. The problem it tackles is real, the solution is well-aligned with industrial two-stage architectures, and the results are validated at scale. The controllable aggregation module is a particularly strong contribution, offering a practical knob for system operators to tune the user experience beyond pure accuracy.
- Novelty: While building on
MIND, the paper's novelty lies in refining the extraction module (showing original dynamic routing works better and proposing a competitive self-attention alternative) and, more importantly, introducing the controllable aggregation logic. - Potential Weaknesses and Open Questions:
- The greedy inference for aggregation is an approximation. While efficient, it may not find the optimal solution.
- The diversity metric is based on pre-defined item categories. This might not capture other important facets of diversity, such as brand, price point, or visual style.
- The optimal number of interests, , is a hyperparameter that must be tuned manually. An interesting future direction would be to learn adaptively based on the user's behavior.
Similar papers
Recommended via semantic vector search.
SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
SPARC employs RQ-VAE for dynamic multi-interest discretization and end-to-end training, enabling adaptive user interest evolution and a probabilistic soft-search that actively explores novel interests, significantly enhancing recommendation performance and metrics.
Multi-Interest Recommendation: A Survey
This survey reviews multi-interest recommendation, addressing user preference diversity through multiple interest representations to enhance recommendation accuracy. It covers importance, focus areas, implementation, challenges, and future directions, establishing a foundational
HyMiRec: A Hybrid Multi-interest Learning Framework for LLM-based Sequential Recommendation
HyMiRec integrates lightweight and LLM-based recommenders to model long user sequences and diverse interests, employing residual codebooks for embedding compression and disentangled modules to enhance sequential recommendation effectiveness.
Discussion
Leave a comment
No comments yet. Start the discussion!