- Title: Controllable Multi-Interest Framework for Recommendation
- Authors: Yukuo Cen, Jianwei Zhang, Xu Zou, Chang Zhou, Hongxia Yang, Jie Tang
- Affiliations: The authors are from Tsinghua University and DAMO Academy, Alibaba Group, a collaboration that pairs academic research with real-world industrial application in the e-commerce domain.
- Journal/Conference: Published in the Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20). KDD is a premier, top-tier international conference for data mining and knowledge discovery, making it a highly reputable venue for this work.
- Publication Year: 2020
- Abstract: The paper addresses the problem of sequential recommendation in e-commerce. It argues that traditional methods, which generate a single embedding to represent a user, fail to capture their multiple, diverse interests. To solve this, the authors propose a Controllable Multi-Interest Framework for Recommendation (ComiRec). This framework has two key components: a multi-interest module that extracts several interest representations from a user's behavior sequence, and an aggregation module that combines recommendations from these interests. A controllable factor is introduced in the aggregation module to explicitly balance recommendation accuracy and diversity. The framework's effectiveness is demonstrated through significant improvements over state-of-the-art models on two large-scale datasets, Amazon and Taobao, and its successful deployment on Alibaba's industrial platform.
- Original Source Link:
2. Executive Summary
- Background & Motivation (Why): Modern e-commerce recommender systems face a fundamental challenge: users often have multiple, distinct interests. For example, a user might be shopping for a new phone case, a birthday gift for a friend, and groceries, all within a short period. Most recommender systems try to compress a user's entire interaction history into a single vector (a unified user embedding). This approach struggles to represent such diverse intentions, leading to recommendations that might be too generic or only cater to one of the user's interests. The paper identifies this limitation as a key gap, especially in the matching stage of industrial systems, where the goal is to efficiently retrieve a good set of candidate items from a pool of millions.
- Main Contributions / Findings (What):
- A Novel Multi-Interest Framework (ComiRec): The paper proposes a comprehensive framework that explicitly models a user's multiple interests. It presents two powerful methods for the core multi-interest extraction module: one based on dynamic routing (from capsule networks) and another based on self-attention.
- Controllable Accuracy-Diversity Trade-off: A novel aggregation module is introduced that not only combines the candidate items retrieved by the different interests but also exposes a controllable hyperparameter (λ). This allows system operators to explicitly tune the balance between recommendation accuracy (recommending relevant items) and diversity (recommending a varied set of items), which is a crucial feature for improving user experience.
- State-of-the-Art Performance and Industrial Validation: The ComiRec framework is shown to significantly outperform existing state-of-the-art models on two large, real-world datasets. Crucially, its successful deployment and strong performance on Alibaba's billion-scale industrial platform validate its effectiveness and practicality in a real-world setting.
4. Methodology (Core Technology & Implementation)
The paper's proposed framework, ComiRec, is designed for the matching stage. Its architecture consists of an embedding layer, a multi-interest extraction module, and an aggregation module, as illustrated below.

4.1 Problem Formulation
Given a set of users $\mathcal{U}$ and a set of items $\mathcal{I}$, each user $u \in \mathcal{U}$ has a historical interaction sequence $(e_1^{(u)}, e_2^{(u)}, \ldots, e_n^{(u)})$, where $e_t^{(u)}$ is the $t$-th item the user interacted with. The goal is to predict the next item(s) the user is likely to interact with.
4.2 Multi-Interest Framework
The core idea is to represent each user not with one, but with K different interest vectors.
1. Embedding Layer:
The input sequence of item IDs is transformed into a sequence of low-dimensional dense vectors (embeddings). For the self-attention method, trainable positional embeddings are added to these item embeddings to incorporate sequence order information.
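As a concrete illustration of this layer, here is a minimal NumPy sketch; the vocabulary size, sequence length, and embedding dimension are placeholder values rather than the paper's settings, and the randomly initialized tables stand in for trainable parameters.

```python
import numpy as np

# Hypothetical sizes for illustration (not the paper's settings).
num_items, max_len, d = 10_000, 20, 64
rng = np.random.default_rng(0)

item_emb_table = rng.normal(scale=0.1, size=(num_items, d))  # stands in for trainable item embeddings
pos_emb_table = rng.normal(scale=0.1, size=(max_len, d))     # stands in for trainable positional embeddings

seq = rng.integers(0, num_items, size=max_len)               # a user's item-ID sequence
# Input to the multi-interest extractor: item embedding + positional embedding per step
H = item_emb_table[seq] + pos_emb_table[np.arange(max_len)]  # shape (max_len, d)
```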
2. Multi-Interest Extraction Module:
This module takes the sequence of item embeddings and outputs K interest vectors. The paper explores two methods:
- ComiRec-DR (Dynamic Routing Method): This method treats the item embeddings in the user's sequence as primary capsules and the desired user interests as interest capsules. It uses the dynamic routing algorithm from CapsNet to iteratively refine the interest capsules.
- Algorithm 1: Dynamic Routing
- For each primary capsule (item embedding) $e_i$ and each interest capsule $j$, a prediction vector is computed as $\hat{e}_{j|i} = W_{ij} e_i$, where $W_{ij}$ is a learnable transformation matrix.
- The total input to an interest capsule $j$ is a weighted sum of these predictions: $s_j = \sum_i c_{ij} \hat{e}_{j|i}$.
- The coupling coefficients $c_{ij}$ are computed via a "routing softmax" over initial logits $b_{ij}$:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$$
- The output of interest capsule $j$, $v_j$, is obtained by applying a non-linear squashing function to its total input $s_j$. This function scales short vectors to almost zero and long vectors to a length just below 1:
$$v_j = \mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$
- The logits $b_{ij}$ are updated based on the agreement between the output $v_j$ and the prediction $\hat{e}_{j|i}$ (i.e., their dot product).
- The routing steps (computing $c_{ij}$, $s_j$, $v_j$, and updating $b_{ij}$) are repeated for a fixed number of iterations $r$. The final set of vectors $\{v_1, \ldots, v_K\}$ forms the user's multi-interest representation matrix $V_u$.
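A minimal NumPy sketch of this routing loop is given below. The tensor shapes and the per-pair transformation matrices `W[i, j]` are illustrative assumptions (ComiRec-DR's exact parameter sharing may differ); this is not the authors' implementation.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Squashing non-linearity: keeps direction, maps length into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(item_embs, W, num_iters=3):
    """item_embs: (n, d) primary capsules (one per behavior);
    W: (n, K, d, d_out) transformation matrices (illustrative parameterization);
    returns (K, d_out) interest capsules."""
    n, K = W.shape[0], W.shape[1]
    # Prediction vectors e_hat[i, j] = W[i, j] @ e_i
    e_hat = np.einsum('ikdo,id->iko', W, item_embs)            # (n, K, d_out)
    b = np.zeros((n, K))                                       # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # routing softmax over interests
        s = np.einsum('ik,iko->ko', c, e_hat)                  # weighted sum per interest capsule
        v = squash(s)                                          # (K, d_out) interest capsules
        b = b + np.einsum('iko,ko->ik', e_hat, v)              # agreement (dot-product) update
    return v
```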
- ComiRec-SA (Self-Attentive Method): This method uses a multi-head self-attention mechanism to generate multiple interest vectors.
- Given the matrix of item embeddings $H \in \mathbb{R}^{d \times n}$, a standard attention mechanism would compute a single attention weight vector.
- To obtain $K$ interests, the mechanism is extended: an attention matrix $A \in \mathbb{R}^{n \times K}$ is computed as
$$A = \mathrm{softmax}\big(W_2^{\top} \tanh(W_1 H)\big)^{\top}$$
Here, $W_1$ and $W_2$ are trainable weight matrices. The softmax is applied column-wise, so each column of $A$ represents a different attention distribution over the user's behavior sequence.
- The final matrix of user interests $V_u \in \mathbb{R}^{d \times K}$ is computed by multiplying the item embedding matrix with the attention matrix:
$$V_u = H A$$
Each column of $V_u$ is a weighted sum of item embeddings, representing one of the user's $K$ interests.
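The sketch below follows these two equations directly; the shapes assumed for $W_1$ ($d_a \times d$) and $W_2$ ($d_a \times K$) come from the standard structured self-attention formulation and are assumptions rather than reported hyperparameters.

```python
import numpy as np

def multi_interest_self_attention(H, W1, W2):
    """H: (d, n) item embeddings (with positional embeddings already added);
    W1: (d_a, d) and W2: (d_a, K) trainable weights.
    Returns V_u of shape (d, K), one interest vector per column."""
    scores = W2.T @ np.tanh(W1 @ H)                       # (K, n) unnormalized attention
    # softmax over the n behaviors for each of the K interest heads
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = (weights / weights.sum(axis=1, keepdims=True)).T  # (n, K), columns sum to 1
    return H @ A                                          # (d, K) = V_u
```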
3. Model Training:
During training, for a given target item $i$, the model needs to select the most relevant interest vector. This is done with an argmax operation:
$$v_u = V_u\left[:, \operatorname{argmax}\left(V_u^{\top} e_i\right)\right]$$
That is, the interest vector in $V_u$ with the highest inner product with the target item's embedding $e_i$ is chosen to represent the user for this specific prediction.
The model is trained to maximize the likelihood of the correct next item, using a negative log-likelihood loss function. To make training feasible with millions of items, sampled softmax is used instead of a full softmax over the entire item vocabulary.
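A simplified sketch of one training example is shown below: it performs the argmax interest selection and then a sampled-softmax-style loss over one positive and a handful of sampled negatives. The function name and the reduced negative sampling are illustrative assumptions, not the paper's training code.

```python
import numpy as np

def training_loss_for_target(V_u, target_emb, sampled_neg_embs):
    """V_u: (d, K) interest matrix; target_emb: (d,) positive item embedding;
    sampled_neg_embs: (d, M) embeddings of M sampled negative items.
    Returns a simplified sampled-softmax-style negative log-likelihood."""
    # Select the interest most aligned with the target item (the argmax in the paper).
    k = int(np.argmax(V_u.T @ target_emb))
    v_u = V_u[:, k]                                # chosen interest vector, shape (d,)
    pos_logit = v_u @ target_emb
    neg_logits = v_u @ sampled_neg_embs            # (M,)
    logits = np.concatenate(([pos_logit], neg_logits))
    # negative log-likelihood of the positive under a softmax over {positive} plus negatives
    return -pos_logit + logits.max() + np.log(np.exp(logits - logits.max()).sum())
```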
4.3 Aggregation Module
This module is used during inference (online serving) to generate the final top-N recommendations. After retrieving N candidate items for each of the K interest vectors (totaling K×N items), this module aggregates them.
(Figure: Schematic of the multi-interest recommendation pipeline. A user's click sequence is fed into the multi-interest extraction module, which identifies three interests: jewelry, handbags, and cosmetics. Nearest-neighbor retrieval returns candidate items for each interest, and the aggregation module outputs the final diversified recommendations, closing the click-and-recommend loop.)
The goal is to find a set $S$ of $N$ items that maximizes a value function $Q(u, S)$, which balances accuracy and diversity:
$$Q(u, S) = \sum_{i \in S} f(u, i) + \lambda \sum_{i \in S} \sum_{j \in S} g(i, j)$$
- $f(u, i) = \max_{1 \le k \le K}\left(e_i^{\top} v_u^{(k)}\right)$: the relevance score of item $i$, defined as its maximum similarity to any of the user's $K$ interest vectors.
- $g(i, j)$: a dissimilarity function. The paper uses $g(i, j) = \delta\big(\mathrm{CATE}(i) \ne \mathrm{CATE}(j)\big)$, which is 1 if items $i$ and $j$ belong to different categories and 0 otherwise.
- $\lambda$: the controllable factor.
  - If $\lambda = 0$, the module only maximizes relevance (accuracy).
  - As $\lambda$ increases, the module increasingly prioritizes diversity (recommending items from different categories).
Since finding the optimal set S is computationally hard, the paper proposes a greedy inference algorithm (Algorithm 2). It starts with an empty set S and iteratively adds the item from the candidate pool that provides the largest marginal increase to the value function Q.
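A compact sketch of this greedy procedure, under the assumption that the relevance scores $f(u, i)$ and item categories are precomputed, is shown below; the symmetric pair counting in $Q$ is folded into λ, so this is an approximation of Algorithm 2 rather than a verbatim transcription.

```python
import numpy as np

def greedy_aggregate(candidates, relevance, category, N, lam):
    """candidates: the K*N retrieved item ids; relevance[i]: f(u, i), the maximum
    inner product of item i with any interest vector; category[i]: item i's category;
    lam: the controllable factor λ. Greedily builds a size-N set that approximately
    maximizes Q(u, S)."""
    selected, selected_cats = [], []
    pool = set(candidates)
    while len(selected) < N and pool:
        best_item, best_gain = None, -np.inf
        for i in pool:
            # marginal gain of adding i: its relevance plus λ times the number of
            # already-selected items that belong to a different category
            gain = relevance[i] + lam * sum(category[i] != c for c in selected_cats)
            if gain > best_gain:
                best_item, best_gain = i, gain
        selected.append(best_item)
        selected_cats.append(category[best_item])
        pool.remove(best_item)
    return selected
```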
5. Experimental Setup
- Datasets:
  - Amazon Books: A public dataset of product reviews. User sequences are truncated to a length of 20.
  - Taobao: A large-scale industrial dataset of user click behaviors from the Taobao e-commerce platform. Sequences are truncated to a length of 50.
Below is a transcription of Table 2 from the paper, showing dataset statistics.

| Dataset | # users | # items | # interactions |
|---|---|---|---|
| Amazon Books | 459,133 | 313,966 | 8,898,041 |
| Taobao | 976,779 | 1,708,530 | 85,384,110 |
- Experimental Setting:
The experiments use a strong generalization setup, where users are split into 80% for training, 10% for validation, and 10% for testing. This means the model is evaluated on its ability to make recommendations for users it has never seen during training, which is a more realistic and challenging scenario. For each test user, the first 80% of their behavior sequence is used to infer their interests, and the model is evaluated on its ability to predict the remaining 20%.
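A minimal sketch of this strong-generalization protocol (a user-level 80/10/10 split, then an 80/20 split within each test user's sequence) might look as follows; the function names and the random seed are hypothetical.

```python
import numpy as np

def strong_generalization_split(user_ids, seed=42):
    """Split users (not interactions) 80/10/10 into train/validation/test,
    so evaluation users are never seen during training."""
    rng = np.random.default_rng(seed)
    users = np.array(list(user_ids))
    rng.shuffle(users)
    n = len(users)
    return users[: int(0.8 * n)], users[int(0.8 * n): int(0.9 * n)], users[int(0.9 * n):]

def split_test_sequence(seq):
    """For a test user: the first 80% of behaviors are used to infer interests,
    and the remaining 20% are the prediction targets."""
    cut = int(len(seq) * 0.8)
    return seq[:cut], seq[cut:]
```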
- Baselines:
  - MostPopular: A non-personalized baseline that recommends the globally most popular items.
  - YouTube DNN: A successful deep learning model for recommendation, adapted for this task.
  - GRU4Rec: A classic RNN-based model for sequential recommendation.
  - MIND: The state-of-the-art multi-interest model from Alibaba, serving as the strongest baseline.
- Evaluation Metrics: Models are compared using Recall, NDCG (Normalized Discounted Cumulative Gain), and Hit Rate, each computed at cutoffs of 20 and 50 (see the tables below).
6. Results & Analysis
- Core Results:
The main results (with λ=0 for fair accuracy comparison) are shown in the transcribed Table 3 below.
Amazon Books

| Model | Recall@20 | NDCG@20 | Hit Rate@20 | Recall@50 | NDCG@50 | Hit Rate@50 |
|---|---|---|---|---|---|---|
| MostPopular | 1.368 | 2.259 | 3.020 | 2.400 | 3.936 | 5.226 |
| YouTube DNN | 4.567 | 7.670 | 10.285 | 7.312 | 12.075 | 15.894 |
| GRU4Rec | 4.057 | 6.803 | 8.945 | 6.501 | 10.369 | 13.666 |
| MIND | 4.862 | 7.933 | 10.618 | 7.638 | 12.230 | 16.145 |
| ComiRec-SA | 5.489 | 8.991 | 11.402 | 8.467 | 13.563 | 17.202 |
| ComiRec-DR | 5.311 | 9.185 | 12.005 | 8.106 | 13.520 | 17.583 |

Taobao

| Model | Recall@20 | NDCG@20 | Hit Rate@20 | Recall@50 | NDCG@50 | Hit Rate@50 |
|---|---|---|---|---|---|---|
| MostPopular | 0.395 | 2.065 | 5.424 | 0.735 | 3.603 | 9.309 |
| YouTube DNN | 4.205 | 14.511 | 28.785 | 6.172 | 20.248 | 39.108 |
| GRU4Rec | 5.884 | 22.095 | 35.745 | 8.494 | 29.396 | 46.068 |
| MIND | 6.281 | 20.394 | 38.119 | 8.155 | 25.069 | 45.846 |
| ComiRec-SA | 6.900 | 24.682 | 41.549 | 9.462 | 31.278 | 51.064 |
| ComiRec-DR | 6.890 | 24.007 | 41.746 | 9.818 | 31.365 | 52.418 |
Analysis: Both ComiRec-SA and ComiRec-DR consistently and significantly outperform all baselines, including the strong MIND model, across both datasets and all metrics. This demonstrates the superiority of the proposed multi-interest extraction methods. The two ComiRec variants perform comparably, with ComiRec-SA slightly ahead on Amazon and ComiRec-DR slightly ahead on Taobao, suggesting both are powerful and viable options.
- Parameter Sensitivity (Number of Interests, K): The paper investigates how performance changes with the number of interests, K. The results (transcribed from Table 4) show that the optimal K is dataset-dependent. For instance, on Taobao, ComiRec-DR's performance generally improves as K increases from 2 to 8, while ComiRec-SA performs best with K=2. This indicates that K is a critical hyperparameter that needs to be tuned for the specific application.
- Controllable Study: This study validates the effectiveness of the aggregation module. The Diversity@N metric is defined as the average proportion of dissimilar item pairs (i.e., pairs from different categories) in a recommendation list:
$$\mathrm{Diversity@N} = \frac{\sum_{j=1}^{N} \sum_{k=j+1}^{N} \delta\big(\mathrm{CATE}(\hat{i}_{u,j}) \ne \mathrm{CATE}(\hat{i}_{u,k})\big)}{N \times (N-1)/2}$$
The results from the transcribed Table 5 show that as the control factor λ increases, diversity increases substantially while recall (accuracy) decreases only slightly. For example, for ComiRec-SA, increasing λ from 0 to 0.25 more than doubles the diversity (23.2 to 55.1) while recall drops by only about 5% (from 8.467 to 8.034). This confirms that the aggregation module provides an effective mechanism for tuning the accuracy-diversity trade-off.
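For reference, the Diversity@N metric defined above can be computed as in the short sketch below; `category_of` is a hypothetical item-to-category lookup, not a function from the paper.

```python
def diversity_at_n(recommended_items, category_of):
    """Fraction of item pairs in the top-N list whose categories differ (Diversity@N).
    `category_of` is a hypothetical lookup from item id to category."""
    N = len(recommended_items)
    if N < 2:
        return 0.0
    differing_pairs = sum(
        category_of(recommended_items[j]) != category_of(recommended_items[k])
        for j in range(N) for k in range(j + 1, N)
    )
    return differing_pairs / (N * (N - 1) / 2)
```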
- Industrial Results and Case Study: The framework was deployed on Alibaba's platform and evaluated on an industrial dataset with over 4 billion interactions. ComiRec-SA and ComiRec-DR improved recall@50 by 1.39% and 8.65%, respectively, compared to MIND. These are significant gains at such a scale.
The case study in Figure 3 provides compelling qualitative evidence.
(Figure 3: A case study of a user's multiple interests. The model generates four interest vectors from the user's click sequence, corresponding to sweets, gift boxes, phone cases, and accessories. The left side shows the relevant items in the click sequence; the right side shows the items each interest vector retrieves from the large-scale item pool.)
From a user's click sequence, the model successfully identifies four distinct interests: sweets, gift boxes, phone cases, and accessories. Crucially, it does this using only item IDs, without being explicitly fed item category information. The items retrieved by each interest vector clearly correspond to these learned categories, demonstrating that the model truly captures a user's multiple intents.
7. Conclusion & Reflections
- Conclusion Summary: The paper proposes ComiRec, a novel and practical framework for sequential recommendation that addresses the critical limitation of single user embeddings. By modeling multiple user interests with two effective methods (dynamic routing and self-attention) and introducing a controllable aggregation module to balance accuracy and diversity, the framework achieves state-of-the-art performance. Its validation on massive industrial data confirms its real-world value.
- Limitations & Future Work: The authors suggest future directions, including leveraging memory networks to capture how user interests evolve over longer periods and incorporating cognitive theories for more sophisticated user modeling.
- Personal Insights & Critique:
- Strengths: The paper's main strength is its practicality. The problem it tackles is real, the solution is well-aligned with industrial two-stage architectures, and the results are validated at scale. The controllable aggregation module is a particularly strong contribution, offering a practical knob for system operators to tune the user experience beyond pure accuracy.
- Novelty: While building on MIND, the paper's novelty lies in refining the extraction module (showing that the original dynamic routing works better and proposing a competitive self-attention alternative) and, more importantly, in introducing the controllable aggregation logic.
- Potential Weaknesses and Open Questions:
- The greedy inference for aggregation is an approximation. While efficient, it may not find the optimal solution.
- The diversity metric is based on pre-defined item categories. This might not capture other important facets of diversity, such as brand, price point, or visual style.
- The optimal number of interests, K, is a hyperparameter that must be tuned manually. An interesting future direction would be to learn K adaptively based on the user's behavior.