Large Language Model as Universal Retriever in Industrial-Scale Recommender System
TL;DR Summary
This paper proposes a Universal Retrieval Model (URM) that uses an LLM as a universal retriever for industrial-scale recommender systems, tackling diverse objectives via multi-query representation, matrix decomposition, and probabilistic sampling. URM efficiently retrieves from tens of millions of candidates, outperforming expert models in offline experiments and significantly improving online advertising metrics.
Abstract
In real-world recommender systems, different retrieval objectives are typically addressed using task-specific datasets with carefully designed model architectures. We demonstrate that Large Language Models (LLMs) can function as universal retrievers, capable of handling multiple objectives within a generative retrieval framework. To model complex user-item relationships within generative retrieval, we propose multi-query representation. To address the challenge of extremely large candidate sets in industrial recommender systems, we introduce matrix decomposition to boost model learnability, discriminability, and transferability, and we incorporate probabilistic sampling to reduce computation costs. Finally, our Universal Retrieval Model (URM) can adaptively generate a set from tens of millions of candidates based on an arbitrary given objective while keeping latency within tens of milliseconds. Applied to industrial-scale data, URM outperforms expert models elaborately designed for different retrieval objectives in offline experiments and significantly improves the core metric of an online advertising platform by 3%.
English Analysis
1. Bibliographic Information
- Title: Large Language Model as Universal Retriever in Industrial-Scale Recommender System
- Authors: Junguang Jiang, Yanwen Huang, Bin Liu, Xiaoyu Kong, Xinhang Li, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng
- Affiliations: Taobao & Tmall Group of Alibaba, China. The authors are from a major industrial e-commerce and advertising platform, which lends significant credibility to the paper's claims about industrial-scale application and online performance.
- Journal/Conference: The paper is available on arXiv, a preprint server. This indicates it is a recent work, possibly submitted to a top-tier conference like KDD, SIGIR, or NeurIPS, which are highly reputable venues for recommender systems research.
- Publication Year: 2025 (as listed in the preprint). The first version was submitted in February 2025.
- Abstract: The paper proposes using Large Language Models (LLMs) as a universal retriever in industrial recommender systems, capable of handling multiple, diverse retrieval objectives within a single model. To achieve this, the authors introduce three key techniques: (1) multi-query representation to model complex user-item relationships, (2) matrix decomposition to handle the extremely large candidate set and improve learnability, and (3) probabilistic sampling to ensure efficiency. Their proposed Universal Retrieval Model (URM) can generate a personalized set of items from tens of millions of candidates with low latency. URM is shown to outperform specialized expert models in offline experiments and achieve a significant 3% improvement in a core metric on an online advertising platform.
- Original Source Link:
- arXiv: https://arxiv.org/abs/2502.03041
- PDF: https://arxiv.org/pdf/2502.03041v2.pdf
- Status: Preprint.
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Industrial recommender systems need to satisfy numerous, often conflicting, objectives (e.g., predicting clicks, purchases, recommending novel items, adapting to different scenarios). The standard approach is to build separate, specialized retrieval models for each objective. This "multi-channel retrieval" is time-consuming, expensive to maintain, lacks scalability, and struggles when objectives are not clearly defined or when data for a specific objective is scarce.
- Why It's Important: A unified model would simplify development, reduce operational overhead, and allow for more flexible and dynamic control over recommendation goals. Furthermore, for some online business metrics (like advertising revenue), there is no direct offline training objective. A universal model that can be "prompted" with different goals allows for rapid online experimentation to optimize these metrics directly.
- Fresh Angle: The paper hypothesizes that the versatility and natural language understanding capabilities of Large Language Models (LLMs) can be leveraged to create a single, universal retriever. Instead of engineering model architectures for each task, different objectives can be specified through simple text prompts, unifying all retrieval tasks into a single input-output framework.
- Main Contributions / Findings (What):
- A Novel Framework (URM): The paper introduces the Universal Retrieval Model (URM), which uses an LLM as a backbone to function as a universal retriever. This framework unifies diverse retrieval tasks by conditioning the model on natural language prompts describing the desired objective.
- Multi-Query Representation: To overcome the limited expressiveness of standard generative retrieval (which uses a single vector to represent the user), URM generates multiple query representations from a single forward pass of the LLM. This allows the model to capture different facets of a user's interests and model complex user-item relationships more effectively.
- Scalable Mapping for Large Candidate Sets: To handle the tens of millions of candidate items common in industry, the paper proposes a two-part solution:
- Matrix Decomposition: The large output mapping matrix is decomposed to improve learnability and separate representations for discriminability (for well-known items) and transferability (for new or long-tail items).
- Probabilistic Sampling: An efficient, iterative sampling algorithm is introduced for inference, which approximates the full retrieval process over the entire candidate set with minimal computational cost and low latency.
- Strong Empirical Results: URM is shown to significantly outperform specialized, state-of-the-art models in both public and large-scale industrial datasets. Most impressively, a real-world online A/B test on an advertising platform demonstrated a 3.01% increase in revenue, validating the practical effectiveness of the approach.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Recommender System: A system that predicts a user's interest in an item and suggests relevant items. Industrial systems typically operate in a multi-stage pipeline: retrieval, ranking, and reranking. This paper focuses on the retrieval stage, which is responsible for efficiently selecting a few hundred promising candidates from a massive pool (millions or billions) of items.
- Embedding-Based Retrieval (EBR): A common retrieval technique where users and items are mapped to low-dimensional vectors (embeddings). The similarity between a user's embedding and an item's embedding (often calculated via inner product) determines the recommendation score. While efficient, its ability to model complex relationships is limited.
- Model-Based Retrieval: Uses more complex models (e.g., deep neural networks) than simple inner products to score user-item pairs. These are more expressive but computationally expensive, often requiring special indexing structures (like trees) to be feasible at scale.
- Generative Retrieval: Frames retrieval as a generation task. Instead of scoring candidates, the model directly generates the IDs of the recommended items. This approach aligns well with modern generative models like LLMs.
- Large Language Model (LLM): A large neural network (e.g., GPT, LLaMA) pre-trained on vast amounts of text data. LLMs excel at understanding natural language and can be fine-tuned for various downstream tasks.
- Multi-Task Learning (MTL): A machine learning paradigm where a single model is trained to perform multiple tasks simultaneously. A common challenge in MTL is the seesaw phenomenon, where improving performance on one task leads to a degradation in performance on another.
- Previous Works & Differentiation:
- Traditional Multi-Channel Retrieval: As mentioned, this involves training and deploying separate models for each objective. URM replaces this entire complex system with a single, unified model.
- Traditional Multi-Task Learning Models: Models like `MMoE` and `PLE` were designed to mitigate the seesaw phenomenon in multi-task recommenders. However, they require careful architectural design and become difficult to manage as the number of tasks grows. URM handles tasks via textual prompts, offering greater flexibility and scalability without complex, handcrafted architectures.
- Previous Generative Retrieval (e.g., TIGER): Earlier methods often used semantic IDs (sequences of tokens representing an item) to reduce the vast output space. This requires autoregressive generation (multiple forward passes), which is too slow with large LLMs. Additionally, semantic IDs can struggle with fine-grained item discrimination and cold-start problems. URM avoids autoregressive generation by performing a single forward pass and uses a novel matrix decomposition and sampling scheme to handle the large output space directly.
4. Methodology (Core Technology & Implementation)
The goal of the Universal Retrieval Model (URM) is to use a single model to find the top-k items from a candidate set $\mathcal{I}$ that maximize the conditional probability $P(i \mid u, o)$ for a given user $u$ and any retrieval objective $o$. The model estimates this probability as:

$$P(i \mid u, o) = \mathrm{softmax}\big(W \cdot F(u, o)\big)_i,$$

where $F(u,o)$ is the feature representation produced by the LLM and $W$ is a large mapping matrix over the candidate set.
The methodology is broken down into three key innovations to make this feasible at an industrial scale.
4.1 Representations for Users & Any Objective
This part details how user information and retrieval objectives are fed into the LLM.
- Input Serialization: All user features (demographics, behavioral sequences like clicks and purchases) and the retrieval objective are converted into a single text sequence (a serialization sketch follows the examples below).
- User Description: "The user attributes are as follows: age {AGE}, gender {GENDER}... The user has ... purchased [8274].., clicked [8380]..."
- Retrieval Objectives: "Please retrieve items that the user will click on." or "Please retrieve long-tail items."
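A minimal sketch of this serialization in Python; the field names (`age`, `gender`, `purchased`, `clicked`) and the exact prompt wording are illustrative assumptions patterned on the examples above, not the paper's production format:

```python
def build_prompt(user: dict, objective: str) -> str:
    """Serialize user attributes, behavior sequences, and the objective into one text sequence.

    Item IDs are wrapped in brackets (e.g., [8274]) so the embedding layer
    can map them to special item tokens.
    """
    purchased = ", ".join(f"[{i}]" for i in user["purchased"])
    clicked = ", ".join(f"[{i}]" for i in user["clicked"])
    return (
        f"The user attributes are as follows: age {user['age']}, gender {user['gender']}. "
        f"The user has purchased {purchased}, clicked {clicked}. "
        f"{objective}"
    )

# toy usage
user = {"age": 30, "gender": "female", "purchased": [8274], "clicked": [8380]}
print(build_prompt(user, "Please retrieve items that the user will click on."))
```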
- Embedding Layer: As shown in Figure 1, the input sequence is converted into embeddings (a minimal sketch follows the figure caption below).
  - Text tokens are mapped to embeddings using the LLM's standard vocabulary table.
  - Item IDs (e.g., `[8274]`) are treated as special tokens. Their embeddings are stored in a distributed hash table (for scalability) and projected by an MLP to match the LLM's hidden dimension.
  - These embeddings are summed with position embeddings and fed into the LLM backbone.
Figure 1: Overview of the URM architecture. The input sequence (user description, retrieval objective, and query tokens) passes through an embedding layer (token, item, and position embeddings) into the LLM backbone. The LLM's output hidden features $F(u,o)$ pass through the mapping layer to produce item scores, which drive retrieval in the recommender system. The LLM backbone's parameters are fully fine-tuned.
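To make the embedding step concrete, here is a minimal PyTorch sketch of mixed text-token/item-ID embedding with MLP projection and position embeddings. The class and dimension names are assumptions for exposition, and a plain `nn.Embedding` stands in for the distributed hash table:

```python
import torch
import torch.nn as nn

class MixedEmbedding(nn.Module):
    """Embed a sequence that mixes LLM text tokens and special item-ID tokens."""

    def __init__(self, vocab_size: int, n_items: int, d_item: int, d_model: int, max_len: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)  # LLM vocabulary table
        self.item_emb = nn.Embedding(n_items, d_item)       # stand-in for the distributed hash table
        self.item_mlp = nn.Sequential(                      # project item embeddings to the LLM width
            nn.Linear(d_item, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, ids: torch.Tensor, is_item: torch.Tensor) -> torch.Tensor:
        # ids: (L,) token or item ids; is_item: (L,) bool mask marking item-ID positions.
        # Clamp before lookup so masked-out positions never index out of range.
        tok = self.token_emb(ids.clamp(max=self.token_emb.num_embeddings - 1))
        itm = self.item_mlp(self.item_emb(ids.clamp(max=self.item_emb.num_embeddings - 1)))
        emb = torch.where(is_item.unsqueeze(-1), itm, tok)
        return emb + self.pos_emb(torch.arange(ids.shape[0]))  # add position embeddings

# toy usage: one item-ID position inside a short token sequence
emb = MixedEmbedding(vocab_size=50_000, n_items=1_000_000, d_item=32, d_model=64, max_len=128)
ids = torch.tensor([101, 2054, 8274, 102])
is_item = torch.tensor([False, False, True, False])
print(emb(ids, is_item).shape)  # torch.Size([4, 64])
```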
- Multi-Query Representation: This is a core contribution to enhance model expressiveness (a scoring sketch follows below).
  - Intuition: A single user representation vector may be insufficient to capture a user's diverse interests. Function approximation theory suggests that a combination of multiple basis functions can represent more complex functions.
  - Implementation: $m$ special, learnable query tokens (e.g., `[Q1]`, `[Q2]`, ...) are appended to the input sequence. The LLM processes the entire sequence in a single forward pass and produces $m$ corresponding output hidden states, forming a multi-query representation matrix.
  - Scoring: The score for each candidate item $i$ is the maximum inner product across all query vectors,

    $$s(u, o, i) = \max_{k \in \{1, \dots, m\}} \langle q_k, w_i \rangle,$$

    where $q_k$ is the $k$-th query vector and $w_i$ is item $i$'s row of the mapping matrix. The final probability is computed with a softmax over these max scores. This allows different query vectors to specialize in capturing different aspects of user intent (e.g., one for clothing, another for electronics).
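A minimal sketch of the multi-query max-inner-product scoring in PyTorch; tensor names and shapes are illustrative assumptions:

```python
import torch

def multi_query_scores(queries: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    """Score all candidates against a multi-query user representation.

    queries:  (m, d) -- m query vectors from one LLM forward pass
    item_emb: (N, d) -- candidate item representations
    returns:  (N,)   -- max inner product over the m queries, per item
    """
    all_scores = queries @ item_emb.T    # (m, N): every query vs. every item
    return all_scores.max(dim=0).values  # the "max" in the scoring rule

# toy usage: 4 query tokens, 10 candidates, hidden size 8
q, items = torch.randn(4, 8), torch.randn(10, 8)
probs = torch.softmax(multi_query_scores(q, items), dim=0)  # softmax over max scores
```

Because the maximum is taken per item, each query vector is free to specialize in one facet of user intent without dragging down scores driven by the other facets.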
4.2 Mapping to Large-scale Candidate Sets
This section addresses the challenge that the mapping matrix $W$ is enormous when the candidate set is in the tens of millions, making it hard to learn and computationally expensive.
- Matrix Decomposition: The matrix $W$ is decomposed into two smaller matrices $V$ and $U$:

  $$W = V U,$$

  where $V \in \mathbb{R}^{N \times d}$ holds item representations and $U \in \mathbb{R}^{d \times D}$ projects them into the LLM's hidden space, with $d \ll D$. This reduces the number of parameters and makes learning more tractable.
- Hybrid Item Representations for $V$: To further improve performance, especially for new (unseen) items, the item-side matrix $V$ is composed of two parts (a code sketch follows below):
  - $V_{\mathrm{id}}$ (Discriminability): A fully learnable embedding matrix for items present in the training data. This allows the model to learn fine-grained distinctions between popular items.
  - $V_{\mathrm{trans}}$ (Transferability): A representation for all items (including new ones) generated from their features. Item attributes (title, category, price, etc.) are serialized into text and fed into a general-purpose text embedding model to get a fixed representation. This allows the model to handle cold-start items based on their semantic content. The final item representation is the sum $V = V_{\mathrm{id}} + V_{\mathrm{trans}}$.
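As a rough illustration of the decomposition and the hybrid item matrix, here is a minimal PyTorch sketch; module and dimension names (`n_items`, `d_item`, `d_llm`, `text_emb`) are assumptions for exposition, not the paper's code:

```python
import torch
import torch.nn as nn

class HybridItemMapping(nn.Module):
    """W = V·U with V = V_id + V_trans: learnable IDs plus text-derived features."""

    def __init__(self, n_items: int, d_item: int, d_llm: int, text_emb: torch.Tensor):
        super().__init__()
        self.v_id = nn.Embedding(n_items, d_item)               # V_id: discriminability
        self.register_buffer("text_emb", text_emb)              # frozen text embeddings (n_items, d_text)
        self.text_proj = nn.Linear(text_emb.shape[1], d_item)   # V_trans: transferability
        self.u = nn.Linear(d_llm, d_item, bias=False)           # U: shared low-rank projection

    def item_matrix(self) -> torch.Tensor:
        return self.v_id.weight + self.text_proj(self.text_emb)  # V = V_id + V_trans

    def scores(self, queries: torch.Tensor) -> torch.Tensor:
        q = self.u(queries)                                      # (m, d_item)
        return (q @ self.item_matrix().T).max(dim=0).values      # max over queries, per item

# toy usage: 1000 items with 16-dim text embeddings, LLM hidden size 32
mapping = HybridItemMapping(n_items=1000, d_item=24, d_llm=32, text_emb=torch.randn(1000, 16))
print(mapping.scores(torch.randn(4, 32)).shape)  # torch.Size([1000])
```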
4.3 Approximation of Large Matrix Multiplication
This section tackles the computational cost of training and inference with a massive candidate set.
- Training with NCE Loss: During training, calculating the full softmax over millions of items is infeasible. The paper uses Noise Contrastive Estimation (NCE) loss, which approximates the full softmax by distinguishing the positive sample from a small set of randomly sampled negative examples (a loss sketch follows below).
  - $\mathcal{P}(u, o)$: the set of positive items for user $u$ and objective $o$.
  - $\mathcal{N}$: a set of negative items sampled from the candidate pool.
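A minimal sampled-softmax-style sketch of such a contrastive objective, assuming one positive per example and a small set of sampled negatives (the paper's exact noise distribution and correction terms may differ):

```python
import torch
import torch.nn.functional as F

def sampled_contrastive_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Approximate the full softmax: positive item vs. a small set of sampled negatives.

    pos_scores: (B,)    score of the positive item for each example
    neg_scores: (B, K)  scores of K sampled negative items per example
    """
    logits = torch.cat([pos_scores.unsqueeze(1), neg_scores], dim=1)  # (B, 1+K)
    labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)  # the positive sits at index 0

# toy usage: batch of 8, 64 sampled negatives each
loss = sampled_contrastive_loss(torch.randn(8), torch.randn(8, 64))
```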
- Inference with Probabilistic Sampling (Algorithm 1): During inference, scoring all candidates is also too slow. The paper proposes an iterative sampling method (a runnable sketch follows below).
  - Algorithm Steps:
    1. Index Neighbors: An Approximate Nearest Neighbor (ANN) index is pre-built for the item representations in $V$, so that for any item $i$ we can quickly find its neighbors.
    2. Initialize: Start with an initial random subset of candidates $N^{(0)}$.
    3. Iterate ($T$ times):
       a. Calculate probabilities only for the current candidate subset $N^{(t-1)}$, using a temperature to control the sharpness of the distribution.
       b. Sample items $S^{(t)}$ from this subset based on their calculated probabilities.
       c. Create the next candidate set $N^{(t)}$ by taking the union of the sampled items $S^{(t)}$ and all their pre-indexed neighbors.
    4. Return: The final set of sampled items $S^{(T)}$.
  - Intuition: The algorithm assumes that items that are similar in the embedding space (neighbors) will have similar recommendation scores. By iteratively sampling promising items and exploring their neighborhoods, it can efficiently converge to a high-quality set of recommendations without scoring the entire catalog.
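Here is a minimal self-contained sketch of the iterative sampling loop in NumPy; a precomputed score array and a brute-force neighbor table stand in for the on-the-fly scoring and the real ANN index, and parameter names are illustrative:

```python
import numpy as np

def probabilistic_sampling(scores, neighbors, T=4, n_init=1000, n_samples=200, temperature=1.0, seed=0):
    """Iteratively sample a high-probability item set without scoring the full catalog.

    scores:    (N,)   model scores for all items (stands in for on-the-fly q·v scoring)
    neighbors: (N, k) precomputed neighbor ids per item (stands in for an ANN index)
    """
    rng = np.random.default_rng(seed)
    cand = rng.choice(scores.shape[0], size=n_init, replace=False)  # N^(0): random init
    for _ in range(T):
        logits = scores[cand] / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                                        # softmax over the subset only
        sampled = rng.choice(cand, size=min(n_samples, cand.size), replace=False, p=probs)
        # N^(t) = sampled items plus all their pre-indexed neighbors
        cand = np.unique(np.concatenate([sampled, neighbors[sampled].ravel()]))
    return sampled                                                  # S^(T)

# toy usage: 100k items, 10 random "neighbors" each
rng = np.random.default_rng(1)
scores = rng.normal(size=100_000)
neighbors = rng.integers(0, 100_000, size=(100_000, 10))
print(probabilistic_sampling(scores, neighbors)[:5])
```

Per Table 4 (Section 6.2), roughly $T = 4$ iterations reportedly suffice to recover about 91% of the full-search result.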
5. Experimental Setup
- Datasets:
  - Public Datasets: Four public benchmarks were used: three Amazon review datasets (`Sports & Outdoors`, `Beauty`, `Toys & Games`) plus `Yelp`. These are standard benchmarks for sequential recommendation.
  - Industrial-Scale Dataset: A large dataset from Alibaba's online system, containing hundreds of millions of user-item interactions and a candidate set of tens of millions of items. It covers nine distinct retrieval objectives:
    - `CPR`: Click Prediction Retrieval
    - `RSA`, `RSB`, `RSC`: Retrieval for different Scenarios (A, B, C)
    - `SR`: Serendipity Retrieval (recommending novel categories)
    - `LR`: Long-term Retrieval
    - `LIR`: Long-tail Item Retrieval
    - `PPR`: Purchase Prediction Retrieval
    - `RQ`: Retrieval with a user Query
- Evaluation Metrics:
  - Hit Rate (HR@K):
    - Conceptual Definition: Measures the percentage of test cases where the ground-truth item is found within the top-K recommended items. It measures whether the model can find the correct item at all.
    - Mathematical Formula: $\mathrm{HR@K} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{1}(\mathrm{rank}_u \le K)$
    - Symbol Explanation: $|\mathcal{U}|$ is the total number of users (or test cases), and $\mathbb{1}(\mathrm{rank}_u \le K)$ is an indicator function that is 1 if the rank of the true next item for user $u$ is within the top $K$, and 0 otherwise.
  - Normalized Discounted Cumulative Gain (NDCG@K):
    - Conceptual Definition: A measure of ranking quality that assigns higher scores to recommendations where the ground-truth item is ranked higher in the top-K list. It rewards correct items at the top more than those at the bottom.
    - Mathematical Formula: $\mathrm{NDCG@K} = \frac{\mathrm{DCG@K}}{\mathrm{IDCG@K}}$, with $\mathrm{DCG@K} = \sum_{j=1}^{K} \frac{\mathrm{rel}_j}{\log_2(j+1)}$
    - Symbol Explanation: $\mathrm{rel}_j$ is the relevance of the item at rank $j$ (1 if it is the ground-truth item, 0 otherwise). $\mathrm{DCG@K}$ is the Discounted Cumulative Gain, and $\mathrm{IDCG@K}$ is the ideal DCG, calculated for the perfect ranking and used for normalization.
  - Recall (R@K):
    - Conceptual Definition: Measures the fraction of all relevant items that are successfully retrieved in the top-K list. This is particularly useful when there can be multiple correct items (as in the industrial dataset setup).
    - Mathematical Formula: $\mathrm{R@K} = \frac{|\hat{\mathcal{I}}_K \cap \mathcal{I}^{+}|}{|\mathcal{I}^{+}|}$
    - Symbol Explanation: $\hat{\mathcal{I}}_K$ is the set of top-K predicted items, and $\mathcal{I}^{+}$ is the ground-truth set of relevant items. A small sketch computing these metrics follows below.
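For concreteness, a small self-contained sketch computing HR@K, NDCG@K (single ground-truth item), and Recall@K; the variable names are illustrative:

```python
import math

def hr_at_k(ranked: list, truth, k: int) -> float:
    """1 if the ground-truth item appears in the top-k, else 0."""
    return 1.0 if truth in ranked[:k] else 0.0

def ndcg_at_k(ranked: list, truth, k: int) -> float:
    """Single-ground-truth NDCG: the ideal DCG is 1 (item at rank 1)."""
    for j, item in enumerate(ranked[:k], start=1):
        if item == truth:
            return 1.0 / math.log2(j + 1)
    return 0.0

def recall_at_k(ranked: list, truth_set: set, k: int) -> float:
    """Fraction of all relevant items retrieved in the top-k."""
    return len(set(ranked[:k]) & truth_set) / max(len(truth_set), 1)

# toy usage
print(hr_at_k([3, 7, 9], truth=7, k=2))         # 1.0
print(ndcg_at_k([3, 7, 9], truth=7, k=3))       # 1/log2(3) ~ 0.63
print(recall_at_k([3, 7, 9], {7, 9, 11}, k=3))  # 2/3
```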
- Baselines:
  - Public Datasets: A comprehensive set of baselines was used, including classic sequential models (`GRU4Rec`, `Caser`), attention-based models (`SASRec`, `BERT4Rec`), and recent LLM-based models (`P5`, `E4SRec`, `TIGER`, `IDGenRec`).
  - Industrial Dataset: Strong, widely used industrial models were chosen as baselines: `Two-tower Model`, `Transformer-based Model`, and `Attention-DNN`, including its multi-task variants (`SharedBottom`, `MMoE`, `PLE`). Both single-task (STL) and multi-task (MTL) versions were tested.
6. Results & Analysis
6.1 Core Results
- Public Datasets:
  - The following is a transcription of Table 1, showing URM's performance against baselines on public datasets.

| Methods | Sports HR@5 | Sports NDCG@5 | Beauty HR@5 | Beauty NDCG@5 | Toys HR@5 | Toys NDCG@5 | Yelp HR@5 | Yelp NDCG@5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HGN | 0.0189 | 0.0120 | 0.0325 | 0.0206 | 0.0321 | 0.0221 | 0.0186 | 0.0115 |
| GRU4Rec | 0.0129 | 0.0086 | 0.0164 | 0.0099 | 0.0097 | 0.0059 | 0.0152 | 0.0099 |
| Caser | 0.0116 | 0.0072 | 0.0205 | 0.0131 | 0.0166 | 0.0107 | 0.0151 | 0.0096 |
| BERT4Rec | 0.0115 | 0.0075 | 0.0203 | 0.0124 | 0.0116 | 0.0071 | 0.0051 | 0.0033 |
| FDSA | 0.0182 | 0.0122 | 0.0267 | 0.0163 | 0.0228 | 0.0140 | 0.0158 | 0.0098 |
| SASRec | 0.0233 | 0.0154 | 0.0387 | 0.0249 | 0.0445 | 0.0236 | 0.0162 | 0.0100 |
| S3-Rec | 0.0251 | 0.0161 | 0.0387 | 0.0244 | 0.0443 | 0.0294 | 0.0201 | 0.0123 |
| E4SRec | 0.0281 | 0.0196 | 0.0525 | 0.0360 | 0.0566 | 0.0405 | 0.0266 | 0.0189 |
| P5 | 0.0387 | 0.0312 | 0.0508 | 0.0379 | 0.0648 | 0.0567 | 0.0574 | 0.0403 |
| TIGER | 0.0264 | 0.0181 | 0.0454 | 0.0321 | 0.0521 | 0.0371 | - | - |
| IDGenRec | 0.0429 | 0.0326 | 0.0618 | 0.0486 | 0.0655 | 0.0481 | 0.0468 | 0.0368 |
| COBRA | 0.0305 | 0.0215 | 0.0537 | 0.0395 | 0.0619 | 0.0462 | - | - |
| URM | 0.0733 | 0.0488 | 0.0929 | 0.0671 | 0.0888 | 0.0619 | 0.0724 | 0.0476 |
| RI | +70.9% | +49.7% | +50.3% | +38.1% | +35.6% | +9.2% | +26.1% | +18.1% |

  - Analysis: URM demonstrates a massive performance gain over all other methods, including recent strong LLM-based baselines like `IDGenRec` and `P5`. For instance, on the Sports dataset it achieves a 70.9% relative improvement in `HR@5` over the best baseline, highlighting the effectiveness of the URM framework.
- Industrial Dataset:
  - The following is a transcription of Table 2, showing URM's multi-task performance (R@1000) on the industrial dataset.

| Model | Learning Method | CPR | RSA | RSB | RSC | SR | LR | LIR | PPR | RQ | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Two-tower Model | STL | 0.129 | 0.271 | 0.166 | 0.129 | 0.069 | 0.066 | 0.117 | 0.146 | 0.355 | 0.161 |
| Two-tower Model | MTL | 0.120 | 0.205 | 0.166 | 0.135 | 0.064 | 0.115 | 0.103 | 0.173 | 0.257 | 0.149 |
| Transformer-based Model | STL | 0.198 | 0.409 | 0.293 | 0.208 | 0.104 | 0.115 | 0.213 | 0.143 | 0.593 | 0.253 |
| Transformer-based Model | MTL | 0.192 | 0.390 | 0.319 | 0.221 | 0.076 | 0.218 | 0.207 | 0.401 | 0.744 | 0.308 |
| Attention-DNN | STL | 0.253 | 0.477 | 0.338 | 0.260 | 0.106 | 0.213 | 0.251 | 0.353 | 0.651 | 0.323 |
| Attention-DNN | MTL | 0.238 | 0.456 | 0.375 | 0.277 | 0.062 | 0.336 | 0.265 | 0.478 | 0.671 | 0.351 |
| Attention-DNN | MTL-SharedBottom | 0.243 | 0.442 | 0.376 | 0.270 | 0.072 | 0.337 | 0.224 | 0.505 | 0.745 | 0.357 |
| Attention-DNN | MTL-MMoE | 0.233 | 0.439 | 0.375 | 0.257 | 0.070 | 0.325 | 0.218 | 0.491 | 0.736 | 0.349 |
| Attention-DNN | MTL-PLE | 0.256 | 0.451 | 0.397 | 0.274 | 0.062 | 0.327 | 0.224 | 0.512 | 0.761 | 0.363 |
| URM | MTL | 0.263 | 0.530 | 0.439 | 0.362 | 0.093 | 0.285 | 0.240 | 0.581 | 0.835 | 0.403 |

  - Analysis: URM outperforms all baselines, including the strong `Attention-DNN + PLE` model, achieving an 11.0% relative improvement on the average recall. Unlike other models that suffer from the seesaw phenomenon (e.g., MTL performance is sometimes worse than STL), URM consistently benefits from multi-task training, achieving the best results on 6 out of 9 tasks.
6.2 Ablation Studies
- Multi-Query Representation:
  - Figure 3: Effect of the number of query tokens $m$ on CPR R@1000. As $m$ increases from 1 to 128, the metric rises steadily.
  - Analysis: Figure 3 shows that as the number of query tokens ($m$) increases from 1 to 128, the `CPR R@1000` metric consistently improves. This validates the hypothesis that using multiple query vectors enhances the model's expressive power, allowing it to better capture the diverse set of target items.
- Matrix Decomposition:
  - The following is a transcription of Table 3 (row labels inferred from the analysis below).

| V | All | Unseen |
| --- | --- | --- |
| $V_{\mathrm{id}}$ only | 0.256 | 0.116 |
| $V_{\mathrm{trans}}$ only | 0.152 | 0.101 |
| $V_{\mathrm{id}} + V_{\mathrm{trans}}$ | 0.263 | 0.130 |

  - Analysis: Using only $V_{\mathrm{id}}$ performs well on all items but less so on unseen items. Using only $V_{\mathrm{trans}}$ is weaker overall but provides a baseline for unseen items. Combining them ($V_{\mathrm{id}} + V_{\mathrm{trans}}$) achieves the best performance on both all items and, crucially, unseen items. This confirms that the two components effectively capture discriminability and transferability, respectively.
- Probabilistic Sampling:
  - The following is a transcription of Table 4.

| T | Recall Precision |
| --- | --- |
| 1 | 0.2% |
| 2 | 2.1% |
| 3 | 41.7% |
| 4 | 91.0% |
| 5 | 91.1% |

  - Analysis: This table shows the precision of the sampling algorithm compared to a full search. With just one step ($T=1$), the precision is very low. However, it rapidly improves, reaching 91.0% precision at $T=4$ and saturating thereafter. This demonstrates that the iterative sampling algorithm is highly effective at approximating the full softmax result with a fraction of the computation.
6.3 Universal Retrieval Performance
- Multi-Task Learning:
  - The following is a transcription of Table 5 (a, b).

(a) multi-scenario retrieval

| Objective | RSA R@1000 | RSB R@1000 | RSC R@1000 |
| --- | --- | --- | --- |
| CPR | 0.440 | 0.335 | 0.278 |
| RSA | 0.530 | 0.409 | 0.240 |
| RSB | 0.522 | 0.439 | 0.257 |
| RSC | 0.444 | 0.327 | 0.362 |
| RI | +20.5% | +31.0% | +30.2% |

(b) serendipity retrieval

| Objective | SR R@1000 | Percent of New Category |
| --- | --- | --- |
| CPR | 0.051 | 18.8% |
| SR | 0.093 | 46.2% |
| RI | +82.3% | +145.7% |

  - Analysis: These tables show that URM is highly sensitive to the input text objective. When prompted with a specific scenario objective (e.g., `RSA`), its performance on that scenario's metric improves dramatically (Table 5a). Similarly, when prompted for serendipity (`SR`), it not only improves on the SR metric but also massively increases the proportion of novel categories in its output (Table 5b), confirming its ability to adapt its behavior based on instructions.
- Zero-Shot Learning:
  - The following is a transcription of Table 5(c).

(c) hybrid objectives

| Objective | RQ R@1000 | Percent of Long-tail Items |
| --- | --- | --- |
| RQ | 0.835 | 79.6% |
| LIR | 0.630 | 81.6% |
| RQ × LIR | 0.836 | 82.4% |

  - Figure 4: RQ R@1000 for seen vs. unseen queries as a function of query frequency. Performance rises with the log of query frequency for both. At low frequencies, unseen queries perform slightly below seen queries; at medium frequencies (log2 of query frequency around 8-10), the two are comparable, with unseen queries occasionally slightly higher.
  - Analysis: URM can generalize to objectives it has not been explicitly trained on. By combining the prompts for `RQ` and `LIR`, the model maintains high performance on the query task while simultaneously increasing the proportion of long-tail items (Table 5c). Figure 4 shows that the model performs almost as well on queries it has never seen during training as it does on seen queries, demonstrating strong generalization capabilities.

6.4 Online Results
- The following is a transcription of Table 6.

| Metric | RI |
| --- | --- |
| Revenue | +3.01% |
| CTR | +0.78% |
| CVR | +1.24% |
| #Long-tail Items | +2.23% |

- Analysis: This is the most critical result, demonstrating real-world business impact. A 3.01% increase in advertising revenue is a very significant gain for a large-scale platform. The simultaneous improvements in Click-Through Rate (CTR), Conversion Rate (CVR), and the number of long-tail items recommended show that the model is not just optimizing for revenue at the expense of user experience or fairness, but providing more relevant and diverse recommendations overall.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully demonstrates that LLMs can serve as a powerful and practical Universal Retrieval Model (URM) for industrial-scale recommender systems. By framing diverse retrieval objectives as natural language prompts, URM unifies what were previously disparate tasks into a single, flexible framework. The key technical innovations (multi-query representation, matrix decomposition for discriminability/transferability, and efficient probabilistic sampling) are crucial for making this approach expressive, learnable, and efficient enough for real-world deployment. The strong offline results and, more importantly, the significant online revenue gains provide compelling evidence of the value of this paradigm shift.
- Limitations & Future Work (from the paper):
- Computational Cost: Despite optimizations, using LLMs is still more computationally expensive than traditional models. The online system requires asynchronous processing and data sampling to manage costs, representing a trade-off between performance and efficiency.
- Task Versatility: While URM shows some zero-shot capabilities, it struggles to adapt to objectives that are entirely different from those seen during training. Expanding the diversity of training data (e.g., incorporating more search logs) is suggested as a path to improve generalization to novel prompts.
- Personal Insights & Critique:
- Paradigm Shift: This work represents a significant step towards a more intelligent and flexible recommender system architecture. The idea of "prompting" a retriever for different goals (serendipity, long-tail, specific scenarios) is powerful and moves the field away from rigid, hard-coded model designs towards dynamic, language-instructable systems.
- Methodological Soundness: The three core technical contributions are well-motivated and directly address the primary challenges of using LLMs for retrieval at scale. The multi-query representation is an elegant solution to the expressiveness bottleneck, and the combination of matrix decomposition and probabilistic sampling is a practical approach to the learnability and efficiency problems.
- Potential Unstated Dependencies: The success of the `V_trans` component relies on having a powerful, general-purpose text embedding model. The quality of this external model could be a significant factor in URM's ability to handle cold-start items.
- Future Impact: This paper could inspire a new wave of research into "instruction-tuned" recommender systems. Future work might explore more sophisticated prompting strategies (e.g., chain-of-thought reasoning to determine retrieval objectives), deeper integration of multi-modal information, and more efficient LLM architectures tailored for retrieval tasks. The ability to directly optimize for fuzzy business metrics through online prompt tuning is a particularly impactful contribution for industrial applications.
Similar papers
Recommended via semantic vector search.
Controllable Multi-Interest Framework for Recommendation
This paper introduces ComiRec, a controllable multi-interest framework that captures diverse user interests and balances recommendation accuracy and diversity, outperforming state-of-the-art models on Amazon and Taobao datasets.
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
P5 unifies diverse recommendation tasks as text-to-text problems, using unified pretraining and personalized prompts to enable zero/few-shot prediction, enhancing knowledge transfer and generalization, paving the way for large-scale recommendation models.
IDGenRec: LLM-RecSys Alignment with Textual ID Learning
IDGenRec generates unique, semantically rich textual IDs for items, aligning LLMs with recommendation tasks. By jointly training a textual ID generator and LLM recommender, it surpasses existing sequential recommenders and enables strong zero-shot performance.
SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
SPARC employs RQ-VAE for dynamic multi-interest discretization and end-to-end training, enabling adaptive user interest evolution and a probabilistic soft-search that actively explores novel interests, significantly enhancing recommendation performance and metrics.