IDGenRec: LLM-RecSys Alignment with Textual ID Learning
TL;DR Summary
IDGenRec generates unique, semantically rich textual IDs for items, aligning LLMs with recommendation tasks. By jointly training a textual ID generator and LLM recommender, it surpasses existing sequential recommenders and enables strong zero-shot performance.
Abstract
Generative recommendation based on Large Language Models (LLMs) has transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using concise yet meaningful ID representations. To better align LLMs with recommendation needs, we propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID using human language tokens. This is achieved by training a textual ID generator alongside the LLM-based recommender, enabling seamless integration of personalized recommendations into natural language generation. Notably, as user history is expressed in natural language and decoupled from the original dataset, our approach suggests the potential for a foundational generative recommendation model. Experiments show that our framework consistently surpasses existing models in sequential recommendation under the standard experimental setting. Then, we explore the possibility of training a foundation recommendation model with the proposed method on data collected from 19 different datasets and test its recommendation performance on 6 unseen datasets across different platforms under a completely zero-shot setting. The results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models based on supervised training, showing the potential of the IDGen paradigm serving as the foundation model for generative recommendation. Code and data are open-sourced at https://github.com/agiresearch/IDGenRec.
English Analysis
1. Bibliographic Information
- Title: IDGenRec: LLM-RecSys Alignment with Textual ID Learning
- Authors: Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, and Yongfeng Zhang.
- Affiliations: All authors are affiliated with Rutgers University, New Brunswick, USA.
- Journal/Conference: The paper was published in the Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24). SIGIR is the premier international forum for the presentation of new research results and for the demonstration of new systems and techniques in the broad field of information retrieval, including recommender systems. It is considered a top-tier, highly competitive conference.
- Publication Year: 2024
- Abstract: The paper addresses a key challenge in using Large Language Models (LLMs) for generative recommendation: how to represent items. Traditional methods use numerical IDs, which don't leverage the LLM's semantic understanding. The authors propose IDGenRec, a framework that learns to represent each item with a unique, concise, and semantically rich textual ID (e.g., "pro hair dryer" instead of "item_1234"). This is achieved by training a dedicated textual ID generator alongside the main LLM recommender. This approach allows the entire recommendation process to operate in natural language, better aligning with the LLM's strengths. Experiments show that IDGenRec surpasses existing models in standard sequential recommendation. Furthermore, the authors demonstrate the potential for a foundational recommendation model by training IDGenRec on 19 datasets and testing it in a zero-shot setting on 6 unseen datasets, where it achieves performance comparable to or better than some fully supervised models.
- Original Source Link:
- Official Source: https://doi.org/10.1145/3626772.3657821
- Preprint (arXiv): https://arxiv.org/abs/2403.19021
- The provided PDF link is a version of the preprint.
2. Executive Summary
Background & Motivation (Why):
- Core Problem: Modern recommender systems increasingly use Large Language Models (LLMs) to frame recommendation as a text-generation task. For example, given a user's purchase history in text, an LLM generates the name or ID of the next item to recommend. However, there is a fundamental mismatch: LLMs are trained on human language, but items in a recommender system are typically identified by arbitrary numerical IDs (e.g., 1001, 1002).
- Why It's a Problem: Meaningless numerical IDs prevent the LLM from leveraging its pre-trained knowledge about the items themselves. The model ends up learning co-occurrence patterns among arbitrary IDs (e.g., "if a user bought 1001, they might buy 1008") rather than understanding why a user who bought a "science fiction novel" might also like a "space opera movie". This limits the model's performance and generalizability, and it prevents the creation of a universal, "foundational" recommendation model that works across datasets without retraining (zero-shot recommendation).
- Fresh Angle: The paper argues that for LLMs to excel at recommendation, items must be represented in a language the model understands. The core innovation is to learn a new textual ID for each item, one that is concise, unique, and semantically meaningful.
Main Contributions / Findings (What):
- The IDGenRec Framework: The paper introduces a two-part framework consisting of:
  - An ID Generator: an LLM that reads an item's detailed metadata (title, description, category, etc.) and generates a short, descriptive textual ID (e.g., "chi pro hair dryer").
  - A Base Recommender: another LLM that takes a user's history, now expressed with these textual IDs, and generates the textual ID of the recommended item.
- Diverse and Unique ID Generation: A dedicated algorithm (Diverse ID Generation) ensures that the generated textual IDs are unique for every item in the dataset. This is non-trivial, because items with similar metadata can easily yield identical short IDs.
- Alternate Training Strategy: The ID Generator and Base Recommender are trained in alternating steps. This lets them learn collaboratively: the generator learns to produce IDs that the recommender finds useful, and the recommender learns to make better predictions using these IDs.
- State-of-the-Art Performance: In standard supervised tests on four benchmark datasets, IDGenRec outperforms both traditional sequential recommenders and previous generative models by clear margins.
- Foundational Model Potential: A single IDGenRec model trained on a large, diverse collection of 19 datasets makes competitive recommendations on 6 datasets it has never seen. This zero-shot performance is a major step toward a single, general-purpose recommendation model.
3. Prerequisite Knowledge & Related Work
Foundational Concepts:
- Recommender Systems (RecSys): Software systems that predict a user's interest in an item (e.g., a product, movie, or song) and provide a personalized list of recommendations.
- Sequential Recommendation: A subfield of RecSys that models the chronological order of a user's interactions. The goal is to predict the next item based on the sequence of past items, capturing evolving interests.
- Large Language Models (LLMs): AI models, like T5 or GPT, trained on massive amounts of text data. They develop a deep understanding of grammar, facts, and reasoning, and can be fine-tuned for specific tasks like translation, summarization, or, in this case, recommendation.
- Generative Recommendation: A new paradigm that treats recommendation as a text-to-text generation task. Instead of scoring and ranking a list of candidate items (discriminative approach), it directly generates an identifier for the recommended item as output text.
- T5 (Text-to-Text Transfer Transformer): An influential LLM with an encoder-decoder architecture. It frames every NLP task as a "text-to-text" problem, making it a natural fit for generative recommendation. The encoder processes the input text (e.g., user history), and the decoder generates the output text (e.g., recommended item ID).
- Beam Search: A common algorithm used in text generation. Instead of just picking the single most likely next word at each step, it keeps track of a "beam" of the top k most probable partial sentences and extends them, ultimately choosing the best complete sentence.
- Diverse Beam Search (DBS): A modification of beam search that encourages variety in the generated outputs. The beams are split into groups that are expanded in turn, and a group is penalized for choosing tokens that earlier groups have already chosen at the same step. This is crucial in IDGenRec for finding a unique ID when several plausible options exist.
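To make the group-wise penalty concrete, here is a minimal sketch of diverse beam search with a Hamming-style diversity term over a toy next-token scorer. It is a simplified illustration, not the paper's code: the names diverse_beam_search, score, and penalty are assumptions. In practice, libraries such as Hugging Face transformers expose DBS through the num_beam_groups and diversity_penalty arguments of generate().

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
Beam = Tuple[Tokens, float]                      # (tokens so far, cumulative log-prob)
ScoreFn = Callable[[Tokens], Dict[str, float]]   # prefix -> {next token: log-prob}


def diverse_beam_search(score: ScoreFn, steps: int, groups: int,
                        beams_per_group: int, penalty: float) -> List[Beam]:
    """Simplified DBS with a Hamming diversity penalty.

    At each step the groups are expanded one after another; a candidate token is
    penalized by `penalty` for every beam of an earlier group that picked the same
    token at this step. Returned scores are the unpenalized log-probs.
    """
    group_beams: List[List[Beam]] = [[((), 0.0)] for _ in range(groups)]
    for _ in range(steps):
        used_now: Dict[str, int] = {}            # token -> times chosen by earlier groups
        for g in range(groups):
            candidates = []
            for tokens, logp in group_beams[g]:
                for tok, tok_logp in score(tokens).items():
                    new_logp = logp + tok_logp
                    ranked = new_logp - penalty * used_now.get(tok, 0)
                    candidates.append((ranked, tokens + (tok,), new_logp))
            candidates.sort(key=lambda c: c[0], reverse=True)
            group_beams[g] = [(toks, lp) for _, toks, lp in candidates[:beams_per_group]]
            for toks, _ in group_beams[g]:
                used_now[toks[-1]] = used_now.get(toks[-1], 0) + 1
    return [beam for beams in group_beams for beam in beams]


# Toy scorer that always prefers "hair" > "dryer" > "pro", regardless of prefix.
probs = {"hair": 0.6, "dryer": 0.3, "pro": 0.1}


def toy(prefix: Tokens) -> Dict[str, float]:
    return {t: math.log(p) for t, p in probs.items()}


print([t for t, _ in diverse_beam_search(toy, 1, 2, 1, penalty=0.0)])  # [('hair',), ('hair',)]
print([t for t, _ in diverse_beam_search(toy, 1, 2, 1, penalty=2.0)])  # [('hair',), ('dryer',)]
```

Without the penalty both groups collapse onto the same output; with it, the second group is pushed to its next-best choice, which is exactly the behavior IDGenRec relies on when a first-choice ID is already taken.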
Previous Works:
- P5 and its variants: P5 was a pioneering work in generative recommendation. However, it represented items using assigned numerical tokens (e.g., P5-SID used sequential numbers like 1001, 1002, ...). While it demonstrated the feasibility of the generative approach, it suffered from the semantic gap described above. Follow-up works tried to improve on this with IDs based on collaborative filtering (P5-CID) or item categories (P5-SemID), but these were still rigid and less semantically rich than the learned IDs in IDGenRec.
- Encoder-only Models (UniSRec): Models like UniSRec also use item text metadata to learn generalizable item representations. However, they are discriminative rather than generative: they produce an embedding (a vector) for each item and use it for ranking, instead of generating the item ID as text. UniSRec is a relevant baseline for the zero-shot experiments because it also aims for generalization.
Technological Evolution: The field of recommendation has evolved from traditional methods like collaborative filtering to deep learning models that capture sequential patterns (GRU4Rec, SASRec). The latest step is the integration of powerful pre-trained LLMs. IDGenRec sits at this frontier, proposing a solution to the fundamental problem of how to represent items in an LLM-native way.
Differentiation: The key difference between IDGenRec and prior work is how item IDs are created. P5 and related methods assign fixed, often meaningless IDs to items, whereas IDGenRec learns flexible, semantically rich, natural-language IDs from the item's raw text. This shift from assigning IDs to learning them is the central innovation that lets the model align better with the LLM's capabilities, leading to improved accuracy and generalization.
4. Methodology (Core Technology & Implementation)
The IDGenRec framework is built upon two LLMs, an ID Generator and a Base Recommender, that are trained collaboratively.
Principles: The core principle is to transform the recommendation task into a pure natural language processing problem. By representing items with meaningful textual IDs generated from their metadata, the system can exploit the semantic knowledge of the pre-trained Base Recommender LLM.
Steps & Procedures: The overall workflow is illustrated in Figure 2.
[Figure 2: Schematic of the IDGenRec LLM-based recommender: inputs enter through token and position embeddings, LLMs produce the textual IDs and the base recommendation, and the final recommendation uses constrained decoding.]
- ID Generation (Figure 1): For each item in the dataset, its metadata (title, category, description, etc.) is converted into a single block of plain text. This text is fed into the ID Generator (a T5 model), whose task is to produce a concise textual ID. For example, for a product with a long description, it might generate "kor one hydration vessel".
[Figure 1: Schematic of how the LLM-based ID generator abstracts an item's raw text into a concise, semantically rich textual ID, illustrated with "Item A" and "Item B".]
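Flattening metadata into one text block is simple string formatting. The sketch below assumes a "field: value; field: value" layout, loosely modeled on the Yelp case-study input quoted in the Results section ("name: richards window tinting; categories: ..."); the exact template, the field order, and the helper name item_to_text are hypothetical.

```python
from typing import Mapping, Sequence, Union

FieldValue = Union[str, Sequence[str]]


def item_to_text(metadata: Mapping[str, FieldValue]) -> str:
    """Flatten item metadata into one plain-text block for the ID Generator.

    Hypothetical format: "field: value; field: value". List-valued fields
    (e.g. categories) are joined with commas, and empty fields are skipped.
    """
    parts = []
    for field, value in metadata.items():
        if isinstance(value, str):
            text = value.strip()
        else:
            text = ", ".join(v.strip() for v in value)
        if text:
            parts.append(f"{field}: {text}")
    return "; ".join(parts)


print(item_to_text({
    "name": "richards window tinting",
    "categories": ["home window tinting", "automotive"],
    "description": "",
}))
# name: richards window tinting; categories: home window tinting, automotive
```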
- Diverse ID Generation (Algorithm 1): A critical challenge is ensuring every item gets a unique ID. The paper proposes the following procedure:
  - For an item, use Diverse Beam Search (DBS) to generate k candidate IDs.
  - Check the candidates against the set of IDs already assigned to other items.
  - If at least one candidate is not yet taken, add it to that set and assign it to the item.
  - If all k candidates are duplicates, increase the DBS diversity penalty and try again. This forces the model to generate more varied, less obvious IDs.
  - If the diversity penalty reaches its maximum and all candidates still collide, increase the maximum allowed ID length and reset the penalty. Escalating in this order keeps IDs as short as possible while ensuring that a unique ID is eventually found.
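The escalation logic above can be sketched as a loop around a candidate generator. Here, generate stands in for a DBS call on the ID Generator; its signature, the penalty and length schedules, the defaults, the "first unused candidate wins" rule, and the max_rounds guard are all assumptions, not the paper's settings. toy_generate is a hypothetical stand-in that ignores the penalty, so only the length increase can resolve a collision in this demo.

```python
from typing import Callable, List, Set

# Hypothetical DBS wrapper: (item_text, k, diversity_penalty, max_len) -> k candidate IDs.
CandidateFn = Callable[[str, int, float, int], List[str]]


def assign_unique_id(item_text: str, used_ids: Set[str], generate: CandidateFn,
                     k: int = 5, penalty_step: float = 0.5, penalty_max: float = 2.0,
                     max_len_start: int = 10, max_len_step: int = 1,
                     max_rounds: int = 1000) -> str:
    """Return an ID for item_text that is not in used_ids, and record it there."""
    penalty, max_len = 0.0, max_len_start
    for _ in range(max_rounds):
        for candidate in generate(item_text, k, penalty, max_len):
            if candidate not in used_ids:     # first unused candidate wins
                used_ids.add(candidate)
                return candidate
        if penalty < penalty_max:
            penalty += penalty_step           # all k collided: ask DBS for more diversity
        else:
            penalty = 0.0                     # penalty exhausted: reset it ...
            max_len += max_len_step           # ... and allow a longer ID instead
    raise RuntimeError("no unique ID found within max_rounds")


def toy_generate(item_text: str, k: int, penalty: float, max_len: int) -> List[str]:
    """Stand-in for DBS: the k longest word prefixes that fit in max_len words.
    Unlike real DBS it ignores `penalty`."""
    words = item_text.split()[:max_len]
    shortest = max(1, len(words) - k + 1)
    return [" ".join(words[:n]) for n in range(shortest, len(words) + 1)]


used: Set[str] = set()
print(assign_unique_id("chi pro hair dryer", used, toy_generate, k=1, max_len_start=1))    # chi
print(assign_unique_id("chi pro curling iron", used, toy_generate, k=1, max_len_start=1))  # chi pro
```

Because each assignment depends on the IDs already taken, items are processed one at a time; the set lookup itself is cheap.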
- Prompt Construction: The user's interaction history (e.g., items they bought) is formatted using a prompt template into which the learned textual IDs of these items are inserted. Optionally, a textual user ID can also be generated by feeding all of the user's historical item metadata to the ID Generator. A sample prompt might look like: "User [zeppelin cocktail bars tap] has purchased items [chi pro hair dryer], [kor one hydration vessel]; predict the next possible item to be bought by the user."
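Prompt assembly is plain string templating. The sketch below reproduces the sample prompt from a list of textual item IDs and an optional textual user ID. build_prompt is a hypothetical helper, the "The user" fallback is an assumption, and the paper may use other template wordings.

```python
from typing import Optional, Sequence


def build_prompt(item_ids: Sequence[str], user_id: Optional[str] = None) -> str:
    """Fill a sequential-recommendation template with textual IDs.

    The wording follows the sample prompt; the subject used when no textual
    user ID is available is a hypothetical choice.
    """
    history = ", ".join(f"[{item_id}]" for item_id in item_ids)
    subject = f"User [{user_id}]" if user_id else "The user"
    return (f"{subject} has purchased items {history}; "
            "predict the next possible item to be bought by the user.")


print(build_prompt(["chi pro hair dryer", "kor one hydration vessel"],
                   user_id="zeppelin cocktail bars tap"))
```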
- Recommendation Generation: This complete prompt is fed into the Base Recommender (another T5 model), which autoregressively generates the textual ID of the next item.
- Constrained Decoding: To ensure the model outputs only valid item IDs, a prefix tree (trie) containing all unique item IDs is used during decoding. At each generation step, the model may only choose tokens that extend the current prefix along a path in the trie, so every completed output is an existing item ID.
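A trie makes this constraint cheap to enforce: the valid next tokens are simply the children of the node reached by the current prefix. The sketch below uses word-level tokens and greedy decoding to stay self-contained; a real system would index subword token IDs and plug the same lookup into beam search (for example through the prefix_allowed_tokens_fn hook of Hugging Face generate()). IDTrie, constrained_greedy_decode, and the toy scorer are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

EOS = "</s>"  # assumed end-of-sequence marker; a complete ID must end here


class IDTrie:
    """Prefix tree over tokenized item IDs (word-level tokens for readability)."""

    def __init__(self, ids: Sequence[str]):
        self.root: Dict[str, dict] = {}
        for item_id in ids:
            node = self.root
            for tok in item_id.split() + [EOS]:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []                 # prefix leaves the trie: nothing is valid
            node = node[tok]
        return sorted(node)


def constrained_greedy_decode(trie: IDTrie, score: Callable[[List[str], str], float],
                              max_steps: int = 32) -> List[str]:
    """Greedy decoding in which only trie-valid continuations may be chosen."""
    prefix: List[str] = []
    for _ in range(max_steps):
        allowed = trie.allowed_next(prefix)
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score(prefix, tok))
        if best == EOS:
            break
        prefix.append(best)
    return prefix


trie = IDTrie(["chi pro hair dryer", "chi pro curling iron", "kor one hydration vessel"])
print(trie.allowed_next([]))              # ['chi', 'kor']
print(trie.allowed_next(["chi", "pro"]))  # ['curling', 'hair']
# The scorer prefers "vessel", but it is never a valid next token on this path.
prefs = {"vessel": 5.0, "hair": 3.0, "chi": 2.0}
print(constrained_greedy_decode(trie, lambda prefix, tok: prefs.get(tok, 0.0)))
# ['chi', 'pro', 'hair', 'dryer']
```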
- Alternate Training Strategy: Training the two models jointly is difficult because the ID Generator outputs discrete tokens, and selecting discrete tokens is not differentiable. The authors therefore train the two models in alternation:
  - Train the Base Recommender: Freeze the ID Generator and use it to generate static textual IDs for all items. Train the Base Recommender on the recommendation task with these IDs, using a standard negative log-likelihood loss.
  - Train the ID Generator: Freeze the Base Recommender. To let gradients flow back to the ID Generator, the discrete IDs are replaced by continuous "soft" embeddings:
    - For each item, the ID Generator produces logits (unnormalized scores over the entire vocabulary) for each token position of the ID.
    - These logits are used to compute a "soft", probability-weighted average of the Base Recommender's token embeddings.
    - This continuous, differentiable embedding is inserted into the prompt, and the Base Recommender makes a prediction. The recommendation loss is then backpropagated through this continuous representation to the ID Generator's parameters.
  The two phases are repeated for several iterations, allowing the two models to co-adapt.
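The differentiable bridge in the ID Generator phase amounts to a probability-weighted mixture of embedding rows. The pure-Python sketch below shows the forward computation for one ID position; normalizing the logits with a softmax is an assumption about how the weights are formed, and all function names are hypothetical. In a real implementation this is a single matrix product, softmax(logits) @ E, inside an autodiff framework, which is what lets the recommender's loss reach the ID Generator's logits.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)                                   # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token_embedding(logits: Sequence[float],
                         embedding_table: Sequence[Sequence[float]]) -> Vector:
    """Probability-weighted average of the recommender's token embeddings.

    logits: the ID Generator's scores over the vocabulary for ONE ID position.
    embedding_table[v]: the frozen Base Recommender's embedding of token v.
    """
    probs = softmax(logits)
    dim = len(embedding_table[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding_table)) for d in range(dim)]


def soft_id_embeddings(position_logits: Sequence[Sequence[float]],
                       embedding_table: Sequence[Sequence[float]]) -> List[Vector]:
    """One soft embedding per ID position; these replace the hard token embeddings."""
    return [soft_token_embedding(logits, embedding_table) for logits in position_logits]


table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # toy 3-token vocabulary, 2-dim embeddings
print(soft_token_embedding([0.0, 0.0, 0.0], table))   # uniform weights: about [0.667, 0.667]
print(soft_token_embedding([0.0, 50.0, 0.0], table))  # peaked on token 1: about [0.0, 1.0]
```

When the distribution is sharply peaked, the soft embedding approaches the embedding of the most likely token, so it behaves like the hard ID it stands in for.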
Mathematical Formulas & Key Details:
- ID Generation Probability: The probability of generating a textual ID from an item's metadata is the product of the conditional probabilities of its tokens:
  $$P(\mathbf{s} \mid \mathbf{x}; \phi) = \prod_{j=1}^{|\mathbf{s}|} P(s_j \mid \mathbf{s}_{<j}, \mathbf{x}; \phi)$$
  - $s_j$: The $j$-th token of the generated ID $\mathbf{s}$.
  - $\mathbf{s}_{<j}$: The previously generated tokens of the ID.
  - $\mathbf{x}$: The input sequence of tokens from the item's metadata.
  - $\phi$: The parameters of the ID Generator model.
- Base Recommender Loss: The recommender is trained to predict the target item's textual ID, $\mathbf{y}$. The loss is the sum of negative log-likelihoods for each token in the target ID:
  $$\mathcal{L}_{\text{rec}}(\theta) = -\sum_{t=1}^{|\mathbf{y}|} \log P(y_t \mid \mathbf{y}_{<t}, \mathbf{p}; \theta)$$
  - $\mathcal{L}_{\text{rec}}$: The loss for the Base Recommender.
  - $y_t$: The $t$-th token of the ground-truth target item ID.
  - $\mathbf{y}_{<t}$: The ground-truth prefix of the target ID.
  - $\mathbf{p}$: The input prompt (user history with textual IDs).
  - $\theta$: The parameters of the Base Recommender model.
- ID Generator Loss: The ID Generator is trained with the same recommendation loss, but with gradients flowing back through the frozen Base Recommender:
  $$\mathcal{L}_{\text{ID}}(\phi) = -\sum_{t=1}^{|\mathbf{y}|} \log P(y_t \mid \mathbf{y}_{<t}, \tilde{\mathbf{E}}_{\phi}; \theta)$$
  - $\mathcal{L}_{\text{ID}}$: The loss for the ID Generator.
  - $\tilde{\mathbf{E}}_{\phi}$: A special input embedding created by inserting the "soft" embeddings of the generated IDs into the prompt's embedding. This is the key to making the process differentiable.
  - $\phi$: The parameters of the ID Generator model (which are being updated).
  - $\theta$: The parameters of the Base Recommender model (which are frozen).
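All three formulas reduce to summing per-token log-probabilities: an ID's probability is a product over its tokens, and the recommender loss is the negative log of that product for the ground-truth ID. The sketch below computes the teacher-forced loss from per-step logits and recovers the sequence probability as exp(-loss). The helper names and toy logits are illustrative assumptions; a real model would produce the logits with a T5 decoder. The same function also covers the ID Generator loss, since only the input embeddings differ, not the loss.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                                    # shift for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def sequence_nll(step_logits: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """-sum_t log P(y_t | y_<t, prompt), given the decoder's vocabulary logits at
    each step t computed with the ground-truth prefix y_<t fed in (teacher forcing)."""
    if len(step_logits) != len(target):
        raise ValueError("need one logit vector per target token")
    return -sum(log_softmax(logits)[y] for logits, y in zip(step_logits, target))


# A 3-token target ID over a toy 4-token vocabulary.
logits = [[2.0, 0.5, 0.1, -1.0],
          [0.0, 0.0, 3.0, 0.0],
          [1.0, 1.0, 1.0, 1.0]]
target = [0, 2, 3]
nll = sequence_nll(logits, target)
prob_of_id = math.exp(-nll)   # equals the product of the per-token probabilities
print(round(nll, 4), round(prob_of_id, 4))
```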
5. Experimental Setup
Datasets:
- Standard Evaluation: Four widely used public datasets were used to compare with baselines in a supervised setting. Users and items with fewer than 5 interactions were filtered out.
  - Sports, Beauty, Toys: Subsets of the Amazon review dataset.
  - Yelp: A dataset of business reviews.
- Zero-Shot Evaluation: To test the foundational model capabilities, the model was trained on a large corpus and tested on unseen datasets.
  - Pre-training Dataset (Fusion): A large dataset created by combining 19 different domains from the Amazon review dataset. Larger datasets were downsampled to balance the domains.
  - Test Datasets: Six datasets that were not part of the Fusion training set.
    - Intra-platform (unseen Amazon domains): Sports, Beauty, Toys, Music, Instruments.
    - Cross-platform (different platform): Yelp.
Dataset Statistics (transcribed from Table 3):

| Category | Dataset | # Users | # Items | # Interactions | Density |
|---|---|---|---|---|---|
| Std. Eval. | Sports | 35,598 | 18,357 | 296,337 | 0.0453% |
| | Beauty | 22,363 | 12,101 | 198,502 | 0.0734% |
| | Toys | 19,412 | 11,924 | 167,597 | 0.0724% |
| | Yelp | 30,431 | 20,033 | 316,354 | 0.0519% |
| Pre-training | Fusion | 183,918 | 233,323 | 2,875,446 | 0.0067% |
| Zero-shot | Sports | 35,598 | 18,357 | 296,337 | 0.0453% |
| | Beauty | 22,363 | 12,101 | 198,502 | 0.0734% |
| | Toys | 19,412 | 11,924 | 167,597 | 0.0724% |
| | Music | 5,541 | 3,568 | 64,706 | 0.3273% |
| | Instruments | 1,429 | 900 | 10,261 | 0.7978% |
| | Yelp (Cross Platform) | 30,431 | 20,033 | 316,354 | 0.0519% |

Evaluation Metrics: The paper uses standard ranking metrics, evaluated by ranking the ground-truth item against all other items in the dataset.
- Hit Ratio (HR@k):
  - Conceptual Definition: This metric measures whether the correct item (the "hit") is present in the top-k recommended items. With a single ground-truth item per user, it is a simple measure of recall. An HR@10 of 0.2 means that for 20% of test cases, the correct item was found in the top 10 recommendations.
  - Mathematical Formula:
    $$\mathrm{HR@}k = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{I}(\mathrm{rank}_u \le k)$$
  - Symbol Explanation:
    - $|\mathcal{U}|$: The total number of users in the test set.
    - $\mathbb{I}(\cdot)$: An indicator function that is 1 if the condition inside is true, and 0 otherwise.
    - $\mathrm{rank}_u$: The rank position of the ground-truth item for user $u$.
- Normalized Discounted Cumulative Gain (NDCG@k):
  - Conceptual Definition: NDCG measures the quality of the ranking. It rewards hits based on their position, giving higher scores for items ranked closer to the top. It is "normalized" so that a perfect ranking always scores 1.0, making it comparable across different users.
  - Mathematical Formula:
    $$\mathrm{NDCG@}k = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\mathrm{DCG}_u@k}{\mathrm{IDCG}_u@k}, \qquad \mathrm{DCG}_u@k = \frac{\mathbb{I}(\mathrm{rank}_u \le k)}{\log_2(\mathrm{rank}_u + 1)}$$
  - Symbol Explanation:
    - $\mathrm{DCG}_u@k$: The Discounted Cumulative Gain for user $u$. It gives the full reward of 1 for a hit at rank 1 and progressively smaller rewards for hits at lower ranks.
    - $\mathrm{IDCG}_u@k$: The Ideal DCG, i.e., the DCG of a perfect ranking (the target item at rank 1), which equals 1 when each user has a single ground-truth item. This is used for normalization.
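With one ground-truth item per user, both metrics depend only on that item's rank. The sketch below assumes 1-based ranks and uses None for a ground-truth item that does not appear in the ranked output; hr_at_k and ndcg_at_k are hypothetical names.

```python
import math
from typing import Optional, Sequence

Rank = Optional[int]  # 1-based rank of the ground-truth item; None if it was not ranked


def hr_at_k(ranks: Sequence[Rank], k: int) -> float:
    """Fraction of users whose ground-truth item appears in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def ndcg_at_k(ranks: Sequence[Rank], k: int) -> float:
    """Mean DCG/IDCG; with a single relevant item, IDCG@k = 1/log2(2) = 1."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r is not None and r <= k) / len(ranks)


ranks = [1, 3, 12, None]      # four test users
print(hr_at_k(ranks, 10))     # 0.5   (the first two users are hits)
print(ndcg_at_k(ranks, 10))   # 0.375 ((1 + 1/2) / 4)
```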
Baselines:
- Traditional Sequential Models: GRU4Rec (RNN-based), Caser (CNN-based), HGN (hierarchical gating network), and the self-attention-based SASRec, Bert4Rec, FDSA, and S3Rec.
- Generative Models: P5-SID (sequential numerical IDs), P5-CID (collaborative filtering-based IDs), and P5-SemID (category-based IDs).
- Zero-Shot Baseline: UniSRec (an encoder-only model that also uses item metadata for generalization).
6. Results & Analysis
Core Results (Exp1: Standard Evaluation): The results in Table 4 show that IDGenRec achieves a new state of the art in the standard supervised setting.
Transcribed Table 4: Standard evaluation for single dataset recommendation.

| Dataset | Metric | GRU4Rec | Caser | HGN | SASRec | Bert4Rec | FDSA | S3Rec | P5-SID | P5-CID | P5-SemID | IDGenRec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sports | HR@5 | 0.0129 | 0.0116 | 0.0189 | 0.0233 | 0.0115 | 0.0182 | 0.0251 | 0.0264 | 0.0313 | 0.0274 | 0.0429 |
| | NDCG@5 | 0.0086 | 0.0072 | 0.0120 | 0.0154 | 0.0075 | 0.0122 | 0.0161 | 0.0186 | 0.0224 | 0.0193 | 0.0326 |
| | HR@10 | 0.0204 | 0.0194 | 0.0313 | 0.0350 | 0.0191 | 0.0288 | 0.0385 | 0.0358 | 0.0431 | 0.0406 | 0.0574 |
| | NDCG@10 | 0.0110 | 0.0097 | 0.0159 | 0.0192 | 0.0099 | 0.0156 | 0.0204 | 0.0216 | 0.0262 | 0.0235 | 0.0372 |
| Beauty | HR@5 | 0.0164 | 0.0205 | 0.0325 | 0.0387 | 0.0203 | 0.0267 | 0.0387 | 0.0430 | 0.0489 | 0.0433 | 0.0618 |
| | NDCG@5 | 0.0099 | 0.0131 | 0.0206 | 0.0249 | 0.0124 | 0.0163 | 0.0244 | 0.0288 | 0.0477 | 0.0299 | 0.0486 |
| | HR@10 | 0.0283 | 0.0347 | 0.0512 | 0.0605 | 0.0347 | 0.0407 | 0.0647 | 0.0602 | 0.0680 | 0.0652 | 0.0814 |
| | NDCG@10 | 0.0137 | 0.0176 | 0.0266 | 0.0318 | 0.0170 | 0.0208 | 0.0327 | 0.0368 | 0.0357 | 0.0370 | 0.0541 |
| Toys | HR@5 | 0.0097 | 0.0166 | 0.0321 | 0.0463 | 0.0116 | 0.0228 | 0.0443 | 0.0231 | 0.0215 | 0.0247 | 0.0655 |
| | NDCG@5 | 0.0059 | 0.0107 | 0.0221 | 0.0306 | 0.0071 | 0.0140 | 0.0294 | 0.0159 | 0.0133 | 0.0167 | 0.0481 |
| | HR@10 | 0.0176 | 0.0270 | 0.0497 | 0.0675 | 0.0203 | 0.0381 | 0.0700 | 0.0304 | 0.0327 | 0.0376 | 0.0870 |
| | NDCG@10 | 0.0084 | 0.0141 | 0.0277 | 0.0374 | 0.0099 | 0.0189 | 0.0376 | 0.0183 | 0.0170 | 0.0209 | 0.0551 |
| Yelp | HR@5 | 0.0176 | 0.0150 | 0.0186 | 0.0170 | 0.0051 | 0.0158 | 0.0201 | 0.0346 | 0.0261 | 0.0202 | 0.0468 |
| | NDCG@5 | 0.0110 | 0.0099 | 0.0115 | 0.0110 | 0.0033 | 0.0098 | 0.0123 | 0.0242 | 0.0171 | 0.0131 | 0.0368 |
| | HR@10 | 0.0285 | 0.0263 | 0.0326 | 0.0284 | 0.0090 | 0.0276 | 0.0341 | 0.0486 | 0.0428 | 0.0324 | 0.0578 |
| | NDCG@10 | 0.0145 | 0.0134 | 0.0159 | 0.0147 | 0.0090 | 0.0136 | 0.0168 | 0.0287 | 0.0225 | 0.0170 | 0.0404 |

Analysis: On every dataset and every metric, IDGenRec outperforms the best baseline, usually by a wide margin; the closest case is Beauty NDCG@5, where it scores 0.0486 against 0.0477 for P5-CID. For instance, on the Toys dataset, IDGenRec achieves an HR@5 of 0.0655, a 41.5% improvement over the best baseline, SASRec (0.0463). This suggests that learning semantic textual IDs is a more effective strategy than using pre-assigned numerical or simple keyword-based IDs.
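The 41.5% figure is a relative improvement over the strongest baseline in the same row. The short check below recomputes it from the transcribed Toys HR@5 row of Table 4; the dictionary only copies table values, and the variable names are illustrative.

```python
# Toys / HR@5 baseline values copied from Table 4.
toys_hr5 = {
    "GRU4Rec": 0.0097, "Caser": 0.0166, "HGN": 0.0321, "SASRec": 0.0463,
    "Bert4Rec": 0.0116, "FDSA": 0.0228, "S3Rec": 0.0443, "P5-SID": 0.0231,
    "P5-CID": 0.0215, "P5-SemID": 0.0247,
}
idgenrec = 0.0655

best_name = max(toys_hr5, key=toys_hr5.get)          # strongest baseline in this row
best = toys_hr5[best_name]
improvement_pct = (idgenrec - best) / best * 100     # relative, not absolute, gain
print(best_name, f"{improvement_pct:.1f}%")          # SASRec 41.5%
```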
Ablations / Parameter Sensitivity:
- Alternate Training: Table 5 shows the importance of the alternate training strategy.
Transcribed Table 5: Comparison of training strategies.

| Dataset | Metric | ID-only | Rec-only | Alternate |
|---|---|---|---|---|
| Sports | HR@5 | 0.0102 | 0.0350 | 0.0429 |
| | NDCG@5 | 0.0070 | 0.0271 | 0.0326 |
| | HR@10 | 0.0155 | 0.0461 | 0.0574 |
| | NDCG@10 | 0.0087 | 0.0307 | 0.0372 |
| Beauty | HR@5 | 0.0111 | 0.0601 | 0.0618 |
| | NDCG@5 | 0.0067 | 0.0442 | 0.0486 |
| | HR@10 | 0.0192 | 0.0797 | 0.0814 |
| | NDCG@10 | 0.0093 | 0.0505 | 0.0541 |

Analysis: Training only the ID Generator (ID-only) performs poorly, showing that a good ID generator alone is not enough. Training only the Base Recommender with the initial (but already semantic) IDs (Rec-only) performs well, beating most baselines from Table 4. The Alternate strategy, which lets the two models co-adapt, improves on Rec-only on every metric; the gain is large on Sports (HR@5 from 0.0350 to 0.0429) and modest on Beauty (HR@5 from 0.0601 to 0.0618).
- User ID vs. Item ID: Table 6 explores the contribution of user and item IDs.
Transcribed Table 6: Contribution of User ID and Item ID.

| Dataset | Metric | User ID | Item ID | User & Item ID |
|---|---|---|---|---|
| Sports | HR@5 | 0.0177 | 0.0404 | 0.0429 |
| | NDCG@5 | 0.0118 | 0.0308 | 0.0326 |
| | HR@10 | 0.0300 | 0.0528 | 0.0574 |
| | NDCG@10 | (data incomplete in source) | | |

Analysis: Using only the generated User ID is not sufficient for good recommendations; the sequence of Item IDs is the crucial piece of information. However, adding the User ID on top (User & Item ID) gives a small but consistent improvement on the reported metrics, suggesting that it contributes a useful summary of the user's overall profile.
- Case Studies on ID Generation: Figure 3 (transcribed) shows how the IDs evolve during training.
  - Example 1 (Yelp): name: richards window tinting; categories: ... home window tinting, automotive...
    - Initial ID: richards window tinting categories home
    - Fine-tuned ID: richards window tinting auto glass services
  - Example 2 (Beauty): title: truth by calvin klein for women, eau de parfum spray... description: ...oriental, woody fragrance...
    - Initial ID: truth by calvin klein eau de
    - Fine-tuned ID: truth perfume calvin klein oriental
Analysis: The Initial IDs (from a generic tag generator) are often just the first few words of the title or a generic category. After alternate training, the Fine-tuned IDs become more specific and descriptive: they pick out salient keywords for recommendation (e.g., "auto glass services", the "oriental" fragrance type). This indicates that the ID Generator is learning to create IDs that are useful for the Base Recommender.
Core Results (Exp2: Zero-shot Evaluation): Table 7 reports the model's performance on unseen datasets after pre-training on the Fusion dataset.
Table 7 (zero-shot evaluation) is not transcribed here, because the available transcription duplicated Table 4; the results below are summarized from the paper's text.
Analysis from Text: The paper reports that IDGenRec generally outperforms the strong UniSRec baseline in the zero-shot setting. Most strikingly, on the cross-platform Yelp dataset, IDGenRec achieves a 353.46% improvement over UniSRec. This suggests that because IDGenRec represents items in a shared natural-language space, its knowledge transfers across domains and even across platforms (e.g., from Amazon products to Yelp businesses), a key attribute for a foundational model.
7. Conclusion & Reflections
Conclusion Summary: The paper demonstrates that aligning the item representation with the LLM's native capabilities is key to unlocking the potential of LLMs for recommendation. By proposing IDGenRec, a framework that learns semantically rich textual IDs for items, the authors achieve two significant results. First, they set a new state of the art in standard supervised sequential recommendation. Second, and more importantly, they show that this approach enables strong zero-shot generalization, making a compelling case for the feasibility of a universal, foundational generative recommendation model.
Limitations & Future Work: The paper does not explicitly list its limitations, but some can be inferred:
- Computational Cost: The alternate training of two separate LLM-sized models is computationally intensive and may be prohibitive for many practitioners.
- ID Uniqueness at Scale: The Diverse ID Generation algorithm assigns IDs one item at a time and must check every candidate against all IDs assigned so far. A hash-set lookup keeps each individual check cheap, but as catalogs grow toward billions of items, short IDs collide more often, so more items need repeated DBS runs with higher penalties or longer IDs, and the sequential assignment is hard to parallelize. More scalable solutions for ensuring uniqueness would be needed.
- ID Stability: The IDs are learned and can change during training. This might pose challenges for a production system where item representations need to be stable.
- New Item Cold-Start: While the model can handle new users and items in a zero-shot fashion, the process for generating an ID for a single new item added to a live system is not detailed.
Personal Insights & Critique:
- Elegant Solution: The paper's core premise—that items should be represented in human language for LLMs—is simple, elegant, and powerful. It correctly identifies and solves a fundamental impedance mismatch in current generative recommenders.
- Strong Technical Contribution: The alternate training scheme with the "soft" embedding pass-through is a clever engineering solution to a difficult backpropagation problem, enabling end-to-end optimization of the ID generation process based on the final recommendation quality.
- Path to Foundation Models: The zero-shot results are the most exciting aspect of this work. Traditional recommenders are notoriously data-hungry and domain-specific. By decoupling from platform-specific IDs and learning general patterns of preference from text, IDGenRec provides a credible and promising blueprint for a "GPT-3 of recommender systems."
- Future Directions: This work opens up many avenues. Future research could explore: (1) more efficient methods for ensuring ID uniqueness at extreme scale; (2) applying the IDGenRec concept to other types of LLMs (e.g., decoder-only models like GPT); (3) extending the framework to other recommendation scenarios, such as cross-modal recommendation (e.g., recommending images or music).
Similar papers
Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
This study reveals scaling bottlenecks in semantic ID-based generative recommendation due to limited encoding capacity. Directly using large language models outperforms by up to 20%, challenging assumptions about LLMs’ effectiveness in collaborative filtering and suggesting a pro
Generating Long Semantic IDs in Parallel for Recommendation
The RPG framework generates long, unordered semantic IDs in parallel using multi-token prediction and graph-guided decoding, improving representation capacity and inference efficiency, achieving a 12.6% average NDCG@10 gain over generative baselines.