TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation
TL;DR Summary
TokenRec is introduced as a novel framework for enhancing LLM-based recommendation systems by effectively tokenizing user and item IDs. Featuring the Masked Vector-Quantized Tokenizer and generative retrieval, it captures high-order collaborative knowledge, improving recommendation accuracy, generalization to new/unseen users and items, and inference efficiency.
Abstract
There is a growing interest in utilizing large-scale language models (LLMs) to advance next-generation Recommender Systems (RecSys), driven by their outstanding language understanding and in-context learning capabilities. In this scenario, tokenizing (i.e., indexing) users and items becomes essential for ensuring a seamless alignment of LLMs with recommendations. While several studies have made progress in representing users and items through textual contents or latent representations, challenges remain in efficiently capturing high-order collaborative knowledge into discrete tokens that are compatible with LLMs. Additionally, the majority of existing tokenization approaches often face difficulties in generalizing effectively to new/unseen users or items that were not in the training corpus. To address these challenges, we propose a novel framework called TokenRec, which introduces not only an effective ID tokenization strategy but also an efficient retrieval paradigm for LLM-based recommendations. Specifically, our tokenization strategy, Masked Vector-Quantized (MQ) Tokenizer, involves quantizing the masked user/item representations learned from collaborative filtering into discrete tokens, thus achieving a smooth incorporation of high-order collaborative knowledge and a generalizable tokenization of users and items for LLM-based RecSys. Meanwhile, our generative retrieval paradigm is designed to efficiently recommend top-$K$ items for users, eliminating the need for the time-consuming auto-regressive decoding and beam search processes used by LLMs and thus significantly reducing inference time. Comprehensive experiments validate the effectiveness of the proposed methods, demonstrating that TokenRec outperforms competitive benchmarks, including both traditional recommender systems and emerging LLM-based recommender systems.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
The title of the paper is "TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation".
1.2. Authors
The authors of the paper are:
- Haohao Qu
- Wenqi Fan
- Zihuai Zhao
- Qing Li, Fellow, IEEE
Their affiliations primarily appear to be with The Hong Kong Polytechnic University, based on the author biographies provided.
1.3. Journal/Conference
The paper is published on arXiv.
arXiv is a reputable open-access preprint server for research papers in fields such as physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. While it is not a peer-reviewed journal or conference in itself, it is a widely used platform for researchers to disseminate their work quickly and receive feedback before, or in parallel with, formal peer review. Papers on arXiv are typically considered preprints, but many later undergo peer review and are published in top-tier venues.
1.4. Publication Year
The paper was first posted on 2024-06-15, which indicates a publication year of 2024.
1.5. Abstract
This paper introduces TokenRec, a novel framework designed to enhance Large Language Model (LLM)-based Recommender Systems (RecSys). The core problem addressed is the efficient and generalizable tokenization (indexing) of users and items, particularly in a way that captures high-order collaborative knowledge and is compatible with LLMs, while also overcoming challenges in handling new or unseen users/items and the computational inefficiency of traditional LLM inference. TokenRec proposes a Masked Vector-Quantized (MQ) Tokenizer that quantizes masked user/item representations learned from collaborative filtering into discrete tokens. This approach seamlessly integrates collaborative knowledge and offers generalizable tokenization. Additionally, it features a generative retrieval paradigm that efficiently recommends top-$K$ items by generating item representations and retrieving them from a pool, thereby circumventing the time-consuming auto-regressive decoding and beam search processes typically used by LLMs. Comprehensive experiments on four datasets demonstrate that TokenRec outperforms both traditional and emerging LLM-based RecSys benchmarks, showcasing superior recommendation performance and better generalization to new users and items.
1.6. Original Source Link
- Original Source Link: https://arxiv.org/abs/2406.10450
- PDF Link: https://arxiv.org/pdf/2406.10450v3.pdf
- Publication Status: The paper is available as a preprint on arXiv.
2. Executive Summary
2.1. Background & Motivation
The integration of Large Language Models (LLMs) into Recommender Systems (RecSys) has garnered significant interest due to LLMs' advanced language understanding, reasoning, and in-context learning capabilities. However, several critical challenges hinder the effective and efficient application of LLMs in personalized recommendations:
- ID Tokenization Compatibility: LLMs are primarily designed to process natural language tokens. Representing the vast number of discrete user and item IDs (which can easily number in the billions in real-world systems) as LLM-compatible tokens poses a significant challenge. Assigning a unique token to each user/item (known as Independent Indexing (IID)) leads to an unmanageable vocabulary size for LLMs. While methods like using textual descriptions or continuous embeddings exist, they often struggle to capture the rich, high-order collaborative knowledge inherent in user-item interactions.
- Capturing High-Order Collaborative Knowledge: Traditional Collaborative Filtering (CF) methods, especially those leveraging Graph Neural Networks (GNNs), excel at learning complex, high-order relationships from user-item interaction graphs. The challenge is how to effectively embed this intricate collaborative knowledge into the discrete tokens that LLMs can process, without losing crucial information.
- Generalizability to New/Unseen Users/Items (Cold-Start Problem): Existing tokenization approaches often struggle to generalize to users or items that were not part of the training data. This "cold-start" problem is pervasive in RecSys, as new users and items are constantly introduced. Retraining large LLM-based models for every new entry is computationally prohibitive.
- Inference Efficiency: Many LLM-based RecSys rely on auto-regressive decoding and beam search to generate recommendations as textual outputs (e.g., item titles). These processes are computationally intensive and slow, making them impractical for real-time recommendation scenarios, which demand high efficiency. Furthermore, LLMs can suffer from hallucination (generating non-existent items) and context length limitations when provided with extensive interaction histories.

The paper's entry point is to address these challenges by proposing a novel framework that reimagines ID tokenization and recommendation generation for LLM-based RecSys. The innovative idea is to apply Vector Quantization to GNN-learned collaborative representations of users and items, combined with a generative retrieval approach, to create an efficient, generalizable, and LLM-compatible recommendation system.
2.2. Main Contributions / Findings
The paper makes several significant contributions to the field of LLM-based Recommender Systems:
- Novel ID Tokenization Strategy (Masked Vector-Quantized Tokenizer, MQ-Tokenizer): The paper introduces an effective and generalizable strategy for tokenizing user and item IDs. The MQ-Tokenizer quantizes masked user/item representations, which are initially learned from collaborative filtering (specifically GNNs), into discrete tokens. This approach seamlessly integrates high-order collaborative knowledge into LLM-compatible tokens. It incorporates two novel mechanisms:
  - Masking Operation: Enhances the tokenizer's generalization capability by creating a challenging reconstruction task.
  - K-way Encoder: Performs multi-head feature extraction, paired with a corresponding K-way codebook for robust latent feature quantization.
- Efficient Generative Retrieval Paradigm: TokenRec proposes a generative retrieval mechanism for recommendations. Instead of relying on time-consuming auto-regressive decoding and beam search to generate textual item descriptions, TokenRec generates a generative representation of a user's preference and then efficiently retrieves the top-$K$ items from a pre-computed item pool based on similarity matching. This significantly reduces inference time and mitigates issues like hallucination and context length limitations.
- Enhanced Generalizability to New Users/Items: The proposed MQ-Tokenizer and the overall TokenRec framework demonstrate strong generalization capabilities to new and unseen users and items. By updating only the lightweight GNN component for new entities and keeping the MQ-Tokenizer and LLM backbone frozen, TokenRec effectively addresses the cold-start problem without costly retraining of the entire LLM.
- State-of-the-Art Recommendation Performance: Extensive experiments conducted on four real-world benchmark datasets (Amazon-Beauty, Amazon-Clothing, LastFM, and MovieLens 1M) validate the effectiveness of TokenRec. It consistently outperforms competitive benchmarks, including both traditional recommender systems (e.g., MF, LightGCN) and cutting-edge LLM-based recommender systems (e.g., P5, TIGER, CoLLM).
- Efficiency Gains: The generative retrieval paradigm leads to substantial improvements in inference efficiency, achieving approximately 1000-1400% acceleration compared to existing LLM-based methods.
- Concise Prompts: TokenRec can leverage the collaborative knowledge embedded in user ID tokens to make recommendations with only user ID tokens as input, significantly reducing prompt length and computational cost while circumventing LLM context length limitations.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand TokenRec, it's essential to grasp several fundamental concepts from recommender systems and large language models:
- Recommender Systems (RecSys): Systems designed to predict user preferences and suggest items (e.g., movies, products, music) that users might like. They aim to alleviate information overload by filtering irrelevant content and surfacing personalized recommendations.
- Collaborative Filtering (CF): A widely used technique in RecSys that makes recommendations based on the principle that users who agreed in the past (e.g., liked similar items) will agree in the future. It works by identifying users with similar tastes or items with similar appeal.
- User-Item Interaction Matrix: A sparse matrix where rows represent users, columns represent items, and entries indicate interactions (e.g., ratings, purchases, clicks).
- High-Order Collaborative Knowledge: This refers to complex, indirect relationships between users and items that go beyond direct interactions. For example, "users who like item A also like item B, and users who like item B also like item C, so users who like item A might like item C." This knowledge is crucial for understanding nuanced preferences.
- Matrix Factorization (MF): A classic CF technique that decomposes the user-item interaction matrix into two lower-rank matrices: user latent factor matrix and item latent factor matrix. Each user and item is represented by a low-dimensional dense vector (embedding). The predicted interaction score is typically the dot product of the user and item embeddings.
- Graph Neural Networks (GNNs): A class of neural networks designed to operate on graph-structured data. In RecSys, user-item interactions can be modeled as a bipartite graph (users and items as nodes, interactions as edges). GNNs can propagate information across this graph, effectively capturing high-order collaborative signals by aggregating information from neighbors.
- LightGCN: A simplified yet powerful GNN for recommendations that removes feature transformation and non-linear activation from traditional GNNs, focusing solely on neighborhood aggregation for learning user and item embeddings.
- Large Language Models (LLMs): Deep learning models with billions of parameters, pre-trained on massive amounts of text data. They excel at understanding, generating, and reasoning with human language.
- Tokens: The fundamental units of text that LLMs process. These can be words, subwords, or characters. LLMs operate on a fixed vocabulary of tokens.
- Vocabulary Size: The total number of unique tokens an LLM can understand and generate.
- In-context Learning: The ability of LLMs to learn from examples provided in the prompt without explicit fine-tuning.
- Autoregressive Generation: The process by which LLMs generate text one token at a time, predicting the next token based on all previously generated tokens and the input prompt.
- Beam Search: A search algorithm used in sequence generation (like LLM text generation) to explore multiple possible sequences of tokens, aiming to find the most probable output sequence rather than just the single most likely next token at each step. It's more computationally expensive than greedy decoding.
- Hallucination: A phenomenon where LLMs generate plausible-sounding but factually incorrect or non-existent information. In RecSys, this could mean recommending non-existent item IDs or titles.
- Context Length Limitation: LLMs have a maximum number of tokens they can process in a single input (context window). Long user interaction histories can exceed this limit.
- Vector Quantization (VQ): A technique that maps high-dimensional input vectors to a discrete set of codebook vectors (codewords). It involves learning a codebook, where each entry is a codeword (vector), and then representing an input vector by the index of the closest codeword in the codebook. This effectively converts continuous representations into discrete tokens.
- Codebook: A learned collection of discrete vectors (codewords or embeddings), each associated with a unique index (token).
- Quantization: The process of mapping a continuous input vector to a discrete codeword from the codebook.
- Masking: A technique often used in self-supervised learning where parts of the input are intentionally hidden or corrupted, and the model is trained to reconstruct the original input. This forces the model to learn robust and comprehensive representations.
- Metric Learning: A machine learning paradigm focused on learning a distance metric or similarity function from data. In RecSys, it's used to learn representations where similar items (or user preferences) are close in the embedding space, and dissimilar ones are far apart.
- Projection Layer: A neural network layer (often an MLP) used to transform representations from one vector space to another, usually to align different modalities or dimensions.
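To make the Vector Quantization concept above concrete, here is a minimal NumPy sketch of nearest-codeword lookup; the array names (`codebook`, `z`) and sizes are illustrative, not taken from the paper.

```python
import numpy as np

def quantize(z: np.ndarray, codebook: np.ndarray) -> int:
    """Return the index of the codeword closest to z (Euclidean distance)."""
    distances = np.linalg.norm(codebook - z, axis=1)  # distance to every codeword
    return int(np.argmin(distances))                  # discrete token (codeword index)

# Toy example: a codebook with 4 codewords of dimension 3.
codebook = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
z = np.array([0.9, 0.1, 0.0])
print(quantize(z, codebook))  # -> 1, i.e. the second codeword is nearest
```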
3.2. Previous Works
The paper contextualizes its contributions against a backdrop of traditional and LLM-based recommender systems:
- Traditional Collaborative Filtering (CF):
- Matrix Factorization (MF) [5]: A foundational CF method that decomposes user-item interaction matrices into low-rank user and item embeddings. It represents users and items with unique IDs.
- NeuCF [39]: The first deep neural network (DNN)-based model for CF, combining MF with neural networks to learn user and item embeddings.
- LightGCN [6], GTN [7], LTGNN [34]: Representative GNN-based CF methods that capture high-order collaborative knowledge by modeling user-item interactions as graphs. They learn user and item representations through message passing on the interaction graph. These methods provide the initial collaborative representations that TokenRec quantizes.
- Sequential Recommendation Methods:
- SASRec [40]: An attention-based model for sequential recommendations, focusing on a user's recent interactions to predict the next item.
- BERT4Rec [41]: A bidirectional Transformer-based recommender that leverages the masked language model objective from BERT to predict masked items in a user's interaction sequence.
- S³Rec [42], CoSeRec [43]: Sequential recommendation models employing self-supervised learning techniques, often using contrastive learning, to learn robust sequence representations.
- LLM-based Recommender Systems:
- Independent Indexing (IID): A naive approach where each user and item is assigned a unique token. The paper highlights its impracticality due to vocabulary explosion in large-scale systems.
- Textual Title Indexing [11], [13]: Uses item titles and descriptions to represent items, leveraging LLMs' in-vocabulary tokens. While avoiding vocabulary explosion, it may not capture collaborative knowledge effectively.
- P5 [12]: A pioneering framework that unifies diverse recommendation tasks (e.g., rating prediction, sequential recommendation, explanation generation) into a text-to-text generation paradigm using prompt-based pre-training. It uses positional and whole-word embeddings for users/items. The paper mentions its variants P5-RID (Random Indexing) and P5-SID (Sequential Indexing).
- POD [45]: Another LLM-based approach applying positional and whole-word embeddings.
- CID (Collaborative Indexing) [44]: A P5 variant that attempts to capture co-occurrence frequency for item indexing, showing that integrating collaborative knowledge can improve performance over random or sequential indexing.
- TIGER [46]: Uses residual vector quantization to condense textual data into a few semantic IDs for items, which are then used as tokens in a Sequence-to-Sequence Transformer for sequential recommendation. TIGER-G is a variant that incorporates graph-based collaborative knowledge.
- CoLLM [16], LlaRA [15], E4SRec [62]: These methods borrow the concept of soft prompts or use exogenous tokens with continuous embeddings to represent users and items, integrating collaborative embeddings into LLMs. The paper points out that continuous representations can hinder tight alignment with LLMs, whose token processing is inherently discrete.
- META ID [63]: Suggests integrating collaborative knowledge into discrete tokens via clustering item/user representations from skip-gram models, but the paper argues it lacks a robust tokenizer for quantization.
3.3. Technological Evolution
The field of recommender systems has evolved from basic statistical methods to complex deep learning models:
- Early CF (e.g., User-based/Item-based CF): Relied on direct similarity between users or items.
- Matrix Factorization (MF): Introduced latent factors, allowing for more nuanced representations and better scalability.
- Neural Collaborative Filtering (NCF): Integrated deep learning into CF, moving beyond simple dot products to capture non-linear relationships.
- Graph Neural Networks (GNNs) for CF: Leveraged the graph structure of user-item interactions to capture complex, multi-hop (high-order) collaborative relationships, significantly improving representation learning.
- Sequential Recommendation Models: Focused on the temporal order of user interactions, using architectures like RNNs, LSTMs, and Transformers (e.g., SASRec, BERT4Rec) to model dynamic preferences.
- Large Language Models (LLMs) for RecSys: The latest frontier, attempting to harness the powerful language understanding and reasoning abilities of LLMs. This involves rephrasing recommendation as a language task (e.g., text-to-text generation, prompt-based recommendation).
This paper's work fits into the LLM-based RecSys era. It addresses the critical bottlenecks of ID tokenization and inference efficiency, which are major challenges in making LLM-based RecSys practical and performant for real-world scenarios.
3.4. Differentiation Analysis
TokenRec differentiates itself from previous works, especially other LLM-based methods, in several key aspects:
- Tokenization Strategy:
  - Unlike IID (vocabulary explosion) or textual title indexing (limited collaborative knowledge), TokenRec explicitly integrates high-order collaborative knowledge derived from GNNs into discrete, LLM-compatible tokens via a Masked Vector-Quantized (MQ) Tokenizer.
  - Compared to methods using continuous embeddings (e.g., CoLLM, LlaRA), TokenRec generates discrete tokens, addressing the potential misalignment between continuous representations and LLMs' inherently discrete token processing.
  - While TIGER also uses vector quantization to derive semantic IDs from textual data, TokenRec's MQ-Tokenizer is specifically designed to incorporate graph-based collaborative knowledge and includes novel masking and K-way encoder mechanisms for enhanced robustness and generalization, which are not present in TIGER's approach. TokenRec focuses on learning representations directly from collaborative signals rather than just condensing textual information.
  - Unlike META ID, which suggests clustering, TokenRec provides a robust, learnable MQ-Tokenizer with specific design choices (masking, K-way encoder) for effective quantization.
- Recommendation Paradigm: TokenRec adopts a generative retrieval paradigm instead of the auto-regressive decoding and beam search common in P5, CID, POD, and TIGER. This is a fundamental shift that significantly improves inference efficiency and mitigates hallucination and context length limitations. It generates a user's preference representation and then retrieves items, rather than generating item tokens sequentially.
- Generalizability: TokenRec explicitly addresses the cold-start problem for new/unseen users and items. By leveraging a lightweight GNN to learn representations for new entities and keeping the MQ-Tokenizer and LLM backbone frozen, it achieves robust generalization without costly retraining. Other LLM-based methods often experience significant performance drops for unseen users.
- Efficiency: The generative retrieval design makes TokenRec substantially more efficient at inference time compared to methods relying on auto-regressive generation.
4. Methodology
The TokenRec framework proposes a novel approach to integrate Large Language Models (LLMs) into recommender systems by addressing the core challenges of ID tokenization and efficient recommendation generation. It consists of two main modules: the Masked Vector-Quantized (MQ) Tokenizer for users and items, and a Generative Retrieval paradigm for recommendations.
4.1. Notations and Definitions
Let $\mathcal{U}$ be the set of users and $\mathcal{I}$ be the set of items. For a given user $u \in \mathcal{U}$, $\mathcal{I}_u^+$ denotes the set of items that user $u$ has interacted with in their history.
Users and items are embedded into low-dimensional latent vectors, referred to as collaborative representations, denoted as $\mathbf{x}_u \in \mathbb{R}^{d}$ for user $u$ and $\mathbf{x}_i \in \mathbb{R}^{d}$ for item $i$, where $d$ is the dimension of these vectors.
The traditional collaborative filtering (CF) goal is reformulated into a language model paradigm. Given an LLM, a textual prompt $\mathcal{P}$, user tokens $\mathbf{t}_u$, and tokens $\{\mathbf{t}_i : i \in \mathcal{I}_u^+\}$ for interacted items, the LLM aims to generate a representation of items that user $u$ might like:
$$\hat{\mathbf{e}}_u = \mathrm{LLM}\big(\mathcal{P}, \, \mathbf{t}_u, \, \{\mathbf{t}_i : i \in \mathcal{I}_u^+\}\big).$$
Here, $\mathrm{LLM}(\cdot)$ represents the large language model's processing function, $\mathcal{P}$ is the prompt, $\mathbf{t}_u$ are the tokens for user $u$, and $\mathbf{t}_i$ are the tokens for items in user $u$'s interaction history. The interacted items are placed in a non-sequential way to align with CF settings.
4.2. An Overview of the Proposed Framework
The overall framework of TokenRec consists of two main modules, as illustrated in Figure 6 (Figure 2 in the paper):
- Masked Vector-Quantized (MQ) Tokenizer for Users and Items: This module addresses the ID tokenization challenge. It learns specific codebooks and represents users and items with a list of special discrete tokens through encoder and decoder networks. This process aims to seamlessly integrate numerical IDs (users & items) into a natural-language-compatible form, incorporating high-order collaborative knowledge.
- Generative Retrieval for Recommendations: This module focuses on user modeling via an LLM for personalized recommendations. It employs a generative retrieval paradigm to efficiently generate item representations and retrieve the $K$-nearest items from the entire item set, producing a personalized top-$K$ recommendation list.
The overall framework of the proposed TokenRec, which consists of the masked vector-quantized tokenizer with a $K$-way encoder for item ID tokenization and the generative retrieval paradigm for recommendation generation. Note that the item MQ-Tokenizer is detailed while the user MQ-Tokenizer is omitted for simplicity.
The figure is a schematic of the TokenRec framework, covering the Masked Vector-Quantized Tokenizer and the retrieval mechanism for generating recommendations. The left side depicts how high-order collaborative knowledge is extracted via a GNN and how the K-way encoder produces token representations for users and items; the right side shows how the LLM generates recommendations from the user and item tokens, with the generated representation serving as input for the matching score.
4.3. Masked Vector-Quantized Tokenizers for Users and Items
Instead of assigning a specific token to each user and item, which would lead to an explosion in vocabulary size, TokenRec proposes a novel tokenization strategy using vector quantization. This method represents each user and item with a set of discrete indices (tokens). To capture high-order collaborative knowledge, vector quantization is applied to well-trained representations learned from advanced Graph Neural Networks (GNNs). To enhance generalization and overcome noise inherent in cascading tokenization processes, a Masked Vector-Quantized Tokenizer (MQ-Tokenizer) is introduced.
The MQ-Tokenizer comprises three key components:
- A masking operation on the input user/item representations.
- A K-way encoder for multi-head feature extraction, with a corresponding K-way codebook for latent feature quantization.
- A K-to-1 decoder that reconstructs the input representations from the quantized features.

It's important to note that separate MQ-Tokenizers are designed for users (User MQ-Tokenizer) and items (Item MQ-Tokenizer), but they share the same architecture. For simplicity, the paper details the Item MQ-Tokenizer.
4.3.1. Collaborative Knowledge
The primary goal of the MQ-Tokenizer is to embed high-order collaborative knowledge into latent representations through vector quantization. Collaborative knowledge, often derived from user-item interactions, reveals deep behavioral similarities and is critical for accurate recommendations. Graph Neural Networks (GNNs) are highly effective at capturing these high-order collaborative signals on user-item interaction graphs.
The MQ-Tokenizer quantizes these GNN-based collaborative representations into a small number of discrete tokens for each user and item. This means that users/items that are conceptually close in the collaborative latent space (i.e., have similar GNN-learned representations) will likely share similar tokens/indices, thereby aligning LLMs with recommendations by representing users and items with discrete, collaboratively-informed tokens.
4.3.2. Masking Operation
To create a more robust tokenizer with improved generalization capabilities, a masking operation is applied to the collaborative representations. This masking strategy forces the tokenizer to learn a more comprehensive understanding of the representations by reconstructing partially hidden inputs.
An element-wise masking strategy is introduced, with each mask entry sampled from a Bernoulli distribution:
$$\mathbf{m} \sim \mathrm{Bernoulli}(1 - \rho),$$
where $\rho$ is the masking ratio. The Bernoulli distribution assigns a value of 1 (keep) with probability $1 - \rho$ and 0 (mask) with probability $\rho$.
Given the original collaborative representations $\mathbf{x}_u$ for user $u$ and $\mathbf{x}_i$ for item $i$, the masking process is applied as:
$$\tilde{\mathbf{x}}_u = \mathbf{m}_u \odot \mathbf{x}_u, \qquad \tilde{\mathbf{x}}_i = \mathbf{m}_i \odot \mathbf{x}_i,$$
where $\odot$ denotes element-wise multiplication. Here, $\tilde{\mathbf{x}}_u$ and $\tilde{\mathbf{x}}_i$ are the masked representations. The mask is randomly regenerated at each training epoch, creating diverse samples to enhance the tokenizer's generalization.
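A minimal PyTorch-style sketch of this element-wise Bernoulli masking (illustrative only; the tensor names and shapes are assumptions, not the paper's code):

```python
import torch

def mask_representation(x: torch.Tensor, rho: float) -> torch.Tensor:
    """Element-wise masking: each entry is kept with probability 1 - rho.

    x:   collaborative representation, shape (batch, d)
    rho: masking ratio in [0, 1]
    """
    keep = torch.bernoulli(torch.full_like(x, 1.0 - rho))  # 1 = keep, 0 = mask
    return x * keep

# The mask is re-sampled on every call, so invoking this once per training
# epoch yields a fresh mask, as described above.
x = torch.randn(4, 64)          # e.g., 4 item representations of dimension 64
x_masked = mask_representation(x, rho=0.2)
```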
4.3.3. K-way Encoder and Codebook
The masked collaborative representations are then tokenized using a novel K-way vector quantization framework. This involves:
- K-way Encoder: A set of $K$ different encoders, denoted $\{\mathcal{E}^1, \ldots, \mathcal{E}^K\}$, processes the masked item representation $\tilde{\mathbf{x}}_i$ (or user representation $\tilde{\mathbf{x}}_u$) to generate $K$ corresponding latent vectors:
$$\mathbf{z}_i^k = \mathcal{E}^k(\tilde{\mathbf{x}}_i), \quad k = 1, \ldots, K,$$
where $\mathbf{z}_i^k \in \mathbb{R}^{d_z}$ is the $k$-th latent vector for item $i$, and $d_z$ is its dimension. Each encoder can be implemented as a Multilayer Perceptron (MLP) with three hidden layers. The use of $K$ different encoders allows for multiple perspectives or "attentions" to uncover different patterns in the input, improving generalization.
- K-way Codebook: A learnable codebook $\mathcal{C} = \{\mathcal{C}^1, \ldots, \mathcal{C}^K\}$ is developed for items (and similarly for users). Each $\mathcal{C}^k$ represents a sub-codebook associated with the $k$-th encoder. Each sub-codebook contains $L$ codewords (token embeddings), $\mathcal{C}^k = \{\mathbf{c}_l^k\}_{l=1}^{L}$, where $\mathbf{c}_l^k$ is the $l$-th codeword (embedding) in the $k$-th sub-codebook.
- Quantization: The encoded vectors are quantized into discrete tokens (indices) by finding the nearest neighbor in their respective sub-codebooks. For each encoded vector $\mathbf{z}_i^k$ and its corresponding sub-codebook $\mathcal{C}^k$, the token is found by minimizing the Euclidean distance:
$$t_i^k = \arg\min_{l \in \{1, \ldots, L\}} \left\| \mathbf{z}_i^k - \mathbf{c}_l^k \right\|_2.$$
Here, $t_i^k$ is the index (codeword/ID token) of the nearest neighbor in the $k$-th sub-codebook for item $i$. Thus, the discrete ID of item $i$ is tokenized into $K$ discrete codebook tokens $\left[t_i^1, \ldots, t_i^K\right]$ and their corresponding codeword embeddings $\left[\mathbf{c}_{t_i^1}^1, \ldots, \mathbf{c}_{t_i^K}^K\right]$. This process is also applied to tokenize users.
4.3.4. K-to-1 Decoder
After the $K$-way encoder and quantization, a K-to-1 decoder is used for input reconstruction. It takes the $K$ different embeddings corresponding to the selected tokens from the $K$-way codebook and reconstructs the original input representation.
Specifically, for item $i$ and its quantized tokens $[t_i^1, \ldots, t_i^K]$, the decoder $\mathcal{D}$ first performs average pooling on their embeddings and then passes the result through a three-layer MLP to generate the reconstructed representation $\hat{\mathbf{x}}_i$:
$$\hat{\mathbf{x}}_i = \mathcal{D}\left(\frac{1}{K} \sum_{k=1}^{K} \mathbf{c}_{t_i^k}^k\right).$$
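The following PyTorch sketch illustrates the K-way encoding, nearest-codeword quantization, and K-to-1 decoding described above. It is a simplified, assumed implementation: the class names, the two-layer MLP encoders, and the dimensions are illustrative rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class KWayQuantizer(nn.Module):
    def __init__(self, d_in: int, d_latent: int, K: int, L: int):
        super().__init__()
        # K independent encoders (each a small MLP), one per sub-codebook.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_latent), nn.ReLU(),
                          nn.Linear(d_latent, d_latent))
            for _ in range(K)
        )
        # K sub-codebooks, each holding L learnable codewords of size d_latent.
        self.codebooks = nn.Parameter(torch.randn(K, L, d_latent))

    def forward(self, x_masked: torch.Tensor):
        """Return K discrete tokens and their codeword embeddings for a batch."""
        tokens, embeddings = [], []
        for k, encoder in enumerate(self.encoders):
            z_k = encoder(x_masked)                        # (batch, d_latent)
            dists = torch.cdist(z_k, self.codebooks[k])    # (batch, L)
            idx = dists.argmin(dim=-1)                     # nearest codeword index
            tokens.append(idx)
            embeddings.append(self.codebooks[k][idx])      # (batch, d_latent)
        return torch.stack(tokens, dim=1), torch.stack(embeddings, dim=1)

class KTo1Decoder(nn.Module):
    """Reconstruct the original representation from the K selected codewords."""
    def __init__(self, d_latent: int, d_out: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_latent, d_latent), nn.ReLU(),
            nn.Linear(d_latent, d_latent), nn.ReLU(),
            nn.Linear(d_latent, d_out),
        )

    def forward(self, codeword_embs: torch.Tensor) -> torch.Tensor:
        pooled = codeword_embs.mean(dim=1)   # average pooling over the K codewords
        return self.mlp(pooled)              # reconstructed representation

quantizer = KWayQuantizer(d_in=64, d_latent=32, K=4, L=256)
decoder = KTo1Decoder(d_latent=32, d_out=64)
tokens, embs = quantizer(torch.randn(8, 64))   # 8 masked item representations
x_rec = decoder(embs)                          # (8, 64) reconstructed representations
```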
4.3.5. Learning Objective
To train the K-way encoder, codebook, and K-to-1 decoder for both user and item MQ-Tokenizers, a combined learning objective is defined:
- Reconstruction Loss ($\mathcal{L}_{recon}$): This loss encourages the reconstructed representation $\hat{\mathbf{x}}_i$ from the decoder to approximate the original GNN-learned representation $\mathbf{x}_i$:
$$\mathcal{L}_{recon} = \left\| \hat{\mathbf{x}}_i - \mathbf{x}_i \right\|_2^2.$$
A challenge here is that the $\arg\min$ operation in Equation (5) is non-differentiable. To address this, a straight-through gradient estimator is used, which directly passes the gradients of the decoder inputs (selected token embeddings) to the encoder outputs (encoded representations) during backpropagation.
- Codebook Loss ($\mathcal{L}_{code}$): This loss helps update the item's K-way codebook by pulling the selected token's embedding closer to the output of the K-way encoder:
$$\mathcal{L}_{code} = \sum_{k=1}^{K} \left\| \mathrm{sg}\left[\mathbf{z}_i^k\right] - \mathbf{c}_{t_i^k}^k \right\|_2^2.$$
A stop-gradient operator $\mathrm{sg}[\cdot]$ is used to ensure that gradients only flow to the codebook and not to the encoder during this part of the loss calculation; it sets the gradient of its argument to zero during backpropagation.
- Commitment Loss ($\mathcal{L}_{commit}$): This loss prevents the encoded features from fluctuating too frequently between different codewords, ensuring a smooth gradient flow around the $\arg\min$ operation. Unlike the codebook loss, it only applies to the encoder weights:
$$\mathcal{L}_{commit} = \sum_{k=1}^{K} \left\| \mathbf{z}_i^k - \mathrm{sg}\left[\mathbf{c}_{t_i^k}^k\right] \right\|_2^2.$$
The overall optimization objective for the item MQ-Tokenizer is a weighted sum of these losses:
$$\mathcal{L}_{item} = \mathcal{L}_{recon} + \mathcal{L}_{code} + \beta \, \mathcal{L}_{commit},$$
where $\beta$ is a hyper-parameter balancing the commitment loss. A similar objective $\mathcal{L}_{user}$ is defined for the user MQ-Tokenizer.
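These three losses mirror the standard VQ-VAE objective, which can be sketched as follows; the helper name `mq_tokenizer_loss` and the weighting variable `beta` are assumptions used only for illustration.

```python
import torch
import torch.nn.functional as F

def mq_tokenizer_loss(x, z, codewords, decoder, beta: float = 0.25):
    """VQ-VAE-style objective for one (masked) representation batch.

    x:         original collaborative representation, (batch, d)
    z:         K-way encoder outputs, (batch, K, d_latent)
    codewords: selected codeword embeddings, (batch, K, d_latent)
    decoder:   K-to-1 decoder mapping (batch, K, d_latent) -> (batch, d)
    """
    # Straight-through estimator: the forward pass uses the codewords, but
    # gradients flow back to the encoder outputs z as if they were unchanged.
    quantized = z + (codewords - z).detach()

    recon_loss = F.mse_loss(decoder(quantized), x)        # reconstruction
    codebook_loss = F.mse_loss(codewords, z.detach())     # move codewords toward sg[z]
    commitment_loss = F.mse_loss(z, codewords.detach())   # keep z close to sg[codewords]
    return recon_loss + codebook_loss + beta * commitment_loss
```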
4.4. Generative Retrieval for Recommendations
This subsection describes how TokenRec leverages LLMs for recommendations using a generative retrieval paradigm. This involves tokenization & prompting, user modeling via LLM, and generative retrieval.
4.4.1. Tokenization & Prompts
- Tokenization: The MQ-Tokenizers generate out-of-vocabulary (OOV) tokens for user and item IDs. This is crucial because standard LLM vocabularies (e.g., 32,000 tokens for LLaMA) are far too small for millions or billions of users/items. By using $K \times L$ OOV tokens (where $K$ is the number of sub-codebooks and $L$ is the number of tokens per sub-codebook), TokenRec can efficiently tokenize a massive number of users/items; for instance, a small set of OOV tokens can tokenize all 39,387 items in the Clothing dataset. Textual content within prompts (if any) is handled by the LLM's native tokenizer (e.g., SentencePiece).
- Prompts: Prompts guide the LLM. TokenRec designs prompts that use the OOV tokens generated by the MQ-Tokenizers to represent users and items.
  - Prompt 1 (User ID Only): the prompt contains only the user's ID tokens.
  - Prompt 2 (with User's Historical Interactions): the prompt additionally includes the ID tokens of the user's historically interacted items. Here, each OOV token corresponds to one sub-codebook index; e.g., the user token from the second sub-codebook might be the 21st token in that sub-codebook for user_03. Item tokens are denoted similarly. The interactions in $\mathcal{I}_u^+$ are randomly shuffled.
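As a purely illustrative example of how such prompts might be assembled from the OOV tokens, consider the snippet below; the token spelling (`<u_k-l>`, `<i_k-l>`) and the template wording are hypothetical and not the paper's exact templates.

```python
def user_tokens(user_codes):
    """Render a user's K codebook indices as OOV tokens, e.g. <u_2-21>."""
    return " ".join(f"<u_{k}-{l}>" for k, l in enumerate(user_codes, start=1))

def item_tokens(item_codes):
    return " ".join(f"<i_{k}-{l}>" for k, l in enumerate(item_codes, start=1))

# Prompt 1 (user ID only) and Prompt 2 (with shuffled interaction history).
user = user_tokens([5, 21, 7])
history = [item_tokens(c) for c in ([12, 3, 88], [40, 1, 9])]
prompt_1 = f"What items would user {user} be interested in?"
prompt_2 = f"User {user} has interacted with items {', '.join(history)}. What else would they like?"
print(prompt_2)
```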
4.4.2. User Modeling via LLM
This component aims to capture user preferences and generate representations of items a user might like.
The input $X_u$ for user $u$ is formed by a prompt template and the corresponding ID tokens:
$$X_u = \big[\mathcal{P}; \, \mathbf{t}_u; \, \{\mathbf{t}_i : i \in \mathcal{I}_u^+\}\big],$$
where $\mathbf{t}_u$ are the ID tokens from the user MQ-Tokenizer for $u$, and $\mathbf{t}_i$ are the ID tokens from the item MQ-Tokenizer for $u$'s interacted items.
Unlike conventional text-to-text generation (e.g., P5), where LLMs auto-regressively generate item tokens, TokenRec passes $X_u$ through an LLM backbone, denoted $f_{\mathrm{LLM}}$, to generate a hidden representation $\mathbf{h}_u$:
$$\mathbf{h}_u = f_{\mathrm{LLM}}(X_u).$$
This $\mathbf{h}_u$ represents user $u$'s generative preferences for the next items. The LLM4Rec module acts as a powerful query encoder, leveraging LLM capabilities in understanding diverse prompts, interpreting preferences, and generating desired outcomes beyond just text.
4.4.3. Generative Retrieval
This paradigm aims to overcome the inefficiencies and limitations (hallucination, context length) of auto-regressive generation.
The hidden state $\mathbf{h}_u$ from $f_{\mathrm{LLM}}$ is projected into a latent representation $\hat{\mathbf{e}}_u$ via a projection layer $f_{\mathrm{proj}}$:
$$\hat{\mathbf{e}}_u = f_{\mathrm{proj}}(\mathbf{h}_u),$$
where $f_{\mathrm{proj}}$ can be a three-layer MLP. This $\hat{\mathbf{e}}_u$ is the generative representation of the next recommended items for user $u$.
Subsequently, TokenRec retrieves the $K$-nearest items from the entire item set $\mathcal{I}$. This is done by calculating similarity scores between $\hat{\mathbf{e}}_u$ and the GNN-based collaborative representations of all items (stored in a vector database).
The predicted similarity score for user $u$ towards item $i$ is calculated using cosine similarity:
$$\hat{y}_{u,i} = \cos\big(\hat{\mathbf{e}}_u, \mathbf{x}_i\big) = \frac{\hat{\mathbf{e}}_u^{\top} \mathbf{x}_i}{\|\hat{\mathbf{e}}_u\| \, \|\mathbf{x}_i\|}.$$
The top-$K$ items with the highest scores are recommended. This approach offers:
- Efficiency: Bypasses time-consuming auto-regressive decoding and beam search.
- Accuracy: Avoids hallucination by retrieving from a valid item pool.
- Generalizability: Unseen items can be retrieved by simply updating the item representations pool without retraining the LLM.
- Alignment: This two-tower-like structure facilitates alignment between textual query information and collaborative knowledge.
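A hedged sketch of this retrieval step, i.e., cosine similarity between the generative user representation and a bank of GNN item representations followed by top-K selection; the names `user_repr` and `item_bank` are illustrative.

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(user_repr: torch.Tensor, item_bank: torch.Tensor, k: int = 20):
    """user_repr: (d,) generative representation; item_bank: (num_items, d) item embeddings."""
    scores = F.cosine_similarity(user_repr.unsqueeze(0), item_bank, dim=-1)  # (num_items,)
    top_scores, top_items = torch.topk(scores, k)
    return top_items, top_scores

item_bank = F.normalize(torch.randn(1000, 64), dim=-1)   # stand-in for the item vector database
user_repr = torch.randn(64)
items, scores = retrieve_top_k(user_repr, item_bank, k=20)
```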
4.5. TokenRec's Training and Inference
4.5.1. Training
TokenRec uses a two-step training process to handle the gap between quantization and language processing:
- Step 1. Training the User & Item MQ-Tokenizers: The MQ-Tokenizers for users and items are trained independently to quantize collaborative representations, using the combined losses $\mathcal{L}_{user}$ and $\mathcal{L}_{item}$ (Equations (13) and (14) in the paper).
- Step 2. Tuning the LLM4Rec for Generative Retrieval: After training, the MQ-Tokenizers are frozen. The LLM backbone (e.g., T5), the LLM token embeddings, and the projection layer are then tuned. The objective is to learn to generate user preference representations $\hat{\mathbf{e}}_u$ that are close to positive items and far from negative items in the GNN-learned item representation space $\{\mathbf{x}_i\}$. This is achieved using a pairwise ranking loss:
$$\mathcal{L}_{rank} = \sum_{(u, i^+)} \Big( 1 - s\big(\hat{\mathbf{e}}_u, \mathbf{x}_{i^+}\big) \Big) + \sum_{(u, i^-)} \max\Big(0, \; s\big(\hat{\mathbf{e}}_u, \mathbf{x}_{i^-}\big) - s\big(\hat{\mathbf{e}}_u, \mathbf{x}_{i^+}\big) + \gamma \Big),$$
where:
  - $\hat{\mathbf{e}}_u$: generative item representation of user $u$.
  - $\mathbf{x}_i$: collaborative representation of item $i$.
  - $s(\cdot, \cdot)$: a similarity metric, typically cosine similarity.
  - $(u, i^+)$: a positive pair (user $u$ has interacted with item $i^+$). The loss minimizes $1 - s(\hat{\mathbf{e}}_u, \mathbf{x}_{i^+})$, maximizing similarity.
  - $(u, i^-)$: a negative pair (user $u$ has not interacted with item $i^-$). The loss ensures the similarity between the user's preference and a negative item is at least $\gamma$ less than that of a positive item.
  - $\gamma$: the margin value for negative pairs.
During tuning, a 1:1 negative sampling ratio is used, selecting one random un-interacted item as a negative sample for each positive sample.
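A minimal sketch of such a margin-based pairwise ranking objective with cosine similarity and 1:1 negative sampling; the exact formulation in the paper may differ, and the tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(user_repr, pos_item, neg_item, margin: float = 0.1):
    """Pull the generated user representation toward the positive item and
    push it at least `margin` further away from the sampled negative item."""
    pos_sim = F.cosine_similarity(user_repr, pos_item, dim=-1)
    neg_sim = F.cosine_similarity(user_repr, neg_item, dim=-1)
    positive_term = 1.0 - pos_sim                       # maximize similarity to positives
    negative_term = F.relu(neg_sim - pos_sim + margin)  # hinge on the margin for negatives
    return (positive_term + negative_term).mean()

loss = pairwise_ranking_loss(torch.randn(32, 64), torch.randn(32, 64), torch.randn(32, 64))
```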
4.5.2. Inference
The inference process of TokenRec capitalizes on the generative retrieval framework, offering several advantages over traditional LLM-based approaches:
- Efficient Recommendations: By generating a generative item representation $\hat{\mathbf{e}}_u$ and then performing similarity-based retrieval from a pre-computed item pool, TokenRec bypasses the computationally expensive auto-regressive decoding and beam search processes. This significantly reduces inference costs and enables real-time recommendation.
- Generalizability to New Users and Items: TokenRec can effectively handle cold-start scenarios. When new users or items are introduced, only the lightweight GNN model needs to be updated to learn their collaborative representations and refresh the vector database; the MQ-Tokenizers and the LLM backbone remain frozen. This is due to the robustness provided by the masking and K-way encoder mechanisms in the MQ-Tokenizer. Retraining the GNN is far more efficient than fine-tuning the LLM.
TokenRec's efficiency and generalization capability for new users and items during the inference stage: rather than retraining the MQ-Tokenizers and LLM backbone, which can be computationally expensive and time-consuming, only the GNN needs to be updated to learn representations for new users and items.
The accompanying figure illustrates how the TokenRec framework generalizes to new users and items by updating the vector database: the user pool and item pool are refreshed, the GNN and MQ-Tokenizer produce the corresponding user and item representations, and the top-$K$ items are then recommended.
- Concise Prompts: TokenRec can make recommendations using only user ID tokens (Prompt 1 in Section 4.4.1), without needing to include the user's historical interactions in the prompt. This is possible because collaborative knowledge is already embedded within the user ID tokens via the MQ-Tokenizer. This significantly reduces prompt length, saves computing resources, and helps circumvent the context length limitations of many LLMs.
5. Experimental Setup
5.1. Datasets
The experiments were conducted on four widely used real-world benchmark datasets to evaluate the effectiveness of TokenRec:
- Amazon-Beauty (Beauty): E-commerce user-item interactions related to beauty products from amazon.com.
- Amazon-Clothing (Clothing): E-commerce user-item interactions related to clothing products from amazon.com.
- LastFM: Music artist listening records from users on the Last.fm online music system.
- MovieLens 1M (ML1M): A collection of movie ratings made by MovieLens users.

The basic statistics of these datasets are provided in Table I. The maximum item sequence length was set to 100 to accommodate the input length of the T5 LLM backbone (512 tokens). For training, validation, and testing, a leave-one-out policy was used, where all but the last observation in a user's interaction history form the training set. Users' interaction histories were randomly shuffled to align with collaborative filtering methods (i.e., neglecting sequential patterns).
The following are the results from Table I of the original paper:
| Datasets | #Users | #Items | #Interactions | Density (%) |
| --- | --- | --- | --- | --- |
| LastFM | 1,090 | 3,646 | 37,080 | 0.9330 |
| ML1M | 6,040 | 3,416 | 447,294 | 2.1679 |
| Beauty | 22,363 | 12,101 | 197,861 | 0.0731 |
| Clothing | 23,033 | 39,387 | 278,641 | 0.0307 |
5.1.1. Data Sample Example
While the paper does not provide a concrete example of a data sample (e.g., an actual user interaction entry), we can infer from the dataset descriptions:
- LastFM: A data sample would look like (user_ID, artist_ID), indicating a user listened to a specific artist.
- ML1M: A data sample would be (user_ID, movie_ID, rating), indicating a user rated a movie.
- Amazon-Beauty/Clothing: A data sample would typically be (user_ID, product_ID), indicating a user purchased or interacted with a product.

These datasets are effective for validating the method's performance because they represent diverse recommendation scenarios (music, movies, e-commerce) and vary in scale and density, allowing for a robust evaluation of TokenRec's capabilities.
5.2. Evaluation Metrics
The quality of recommendation results is evaluated using two widely adopted metrics: Hit Ratio at K (HR@K) and Normalized Discounted Cumulative Gain at K (NDCG@K). Higher values for both metrics indicate better recommendation performance. The average metrics over all users in the test set are reported. The value of $K$ is set to 10, 20, and 30, with $K=20$ being the default for ablation studies.
5.2.1. Hit Ratio at K (HR@K)
- Conceptual Definition: HR@K measures the recall of the recommendation list. It quantifies how often the target item (the one the user actually interacted with in the test set) appears within the top $K$ recommended items. If the target item is present in the top $K$ recommendations, it is considered a "hit" for that user. HR@K is the proportion of users for whom a hit occurred. It is a simple, intuitive measure of whether the model successfully suggested any relevant item.
- Mathematical Formula:
$$\mathrm{HR@}K = \frac{\#\{\text{users whose target item appears in their top-}K\text{ recommendations}\}}{\#\{\text{users in the test set}\}}.$$
- Symbol Explanation:
  - Numerator: the count of users for whom the ground-truth item (the item the user interacted with in the test set) is found within the list of the top $K$ items suggested by the recommender system.
  - Denominator: the total number of users in the test set for whom recommendations are being generated.
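Under the leave-one-out protocol (one held-out target item per user), HR@K can be computed as in the small sketch below; function and variable names are illustrative.

```python
def hit_ratio_at_k(ranked_items_per_user, target_item_per_user, k: int = 20) -> float:
    """Fraction of users whose held-out target item appears in their top-k list."""
    hits = sum(
        1 for user, ranked in ranked_items_per_user.items()
        if target_item_per_user[user] in ranked[:k]
    )
    return hits / len(ranked_items_per_user)

ranked = {"u1": [3, 7, 9, 2], "u2": [5, 1, 8, 4]}
targets = {"u1": 9, "u2": 6}
print(hit_ratio_at_k(ranked, targets, k=3))  # -> 0.5 (only u1's target is in the top 3)
```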
5.2.2. Normalized Discounted Cumulative Gain at K (NDCG@K)
- Conceptual Definition: NDCG@K is a measure of ranking quality that accounts for both the relevance of recommended items and their position in the list. It assigns higher scores if more relevant items appear at higher ranks (closer to the top of the list). It normalizes the Discounted Cumulative Gain (DCG) by the Ideal DCG (IDCG), which is the DCG of a perfectly ordered list of relevant items, making it comparable across different queries.
- Mathematical Formula: First, Cumulative Gain (CG):
$$\mathrm{CG@}K = \sum_{p=1}^{K} rel_p.$$
Then, Discounted Cumulative Gain (DCG):
$$\mathrm{DCG@}K = \sum_{p=1}^{K} \frac{rel_p}{\log_2(p+1)}.$$
Finally, Normalized Discounted Cumulative Gain (NDCG):
$$\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}.$$
- Symbol Explanation:
  - $K$: the number of top recommendations being considered.
  - $p$: the rank (position) of an item in the recommendation list, from 1 to $K$.
  - $rel_p$: the relevance score of the item at rank $p$. For implicit feedback (like in this paper, where interactions are binary), $rel_p$ is typically 1 if the item at rank $p$ is the target item, and 0 otherwise. For explicit feedback (e.g., ratings), $rel_p$ would be the rating score.
  - $\mathrm{DCG@}K$: the Discounted Cumulative Gain for the top $K$ recommendations. It accumulates relevance scores, penalizing items that appear lower in the list by dividing by the logarithm of their rank.
  - $\mathrm{IDCG@}K$: the Ideal Discounted Cumulative Gain, which is the maximum possible DCG for the top $K$ items if the recommendation list were perfectly ordered by relevance. This serves as a normalization factor.
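With binary relevance and a single target item per user, NDCG@K reduces to a rank-discounted hit, as this small illustrative sketch shows.

```python
import math

def ndcg_at_k(ranked_items, target_item, k: int = 20) -> float:
    """Binary relevance: DCG = 1/log2(rank+1) if the target is within the top-k, else 0.
    With a single relevant item, the ideal DCG is 1/log2(2) = 1, so NDCG equals DCG."""
    top_k = ranked_items[:k]
    if target_item not in top_k:
        return 0.0
    rank = top_k.index(target_item) + 1       # 1-based position
    return 1.0 / math.log2(rank + 1)

print(ndcg_at_k([3, 9, 7], target_item=9, k=3))  # rank 2 -> 1/log2(3) ≈ 0.63
```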
5.3. Baselines
TokenRec is compared against a comprehensive set of baselines, including traditional, sequential, and other LLM-based recommender systems:
- Collaborative Filtering (CF) Methods:
- MF [38]: The classic Matrix Factorization method.
- NCF [39]: Neural Collaborative Filtering, a DNN-based CF model.
- LightGCN [6]: A simplified Graph Convolutional Network (GCN) for recommendations.
- GTN [7]: Graph Trend Filtering Network, a GNN-based collaborative filtering variant designed to capture more reliable collaborative signals from potentially noisy interactions.
- LTGNN [34]: Linear-time Graph Neural Networks for scalable recommendations.
- Sequential Recommendation Methods:
- SASRec [40]: Self-Attentive Sequential Recommendation.
- BERT4Rec [41]: Bidirectional Encoder Representations from Transformers for Recommendation.
- S³Rec [42]: Self-Supervised Learning for Sequential Recommendation.
- CoSeRec [43]: Contrastive Self-Supervised Sequential Recommendation.
- LLM-based Recommendation Methods:
- P5-RID (Random Indexing) [12]: P5 framework with randomly assigned item IDs.
- P5-SID (Sequential Indexing) [12]: P5 framework with item IDs indexed sequentially.
- CID [44]: Collaborative Indexing, a P5 variant incorporating co-occurrence frequencies.
- POD [45]: Prompt distillation for efficient LLM-based recommendation.
- TIGER [46]: Recommender systems with generative retrieval using semantic IDs.
- TIGER-G [46]: TIGER variant incorporating graph-based collaborative knowledge.
- CoLLM [16]: Integrating collaborative embeddings into Large Language Models for recommendation.
These baselines are representative of the state-of-the-art in various recommendation paradigms, including traditional methods, sequence-aware models, and recent LLM-integrated approaches. They collectively provide a strong comparative context for TokenRec's performance.
5.4. Hyper-parameter Settings
The experimental setup involved specific hyper-parameter choices:
- Implementation: The model is implemented using Hugging Face (likely for the LLM backbone) and PyTorch.
- MQ-Tokenizer Hyper-parameters:
  - Codebook number $K$ (number of sub-encoders/sub-codebooks): searched over a set of candidate values.
  - Token number $L$ in each sub-codebook: searched over a set of candidate values.
  - Masking ratio $\rho$: searched in the range $\{0.0, 0.1, \ldots, 1.0\}$.
- LLM4Rec Fine-tuning Hyper-parameters:
  - Ratio of negative sampling $\lambda$: fixed at 1:1, meaning one randomly selected un-interacted item (negative sample) for each positive user-item interaction.
  - Margin $\gamma$ in Equation (20): set in the range of 0 to 0.2.
- Optimization:
  - Optimizer: AdamW [47] is used for both the MQ-Tokenizers and the LLM backbone.
  - Batch size: 128.
  - Maximum training epochs: 100.
- Collaborative Representations: The initial high-order collaborative representations for users and items are obtained from LightGCN [6].
- LLM Backbone: A widely used lightweight LLM, T5-small [37], is employed for TokenRec and all LLM-based baselines to ensure a fair comparison.
- Prompting: 11 prompt templates were designed for TokenRec: 10 seen prompts and 1 unseen prompt for evaluation.
- Computational Resources: All experiments were conducted on a single NVIDIA A800 GPU (80 GB).
- Baseline Hyper-parameters: Other default hyper-parameters for baseline methods were set as suggested by their respective papers.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate TokenRec's strong performance across various datasets and evaluation scenarios, outperforming both traditional and LLM-based recommender systems.
6.1.1. Overall Recommendation Performance
The following are the results from Table II of the original paper:

| Model | LastFM HR@10 | LastFM HR@20 | LastFM HR@30 | LastFM NG@10 | LastFM NG@20 | LastFM NG@30 | ML1M HR@10 | ML1M HR@20 | ML1M HR@30 | ML1M NG@10 | ML1M NG@20 | ML1M NG@30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT4Rec | 0.0319 | 0.0461 | 0.0640 | 0.0128 | 0.0234 | 0.0244 | 0.0779 | 0.1255 | 0.1736 | 0.0533 | 0.0486 | 0.0595 |
| SASRec | 0.0345 | 0.0484 | 0.0658 | 0.0142 | 0.0236 | 0.0248 | 0.0785 | 0.1293 | 0.1739 | 0.0367 | 0.052 | 0.0622 |
| S³Rec | 0.0385 | 0.0490 | 0.0689 | 0.0177 | 0.0266 | 0.0266 | 0.0867 | 0.1270 | 0.1811 | 0.0361 | 0.0501 | 0.0601 |
| CoSeRec | 0.0388 | 0.0504 | 0.0720 | 0.0180 | 0.0268 | 0.0278 | 0.0795 | 0.1316 | 0.1804 | 0.0375 | 0.0529 | 0.0652 |
| MF | 0.0239 | 0.0450 | 0.0569 | 0.0114 | 0.0166 | 0.0192 | 0.078 | 0.1272 | 0.1733 | 0.0357 | 0.0503 | 0.0591 |
| NCF | 0.0321 | 0.0462 | 0.0643 | 0.0141 | 0.0252 | 0.0254 | 0.0786 | 0.1273 | 0.1738 | 0.0363 | 0.0504 | 0.0601 |
| LightGCN | 0.0385 | 0.0661 | 0.0982 | 0.0199 | 0.0269 | 0.0336 | 0.0877 | 0.1288 | 0.1813 | 0.0374 | 0.0509 | 0.0604 |
| GTN | 0.0394 | 0.0688 | 0.0963 | 0.0199 | 0.0273 | 0.0331 | 0.0883 | 0.1307 | 0.1826 | 0.0378 | 0.0512 | 0.0677 |
| LTGNN | 0.0471 | 0.076 | 0.0925 | 0.0234 | 0.0318 | 0.0354 | 0.0915 | 0.1387 | 0.1817 | 0.0419 | 0.0570 | 0.0659 |
| P5-RID | 0.0312 | 0.0523 | 0.0706 | 0.0144 | 0.0199 | 0.0238 | 0.0867 | 0.1248 | 0.1811 | 0.0381 | 0.0486 | 0.0662 |
| P5-SID | 0.0375 | 0.0536 | 0.0851 | 0.0224 | 0.0255 | 0.0261 | 0.0892 | 0.1380 | 0.1784 | 0.0422 | 0.0550 | 0.0641 |
| CID | 0.0381 | 0.0552 | 0.0870 | 0.0229 | 0.0260 | 0.0277 | 0.0901 | 0.1294 | 0.1863 | 0.0379 | 0.0525 | 0.0706 |
| POD | 0.0367 | 0.0572 | 0.0747 | 0.0184 | 0.0220 | 0.0273 | 0.0886 | 0.1277 | 0.1846 | 0.0373 | 0.0487 | 0.0668 |
| TIGER | 0.0467 | 0.0749 | 0.0984 | 0.0226 | 0.0306 | 0.0348 | 0.0901 | 0.1382 | 0.1803 | 0.0427 | 0.0562 | 0.0653 |
| TIGER-G | 0.0470 | 0.0767 | 0.0997 | 0.0229 | 0.031 | 0.0355 | 0.0905 | 0.1409 | 0.1824 | 0.0423 | 0.0565 | 0.0651 |
| CoLLM | 0.0483 | 0.0786 | 0.1017 | 0.0234 | 0.0319 | 0.0366 | 0.0923 | 0.1499 | 0.1998 | 0.0456 | 0.0620 | 0.0719 |
| TokenRec (User ID Only) | 0.0505 | 0.0881 | 0.1128 | 0.0251 | 0.0345 | 0.0397 | 0.0964 | 0.1546 | 0.2043 | 0.0493 | 0.0640 | 0.0745 |
| TokenRec (Unseen Prompt) | 0.0514 | 0.0917 | 0.1294 | 0.0252 | 0.0343 | 0.0422 | 0.1012 | 0.1672 | 0.2144 | 0.0532 | 0.0698 | 0.0798 |
| TokenRec | 0.0532 | 0.0936 | 0.1248 | 0.0247 | 0.0348 | 0.0415 | 0.1008 | 0.1677 | 0.2149 | 0.0528 | 0.0697 | 0.0797 |

The following are the results from Table III of the original paper:

| Model | Beauty HR@10 | Beauty HR@20 | Beauty HR@50 | Beauty NG@10 | Beauty NG@20 | Beauty NG@30 | Clothing HR@10 | Clothing HR@20 | Clothing HR@50 | Clothing NG@10 | Clothing NG@20 | Clothing NG@30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT4Rec | 0.0329 | 0.0464 | 0.0637 | 0.0162 | 0.0025 | 0.0255 | 0.0135 | 0.0271 | 0.0248 | 0.0061 | 0.0074 | 0.0079 |
| SASRec | 0.0338 | 0.0472 | 0.0637 | 0.0170 | 0.0213 | 0.0260 | 0.0136 | 0.0221 | 0.0256 | 0.0063 | 0.0076 | 0.0081 |
| S³Rec | 0.0351 | 0.0471 | 0.0664 | 0.0169 | 0.0237 | 0.0278 | 0.0140 | 0.0213 | 0.0256 | 0.0069 | 0.0081 | 0.0086 |
| CoSeRec | 0.0362 | 0.0476 | 0.0680 | 0.0176 | 0.0248 | 0.0280 | 0.0139 | 0.0211 | 0.0251 | 0.0068 | 0.0080 | 0.0085 |
| MF | 0.0127 | 0.0195 | 0.0245 | 0.0063 | 0.0081 | 0.0091 | 0.0116 | 0.0175 | 0.0234 | 0.0074 | 0.0088 | 0.0101 |
| NCF | 0.0315 | 0.0462 | 0.0623 | 0.0160 | 0.0196 | 0.0237 | 0.0119 | 0.0178 | 0.024 | 0.0072 | 0.0090 | 0.0103 |
| LightGCN | 0.0344 | 0.0498 | 0.0630 | 0.0194 | 0.0233 | 0.0261 | 0.0157 | 0.0226 | 0.0279 | 0.0085 | 0.0103 | 0.0114 |
| GTN | 0.0345 | 0.0502 | 0.0635 | 0.0198 | 0.0241 | 0.0268 | 0.0158 | 0.0226 | 0.0282 | 0.0084 | 0.0103 | 0.0111 |
| LTGNN | 0.0385 | 0.0564 | 0.0719 | 0.0207 | 0.0252 | 0.0285 | 0.0155 | 0.0218 | 0.0272 | 0.0082 | 0.0110 | 0.0116 |
| P5-RID | 0.0330 | 0.0511 | 0.0651 | 0.0146 | 0.0200 | 0.0144 | 0.0148 | 0.0225 | 0.0263 | 0.0071 | 0.0086 | 0.0095 |
| P5-SID | 0.0340 | 0.0516 | 0.0672 | 0.0154 | 0.0231 | 0.0176 | 0.0143 | 0.0222 | 0.0258 | 0.0070 | 0.0086 | 0.0091 |
| CID | 0.0341 | 0.0516 | 0.0673 | 0.0165 | 0.0236 | 0.0177 | 0.0146 | 0.0226 | 0.0276 | 0.0070 | 0.0087 | 0.0092 |
| POD | 0.0339 | 0.0498 | 0.0639 | 0.0185 | 0.0222 | 0.0221 | 0.0147 | 0.0225 | 0.0261 | 0.0074 | 0.0087 | 0.0091 |
| TIGER | 0.0372 | 0.0574 | 0.0747 | 0.0193 | 0.0248 | 0.0287 | 0.0147 | 0.0225 | 0.0266 | 0.0072 | 0.0087 | 0.0093 |
| TIGER-G | 0.0382 | 0.0586 | 0.0753 | 0.0195 | 0.0251 | 0.0292 | 0.0147 | 0.0227 | 0.0265 | 0.0073 | 0.0088 | 0.0093 |
| CoLLM | 0.0391 | 0.0606 | 0.0772 | 0.0200 | 0.0259 | 0.0303 | 0.0150 | 0.0218 | 0.0274 | 0.0079 | 0.0091 | 0.0117 |
| TokenRec (User ID Only) | 0.0396 | 0.0599 | 0.0763 | 0.0214 | 0.0265 | 0.0300 | 0.0160 | 0.0228 | 0.0282 | 0.0092 | 0.0109 | 0.0119 |
| TokenRec (Unseen Prompt) | 0.0402 | 0.0622 | 0.0791 | 0.0215 | 0.0270 | 0.0306 | 0.0164 | 0.0233 | 0.0286 | 0.0096 | 0.0111 | 0.0124 |
| TokenRec | 0.0407 | 0.0615 | 0.0782 | 0.0222 | 0.0276 | 0.0303 | 0.0171 | 0.0240 | 0.0291 | 0.0108 | 0.01112 | 0.0130 |

Key Observations from Table II and III:
- TokenRec's Superiority: TokenRec consistently achieves the best performance across all datasets (LastFM, ML1M, Beauty, Clothing) and metrics (HR@K, NDCG@K). For instance, on the LastFM dataset, TokenRec significantly outperforms the strongest baselines by an average of 19.08% on HR@20 and 9.09% on NDCG@20. This highlights the effectiveness of its MQ-Tokenizer for collaborative ID tokenization and its generative retrieval paradigm.
- Effectiveness of User ID Only Input: Even when TokenRec is provided with only user ID tokens (no historical interactions), denoted TokenRec (User ID Only), it still surpasses most baselines in terms of accuracy. This indicates that the MQ-Tokenizer successfully embeds sufficient collaborative knowledge into the user ID tokens themselves, making detailed interaction history in prompts potentially redundant for certain recommendation scenarios. This also helps in circumventing LLM context length limitations.
- Impact of Collaborative Knowledge in LLM-based RecSys:
  - CID generally outperforms P5-RID and P5-SID, suggesting that even basic integration of collaborative knowledge (co-occurrence frequency) into tokenization is beneficial.
  - However, P5 variants and POD often perform worse than GNN-based CF methods (e.g., LightGCN, GTN, LTGNN), implying their struggle to effectively capture deep collaborative information with LLMs alone.
  - CoLLM achieves better results than other P5 variants, benefiting from its explicit incorporation of collaborative embeddings learned from GNNs.
  - TIGER (using semantic IDs) and TIGER-G (further incorporating graph knowledge) also show improved performance over basic LLM-based methods, emphasizing the value of richer item representations.
- GNNs' Strength: GNN-based collaborative filtering methods (LightGCN, GTN, LTGNN) generally outperform traditional CF (MF, NCF) and sequential recommendation methods. This confirms the effectiveness of GNNs in capturing high-order collaborative signals through graph connectivity. TokenRec builds upon this strength by quantizing these GNN-learned representations.
### 6.1.2. Generalizability Evaluation

The following are the results from Table IV of the original paper:

<div class="table-wrapper"><table> <thead> <tr> <td rowspan="2">Dataset</td> <td rowspan="2">Model</td> <td colspan="2">Seen</td> <td colspan="2">Unseen</td> </tr> <tr> <td>HR @ 20</td> <td>NG @ 20</td> <td>HR @ 20</td> <td>NG @ 20</td> </tr> </thead> <tbody> <tr> <td rowspan="6">LastFM</td> <td>P5</td> <td>0.0704</td> <td>0.0320</td> <td>0.0399</td> <td>0.0137</td> </tr> <tr> <td>POD</td> <td>0.0709</td> <td>0.0323</td> <td>0.0401</td> <td>0.0138</td> </tr> <tr> <td>CID</td> <td>0.0697</td> <td>0.0314</td> <td>0.0452</td> <td>0.0196</td> </tr> <tr> <td>TIGER</td> <td>0.0752</td> <td>0.0309</td> <td>0.0695</td> <td>0.0252</td> </tr> <tr> <td>CoLLM</td> <td>0.0812</td> <td>0.0336</td> <td>0.0574</td> <td>0.0235</td> </tr> <tr> <td>TokenRec</td> <td>0.0973</td> <td>0.0353</td> <td>0.0773</td> <td>0.0268</td> </tr> <tr> <td rowspan="6">Beauty</td> <td>P5</td> <td>0.0511</td> <td>0.0236</td> <td>0.0274</td> <td>0.0130</td> </tr> <tr> <td>POD</td> <td>0.0507</td> <td>0.0225</td> <td>0.0269</td> <td>0.0123</td> </tr> <tr> <td>CID</td> <td>0.0523</td> <td>0.0240</td> <td>0.0334</td> <td>0.0146</td> </tr> <tr> <td>TIGER</td> <td>0.0575</td> <td>0.0248</td> <td>0.0548</td> <td>0.0233</td> </tr> <tr> <td>CoLLM</td> <td>0.0612</td> <td>0.0261</td> <td>0.0477</td> <td>0.0195</td> </tr> <tr> <td>TokenRec</td> <td>0.0629</td> <td>0.0289</td> <td>0.0591</td> <td>0.0266</td> </tr> </tbody> </table></div>

This analysis assesses the models' ability to recommend to `unseen users` (the 5% of users with the least interaction history, excluded from training). Only the GNN component is updated for new users/items, while the LLM components remain frozen.

**Key Observations from Table IV:**

* **Significant Drop for Most LLM Baselines:** `P5` and `POD` experience substantial performance degradation (over 40% drops in HR@20 and NDCG@20) for `unseen users`. This highlights their inherent difficulty in generalizing to new entities without explicit fine-tuning.
* **Improved Performance with Collaborative Knowledge:** `CID` and `CoLLM` show relatively better generalization, with performance drops around 20%, due to their inclusion of some form of collaborative knowledge. However, this is still a considerable drop, indicating that current methods struggle with stable ID tokenization for unseen entities.
* **`TokenRec`'s Robust Generalization:** `TokenRec` demonstrates superior generalization. For instance, on Amazon-Beauty, its performance decreases by only 7% on average for unseen users. This strong capability is attributed to the `MQ-Tokenizer`'s robust ID tokenization, achieved through masking and the `K-way encoder`, and to the flexible `generative retrieval` paradigm.
* **`TIGER`'s Generalization:** `TIGER` also exhibits strong generalization (only a 6.10% average decrease on HR@20). This is attributed to its use of `semantic IDs`, which incorporate item-side textual information as additional knowledge, making it effective in cold-start scenarios. This suggests that leveraging rich side information is beneficial for generalization.
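The cold-start protocol described above (refresh only the GNN, keep the tokenizer and the LLM frozen) can be pictured with the following hypothetical sketch. The objects `gnn`, `mq_tokenizer`, and `llm_recommender` and their methods are placeholders assumed for illustration only, not the paper's actual API.

```python
def recommend_for_unseen_user(new_interactions, gnn, mq_tokenizer, llm_recommender, k=20):
    """Hypothetical cold-start flow: only the lightweight GNN is refreshed,
    while the MQ-Tokenizer and the LLM backbone stay frozen."""
    # 1. Extend/update the GNN with the new user's interactions and obtain a
    #    collaborative embedding for that user (the only trainable step).
    user_emb = gnn.embed_new_user(new_interactions)

    # 2. Quantize the embedding with the frozen MQ-Tokenizer to obtain
    #    discrete, LLM-compatible user ID tokens.
    user_tokens = mq_tokenizer.tokenize(user_emb)

    # 3. Query the frozen LLM-based recommender with the new tokens and
    #    retrieve the top-k items.
    return llm_recommender.recommend(user_tokens, top_k=k)
```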
### 6.1.3. Efficiency Evaluation

The following are the results from Table V of the original paper:

<div class="table-wrapper"><table> <thead> <tr> <td>Inference Time</td> <td>LastFM</td> <td>ML1M</td> <td>Beauty</td> <td>Clothing</td> </tr> </thead> <tbody> <tr> <td>P5</td> <td>96.04</td> <td>99.75</td> <td>86.39</td> <td>93.38</td> </tr> <tr> <td>POD</td> <td>96.30</td> <td>101.42</td> <td>87.69</td> <td>94.48</td> </tr> <tr> <td>CID</td> <td>94.96</td> <td>99.42</td> <td>84.87</td> <td>92.02</td> </tr> <tr> <td>TIGER</td> <td>82.57</td> <td>85.98</td> <td>76.11</td> <td>80.68</td> </tr> <tr> <td>TokenRec</td> <td>6.92</td> <td>8.43</td> <td>5.76</td> <td>6.00</td> </tr> <tr> <td>Acceleration*</td> <td>1236.24%</td> <td>1046.41%</td> <td>1354.25%</td> <td>1402.33%</td> </tr> </tbody> </table></div>

This evaluation compares the average inference time per user for Top-20 recommendations among LLM-based methods.

**Key Observations from Table V:**

* **`TokenRec`'s Drastic Efficiency Improvement:** `TokenRec` achieves significantly superior inference efficiency, with accelerations ranging from approximately 1046% to 1402% over the baselines across the four datasets. This massive gain is directly attributed to its `generative retrieval paradigm`, which bypasses the time-consuming `auto-regressive decoding` and `beam search` processes inherent in the other LLM-based methods (P5, POD, CID, TIGER).
* **`TIGER`'s Relative Efficiency:** `TIGER` is more efficient than `P5`, `CID`, and `POD` because it uses `compressed semantic IDs`, requiring fewer tokens for generation than full item titles or descriptions. However, `TokenRec`'s retrieval approach is fundamentally faster than any sequential generation.
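To see why retrieval is so much cheaper than token-by-token decoding, consider the following illustrative sketch: a single forward pass produces a user preference vector, and the top-K items are selected by a similarity search over a precomputed item-embedding pool. This is a generic approximation of the idea, assuming cosine similarity and a NumPy-based search, not the paper's exact implementation.

```python
import numpy as np

def retrieve_top_k(user_pref, item_pool, k=20):
    """Score every candidate item with one matrix product and return the
    indices of the k best-scoring items (no auto-regressive decoding,
    no beam search)."""
    # Cosine similarity between the generated user preference vector and
    # every item embedding in the pool.
    user = user_pref / (np.linalg.norm(user_pref) + 1e-12)
    items = item_pool / (np.linalg.norm(item_pool, axis=1, keepdims=True) + 1e-12)
    scores = items @ user                        # shape: (num_items,)

    # Partial selection of the top-k candidates, then order them by score.
    top_k = np.argpartition(-scores, k)[:k]
    return top_k[np.argsort(-scores[top_k])]

# Toy usage: 10,000 candidate items with 64-dimensional embeddings.
rng = np.random.default_rng(0)
pool = rng.normal(size=(10_000, 64))
pref = rng.normal(size=64)
print(retrieve_top_k(pref, pool, k=20))
```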
## 6.2. Ablation Studies / Parameter Analysis

### 6.2.1. Ablation Studies

The following are the results from Table VI of the original paper:

<div class="table-wrapper"><table> <thead> <tr> <td rowspan="2">Module</td> <td colspan="2">LastFM</td> <td colspan="2">Beauty</td> </tr> <tr> <td>HR @ 20</td> <td>NG @ 20</td> <td>HR @ 20</td> <td>NG @ 20</td> </tr> </thead> <tbody> <tr> <td>Full*</td> <td>0.0936</td> <td>0.0348</td> <td>0.0615</td> <td>0.0276</td> </tr> <tr> <td>w/o Masking</td> <td>0.0848</td> <td>0.0332</td> <td>0.0573</td> <td>0.0253</td> </tr> <tr> <td>w/o K-way</td> <td>0.0820</td> <td>0.0309</td> <td>0.0592</td> <td>0.0250</td> </tr> <tr> <td>w/o HOCK</td> <td>0.0549</td> <td>0.0172</td> <td>0.0407</td> <td>0.0149</td> </tr> <tr> <td>s RQ-VAE</td> <td>0.0831</td> <td>0.0314</td> <td>0.0596</td> <td>0.0253</td> </tr> <tr> <td>s VQ-VAE</td> <td>0.0810</td> <td>0.0308</td> <td>0.0589</td> <td>0.0247</td> </tr> <tr> <td>s K-Means</td> <td>0.0750</td> <td>0.0281</td> <td>0.0567</td> <td>0.0237</td> </tr> </tbody> </table></div>

**Key Observations from Table VI:**

* **Contribution of Each Component:** Each proposed component in `TokenRec` (masking, K-way encoder, high-order collaborative knowledge) contributes positively to the overall performance, as removing any of them leads to a degradation.
* **Importance of Masking and K-way Framework:** Removing the `masking operation` (`w/o Masking`) or the `K-way encoder` (`w/o K-way`) leads to a performance drop. This confirms their role in enhancing generalization (as seen in Table IV) and improving accuracy.
* **Crucial Role of High-Order Collaborative Knowledge (HOCK):** The most significant performance drop occurs when `high-order collaborative knowledge` is removed (`w/o HOCK`). This underscores the critical importance of leveraging GNN-learned representations for effective LLM-based recommendations and for aligning LLMs with personalized preferences. It also implies that the quality of these collaborative embeddings is paramount for the `MQ-Tokenizer`'s success.
* **Effectiveness of MQ-Tokenizer Design:** Comparing `TokenRec` against alternative vector quantization methods:
    * `s RQ-VAE` (Residual Quantized VAE) and `s VQ-VAE` (Vector Quantized VAE) perform worse than the full `TokenRec`, demonstrating the specific design advantages of the `MQ-Tokenizer` for encoding collaborative knowledge.
    * `s K-Means` shows the worst performance among the quantization alternatives, highlighting the need for a more sophisticated, learnable quantization approach than simple clustering for this task.

### 6.2.2. Hyper-parameter Analysis

#### 6.2.2.1. Effect of Masking Ratio ($\rho$)

The effect of the masking ratio $\rho$ in the `MQ-Tokenizer` was investigated. This parameter controls how much of the input representation is masked.
The following figure (Figure 4 from the original paper) shows the performance change of TokenRec w.r.t. HR@20 and NDCG@20.

Key Observations from Figure 4:
- Benefit of Small Masking: Introducing a small masking ratio generally leads to performance improvements in TokenRec. This indicates that a moderate level of masking helps the tokenizer build a more robust and generalized understanding of the representations.
- Optimal Masking Ratio: TokenRec achieves its best performance at a moderate masking ratio; increasing the ratio beyond that point brings no further benefit.
- Degradation with Excessive Masking: Performance degrades significantly when the masking ratio becomes too high. Excessive masking makes the reconstruction task too difficult, hindering the tokenizer's ability to learn meaningful representations (a rough sketch of the masking operation follows below).
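As a rough illustration of what the masking ratio controls, the sketch below randomly zeroes a fraction $\rho$ of the dimensions of a collaborative embedding before it is fed to the tokenizer for reconstruction. The exact masking scheme used by the MQ-Tokenizer may differ; this only shows how $\rho$ trades regularization against reconstruction difficulty.

```python
import numpy as np

def mask_embedding(embedding, rho, rng=None):
    """Randomly mask (zero out) a fraction rho of an embedding's dimensions.
    A small rho acts as a regularizer; a large rho makes reconstruction too hard."""
    rng = rng or np.random.default_rng()
    keep = rng.random(embedding.shape) >= rho    # keep each dimension with probability 1 - rho
    return embedding * keep

# Toy usage: mask roughly 20% of a 64-dimensional GNN embedding.
emb = np.random.default_rng(0).normal(size=64)
masked = mask_embedding(emb, rho=0.2)
```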
#### 6.2.2.2. Effect of Codebook Settings K and L
The study analyzed the impact of the number of sub-codebooks (K) and the number of tokens in each sub-codebook (L) on TokenRec's performance.
The following figure (Figure 5 from the original paper) shows the effect of the number of sub-codebooks (K) and the number of tokens in each sub-codebook (L) under the HR@20 and NDCG@20 metrics.

Key Observations from Figure 5:
- Impact of K (Number of Sub-codebooks):
    - As the codebook depth (number of sub-codebooks K) increases from 1, a progressive improvement in recommendation performance is observed across all datasets. This validates the effectiveness of the `K-way encoder` and `K-way codebook` in capturing diverse patterns and enhancing quantization.
    - However, the improvement becomes marginal once K grows beyond a small value, suggesting a trade-off between effectiveness and efficiency; a small K is therefore a practical choice.
- Impact of L (Tokens per Sub-codebook):
    - The optimal value of L varies with dataset size. For smaller datasets like LastFM/ML1M, an L of 256 often provides a good balance of effectiveness and efficiency.
    - For larger datasets like Amazon-Beauty/Clothing, a larger L (e.g., 512) is often beneficial, indicating that more codebook tokens are needed to represent the greater diversity of users/items.
- Importance of K-way Mechanism: Simply increasing L with a single codebook (i.e., K = 1) does not yield significant performance gains compared to using multiple sub-codebooks. This further emphasizes the effectiveness and necessity of the proposed `K-way mechanism` in the `MQ-Tokenizer`. A simplified sketch of K-way quantization is given below.
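To ground the roles of K and L, the following is a simplified, product-quantization-style sketch of K-way quantization: the input embedding is split into K sub-vectors, and each sub-vector is mapped to its nearest codeword in a sub-codebook of L entries, yielding K discrete tokens per user or item. This is a generic illustration under those assumptions, not the MQ-Tokenizer's exact architecture.

```python
import numpy as np

def k_way_quantize(embedding, codebooks):
    """Quantize an embedding with K sub-codebooks.
    codebooks: array of shape (K, L, d_sub); embedding: shape (K * d_sub,).
    Returns K token indices, one per sub-codebook."""
    k, l, d_sub = codebooks.shape
    sub_vectors = embedding.reshape(k, d_sub)
    tokens = []
    for i in range(k):
        # Nearest codeword (Euclidean distance) in the i-th sub-codebook.
        dists = np.linalg.norm(codebooks[i] - sub_vectors[i], axis=1)
        tokens.append(int(np.argmin(dists)))
    return tokens

# Toy usage: K = 4 sub-codebooks with L = 256 codewords each, 64-dim embeddings.
rng = np.random.default_rng(0)
books = rng.normal(size=(4, 256, 16))
tokens = k_way_quantize(rng.normal(size=64), books)
print(tokens)  # four discrete ID tokens, one per sub-codebook
```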
## 7. Conclusion & Reflections

### 7.1. Conclusion Summary
This paper introduces TokenRec, a novel and comprehensive framework for LLM-based generative recommendations. TokenRec effectively addresses several critical challenges in the field:
- ID Tokenization: It provides a generalizable strategy for tokenizing user and item IDs through its `Masked Vector-Quantized (MQ) Tokenizer`. This tokenizer quantizes masked representations derived from `Graph Neural Networks (GNNs)`, thereby seamlessly incorporating `high-order collaborative knowledge` into LLM-compatible discrete tokens. The `masking operation` and `K-way encoder` mechanisms significantly enhance its robustness and generalization.
- Inference Efficiency: `TokenRec` adopts an innovative `generative retrieval` paradigm that eliminates the need for the time-consuming `auto-regressive decoding` and `beam search` processes typically used by LLMs. Instead, it generates a user's preference representation and retrieves the top-K items through similarity matching, leading to substantial gains in inference speed.
- Generalizability: The framework demonstrates superior generalizability to `new and unseen users and items` (the cold-start problem). By updating only the lightweight GNN component for new entities, while keeping the `MQ-Tokenizers` and LLM backbone frozen, `TokenRec` achieves robust performance without requiring expensive LLM retraining.
- Performance: Extensive experiments on four real-world datasets confirm that `TokenRec` consistently outperforms both traditional and state-of-the-art LLM-based recommender systems across various evaluation metrics, while also demonstrating significant efficiency improvements.
### 7.2. Limitations & Future Work
The paper primarily focuses on overcoming existing limitations in LLM-based RecSys. While it doesn't explicitly list "limitations" of its own method, the challenges it addresses implicitly highlight areas for continued research and improvement in the broader field:
- Reliance on GNNs for Initial Representations: `TokenRec`'s effectiveness is heavily reliant on the quality of the initial `collaborative representations` learned by the chosen GNN (LightGCN in this case). Further research could explore the robustness of `TokenRec` to different GNN architectures or the integration of multi-modal information into these initial representations.
- Prompt Sensitivity: Although `TokenRec` uses pre-defined prompts and shows robustness with unseen prompts, LLMs are generally known to be sensitive to prompt wording. Further research might explore adaptive or user-specific prompt generation strategies.
- Static Codebook during LLM Tuning: The `MQ-Tokenizers` are frozen during the LLM fine-tuning stage. Exploring approaches that allow for more dynamic interaction or co-adaptation between the tokenizer and the LLM backbone could be a direction for future work.
- Scalability of GNN Pre-training: While the GNN update for new users/items is efficient, the initial training of a powerful GNN on massive interaction graphs can still be resource-intensive. Research into more scalable GNN training or zero-shot GNN inference methods could further enhance the overall efficiency pipeline.
- Exploring Deeper LLM Backbones: The paper uses T5-small. Investigating the performance with larger and more powerful LLMs, while managing their computational demands, could be a future research avenue.
- Beyond Binary Interactions: The current setup focuses on implicit binary interactions. Extending `TokenRec` to handle explicit feedback (e.g., ratings) with varying relevance levels could be explored.
### 7.3. Personal Insights & Critique
TokenRec offers a highly impactful and pragmatic solution to some of the most pressing challenges in LLM-based Recommender Systems. Its key innovation lies in intelligently bridging the gap between the discrete nature of LLM tokens and the continuous, complex collaborative knowledge inherent in user-item interactions.
Key Strengths and Inspirations:
- Elegant Solution to Tokenization: The `MQ-Tokenizer` is a clever approach to the `ID tokenization` problem. By quantizing GNN-learned embeddings, it ensures that the discrete tokens carry rich collaborative semantics rather than being arbitrary identifiers or solely text-derived. The `masking` and `K-way encoder` further enhance the robustness and expressiveness of these tokens. This approach could be transferred to other domains where continuous, rich data needs to be discretely represented for LLM consumption, such as bioinformatics (e.g., tokenizing protein sequences based on learned structural embeddings).
- Practical Inference Strategy: The shift from `auto-regressive generation` to `generative retrieval` is a significant practical advancement. For real-time applications, the speedup is critical. This paradigm could inspire similar approaches in other LLM applications where a fixed set of "answers" needs to be efficiently selected rather than freely generated (e.g., LLM-based knowledge retrieval systems that select from a database of facts).
- Strong Generalization: The ability to handle `cold-start` users and items without full LLM retraining is a major advantage for real-world deployment. The idea of efficiently updating only a lightweight component (the GNN) while leveraging pre-trained LLM and tokenizer components is a powerful pattern for maintaining scalability and up-to-dateness in dynamic systems.
- Concise Prompts: The finding that `TokenRec` can perform well with only user ID tokens in the prompt is insightful. It implies that the `MQ-Tokenizer` successfully compresses significant user preference information into these tokens, mitigating the `context length limitation` issue that plagues many LLM applications.
Potential Issues and Areas for Improvement:
- Interpretability of Tokens: While the tokens carry collaborative knowledge, their direct interpretability to humans might be limited compared to textual descriptions. For explainable recommendation systems, further work might be needed to link these tokens back to understandable user preferences or item attributes.
- Dependency on GNN Quality: The performance of `TokenRec` is fundamentally tied to the quality of the initial GNN-learned representations. If the chosen GNN struggles with sparse data or specific graph structures, `TokenRec`'s foundation might be weakened. This suggests that robust and powerful GNNs are a prerequisite.
- Complexity of Multi-Stage Training: While effective, the two-stage training process (MQ-Tokenizer first, then LLM4Rec) introduces some complexity. Exploring end-to-end or more integrated training schemes, possibly with techniques like curriculum learning, could be an area for future research, although this might increase training instability.
- Overhead of Codebook Management: For extremely large item sets, managing and updating the codebook (especially with many tokens per sub-codebook) could still incur overhead, though significantly less than a full vocabulary expansion. Strategies for dynamic codebook updates or hierarchical quantization might be considered.
Overall, `TokenRec` makes a substantial contribution by offering a robust, efficient, and generalizable framework for LLM-based recommendations. It highlights the importance of careful design at the intersection of discrete and continuous representations and provides a strong blueprint for future advancements in this rapidly evolving field.