Pctx: Tokenizing Personalized Context for Generative Recommendation
TL;DR Summary
This paper introduces a personalized context-aware tokenizer that generates context-dependent semantic IDs, strengthening personalization in generative recommendation and improving NDCG@10 by up to 11.44% across three public datasets.
Abstract
Generative recommendation (GR) models tokenize each action into a few discrete tokens (called semantic IDs) and autoregressively generate the next tokens as predictions, showing advantages such as memory efficiency, scalability, and the potential to unify retrieval and ranking. Despite these benefits, existing tokenization methods are static and non-personalized. They typically derive semantic IDs solely from item features, assuming a universal item similarity that overlooks user-specific perspectives. However, under the autoregressive paradigm, semantic IDs with the same prefixes always receive similar probabilities, so a single fixed mapping implicitly enforces a universal item similarity standard across all users. In practice, the same item may be interpreted differently depending on user intentions and preferences. To address this issue, we propose a personalized context-aware tokenizer that incorporates a user's historical interactions when generating semantic IDs. This design allows the same item to be tokenized into different semantic IDs under different user contexts, enabling GR models to capture multiple interpretive standards and produce more personalized predictions. Experiments on three public datasets demonstrate up to 11.44% improvement in NDCG@10 over non-personalized action tokenization baselines. Our code is available at https://github.com/YoungZ365/Pctx.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
Pctx: Tokenizing Personalized Context for Generative Recommendation
1.2. Authors
- Qiyong Zhong (Zhejiang University)
- Jiajie Su (Zhejiang University)
- Yunshan Ma (Singapore Management University)
- Julian McAuley (University of California, San Diego)
- Yupeng Hou (University of California, San Diego)
1.3. Journal/Conference
This paper is published as a preprint on arXiv (arXiv:2510.21276), submitted on 2025-10-24. arXiv is a widely recognized open-access preprint server for research in physics, mathematics, computer science, and related disciplines. Papers posted on arXiv are typically pre-peer-review versions.
1.4. Publication Year
2025
1.5. Abstract
Generative Recommendation (GR) models represent each user action as a sequence of discrete tokens, known as semantic IDs, and make predictions by autoregressively generating the next tokens. While GR offers advantages like memory efficiency and scalability, existing tokenization methods are often static and non-personalized, deriving semantic IDs solely from item features. This approach assumes a universal item similarity, overlooking individual user preferences. The autoregressive nature of GR models means that semantic IDs with common prefixes will have similar probabilities, implicitly enforcing this universal similarity standard.
To address this limitation, the paper proposes Pctx, a personalized context-aware tokenizer. Pctx incorporates a user's historical interactions to generate semantic IDs, allowing the same item to be tokenized into different semantic IDs based on varying user contexts. This design enables GR models to capture multiple interpretative standards for an item and produce more personalized predictions. Experiments conducted on three public datasets demonstrate Pctx's effectiveness, showing up to an 11.44% improvement in NDCG@10 over non-personalized action tokenization baselines. The authors have made their code publicly available.
1.6. Original Source Link
Official Source Link: https://arxiv.org/abs/2510.21276
PDF Link: https://arxiv.org/pdf/2510.21276v1.pdf
Publication Status: Preprint on arXiv.
2. Executive Summary
2.1. Background & Motivation
The field of recommender systems has seen the rise of Generative Recommendation (GR) models, which offer significant advantages over traditional ID-based approaches. Instead of treating each item as a unique identifier, GR models convert user actions (interactions with items) into a few discrete tokens, called semantic IDs, and then use autoregressive models to predict the next semantic IDs in a sequence. This paradigm brings benefits such as enhanced memory efficiency, better scalability, and the potential to unify retrieval and ranking stages in recommendation pipelines.
However, a critical limitation of existing action tokenization methods in GR is their static and non-personalized nature. These methods typically generate semantic IDs based purely on item features (e.g., titles, descriptions), assuming that all users perceive items similarly. This universal item similarity assumption is problematic because, in reality, a single item can hold different meanings or appeal to different users based on their unique intentions and historical preferences. For example, a high-end watch could be an investment for one user, a gift for another, or a fashion statement for a third. Under the autoregressive generation paradigm, semantic IDs with shared prefixes are inherently assigned similar probabilities, which reinforces this static similarity standard and hinders the model's ability to provide truly personalized recommendations that account for diverse user interpretations.
The core problem the paper aims to solve is this lack of personalization and context-awareness in action tokenization for Generative Recommendation. The existing methods fail to capture the nuanced, user-specific ways items are interpreted, leading to less personalized and potentially suboptimal recommendations. The paper's innovative idea, or entry point, is to design a tokenizer that can adaptively generate semantic IDs not just from item features, but also by incorporating a user's historical interactions as a personalized context. This allows the same item to have multiple semantic IDs, each reflecting a different user-specific interpretation.
2.2. Main Contributions / Findings
The paper introduces Pctx, a novel personalized context-aware tokenizer for Generative Recommendation, making several key contributions:
- Personalized Context-Aware Tokenization: Pctx proposes the first tokenizer that explicitly incorporates a user's historical interactions into the semantic ID generation process. This allows a single item to be mapped to different semantic IDs depending on the user's specific context, thereby capturing diverse user interpretations and overcoming the universal item similarity assumption of previous methods.
- Balancing Generalizability and Personalizability: The paper addresses the challenge of creating personalized semantic IDs without sacrificing the generalizability often sought in tokenization. It introduces several strategies:
  - Adaptive Clustering: Context representations are clustered into a variable number of groups, with cluster centroids serving as prototype representations.
  - Merging Infrequent Semantic IDs: Low-frequency semantic IDs are merged with semantically similar ones of the same item to reduce sparsity.
  - Data Augmentation: The training process is enhanced by augmenting actions with alternative semantic IDs for the same item, both in model inputs and prediction targets, connecting different interpretations.
- Multi-Facet Semantic ID Generation: During inference, Pctx enables the Generative Recommendation model to decode multiple potential semantic IDs for the next item, each representing a different user interpretation. This allows for a richer understanding of recommendation probabilities and enhances the explainability of the recommendations.
- Empirical Validation: Extensive experiments on three public datasets (Amazon Reviews: "Musical Instruments", "Industrial & Scientific", and "Video Games") demonstrate the effectiveness of Pctx. The model achieves significant performance improvements, up to 11.44% in NDCG@10, compared to non-personalized action tokenization baselines, confirming that personalized context-aware tokenization leads to more accurate and relevant recommendations.

In summary, Pctx successfully introduces personalization into the tokenization phase of Generative Recommendation models, enabling them to capture the multifaceted nature of user preferences and item interpretations, leading to substantial improvements in recommendation quality and explainability.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand Pctx, it is essential to grasp several fundamental concepts in recommender systems and machine learning:
- Sequential Recommendation: This is a subfield of recommender systems where the goal is to predict the user's next interaction (e.g., next item purchase, next movie watched) based on their sequence of past interactions. The order of interactions is crucial, as user preferences often evolve over time.
- ID-based Approaches: Traditional sequential recommenders (e.g., SASRec, GRU4Rec) typically represent each item with a unique integer ID. These IDs are then embedded into dense vectors (item embeddings), which are learned during training. A major challenge is managing a large embedding table for millions of items, leading to high memory consumption and scalability issues.
- Generative Recommendation (GR): A newer paradigm that contrasts with ID-based methods. Instead of directly predicting item IDs, GR models convert each item or action into a sequence of discrete tokens, called semantic IDs. The model then autoregressively generates the semantic IDs for the next predicted item. Benefits of GR include:
  - Memory Efficiency: By using a compact vocabulary of tokens, GR models can significantly reduce memory usage compared to large item ID embedding tables.
  - Scalability: The token-based approach allows for better scalability to large item catalogs.
  - Unifying Retrieval and Ranking: GR models can potentially perform both item retrieval (finding relevant items) and ranking (ordering them) within a single generative framework.
- Semantic IDs (Tokens): In Generative Recommendation, a semantic ID is a short sequence of discrete tokens (e.g., $[m_1, m_2, \ldots, m_G]$) that collectively represents an item or action. These tokens are drawn from a shared, compact vocabulary. The process of converting an item into its semantic ID is called tokenization.
- Autoregressive Models: These models predict the next element in a sequence based on the preceding elements. In the context of GR, an autoregressive model generates semantic IDs token by token: after generating $m_1$, it uses $m_1$ to predict $m_2$, and so on. This mechanism implies that semantic IDs sharing common prefixes will naturally receive similar prediction probabilities.
- Tokenization (General): In computer science, tokenization is the process of breaking down a sequence of characters (or other data) into smaller units called tokens. In natural language processing, this often involves splitting sentences into words or subword units. In GR, it means converting item features or representations into discrete semantic ID tokens.
- Contrastive Learning: A self-supervised learning technique where the model learns to pull "similar" (positive) samples closer together in the embedding space while pushing "dissimilar" (negative) samples farther apart. DuoRec, mentioned in the paper, uses contrastive learning to learn user context representations that are more distinguishable, helping to mitigate representation degeneration, where distinct inputs map to similar embeddings.
- Residual Quantization Variational AutoEncoder (RQ-VAE): A neural network architecture used for quantization. VAEs (Variational AutoEncoders) are generative models that learn a compressed, latent representation of data. Residual quantization quantizes residuals (errors) sequentially, allowing for finer-grained representation with multiple codebooks. In Pctx, RQ-VAE is used to convert continuous item representations into discrete semantic ID tokens (see the sketch after this list).
- k-means++ Clustering: An advanced initialization method for the k-means clustering algorithm. k-means aims to partition observations into clusters, where each observation belongs to the cluster with the nearest mean (centroid). k-means++ improves the quality of clustering by selecting initial cluster centers that are far apart, reducing the chance of converging to suboptimal solutions. Pctx uses it to condense multiple context representations for an item into a smaller set of representative centroids.
- sentence-t5-base: A pre-trained sentence embedding model based on the T5 (Text-To-Text Transfer Transformer) architecture. It is used to generate dense vector representations (embeddings) from textual features (e.g., item titles, descriptions).
- FAISS (Facebook AI Similarity Search): A library for efficient similarity search and clustering of dense vectors. Pctx uses FAISS for quantizing representations, which involves finding the closest codebook entries.
- PCA (Principal Component Analysis) and Whitening: PCA is a dimensionality reduction technique that transforms a set of correlated variables into a smaller set of uncorrelated variables called principal components. Whitening is a preprocessing step that transforms variables so that they are uncorrelated and have unit variance; it often follows PCA. These techniques are used in Pctx to refine the semantic quality of item representations before quantization.
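To make the residual quantization idea concrete, here is a minimal, self-contained Python sketch (not the paper's implementation): given a stack of codebooks, a vector is greedily quantized level by level, with each level encoding the residual left by the previous one. The codebook sizes and random data are illustrative assumptions.

```python
import numpy as np

def residual_quantize(x: np.ndarray, codebooks: list) -> list:
    """Greedily pick one token per codebook level; each level quantizes
    the residual left over by the previous levels (the RQ-VAE idea)."""
    tokens = []
    residual = x.astype(np.float64).copy()
    for codebook in codebooks:                      # codebook: (num_codes, dim)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                 # nearest codeword
        tokens.append(idx)
        residual -= codebook[idx]                   # pass the residual down
    return tokens

rng = np.random.default_rng(0)
dim, levels, num_codes = 8, 3, 256                  # hypothetical sizes
codebooks = [rng.normal(size=(num_codes, dim)) for _ in range(levels)]
print(residual_quantize(rng.normal(size=dim), codebooks))  # three tokens
```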
3.2. Previous Works
The paper discusses previous works primarily in two categories: Conventional Sequential Recommendation and Generative Recommendation.
3.2.1. Conventional Sequential Recommendation
These models typically rely on unique item IDs and embedding tables.
- Caser (Tang & Wang, 2018): Applies convolutional neural networks (CNNs) to capture both sequential (temporal) and positional dependencies in user interaction sequences.
- HGN (Ma et al., 2019): Uses hierarchical gating networks at both feature and instance levels to refine user preference representations.
- GRU4Rec (Hidasi et al., 2016): Employs Gated Recurrent Units (GRUs) to model sequential dynamics in user behaviors, an early and influential deep learning model for session-based recommendation.
- BERT4Rec (Sun et al., 2019): Adapts the BERT (Bidirectional Encoder Representations from Transformers) architecture to sequential recommendation. It uses a masked item prediction objective, where some items in a sequence are masked and the model tries to predict them using bidirectional context.
- A core concept in Transformer-based models like BERT and SASRec is the self-attention mechanism. For an input sequence of vectors $x_1, \ldots, x_n$, self-attention computes outputs where each output is a weighted sum of all inputs, with weights determined by their pairwise compatibility (a sketch follows this list):
$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
$
Here, $Q$ (Query), $K$ (Key), and $V$ (Value) are matrices derived from the input embeddings, and $d_k$ is the dimension of the keys.
- SASRec (Kang & McAuley, 2018): A prominent model that utilizes a unidirectional self-attention mechanism to capture user interests along behavior trajectories. It focuses on how past items influence the next item in a sequence.
- DuoRec (Qiu et al., 2022): Addresses representation degeneration (where distinct inputs get mapped to similar representations) in sequential modeling by using contrastive learning with dropout-based augmentation and supervised sampling. This is particularly relevant as Pctx uses DuoRec for user context encoding.
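As a quick illustration of the attention formula above, the following minimal NumPy sketch computes scaled dot-product self-attention with Q = K = V set to the raw input embeddings; real models apply learned projections first.

```python
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # 4 item embeddings of dimension 8
print(attention(x, x, x).shape)   # (4, 8): one contextualized vector per position
```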
3.2.2. Generative Recommendation (GR)
These models are the direct predecessors and contemporaries that Pctx builds upon and differentiates itself from.
- TIGER (Rajput et al., 2023): One of the foundational GR models. It applies RQ-VAE to discretize item embeddings into semantic IDs and then uses a generative retrieval paradigm for recommendation. TIGER uses static tokenization, meaning each item is always mapped to the same semantic ID.
- LETTER (Wang et al., 2024a): Extends TIGER by incorporating collaborative information and diversity-oriented constraints into the RQ-VAE process to improve semantic ID quality. It also employs static tokenization.
- ActionPiece (Hou et al., 2025b): The first context-aware tokenization approach mentioned by the paper. It merges frequently co-occurring features with probabilistic weighting and introduces set permutation regularization to better exploit action sequences. However, its context is typically limited to adjacent actions, making it less effective at capturing long-term user personalities.
- Multi-Identifier Tokenizers (e.g., MTGRec (Zheng et al., 2025)): Assign multiple semantic IDs to each item. However, the paper clarifies that MTGRec's approach is for data augmentation during pre-training, sampling semantic IDs from different epochs of the same RQ-VAE model. It still relies on the universal similarity assumption and is not inherently personalized based on user context.
3.3. Technological Evolution
The evolution in recommender systems has moved from simple collaborative filtering to matrix factorization, then to complex deep learning models for sequential recommendation (e.g., GRU4Rec, SASRec, BERT4Rec). These ID-based methods face challenges with memory and scalability due to large item embedding tables.
Generative Recommendation emerged as a solution to these issues, by tokenizing items into compact semantic IDs and using autoregressive models for prediction. Early GR models like TIGER and LETTER demonstrated the potential of this paradigm but inherited a critical limitation from ID-based systems: the static and non-personalized nature of item representations. They assumed a universal notion of item similarity.
ActionPiece took a step towards context-awareness, but its context was typically local. Pctx represents the next leap in this evolution by introducing truly personalized, long-term context-aware tokenization. It acknowledges that the semantic ID of an item should reflect the user's specific historical interactions, thereby enabling the GR model to capture diverse interpretations and generate highly personalized recommendations. Pctx fits into this timeline by addressing a fundamental personalization gap in the tokenization stage of Generative Recommendation.
3.4. Differentiation Analysis
Pctx differentiates itself from previous Generative Recommendation approaches primarily through its novel personalized context-aware tokenizer:
- Compared to Static Tokenizers (TIGER, LETTER):
  - Core Difference: Static tokenizers assign a fixed semantic ID to each item, regardless of the user or interaction context. This implicitly enforces a universal item similarity standard, as items with shared semantic ID prefixes will always receive similar prediction probabilities in autoregressive models.
  - Pctx Innovation: Pctx overcomes this by tokenizing each item into different semantic IDs conditioned on the personalized user context (historical interactions). This allows the model to capture multiple interpretations of the same item, breaking free from the static similarity assumption.
- Compared to Multi-Identifier Tokenizers (MTGRec):
  - Core Difference: While MTGRec also assigns multiple semantic IDs to an item, its primary mechanism is data augmentation during pre-training by sampling semantic IDs from different model states. This approach does not inherently provide personalization based on user context; it still largely operates under the universal similarity assumption regarding the meaning of the different semantic IDs.
  - Pctx Innovation: Pctx's multi-identifier mapping is explicitly driven by distinct user interpretations derived from personalized context. Each semantic ID for an item corresponds to a unique way a user might perceive or interact with that item, making the distinction inherently personalized and context-dependent.
- Compared to Context-Aware Tokenizers (ActionPiece):
  - Core Difference: ActionPiece is indeed context-aware, but its context is typically limited to adjacent actions within a sequence. This local context can capture immediate sequential patterns but often falls short in reflecting a user's broader, longer-term personality and preferences.
  - Pctx Innovation: Pctx expands the perceived context window to incorporate the entire user interaction history. This long-term context allows the tokenizer to capture deeper personalities and evolving preferences, leading to more nuanced and accurate personalized semantic IDs.

In essence, Pctx's core innovation lies in its ability to dynamically adapt item tokenization based on an individual user's comprehensive interaction history, thereby enabling Generative Recommendation models to produce predictions that truly reflect diverse and personalized user intentions.
4. Methodology
4.1. Principles
The core idea behind Pctx is to overcome the limitations of static, non-personalized action tokenization in Generative Recommendation (GR) models. The fundamental principle is that the meaning or interpretation of an item can vary significantly based on the individual user's context, specifically their past interactions. Therefore, instead of assigning a fixed semantic ID to each item, Pctx aims to generate personalized semantic IDs that reflect these diverse user interpretations.
The theoretical intuition is that if a user has consistently interacted with items of a certain type, their interpretation of a new, potentially multi-faceted item will be colored by that historical preference. For example, a user who frequently buys story-driven games might perceive "StarCraft II" primarily for its narrative, while a user interested in real-time strategy might focus on its strategic gameplay. Pctx's design allows the Generative Recommendation model to capture these distinct perspectives by tokenizing the same item into different semantic IDs depending on the user context. This dynamic tokenization then enables the autoregressive model to generate predictions that are genuinely personalized, anticipating how a user might interpret a potential next item.
4.2. Core Methodology In-depth (Layer by Layer)
Pctx operates by taking both the current item and the user's interaction history as input to produce personalized semantic IDs. The framework involves deriving rich context representations, condensing them, constructing semantic IDs, and then training a Generative Recommendation model with these personalized semantic IDs. Figure 2 provides an overview of the Pctx framework.
Figure 2 (schematic): The overall framework of Pctx. On the left, training data and user interaction histories are used to derive context representations; in the middle, these are fused with item feature representations and quantized into personalized semantic IDs; on the right, an autoregressive generative model performs multi-facet semantic ID generation and probability prediction.
4.2.1. Problem Formulation
Following the standard sequential recommendation setting, the paper represents each user's historical interactions as a chronologically ordered sequence of items:
$
\mathcal{S} = [v_1, v_2, \ldots, v_n]
$
where:
- $v_i \in \mathcal{V}$ denotes an interacted item from the item set $\mathcal{V}$.
- $n$ is the number of past interactions in the sequence.

The traditional goal is to predict the next item $v_{n+1}$ given $\mathcal{S}$. Generative Recommendation models reformulate this task. Each item $v_i$ is tokenized into a sequence of discrete tokens:
$
[m_1^i, m_2^i, \ldots, m_G^i]
$
This sequence is referred to as a semantic ID, where $G$ is the fixed number of tokens per semantic ID. The task then becomes predicting the semantic ID(s) of the target item given a sequence formed by concatenating the semantic IDs of historical items (a minimal sketch of this flattening follows).
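Here is a minimal sketch of this reformulation, under the simplifying assumption of a static one-to-one item-to-semantic-ID mapping (Pctx itself makes the mapping context-dependent); the toy mapping and G = 3 are illustrative.

```python
from typing import Dict, List, Tuple

SemanticID = Tuple[int, ...]

def flatten_history(history: List[int], sid_of: Dict[int, SemanticID]) -> List[int]:
    """Concatenate the semantic IDs of historical items into the single
    token sequence that an autoregressive GR model consumes."""
    tokens: List[int] = []
    for item in history:
        tokens.extend(sid_of[item])
    return tokens

sid_of = {101: (3, 7, 1), 205: (3, 2, 9)}   # toy mapping, G = 3
print(flatten_history([101, 205], sid_of))  # [3, 7, 1, 3, 2, 9]
```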
4.2.2. Personalized Action Tokenization
This is the core component of Pctx, designed to tokenize an item based on the user's context.
4.2.2.1. Personalized Context Representation
This step involves obtaining rich context representations from the training data.
- User Context Encoding: An auxiliary model is used to encode the user context for each item $v_i$. This model takes the current item and its preceding historical interactions as input:
$
\pmb{e}_{v_i}^{ctx} = f([v_1, v_2, \ldots, v_i])
$
where:
  - $\pmb{e}_{v_i}^{ctx}$ is the context embedding (or representation) for item $v_i$.
  - $[v_1, v_2, \ldots, v_i]$ represents the sequence of items up to and including $v_i$.
  - $f(\cdot)$ is a sequence model responsible for encoding this context.

  The paper specifies that the goal is not merely next-item prediction, but to derive user context representations that are sufficiently distinguishable to capture user personalities. For this, DuoRec (Qiu et al., 2022) is adopted, which uses contrastive learning to prevent representation degeneration (where distinct inputs map to very similar embeddings).
- Multi-Facet Condensation of Context Representations: An item might appear many times in the training data, each time with a different user context, indicating diverse user interpretations. To manage the number of semantic IDs and avoid sparsity (where each semantic ID appears too rarely), Pctx groups these context representations by item and condenses them. Specifically, for each item $v_i$, k-means++ clustering is applied to its associated context representations. This process generates $C_{v_i}$ centroids, which serve as representative context representations for that item (a minimal sketch follows this list). The number of centroids $C_{v_i}$ is chosen proportionally to the number of available context representations for $v_i$, reflecting the richness of its interaction data while avoiding excessive splitting. The exact determination of $C_{v_i}$ is detailed in Section 4.2.4.
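Here is a minimal sketch of the per-item condensation step using scikit-learn's k-means++ initialization; the embedding dimension, occurrence count, and the rare-item fallback shown are illustrative assumptions, not the paper's exact code.

```python
import numpy as np
from sklearn.cluster import KMeans

def condense_contexts(ctx_vecs: np.ndarray, n_centroids: int) -> np.ndarray:
    """Condense all context representations observed for one item into
    a small set of prototype centroids."""
    if len(ctx_vecs) <= n_centroids:          # rare item: nothing to split
        return ctx_vecs
    km = KMeans(n_clusters=n_centroids, init="k-means++", n_init=10,
                random_state=0).fit(ctx_vecs)
    return km.cluster_centers_

rng = np.random.default_rng(0)
contexts = rng.normal(size=(40, 16))          # 40 occurrences of one item
print(condense_contexts(contexts, 3).shape)   # (3, 16): three prototypes
```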
4.2.2.2. Personalized Semantic ID
After obtaining the representative context representations, these are tokenized into discrete semantic IDs.
- Semantic ID Construction from Context Representations: In addition to the context representations, Pctx incorporates item feature representations to provide more comprehensive information. A feature representation $\pmb{e}_{v_i}^{feat}$ is derived for each item by encoding textual features (e.g., titles, descriptions) with a pre-trained sentence embedding model like sentence-t5-base (Ni et al., 2022). The context and feature representations for an item are then fused for each of its representative contexts. The fused representation for the $k$-th context of item $v_i$ is given by (a minimal sketch of this fusion follows):
$
\pmb{e}_{v_i, k} = \operatorname{concat}(\alpha \cdot \pmb{e}_{v_i, k}^{ctx}, (1 - \alpha) \cdot \pmb{e}_{v_i}^{feat}), \quad k \in \{1, 2, \ldots, C_{v_i}\}
$
where:
  - $\pmb{e}_{v_i, k}$ is the $k$-th fused representation for item $v_i$.
  - $\pmb{e}_{v_i, k}^{ctx}$ is the $k$-th encoded user context representation (one of the $C_{v_i}$ centroids for $v_i$).
  - $\alpha$ is a hyperparameter (a scalar weight between 0 and 1) that balances the contribution of the context representation and the feature representation to the fused embedding.
  - $\operatorname{concat}$ denotes the concatenation operation, combining the two vectors.

  After obtaining these fused representations for all items, Pctx follows Rajput et al. (2023) and applies RQ-VAE (Residual Quantization Variational AutoEncoder, Zeghidour et al., 2021) to quantize each fused representation. This converts the continuous vector into a sequence of G-1 discrete tokens. An additional token is appended to this sequence to avoid conflicts between semantic IDs, resulting in a final $G$-digit semantic ID.
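A minimal sketch of the fusion in Equation (2); the vectors and α value are toy assumptions, and the RQ-VAE quantization that follows is omitted.

```python
import numpy as np

def fuse(ctx: np.ndarray, feat: np.ndarray, alpha: float) -> np.ndarray:
    """Weight and concatenate a context centroid with the item's
    feature embedding before quantization."""
    return np.concatenate([alpha * ctx, (1.0 - alpha) * feat])

ctx_centroid = np.ones(4)    # one of the item's context centroids
feat_vec = np.full(4, 2.0)   # sentence-t5-style feature embedding
print(fuse(ctx_centroid, feat_vec, alpha=0.3))
# [0.3 0.3 0.3 0.3 1.4 1.4 1.4 1.4]
```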
- Redundant Semantic ID Merging: To further improve generalizability and prevent sparsity caused by too many unique semantic IDs, two types of merging strategies are applied (a frequency-threshold sketch follows this list):
  - Merging of duplicated semantic IDs: It is possible for an item to be assigned multiple semantic IDs that are identical except for their very last token. Since the last token is purely for conflict resolution and carries no semantic meaning, these semantic IDs are considered semantically equivalent. Pctx merges them by retaining only one, ensuring the last token is only used to distinguish semantic IDs between different items, not within the same item.
  - Merging of infrequent semantic IDs: Some semantic IDs may appear very rarely in the dataset, potentially due to outliers or an excessive number of centroids during clustering. These infrequent IDs can harm generalization if kept. Pctx sets a frequency threshold: any semantic ID appearing less often than the threshold is removed, and all instances previously associated with it are re-assigned to the nearest remaining centroid of the same item. This balances personalization with the need for sufficient training data for each semantic ID.

  As a result of these steps, each item can now be associated with multiple semantic IDs, where each semantic ID represents a typical user interpretation under different contexts.
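A minimal sketch of the infrequent-ID filtering step; the data and threshold are toy assumptions, and the re-assignment of removed occurrences to the nearest surviving centroid is noted but omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

def keep_frequent(assignments: List[Tuple[int, Tuple[int, ...]]],
                  min_count: int) -> Dict[Tuple[int, ...], int]:
    """Count how often each semantic ID is used and keep only those at or
    above the frequency threshold; in Pctx, occurrences of a dropped ID
    are re-assigned to the nearest surviving centroid of the same item."""
    counts = Counter(sid for _, sid in assignments)
    return {sid: c for sid, c in counts.items() if c >= min_count}

# toy (item_id, semantic_id) pairs observed in the training data
data = [(1, (3, 7, 0)), (1, (3, 7, 0)), (1, (5, 2, 0)), (2, (9, 9, 0))]
print(keep_frequent(data, min_count=2))  # {(3, 7, 0): 2}
```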
4.2.3. Generative Recommendation Under Pctx
This section describes how the personalized semantic IDs are used for training and inference in a Generative Recommendation model.
- Training with Data Augmentation: An autoregressive encoder-decoder model is trained on sequences of personalized semantic IDs using a next-token prediction loss (similar to Rajput et al., 2023). During tokenization for training, when an item and its user context are considered, a fused personalized semantic representation is derived using Equation (2). The semantic ID for the item is then selected as the one whose centroid (among the item's representative centroids) is closest to this fused representation. By performing this for all items in a user's sequence, training sequences of personalized semantic IDs are constructed.

  To further enhance data diversity and implicitly connect different semantic IDs for the same item, an augmentation strategy is introduced: each personalized semantic ID in a training sequence is randomly replaced with another semantic ID corresponding to the same item with a probability $\gamma$ (a minimal sketch follows). For example, if an item has two semantic IDs and the first was chosen as the personalized one, there is a chance it might be swapped to the second. Even if the augmented sequence does not always reflect the most accurate user interpretation, it still represents a valid interaction possibility and helps the model generalize across different facets of an item.
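A minimal sketch of the γ-augmentation, under toy assumptions about the item-to-semantic-ID table:

```python
import random
from typing import Dict, List, Tuple

SemanticID = Tuple[int, ...]

def augment(seq: List[Tuple[int, SemanticID]],
            sids_of: Dict[int, List[SemanticID]],
            gamma: float) -> List[SemanticID]:
    """With probability gamma, swap each action's semantic ID for another
    valid semantic ID of the same item."""
    out = []
    for item, sid in seq:
        alternatives = [s for s in sids_of[item] if s != sid]
        if alternatives and random.random() < gamma:
            sid = random.choice(alternatives)
        out.append(sid)
    return out

sids_of = {7: [(1, 4, 0), (2, 8, 0)]}        # item 7 has two semantic IDs
random.seed(0)
print(augment([(7, (1, 4, 0))] * 3, sids_of, gamma=0.5))
```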
- Multi-Facet Semantic ID Generation: During inference, Pctx utilizes beam search (following Rajput et al., 2023; Zheng et al., 2024) to generate semantic ID predictions. Since an item can have multiple personalized semantic IDs, different decoding paths in beam search can lead to distinct personalized semantic IDs for the same underlying item. Each of these predicted semantic IDs has an associated probability, representing the likelihood of a user perceiving a potential next item from a specific facet or interpretation. The probabilities for different semantic IDs of the same item are then aggregated to obtain the final next-item probabilities (a minimal sketch of this aggregation follows). This multi-facet semantic ID generation not only provides the predicted items but also offers insights into the likelihoods of various user interpretations, thereby improving the explainability of the recommendation process.
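A minimal sketch of the probability aggregation over decoded beams; the beam probabilities and the semantic-ID-to-item table are toy assumptions:

```python
from collections import defaultdict
from typing import Dict, Tuple

def aggregate_beams(beam_probs: Dict[Tuple[int, ...], float],
                    item_of: Dict[Tuple[int, ...], int]) -> Dict[int, float]:
    """Sum the probabilities of all decoded semantic IDs that map to the
    same underlying item to obtain final next-item scores."""
    scores: Dict[int, float] = defaultdict(float)
    for sid, prob in beam_probs.items():
        if sid in item_of:               # skip invalid decoding paths
            scores[item_of[sid]] += prob
    return dict(scores)

item_of = {(1, 4, 0): 7, (2, 8, 0): 7, (5, 5, 0): 9}  # two facets of item 7
beams = {(1, 4, 0): 0.30, (2, 8, 0): 0.25, (5, 5, 0): 0.35}
print(aggregate_beams(beams, item_of))   # {7: 0.55, 9: 0.35}
```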
4.2.4. Determination of the Number of Centroids Per Item (Appendix B)
This section explains how Pctx determines $C_{v_i}$, the number of context centroids for each item $v_i$. The goal is to assign more centroids to items with higher user interpretation diversity while preventing excessive splitting that leads to sparsity. The strategy avoids a simple linear scaling with interaction count, which could over-allocate to popular items and under-allocate to rare ones.
The proposed strategy has three parts (a simplified sketch follows this list):
- Interaction-aware Grouping:
  - All items are sorted in ascending order based on their number of context representations (i.e., how many times they appear in user histories).
  - These sorted items are then partitioned into $T$ groups.
  - The proportion of items assigned to each group is determined by sampling discrete support points from a normalized Gamma distribution over the integer interval $[1, T]$.
  - The shape parameter of the Gamma distribution controls the skewness of this allocation: a smaller shape parameter favors items in the tail (less popular items), while a larger one allocates more capacity towards head items (more popular items).
  - This ensures that items with similar interaction volumes are grouped together.
- Group-based Centroid Allocation:
  - Each group $t$ (where $t \in \{1, 2, \ldots, T\}$) is assigned a predefined number of centroids based on an arithmetic progression.
  - The number of centroids for group $t$, denoted $\overline{C}^{(t)}$, is calculated as:
$
\overline{C}^{(t)} = C_{\mathrm{start}} + (t - 1) \cdot \delta
$
where:
    - $C_{\mathrm{start}}$ is the starting number of centroids (for the first group).
    - $\delta$ is a small step size, which determines how much the number of centroids increases from one group to the next.
  - All items within the same group are assigned the same number of centroids, $\overline{C}^{(t)}$. This provides a smooth scaling of capacity and ensures consistent treatment for items with similar interaction levels.
- Practical Adjustment:
  - For rare items (those with fewer context representations than their initially assigned $\overline{C}^{(t)}$), a simplification is applied: instead of attempting to form multiple clusters from insufficient data, $C_{v_i}$ is set to 1.
  - Clustering is then performed with a single centroid for these rare items. This provides a robust solution for context condensation in the presence of long-tailed data (many rare items), balancing specialization (multiple centroids for diverse items) and generalization (single centroid for less diverse or rare items).
4.3. Discussion
Pctx is positioned within the landscape of action tokenization paradigms in Generative Recommendation:
- Static Tokenizers (TIGER, LC-Rec): These assign fixed semantic IDs to each item. As discussed, this imposes a universal standard of item similarity due to the autoregressive nature of GR models, limiting their representational power. Pctx directly addresses this by making tokenization context-dependent.
- Multi-Identifier Tokenizers (MTGRec): While appearing to offer multiple semantic IDs per item, MTGRec's approach is primarily a data augmentation strategy. It samples semantic IDs from different model states but does not inherently link these multiple IDs to distinct user interpretations based on dynamic user context. Pctx explicitly ensures that each of the multiple semantic IDs for an item reflects a distinct user interpretation.
- Context-Aware Tokenizers (ActionPiece): ActionPiece tokenizes items based on their surrounding action context. Pctx belongs to this family but extends the concept of context. ActionPiece typically considers only adjacent actions (local context), which may not fully capture a user's personality. Pctx incorporates the entire user interaction history, allowing it to capture personalities reflected in longer-term contexts. This makes Pctx a more comprehensively personalized context-aware tokenizer.
5. Experimental Setup
5.1. Datasets
The experiments in this paper are conducted on three public datasets derived from the latest Amazon Reviews dataset (Hou et al., 2024). These datasets fall into different product categories, allowing for evaluation across diverse domains.
-
Source: Latest Amazon Reviews dataset.
-
Categories Used:
- "Musical Instruments" (Instrument)
- "Industrial & Scientific" (Scientific)
- "Video Games" (Game)
-
Preprocessing Pipeline (following Rajput et al., 2023; Zhou et al., 2020):
- Users and items with fewer than five interactions are excluded to mitigate data sparsity and noise.
- User-specific interaction histories are constructed and ordered chronologically.
- The maximum sequence length for user interactions is capped at 20 items.
-
Characteristics and Domain: These datasets represent real-world e-commerce interactions, covering diverse product types. "Musical Instruments" and "Video Games" are consumer-oriented, often reflecting personal hobbies and interests, while "Industrial & Scientific" might involve more professional or specialized purchasing patterns. This diversity helps validate the robustness of the proposed method.
The following are the results from Table 5 of the original paper:
| Datasets | Users | Items | Interactions | Sparsity | AvgLen |
| --- | --- | --- | --- | --- | --- |
| Instrument | 57,439 | 24,587 | 511,836 | 99.964% | 8.91 |
| Scientific | 50,985 | 25,848 | 412,947 | 99.969% | 8.10 |
| Game | 94,762 | 25,612 | 814,586 | 99.966% | 8.60 |

- Users: Number of unique users in the dataset.
- Items: Number of unique items in the dataset.
- Interactions: Total number of recorded interactions between users and items.
- Sparsity: A measure of how few interactions there are compared to all possible interactions (the User × Item matrix). A sparsity close to 100% indicates a very sparse dataset, which is common in recommendation. For example, 99.964% sparsity means only 0.036% of all possible user-item interactions have occurred (a small verification sketch follows).
- AvgLen: Average length of user interaction sequences after preprocessing.
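As a quick check of how the Sparsity column is computed, this snippet reproduces the Instrument figure from the table:

```python
def sparsity(users: int, items: int, interactions: int) -> float:
    """Fraction of the user-item matrix with no observed interaction."""
    return 1.0 - interactions / (users * items)

# Instrument row of Table 5: matches the reported 99.964%
print(f"{sparsity(57_439, 24_587, 511_836):.3%}")  # 99.964%
```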
5.2. Evaluation Metrics
The paper uses two widely adopted metrics for evaluating recommendation performance, particularly in ranking tasks: Recall@K and Normalized Discounted Cumulative Gain@K (NDCG@K). $K$ is set to 5 and 10 in the experiments.
5.2.1. Recall@K
- Conceptual Definition: Recall@K measures the proportion of relevant items that are successfully retrieved within the top $K$ recommendations. In the context of sequential recommendation (specifically, the leave-one-out setting used here), it indicates whether the single ground-truth next item is present in the list of the top $K$ predicted items. A higher Recall@K implies that the model is better at identifying the relevant items.
- Mathematical Formula: For a single user $u$ with a single ground-truth relevant item $GT_u$, Recall@K is calculated as:
$
\text{Recall@K}_u = \begin{cases} 1 & \text{if } GT_u \in \text{Top-K recommendations for user } u \\ 0 & \text{otherwise} \end{cases}
$
When averaged over users, the overall Recall@K is:
$
\text{Recall@K} = \frac{1}{N} \sum_{u=1}^{N} \text{Recall@K}_u
$
- Symbol Explanation:
  - $N$: The total number of users in the evaluation set.
  - $u$: An individual user.
  - $GT_u$: The ground-truth relevant item for user $u$ (in this paper's setting, the next item in the user's sequence).
  - Top-K recommendations: The list of the top $K$ items recommended by the model for user $u$.
  - $\in$: Denotes membership (i.e., whether an item is present in a set or list).
5.2.2. Normalized Discounted Cumulative Gain@K (NDCG@K)
- Conceptual Definition: NDCG@K is a measure of ranking quality that accounts for the position of relevant items in the recommendation list. It assigns higher scores to relevant items that appear earlier in the list. The "Discounted Cumulative Gain" part sums the utility of items in the list, penalizing items at lower ranks. "Normalized" means it is divided by the Ideal DCG (the DCG of a perfectly ordered list), ensuring the score is between 0 and 1 regardless of the number of relevant items. A higher NDCG@K indicates that relevant items are not only retrieved but also ranked highly.
- Mathematical Formula: First, Discounted Cumulative Gain (DCG@K) is calculated:
$
\text{DCG@K} = \sum_{i=1}^{K} \frac{2^{\text{rel}_i} - 1}{\log_2(i+1)}
$
Then, the Ideal DCG (IDCG@K) is calculated for a perfectly ordered list of relevant items:
$
\text{IDCG@K} = \sum_{i=1}^{|\text{Relevant items}|} \frac{2^{\text{rel}_{\text{ideal}, i}} - 1}{\log_2(i+1)}
$
Finally, NDCG@K is:
$
\text{NDCG@K} = \frac{\text{DCG@K}}{\text{IDCG@K}}
$
In the leave-one-out evaluation used here, there is only one ground-truth relevant item ($\text{rel} = 1$). If it appears at position $p \le K$ in the recommended list:
$
\text{DCG@K}_u = \frac{1}{\log_2(p+1)}, \qquad \text{IDCG@K}_u = \frac{1}{\log_2(1+1)} = 1
$
So for a single user, if $GT_u$ is at position $p$:
$
\text{NDCG@K}_u = \frac{1}{\log_2(p+1)}
$
Otherwise, $\text{NDCG@K}_u = 0$. The overall NDCG@K is the average over all users:
$
\text{NDCG@K} = \frac{1}{N} \sum_{u=1}^{N} \text{NDCG@K}_u
$
- Symbol Explanation:
  - $K$: The number of top recommendations considered.
  - $\text{rel}_i$: The relevance score of the item at position $i$ in the recommended list. In binary relevance, $\text{rel}_i$ is 1 if the item is relevant and 0 otherwise. In this paper's setting, it is 1 for the ground-truth next item and 0 for others.
  - $i$: The rank (position) of an item in the recommendation list.
  - $\text{rel}_{\text{ideal}, i}$: The relevance score of the item at position $i$ in the ideal (perfectly ordered) recommendation list.
  - $p$: The rank (position) of the single ground-truth relevant item in the recommended list.
  - $\log_2(i+1)$: The logarithmic discount factor, which reduces the importance of relevant items found at lower ranks.
  - $N$: The total number of users.

A minimal sketch computing both metrics for one user follows.
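The promised sketch: leave-one-out Recall@K and NDCG@K for a single user, with a toy ranked list.

```python
import math
from typing import List, Tuple

def recall_ndcg_at_k(ranked: List[int], target: int, k: int) -> Tuple[float, float]:
    """Leave-one-out metrics for one user: with a single ground-truth item,
    IDCG@K = 1, so NDCG@K reduces to 1 / log2(p + 1) at hit rank p."""
    topk = ranked[:k]
    if target not in topk:
        return 0.0, 0.0
    p = topk.index(target) + 1        # 1-based rank of the ground truth
    return 1.0, 1.0 / math.log2(p + 1)

print(recall_ndcg_at_k([5, 3, 9, 7], target=9, k=3))  # (1.0, 0.5)
```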
5.3. Baselines
The paper compares Pctx against a comprehensive set of baselines, categorized into Conventional Sequential Recommendation and Generative Recommendation models.
5.3.1. Conventional Sequential Recommendation
These models predict the next item based on its unique ID.
- Caser (Tang & Wang, 2018): Uses convolutional neural networks to capture sequential patterns.
- HGN (Ma et al., 2019): Leverages hierarchical gating networks for user preference representation.
- GRU4Rec (Hidasi et al., 2016): An early deep learning model using Gated Recurrent Units for session-based recommendations.
- BERT4Rec (Sun et al., 2019): Applies a bidirectional Transformer encoder with masked item prediction.
- SASRec (Kang & McAuley, 2018): A popular self-attentive sequential recommendation model.
- FMLP-Rec (Zhou et al., 2022): A fully MLP-based framework using learnable filters.
- HSTU (Zhai et al., 2024): Incorporates action-timestamp signals and hierarchical sequential transducers (still ID-based despite being recent).
- DuoRec (Qiu et al., 2022): Addresses representation collapse using contrastive learning (notably, Pctx uses DuoRec as its auxiliary context encoder).
- FDSA (Zhang et al., 2019): Employs a dual-stream self-attention design.
- S3-Rec (Zhou et al., 2020): Improves representation learning with self-supervised objectives.
5.3.2. Generative Recommendation
These models utilize action tokenization.
- TIGER (Rajput et al., 2023): A pioneering GR model using RQ-VAE to discretize item embeddings into semantic IDs (static tokenizer).
- LETTER (Wang et al., 2024a): Extends TIGER by injecting collaborative information and diversity constraints (static tokenizer).
- ActionPiece (Hou et al., 2025b): Proposes a context-aware tokenization framework, but limited to local context (adjacent actions).

These baselines are representative as they cover both traditional and modern sequential recommendation approaches, as well as the latest advancements in generative recommendation, including those with context-aware capabilities. This allows for a thorough comparison of Pctx's personalized context approach.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate that Pctx consistently outperforms all baseline methods across all three datasets and evaluation metrics (Recall@5, Recall@10, NDCG@5, NDCG@10). This strongly validates the effectiveness of the proposed personalized context-aware tokenization approach.
The following are the results from Table 1 of the original paper:
| Methods | Instrument | | | | Scientific | | | | Game | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | R@5 | R@10 | N@5 | N@10 | R@5 | R@10 | N@5 | N@10 | R@5 | R@10 | N@5 | N@10 |
| Caser | 0.0241 | 0.0386 | 0.0151 | 0.0197 | 0.0159 | 0.0257 | 0.0101 | 0.0132 | 0.0330 | 0.0553 | 0.0209 | 0.0281 |
| HGN | 0.0321 | 0.0517 | 0.0202 | 0.0265 | 0.0212 | 0.0351 | 0.0131 | 0.0176 | 0.0424 | 0.0687 | 0.0281 | 0.0356 |
| GRU4Rec | 0.0324 | 0.0501 | 0.0209 | 0.0266 | 0.0202 | 0.0338 | 0.0129 | 0.0173 | 0.0499 | 0.0799 | 0.0320 | 0.0416 |
| BERT4Rec | 0.0307 | 0.0485 | 0.0195 | 0.0252 | 0.0186 | 0.0296 | 0.0119 | 0.0155 | 0.0460 | 0.0735 | 0.0298 | 0.0386 |
| SASRec | 0.0333 | 0.0523 | 0.0213 | 0.0274 | 0.0259 | 0.0412 | 0.0150 | 0.0199 | 0.0535 | 0.0847 | 0.0331 | 0.0438 |
| FMLP-Rec | 0.0339 | 0.0536 | 0.0218 | 0.0282 | 0.0269 | 0.0422 | 0.0155 | 0.0204 | 0.0528 | 0.0857 | 0.0338 | 0.0444 |
| HSTU | 0.0343 | 0.0577 | 0.0191 | 0.0271 | 0.0271 | 0.0429 | 0.0147 | 0.0198 | 0.0578 | 0.0903 | 0.0334 | 0.0442 |
| DuoRec | 0.0347 | 0.0547 | 0.0227 | 0.0291 | 0.0234 | 0.0389 | 0.0146 | 0.0196 | 0.0524 | 0.0827 | 0.0336 | 0.0433 |
| FDSA | 0.0347 | 0.0545 | 0.0230 | 0.0293 | 0.0262 | 0.0421 | 0.0169 | 0.0213 | 0.0544 | 0.0852 | 0.0361 | 0.0448 |
| S3-Rec | 0.0317 | 0.0496 | 0.0199 | 0.0257 | 0.0263 | 0.0418 | 0.0171 | 0.0219 | 0.0485 | 0.0769 | 0.0315 | 0.0406 |
| TIGER | 0.0370 | 0.0564 | 0.0244 | 0.0306 | 0.0264 | 0.0422 | 0.0175 | 0.0226 | 0.0559 | 0.0868 | 0.0366 | 0.0467 |
| LETTER | 0.0372 | 0.0580 | 0.0246 | 0.0313 | 0.0279 | 0.0435 | 0.0182 | 0.0232 | 0.0563 | 0.0877 | 0.0372 | 0.0473 |
| ActionPiece | 0.0383 | 0.0615 | 0.0243 | 0.0318 | 0.0284 | 0.0452 | 0.0182 | 0.0236 | 0.0591 | 0.0927 | 0.0382 | 0.0490 |
| Pctx | 0.0419 | 0.0655 | 0.0275 | 0.0350 | 0.0323 | 0.0504 | 0.0205 | 0.0263 | 0.0638 | 0.0981 | 0.0416 | 0.0527 |
| Improvements | +9.40% | +6.50% | +11.79% | +10.06% | +13.73% | +11.50% | +12.64% | +11.44% | +7.95% | +5.82% | +8.90% | +7.55% |
Key Observations:
- GR Models vs. ID-based Models: Generally, Generative Recommendation (GR) models (TIGER, LETTER, ActionPiece, Pctx) achieve superior performance compared to conventional ID-based sequential recommendation approaches. This confirms the benefits of action tokenization and the generative retrieval paradigm for improving recommendation quality.
- ActionPiece's Strength: Among the baselines, ActionPiece demonstrates the best performance. This indicates that incorporating context-aware action tokenization, even if limited to local context, provides stronger expressive power than static tokenization methods like TIGER and LETTER.
- Pctx's Superiority: Pctx significantly outperforms all baselines on all four metrics (Recall@5, Recall@10, NDCG@5, NDCG@10) across all three datasets.
  - The improvements are substantial, reaching up to 11.44% in NDCG@10 over the best-performing baseline (ActionPiece) on the "Scientific" dataset. This highlights Pctx's ability to provide more personalized and accurate predictions.
  - The "Scientific" dataset shows the largest percentage improvements for Pctx, suggesting that personalized context might be particularly impactful in domains with potentially more diverse or specialized interpretations of items.
- Reason for Pctx's Success: The paper attributes Pctx's success to its unique design as the first paradigm to introduce a personalized context-aware tokenizer for GR. By allowing the same action to be tokenized into different personalized semantic IDs based on a user's entire interaction history, Pctx enables the model to capture diverse user interpretations and generate more personalized recommendations, a fundamental advantage over existing approaches.
6.2. Ablation Studies / Parameter Analysis
To understand the contribution of each component within Pctx, an ablation study was conducted.
The following are the results from Table 2 of the original paper:
| Variants | Instrument | | | | Scientific | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | R@5 | R@10 | N@5 | N@10 | R@5 | R@10 | N@5 | N@10 |
| Personalized context | ||||||||
| (1.1) with SASRec | 0.0395 | 0.0612 | 0.0261 | 0.0330 | 0.0294 | 0.0458 | 0.0190 | 0.0243 |
| (1.2) with SASRec Item Embedding | 0.0360 | 0.0573 | 0.0231 | 0.0300 | 0.0281 | 0.0448 | 0.0182 | 0.0235 |
| (1.3) with DuoRec Item Embedding | 0.0378 | 0.0594 | 0.0249 | 0.0318 | 0.0278 | 0.0445 | 0.0180 | 0.0235 |
| TIGER | 0.0370 | 0.0564 | 0.0244 | 0.0306 | 0.0264 | 0.0422 | 0.0175 | 0.0226 |
| Tokenization | ||||||||
| (2.1) w/o Clustering | 0.0386 | 0.0596 | 0.0249 | 0.0316 | 0.0295 | 0.0462 | 0.0192 | 0.0245 |
| (2.2) w/o Redundant SID Merging | 0.0270 | 0.0415 | 0.0175 | 0.0221 | 0.0201 | 0.0316 | 0.0133 | 0.0170 |
| Model training and inference | ||||||||
| (3.1) w/o Data Augmentation | 0.0366 | 0.0577 | 0.0240 | 0.0308 | 0.0291 | 0.0457 | 0.0188 | 0.0242 |
| (3.2) w/o Multi-Facet Generation | 0.0376 | 0.0594 | 0.0242 | 0.0312 | 0.0282 | 0.0449 | 0.0181 | 0.0235 |
| Pctx | 0.0419 | 0.0655 | 0.0275 | 0.0350 | 0.0323 | 0.0504 | 0.0205 | 0.0263 |
6.2.1. Study of Personalized Context
This part investigates the impact of the source and nature of personalized context representations.
- (a) Pctx vs. (1.1) with SASRec: Pctx (using DuoRec for context encoding) performs better than (1.1) with SASRec (using SASRec). This suggests that DuoRec, which uses contrastive learning to make sequence representations more distinguishable, is more effective for generating rich user context representations suitable for personalization, even if SASRec might sometimes perform better on the next-item prediction task itself. The ability to capture distinct user personalities for tokenization is crucial.
- (b) Pctx vs. (1.2) with SASRec Item Embedding & (1.3) with DuoRec Item Embedding: Variants using item embeddings from pre-trained models (static representations) show larger performance degradation compared to using sequence representations. This confirms that incorporating actual user context (sequential interactions) is vital, as item embeddings alone cannot capture dynamic user perspectives.
6.2.2. Effects of Tokenization
This section examines the impact of the strategies for managing personalized semantic IDs.
- (2.1) w/o Clustering: Removing the clustering step (which condenses context representations into centroids) leads to a performance drop. This indicates that context condensation is important for creating meaningful and manageable semantic ID prototypes, preventing over-personalization that could lead to sparsity.
- (2.2) w/o Redundant SID Merging: Disabling the redundant semantic ID merging strategy results in a more severe performance drop. This emphasizes the importance of managing the number of semantic IDs. Without merging, the system likely generates too many sparse semantic IDs, hindering the generalization ability of the GR model. The merging strategy is crucial for striking a balance between personalization and generalizability.
6.2.3. Model Training and Inference
This part evaluates the strategies applied during the GR model's training and inference phases.
- (3.1) w/o Data Augmentation: When data augmentation (randomly replacing personalized semantic IDs with other valid semantic IDs for the same item) is removed, there is a clear performance drop. This confirms that the augmentation strategy is effective in enhancing data diversity, implicitly connecting different semantic IDs associated with the same item, and thereby improving the generalization ability of the GR model.
- (3.2) w/o Multi-Facet Generation: If the model is restricted to a single decoding path (a single semantic ID) during inference instead of leveraging multi-facet generation (considering multiple candidate semantic IDs and aggregating their probabilities), performance also drops. This highlights the importance of allowing the GR model to decode multiple potential user interpretations during prediction, reflecting the nuanced nature of user preferences.
6.3. In-depth Analysis
6.3.1. Model Ensemble
To ensure Pctx's improvements are not simply due to combining strengths of existing models, an ensemble analysis was performed. SASRec and DuoRec predictions were ensembled with TIGER using a voting scheme.
The following are the results from Table 3 of the original paper:
| Methods | Instrument | | | | Scientific | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Recall@5 | Recall@10 | NDCG@5 | NDCG@10 | Recall@5 | Recall@10 | NDCG@5 | NDCG@10 |
| SASRec | 0.0333 | 0.0523 | 0.0213 | 0.0274 | 0.0259 | 0.0412 | 0.0150 | 0.0199 |
| DuoRec | 0.0347 | 0.0547 | 0.0227 | 0.0291 | 0.0234 | 0.0389 | 0.0146 | 0.0196 |
| TIGER | 0.0370 | 0.0564 | 0.0244 | 0.0306 | 0.0264 | 0.0422 | 0.0175 | 0.0226 |
| TIGER+SASRec | 0.0374 | 0.0582 | 0.0245 | 0.0311 | 0.0268 | 0.0427 | 0.0169 | 0.0221 |
| TIGER+DuoRec | 0.0376 | 0.0586 | 0.0247 | 0.0314 | 0.0258 | 0.0418 | 0.0163 | 0.0215 |
| Pctx | 0.0419 | 0.0655 | 0.0275 | 0.0350 | 0.0323 | 0.0504 | 0.0205 | 0.0263 |
Key Findings:
- Ensembled models (e.g., TIGER+SASRec, TIGER+DuoRec) generally outperform their individual components, suggesting that the different models capture complementary information.
- However, even the best ensembled results remain significantly below Pctx's performance. This confirms that Pctx is not merely a simple combination of existing models; its fundamental innovation, personalized semantic IDs, expands the capabilities of GR models in a unique way.
6.3.2. Study of the Number of Personalized Semantic IDs
This analysis focuses on the distribution of personalized semantic IDs per item, illustrated in Figure 3.
Figure 3 (bar chart, log scale): Distribution of the number of personalized semantic IDs (SIDs) per item on the Scientific and Instrument datasets, comparing Pctx with TIGER. TIGER (static tokenizer) assigns exactly one semantic ID per item, whereas Pctx assigns a distribution across multiple semantic IDs.
Key Observations:
- Static vs. Personalized: TIGER, as a static tokenizer, assigns only one semantic ID to each item, completely hindering personalization. In contrast, Pctx assigns multiple personalized semantic IDs to the same item.
- Distribution: In Pctx, the majority of items are assigned two personalized semantic IDs, followed by one, then three, with a smaller fraction exceeding four.
- Single SID Items: Items with only a single semantic ID are typically infrequent or long-tail entities with limited interactions, offering restricted diversity in user interpretations.
- Redundancy Management: The number of items with an excessive number of personalized IDs remains small. This is attributed to the redundant semantic ID merging strategy, which effectively consolidates redundant representations and prevents over-personalization and sparsity.
6.3.3. Parameter Analysis (from Appendix D.1)
6.3.3.1. Performance w.r.t. the Augmentation Probability
Figure 5 illustrates how NDCG@10 changes with varying augmentation probability $\gamma$.
Figure 5 (line charts): NDCG@10 on the Instrument and Scientific datasets as the augmentation probability $\gamma$ varies from 0.0 to 0.9, reflecting the model's sensitivity to $\gamma$.
Key Insights:
- Effectiveness of Augmentation: Setting $\gamma = 0$ (disabling data augmentation) results in performance notably worse than most configurations with non-zero $\gamma$. This validates the effectiveness of the proposed data augmentation strategy in enhancing generalization.
- Critical Hyperparameter: $\gamma$ is a critical hyperparameter; inappropriate settings can lead to significant performance degradation.
- Stable Range: Performance remains relatively stable and within an acceptable margin when $\gamma$ is in the range of 0.3 to 0.7.
- Extreme Values: Excessively small values of $\gamma$ lead to underwhelming outcomes due to insufficient augmentation. Overly large values introduce instability and may cause extreme performance fluctuations, suggesting a trade-off where too much augmentation can dilute the core personalized signals.
6.3.3.2. Performance w.r.t. the Frequency Threshold
Figure 6 analyzes the NDCG@10 performance and the percentage of semantic IDs in use as the frequency threshold varies.
Figure 6: NDCG@10 (lines) and the percentage of utilized semantic IDs relative to the static tokenizer (bars) on the Instrument and Scientific datasets, as the frequency threshold increases.
Main Observations:
- Semantic ID Count: As the frequency threshold increases, the number of utilized semantic IDs decreases monotonically, because higher thresholds cause more infrequent semantic IDs to be merged. The total number of semantic IDs does not grow excessively, as most items are low-frequency.
- Performance Trend: NDCG@10 (and the other evaluation metrics) initially improves as the threshold increases, but begins to decline once the threshold exceeds approximately 0.2. The best performance is observed around 0.2 on both datasets.
- Sparsity vs. Personalization:
  - An excessive number of personalized semantic IDs (when the threshold is too low) leads to poor performance due to sparsity issues.
  - While a higher threshold alleviates sparsity by merging infrequent IDs, setting it too high sacrifices too much personalization, leading to a decline in performance.
- Balance: Varying the frequency threshold essentially embodies a crucial balance between sparsity reduction (improving generalizability) and personalization preservation.
6.3.4. Popularity and Personalization (from Appendix D.3)
This section investigates the relationship between an item's position in the input sequence and the probability of it being tokenized as its most popular semantic ID. The popular rate is defined as the mean probability of an item being tokenized as its popular semantic ID at a given position.
Figure 7 illustrates this analysis across different models/variants.
The heatmaps display the popular rate, i.e., the probability of an item being tokenized as its most popular semantic ID, at each interaction-sequence position for the different model variants on the Instrument, Scientific, and Game datasets; lighter colors indicate lower popular rates.
Key Observations:
- TIGER (Static Tokenizer): Because TIGER uses a static tokenizer, every item is always tokenized into its single, fixed semantic ID (which is, by definition, its popular semantic ID). Thus, the popular rate is consistently 1 (or close to 1, depicted by dark blue/purple) across all sequence positions, independent of context.
- w/o Data Augmentation (Pctx with $\gamma = 0$): In this variant, as the sequence position increases, the probability of tokenizing an item with its popular semantic ID decreases (lighter colors appear). This confirms that as more user context accumulates, the influence of personalized context rises, making it more likely for the item to be tokenized into a more personalized semantic ID reflecting the user's evolving preferences rather than its general "popular" interpretation.
- Pctx with augmentation probability $\gamma = 1$: When the augmentation probability is set to 1, items are equally likely to be tokenized with any of their possible semantic IDs. This leads to a uniform distribution of semantic IDs across the sequence, so the popular rate is consistently low and uniform across all positions (lightest colors). This variant harms personalization by excessively introducing noise and losing the contextual signal.

These findings strongly support that Pctx's personalized context-aware tokenizer adaptively tokenizes items based on user context, moving beyond static representations and enabling more personalized representations.
6.3.5. Explainability (from Appendix D.4 and D.5)
To assess whether the personalized semantic IDs generated by Pctx correspond to human-interpretable user preferences, an explainability experiment was conducted using GPT-4o (a Large Language Model).
The following are the results from Table 6 of the original paper:
| Methods | Instrument (Acc.) | Scientific (Acc.) | Game (Acc.) |
| :--- | :--- | :--- | :--- |
| with SASRec | 0.8333 | 0.8030 | 0.8240 |
| Pctx | 0.8533 | 0.8534 | 0.8690 |
Experimental Design:
- Item Selection: Items with at least two personalized semantic IDs were randomly selected.
- Preference Summarization: For each selected item, the user interaction sequences associated with each of its semantic IDs were grouped. GPT-4o was then used to summarize the underlying user preference for each semantic ID into keywords and a descriptive summary.
- Accuracy Assessment: For each selected item, 50 test sequences where the item was the target were randomly sampled. For each sequence, the semantic ID that appeared first in Pctx's prediction list was identified. GPT-4o was then prompted to assess whether the summarized preference for this top-ranked semantic ID aligned better with the sequence context than the preferences of the item's other semantic IDs. A binary "Yes" or "No" judgment, along with an explanation, was obtained.
- Metric: Accuracy was defined as the proportion of "Yes" responses. This process was repeated for 25 items per dataset, totaling 1250 samples per dataset. A sketch of this evaluation loop follows below.
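A sketch of the evaluation loop described above. Here `ask_gpt4o` stands in for a real LLM API call and `predict_top_sid` for Pctx's ranked prediction; both are placeholders, and the prompt wording is illustrative rather than the authors' exact template:

```python
def explainability_accuracy(items, test_seqs, pref_summary, predict_top_sid, ask_gpt4o):
    """Accuracy metric from the explainability study: the fraction of test
    sequences where the judge answers "Yes", i.e., the top-ranked semantic
    ID's preference summary fits the sequence context best.

    pref_summary[item][sid] -> textual preference summary for that semantic ID.
    ask_gpt4o: placeholder for an LLM API call, callable(str) -> str.
    """
    yes, total = 0, 0
    for item in items:                        # 25 items per dataset in the paper
        for seq in test_seqs[item]:           # 50 sampled target sequences each
            top_sid = predict_top_sid(seq, item)
            others = [s for s in pref_summary[item] if s != top_sid]
            prompt = (
                f"User history: {seq}\n"
                f"Top-ranked ID's preference: {pref_summary[item][top_sid]}\n"
                f"Other IDs' preferences: {[pref_summary[item][s] for s in others]}\n"
                "Does the top-ranked preference align better with the history? "
                "Answer Yes or No, with an explanation."
            )
            yes += ask_gpt4o(prompt).strip().lower().startswith("yes")
            total += 1
    return yes / total                        # reported as Accuracy in Table 6
```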
Key Findings:
- Pctx achieved high accuracy (over 0.85 across all three datasets), indicating that its personalized semantic IDs indeed capture diverse and coherent user preferences in a human-interpretable manner.
- The variant with SASRec (using SASRec as the auxiliary model) underperformed Pctx but still achieved high accuracy (above 0.80), further reinforcing that the quality of the context representation impacts the interpretability of the semantic IDs.
- The high accuracy demonstrates that Pctx effectively aligns its predictions with these learned preferences, validating the interpretability of its tokenization mechanism.
6.3.5.1. Case Study (from Figure 4 and Appendix D.5)
Figure 4 and the detailed case in Appendix D.5 provide a concrete example of Pctx's personalized tokenization.
The case study illustrates how Pctx assigns different semantic IDs to the same item, StarCraft II: Heart of the Swarm, based on different user contexts: a story-driven game player (upper row) and a real-time strategy game player (lower row).
Scenario: The item StarCraft II: Heart of the Swarm is a hybrid game, appealing to both story-driven and real-time strategy (RTS) players.
- User 1 (Story-driven player): This user's history includes items like Tomb Raider, The Last of Us, and Saints Row: The Third (emphasizing narrative and adventure). For this user, Pctx tokenizes StarCraft II into the semantic ID [53, 395, 576, 770].
- User 2 (RTS player): This user's history includes items like Warcraft II, Command & Conquer, and Company of Heroes (emphasizing strategy, management, and competitive gameplay). For this user, Pctx tokenizes StarCraft II into the semantic ID [53, 412, 576, 770]. Note that the two IDs share the first, third, and fourth tokens and differ only in the second.

GPT-4o Explainability Example: An example of the GPT-4o prompt and response is provided for a user whose historical interactions primarily consist of RTS games (Command & Conquer, Company of Heroes, World in Conflict).
- Top-ranked Semantic ID's Preference: Keywords like "Gaming, RTS, Adventure, Strategy, Multiplayer, Fantasy, Competitive, Role-playing, Decision-making, Management," with a summary emphasizing competitive RTS play and strategizing.
- Other Semantic ID's Preference: Keywords like "Adventure, Narrative, Multiplayer, Open-world, Action, Fantasy, Survival, Shooter, Strategy, Customization," with a summary emphasizing narrative-driven and open-world experiences.
- GPT-4o's Judgment: Yes. The LLM confirms that the historical sequence strongly aligns with the top-ranked semantic ID's preference for RTS games, confirming Pctx's ability to adaptively tokenize and predict based on context.

This case study vividly demonstrates Pctx's capability to adaptively tokenize the same action (StarCraft II) into distinct personalized semantic IDs under different user contexts, thereby enabling the GR model to produce more user-specific and interpretable predictions.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduces Pctx, a pioneering personalized context-aware tokenizer designed for Generative Recommendation (GR) models. Pctx addresses a critical limitation of existing GR approaches, which typically rely on static, non-personalized action tokenization, implicitly assuming a universal standard of item similarity. By integrating a user's historical interactions into the semantic ID generation process, Pctx allows the same item to be represented by different semantic IDs under varying user contexts. This novel design effectively captures the diverse interpretations users may have for an item, enhancing the GR model's ability to generate truly personalized predictions. The method achieves a crucial balance between generalizability and personalization through strategies like adaptive clustering, redundant semantic ID merging, and data augmentation. Extensive experiments on three public datasets demonstrate Pctx's superior performance, yielding up to an 11.44% improvement in NDCG@10 over non-personalized tokenization baselines. This work is significant as it represents the first successful attempt to introduce a personalized action tokenizer within the Generative Recommendation paradigm, paving the way for more nuanced and user-centric recommendation systems.
7.2. Limitations & Future Work
The authors identify several directions for future research:
- Scaling Effective Semantic IDs: Investigating approaches for scaling the generation and management of effective semantic IDs within a broader semantic ID space remains an open challenge. As the number of items and potential interpretations grows, efficiently handling and learning from a vast array of personalized semantic IDs will be crucial.
- End-to-End Personalized Action Tokenizers: Developing fully end-to-end personalized action tokenizers is another future goal. Currently, Pctx relies on an auxiliary model (DuoRec) for context encoding and a separate RQ-VAE for quantization; an end-to-end learning framework could potentially optimize the entire tokenization process more cohesively. (A minimal sketch of the residual quantization step appears below.)
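For reference, the discretization step of an RQ-VAE reduces to residual quantization: each codebook level quantizes what the previous levels left unexplained, yielding one token per level. This is a generic sketch of that standard technique, not the paper's exact configuration (codebook sizes and the learned encoder/decoder and training losses are omitted):

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Residual quantization: each level encodes the residual left by the
    previous levels, producing one token per level.

    x: (d,) context-aware representation to be tokenized.
    codebooks: list of (K, d) arrays, one codebook per semantic-ID level.
    """
    residual, tokens = x.astype(float).copy(), []
    for codebook in codebooks:
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        tokens.append(idx)
        residual = residual - codebook[idx]   # pass the remainder onward
    return tuple(tokens)   # e.g., a 4-token semantic ID like (53, 412, 576, 770)
```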
7.3. Personal Insights & Critique
The Pctx paper presents a highly innovative and necessary advancement in Generative Recommendation. The core idea of moving beyond static item representations to context-aware, personalized semantic IDs is a fundamental shift that significantly enhances the capabilities of GR models.
- Novelty and Impact: The paper's primary contribution, introducing personalized context into action tokenization, is genuinely novel for GR. Previous "context-aware" tokenizers were limited to local context, and "multi-identifier" approaches lacked true personalization. Pctx's comprehensive approach, considering the full user history, allows for a more nuanced understanding of user intent and item interpretation. This has a direct impact on recommendation quality, as evidenced by the strong experimental results; the improvements, especially in NDCG@10, are compelling.
- Balancing Act: The explicit focus on balancing generalizability and personalization is a strong point. The strategies for context condensation, semantic ID merging, and data augmentation demonstrate a practical understanding of the trade-offs involved in managing a dynamic token vocabulary. Without these, over-personalization could lead to extreme sparsity, rendering the system ineffective.
- Explainability: The explainability experiment using GPT-4o is a particularly interesting and forward-looking aspect. While the rigor of LLM-based evaluation in this context is still a nascent area of research, it provides compelling qualitative evidence that Pctx's semantic IDs correspond to meaningful and distinct user preferences. This aligns with the broader trend of making AI systems more transparent and understandable. The detailed case study further strengthens this point by visually demonstrating the adaptive tokenization.
- Potential Areas for Improvement/Future Research (Beyond Authors' Scope):
  - Dynamic Context Window: While Pctx uses the entire user interaction history, the maximum sequence length is capped at 20 items. Investigating more dynamic or adaptive context windows that weigh recent interactions differently from older ones, or dynamically select relevant past interactions, could further refine personalization without excessively increasing the computational burden.
  - Computational Cost: Generating personalized semantic IDs involves training an auxiliary model, clustering, and then quantizing. The computational overhead of these steps, especially for very large item catalogs and complex histories, could be a practical consideration for deployment. Future work could explore more efficient approximation techniques.
  - Fine-grained Personalization: The current approach clusters context representations into a fixed number of centroids per group. A more fine-grained or hierarchical approach to context representation could capture even subtler personalized facets, especially for highly complex or multi-modal items.
  - Unified Learning Objective: The current framework has distinct steps (context encoding, semantic ID generation, GR model training). While the RQ-VAE is learnable, an end-to-end system in which semantic ID generation is jointly optimized with the recommendation task might yield further benefits.
- Transferability: The methods proposed in Pctx could potentially transfer to other domains where the contextual interpretation of discrete entities matters, for example knowledge graph completion or relation extraction, where the "meaning" of an entity or relation may depend on its surrounding context within a specific graph path or query.

Overall, Pctx is a robust and timely contribution to Generative Recommendation, demonstrating a clear path forward for achieving truly personalized and interpretable recommendations by revolutionizing the foundational tokenization step.