From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization
TL;DR Summary
This paper presents GenCDR, a novel generative cross-domain recommendation framework that overcomes limitations of traditional methods by using domain-adaptive tokenization for disentangled semantic IDs, significantly improving recommendation accuracy and generalization across multiple domains.
Abstract
Cross-domain recommendation (CDR) is crucial for improving recommendation accuracy and generalization, yet traditional methods are often hindered by the reliance on shared user/item IDs, which are unavailable in most real-world scenarios. Consequently, many efforts have focused on learning disentangled representations through multi-domain joint training to bridge the domain gaps. While recent Large Language Model (LLM)-based approaches show promise, they still face critical challenges, including: (1) the item ID tokenization dilemma, which leads to vocabulary explosion and fails to capture high-order collaborative knowledge; and (2) insufficient domain-specific modeling for the complex evolution of user interests and item semantics. To address these limitations, we propose GenCDR, a novel Generative Cross-Domain Recommendation framework. GenCDR first employs a Domain-adaptive Tokenization module, which generates disentangled semantic IDs for items by dynamically routing between a universal encoder and domain-specific adapters. Symmetrically, a Cross-domain Autoregressive Recommendation module models user preferences by fusing universal and domain-specific interests. Finally, a Domain-aware Prefix-tree enables efficient and accurate generation. Extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines. Our code is available in the supplementary materials.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization
1.2. Authors
- Peiyu Hu
- Wayne Lu
- Jia Wang
Affiliations:
- Xi'an Jiaotong-Liverpool University, Suzhou, China
- University of Liverpool, Liverpool, United Kingdom
1.3. Journal/Conference
The paper is published on arXiv, a preprint server, under the identifier arxiv.org/abs/2511.08006v1. While arXiv hosts preprints, papers often undergo peer review and are subsequently published in reputable conferences or journals in fields like recommender systems, machine learning, or natural language processing (e.g., SIGIR, KDD, AAAI, NeurIPS, ACM TiiS, TKDE). The specified publication date suggests it's a recent work.
1.4. Publication Year
2025
1.5. Abstract
Cross-domain recommendation (CDR) is crucial for improving recommendation accuracy and generalization, yet traditional methods are often hindered by the reliance on shared user/item IDs, which are unavailable in most real-world scenarios. Consequently, many efforts have focused on learning disentangled representations through multi-domain joint training to bridge the domain gaps. While recent Large Language Model (LLM)-based approaches show promise, they still face critical challenges, including: (1) the item ID tokenization dilemma, which leads to vocabulary explosion and fails to capture high-order collaborative knowledge; and (2) insufficient domain-specific modeling for the complex evolution of user interests and item semantics. To address these limitations, we propose GenCDR, a novel Generative Cross-Domain Recommendation framework. GenCDR first employs a Domain-adaptive Tokenization module, which generates disentangled semantic IDs for items by dynamically routing between a universal encoder and domain-specific adapters. Symmetrically, a Cross-domain Autoregressive Recommendation module models user preferences by fusing universal and domain-specific interests. Finally, a Domain-aware Prefix-tree enables efficient and accurate generation. Extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines.
1.6. Original Source Link
- Official Source/PDF Link: https://arxiv.org/pdf/2511.08006.pdf
- Publication Status: This paper is a preprint available on arXiv, indicating it has not yet undergone formal peer review or been accepted by a conference or journal.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is improving Cross-Domain Recommendation (CDR) accuracy and generalization, especially in real-world scenarios where shared user or item identifiers (IDs) are often unavailable. Traditional recommendation methods heavily rely on these IDs for knowledge transfer between domains, which creates a significant bottleneck when IDs are not strictly aligned.
The problem is important because users frequently interact across diverse online services (e-commerce, social media, content streaming), generating rich behavioral data that, if leveraged effectively, can greatly enhance recommendation quality. However, existing CDR methods face several challenges:
- Reliance on Shared IDs: Most traditional CDR methods assume shared user/item IDs, which is unrealistic for many cross-domain applications.
- Item ID Tokenization Dilemma: Recent Large Language Model (LLM)-based approaches, while promising due to their semantic understanding capabilities, struggle with item representation. Using traditional item IDs with LLMs leads to a vocabulary explosion (too many unique IDs) and fails to capture high-order collaborative knowledge (complex relationships between items and users).
- Insufficient Domain-Specific Modeling: Current LLM-based methods often do not adequately model the complex evolution of user interests and item semantics in a domain-specific manner, failing to disentangle universal preferences from domain-specific nuances (e.g., an "Apple" as a fruit vs. an "Apple" as a tech brand).

The paper's entry point is to introduce a generative semantic ID paradigm into LLM-based cross-domain recommendation. It moves away from arbitrary item IDs towards semantically rich tokens that are transferable across domains, while also incorporating adaptive mechanisms to model both universal and domain-specific aspects of items and user interests.
2.2. Main Contributions / Findings
The paper makes the following primary contributions:
- Novel Generative Framework (GenCDR): It proposes GenCDR, the first framework to introduce the generative semantic ID paradigm into LLM-based cross-domain recommendation, effectively resolving the item ID tokenization dilemma. This allows LLMs to process items from diverse domains using semantically meaningful tokens.
- Domain-adaptive Tokenization Module: GenCDR includes a module that dynamically disentangles and precisely models both universal and item-wise domain-specific knowledge at the semantic level, ensuring items are represented with both shared meaning and domain-specific attributes.
- Cross-Domain Autoregressive Recommendation Module: A symmetric module is designed to effectively disentangle and fuse universal and user-wise domain-specific interests during the recommendation process, enabling personalized recommendations that consider the user's overall preferences and their specific preferences within a target domain.
- Domain-aware Prefix-tree Decoding Strategy: A prefix-tree mechanism is introduced to guide the decoding process, ensuring efficient and accurate generation of valid semantic IDs in cross-domain scenarios and preventing "hallucinated" (non-existent) recommendations.

The key findings demonstrate that GenCDR significantly outperforms state-of-the-art baselines across multiple real-world cross-domain datasets in terms of both accuracy and generalization, validating the effectiveness of its generative semantic ID paradigm and adaptive modeling approach.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
Cross-Domain Recommendation (CDR)
Cross-Domain Recommendation (CDR) refers to the task of improving recommendation performance in one target domain by transferring knowledge from other source domains. This is particularly useful in situations like cold-start (new users or items with limited data) or data sparsity (insufficient interactions in a single domain). The core idea is that user preferences or item characteristics learned in one domain might be relevant and transferable to another, even if the domains are different (e.g., recommending movies based on a user's book preferences).
Large Language Models (LLMs)
Large Language Models (LLMs) are advanced artificial intelligence models, typically based on the Transformer architecture, that are trained on vast amounts of text data. They excel at understanding, generating, and processing human language. Key capabilities relevant to recommendation include:
- Semantic Understanding: LLMs can capture the meaning and relationships between words, phrases, and concepts, which is crucial for understanding item descriptions and user reviews.
- Sequence Generation: LLMs can generate coherent and contextually relevant sequences of text, making them suitable for generative recommendation tasks where the goal is to predict a sequence of items.
- World Knowledge: Through pre-training on diverse internet data, LLMs acquire a broad base of world knowledge that can enrich item and user representations beyond what is available in a specific recommendation dataset.
Item ID Tokenization
In recommender systems, item ID tokenization is the process of mapping unique items (e.g., a specific product, movie, or song) to discrete tokens or identifiers that can be processed by a model. Traditionally, this involves assigning an arbitrary integer ID to each item. However, for LLMs, which operate on sequences of meaningful tokens (like words or subwords), these arbitrary item IDs are problematic. The "item ID tokenization dilemma" arises because:
- Vocabulary Explosion: If each item is a unique token, the vocabulary size becomes extremely large, making LLM training inefficient and difficult.
- Lack of Semantics: Arbitrary IDs carry no inherent semantic meaning or relationship to other items, hindering an LLM's ability to generalize or understand collaborative knowledge (patterns of user interactions).

The paper proposes semantic IDs to address this, meaning the tokens themselves encode meaningful information about the item.
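To make the contrast concrete, the toy sketch below compares the vocabulary needed when every item is its own token versus when items are expressed as short codes drawn from a few shared codebooks, which is the idea behind semantic IDs. The catalogue size, number of levels, and codebook size are made-up illustrative numbers, not figures from the paper.

```python
# Toy illustration: per-item tokens vs. shared semantic-ID codebooks.
# All numbers are invented for illustration; they are not from the paper.

num_items = 1_000_000          # catalogue size across all domains
levels, codebook_size = 4, 256 # a semantic ID = 4 codes, each from a 256-entry codebook

vocab_item_ids = num_items                 # one unique token per item -> vocabulary explosion
vocab_semantic = levels * codebook_size    # tokens shared by every item

print(f"item-ID vocabulary:     {vocab_item_ids:,} tokens")
print(f"semantic-ID vocabulary: {vocab_semantic:,} tokens "
      f"({codebook_size ** levels:,} distinct items representable)")
```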
Residual-Quantized Variational Autoencoder (RQ-VAE)
A Variational Autoencoder (VAE) is a type of neural network that learns a compressed, probabilistic representation (a latent space) of input data. It consists of an encoder that maps input data to the latent space and a decoder that reconstructs the input from a sample in the latent space. Quantization is the process of mapping continuous values to a finite set of discrete values (codebook entries).
RQ-VAE is a variant that performs residual quantization. Instead of quantizing the entire latent vector at once, it quantizes the residual (the error or remaining information) after a previous quantization step. This allows for a more expressive and hierarchical representation using multiple codebooks, where each codebook captures different levels of detail or aspects of the input. The output is a sequence of discrete codes (semantic IDs).
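As a rough sketch of the residual quantization step (not the authors' code; the codebook sizes, depth, and brute-force nearest-neighbour search are simplifying assumptions), each level quantizes whatever the previous levels left unexplained:

```python
import torch

def residual_quantize(z: torch.Tensor, codebooks: list[torch.Tensor]):
    """Quantize a latent vector z with M codebooks; return code indices and the quantized vector.

    z:         (dim,) continuous latent from the encoder
    codebooks: list of M tensors, each of shape (codebook_size, dim)
    """
    residual, quantized, codes = z.clone(), torch.zeros_like(z), []
    for codebook in codebooks:                                  # level d = 0 .. M-1
        dists = torch.cdist(residual.unsqueeze(0), codebook)    # distance to every entry
        idx = int(dists.argmin())                               # nearest codebook entry
        codes.append(idx)
        quantized = quantized + codebook[idx]                   # accumulate chosen vectors
        residual = residual - codebook[idx]                     # next level sees the leftover error
    return codes, quantized

# Example: a 3-level quantizer over a 16-dimensional latent space.
torch.manual_seed(0)
books = [torch.randn(64, 16) for _ in range(3)]
codes, z_hat = residual_quantize(torch.randn(16), books)
print(codes)   # e.g. [12, 40, 7] -> a coarse-to-fine "semantic ID" for the item
```

The returned list of indices plays the role of a semantic ID: earlier codes capture coarse aspects, later codes refine the residual details.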
Low-Rank Adaptation (LoRA)
Low-Rank Adaptation (LoRA) is a Parameter-Efficient Fine-Tuning (PEFT) technique used to adapt large pre-trained models (like LLMs) to new tasks or domains with minimal computational cost. Instead of fine-tuning all parameters of the large model, LoRA injects small, trainable low-rank matrices into the Transformer layers. When fine-tuning, only these low-rank matrices are updated, while the original pre-trained model weights remain frozen. This significantly reduces the number of trainable parameters, memory usage, and training time, making fine-tuning LLMs more accessible and efficient.
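A minimal sketch of the idea follows; it is not the paper's or the PEFT library's implementation, and the rank, scaling, and choice of layer are illustrative assumptions. The frozen weight matrix is augmented with a trainable low-rank product:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # W0 (and its bias) stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W0 x  +  (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # only the small A and B matrices are trained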
Autoregressive Models
An autoregressive model is a type of statistical model that predicts future values based on past values. In the context of sequence generation (like LLMs generating text or recommending items), it means that the prediction of the next token in a sequence is conditioned on all previously generated tokens. For example, to predict the third word in a sentence, an autoregressive model would consider the first and second words. This allows LLMs to generate coherent and contextually relevant sequences.
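A toy sketch of this factorization is shown below; the "model" is a stand-in function rather than an actual LLM, and the vocabulary size and greedy decoding are assumptions for illustration. The point is that every new token is conditioned on everything generated so far.

```python
import torch

def toy_next_token_logits(history: list[int], vocab_size: int = 10) -> torch.Tensor:
    """Stand-in for an LLM forward pass: logits for the next token given the history."""
    torch.manual_seed(sum(history) + len(history))   # deterministic toy behaviour
    return torch.randn(vocab_size)

def generate(prefix: list[int], steps: int = 3) -> list[int]:
    seq = list(prefix)
    for _ in range(steps):
        probs = torch.softmax(toy_next_token_logits(seq), dim=-1)
        seq.append(int(probs.argmax()))              # greedy: pick the most likely next token
    return seq

print(generate([2, 5]))   # each appended token was conditioned on all previous ones
```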
Prefix-tree
A prefix-tree (also known as a trie) is a tree-like data structure used to store a dynamic set or associative array where the keys are usually strings. In the context of generative recommendation, a domain-aware prefix-tree can store all valid sequences of semantic IDs for items within a specific domain. During the generation process, this tree can be used to constrain the LLM's output, ensuring that only valid and existing semantic ID sequences (corresponding to real items) are generated. This prevents the LLM from "hallucinating" non-existent items or generating semantically incorrect sequences.
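A minimal sketch of trie-constrained decoding is shown below. In a real system the filtering would be applied to the LLM's logits inside beam search; here a plain dictionary trie simply reports which codes are allowed next (the item codes are hypothetical examples, not real semantic IDs from the paper).

```python
# Build a trie over the valid semantic-ID sequences of one domain, then use it to
# restrict which code may be generated next (illustrative sketch, not the paper's code).

def build_trie(sequences):
    root = {}
    for seq in sequences:
        node = root
        for code in seq:
            node = node.setdefault(code, {})
    return root

def allowed_next(trie, prefix):
    node = trie
    for code in prefix:
        node = node.get(code, {})
    return set(node.keys())            # only these codes correspond to real items

valid_items = [(3, 17, 5), (3, 17, 9), (8, 2, 41)]   # hypothetical semantic IDs
trie = build_trie(valid_items)
print(allowed_next(trie, (3, 17)))     # {5, 9}: decoding is restricted to existing items
print(allowed_next(trie, (8,)))        # {2}
```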
Variational Information Bottleneck (VIB)
The Variational Information Bottleneck (VIB) principle is a theoretical framework that aims to learn compressed representations of data that are maximally informative about a target variable while minimizing the information content related to other irrelevant aspects. In this paper, it's used as a regularization technique for the routing networks. By applying VIB, the routers are encouraged to extract only the most essential information from an item or user history to make a routing decision (i.e., whether to prioritize universal or domain-specific knowledge), thereby promoting more disentangled representations and preventing overfitting. It measures the mutual information between the input and the learned representation and tries to compress it while retaining task-relevant information. The KL-divergence term in the paper is a common way to implement VIB regularization, encouraging the posterior distribution (the router's internal representation distribution given the input) to be close to a simple prior distribution (e.g., a standard normal distribution), thus compressing the information.
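For the Gaussian case that is typically used, this regularizer has a closed form: the KL divergence between the router's posterior and a standard normal prior. The sketch below is hedged, assuming a Gaussian posterior with a standard normal prior; the router architecture and loss weighting are assumptions rather than details from the paper.

```python
import torch

def vib_kl(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=-1))

# The router predicts a distribution over its internal representation, samples from it
# with the reparameterization trick, and pays this KL cost for the information it keeps.
mu, logvar = torch.zeros(4, 32), torch.zeros(4, 32)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
print(vib_kl(mu, logvar))   # 0.0 when the posterior already equals the prior
```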
3.2. Previous Works
The paper discusses related work in three main categories:
Cross-Domain Sequential Recommendation (CDSR)
- Traditional CDSR: Many early methods rely on collaborative item IDs as a bridge for knowledge transfer. They often employ architectures like gating mechanisms, attention modules (e.g., Kang and McAuley 2018; Lu and Yin 2025; Cui et al. 2025), or Graph Neural Networks (GNNs) (e.g., Liu et al. 2024b; Li and Lu 2024; Cao et al. 2022) to fuse and transfer knowledge, sometimes enhanced with contrastive learning objectives (e.g., Ma et al. 2024; Xie et al. 2022).
- Limitations: These methods are inherently limited by the availability of shared user/item IDs, which is a strong assumption often not met in real-world scenarios.
- Semantic-enhanced CDSR: More recent trends incorporate richer semantic information from pre-trained language models (e.g., Liu et al. 2025c; Li et al. 2022) to overcome ID-based limitations.
- Gap Addressed by GenCDR: GenCDR aims to effectively integrate these semantics into a unified generative framework, explicitly disentangling shared and domain-specific knowledge, which remains an open challenge for existing methods.
Generative Recommendation
- Paradigm Shift: Generative recommendation reframes the recommendation task from item ranking to autoregressive sequence generation of semantic item IDs (e.g., Petrov and Macdonald 2023; Hou et al. 2025).
- Item Tokenization: A critical aspect is the construction of item IDs. Approaches include:
  - Content-based tokenization: using vector quantization like RQ-VAE (e.g., Li et al. 2025a; Rajput et al. 2023).
  - Structure-aware methods: employing hierarchical clustering (e.g., Si et al. 2024).
  - Collaborative signals: embedding interaction signals directly into tokenization (e.g., Mo et al. 2024).
- Limitations: These techniques have been developed almost exclusively for single-domain datasets (e.g., Zheng et al. 2025).
- Gap Addressed by GenCDR: GenCDR extends the generative paradigm to complex, multi-domain environments, a question left unexplored by prior generative recommendation work.
Large Language Models for Recommendation
- LLMs as Auxiliary Components: LLMs are used to enhance traditional models by providing rich semantic features or data augmentation (e.g., Sun et al. 2024; Yin et al. 2025; Zhang et al. 2025a; Yuan et al. 2025).
- LLMs as Core Generative Engines: LLMs are used to reformulate recommendation as autoregressively predicting item IDs (e.g., Rajput et al. 2023; Zheng et al. 2024; Lin et al. 2024).
- Fine-tuning: Parameter-efficient fine-tuning (PEFT) techniques like LoRA (e.g., Hu et al. 2022; Bao et al. 2023; Liu et al. 2025a; Zhang et al. 2023) are crucial for aligning LLMs to recommendation tasks.
- Limitations: Existing work predominantly focuses on single-domain applications.
- Gap Addressed by GenCDR: GenCDR tackles the challenge of effective knowledge transfer and representation across heterogeneous domains for LLM-based recommenders, which has largely been unaddressed.
3.3. Technological Evolution
The field of recommender systems has evolved from collaborative filtering based on user-item interactions (often using IDs) to incorporating richer content-based features. With the rise of deep learning, sequential recommendation models (e.g., SASRec, BERT4Rec) emerged, leveraging Transformer architectures to capture dynamic user preferences within a single domain.
The cross-domain recommendation paradigm arose to combat data sparsity and cold-start issues by transferring knowledge between related domains. Early CDR methods often relied on shared user/item IDs or mapping functions to align entities across domains. However, the limitation of ID-based approaches led to the exploration of semantic information and representation learning to bridge domain gaps more robustly.
The advent of Large Language Models (LLMs) marked a significant shift. Initially, LLMs were used to enrich item/user features with their vast world knowledge. More recently, researchers started framing recommendation as a language generation task, directly using LLMs to predict item sequences, often represented by item IDs. However, these LLM-based approaches still struggled with the inherent limitations of arbitrary item IDs (vocabulary explosion, lack of semantics) and the challenge of effectively disentangling universal and domain-specific knowledge in heterogeneous cross-domain settings.
GenCDR fits into this evolution by pushing the LLM-based generative recommendation paradigm further into the cross-domain setting. It innovates by:
- Replacing arbitrary item IDs with generative semantic IDs derived from item content, making them inherently transferable and meaningful for LLMs.
- Developing adaptive mechanisms (Domain-adaptive Tokenization and Cross-Domain Autoregressive Recommendation with routing networks) that explicitly model and fuse both universal and domain-specific aspects of items and user interests, overcoming the "insufficient domain-specific modeling" challenge.
- Ensuring efficient and valid generation through a Domain-aware Prefix-tree, which is crucial for practical LLM-based recommenders.
3.4. Differentiation Analysis
Compared to the main methods in related work, GenCDR offers several core differences and innovations:
- Compared to Traditional ID-based CDSR (e.g., C2DSR, TriCDR):
  - Core Difference: GenCDR completely bypasses the reliance on shared user/item IDs by introducing semantic IDs, whereas traditional CDSR models use IDs or align ID-based embeddings.
  - Innovation: Semantic IDs are inherently transferable and capture rich content semantics, enabling knowledge transfer without explicit ID mapping. This tackles the ID dilemma directly.
- Compared to Generative Recommendation Systems (GRS) (e.g., VQ-Rec, TIGER, HSTU):
  - Core Difference: Existing GRS models are primarily designed for single-domain scenarios. While they use vector quantization for item tokenization, they do not explicitly address the complexities of cross-domain knowledge transfer or domain-specific adaptation.
  - Innovation: GenCDR extends the generative paradigm to cross-domain recommendation by integrating a Domain-adaptive Tokenization module and a Cross-Domain Autoregressive Recommendation module with dynamic routing, allowing for disentanglement and fusion of universal and domain-specific knowledge. This makes generative recommendation effective in heterogeneous environments.
- Compared to LLM-based Recommendation (e.g., LLM4CDSR):
  - Core Difference: While LLM4CDSR formulates CDR as a text generation task using LLMs, it still faces the item ID tokenization dilemma and insufficient domain-specific modeling. LLM4CDSR may rely on prompting or fine-tuning LLMs to predict item IDs or textual descriptions, but without a structured semantic ID system or explicit disentanglement mechanisms.
  - Innovation: GenCDR explicitly tackles the item ID tokenization dilemma by generating disentangled semantic IDs using an RQ-VAE and domain-specific adapters, providing a more structured and semantically rich input for the LLM. It uses dynamic routing networks at both item and user levels to adaptively fuse universal and domain-specific knowledge, addressing the "insufficient domain-specific modeling" challenge directly, which is often handled only implicitly or overlooked in other LLM-based CDR methods. Finally, the Domain-aware Prefix-tree ensures efficient and valid semantic ID generation, a practical consideration for LLM-based systems.

In essence, GenCDR uniquely combines the power of generative models and LLMs with a principled approach to semantic item representation and adaptive knowledge fusion to overcome the fundamental challenges in cross-domain recommendation.
4. Methodology
4.1. Principles
The core idea behind GenCDR is to move beyond arbitrary item IDs and embrace discrete semantic IDs (SIDs) as the fundamental representation for items in cross-domain recommendation. This paradigm shift allows Large Language Models (LLMs) to leverage their powerful semantic understanding and sequence generation capabilities more effectively. The theoretical basis is that raw semantic information (e.g., text descriptions) is inherently more transferable across domains than arbitrary item IDs.
The framework operates on three key intuitions:
- Semantic Richness and Transferability: By converting items into semantic IDs that capture their inherent meaning, these IDs become transferable between domains, circumventing the need for shared item IDs. This is achieved by balancing domain-agnostic universal semantics with domain-specific discriminative features.
- Adaptive Disentanglement and Fusion: User interests and item characteristics are a blend of universal (general) and domain-specific (niche) aspects. GenCDR explicitly models and dynamically fuses these two types of knowledge at both the item level (for semantic ID generation) and the user level (for recommendation generation), preventing negative transfer and enabling fine-grained personalization.
- Efficient and Valid Generation: To ensure practicality, the generative process must be efficient and produce valid recommendations. This is addressed by constraining the generation space using a prefix-tree.

The overall architecture of GenCDR is illustrated in Figure 2 (a). It consists of three main modules:
- Domain-adaptive Tokenization: converts items into semantic IDs.
- Cross-Domain Autoregressive Recommendation: models user preferences and generates semantic IDs for recommendations.
- Domain-aware Prefix-tree: guides efficient and accurate inference.
4.2. Core Methodology In-depth (Layer by Layer)
The GenCDR framework is designed to tackle the item ID tokenization dilemma and insufficient domain-specific modeling. Let's break down its components:
4.2.1. Domain-adaptive Tokenization
This module is responsible for generating unified Semantic IDs (SIDs) for items. It balances domain-agnostic universal semantics with domain-specific discriminative features to create expressive representations suitable for generative recommendation tasks. The SIDs are designed to have Semantic Richness (capturing comprehensive item semantics) and Semantic Similarity (ensuring similar items across domains share comparable IDs). The process is visualized in Figure 2 (b).
4.2.1.1. Domain-Universal Semantic Token Generation
To establish a foundational universal semantic understanding, the module uses a Universal Discrete Semantic Encoder based on a Residual-Quantized Variational Autoencoder (RQ-VAE) framework. The RQ-VAE consists of an encoder $E$, a decoder $D$, and $M$ codebooks. It is pre-trained on the textual features of all items.

The process for converting an item's feature embedding $x$ into a sequence of discrete codes is as follows:
- The encoder maps the item embedding to a continuous latent representation $z = E(x)$.
- An initial residual vector is set: $r_0 = z$.
- For each level $d$ from $0$ to $M-1$:
  - The current residual is quantized by finding the nearest vector in the $d$-th codebook; the index $c_d$ corresponds to this nearest vector.
  - The next residual is computed by subtracting the quantized vector: $r_{d+1} = r_d - e_{c_d}^{(d)}$.
- The final quantized representation is the sum of all chosen codebook vectors: $\hat{z} = \sum_{d=0}^{M-1} e_{c_d}^{(d)}$.
- This quantized representation is then passed through the decoder to reconstruct the original item embedding: $\hat{x} = D(\hat{z})$.

The RQ-VAE is optimized using a joint objective function during pre-training:

- Reconstruction Loss, $\mathcal{L}_{\mathrm{recon}} = \lVert x - \hat{x} \rVert^2$: measures how well the decoder reconstructs the original input.
  - $x$: the original input item feature embedding.
  - $\hat{x}$: the reconstructed item feature embedding from the decoder.
  - $\lVert \cdot \rVert^2$: the squared Euclidean distance (L2 norm).
- Quantization Loss, $\mathcal{L}_{\mathrm{quant}} = \sum_{d=0}^{M-1} \big( \lVert r_d - \mathrm{sg}[e_{c_d}^{(d)}] \rVert^2 + \beta \lVert \mathrm{sg}[r_d] - e_{c_d}^{(d)} \rVert^2 \big)$: ensures the encoder's output aligns with the codebook vectors. It includes commitment terms from VQ-VAE that pull the encoder output towards the codebook vectors and the codebook vectors towards the encoder output.
  - $r_d$: the residual vector at level $d$ before quantization.
  - $e_{c_d}^{(d)}$: the chosen vector from codebook $d$.
  - $\mathrm{sg}[\cdot]$: the stop-gradient operator, whose argument is treated as a constant during backpropagation.
  - $\beta$: a hyperparameter balancing the two terms of the quantization loss.
  - The first term encourages the encoder output to commit to the codebook vector, while the second term updates the codebook vector towards the encoder output.
- Masked Token Modeling (MTM) Loss, $\mathcal{L}_{\mathrm{MTM}}$: to ensure SIDs are contextually coherent, this loss trains the model to predict masked codes from their surrounding context, similar to BERT:
  $$\mathcal{L}_{\mathrm{MTM}} = -\,\mathbb{E}_{x \in \mathcal{X}} \sum_{i \in \mathcal{M}} \log p_{\phi}(c_i \mid \tilde{c})$$
  - $\mathcal{X}$: the set of all item feature embeddings.
  - $\mathcal{M}$: the set of indices of masked codes in a semantic ID sequence.
  - $c_i$: a masked semantic ID at index $i$.
  - $\tilde{c}$: the semantic ID sequence with some codes masked.
  - $\phi$: parameters of a contextual model (e.g., a Transformer) that predicts masked codes.
  - $\log p_{\phi}(c_i \mid \tilde{c})$: the log-probability of predicting the correct masked code $c_i$ given the masked sequence and contextual model parameters.

The total pre-training loss is a weighted sum of these terms,
$$\mathcal{L}_{\mathrm{pre}} = \mathcal{L}_{\mathrm{recon}} + \mathcal{L}_{\mathrm{quant}} + \lambda \,\mathcal{L}_{\mathrm{MTM}},$$
where $\lambda$ (together with any weights on the other terms) is a hyperparameter balancing the different loss terms.

Upon completion of this pre-training, the universal encoder and codebooks are frozen.
4.2.1.2. Domain-specific Semantic Token Adapters
While the universal encoder learns shared semantics, domain-specific nuances still need to be captured. This is done with Domain-specific Semantic Token Adapters, implemented with Low-Rank Adaptation (LoRA). For each domain $d$, a lightweight LoRA module is introduced to adapt the frozen universal encoder $E$.

A LoRA module consists of two low-rank matrices, $B_d \in \mathbb{R}^{d_{\mathrm{out}} \times r}$ and $A_d \in \mathbb{R}^{r \times d_{\mathrm{in}}}$, where $r$ is the rank. These matrices augment the frozen weights of the universal encoder. The modified forward pass for an input $h$ becomes:
$$h' = W_0 h + B_d A_d h$$
- $h$: input to the augmented layer.
- $W_0$: original, frozen weight matrix of the universal encoder.
- $B_d A_d$: the low-rank update matrix for domain $d$. Only $B_d$ and $A_d$ are trainable.

The adapted encoder for domain $d$ is denoted $E_d$, with trainable LoRA parameters $\theta_d$. These parameters are fine-tuned in a second training phase. For each item embedding $x$ from domain $d$, the objective is to minimize a self-supervised reconstruction loss:
$$\mathcal{L}_{\mathrm{adapt}}^{(d)} = \mathbb{E}_{x \in \mathcal{X}_d} \big\lVert x - D\big(Q(E_d(x))\big) \big\rVert^2$$
- $\mathcal{X}_d$: the set of item feature embeddings specific to domain $d$.
- $E_d(x)$: the latent representation produced by the domain-adapted encoder for item $x$.
- $Q(\cdot)$: the quantization function that maps the continuous latent representation to discrete semantic IDs.
- $D(\cdot)$: the decoder that reconstructs the item embedding from the quantized SIDs.
- The quantization function and decoder remain frozen during this phase, so only the LoRA parameters are trained, which keeps the adaptation efficient.
4.2.1.3. Item-level Dynamic Semantic Routing Network
To integrate the universal and domain-specific representations, an Item-level Dynamic Semantic Routing Network is used. This network adaptively balances the two representations on a per-item basis, mitigating the negative transfer that a static fusion could cause.

The routing network $R_{\mathrm{item}}$ (e.g., a multi-layer perceptron (MLP)) with parameters $\psi$ takes an item's embedding $x$ as input and outputs a gating weight $g_x$.
- For an item $x$ from domain $d$, two latent representations are computed before quantization:
  - Universal representation: $z_u = E(x)$ from the frozen universal encoder.
  - Domain-specific representation: $z_d = E_d(x)$ from the adapted encoder for domain $d$.
- The router calculates the gating weight:
  $$g_x = \sigma\big(R_{\mathrm{item}}(x;\psi)\big)$$
  - $\sigma(\cdot)$: the sigmoid function, which squashes the output to the range between 0 and 1.
  - $R_{\mathrm{item}}(x;\psi)$: the output of the routing network for item $x$.
- The universal and domain-specific representations are then fused based on $g_x$:
  $$z_{\mathrm{fused}} = g_x \, z_d + (1 - g_x)\, z_u$$
  - $z_{\mathrm{fused}}$: the final fused latent representation. If $g_x$ is close to 0, the universal representation dominates; if $g_x$ is close to 1, the domain-specific representation dominates.

The routing network is regularized using the Variational Information Bottleneck (VIB) principle to promote disentangled representations and prevent overfitting. This is enforced via a KL-divergence term:
$$\mathcal{L}_{\mathrm{VIB}} = \mathrm{KL}\big(q(h \mid x) \,\|\, p(h)\big)$$
- $\mathrm{KL}(\cdot \,\|\, \cdot)$: the Kullback-Leibler (KL) divergence, which measures the difference between two probability distributions.
- $q(h \mid x)$: the posterior distribution of the router's internal representation $h$ given the item embedding $x$, typically a learned Gaussian whose mean and variance are predicted by the router.
- $p(h)$: a prior distribution for the router's internal representation (e.g., a standard normal distribution).

This loss is incorporated into the second-phase training objective so that the router makes routing decisions from the essential information while compressing irrelevant details.
4.2.2. Cross-Domain Autoregressive Recommendation
This module leverages the unified SIDs to model user interests and generate personalized recommendations. It employs a parameter-efficient, two-phase fine-tuning strategy, mirroring the item tokenization's dual focus on universal and domain-specific aspects. The process is visualized in Figure 2 (c).
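As the subsections below detail, this module and the tokenizer above share the same fusion pattern: a lightweight router emits a gate that blends a universal prediction with a domain-specific one. A hedged sketch of that pattern follows; the shapes, router design, and naming are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class InterestRouter(nn.Module):
    """Blend universal and domain-specific next-SID distributions with a learned gate."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, user_repr, p_universal, p_domain):
        # user_repr:   (batch, hidden) summary of the user's SID history
        # p_universal: (batch, vocab)  distribution from the universal experts
        # p_domain:    (batch, vocab)  distribution from the domain-adapted model
        g = torch.sigmoid(self.gate(user_repr))           # (batch, 1) per-user weight
        return g * p_domain + (1.0 - g) * p_universal     # fused prediction

router = InterestRouter()
user = torch.randn(2, 256)
p_u = torch.softmax(torch.randn(2, 1024), dim=-1)
p_d = torch.softmax(torch.randn(2, 1024), dim=-1)
print(router(user, p_u, p_d).sum(dim=-1))   # still sums to 1 per user
```

Because the gate mixes two valid probability distributions convexly, the result remains a valid distribution, which is what allows the blend to be used directly at decoding time.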
4.2.2.1. Universal Interest Modeling Network
To capture domain-agnostic interest patterns, a Universal Interest Modeling Network is developed. This is achieved by enhancing a pre-trained Large Language Model (LLM) (e.g., Qwen2.5-7B) with a mixture of multiple Low-Rank Adaptation (LoRA) adapters. This collection of adapters, called universal experts, is trained jointly on aggregated data from all domains. The parameters of the $k$-th universal expert are $\theta_k^{\mathrm{uni}}$, and the complete set is $\Theta_{\mathrm{uni}}$.

The input to this network consists of sequences of cross-domain SIDs representing a user's interaction history. In the initial fine-tuning phase, the universal parameters are optimized with a standard autoregressive objective: predicting the next semantic ID given the preceding sequence. The training loss is defined as:
$$\mathcal{L}_{\mathrm{uni}} = -\sum_{u \in \mathcal{U}} \sum_{t=2}^{|S_u|} \log p\big(s_t^{u} \mid s_{<t}^{u};\, \Phi, \Theta_{\mathrm{uni}}\big)$$
- $\mathcal{U}$: the set of all users.
- $S_u$: the sequence of semantic IDs for user $u$.
- $|S_u|$: the length of the sequence for user $u$.
- $s_t^{u}$: the $t$-th semantic ID in user $u$'s sequence.
- $s_{<t}^{u}$: the sequence of semantic IDs up to index $t-1$ for user $u$.
- $\Phi$: the frozen parameters of the base LLM.
- $\Theta_{\mathrm{uni}}$: the trainable parameters of the universal LoRA adapters.

This loss maximizes the likelihood of the next semantic ID in the sequence given the history. After this phase, $\Phi$ and $\Theta_{\mathrm{uni}}$ are fixed.
4.2.2.2. Domain-specific Interest Adaptation
To capture domain-specific nuances, a second fine-tuning phase trains domain-specific LoRA adapters. For each domain $d$, a dedicated, trainable LoRA adapter with parameters $\theta_d$ is added to the frozen model. Both the base LLM parameters $\Phi$ and the universal parameters $\Theta_{\mathrm{uni}}$ remain frozen during this phase.

The training loss for each domain focuses on user sequences within that domain, for users $u \in \mathcal{U}_d$ (users who interacted in domain $d$):
$$\mathcal{L}_{\mathrm{spec}}^{(d)} = -\sum_{u \in \mathcal{U}_d} \sum_{t=2}^{|S_u^{d}|} \log p\big(s_t^{u,d} \mid s_{<t}^{u,d};\, \Phi, \Theta_{\mathrm{uni}}, \theta_d\big)$$
- $\mathcal{U}_d$: the set of users who have interacted in domain $d$.
- $S_u^{d}$: the sequence of semantic IDs for user $u$ within domain $d$.
- $\theta_d$: the trainable parameters of the domain-specific LoRA adapter for domain $d$.

This approach efficiently learns domain-specific interest patterns.
4.2.2.3. User-level Dynamic Interest Routing Network
Symmetric to the item-level router, a VIB-regularized User-level Dynamic Interest Routing Network fuses the predictions of the universal and domain-specific models during inference. This lightweight gate takes the user's history representation as input and computes a dynamic weight $g_u$.

This weight fuses the probability distributions over semantic IDs from the universal model and the domain-adapted model:
$$p(s_{t+1} \mid s_{\le t}) = g_u \, p_{\mathrm{spec}}\big(s_{t+1} \mid s_{\le t};\, \Phi, \Theta_{\mathrm{uni}}, \theta_d\big) + (1 - g_u)\, p_{\mathrm{uni}}\big(s_{t+1} \mid s_{\le t};\, \Phi, \Theta_{\mathrm{uni}}\big)$$
- $p(s_{t+1} \mid s_{\le t})$: the final predicted probability of the next item (semantic ID) given user $u$'s sequence.
- $p_{\mathrm{uni}}(\cdot)$: the probability distribution output by the frozen universal network (parameterized by $\Phi$ and $\Theta_{\mathrm{uni}}$).
- $p_{\mathrm{spec}}(\cdot)$: the probability distribution output by the network augmented with the domain-specific adapter (parameterized additionally by $\theta_d$).
- $g_u$: the dynamic weight from the user-level routing network, determining the blend of universal and domain-specific predictions.

The VIB regularization applied to this router ensures efficient and robust fusion.
4.2.3. Inference - Domain-aware Prefix-tree
To ensure efficient and valid semantic ID generation, GenCDR utilizes a Domain-aware Prefix-tree mechanism. This addresses the limitations of standard autoregressive decoding (computational inefficiency and invalid ID outputs).
- For each domain $d$, an offline prefix tree $T_d$ is constructed. This tree encodes all valid semantic ID sequences that can be produced by the Domain-adaptive Tokenization module for items in that domain.
- During inference, when a target domain $d_t$ is specified, its corresponding tree $T_{d_t}$ guides the generation process.
- At each decoding step, the prefix-tree identifies the valid subset of next codes given the current prefix (the sequence of semantic IDs generated so far).
- The LLM's predictions are then constrained to this subset using a masked softmax:
  $$p(c_k \mid \text{prefix}, T_{d_t}) = \frac{\exp(z_k)}{\sum_{c_j \in \mathcal{V}(\text{prefix})} \exp(z_j)} \ \text{ for } c_k \in \mathcal{V}(\text{prefix}), \quad \text{and } 0 \text{ otherwise}$$
  - $p(c_k \mid \text{prefix}, T_{d_t})$: the probability of choosing semantic ID $c_k$ as the next token, conditioned on the previous sequence and the prefix-tree $T_{d_t}$ for the target domain.
  - $z_k$: the LLM's raw logit score for semantic ID $c_k$.
  - $\mathcal{V}(\text{prefix})$: the set of valid next semantic IDs allowed by the prefix-tree given the current prefix.
  - The softmax is computed only over the valid semantic IDs, ensuring that only existing and semantically correct items are generated. This guarantees valid sequence generation while significantly reducing computational overhead by pruning the search space.
5. Experimental Setup
5.1. Datasets
The experiments are conducted on three pairs of cross-domain datasets, each representing different real-world scenarios:
- Sports-Clothing (Leisure): derived from the Amazon product review dataset (McAuley et al. 2015).
- Phones-Electronics (Technology): derived from the Amazon product review dataset (McAuley et al. 2015).
- Books-Movies (Entertainment): collected from Douban (Zhu et al. 2019, 2020).

These datasets are chosen to validate the model's performance across diverse domains with varying characteristics and overlap levels. Regarding data samples, the paper states: "Following (Rajput et al. 2023; Zhou et al. 2020), we treat users' historical reviews as interactions arranged chronologically." In other words, each user's interactions (e.g., buying a product, watching a movie) are recorded in the order they occurred, so a data sample for a user is a chronological sequence of item IDs, which GenCDR converts to semantic IDs. The textual features of these items (e.g., product descriptions, movie summaries) are used for generating the semantic IDs.
The evaluation protocol used is leave-last-out, where the very last item in a user's sequence is reserved for testing, and the second-to-last item is used for validation, ensuring that the model predicts future interactions based on past behavior.
The following are the statistics of the datasets used in the experiments (Table 1 from the original paper):
| Dataset | #Users | #Items | #Interactions | Sparsity | Overlap |
|---|---|---|---|---|---|
| Sports | 35,598 | 18,357 | 296,337 | 99.95% | 1.73% |
| Clothing | 39,387 | 23,033 | 278,677 | 99.97% | (704) |
| Phones | 27,879 | 10,429 | 194,439 | 99.93% | 0.55% |
| Electronics | 192,403 | 63,001 | 1,689,188 | 99.99% | (404) |
| Books | 1,713 | 8,601 | 104,295 | 99.29% | 7.48% |
| Movies | 2,628 | 20,964 | 1,249,016 | 97.73% | (2,058) |
- #Users, #Items, #Interactions: total counts of users, items, and interactions in each domain.
- Sparsity: the proportion of empty entries in the user-item interaction matrix, calculated as $1 - \frac{\#\text{Interactions}}{\#\text{Users} \times \#\text{Items}}$. A sparsity close to 100% means very few interactions relative to the possible total, indicating a challenging recommendation scenario.
- Overlap: the percentage of common users between the two domains in a pair (e.g., Sports and Clothing). For example, a 1.73% overlap between Sports and Clothing means 1.73% of users interacted in both domains. The numbers in parentheses for Clothing, Electronics, and Movies likely refer to the absolute number of overlapping users.
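As a quick sanity check, the sparsity figure for the Sports domain can be reproduced from the counts in Table 1:

```python
# Sparsity = 1 - interactions / (users * items), using the Sports row of Table 1.
users, items, interactions = 35_598, 18_357, 296_337
sparsity = 1 - interactions / (users * items)
print(f"{sparsity:.2%}")   # ~99.95%, matching the reported value
```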
5.2. Evaluation Metrics
The paper adopts Recall@K and NDCG@K as evaluation metrics, with K set to 5 and 10, following standard practice in sequential recommendation. These metrics measure the accuracy of the top-K recommended items.
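As a compact reference for the two metrics defined formally in the subsections below, the sketch assumes the leave-last-out protocol, where each user has exactly one held-out relevant item; the item names are placeholders.

```python
import math

def recall_at_k(ranked_items, true_item, k):
    """1 if the held-out item appears in the top-k list, else 0."""
    return 1.0 if true_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, true_item, k):
    """With a single relevant item, IDCG@K = 1, so NDCG reduces to the positional discount."""
    for pos, item in enumerate(ranked_items[:k], start=1):
        if item == true_item:
            return 1.0 / math.log2(pos + 1)
    return 0.0

ranked = ["item_42", "item_7", "item_99"]
print(recall_at_k(ranked, "item_7", k=2))   # 1.0 -> hit within the cutoff
print(ndcg_at_k(ranked, "item_7", k=2))     # ~0.63 -> discounted because it is ranked second
```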
Recall@K
Recall@K measures the proportion of relevant items (i.e., the true next item in the sequence) that are present in the top-$K$ recommended items. It indicates how well the recommender system can find the relevant items up to a certain cutoff.
- Conceptual Definition: for a single user, Recall@K is 1 if the actual next item is among the top-$K$ recommendations, and 0 otherwise. Over a set of users, it is the average of these binary outcomes. It focuses on whether the relevant item is captured within the top-$K$ list.
- Mathematical Formula:
  $$\mathrm{Recall@}K = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{1}\big(i_u^{*} \in \mathcal{R}_u^{K}\big)$$
- Symbol Explanation:
  - $|\mathcal{U}|$: the total number of users in the evaluation set.
  - $\mathbb{1}(\cdot)$: an indicator function that returns 1 if the condition inside is true, and 0 otherwise.
  - $i_u^{*}$: the actual item that user $u$ interacted with next.
  - $\mathcal{R}_u^{K}$: the set of top-$K$ items recommended to user $u$.
NDCG@K (Normalized Discounted Cumulative Gain at K)
NDCG@K is a widely used metric for evaluating ranking quality, especially when items have varying degrees of relevance. It rewards relevant items that appear higher in the ranked list.
- Conceptual Definition: NDCG@K considers the position of the relevant item in the recommendation list. Relevant items found at higher ranks contribute more to the score. The score is normalized to a value between 0 and 1, where 1 represents a perfect ranking.
- Mathematical Formula: first, the Discounted Cumulative Gain (DCG@K) is calculated, then NDCG@K is obtained by normalizing DCG@K by the Ideal DCG (IDCG@K):
  $$\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}, \qquad \mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}$$
- Symbol Explanation:
  - $K$: the cutoff position in the recommendation list.
  - $rel_i$: the relevance score of the item at position $i$ in the recommended list. With binary relevance, $rel_i$ is 1 if the item at position $i$ is the true next item, and 0 otherwise.
  - $\log_2(i+1)$: the logarithmic discount factor, which reduces the contribution of items at lower ranks.
  - $\mathrm{IDCG@}K$: the DCG score for the ideal ranking, where all relevant items are ranked at the top according to their true relevance. For binary relevance with only one true next item, $\mathrm{IDCG@}K = 1$.
5.3. Baselines
To comprehensively evaluate GenCDR, it is compared against three categories of state-of-the-art models:
5.3.1. Single-domain Sequential Recommendation (SDSR)
These models are designed for recommendation within a single domain and do not explicitly leverage cross-domain information.
- SASRec (Kang and McAuley 2018): employs a unidirectional Transformer to model users' sequential preferences through self-attention. It predicts the next item by focusing on relevant past interactions.
- BERT4Rec (Sun et al. 2019): adapts the BERT architecture for recommendation using a masked item prediction objective. It learns bidirectional context by considering both past and future dependencies in a sequence.
- STOSA (Fan et al. 2022): introduces stochastic self-attention for long sequences, improving efficiency, and incorporates self-supervised objectives to learn more robust item representations.
5.3.2. Generative Recommendation Systems (GRS)
These models recast recommendation as an autoregressive sequence generation problem, often using vector quantization for item tokenization.
- VQ-Rec (Hou et al. 2023): combines VQ-VAE-based tokenization with Transformer sequence modeling. It maps item embeddings to discrete codes and then predicts the next item in this code space.
- TIGER (Rajput et al. 2023): enhances generative retrieval by optimizing item tokenization with collaborative constraints. It produces semantic IDs that capture both content and user-item interaction signals.
- HSTU (Zhai et al. 2024): proposes a hierarchical tokenization framework that encodes items at multiple semantic levels (from coarse to fine-grained), aiming to improve both generation accuracy and efficiency.
5.3.3. Cross-domain Sequential Recommendation (CDSR)
These models are specifically designed to leverage information across multiple domains.
- C2DSR (Cao et al. 2022): constructs a unified user-item interaction graph across domains and uses a GNN-based propagation mechanism with adaptive gating to regulate inter-domain knowledge transfer.
- TriCDR (Ma et al. 2024): utilizes triplet-based contrastive learning to align user embeddings across domains. It minimizes cross-domain intra-user distances (making the same user similar across domains) and maximizes inter-user separability (making different users distinct).
- LLM4CDSR (Liu et al. 2025c): reformulates CDR as a text generation task. It converts user histories and item attributes into textual prompts for LLMs to model implicit cross-domain semantic correlations.
5.4. Implementation Details
- Framework: implemented in PyTorch with Hugging Face PEFT for LoRA-based fine-tuning.
- Training stages:
  - Domain-adaptive Tokenization module: the RQ-VAE is pre-trained on all item embeddings with AdamW for 100 epochs; the domain-specific LoRA adapters are then fine-tuned for 50 epochs; the router network is a two-layer MLP with 128 hidden units, trained jointly with a VIB regularization weight.
  - Cross-Domain Autoregressive Recommendation module: the LLM backbone is Qwen2.5-7B; the universal LoRA experts are trained on combined cross-domain data for 10 epochs; the domain-specific adapters are fine-tuned for 10-20 epochs per domain.
- Optimization: all models are optimized with AdamW under mixed-precision (FP16) on NVIDIA H200 GPUs.
- Evaluation: the checkpoint with the best Recall@10 on the validation set is selected for final testing.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate that GenCDR consistently and significantly outperforms all baseline models across the tested datasets and metrics. This validates its effectiveness in cross-domain sequential recommendation.
The results also provide insights into the performance hierarchy of different baseline categories:
- Cross-domain (CDSR) models generally perform better than single-domain sequential recommendation (SDSR) models. This reinforces the core premise that leveraging cross-domain information is beneficial for recommendation.
- Generative (GenRec) models improve over SDSR baselines, but their performance typically lags behind specialized CDSR models. This highlights the gap that GenCDR aims to bridge: simply applying existing, mostly single-domain generative models to cross-domain scenarios is not optimal. GenCDR's advantage stems from deeply integrating the generative paradigm with the unique challenges of cross-domain knowledge transfer through its novel semantic ID representation and adaptive modeling.

The superior performance of GenCDR can be attributed to its key innovations:
- Resolution of the Item Tokenization Dilemma: by generating disentangled semantic IDs instead of relying on arbitrary item IDs, GenCDR provides LLMs with semantically rich and transferable representations, overcoming a major limitation of previous LLM-based CDR methods.
- Adaptive Modeling of Universal and Domain-Specific Knowledge: the Domain-adaptive Tokenization and Cross-Domain Autoregressive Recommendation modules, equipped with dynamic routing, effectively disentangle and fuse universal and domain-specific interests at both the item and user levels, allowing nuanced personalization and knowledge transfer without negative transfer.
- Efficient and Valid Inference: the Domain-aware Prefix-tree ensures that the generative process is computationally efficient and produces only valid semantic IDs, leading to accurate and practical recommendations.

The following is the overall performance comparison on all datasets (Table 2 from the original paper):
(SDSR baselines: BERT4Rec, SASRec, STOSA. GenRec baselines: VQ-Rec, TIGER, HSTU. CDSR baselines: C2DSR, TriCDR, LLM4CDSR.)

| Scene | Domain | Metric | BERT4Rec | SASRec | STOSA | VQ-Rec | TIGER | HSTU | C2DSR | TriCDR | LLM4CDSR | GenCDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Leisure | Sports | R@5 | 0.0188 | 0.0197 | 0.0236 | 0.0261 | 0.0267 | 0.0254 | 0.0265 | 0.0266 | 0.0263 | 0.0274 |
| Leisure | Sports | N@5 | 0.0121 | 0.0126 | 0.0162 | 0.0238 | 0.0244 | 0.0241 | 0.0253 | 0.0255 | 0.0257 | 0.0261 |
| Leisure | Sports | R@10 | 0.0325 | 0.0334 | 0.0346 | 0.0389 | 0.0397 | 0.0381 | 0.0395 | 0.0396 | 0.0398 | 0.0403 |
| Leisure | Sports | N@10 | 0.0169 | 0.0173 | 0.0283 | 0.0281 | 0.0287 | 0.0277 | 0.0258 | 0.0259 | 0.0260 | 0.0262 |
| Leisure | Clothing | R@5 | 0.0128 | 0.0132 | 0.0162 | 0.0171 | 0.0173 | 0.0175 | 0.0172 | 0.0174 | 0.0176 | 0.0181 |
| Leisure | Clothing | N@5 | 0.0078 | 0.0081 | 0.0119 | 0.0129 | 0.0125 | 0.0132 | 0.0158 | 0.0161 | 0.0163 | 0.0167 |
| Leisure | Clothing | R@10 | 0.0219 | 0.0227 | 0.0223 | 0.0248 | 0.0241 | 0.0253 | 0.0255 | 0.0258 | 0.0261 | 0.0265 |
| Leisure | Clothing | N@10 | 0.0105 | 0.0108 | 0.0135 | 0.0170 | 0.0167 | 0.0174 | 0.0191 | 0.0194 | 0.0196 | 0.0203 |
| Technology | Phones | R@5 | 0.0331 | 0.0345 | 0.0415 | 0.0411 | 0.0423 | 0.0415 | 0.0428 | 0.0434 | 0.0431 | 0.0431 |
| Technology | Phones | N@5 | 0.0215 | 0.0224 | 0.0283 | 0.0308 | 0.0315 | 0.0327 | 0.0392 | 0.0396 | 0.0401 | 0.0406 |
| Technology | Phones | R@10 | 0.0524 | 0.0537 | 0.0618 | 0.0607 | 0.0613 | 0.0615 | 0.0589 | 0.0593 | 0.0614 | 0.0620 |
| Technology | Phones | N@10 | 0.0278 | 0.0287 | 0.0346 | 0.0399 | 0.0406 | 0.0425 | 0.0493 | 0.0505 | 0.0506 | 0.0512 |
| Technology | Electronics | R@5 | 0.0179 | 0.0186 | 0.0213 | 0.0219 | 0.0228 | 0.0232 | 0.0235 | 0.0238 | 0.0237 | 0.0241 |
| Technology | Electronics | N@5 | 0.0118 | 0.0122 | 0.0148 | 0.0211 | 0.0214 | 0.0226 | 0.0229 | 0.0231 | 0.0230 | 0.0235 |
| Technology | Electronics | R@10 | 0.0276 | 0.0285 | 0.0315 | 0.0318 | 0.0322 | 0.0328 | 0.0336 | 0.0339 | 0.0338 | 0.0342 |
| Entertainment | Books | R@5 | 0.0089 | 0.0093 | 0.0142 | 0.0175 | 0.0172 | 0.0181 | 0.0152 | 0.0155 | 0.0161 | 0.0192 |
| Entertainment | Books | N@5 | 0.0071 | 0.0076 | 0.0117 | 0.0178 | 0.0177 | 0.0180 | 0.0143 | 0.0148 | 0.0153 | 0.0187 |
| Entertainment | Books | R@10 | 0.0176 | 0.0182 | 0.0219 | 0.0224 | 0.0221 | 0.0230 | 0.0205 | 0.0211 | 0.0216 | 0.0237 |
| Entertainment | Movies | R@5 | 0.1503 | 0.1542 | 0.1562 | 0.1680 | 0.1652 | 0.1682 | 0.1588 | 0.1601 | 0.1613 | 0.1713 |
| Entertainment | Movies | N@5 | 0.1015 | 0.1047 | 0.1063 | 0.1182 | 0.1156 | 0.1189 | 0.1092 | 0.1105 | 0.1149 | 0.1215 |
| Entertainment | Movies | R@10 | 0.1798 | 0.1825 | 0.1753 | 0.1922 | 0.1893 | 0.1931 | 0.1854 | 0.1865 | 0.1878 | 0.1971 |
- R@K and N@K denote Recall and NDCG at cutoff K.
- In the original table, the best results are in bold and the best baseline results are underlined.
- The t-tests showed significant performance improvements of GenCDR over the baselines.
6.2. Ablation Studies / Parameter Analysis
6.2.1. Ablation Study (RQ2)
To understand the contribution of each component of GenCDR, an ablation study was conducted. The results consistently show that each module plays a crucial role in the overall performance.
The following are the results of the ablation study on GenCDR components across four datasets (NDCG@10) (Table 3 from the original paper):
| Category | Variant | Phones | Electronics | Sports | Clothing |
|---|---|---|---|---|---|
| Full Model | GenCDR | 0.0512 | 0.0283 | 0.0262 | 0.0203 |
| Tokenization | w/o MTM | 0.0483 (↓5.7%) | 0.0267 (↓5.7%) | 0.0245 (↓6.5%) | 0.0190 (↓6.4%) |
| Tokenization | w/o Item Adapter | 0.0466 (↓9.0%) | 0.0255 (↓9.9%) | 0.0238 (↓9.2%) | 0.0183 (↓9.9%) |
| Autoregressive Recommendation | w/o Specific Expert | 0.0448 (↓12.5%) | 0.0245 (↓13.4%) | 0.0226 (↓13.7%) | 0.0173 (↓14.8%) |
| Autoregressive Recommendation | w/o Universal Experts | 0.0425 (↓17.0%) | 0.0232 (↓18.0%) | 0.0212 (↓19.1%) | 0.0162 (↓20.2%) |
| Autoregressive Recommendation | w/o MoE Gate (Avg.) | 0.0475 (↓7.2%) | 0.0262 (↓7.4%) | 0.0242 (↓7.6%) | 0.0186 (↓8.4%) |
| Inference Strategy | w/o Prefix Tree | 0.0498 (↓2.7%) | 0.0274 (↓3.2%) | 0.0255 (↓2.7%) | 0.0198 (↓2.5%) |
- Impact of Contextual Code Modeling (w/o MTM): removing the Masked Token Modeling (MTM) loss from the Domain-adaptive Tokenization module degrades performance (e.g., 5.7% on Phones, 6.5% on Sports). This confirms that learning the contextual relationships and "grammar" of semantic codes is vital, beyond simple reconstruction of item features.
- Impact of Item-specific Adaptation (w/o Item Adapter): removing the item-specific adapter (the domain-specific LoRA for tokenization) drops performance significantly (e.g., 9.0% on Phones, 9.9% on Clothing), validating the necessity of capturing domain-specific item semantics to refine the universal representations.
- Impact of the Domain-specific Expert (w/o Specific Expert): removing the domain-specific experts (LoRA adapters) in the Cross-Domain Autoregressive Recommendation module results in a substantial drop (e.g., 12.5% on Phones, 14.8% on Clothing), highlighting their crucial role in modeling fine-grained, user-wise domain-specific interests.
- Impact of the Universal Experts (w/o Universal Experts): the most significant degradation occurs when all universal experts (LoRA adapters) are removed (e.g., 17.0% on Phones, 20.2% on Clothing). A shared cross-domain knowledge foundation for user interests is therefore indispensable for effective CDR.
- Impact of the MoE Gate (w/o MoE Gate (Avg.)): replacing the trainable Mixture-of-Experts (MoE) gate (the user-level dynamic interest routing network) with simple averaging of the universal and domain-specific predictions leads to a noticeable decrease (e.g., 7.2% on Phones, 8.4% on Clothing), emphasizing the importance of a dynamic, context-aware selection mechanism over naive fusion.
- Impact of Constrained Decoding (w/o Prefix Tree): removing the Domain-aware Prefix-tree constraint results in a consistent but smaller drop (e.g., 2.7% on Phones, 2.5% on Clothing). This confirms its role in guaranteeing the generation of valid item IDs and preventing hallucinated (non-existent or semantically incorrect) recommendations, ensuring accuracy and efficiency.
6.2.2. In-depth Analysis (RQ3)
To qualitatively assess the framework's ability to learn disentangled representations, the authors visualized the final item representations () using t-SNE.
The following figure (Figure 3 from the original paper) shows t-SNE visualization of item embeddings in three different settings.
The figure shows t-SNE visualizations of item embeddings under three settings: (a) the original item embeddings, (b) embeddings with only the shared (universal) LoRA, and (c) embeddings with the domain-specific LoRA, with each category shown in a different color.
- Figure 3 (a): Original Item Embeddings. The raw item embeddings before any tokenization or adaptation: items are mixed, with no clear separation by domain.
- Figure 3 (b): Universal Adapters Only. When only universal adapters are used (without domain-specific adapters), item embeddings from different domains remain mixed together. Some clustering may occur around shared semantic concepts, but domain-specific distinctions are not pronounced. This confirms that universal knowledge alone is not sufficient to fully separate domain-specific item characteristics.
- Figure 3 (c): Full GenCDR Model (with Domain-specific Adapters). In contrast, the full GenCDR model, which includes both universal and domain-specific adapters with dynamic routing, yields item embeddings that form clearly separated domain-specific clusters. This visual evidence supports the importance and effectiveness of domain-specific adaptation in learning disentangled representations, allowing the model to distinguish items belonging to different domains even when they share high-level semantic concepts (like "Apple" in Figure 1).
6.2.3. Hyper-parameter Analysis (RQ4)
The sensitivity of key hyperparameters for LoRA fine-tuning was analyzed on the Cloth dataset.
The following figure (Figure 4 from the original paper) shows the sensitivity of LoRA fine-tuning to key hyperparameters on the Cloth dataset.
The figure plots the sensitivity of LoRA fine-tuning to key hyperparameters on the Cloth dataset, showing how the number of universal experts, the LoRA rank, LoRA alpha, and the LoRA dropout rate affect Recall@5, Recall@10, NDCG@5, and NDCG@10.
The plot shows the effect of varying:
- Number of Universal Experts: performance generally improves up to an optimum, after which it plateaus or declines due to increased complexity or potential redundancy.
- LoRA Rank: the rank determines the capacity of the LoRA adapters, and there is a clear optimum. A low rank may not capture enough information, while a very high rank can lead to overfitting (as it approaches full fine-tuning).
- Alpha: this parameter scales the LoRA updates; an optimal alpha balances the contribution of the LoRA layers against the base model.
- LoRA Dropout Rate: a small dropout rate (e.g., 0.05) is effective for regularization, preventing overfitting without overly hindering learning.

These findings highlight a balanced trade-off between model capacity and generalization, demonstrating the framework's robustness and tunability. Finding these optimal points lets the model capture sufficient complexity without overfitting to the training data.
6.2.4. Analysis of Efficiency (RQ5)
6.2.4.1. Training Efficiency
GenCDR leverages LoRA-based fine-tuning, which offers significant efficiency advantages compared to full fine-tuning of Large Language Models (LLMs).
The following figure (Figure 5 from the original paper) shows a comparison of training efficiency using the Qwen2.5-7B model.
The figure compares training efficiency with the Qwen2.5-7B model: (a) the number of trainable parameters (log scale), (b) training time, and (c) peak GPU memory, contrasting LoRA-based GenCDR with full fine-tuning.
- Figure 5 (a): Trainable Parameters (log scale). LoRA-based GenCDR dramatically reduces the number of trainable parameters compared to full fine-tuning. This is a core benefit of PEFT techniques and makes training feasible for large LLMs.
- Figure 5 (b): Training Time. With fewer trainable parameters, GenCDR requires substantially less training time than full fine-tuning.
- Figure 5 (c): Peak GPU Memory. LoRA-based GenCDR also consumes significantly less GPU memory, which is critical for training large models on available hardware.

These results confirm that GenCDR's LoRA-centric architecture provides substantial training efficiency, making it practical to fine-tune LLMs for cross-domain recommendation.
6.2.4.2. Inference Efficiency and Scalability
GenCDR also demonstrates superior inference efficiency and scalability, primarily due to its Domain-aware Prefix-tree constrained generative architecture.
The following figure (Figure 6 from the original paper) shows a comparison of runtime memory and inference time w.r.t. the item pool size for TriCDR, TIGER, and GenCDR (Qwen2.5-0.5B).
The figure compares runtime memory and inference time for TriCDR, TIGER, and GenCDR as the item pool grows: the horizontal axis is the number of items, the left vertical axis is memory (GB), and the right vertical axis is time (s). GenCDR shows an advantage in both memory and time.
- Runtime Memory: GenCDR maintains a nearly constant memory footprint regardless of the item pool size, a crucial advantage for real-world applications with vast item catalogues. Baselines such as TriCDR and TIGER show increasing memory consumption as the item pool grows.
- Inference Time: similarly, GenCDR's inference time remains stable as the item pool size increases, because the prefix-tree prunes the search space so that the LLM only considers valid semantic ID sequences rather than exhaustively scoring all items. In contrast, TriCDR and TIGER exhibit increasing inference times with larger item pools, indicating a less scalable approach for very large item sets.

This scalability makes GenCDR well suited to production environments where recommendation systems must operate efficiently over dynamically changing and extensive item catalogues.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper effectively addresses two critical challenges in LLM-based cross-domain recommendation (CDR): the item ID tokenization dilemma and insufficient domain-specific modeling. The authors propose GenCDR, a novel generative framework that leverages semantic IDs and adaptive mechanisms.
The key contributions and findings include:
- Generative Semantic ID Paradigm: GenCDR introduces generative semantic IDs to replace traditional item IDs, enabling LLMs to process items based on their inherent meaning rather than arbitrary identifiers and thus resolving the tokenization dilemma.
- Domain-adaptive Tokenization Module: this module dynamically generates hybrid semantic IDs by fusing universal and domain-specific item knowledge, ensuring rich and contextually relevant representations.
- Cross-Domain Autoregressive Recommendation Module: symmetrically, this module models user preferences by adaptively fusing universal and user-wise domain-specific interests through dynamic routing, leading to more personalized and accurate recommendations.
- Domain-aware Prefix-tree for Inference: an efficient prefix-tree mechanism constrains the LLM's generative process, ensuring the output of valid and accurate semantic IDs while improving inference speed and scalability.
- Superior Performance and Efficiency: extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines in recommendation accuracy and generalization. It also exhibits superior training and inference efficiency compared to full fine-tuning and other generative models, especially with large item pools.
7.2. Limitations & Future Work
The authors identify that GenCDR's current focus is primarily on textual features for generating semantic IDs. A stated direction for future work is to explore incorporating multimodal features (e.g., images, videos, audio) for richer representations. This suggests a current limitation where non-textual aspects of items might not be fully captured, potentially impacting performance in domains where visual or auditory information is highly discriminative.
7.3. Personal Insights & Critique
Inspirations
GenCDR offers several inspiring aspects:
- Bridging LLMs and Structured Data: converting items into generative semantic IDs is an elegant way to bridge the gap between LLMs (which excel at text) and structured recommendation data. This approach could be valuable in other settings where LLMs need to interact with non-textual entities.
- Adaptive Knowledge Fusion: the dynamic routing mechanisms at both item and user levels for fusing universal and domain-specific knowledge are particularly insightful. This fine-grained control over knowledge transfer is crucial in complex multi-domain settings and could be adapted to other multi-task or meta-learning problems where disentangling shared and specific features matters.
- Practicality of Generative Models: the Domain-aware Prefix-tree addresses a practical consideration often overlooked in theoretical generative models. Ensuring valid and efficient generation is key to deploying LLM-based recommenders in real-world scenarios, and this mechanism offers a robust solution for constrained generation tasks.
Potential Issues, Unverified Assumptions, or Areas for Improvement
- Reliance on Initial Item Feature Embeddings: the Domain-adaptive Tokenization module starts from item feature embeddings derived from textual descriptions, so the quality and richness of these initial embeddings are paramount. If the textual descriptions are sparse or low-quality, the generated semantic IDs might be less effective. The paper mentions "textual features", but robustness to varying quality of textual data (e.g., short titles vs. long descriptions) could be explored further.
- Complexity of the Training Pipeline: while LoRA improves efficiency, the overall training pipeline involves multiple stages: RQ-VAE pre-training, domain-specific adapter fine-tuning for tokenization, universal expert fine-tuning for recommendation, and domain-specific expert fine-tuning for recommendation, each with its own hyperparameters. Managing and optimizing this multi-stage process can be complex and computationally intensive, especially for a novice.
- Generalizability of the RQ-VAE to Rare Items/Domains: the RQ-VAE is pre-trained on all item features. How well it handles items from very sparse domains, or extremely rare items within a domain whose textual descriptions are limited, is unclear; the semantic IDs might be less well-defined or disentangled for such long-tail items.
- Interpretability of Semantic IDs: while semantic IDs are intuitively more meaningful than arbitrary IDs, their exact interpretability for human understanding is not fully detailed. Whether a sequence of semantic IDs can be easily translated back into human-understandable attributes or concepts matters for debugging or explaining recommendations.
- Impact of VIB Regularization: the paper states that VIB regularization promotes disentangled representations in the routing networks. While conceptually sound, a deeper empirical analysis of how effectively VIB achieves this disentanglement (beyond the t-SNE visualization) would strengthen the argument, for instance by analyzing the information passed by the router under different VIB strengths.

Overall, GenCDR presents a significant step forward in LLM-based cross-domain recommendation by tackling fundamental representation and adaptation challenges. Its innovations offer a solid foundation for future research in generative and adaptive recommendation systems.