SMILE: SeMantic Ids Enhanced CoLd Item Representation for Click-through Rate Prediction in E-commerce SEarch
TL;DR Summary
SMILE enhances cold-start item representation by fusing semantic IDs using RQ-OPQ encoding for shared and differentiated signals, significantly improving CTR and sales in large-scale industrial e-commerce experiments.
Abstract
With the rise of modern search and recommendation platforms, the insufficient collaborative information of cold-start items exacerbates the Matthew effect among existing platform items, challenging platform diversity and constituting a longstanding problem. Existing methods align items' side content with collaborative information to transfer collaborative signals from high-popularity items to cold-start items. However, these methods fail to account for the asymmetry between collaboration and content, or for the fine-grained differences among items. To address these issues, we propose SMILE, an item representation enhancement approach based on fused alignment of semantic IDs. Specifically, we use RQ-OPQ encoding to quantize item content and collaborative information, followed by a two-step alignment: RQ encoding transfers shared collaborative signals across items, while OPQ encoding learns differentiated information of items. Comprehensive offline experiments on large-scale industrial datasets demonstrate the superiority of SMILE, and rigorous online A/B tests confirm statistically significant improvements: item CTR +1.66%, buyers +1.57%, and order volume +2.17%.
Analysis
1. Bibliographic Information
- Title: SMILE: SeMantic Ids Enhanced CoLd Item Representation for Click-through Rate Prediction in E-commerce SEarch.
- Authors: Qihang Zhao, Zhongbo Sun, Xiaoyang Zheng, Xian Guo, Siyuan Wang, Zihan Liang, Mingcan Peng, Ben Chen, and Chenyi Lei.
- Affiliations: The authors are primarily affiliated with Kuaishou Inc. and Kuaishou Technology, a major Chinese technology company known for its video-sharing mobile app and e-commerce platforms. This suggests the research is industry-driven and tested on large-scale, real-world systems.
- Journal/Conference: The paper template mentions "Conference acronym 'XX'", indicating it is a preprint manuscript prepared for submission to a conference. The specific venue is not yet determined or disclosed.
- Publication Year: The paper cites references up to 2025 and uses placeholder arXiv IDs (e.g., 2510.12604), suggesting it is a very recent or forthcoming work, likely from late 2024 or early 2025.
- Abstract: The paper addresses the "cold-start" item problem in e-commerce search, where new items lack user interaction data, leading to poor recommendation performance. Existing methods that align item content with collaborative signals are criticized for ignoring the asymmetry between these two information sources and the fine-grained differences between items. To solve this, the authors propose SMILE, a method that enhances item representations using semantic IDs. SMILE employs RQ-OPQ encoding to quantize item information into a hierarchical structure. It then uses a two-step alignment: RQ encoding transfers shared collaborative signals, while OPQ encoding learns unique item differences. Experiments on large industrial datasets show SMILE's superiority, with significant online A/B test improvements in item CTR (+1.66%), buyer numbers (+1.57%), and order volume (+2.17%).
- Original Source Link:
  - Official Source: https://arxiv.org/abs/2510.12604
  - PDF Link: https://arxiv.org/pdf/2510.12604v1.pdf
- Publication Status: The provided arXiv ID is a placeholder and does not correspond to a real paper as of late 2024. This document is a preprint, meaning it has not yet undergone formal peer review for publication in a conference or journal.
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: In large e-commerce platforms, new ("cold-start") items have little to no user interaction history. This lack of collaborative information (e.g., clicks, purchases) makes it difficult for standard recommendation models to accurately predict their click-through rate (CTR), causing them to be rarely shown to users.
- Importance: This issue exacerbates the Matthew effect ("the rich get richer"), where popular items become even more popular while new items are buried, harming platform diversity and the discovery of new products. With new items comprising a significant portion of platform updates (e.g., 30% monthly on Kuaishou), an effective cold-start solution is crucial for business health.
- Gaps in Prior Work:
- Generator-based methods (e.g., GANs, VAEs) synthesize representations for cold items but can suffer from distribution drift and ignore temporal dynamics.
- Knowledge alignment methods try to match item content features with collaborative signals. However, they wrongly assume these two information sources are symmetric and consistent. For example, two items with similar content (e.g., two white t-shirts) might have vastly different collaborative signals (one is a bestseller, one is not). These methods also fail to capture the fine-grained, unique characteristics that differentiate items.
- Main Contributions / Findings (What):
- A Novel Item Representation Method (SMILE): The paper proposes SMILE, a framework that enhances cold item representations by leveraging semantic IDs derived from the OneSearch generative framework.
- Adaptive Bidirectional Information Transfer: An innovative mechanism that adaptively transfers information between the collaborative signals stored in traditional item IDs and the semantic information in the RQ part of the semantic ID. This alignment is weighted based on context, allowing it to handle warm and cold items differently.
- Learning Differentiated Information via Contrastive Learning: For the first time, the paper uses the OPQ part of the semantic ID (which captures residual, unique features) in a contrastive learning setup. This forces the model to learn the fine-grained attributes that make each item distinct.
- Validated Real-World Impact: SMILE is shown to be highly effective through extensive offline experiments and, crucially, a large-scale online A/B test, demonstrating statistically significant improvements in key business metrics like buyer conversion and order volume.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Click-Through Rate (CTR) Prediction: A core task in computational advertising and recommendation systems. It involves predicting the probability that a user will click on an item when it is displayed. Models are typically trained on massive logs of user impressions and clicks.
- Cold-Start Problem: A classic challenge in recommendation systems. It occurs when a new user (user cold-start) or a new item (item cold-start) enters the system. Since there is no historical interaction data, collaborative filtering models cannot make accurate recommendations. This paper focuses on the item cold-start problem.
- Collaborative Information vs. Side Content Information:
- Collaborative Information: Signals derived from the collective behavior of users, such as clicks, purchases, ratings, and add-to-carts. It reveals latent relationships between users and items (e.g., "users who bought X also bought Y").
- Side Content Information: Intrinsic attributes of an item, such as its title, description, category, brand, and image. This information does not depend on user interactions.
- Matthew Effect: In recommendation systems, this refers to the phenomenon where popular items are recommended more frequently, leading them to accumulate even more interactions and become more popular, while unpopular or new items are starved of exposure.
- Semantic IDs: Unlike traditional random hash IDs, semantic IDs are structured identifiers generated by a model. They encode rich semantic information about the item directly into the ID itself. For example, different parts of the ID could correspond to category, style, and brand.
- Quantization Techniques:
- Product Quantization (PQ): A vector quantization technique that breaks a high-dimensional vector into smaller segments and quantizes each segment independently. This is used for efficient similarity search.
- Residual Quantization (RQ): A hierarchical quantization method. It first quantizes a vector, then computes the residual (the difference between the original vector and its quantized version), and then quantizes the residual. This process is repeated, creating a layered, coarse-to-fine representation.
- Optimized Product Quantization (OPQ): An extension of PQ that applies a rotation to the data before quantization to minimize the quantization error. It helps in creating more balanced and informative codebooks.
- The paper uses RQ-OPQ encoding from OneSearch, where RQ captures shared, hierarchical semantics and OPQ captures the remaining unique, differentiated information (a toy sketch follows at the end of this list).
- Contrastive Learning: A self-supervised learning technique where the model learns representations by pulling "positive" (similar) examples closer together in the embedding space while pushing "negative" (dissimilar) examples farther apart. The InfoNCE loss is a common objective function for this.
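To make the coarse-to-fine idea concrete, here is a minimal numpy sketch of residual quantization with toy random codebooks. All names and sizes are illustrative, not from the paper; real systems learn the codebooks (e.g., via k-means per level):

```python
import numpy as np

rng = np.random.default_rng(0)
num_levels, codebook_size, dim = 3, 256, 64
# Toy codebooks; in practice these are learned from data.
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_levels)]

def rq_encode(vec, codebooks):
    """Residual quantization: each level quantizes what the previous level missed."""
    codes, residual = [], vec.copy()
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))  # nearest codeword
        codes.append(idx)
        residual = residual - cb[idx]
    return codes, residual

codes, residual = rq_encode(rng.normal(size=dim), codebooks)
print(codes)                     # coarse-to-fine code, e.g. [17, 203, 64]
print(np.linalg.norm(residual))  # what OPQ would quantize (after a learned rotation)
```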
- Previous Works:
- Generator-based Methods:
  - Meta-learning (Vartak et al., Luo et al.): These methods train a "generator" model that can quickly adapt to new items with only a few examples. The paper criticizes them for not handling temporal feature drift well.
  - GANs (Chen et al.): Generative Adversarial Networks can be used to generate plausible embeddings for cold items by aligning their distribution with that of warm items.
  - VAEs (Kong et al.): Variational Autoencoders can learn a latent distribution from warm item data and then sample from it to generate embeddings for cold items.
- Knowledge Alignment-based Methods:
  - Contrastive Learning (Wei et al., Xu et al.): These methods use contrastive loss to force the content-based representation of an item to be similar to its collaboration-based representation.
  - Knowledge Distillation (Zhuang et al.): A "teacher" model trained on warm items (with full information) teaches a "student" model (which only sees content features for cold items) how to mimic its predictions.
  - Core Limitation: As highlighted in Figure 1, these methods assume a strong, symmetric alignment between content and collaboration, which is often not true. They also tend to ignore the unique characteristics of individual items, leading to suboptimal performance.
- Differentiation: SMILE differentiates itself from prior work in two key ways:
  - It explicitly addresses the asymmetry between content and collaboration. Instead of forcing a simple alignment, it uses an adaptive gate to fuse information from semantic content (RQ encoding) and collaborative history (item ID) in a flexible, context-aware manner.
  - It is the first to use OPQ encoding to learn fine-grained item differences. While RQ captures shared semantics (what makes an item similar to others), OPQ captures the residual, unique details. By applying contrastive learning to these OPQ representations, SMILE learns to distinguish between very similar items, a capability missing in previous methods.
4. Methodology (Core Technology & Implementation)
The core of SMILE is a framework for enhancing item representations by fusing information from traditional item IDs and structured semantic IDs. The process is illustrated in Figure 2.
(Figure 2 image: the overall SMILE framework, showing the embedding lookup, the RQ-OPQ encoding process, and the multi-layer fusion and multi-task loss structure.)
Figure 2: The framework of SMILE. This diagram shows the overall architecture. On the left, it takes as input the traditional random item id and the 5-layer semantic id (RQ1-3, OPQ1-2). The RQ and item id embeddings are fused via an adaptive gate. The OPQ embeddings are enhanced via contrastive learning. The resulting representations are combined and fed into the main CTR model, which is trained with a multi-task loss.
- Problem Formulation: The goal is to predict the click-through rate for an item. The model takes various features as input (a numpy sketch of the loss follows below):
  - Symbols:
    - $\hat{y}$: The predicted probability of a click.
    - $id$: The item's traditional random hash ID. Its embedding stores collaborative signals learned over time.
    - $c$: The item's side content features (metadata, etc.).
    - $q$: The user's current search query.
    - $h$: The user's historical behavior sequence.
    - $u$: The user's profile features.
    - $x$: Cross features between user and item.
    - $\theta$: The model parameters.
  - The model is trained by minimizing the Binary Cross-Entropy (BCE) loss:
    $$\mathcal{L}_{CTR} = -\frac{1}{N} \sum_{i=1}^{N} \big[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \big]$$
  - Symbols:
    - $N$: The total number of training samples.
    - $y_i$: The true label (1 if clicked, 0 otherwise) for the i-th sample.
    - $\hat{y}_i$: The predicted click probability for the i-th sample.
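For concreteness, a minimal numpy rendering of this BCE objective (the clipping constant `eps` is a standard numerical-stability detail, not something stated in the paper):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over N samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Toy batch: two clicks and one non-click.
print(bce_loss(np.array([1.0, 0.0, 1.0]), np.array([0.9, 0.2, 0.6])))
```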
- RQ-OPQ Encodings: SMILE is built upon semantic IDs generated by the OneSearch framework (a pipeline sketch follows below). This process involves:
  1. Training a two-tower model to generate an initial embedding for each item based on its content and conversion signals.
  2. Applying a three-layer Residual Quantization (RQ) to this embedding. This creates the first three parts of the semantic ID ($RQ_1, RQ_2, RQ_3$) and captures coarse-to-fine shared semantic information.
  3. Applying a two-layer Optimized Product Quantization (OPQ) to the final residual vector from RQ. This creates the last two parts of the semantic ID ($OPQ_1, OPQ_2$) and captures the unique, discriminative features of the item. The final semantic ID is a five-part code: $(RQ_1, RQ_2, RQ_3, OPQ_1, OPQ_2)$.
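A hedged end-to-end sketch of how such a five-part ID could be assembled, assuming toy random codebooks and a random orthogonal matrix standing in for the learned OPQ rotation (OneSearch's actual codebook training procedure is not described in this summary):

```python
import numpy as np

rng = np.random.default_rng(1)
dim, K = 64, 256
rq_cbs = [rng.normal(size=(K, dim)) for _ in range(3)]        # 3 RQ levels
opq_rot = np.linalg.qr(rng.normal(size=(dim, dim)))[0]        # stand-in for learned rotation
opq_cbs = [rng.normal(size=(K, dim // 2)) for _ in range(2)]  # 2 OPQ sub-codebooks

def semantic_id(vec):
    codes, residual = [], vec.copy()
    for cb in rq_cbs:                                   # coarse-to-fine shared semantics
        i = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(i)
        residual = residual - cb[i]
    rotated = opq_rot @ residual                        # rotate, then quantize each half
    for cb, sub in zip(opq_cbs, np.split(rotated, 2)):  # unique, differentiated information
        codes.append(int(np.argmin(((cb - sub) ** 2).sum(axis=1))))
    return tuple(codes)                                 # (RQ1, RQ2, RQ3, OPQ1, OPQ2)

print(semantic_id(rng.normal(size=dim)))
```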
- SMILE Framework Modules:
  1. Adaptive Information Transfer via RQ Encoding: This module fuses the semantic information from RQ codes with the collaborative signals from the traditional item id embedding (a minimal sketch follows this module's description).
     - First, the three RQ codebook embeddings are fused into a single representation $e_{RQ} = \mathrm{Fuse}(e_{RQ_1}, e_{RQ_2}, e_{RQ_3})$. The Fuse function could be concatenation followed by a linear layer, or simple averaging.
     - An adaptive transfer gate $\alpha$ is computed by a DNN using context features. This gate determines how to mix the id and RQ embeddings. The intuition is that for warm items, the model should rely more on the rich collaborative signal in $e_{id}$, while for cold items, it should rely more on the generalizable semantic information in $e_{RQ}$.
     - The two embeddings are fused to create a coarse-grained representation $e_{coarse}$:
       $$e_{coarse} = \alpha \cdot e_{id} + (1 - \alpha) \cdot e_{RQ}$$
     - This transfer is guided by a directional KL-divergence loss, $\mathcal{L}_{KL}$, which enforces asymmetric knowledge transfer:
       $$\mathcal{L}_{KL} = \alpha \cdot \mathrm{KL}\big(\mathrm{sg}(e_{id}) \,\|\, e_{RQ}\big) + (1 - \alpha) \cdot \mathrm{KL}\big(\mathrm{sg}(e_{RQ}) \,\|\, e_{id}\big)$$
     - Symbols & Intuition:
       - $\mathrm{KL}(P \,\|\, Q)$: Kullback-Leibler divergence, measuring how distribution $P$ differs from a reference distribution $Q$.
       - $\mathrm{sg}(\cdot)$: The stop-gradient operator. It detaches a variable from the computation graph, so no gradients flow back through it, effectively treating the variable as a fixed target.
       - For warm items (where $\alpha$ is large), the first term dominates. It pushes the RQ embedding ($e_{RQ}$) to match the distribution of the id embedding ($e_{id}$), thereby transferring collaborative knowledge to the semantic representation.
       - For cold items (where $\alpha$ is small), the second term dominates. It pushes the id embedding ($e_{id}$) to match the distribution of the RQ embedding ($e_{RQ}$), thereby injecting semantic knowledge into the sparse id representation.
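A minimal PyTorch sketch of this module under assumptions the paper leaves open: averaging as the Fuse function, a one-layer sigmoid gate, and KL computed over softmax-normalized embeddings (one plausible way to turn embeddings into distributions); `detach()` plays the role of sg(·):

```python
import torch
import torch.nn.functional as F

B, D = 32, 64                                             # illustrative batch/embedding sizes
e_id = torch.randn(B, D, requires_grad=True)              # collaborative item-id embedding
e_rq = torch.randn(B, 3, D).mean(dim=1).requires_grad_()  # Fuse(RQ1..RQ3) via averaging
context = torch.randn(B, 16)                              # context features driving the gate

gate = torch.nn.Sequential(torch.nn.Linear(16, 1), torch.nn.Sigmoid())
alpha = gate(context)                                     # (B, 1); larger for warmer items

e_coarse = alpha * e_id + (1 - alpha) * e_rq              # gated fusion of the two embeddings

def kl(p_logits, q_logits):
    """KL(P || Q) with P, Q obtained by softmax-normalizing the embeddings."""
    p = F.softmax(p_logits, dim=-1)
    return (p * (F.log_softmax(p_logits, dim=-1) - F.log_softmax(q_logits, dim=-1))).sum(-1)

a = alpha.squeeze(-1)
# Warm items pull e_rq toward a frozen e_id; cold items pull e_id toward a frozen e_rq.
l_kl = (a * kl(e_id.detach(), e_rq) + (1 - a) * kl(e_rq.detach(), e_id)).mean()
```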
  2. Enhancing Item Differentiated Information with OPQ Encoding: This module uses contrastive learning on the OPQ encodings to learn fine-grained, unique item features (a loss sketch follows this module's description).
     - Positive/Negative Sample Selection:
       - For each item in a training batch, its positive samples are other items in the same batch that have similar OPQ encodings. Similarity is pre-computed based on the OPQ codebook vectors (the Top-10 most similar encodings are considered candidates).
       - Negative samples are all other items in the batch plus additional randomly sampled items.
     - Contrastive Loss (InfoNCE): The model is trained to distinguish positive pairs from negative pairs using the following loss:
       $$\mathcal{L}_{CL} = -\sum_{i} \log \frac{\sum_{j \in P_i} \exp\big(\mathrm{sim}(e_i, e_j)/\tau\big)}{\sum_{j \in P_i} \exp\big(\mathrm{sim}(e_i, e_j)/\tau\big) + \sum_{k \in N_i} \exp\big(\mathrm{sim}(e_i, e_k)/\tau\big)}$$
     - Symbols:
       - $P_i$: The set of positive samples for item $i$.
       - $N_i$: The set of negative samples for item $i$.
       - $\mathrm{sim}(e_i, e_j)$: Cosine similarity between the OPQ embeddings of items $i$ and $j$.
       - $\tau$: A temperature hyperparameter that controls the sharpness of the distribution. A smaller $\tau$ makes the task harder by amplifying differences in similarity.
     - Final Item Representation: The enhanced OPQ embedding $e_{OPQ}$ is added to the coarse-grained representation $e_{coarse}$ to produce the final item embedding:
       $$e_{item} = e_{coarse} + \beta \cdot e_{OPQ}$$
       $\beta$ is a hyperparameter controlling the weight of the differentiated information. This final embedding replaces the original item embedding in the downstream CTR model.
- Total Loss Function: The entire model is trained end-to-end by minimizing a combined loss (a tiny sketch follows below):
  $$\mathcal{L} = \mathcal{L}_{CTR} + \lambda_1 \mathcal{L}_{KL} + \lambda_2 \mathcal{L}_{CL}$$
  $\lambda_1$ and $\lambda_2$ are hyperparameters that balance the main CTR prediction task with the two auxiliary representation learning tasks.
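Putting the pieces together, a tiny self-contained sketch of the joint objective (all values are hypothetical stand-ins; the paper's actual weights are not reproduced here):

```python
import torch

e_coarse, e_opq = torch.randn(32, 64), torch.randn(32, 64)  # stand-ins from the two modules
beta, lambda1, lambda2 = 0.1, 0.5, 0.5                      # hypothetical hyperparameters

e_item = e_coarse + beta * e_opq        # final item representation fed to the CTR model

l_ctr, l_kl, l_cl = torch.rand(3)       # dummy loss values for illustration
loss = l_ctr + lambda1 * l_kl + lambda2 * l_cl
print(float(loss))
```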
5. Experimental Setup
- Datasets:
- Source: 91 days of user interaction logs from a large-scale online e-commerce platform.
- Split: The first 90 days were used for training, and the last day for testing.
- Size: The test set contains 500 million samples.
- Item Categorization:
- Warm Items: Items with more than 3 clicks or at least 1 order within a 7-day window.
- Cold-Start Items: Items with fewer than 200 impressions within a 7-day window.
- Distribution: The dataset consists of 400 million cold-start samples and 100 million warm samples, reflecting the real-world Pareto Principle (a small fraction of items receive most of the traffic).
- Evaluation Metrics:
- AUC (Area Under the Receiver Operating Characteristic Curve):
  - Conceptual Definition: AUC measures the overall ranking quality of a model. It represents the probability that the model will rank a randomly chosen positive sample (a clicked item) higher than a randomly chosen negative sample (a non-clicked item). An AUC of 0.5 corresponds to random guessing, while 1.0 is a perfect classifier.
  - Mathematical Formula: The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. AUC is the area under this curve; equivalently,
    $$\mathrm{AUC} = \frac{1}{|P| \cdot |N|} \sum_{p \in P} \sum_{n \in N} \mathbb{1}\big[\hat{y}_p > \hat{y}_n\big]$$
  - Symbol Explanation:
    - $P$, $N$: The sets of positive (clicked) and negative (non-clicked) test samples.
    - $\hat{y}_p$, $\hat{y}_n$: The predicted scores for a positive sample $p$ and a negative sample $n$.
    - $\mathbb{1}[\cdot]$: The indicator function, equal to 1 when the condition holds and 0 otherwise.
- GAUC (Grouped Area Under the Curve):
  - Conceptual Definition: GAUC is designed for recommendation scenarios to mitigate the influence of highly active users. It calculates AUC separately for each user and then computes a weighted average, where the weights are typically the number of impressions or clicks for that user. This gives a better measure of personalized ranking performance (a reference sketch follows below).
  - Mathematical Formula:
    $$\mathrm{GAUC} = \frac{\sum_{u=1}^{U} w_u \cdot \mathrm{AUC}_u}{\sum_{u=1}^{U} w_u}$$
  - Symbol Explanation:
    - $U$: The total number of users.
    - $\mathrm{AUC}_u$: The AUC calculated only on the samples for user $u$.
    - $w_u$: The weight for user $u$, often the number of impressions shown to that user.
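A reference sketch of impression-weighted GAUC using scikit-learn's `roc_auc_score`; skipping users whose labels are all one class is a common convention assumed here, not taken from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gauc(user_ids, labels, scores):
    """Impression-weighted average of per-user AUC."""
    user_ids, labels, scores = map(np.asarray, (user_ids, labels, scores))
    num = den = 0.0
    for u in np.unique(user_ids):
        m = user_ids == u
        if labels[m].min() == labels[m].max():
            continue                      # AUC undefined for single-class users
        w = m.sum()                       # weight: impressions for this user
        num += w * roc_auc_score(labels[m], scores[m])
        den += w
    return num / den if den else float("nan")

print(gauc([1, 1, 1, 2, 2, 2], [1, 0, 1, 0, 1, 0], [0.9, 0.2, 0.7, 0.3, 0.8, 0.4]))
```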
- Baselines: The paper compares SMILE against other state-of-the-art methods that also leverage semantic IDs or focus on the cold-start problem.
  - SPM_SID: A method that hashes sub-sequences of semantic IDs to capture item semantics adaptively.
  - DAS: A model that uses contrastive learning to jointly optimize the quantization and alignment of semantic IDs and traditional item IDs.
  - SaviorRec: A cold-start model that uses collaborative training on multimodal information and semantic IDs.
6. Results & Analysis
- Core Results:
  The main offline results are presented in Table 1, comparing SMILE against baselines across all, warm, and cold item sets.
  (Manual transcription of Table 1 from the paper)

  | Method | All AUC | All GAUC | Warm AUC | Warm GAUC | Cold AUC | Cold GAUC |
  |---|---|---|---|---|---|---|
  | SPM_SID | 0.8663 | 0.6322 | 0.8649 | 0.6319 | 0.8400 | 0.6203 |
  | DAS | 0.8679 | 0.6352 | 0.8689 | 0.6357 | 0.8426 | 0.6237 |
  | SaviorRec | 0.8687 | 0.6360 | 0.8695 | 0.6377 | 0.8435 | 0.6249 |
  | SMILE | 0.8725 | 0.6394 | 0.8741 | 0.6405 | 0.8528 | 0.6301 |

  - Analysis:
- Overall Performance (All): SMILE significantly outperforms all baselines, achieving the highest AUC (0.8725) and GAUC (0.6394). This demonstrates its superior overall ranking capability.
    - Warm Items: SMILE also leads on warm items. This is an important finding, as it shows that even for items with sufficient collaborative data, the structured semantic information from RQ-OPQ and the refined representation learning process can further improve prediction accuracy.
    - Cold Items: This is where SMILE truly excels. The improvement margin over the next best baseline (SaviorRec) is substantial (AUC: +0.93pp, GAUC: +0.52pp). This strongly validates the core design of SMILE: the adaptive transfer of shared signals (RQ) and the contrastive learning of unique features (OPQ) are highly effective at compensating for the lack of collaborative information in cold-start items.
- Ablations / Parameter Sensitivity:
  Table 2 presents an ablation study to dissect the contribution of each component in SMILE.
  (Manual transcription of Table 2 from the paper)

  | Variant | All AUC | All GAUC | Warm AUC | Warm GAUC | Cold AUC | Cold GAUC |
  |---|---|---|---|---|---|---|
  | only sid | 0.8650 | 0.6312 | 0.8642 | 0.6305 | 0.8391 | 0.6192 |
  | iid+sid | 0.8671 | 0.6331 | 0.8681 | 0.6335 | 0.8401 | 0.6215 |
  | iid+RQ | 0.8702 | 0.6371 | 0.8725 | 0.6378 | 0.8479 | 0.6277 |
  | iid+OPQ | 0.8683 | 0.6345 | 0.8706 | 0.6368 | 0.8455 | 0.6257 |
  | SMILE | 0.8725 | 0.6394 | 0.8741 | 0.6405 | 0.8528 | 0.6301 |

  - Analysis:
    - only sid: Replacing the traditional ID (iid) with the semantic ID (sid) performs poorly on warm items, confirming that discarding the learned collaborative signals in iid is detrimental.
    - iid+sid: Simply adding the semantic ID as another feature provides a slight boost, showing that the semantic information is useful but naive fusion is not optimal.
    - iid+RQ: This variant includes the traditional ID and the adaptive transfer module for RQ. It achieves a very strong performance, especially on cold items. This highlights the effectiveness of the adaptive bidirectional knowledge transfer.
    - iid+OPQ: This variant includes the traditional ID and the contrastive learning module for OPQ. It also brings significant benefits, confirming that learning differentiated item information is crucial.
    - SMILE: The full model, combining both the RQ and OPQ modules, achieves the best performance across all categories. This shows that the two components are complementary: the RQ module provides a robust, coarse-grained semantic foundation, while the OPQ module adds the fine-grained, discriminative details.
- Online A/B Testing:
  The most compelling evidence of SMILE's effectiveness comes from a rigorous online A/B test.
  (Manual transcription of Table 3 from the paper)

  | Method | All Buyer | All Order Volume | Cold Buyer | Cold Order Volume |
  |---|---|---|---|---|
  | base | - | - | - | - |
  | SMILE | +1.720% | +2.230% | +3.512% | +9.639% |

  - Analysis:
- The results are statistically significant and demonstrate a strong business impact. SMILE improved the number of unique buyers by +1.72% and the total order volume by +2.23% overall.
- The impact on the cold-start scenario is even more dramatic, with a +3.512% increase in buyers and a remarkable +9.639% increase in order volume for cold items. This confirms that by improving CTR prediction for new items, SMILE successfully drives their discovery and conversion, directly addressing the core business problem.
7. Conclusion & Reflections
- Conclusion Summary: The paper introduces SMILE, an effective and novel solution to the item cold-start problem in e-commerce search. By leveraging RQ-OPQ semantic IDs, SMILE implements a sophisticated two-part strategy: an adaptive transfer mechanism to fuse shared semantic (RQ) and collaborative (id) information, and a contrastive learning approach to enhance fine-grained discriminative features (OPQ). The method is validated through comprehensive offline experiments and a large-scale online A/B test, proving its state-of-the-art performance and significant real-world business value.
- Limitations & Future Work: The authors briefly mention their plan for future work: exploring the fusion of multimodal semantics (e.g., from images and text) with collaborative signals to handle even more complex cold-start scenarios where item content is richer and more varied.
- Personal Insights & Critique:
  - Strengths:
    - The paper's core idea of treating shared semantics (RQ) and unique features (OPQ) differently is highly intuitive and powerful. It moves beyond the simplistic assumption of content-collaboration symmetry that limited previous work.
    - The validation is exceptionally strong. The combination of detailed offline analysis, extensive ablation studies, and a successful large-scale online A/B test provides compelling evidence of the method's effectiveness and practical utility.
    - The adaptive transfer mechanism using a gate network is a clever way to dynamically adjust the model's behavior for warm and cold items without needing separate models.
  - Potential Limitations & Open Questions:
    - Dependency on OneSearch: The entire SMILE framework depends on the availability of high-quality RQ-OPQ semantic IDs from the OneSearch system. This makes the method less portable to organizations that do not have a similar generative search framework in place. Replicating the RQ-OPQ generation process itself is a non-trivial undertaking.
    - Positive Sampling in Contrastive Learning: The strategy of defining positive samples as items with similar OPQ encodings within a batch is pragmatic but could be noisy. There is no guarantee that items with similar residual features are truly "positive" pairs in a semantic or user-preference sense. The effectiveness might depend heavily on the quality of the initial OPQ codebooks.
    - Hyperparameter Sensitivity: The model has several key hyperparameters (the fusion weight $\beta$, the temperature $\tau$, and the loss weights $\lambda_1$ and $\lambda_2$). While the paper provides the values used, it doesn't discuss their sensitivity, which could be important for practitioners looking to implement the method.
- Overall Impact: This paper presents a significant advancement in industrial recommender systems for the cold-start problem. Its success demonstrates a clear path forward: moving from simple feature alignment to more nuanced, structured fusion of different information sources. The methodology is likely to be influential for practitioners at other large-scale e-commerce and content platforms.