UniSearch: Rethinking Search System with a Unified Generative Architecture
TL;DR Summary
UniSearch introduces a unified generative search framework for Kuaishou, replacing the traditional cascaded architecture. By integrating a Search Generator and Video Encoder, it achieves end-to-end optimization, addressing objective inconsistency and limited generalization, and delivering strong gains in both offline metrics and online A/B tests, including the largest single-experiment improvement in recent years in live search.
Abstract
Modern search systems play a crucial role in facilitating information acquisition. Traditional search engines typically rely on a cascaded architecture, where results are retrieved through recall, pre-ranking, and ranking stages. The complexity of designing and maintaining multiple modules makes it difficult to achieve holistic performance gains. Recent advances in generative recommendation have motivated the exploration of unified generative search as an alternative. However, existing approaches are not genuinely end-to-end: they typically train an item encoder to tokenize candidates first and then optimize a generator separately, leading to objective inconsistency and limited generalization. To address these limitations, we propose UniSearch, a unified generative search framework for Kuaishou Search. UniSearch replaces the cascaded pipeline with an end-to-end architecture that integrates a Search Generator and a Video Encoder. The Generator produces semantic identifiers of relevant items given a user query, while the Video Encoder learns latent item embeddings and provides their tokenized representations. A unified training framework jointly optimizes both components, enabling mutual enhancement and improving representation quality and generation accuracy. Furthermore, we introduce Search Preference Optimization (SPO), which leverages a reward model and real user feedback to better align generation with user preferences. Extensive experiments on industrial-scale datasets, together with online A/B testing in both short-video and live search scenarios, demonstrate the strong effectiveness and deployment potential of UniSearch. Notably, its deployment in live search yields the largest single-experiment improvement in recent years of our product's history, highlighting its practical value for real-world applications.
In-depth Reading
1. Bibliographic Information
1.1. Title
UniSearch: Rethinking Search System with a Unified Generative Architecture
1.2. Authors
Jiahui Chen†, Xiaoze Jiang*†, Zhibo Wang†, Quanzhi Zhu†, Junyao Zhao†, Feng Hu†, Kang Pan, Ao Xie, Maohua Pei, Zhiheng Qin, Hongjing Zhang, Zhixin Zhai, Xiaobo Guo, Runbin Zhou, Kefeng Wang, Mingyang Geng, Cheng Chen, Jingshan Lv, Yupeng Huang, Xiao Liang, Han Li

The authors are primarily affiliated with Kuaishou Technology, Beijing, China, indicating a strong industry research focus on practical applications and large-scale deployment within a real-world product context.
1.3. Journal/Conference
Preprint version (ACM citation format, Beijing, China). The paper is published as a preprint, meaning it has not yet undergone formal peer review for a specific conference or journal. However, the ACM citation format indicates an intention or submission to an ACM-affiliated venue. Preprints are common in fast-moving fields like AI/ML for rapid dissemination of research findings.
1.4. Publication Year
2025
1.5. Abstract
This paper introduces UniSearch, a unified generative search framework developed for Kuaishou Search. Modern search systems traditionally use a cascaded architecture (recall, pre-ranking, ranking), which is complex and often leads to performance inconsistencies due to distinct optimization objectives across modules. UniSearch addresses this by proposing an end-to-end architecture that integrates a Search Generator and a Video Encoder. The Generator creates semantic identifiers (SIDs) of relevant items from user queries, while the Video Encoder learns latent item embeddings and tokenizes them into SIDs. A unified training framework jointly optimizes both components, enhancing representation quality and generation accuracy. Additionally, UniSearch employs Search Preference Optimization (SPO) which uses a reward model and real user feedback to align generated results with user preferences. Extensive offline experiments and online A/B tests on Kuaishou's industrial-scale short-video and live search systems demonstrate UniSearch's effectiveness. Notably, its deployment in live search achieved the largest single-experiment improvement in the product's recent history, underscoring its practical value.
1.6. Original Source Link
https://arxiv.org/abs/2509.06887v2 PDF Link: https://arxiv.org/pdf/2509.06887v2.pdf This paper is available as a preprint on arXiv, indicated by the 'v2' in the link, meaning it's the second version posted.
2. Executive Summary
2.1. Background & Motivation
Core Problem: Traditional search systems, widely used in industrial applications, predominantly rely on a multi-stage cascaded architecture (MCA). This architecture typically involves sequential stages like recall, pre-ranking, and ranking. While refined over time, this design suffers from fundamental issues:
- Objective Misalignment: Each stage uses distinct models optimized with separate objectives, leading to misaligned training signals across the entire pipeline. This prevents optimal end-to-end relevance modeling and limits overall performance.
- Complexity and Overhead: The multi-stage structure introduces substantial inference latency and high maintenance overhead, as each stage requires dedicated algorithmic design, development, and system support.

Why this problem is important: In large-scale industrial search applications like Kuaishou, efficient and highly relevant information retrieval is crucial for user satisfaction and business metrics. The limitations of MCA directly impact the user experience, development costs, and the system's ability to adapt and improve holistically.
Challenges in extending generative models to search: While generative recommendation has shown promise in replacing cascaded pipelines, existing approaches are often not truly end-to-end. They typically separate item encoding (tokenization) from generation, leading to objective inconsistency and limited generalization. Furthermore, applying generative models to search is more complex than recommendation because search involves cross-domain generation (text query to item), whereas recommendation often stays intra-domain (user/item embeddings to item). The exploration of unified generative methods for the full cascaded search architecture remains underexplored.
Paper's Innovative Idea: The paper proposes UniSearch, a unified generative search framework that replaces the entire cascaded search pipeline with a single end-to-end architecture. It addresses the limitations of previous generative approaches by jointly optimizing item encoding and query-to-item generation within a unified training framework, directly bridging the gap between text queries and item representations.
2.2. Main Contributions / Findings
The paper makes several primary contributions to the field of generative search:
- Unified End-to-End Generative Search Solution: UniSearch is presented as a deployed, industrial-scale generative search solution that replaces the traditional multi-stage cascaded architecture. It demonstrates superior retrieval quality and computational efficiency compared to its traditional counterparts, marking a significant step towards fully unified search systems.
- Novel Unified Training Framework: The paper introduces a unified training framework that seamlessly integrates item tokenization and generative modeling. This contrasts with prior methods that separate these tasks, leading to objective inconsistency. UniSearch's approach ensures semantic consistency and mutual enhancement between video encoding and query generation, resulting in more effective item representations and higher-quality generated results. Key components of this framework include:
  - Residual Contrastive Learning with a Coarse-to-Fine Strategy for semantic alignment between queries and videos.
  - VQ-VAE for end-to-end discretization of latent embeddings into semantic IDs during training.
  - Reject-Sampling Generative Training to prioritize high-quality item generation.
- Search Preference Optimization (SPO) for Alignment: UniSearch incorporates an online post-training phase called Search Preference Optimization (SPO). This mechanism leverages a reward model combining system-estimated signals and real user feedback (e.g., clicks, watch time) to refine the generation policy, ensuring that generated results are better aligned with actual user preferences.
- Industrial-Scale Deployment and Validation: UniSearch has been successfully deployed in Kuaishou's live and short-video search systems, serving hundreds of millions of active users. Large-scale online A/B experiments confirm its practical effectiveness and efficiency.
  - Notably, its deployment in live search resulted in the largest single-experiment improvement in Kuaishou's recent product history, significantly boosting metrics like Total Play Counts (TPC).
  - The framework demonstrates enhanced performance on long-tail queries and new users, and produces richer and more diverse results, addressing common challenges in large-scale search.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
- Cascaded Architecture in Search Systems: This is a traditional multi-stage pipeline used in most industrial search engines. It processes a search query through several sequential filtering steps to narrow down a vast item corpus to a relevant few.
- Recall (Retrieval): The initial stage, responsible for efficiently retrieving a large set of candidate items (e.g., thousands or millions) that are broadly relevant to the query. This is often done using inverted indices or dual-encoder models.
- Pre-ranking: A filtering stage that further reduces the candidate set (e.g., to hundreds) for efficiency, using computationally lighter models than ranking.
- Ranking: The final stage, employing complex and computationally expensive models to precisely order the remaining candidates for presentation to the user, optimizing for relevance, diversity, and other quality metrics.
- Generative Models: Models capable of generating new data instances that resemble the training data. In the context of this paper, it refers to models that generate semantic identifiers of items based on a query, effectively performing retrieval as a generation task.
  - Transformer: A neural network architecture introduced in 2017, foundational to modern Large Language Models (LLMs). It relies heavily on self-attention mechanisms to process sequences, allowing it to weigh the importance of different parts of the input sequence when processing each element.
  - Encoder-Decoder Paradigm: A common architecture for sequence-to-sequence (seq2seq) tasks. An encoder processes the input sequence (e.g., a query) into a context representation, and a decoder generates an output sequence (e.g., item IDs) based on this context.
  - Autoregressive Generation: A process where each token in the output sequence is generated conditioned on the previously generated tokens and the input.
- Vector Quantized Variational Autoencoder (VQ-VAE): A type of generative model that learns discrete latent representations (a codebook) for data. It is used here to convert continuous video embeddings into semantic identifiers.
  - Latent Embeddings: Numerical representations (vectors) that capture the underlying semantic information of an item (e.g., a video). These are typically high-dimensional.
  - Codebook: In VQ-VAE, a discrete set of learned vectors. A continuous latent embedding is "quantized" by finding the closest vector in this codebook, and the index of that closest vector becomes the semantic ID.
  - Semantic Identifiers (SIDs): Discrete tokens or indices representing the meaning or characteristics of an item. These are what the Search Generator ultimately produces.
- Contrastive Learning: A self-supervised learning technique where the model learns to pull similar samples closer together in the embedding space and push dissimilar samples further apart. It requires defining positive and negative pairs.
- Reinforcement Learning (RL): A paradigm where an agent learns to make decisions by performing actions in an environment to maximize a cumulative reward. In this paper, it is used for Search Preference Optimization (SPO) to align generated results with user feedback.
  - Reward Model: A model that assigns a numerical reward score to an action or outcome, guiding the RL agent towards desired behaviors.
  - Policy: In RL, the policy is the strategy the agent uses to determine its next action based on the current state. Here, it refers to the UniSearch model's generation behavior.
- Trie (Prefix Tree): A tree-like data structure that efficiently stores and retrieves a dynamic set of strings or sequences, organized by their prefixes. It is used here to constrain the generation of semantic IDs to only valid sequences; a minimal sketch follows this list.
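To make the Trie idea concrete, here is a minimal Python sketch (our own illustration, not Kuaishou's implementation) of a semantic-ID Trie supporting the prefix lookup used at each decoding step; `SIDTrie` and the toy item IDs are hypothetical names.

```python
from collections import defaultdict

class SIDTrie:
    """Stores valid semantic-ID sequences; supports prefix-constrained lookup."""
    def __init__(self):
        self.children = defaultdict(SIDTrie)
        self.item_id = None  # set on leaves that correspond to a real item

    def insert(self, sid_sequence, item_id):
        node = self
        for token in sid_sequence:
            node = node.children[token]
        node.item_id = item_id

    def allowed_next_tokens(self, prefix):
        """Return the valid continuation tokens for a SID prefix."""
        node = self
        for token in prefix:
            if token not in node.children:
                return set()  # invalid prefix: no valid continuation
            node = node.children[token]
        return set(node.children.keys())

# Usage with two hypothetical items indexed by k=3 semantic IDs:
trie = SIDTrie()
trie.insert((12, 7, 301), item_id="video_a")
trie.insert((12, 9, 44), item_id="video_b")
assert trie.allowed_next_tokens((12,)) == {7, 9}
```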
3.2. Previous Works
The paper primarily positions itself against two main lines of previous work:
- Traditional Cascaded Search Systems (MCA):
  - Description: These systems (e.g., [8, 28, 29, 33, 36]) involve recall, pre-ranking, and ranking stages. Modern versions often use BERT-based discriminative models [3] with dual-tower architectures for recall (contrastive learning) and pre-ranking (post-interaction), and single-tower fully interactive architectures for fine-ranking (point-wise learning).
  - Relevance to UniSearch: UniSearch aims to replace this entire MCA with a single generative model, unifying objectives and improving end-to-end consistency. The paper highlights the misaligned training signals and high maintenance overhead as key drawbacks of MCA that UniSearch intends to solve.
- Existing Generative Retrieval and OneModels:
  - Description: This newer paradigm (e.g., [10-12, 15, 17, 19-22, 26, 34, 37]) formulates retrieval as a text generation task, where document identifiers or passages are generated. In recommendation, OneRec [2, 40], OneSug [5], and other "OneModel" approaches [25, 27, 39] have explored replacing cascaded pipelines with generative models.
  - Relevance to UniSearch: UniSearch builds upon this trend but points out a critical limitation: most existing generative approaches are not truly end-to-end. They often involve two separate training steps:
    1. Training an item encoder to produce embeddings and semantic tokens (e.g., using VQ-VAE, FSQ [16], or RQ-Kmeans).
    2. Separately training a generator to predict these tokens.
  This discrepancy between tokenization and generation objectives leads to inconsistent optimization and suboptimal generation quality. Furthermore, many approaches rely on pre-tokenized candidate items, limiting generalization to unseen content, and some studies apply generative models only to early stages like recall [15, 31], not the full architecture.
3.3. Technological Evolution
Search technology has evolved from simple inverted indexing to sophisticated deep neural models. The shift from lexical matching to semantic understanding has been driven by advances in natural language processing (NLP) and deep learning. BERT [3] and other Transformer-based models [9, 18, 23, 30] have enabled powerful semantic matching and relevance modeling.
Recently, the success of Large Language Models (LLMs) and generative AI has inspired a new paradigm: generative retrieval. Instead of just matching or ranking items, systems can now generate identifiers for relevant items directly. This represents a move towards more unified and intelligent information access. UniSearch contributes to this evolution by proposing a fully end-to-end generative framework for search, aiming to overcome the inherent limitations of multi-stage systems.
3.4. Differentiation Analysis
Compared to the main methods in related work, UniSearch's core differences and innovations are:
- Unified End-to-End Architecture for Full Search Pipeline: Unlike traditional MCA systems that use distinct models for recall, pre-ranking, and ranking, UniSearch replaces the entire pipeline with a single Search Generator and Video Encoder. This aims to resolve the objective misalignment and maintenance overhead of MCA.
- Joint Optimization of Tokenization and Generation: A key differentiator from existing generative recommendation and retrieval methods (e.g., OneRec, OneSug, or those using VQ-VAE or RQ-Kmeans as separate steps) is UniSearch's unified pre-training framework. It jointly optimizes video encoding (including discretization into semantic IDs) and query-to-video generation, eliminating the objective inconsistency and suboptimal generation quality often seen when these tasks are trained separately.
- Residual Contrastive Learning with Coarse-to-Fine Strategy: UniSearch introduces residual contrastive learning combined with a coarse-to-fine optimization strategy. This allows the Video Encoder to learn robust recall capabilities through initial tokens and then progressively refine ranking and personalization with subsequent tokens. It is explicitly designed to address the token redundancy and path collapse issues found in other discrete representation learning methods.
- Online Reinforcement Learning for Preference Alignment (SPO): Beyond supervised pre-training, UniSearch integrates Search Preference Optimization (SPO). This post-training phase uses a reward model that combines system-estimated signals and real user feedback (clicks, watch time) to continuously refine the model's generation policy, ensuring stronger alignment with dynamic user preferences, a capability often lacking in purely offline-trained models.
- Robustness in Search Context: While previous generative models mostly target recommendation (intra-domain mapping), UniSearch tackles the more challenging cross-domain generation from text queries directly to items in a search context, explicitly addressing the unique requirements and large-scale nature of industrial search.
- Practical Deployment and Efficiency: The integration of Trie-based inference for constrained generation, TensorRT acceleration, and a KV-cache ensures that UniSearch is not only effective but also efficient and scalable for real-world industrial deployment, a critical aspect often overlooked in academic prototypes.
4. Methodology
The UniSearch framework aims to replace traditional cascaded search systems with a unified, end-to-end generative architecture. This section details its core components, training scheme, inference process, and deployment.
4.1. Principles
The core idea behind UniSearch is to treat the entire search process, from query understanding to item retrieval, as a single sequence generation task. Instead of multiple, loosely coupled models, UniSearch uses a single encoder-decoder generative model to directly produce semantic identifiers of relevant items given a user query. The theoretical basis is that by unifying the objectives of item representation learning and query-to-item generation, the system can achieve better end-to-end relevance modeling, overcome objective inconsistency, and reduce system complexity. The intuition is to enable the model to "speak the language" of items (via semantic IDs) based on the user's "language" (via query text).
4.2. Core Methodology In-depth (Layer by Layer)
UniSearch consists of two main components: a Search Generator and a Video Encoder, which are optimized together through a unified pre-training framework and further refined by Search Preference Optimization (SPO).
4.2.1. UniSearch Architecture
As illustrated in Figure 2 (a) from the original paper, UniSearch's architecture comprises the Search Generator and the Video Encoder.
The following figure (Figure 2 from the original paper) shows the architecture of UniSearch and its deployment:
Figure 2: A schematic of the UniSearch model architecture and its online deployment and post-training pipeline. The upper part shows the unified pre-training of the Search Generator and the Video Encoder, including the query input and multi-level feature representations; the lower part depicts how UniSearch is applied and updated within the reinforcement learning training system, the Trie server, and the reward system.
4.2.1.1. Search Generator
The Search Generator follows an encoder-decoder generative paradigm, similar to architectures like BART.
- Encoder: This component is a bidirectional Transformer. It takes two main inputs:
  - The user query (textual input).
  - Auxiliary user features (e.g., GSU sequences representing personalized historical behaviors).
  To capture the holistic semantics of a search request, a special token is prepended to the input sequence. The representation corresponding to this token, after being processed by the encoder, serves as the global query embedding, effectively summarizing the user's intent for the current search.
- Decoder: This component receives the contextualized query representation from the encoder and autoregressively generates a sequence of tokens. These tokens are not raw text but rather semantic IDs (SIDs) of relevant videos. This process transforms a natural language query into a structured semantic representation of the desired search results.
4.2.1.2. Video Encoder
The Video Encoder is a unidirectional Transformer designed to learn latent embeddings and semantic IDs for each video item.
- Input: Each video is represented by a combination of:
  - Textual metadata (e.g., title, description).
  - Multi-modal content features (e.g., visual features from video frames, audio features).
  - Side statistical features (e.g., view counts, engagement metrics).
  These features are concatenated with learnable tokens and passed through the Transformer.
- Output: The Transformer produces a sequence of latent embeddings $d_i^{(1)}, d_i^{(2)}, \ldots, d_i^{(k)}$. These embeddings capture the semantic properties of the video in a continuous vector space.
- Discretization with VQ-VAE: To convert these continuous latent embeddings into semantic identifiers (SIDs), the Video Encoder is augmented with a VQ-VAE module that maps the latent embeddings into a codebook space. The output is a sequence of semantic IDs $s_i^{(1)}, s_i^{(2)}, \ldots, s_i^{(k)}$. This SID sequence provides a compact and interpretable representation of each video, crucial for indexing and retrieval within the generative framework.
4.2.2. Unified Pre-training
The unified pre-training framework is designed to enable the Search Generator and Video Encoder to learn cooperatively. It jointly optimizes video encoding and query-to-video generation, addressing the semantic consistency and task misalignment issues of prior methods.
- Pre-training Data: Sampled from large-scale search logs, each record includes:
  - query: The user's search query.
  - user features: Contextual information about the user.
  - candidate video set: Videos considered for the query (both exposed items and unexposed negatives).
  - label set: Graded relevance annotations for each candidate, reflecting query-video relevance and video quality.
4.2.2.1. Residual Contrastive Learning for Semantic Alignment
This objective aligns the semantics of user queries (from the Generator encoder) with the latent representations of relevant items (from the Video Encoder).
- Let $q_i$ be the query embedding from the Generator encoder.
- Let $d_i$ be the latent representation of the $i$-th candidate video encoded by the Video Encoder, with its $n$-th residual embedding denoted $d_i^{(n)}$.

The residual contrastive learning formulation is defined as:

$ \mathcal{L}_{\mathrm{contrast}} = \sum_{n=1}^{k} \mathcal{L}\Big( q_i, \ \mathrm{sg}\big[ \sum_{m < n} d_i^{(m)} \big] + d_i^{(n)} \Big) $

Where:
- $\mathcal{L}_{\mathrm{contrast}}$ is the total contrastive loss.
- $k$ is the number of latent embeddings (levels in the codebook) for each video.
- $n$ iterates from 1 to $k$, indexing each residual token.
- $q_i$ is the query embedding for the $i$-th query.
- $\mathrm{sg}[\cdot]$ denotes the stop-gradient operation: gradients do not flow through the sum of previous residual embeddings, so each current residual focuses on learning new, complementary information.
- $\sum_{m < n} d_i^{(m)}$ is the sum of the previous residual embeddings for video $i$.
- $d_i^{(n)}$ is the current $n$-th residual embedding for video $i$.
- The term $\mathrm{sg}[\sum_{m < n} d_i^{(m)}] + d_i^{(n)}$ represents the representation accumulated up to level $n$, in which only the current residual is actively learned.
The contrastive loss for a query $q$ and a document (video) representation $d$ is:

$ \mathcal{L}(q, d) = - \log \frac{\exp\left( \mathrm{sim}(q, d) / \tau \right)}{\exp\left( \mathrm{sim}(q, d) / \tau \right) + \sum_{d^- \in N} \exp\left( \mathrm{sim}(q, d^-) / \tau \right)} $

Where:
- $\mathrm{sim}(q, d)$ is a similarity function between the query $q$ and the document representation $d$.
- $\tau$ is the temperature parameter, which controls the sharpness of the probability distribution.
- $N$ is the negative sample set, consisting of irrelevant document representations.
- In contrast to conventional cosine similarity, the paper adopts L2 similarity, i.e., similarity based on the negative L2 distance between $q$ and $d$. This choice is stated to better fit the residual accumulation mechanism and preserve the geometric structure of embeddings.
Coarse-to-Fine Strategy:
This strategy mimics the cascaded ranking process of traditional search to strengthen semantic alignment.
- The first residual token is trained on an "easy", coarse-grained matching task, ensuring strong recall capability. This is achieved by using simple in-batch negatives as the negative sample set.
- Subsequent tokens progressively handle more complex tasks, refining the representation for fine-grained ranking and personalization. For these tokens, increasingly hard negatives are sampled from semantically similar but irrelevant items. This staged learning decomposes the task and prevents token redundancy, making efficient use of the codebook.
4.2.2.2. Discretization into Semantic IDs
The Video Encoder must discretize latent embeddings into semantic IDs for the generative model. VQ-VAE [24] is employed for this, allowing the codebook to be updated jointly during training, unlike post-clustering methods.
For each embedding $d_i^{(n)}$, the VQ-VAE encoder performs a nearest-neighbor lookup in a learnable codebook to obtain a quantized embedding $e_i^{(n)}$ and its corresponding semantic ID $s_i^{(n)}$. The sequence of quantized embeddings is then used to reconstruct the original embeddings.

A codebook loss is introduced to ensure that $e_i^{(n)}$ approximates $d_i^{(n)}$:

$ \mathcal{L}_{\mathrm{codebook}} = \sum_{n=1}^{k} \alpha_1 \left\| \mathrm{sg}[d_i^{(n)}] - e_i^{(n)} \right\|_2^2 + \alpha_2 \left\| d_i^{(n)} - \mathrm{sg}[e_i^{(n)}] \right\|_2^2 $

Where:
- $\mathcal{L}_{\mathrm{codebook}}$ is the codebook loss.
- $\mathrm{sg}[\cdot]$ is the stop-gradient operation.
- $\alpha_1$ and $\alpha_2$ are hyperparameters balancing the two terms.
- The first term, $\alpha_1 \| \mathrm{sg}[d_i^{(n)}] - e_i^{(n)} \|_2^2$, is the codebook loss proper, pulling the quantized embedding towards the encoder output (gradients flow only to $e_i^{(n)}$).
- The second term, $\alpha_2 \| d_i^{(n)} - \mathrm{sg}[e_i^{(n)}] \|_2^2$, is the commitment loss, keeping the encoder output close to the selected codebook vector (gradients flow only to $d_i^{(n)}$).

The SimVQ strategy [41] is used to stabilize discretization and prevent codebook collapse.
4.2.2.3. Reject-Sampling Generative Training
Once items are mapped to semantic IDs, the Generator is trained to produce these sequences autoregressively using a next-token prediction objective with cross-entropy loss.
To enhance generation quality, reject-sampling strategies are used:

- Low-quality items (identified by labels) are filtered out.
- Losses for items of different quality levels are re-weighted.

The next-token prediction loss is:

$ \mathcal{L}_{\mathrm{NTP}} = - w_i \sum_{n=1}^{k} \log p\big( s_i^{(n)} \mid q, u, s_i^{(<n)} \big) $

Where:
- $\mathcal{L}_{\mathrm{NTP}}$ is the next-token prediction loss.
- $w_i$ is a re-weighting factor determined by the label of the item, ensuring the Generator prioritizes high-relevance, high-quality items.
- $p(s_i^{(n)} \mid q, u, s_i^{(<n)})$ is the probability of generating the $n$-th semantic ID given the query $q$, the user context $u$, and the previously generated semantic IDs $s_i^{(<n)}$.
4.2.2.4. Overall Unified Pre-training Loss
The total pre-training loss combines these objectives:

$ \mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{contrast}} + \lambda_2 \mathcal{L}_{\mathrm{codebook}} + \lambda_3 \mathcal{L}_{\mathrm{NTP}} $

Where:
- $\lambda_1, \lambda_2, \lambda_3$ are hyperparameters used to balance the loss scales and stabilize the training process. This joint optimization provides a robust foundation for subsequent training and inference.
4.2.3. Post-training with Search Preference Optimization (SPO)
After unified pre-training, UniSearch undergoes an online post-training phase called Search Preference Optimization (SPO) to further align its generation with user preferences. This is an online reinforcement learning framework conducted in a controlled online environment (see Figure 2 (b)).
4.2.3.1. Reward System
To leverage existing advancements in ranking models, UniSearch uses the production system's fine-ranking module as its Reward System.
- Inputs: query, user context, and candidate video features.
- Outputs: Multiple predictive scores covering query-item relevance, video quality, and expected user satisfaction. These form part of the reward signal.
- User Interaction Feedback: Explicit feedback (clicks, watch time, likes, downloads) collected after videos are presented to users provides an additional reward signal.

The final reward score is a weighted combination of system-estimated scores and observed user interactions:

$ R = \gamma_1 R_{\mathrm{system}} + \gamma_2 R_{\mathrm{interaction}} $

Where:
- $R_{\mathrm{system}}$ represents scores from the system's fine-ranking module.
- $R_{\mathrm{interaction}}$ represents scores derived from user interaction feedback.
- $\gamma_1$ and $\gamma_2$ are weights balancing the relative importance of the two types of signals.
4.2.3.2. Search Preference Optimization (SPO)
From online system logs, a group of $G$ generated search result sequences is collected for a given query, along with their corresponding reward scores $\{R_1, R_2, \ldots, R_G\}$. Each sequence is composed of semantic IDs $s_i^{(1)}, s_i^{(2)}, \ldots, s_i^{(k)}$. UniSearch assigns a generation probability to each sequence, conditioned on the query $q$ and user context $u$.

Following GRPO [14], the relative advantage of each candidate is computed by normalizing its reward against the rewards of the batch:

$ A_i = \frac{R_i - \operatorname{mean}(\{R_1, R_2, \ldots, R_G\})}{\operatorname{std}(\{R_1, R_2, \ldots, R_G\})} $

Where:
- $A_i$ is the relative advantage of the $i$-th generated sequence.
- $\operatorname{mean}(\cdot)$ and $\operatorname{std}(\cdot)$ are the mean and standard deviation of the rewards within the batch, respectively. This formulation ensures that the model prioritizes results better than the batch average and suppresses worse ones.
The final SPO optimization objective is defined as:

$ \mathcal{L}_{\mathrm{SPO}}(\theta) = - \frac{1}{G} \sum_{i=1}^{G} \frac{1}{k} \sum_{n=1}^{k} \left( \frac{\pi_{\theta}\big(s_i^{(n)} \mid q, u, s_i^{(<n)}\big)}{\pi_{\theta_{\text{no-grad}}}\big(s_i^{(n)} \mid q, u, s_i^{(<n)}\big)} A_i - \beta\, \mathbb{D}_{\mathrm{KL}}\big[\pi_{\theta} \,\|\, \pi_{\mathrm{ref}}\big] \right) $

Where:
- $\mathcal{L}_{\mathrm{SPO}}(\theta)$ is the Search Preference Optimization loss for model parameters $\theta$.
- $G$ is the number of generated search results in the batch.
- $k$ is the length of the semantic ID sequence.
- $\pi_{\theta}(s_i^{(n)} \mid q, u, s_i^{(<n)})$ is the probability of generating the $n$-th semantic ID under the current policy $\pi_{\theta}$.
- $\pi_{\theta_{\text{no-grad}}}$ is the same policy with gradients stopped, i.e., the policy before the current update. Its ratio with the current policy is central to policy-gradient methods such as Proximal Policy Optimization (PPO).
- $A_i$ is the relative advantage from the previous equation, reflecting how much better the generated sequence is than the batch average.
- The first term maximizes the likelihood of preference-aligned generations, weighted by their relative advantages.
- $\beta$ is a coefficient controlling the strength of the KL regularization.
- $\mathbb{D}_{\mathrm{KL}}[\pi_{\theta} \| \pi_{\mathrm{ref}}]$ is the Kullback-Leibler (KL) divergence between the current policy and the reference policy (the pre-trained model's policy). It regularizes the updated policy against deviating too far from the pre-trained model, ensuring stability and avoiding catastrophic deviation.
4.2.4. Inference and Deployment
To ensure efficiency and scalability in real-world deployment, UniSearch employs several techniques:
- Trie-based Inference: During inference, the generation of semantic ID sequences is constrained by a prefix tree (Trie) that stores all valid semantic ID sequences corresponding to actual candidate items. At each decoding step, the Generator queries a dedicated Trie service to retrieve feasible continuation tokens, eliminating invalid semantic IDs. This significantly reduces computational overhead and guarantees that generated results map to valid items. The Trie is optimized by combining prefix-tree search with binary search to handle large semantic ID spaces and alleviate memory overhead.
- Deployment Architecture (Figure 2 (b)): UniSearch integrates four key components into a unified system in Kuaishou's production environment:
  - Reinforcement Learning (RL) Training System: Continuously fine-tunes UniSearch with online preference alignment (SPO) and synchronizes updated model parameters to the inference service.
  - Online Inference System: Serves real-time search requests. Models are deployed using TensorRT (an NVIDIA platform for high-performance deep learning inference) and accelerated by key-value (KV) caching (storing intermediate computations in Transformer decoders to speed up subsequent token generation), ensuring low-latency responses.
  - Trie Server: Provides efficient constrained generation by maintaining valid semantic ID paths. It is dynamically updated to reflect changes in the item corpus (e.g., new live streams starting or ending).
  - Reward System: Evaluates candidate results using fine-grained ranking scores and user interaction signals, feeding preference-aligned rewards back to guide further RL training.

This end-to-end design allows for scalable, low-latency inference and continuous improvement of search quality through online learning and preference optimization.
5. Experimental Setup
5.1. Datasets
The experiments are conducted on large-scale industrial datasets derived from real search logs of the Kuaishou App.
- Live Search Dataset:
- Scale: Approximately 80 million user search sessions.
- Period: July 9 to July 23, 2025.
- Training Data: Logs from July 9 to July 22.
- Test Set: Logs from July 23 (held-out).
- Purpose: All offline experiments in Sections 4.2 and 4.3 are based on this dataset. The candidate pool for live search contains around 500K live sessions.
- Short-Video Search Dataset:
  - Purpose: Used in Section 4.4 to validate UniSearch's generalization beyond live scenarios.
  - Scale: The candidate pool for short-video search is much larger, roughly one billion videos.

These datasets are representative of real-world industrial search scenarios, providing a robust environment for evaluating the model's effectiveness and scalability. The data includes queries, user features, candidate video sets, and labeled relevance information.
5.2. Evaluation Metrics
The paper uses two primary evaluation metrics: Recall@300 and Mean Reciprocal Rank (MRR). Experiments are conducted on both a ranking subset (RK) and a click subset (CK).
5.2.1. Recall@300
Conceptual Definition: Recall@300 assesses the model's ability to retrieve relevant results within the top 300 generated (or retrieved) items. It focuses on the completeness of relevant item retrieval, indicating whether a significant portion of known relevant items are present among the highly ranked results.
Mathematical Formula:

$ \mathrm{Recall@300} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i}{P_i} $

Symbol Explanation:
- $N$: The total number of samples (queries) in the dataset.
- $i$: An index iterating over the samples (queries).
- $T_i$: The number of true positives among the top 300 retrieved results for the $i$-th sample. A true positive is an item that is both predicted as relevant (within the top 300) and labeled as truly relevant.
- $P_i$: The total number of positive (truly relevant) instances for the $i$-th sample.
5.2.2. Mean Reciprocal Rank (MRR)
Conceptual Definition: MRR measures the ranking ability of the model. It is particularly useful for tasks where there are only one or a few correct answers and the goal is to rank them as highly as possible. The Reciprocal Rank is the inverse of the rank of the first relevant item found: if the first relevant item is at position 1, the Reciprocal Rank is 1; if it is at position 3, it is 1/3. MRR is the average of these Reciprocal Ranks across all queries. A higher MRR indicates better ranking performance, meaning relevant items are found earlier in the ranked list.

Mathematical Formula:

$ \mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{K_i} $

Symbol Explanation:
- $N$: The total number of samples (queries) in the dataset.
- $i$: An index iterating over the samples (queries).
- $K_i$: The position (rank) of the first ground-truth relevant result for the $i$-th sample in the ranked candidate list. If no relevant item is found, $K_i$ is treated as infinite, making the Reciprocal Rank 0.
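A minimal sketch of how the two metrics can be computed per query; `ranked` and `relevant` are assumed inputs (the model's ranked item list and the ground-truth positive set).

```python
def recall_at_k(ranked, relevant, k=300):
    """Fraction of ground-truth positives found in the top-k results (T_i / P_i)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item; 0 if none is found."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0

# Averaging recall_at_k and reciprocal_rank over all queries gives
# Recall@300 and MRR, respectively.
```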
5.2.3. Test Subsets
- Ranking Subset (RK): This test set includes the top videos exposed to users by the search system. It is used to evaluate the alignment between generated videos and system preferences, reflecting how well the model can reproduce the system's intended ranking.
- Click Subset (CK): This test set considers clicked items as positives. It demonstrates immediate user preferences and is used to evaluate how well the model aligns with actual user engagement.
5.3. Baselines
To verify the effectiveness of UniSearch, it is compared against existing generative search frameworks. These baselines are structured as two-stage pipelines, reflecting the common practices that UniSearch aims to improve upon.
The baselines are generally composed of:
- An Embedding Model: Typically BERT-based variants (e.g., BERT-6 Layer, BERT-12 Layer, BERT-24 Layer), trained to produce item embeddings.
- A Codebook Construction Method: Used to tokenize the item embeddings into discrete semantic IDs. Methods include FSQ [16], VQ-VAE [24], and RQ-Kmeans.
- A Generative Model: Typically BART-based variants (e.g., BART-6 Layer, BART-12 Layer, BART-24 Layer), trained separately to generate semantic ID sequences from queries.

The paper reproduces these approaches for fair comparison, with results reported in Table 1. The key characteristic of these baselines is their reliance on multiple models and inconsistent training objectives, which UniSearch explicitly addresses.
5.4. Details of UniSearch Implementation
- Architecture:
  - Search Generator: Instantiated with the BART [9] architecture.
  - Video Encoder: Instantiated with the BERT [3] architecture.
- Live Search Configuration (smaller scale):
  - Model Size: Lightweight 6-layer BART and BERT models.
  - Hidden Size: 768.
  - Codebook: Three-level codebook ($k = 3$), with each level containing 512 entries.
- Short-Video Search Configuration (larger scale):
  - Model Size: 12-layer UniSearch configuration (larger BART and BERT components).
  - Codebook: Three-level codebook ($k = 3$), with 8192 entries per level. The larger codebook is needed for the candidate pool of roughly one billion videos.
- Optimizer: Adam.
- Batch Size: 64.
- Learning Rates:
  - Pre-training: Initial learning rate scheduled with a cosine decay strategy.
  - Online Post-training (SPO): Fixed learning rate.
- Inference: Beam search with a beam size of 256 to generate high-quality semantic IDs, balancing computational efficiency with the ability to explore diverse candidate items.
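A minimal sketch of how beam search over $k = 3$ SID levels can be combined with the Trie constraint described in Section 4.2.4; `score_fn` (next-token log-probabilities from the decoder) and `allowed_next` (the Trie service lookup) are hypothetical stand-ins for the production components.

```python
def constrained_beam_search(score_fn, allowed_next, k=3, beam_size=256):
    """Beam search over k SID levels, pruned by the Trie at every step."""
    beams = [((), 0.0)]  # (SID prefix, cumulative log-probability)
    for _ in range(k):
        candidates = []
        for prefix, logp in beams:
            token_logprobs = score_fn(prefix)        # {token: log-prob} for next level
            for token in allowed_next(prefix):       # Trie lookup prunes invalid SIDs
                score = token_logprobs.get(token, float("-inf"))
                candidates.append((prefix + (token,), logp + score))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams  # each surviving k-token prefix maps to a valid candidate item
```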
6. Results & Analysis
6.1. Core Results Analysis
The paper presents extensive offline and online experiments to validate UniSearch's effectiveness.
6.1.1. Offline Performance Comparison
The following are the results from Table 1 of the original paper:
| Embed. Model | Codebook | Gen. Model | Recall@300 RK (%)↑ | Recall@300 CK (%)↑ | MRR RK (%)↑ | MRR CK (%)↑ |
| --- | --- | --- | --- | --- | --- | --- |
| BERT-6 Layer | FSQ | BART-6 Layer | 55.42 | 62.26 | 8.36 | 9.93 |
| BERT-6 Layer | VQ-VAE | BART-6 Layer | 64.83 | 63.67 | 7.98 | 9.50 |
| BERT-6 Layer | RQ-Kmeans | BART-6 Layer | 65.56 | 64.12 | 9.24 | 14.81 |
| BERT-12 Layer | RQ-Kmeans | BART-12 Layer | 67.79 | 67.90 | 12.13 | 15.68 |
| BERT-24 Layer | RQ-Kmeans | BART-24 Layer | 69.05 | 69.27 | 13.84 | 16.02 |
| UniSearch-6 Layer | | | 67.45 | 68.17 | 9.65 | 15.36 |
| UniSearch-12 Layer | | | 68.83 | 70.11 | 12.35 | 15.73 |
| UniSearch-24 Layer | | | 69.78 | 72.03 | 14.62 | 16.80 |
| UniSearch-6 Layer (w/ Online SPO) | | | 68.29 | 68.48 | 12.19 | 16.04 |
| UniSearch-12 Layer (w/ Online SPO) | | | 69.53 | 70.25 | 13.56 | 16.71 |

The first five rows are two-stage baselines (embedding model + codebook construction + generator); the UniSearch rows are single unified models with no separate embedding or codebook step.
Advantages of Unified Pre-training:
- Consistent Outperformance: UniSearch consistently outperforms baseline methods across all model scales and metrics (Recall@300 and MRR on both RK and CK test sets).
- Efficiency at Smaller Scale: UniSearch-6 Layer (e.g., MRR of 15.36% on CK) significantly surpasses comparable 6-layer baselines (e.g., RQ-Kmeans with 14.81% on CK) and even achieves Recall@300 close to the 12-layer baselines. This indicates that UniSearch's unified objective enables more efficient learning and better query understanding and generation quality even with fewer parameters.
- Strong Scalability: As model size increases (UniSearch-12 Layer, UniSearch-24 Layer), UniSearch continues to show substantial improvements over baselines of equivalent or even larger scale. For instance, UniSearch-24 Layer achieves the highest MRR of 16.80% on CK, demonstrating the framework's robustness and scalability.
Effect of Search Preference Optimization (SPO):
- MRR Boost: The SPO post-training phase yields notable improvements, particularly in MRR. For UniSearch-6 Layer, MRR on RK increases from 9.65% to 12.19% (+2.54 percentage points), and on CK from 15.36% to 16.04% (+0.68 percentage points). Similar gains are observed for the 12-layer model.
- Preference Alignment: The MRR improvement suggests that SPO successfully guides the model to generate results that are not only relevant but also better aligned with user preferences, prioritizing items more likely to be favored by users. Recall@300 also sees moderate gains, but MRR's larger increase highlights the quality and ranking benefits of SPO.
6.1.2. Online A/B Testing Results
The paper reports the results of online A/B tests conducted on Kuaishou's live-streaming and short-video search systems.
The following are the results from Table 3 of the original paper:
| UniSearch - Live | TPC ↑ | CTR ↑ | CQR ↓ | PFC ↓ |
| --- | --- | --- | --- | --- |
| Change | +3.31% | +0.202% | -0.382% | -0.107% |

| UniSearch - Video | VPD ↑ | PVD ↑ | CQR ↓ | LPC ↑ |
| --- | --- | --- | --- | --- |
| Change | +0.213% | +0.993% | -0.602% | +0.830% |
Live-streaming Search Scenario:
- Total Play Counts (TPC): +3.31% increase. This is highlighted as the most critical metric, directly reflecting search scale and platform vitality. The gain represents the most significant improvement in recent years for Kuaishou.
- Page Click-Through Rate (CTR): +0.202% increase, indicating stronger user engagement.
- Change Query Rate (CQR): -0.382% decrease, suggesting that UniSearch generates more satisfactory results, leading to fewer query reformulations.
- Position of First-Click (PFC): -0.107% decrease (the first click occurs higher in the results list), indicating better result relevance and ranking.
Short-video Search Scenario:
- Video Playback Duration (VPD): +0.213% improvement.
- Page View Depth (PVD): +0.993% increase.
- Change Query Rate (CQR): -0.602% reduction.
- Long Play Count per PV (LPC): +0.830% increase.

These metrics collectively suggest that UniSearch not only satisfies immediate search needs but also encourages users to explore a wider variety of content, leading to more diversified interests and deeper engagement.
6.2. Ablation Studies / Parameter Analysis
6.2.1. Effectiveness of Coarse-to-Fine Strategy and Residual Contrastive Learning
The paper conducts an ablation study to analyze the contributions of Coarse-to-Fine (CF) strategy and Residual Contrastive Learning (RCL).
The following are the results from Table 2 of the original paper:
| Method | Recall@300 RK (%)↑ | Recall@300 CK (%)↑ | MRR RK (%)↑ | MRR CK (%)↑ |
| --- | --- | --- | --- | --- |
| UniSearch-Plain | 63.54 | 63.15 | 6.55 | 12.16 |
| UniSearch-Plain w/ CF | 64.63 | 65.61 | 7.01 | 12.97 |
| UniSearch-Plain w/ RCL | 66.18 | 65.41 | 8.59 | 14.71 |
| UniSearch | 67.45 | 68.17 | 9.65 | 15.36 |
- UniSearch-Plain: This variant treats each token independently, without CF or RCL. It shows the weakest performance across all metrics (e.g., MRR of 12.16% on CK), validating the need for structured learning and token diversity.
- UniSearch-Plain w/ CF: Adding the Coarse-to-Fine (CF) strategy improves performance (e.g., MRR of 12.97% on CK), indicating that gradually learning from easier to more complex objectives enhances ranking capability. However, without RCL, token representations can become similar, leading to path collapse.
- UniSearch-Plain w/ RCL: Incorporating Residual Contrastive Learning (RCL) provides a substantial boost, especially in MRR (14.71% on CK). RCL encourages different tokens to capture complementary semantic aspects, mitigating path collapse and significantly boosting ranking precision.
- Full UniSearch: The full model, integrating both CF and RCL, achieves the best performance (e.g., MRR of 15.36% on CK). The CF strategy provides a structured learning path, while RCL preserves token diversity; their combination leads to more effective end-to-end discretization and superior overall search performance.
6.2.2. Ablation on Codebook Depth and Size
The paper analyzes the impact of codebook depth ($k$, the number of levels) and codebook size (the number of entries per level) on performance.
The following figure (Figure 3 from the original paper) shows the results of different Codebook Depth and Codebook Size on RK and CK test set.
Figure 3: The effect of codebook depth and codebook size on the RK and CK test sets. Panel (a) shows Recall@300 (left) and MRR (right) across codebook depths; panel (b) shows the same metrics across codebook sizes. The plots illustrate how each parameter affects the performance metrics.
- Codebook Depth ($k$):
  - Figure 3(a) shows that when the codebook size is fixed at 512, performance (Recall@300 and MRR) saturates once the depth exceeds 3.
  - Deeper codebooks (longer token sequences) should intuitively capture more semantic information, but increasing the depth beyond 3 yields diminishing returns and can increase dispersion in the candidate space, weakening generalization.
  - Considering both accuracy and inference latency, a depth of $k = 3$ is chosen as optimal.
- Codebook Size:
  - Figure 3(b) shows that when the codebook depth is fixed at $k = 3$, increasing the codebook size reveals divergent trends: Recall@300 decreases while MRR steadily improves.
  - Larger codebooks encourage finer semantic partitioning, which enhances ranking precision (higher MRR) but can reduce recall by fragmenting candidate coverage.
  - For practical systems requiring a balance between recall and precision, a codebook size of 512 is selected, offering strong performance across both metrics.
6.2.3. Impact of the Trie Constraint
The Trie-based constraint during inference is crucial for system reliability and efficiency.
The following figure (Figure 4 from the original paper) shows the results of incorporating the Trie on RK and CK test set, and Valid Rates of path to candidates.
Figure 4: Results of UniSearch with and without the Trie on the RK and CK test sets (Recall@300, MRR), together with the valid rate of generated paths to candidates. UniSearch with the Trie shows clear improvements on all metrics.
- Improved Performance: Introducing the Trie significantly improves both Recall@300 and MRR on the RK and CK test sets. For example, Recall@300 on RK increases from roughly 60% to over 67%, and MRR on RK from below 8% to over 9%.
- Path Validity: Critically, the Trie increases path validity from 51.3% to 99.8%, meaning nearly all generated results map to valid candidate items, which is essential for search availability and user experience.
- Efficiency: By constraining generation to valid paths, the Trie enhances computational efficiency during inference.
- Dynamic Updates: The paper emphasizes the necessity of dynamically updating the Trie in live scenarios (e.g., for live streams that start, end, or change content) to maintain robustness.
6.3. Further Analysis of Online Performance
The paper provides further analysis of the online A/B testing results for the Live Search scenario, focusing on Total Play Count (TPC).
The following figure (Figure 5 from the original paper) shows the analysis of the improvements in Total Play Count (TPC) from query frequency and user type.
Figure 5: Pie charts analyzing the Total Play Count (TPC) improvement (+3.31%) by query frequency and user type. By query frequency, 65.06% of the gain comes from long-tail queries and 34.94% from high-frequency queries; by user type, 23.02% comes from new users, 58.73% from low-activity users, and 18.25% from high-activity users.
- Enhanced Performance on Long-tail Queries: Figure 5 shows that 65.06% of the TPC improvement comes from long-tail queries, nearly doubling the contribution from head queries. This indicates UniSearch's ability to effectively model sparse and semantically complex queries, a significant challenge for traditional systems. The unified pre-training and improved semantic representations likely enable better generalization to diverse, low-frequency queries.
- Stronger Appeal to New and Low-Activity Users: Per Figure 5, 58.73% of the TPC increase originates from low-activity users and a further 23.02% from new users, so users less familiar with the platform account for the large majority of the gain. This suggests that UniSearch delivers results particularly attractive to these users, which is valuable for expanding the active user base.
- Richer and More Diverse Results: Case studies, such as the MOBA game example in Figure 6, illustrate UniSearch's ability to generate results that are both semantically relevant and diverse.

The following figure (Figure 6 from the original paper) shows the search results comparison between the Multi-stage Cascading Architecture (MCA) and UniSearch.

Figure 6: A comparison of search results from the multi-stage cascading architecture (MCA) and UniSearch. The upper part shows MCA's results, dominated by "Honor of Kings" entries; the lower part shows UniSearch's results, which also include "League of Legends" and "Heroes Evolved", illustrating UniSearch's improvement in result diversity and accuracy.

For the query "MOBA game":

- Traditional MCA: Consistently shows "Honor of Kings" as the top result, exhibiting a bias towards a single popular item. This can lead to content homogenization and dissatisfaction for users with different preferences.
- UniSearch: Surfaces additional relevant games like "League of Legends" and "Heroes Evolved". This broader coverage serves a wider range of user preferences, improving satisfaction and mitigating the risk of users abandoning the platform for lack of diverse relevant content.
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper successfully introduces UniSearch, a novel unified generative search framework that rethinks the architecture of modern search systems. By replacing the traditional multi-stage cascaded architecture with a single, end-to-end generative model, UniSearch overcomes critical limitations such as objective inconsistency and high maintenance overhead. The core innovation lies in its unified pre-training framework, which jointly optimizes video encoding (including semantic ID discretization via VQ-VAE and residual contrastive learning with a coarse-to-fine strategy) and query-to-item generation. Furthermore, Search Preference Optimization (SPO), an online reinforcement learning mechanism, effectively aligns the model's outputs with real user preferences using a reward model and user feedback. Extensive offline and industrial-scale online A/B tests on Kuaishou's live and short-video search systems demonstrate UniSearch's superior performance in both retrieval quality and computational efficiency. Its deployment in live search yielded the largest single-experiment improvement in the product's recent history, particularly benefiting long-tail queries and new users by providing richer and more diverse results.
7.2. Limitations & Future Work
The authors acknowledge the following limitations and suggest future research directions:
- Candidate Generation Diversity: Currently, UniSearch generates candidates in a point-wise manner using beam search, which might limit the diversity of the generated results.
- Future Work Directions:
  - Enhancing List-wise Generation: The authors plan to improve list-wise generation to enhance both diversity and ranking accuracy, considering the entire list of generated items and their interrelationships rather than generating each item independently.
  - Finer-grained Reward Methods: Exploring finer-grained reward methods in the SPO framework to further strengthen alignment with diverse and complex user preferences, for example via more detailed breakdowns of user interactions or context-aware reward signals.
7.3. Personal Insights & Critique
- Innovation of Unified Architecture: The shift from a cascaded architecture to a truly end-to-end generative model is a significant step forward in search systems. The core idea of jointly optimizing item tokenization and generation addresses a fundamental limitation in prior generative approaches. This unified training approach, especially with residual contrastive learning and VQ-VAE within the same optimization loop, is elegant and powerful: it essentially forces the model to "understand" and "speak" item IDs in a semantically consistent manner from the outset, rather than mapping a query to pre-defined, possibly suboptimal, item representations.
- Robustness of SPO: The integration of Search Preference Optimization (SPO) via reinforcement learning is crucial for practical deployment. Real-world user preferences are dynamic and often implicit; relying solely on offline supervised signals can produce models that perform well on static benchmarks but fail to adapt to evolving user behavior. SPO provides a continuous feedback loop that keeps the model's output aligned with user satisfaction, a major contributor to its industrial success. The use of both system-estimated rewards and actual user interactions makes the reward signal robust.
- Practicality and Scalability: The emphasis on Trie-based inference, TensorRT acceleration, and KV-caching highlights the paper's strong practical focus. Many academic generative models struggle with the real-time latency and scalability requirements of industrial-scale deployment; UniSearch demonstrates that these challenges can be overcome, making the generative paradigm viable for production systems. The dynamic Trie updates are particularly important for constantly changing content like live streams.
- Transferability: The UniSearch framework, particularly its unified training and SPO components, appears highly transferable. The core principles could be applied to other information retrieval domains beyond video search, such as e-commerce product search, news article retrieval, or document search, provided suitable multi-modal item features and semantic ID tokenization can be developed. The cross-domain generation aspect (text query to item) is generic.
-
Diversity in
Beam Search: Whilebeam searchwith abeam sizeof 256 helps generate high-quality results, the authors acknowledge thepoint-wisenature may limit diversity. Future work onlist-wise generationis a critical next step. However, even withlist-wise generation, maintaining recall for very diverselong-tail querieswhile ensuringfine-grained rankingfor subtle differences remains a significant challenge. -
Interpretability of
Semantic IDs: Whilesemantic IDsprovide a compact representation, their direct interpretability for human understanding or debugging might be limited without a clear mapping back to human-readable concepts. Understanding why certain ID sequences are generated could be complex. -
Computational Cost of
RLTraining: WhileSPOis effective, onlinereinforcement learningcan be computationally intensive and sensitive toreward signal noiseandexploration-exploitation trade-offs. Details on the stability and convergence ofSPOin a highly dynamic environment would be interesting. -
Comparison to LLM-based Retrieval: The paper focuses on
BART/BERT-basedgenerative models. With the rapid advancement of very largeLLMs, it would be interesting to see howUniSearchcompares against or integrates withLLMsthat directly output relevant item metadata or even generate executable search queries. However,LLMsmight struggle with thediscretizationandefficiencyrequirements for large-scale production systems. -
Bias in Reward System: The
reward systemleverages the existingfine-ranking moduleanduser interaction feedback. While practical, this carries the risk of inheriting and perpetuating biases present in the legacy system or user behavior data (e.g., popularity bias, filter bubbles). Further research into debiasing strategies withinSPOcould be valuable.Overall,
UniSearchrepresents a robust and practically impactful solution for modern search systems, effectively bridging the gap between theoretical advancements in generative AI and the stringent demands of industrial deployment.
-