OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service
TL;DR Summary
OneLoc introduces a geo-aware generative recommender integrating geographic cues and multi-objective reinforcement learning, improving local life service recommendations and significantly boosting GMV and orders on Kuaishou.
Abstract
Local life service is a vital scenario in Kuaishou App, where video recommendation is intrinsically linked with store's location information. Thus, recommendation in our scenario is challenging because we should take into account user's interest and real-time location at the same time. In the face of such complex scenarios, end-to-end generative recommendation has emerged as a new paradigm, such as OneRec in the short video scenario, OneSug in the search scenario, and EGA in the advertising scenario. However, in local life service, an end-to-end generative recommendation model has not yet been developed as there are some key challenges to be solved. The first challenge is how to make full use of geographic information. The second challenge is how to balance multiple objectives, including user interests, the distance between user and stores, and some other business objectives. To address the challenges, we propose OneLoc. Specifically, we leverage geographic information from different perspectives: (1) geo-aware semantic ID incorporates both video and geographic information for tokenization, (2) geo-aware self-attention in the encoder leverages both video location similarity and user's real-time location, and (3) neighbor-aware prompt captures rich context information surrounding users for generation. To balance multiple objectives, we use reinforcement learning and propose two reward functions, i.e., geographic reward and GMV reward. With the above design, OneLoc achieves outstanding offline and online performance. In fact, OneLoc has been deployed in local life service of Kuaishou App. It serves 400 million active users daily, achieving 21.016% and 17.891% improvements in terms of gross merchandise value (GMV) and orders numbers.
1. Bibliographic Information
1.1. Title
The central topic of this paper is OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service. It introduces a novel generative recommendation framework designed specifically for local life services, integrating geographic information comprehensively and balancing multiple business objectives.
1.2. Authors
The authors of the paper are Zhipeng Wei, Kuo Cai, Junda She, Jie Chen, Minghao Chen, Yang Zeng, Qiang Luo, Wencong Zeng, Ruiming Tang, Kun Gai, and Guorui Zhou. Most authors are affiliated with Kuaishou Inc. in Beijing, China, suggesting that the research is driven by practical applications within a large internet platform. Kun Gai is listed as Unaffiliated, Beijing, China. Their collective background appears to be in recommendation systems, machine learning, and large-scale industrial applications.
1.3. Journal/Conference
The paper specifies "In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym 'XX)." This indicates that it has been submitted to a conference, but the specific venue's name has not yet been finalized or publicly disclosed at the time of this preprint. The paper is currently available as a preprint on arXiv, which is a popular repository for pre-publication research, especially in fields like computer science, allowing for early dissemination and feedback.
1.4. Publication Year
The timestamp provided is 2025-08-20T11:57:48.000Z, which corresponds to the posting of the arXiv preprint on 20 August 2025 (UTC). No formal publication venue or year has been announced.
1.5. Abstract
Local life service recommendation (LLSR) within the Kuaishou App is a challenging task because it requires simultaneously considering users' interests and their real-time location. While end-to-end generative recommendation has emerged as a new paradigm in various scenarios (e.g., short video, search, advertising), it has not yet been successfully applied to local life services due to two key challenges: effectively utilizing geographic information and balancing multiple objectives like user interests, distance to stores, and business goals. To address these, the paper proposes OneLoc, a geo-aware generative recommender system. OneLoc incorporates geographic information through three main mechanisms: (1) geo-aware semantic IDs for tokenization, combining video and geographic data; (2) geo-aware self-attention in the encoder, leveraging video location similarity and user's real-time location; and (3) a neighbor-aware prompt in the decoder to capture surrounding contextual information. To balance multiple objectives, OneLoc employs reinforcement learning with two custom reward functions: a geographic reward and a GMV (Gross Merchandise Value) reward. This design enables OneLoc to achieve superior offline and online performance. It has been deployed in Kuaishou's local life service, serving 400 million daily active users and demonstrating significant improvements of 21.016% in GMV and 17.891% in order numbers.
1.6. Original Source Link
The official source link is https://arxiv.org/abs/2508.14646.
The PDF link is https://arxiv.org/pdf/2508.14646v1.pdf.
This indicates its status as an arXiv preprint.
2. Executive Summary
2.1. Background & Motivation
The core problem OneLoc aims to solve is the challenging task of local life service recommendation (LLSR) in large-scale applications like Kuaishou. In this scenario, recommending short videos related to local businesses requires simultaneously addressing two critical factors: a user's evolving interests and their real-time geographical location.
This problem is important because local life services represent a vital and high-value scenario for major internet platforms. Effective LLSR can significantly drive user engagement, consumption, and Gross Merchandise Value (GMV).
Prior research faced specific challenges and gaps:
- Comprehensive Geographic Information Utilization: Existing recommendation models, even those for Points of Interest (POI), often utilize geographic information in limited ways, such as independent features or simple prompts. A more holistic integration across the entire recommendation pipeline, from item representation to user behavior modeling and generation, was lacking.
- Multi-Objective Balance: Recommendation systems in real-world industrial settings must balance various, often conflicting, objectives. These include maximizing user satisfaction (based on interests), ensuring practicality (e.g., proximity to stores), and meeting business goals (e.g., GMV, order volume). Traditional methods struggle to achieve a fine-grained balance of these diverse objectives within a single, end-to-end framework. While reinforcement learning has been used to balance objectives in other generative recommendation paradigms, its application to the unique multi-objective landscape of LLSR was unexplored.

The paper's entry point and innovative idea revolve around adapting the nascent end-to-end generative recommendation paradigm to the specific demands of local life services. By developing a model that generates recommendations rather than merely ranking pre-selected items, and by deeply embedding geo-awareness and multi-objective optimization throughout its architecture, OneLoc aims to overcome these limitations.
2.2. Main Contributions / Findings
The paper makes several significant contributions:
- Novel End-to-End Generative Framework for LLSR: OneLoc is proposed as the first end-to-end generative recommender system specifically designed for short-video local life services. It unifies a generative architecture with a business-value-optimized reinforcement learning module, marking a significant advancement over traditional cascaded recommendation models.
- Comprehensive Geo-Information Integration: OneLoc introduces three core components to make full use of geographic information across different stages of the recommendation process:
  - Geo-aware Tokenizer (Geo-aware Semantic IDs): Combines geographic semantics with multi-modal video information during the item tokenization phase, creating representations that inherently carry location context.
  - Geo-aware Self-Attention: Integrates geographic context into the encoder to capture user behavior sequential patterns, considering both video content and their associated locations.
  - Neighbor-aware Prompt: Enhances the decoder's ability to guide recommendation generation by considering not only the user's real-time location but also richer contextual information from surrounding neighborhoods.
- Multi-Objective Optimization via Reinforcement Learning: To balance user interests, geographic proximity, and business goals, OneLoc proposes two specialized reward functions for its reinforcement learning phase:
  - Geographic Reward: Encourages the generation of recommendations for nearby stores by assigning higher rewards to closer locations.
  - GMV Reward: Leverages a GMV scoring model to promote videos that are likely to attract higher consumption and contribute to business objectives.
- Empirical Validation and Industrial Deployment: Extensive offline experiments on a large-scale Kuaishou industry dataset and public Foursquare datasets demonstrate the superior effectiveness of OneLoc compared to state-of-the-art traditional and generative models. Crucially, OneLoc has been successfully deployed in Kuaishou's local life service, serving 400 million daily active users.

The key findings highlight OneLoc's ability to significantly improve real-world business metrics. It achieved a 21.016% improvement in Gross Merchandise Value (GMV) and a 17.891% improvement in order numbers, along with increases in paying users, demonstrating its practical value and impact in a high-traffic industrial environment.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand OneLoc, a basic grasp of several key concepts in recommender systems, deep learning, and machine learning is essential.
- Recommender Systems (RS): Software systems that provide suggestions for items to users. Traditionally, RS follow a matching-based paradigm, where a large pool of items is filtered down to a smaller set (recall stage), which is then ranked by relevance (ranking stage). The new paradigm, generative recommendation, directly generates item identifiers or representations.
- Local Life Service Recommendation (LLSR): A specific type of recommender system focused on local businesses and services, often tied to a user's geographical location. Examples include recommending nearby restaurants, shops, or events.
- Generative Recommendation: A paradigm shift from traditional matching-based or ranking-based systems. Instead of selecting from existing items, generative models directly create or synthesize the recommendations, often in the form of discrete semantic IDs or sequences of tokens, similar to how Large Language Models (LLMs) generate text. This approach is often powered by Transformer architectures.
- Large Language Models (LLMs): Powerful neural networks, typically based on the Transformer architecture, trained on vast amounts of text data. They excel at understanding context, generating coherent text, and performing various language-related tasks. Their auto-regressive generation capabilities, where they predict the next token based on previous ones, are a key inspiration for generative recommendation.
- Transformer Architecture: A neural network architecture introduced by Vaswani et al. (2017) that revolutionized sequence modeling. It relies heavily on self-attention mechanisms rather than recurrent or convolutional layers.
  - Encoder-Decoder Structure: Transformers often consist of an encoder stack and a decoder stack. The encoder processes an input sequence (e.g., user historical behaviors) to produce a contextualized representation, while the decoder generates an output sequence (e.g., recommended item IDs) based on the encoder's output and its own previously generated tokens.
  - Self-Attention: A mechanism that allows the model to weigh the importance of different parts of an input sequence when processing each element. For a sequence of input vectors, self-attention computes Query ($Q$), Key ($K$), and Value ($V$) vectors for each input. The attention score between two elements is the dot product of their $Q$ and $K$ vectors. These scores are then normalized (e.g., using softmax) and used to compute a weighted sum of $V$ vectors, which becomes the output for that element. The standard formula for scaled dot-product attention is: $ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $ where $Q$, $K$, $V$ are matrices representing the Query, Key, and Value vectors, respectively, and $d_k$ is the dimension of the Key vectors. Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the dimension and pushing the softmax into regions with very small gradients.
  - Cross-Attention: Similar to self-attention, but Query vectors come from one sequence (e.g., the decoder's current state), while Key and Value vectors come from another sequence (e.g., the encoder's output). This allows the decoder to focus on relevant parts of the encoder's output during generation.
  - Feed-Forward Network (FFN): A simple neural network applied independently to each position in the Transformer's output.
  - RMSNorm (Root Mean Square Normalization): A normalization technique used in Transformers to stabilize training, similar to LayerNorm but typically computationally simpler. It divides each activation vector by its root mean square (without subtracting the mean) and applies a learned gain.
- Semantic IDs / Tokenization: In generative recommendation, items (e.g., videos, products) are often converted into discrete tokens or semantic IDs, much like words are tokenized in natural language processing. This allows Transformer-based models to "generate" items by predicting sequences of these IDs. Residual K-means (res-kmeans) is a technique for vector quantization that maps high-dimensional continuous embeddings to sequences of discrete codes, often arranged hierarchically.
- Reinforcement Learning (RL): A machine learning paradigm where an agent learns to make decisions by performing actions in an environment to maximize a cumulative reward.
  - Reward Function: Defines the goal of the RL agent by assigning numerical values (rewards) to different states or actions. Designing effective reward functions is crucial for guiding the agent's learning towards desired outcomes.
  - Direct Preference Optimization (DPO): A recent RL technique, particularly popular for fine-tuning LLMs. Instead of explicitly training a separate reward model and then using Proximal Policy Optimization (PPO), DPO directly optimizes the policy by comparing preferred and dispreferred pairs of responses, leveraging a simpler objective function that implicitly aligns the model with human preferences or desired reward signals.
- Geographic Information Systems (GIS) & POI:
  - GeoHash: A geocoding system that encodes a geographic location (latitude and longitude) into a short string of letters and digits. Two locations whose GeoHashes share a longer common prefix lie in the same smaller cell, so nearby locations usually share long prefixes (the exception being points on opposite sides of a cell boundary). This makes GeoHash useful for spatial indexing and proximity queries.
  - Point of Interest (POI): A specific point location that someone may find useful or interesting, such as a restaurant, store, landmark, or park. POI recommendation focuses on suggesting such locations.
- Evaluation Metrics:
  - Recall@K: Measures the proportion of relevant items that are present in the top-K recommendations.
  - NDCG@K (Normalized Discounted Cumulative Gain at K): A metric that considers both the relevance of recommended items and their position in the ranked list. Higher-ranked relevant items contribute more to the score.
  - GMV (Gross Merchandise Value): A business metric representing the total sales value of merchandise sold through a particular channel or platform.
  - Order Numbers: A business metric representing the total count of completed purchase transactions.
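To make the two offline metrics concrete, the following is a minimal sketch of Recall@K and NDCG@K under the common binary-relevance convention (an item is either relevant or not). The function names and the toy item IDs are illustrative assumptions, and the paper's exact evaluation protocol may differ.

```python
import math

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@K: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the top slot is discounted by log2(2) = 1
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranked list and ground-truth set.
recommended = ["v3", "v7", "v1", "v9"]
relevant = {"v1", "v5"}
print(recall_at_k(recommended, relevant, k=3))
print(ndcg_at_k(recommended, relevant, k=3))
```

Recall@K ignores where in the top-K a hit occurs, while NDCG@K rewards placing the same hit earlier in the list.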
3.2. Previous Works
The paper positions OneLoc within the context of advancements in POI recommendation and the emerging field of generative recommendation.
POI Recommendation:
- TPG [12]: A Transformer-based approach that uses target timestamps as prompts to enhance geography-aware location recommendations. This focuses on leveraging temporal context for location, but not as broadly as OneLoc in combining geo-info across all stages.
- Rotan [2]: Introduces a time-aware attention mechanism where time intervals are represented as rotational position vectors within Transformer architectures to capture temporal dynamics in user behavior sequences for POI recommendation. Again, the focus is on time, treated as an independent feature.
- STAN [13]: Proposes a spatial-temporal attention mechanism to capture relevance within POI trajectories for next location recommendation. This emphasizes patterns in movement.
- LLM-Mob [21], NextLocLLM [11], LLM4POI [9]: More recent works exploring Large Language Models (LLMs) for POI recommendation, often by transforming prediction tasks into in-context learning or question-answering tasks, or by injecting spatial coordinates into LLMs. These demonstrate the power of LLMs, but OneLoc aims for a more integrated, end-to-end generative approach with specific business objective alignment.

The paper notes that while these POI methods are effective, they are all discriminative models, meaning they predict from existing choices rather than generating. OneLoc moves into the generative paradigm.
Generative Recommendation:
This is a newer paradigm, heavily influenced by the success of LLMs.
- TIGER [15]: Considered one of the first works to propose a generative recommendation framework using hierarchical semantic IDs encoded with RQ-VAE (Residual Quantized Variational AutoEncoder). This laid the groundwork for representing items as discrete tokens.
- OneRec [1, 28]: Further develops the generative recommendation paradigm for short video scenarios, utilizing reinforcement learning with reward models to align with user preferences and industrial requirements. OneLoc builds on OneRec's RL framework but adapts it specifically for local life services and multi-objective geographic context.
- OneSug [3]: Proposes an end-to-end generative framework specifically for e-commerce query suggestion.
- EGA [27]: Designs an end-to-end generative advertising system to address critical advertising requirements like bidding, creative selection, and ad allocation.
- GNPR-SID [17]: Migrates semantic-ID-based generative recommendation to the POI recommendation scenario, using geographic information to construct semantic IDs. OneLoc is similar in spirit but enhances geographic integration at multiple architectural levels and adds multi-objective RL.
- COBRA [22]: Proposes a coarse-to-fine framework that first generates semantic IDs and then dense vectors for retrieval.
- LC-Rec [26]: Aligns collaborative filtering signals with multiple training tasks within a generative recommendation context.
- LETTER [20]: Improves item tokenization by integrating hierarchical semantics, collaborative signals, and code assignment diversity.
- ActionPiece [6]: Focuses on context-aware tokenization of action sequences, arguing that action meaning depends on context.
- EAGER [5]: Aims to align linguistic semantics of pre-trained LLMs with collaborative semantics non-intrusively.
3.3. Technological Evolution
The field of recommender systems has evolved from basic collaborative filtering and content-based filtering to complex deep learning models (e.g., RNNs, CNNs, Transformers) that capture intricate user behaviors and item features. POI recommendation specifically integrated spatial and temporal information. The recent surge in LLMs has driven a paradigm shift towards generative recommendation, moving from matching-and-ranking to directly generating item identifiers. OneRec, OneSug, EGA, and TIGER are examples of this new wave in various domains.
OneLoc represents the next step in this evolution by applying generative recommendation to the highly specialized and complex local life service domain. It specifically addresses the nuanced requirements of this domain, which demand deep integration of geographic information and careful balancing of diverse business objectives, something that previous generative or POI models hadn't fully tackled.
3.4. Differentiation Analysis
Compared to the main methods in related work, OneLoc introduces several core innovations:
- Holistic Geo-Awareness: While some POI models use geographic information (e.g., TPG, Rotan, GNPR-SID), OneLoc integrates it comprehensively and end-to-end across three distinct architectural components:
  - Item Representation: Geo-aware semantic IDs embed geographic context directly into item tokens.
  - User Behavior Encoding: Geo-aware self-attention in the encoder models historical interactions with explicit consideration of location similarity and user's real-time location.
  - Generation Context: Neighbor-aware prompt in the decoder enriches the generation process with surrounding geographical context, not just the user's single location.

  This level of integration is more profound than simply using geo-coordinates as an independent feature or prompt.
- Multi-Objective Reinforcement Learning for LLSR: OneLoc explicitly tackles the multi-objective nature of LLSR by employing reinforcement learning with custom-designed geographic and GMV reward functions. This allows for a fine-grained balance between user interests (learned during pre-training), geographical practicality (proximity), and critical business metrics (GMV, orders). Previous generative recommendation works like OneRec also use RL, but OneLoc tailors the reward functions to the specific LLSR context.
- Industrial Scale and Impact: The successful deployment of OneLoc in Kuaishou's local life service, serving 400 million daily active users and achieving significant improvements in GMV and order numbers, showcases its robustness and effectiveness in a real-world, high-traffic industrial setting, surpassing existing complex cascade systems. This practical validation distinguishes it from many research papers that primarily demonstrate offline efficacy.
4. Methodology
4.1. Principles
OneLoc is an end-to-end generative recommendation model designed for local life services. Its core principles are:
- Generative Paradigm: Instead of retrieving and ranking pre-existing items, OneLoc directly generates recommendations in the form of semantic IDs, leveraging the powerful auto-regressive capabilities inspired by Large Language Models (LLMs).
- Comprehensive Geo-Awareness: Geographic information is crucial in local life services. OneLoc integrates this information at multiple stages: item tokenization, user behavior encoding, and recommendation generation.
- Multi-Objective Optimization: Real-world recommendation systems require balancing diverse objectives (user satisfaction, geographic proximity, business value). OneLoc achieves this through a reinforcement learning framework, using specifically designed reward functions.
- Two-Stage Training: The model is trained in two phases: an initial pre-training phase using next token prediction (NTP) to learn basic user preferences and item semantics, followed by a post-training phase using reinforcement learning to align the model with complex, multi-faceted business objectives.

The overall framework of OneLoc is depicted in Figure 2.
The image is a schematic diagram illustrating the overall architecture of the OneLoc model, including the encoder-decoder architecture (a), the geo-aware self-attention mechanism, the neighbor-aware prompt module, and the reinforcement learning training process (b). The diagram also includes formulas that reflect the model's multi-objective optimization design.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Problem Formulation
The task is defined as recommending videos in the local life service scenario.
- Let $\mathcal{U}$, $\mathcal{V}$, and $\mathcal{L}$ represent the set of users, videos, and locations, respectively.
- Each video $v \in \mathcal{V}$ is associated with a location $l_v \in \mathcal{L}$. This location is a GeoHash block and includes geographical coordinates, brand, and category information.
- Each video $v$ is represented by a video embedding ($e_v$), a location ID embedding ($e_l$), and a location context embedding ($e_c$).
- A user $u \in \mathcal{U}$ has a real-time location $l_u$, associated with a location context embedding ($c_u$).
- Each video $v$ is mapped to a semantic ID, denoted as $(s^1, s^2, \dots, s^L)$, where $L$ is the number of codebooks (representing hierarchical levels of the ID).
- Given a user's interacted video sequence $S_u$ and real-time location $l_u$, the objective is to predict the next video $v$ that would attract consumption. This is formulated as maximizing the probability $P(v \mid S_u, l_u)$.
4.2.2. Geo-aware Semantic IDs
To inject geographic information directly into the item representation, OneLoc employs geo-aware semantic IDs. This process follows OneRec's use of res-kmeans for tokenization but enriches the initial embeddings with geographic context.
- Initial Embedding: Unlike traditional methods that treat geographic information as an independent feature, OneLoc incorporates it into the raw video features. Each initial residual vector $r_i^1$ in the initial residual set $R^1$ is an embedding derived from both video content and location context information. This combined embedding is extracted using a multimodal large language model (though specific details of this model are not elaborated in the provided text). This ensures that the semantic IDs generated (geo-aware SIDs) inherently contain geographic semantics.
- Residual K-means (res-kmeans): This process iteratively quantizes the video embeddings into a sequence of discrete codes.
  - At each layer $j$, a codebook $C^j$ is constructed by applying K-means clustering on the current set of residuals $R^j$:
    $ C^j = \text{K-means}(R^j, K) $
    Here, $C^j = \{c_k^j\}_{k=1}^{K}$ is the set of centroids (codes) obtained from K-means, and $K$ is the size of the codebook (number of clusters).
  - For each residual $r_i^j$ in the set $R^j$, the nearest centroid index is found, which becomes the $j$-th code of the semantic ID:
    $ s_i^j = \arg\min_{k} \lVert r_i^j - c_k^j \rVert_2^2 $
    This is the index of the closest codebook entry.
  - A new residual is then calculated by subtracting the chosen centroid from the current residual. This new residual is passed to the next layer for further quantization:
    $ r_i^{j+1} = r_i^j - c_{s_i^j}^j $
- Semantic ID Generation: This process iterates $L$ times, yielding $L$ codebooks. In OneLoc, $L$ is set to 3, meaning each video is tokenized into a three-digit semantic ID. Thus, a target video $v$ is transformed into its semantic ID $(s^1, s^2, s^3)$.
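The residual quantization loop can be sketched in a few lines. The version below is a toy, pure-Python illustration with a plain Lloyd's K-means, random 4-dimensional vectors standing in for the multimodal geo-aware embeddings, and a codebook size of 4. All of these are assumptions for demonstration, not the production configuration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[idx].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids

def residual_kmeans(embeddings, num_layers=3, codebook_size=4):
    """Quantize each embedding into num_layers codes, one per residual layer."""
    residuals = [list(e) for e in embeddings]
    codebooks = []
    codes = [[] for _ in embeddings]
    for _ in range(num_layers):
        centroids = kmeans(residuals, codebook_size)
        codebooks.append(centroids)
        next_residuals = []
        for i, r in enumerate(residuals):
            idx = min(range(codebook_size), key=lambda c: sq_dist(r, centroids[c]))
            codes[i].append(idx)  # code for this layer = index of nearest centroid
            next_residuals.append([x - y for x, y in zip(r, centroids[idx])])
        residuals = next_residuals  # the next layer quantizes what is left over
    return codes, codebooks, residuals

# Toy stand-ins for geo-aware video embeddings.
rng = random.Random(42)
embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(12)]
codes, codebooks, residuals = residual_kmeans(embeddings)
print(codes[0])  # a three-digit semantic ID for the first video
```

Each embedding equals the sum of its selected centroids plus the final residual, so earlier digits carry coarse structure and later digits refine it.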
4.2.3. Encoder
The encoder's role is to process the user's behavioral sequence and extract useful patterns, now enhanced with geographic awareness.
4.2.3.1. Multi-behavior Sequence
To comprehensively capture user preferences, OneLoc considers not just watching sequences but also clicking and purchasing sequences.
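- The multi-behavior sequence is defined as:
  $ S_u = \{S_u^{\mathrm{watch}}, S_u^{\mathrm{click}}, S_u^{\mathrm{purchase}}\} $
  where $S_u^{\mathrm{watch}}$, $S_u^{\mathrm{click}}$, and $S_u^{\mathrm{purchase}}$ represent the user's watching, clicking, and purchasing sequences, respectively.
- These sequences are then transformed into a comprehensive embedding sequence $E$ and a location context embedding sequence $E_c$:
  $ E = [e_1, e_2, \dots, e_n], \quad e_t = \mathrm{MLP}(\mathrm{Concat}(e_v^t, e_l^t, e_c^t)) $
  $ E_c = [e_c^1, e_c^2, \dots, e_c^n] $
  Here, $n$ is the length of the sequence $S_u$. Concat denotes the concatenation operation along the feature dimension. $e_t$ is the combined embedding for the $t$-th video, generated by an MLP (Multi-Layer Perceptron) that processes the concatenated video embedding ($e_v^t$), location ID embedding ($e_l^t$), and location context embedding ($e_c^t$). $E_c$ specifically stores the sequence of location context embeddings.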
4.2.3.2. Encoder Architecture
The embedding sequences ($E$, $E_c$) along with the user's current location context ($c_u$) are fed into the encoder.
- The encoder consists of a stack of Transformer blocks.
- Each Transformer block contains a GA-Attn (Geo-aware Self-attention) module and an FFN (Feed-Forward Network) module, with RMSNorm applied for normalization.
- The computation within each layer of the encoder is formally defined as:
  $ \tilde{h}^{(i)} = h^{(i-1)} + \text{GA-Attn}(\mathrm{RMSNorm}(h^{(i-1)}), E_c, c_u) $
  $ h^{(i)} = \tilde{h}^{(i)} + \mathrm{FFN}(\mathrm{RMSNorm}(\tilde{h}^{(i)})) $
  where $h^{(0)} = E$ is the initial input embedding sequence. $h^{(i)}$ is the output of the $i$-th encoder layer. $\tilde{h}^{(i)}$ represents the intermediate result after the GA-Attn module. $c_u$ is the context embedding of the user's real-time location $l_u$. GA-Attn is the core module designed to capture user behavior patterns while being aware of geographic information.
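The layer structure described above (normalize, apply a sub-module, add the result back to the input) can be sketched as follows. This is a minimal pure-Python illustration: the two toy sub-layers are hypothetical stand-ins for GA-Attn and the FFN, and the learned gain of RMSNorm defaults to ones.

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """Divide a vector by its root mean square, then apply an optional gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gain = gain or [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]

def pre_norm_residual(H, sublayer):
    """Compute H + sublayer(RMSNorm(H)) for a sequence of token vectors."""
    update = sublayer([rms_norm(row) for row in H])
    return [[a + b for a, b in zip(h_row, u_row)] for h_row, u_row in zip(H, update)]

def encoder_layer(H, attn, ffn):
    """One block: attention sub-layer followed by feed-forward sub-layer."""
    H_mid = pre_norm_residual(H, attn)
    return pre_norm_residual(H_mid, ffn)

# Hypothetical stand-ins for the real sub-modules.
def toy_attn(H):
    mean = [sum(col) / len(H) for col in zip(*H)]  # every token sees the sequence mean
    return [list(mean) for _ in H]

def toy_ffn(H):
    return [[max(0.0, v) for v in row] for row in H]  # position-wise ReLU

H0 = [[3.0, 4.0], [1.0, -1.0], [0.5, 2.0]]
print(encoder_layer(H0, toy_attn, toy_ffn))
```

Because the sub-layer output is added to the unnormalized input, each block only has to learn a correction to the running representation.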
4.2.3.3. Geo-aware Self-attention (GA-Attn)
The GA-Attn module is a novel component that adapts the standard self-attention mechanism to integrate geographic context. Its goal is to identify relevant behaviors from a user's interaction history based on their real-time location.
- Attention Score Calculation: The attention score is composed of two parts:
  - Comprehensive Similarity: This uses the comprehensive embedding sequence (containing video content, location ID, and location context) to calculate Queries and Keys, similar to standard self-attention.
  - Location Context Similarity: This explicitly uses the location context embedding sequence $E_c$ to further enhance the geographical semantics in the attention scores.

  The combined attention calculation is:
  $ Q = h W_Q, \quad K = h W_K, \quad V = h W_V $
  $ A = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d}} + E_c E_c^\top\right) V W_O $
  Here, $h$ is the normalized comprehensive embedding sequence entering the layer, and $W_Q, W_K, W_V, W_O$ are the weight matrices for Query, Key, Value, and output, respectively. $d$ refers to $d_k$, the dimension of the Key vectors for scaling. The term $E_c E_c^\top$ adds a direct measure of location context similarity to the attention scores, allowing the model to give higher attention to videos with similar location contexts. $A$ is the output of this attention mechanism.
- User Real-time Location Gating: To dynamically inject the user's real-time location information, a gating mechanism is applied to the attention output. This gate scales the attention output based on the similarity between the user's current location context and the location context of each video in the historical sequence.
  $ g_j = 2 \cdot \mathrm{Sigmoid}(\mathrm{MLP}(\mathrm{Concat}(c_u, E_c[j]))) $
  $ o_j = g_j \cdot A[j] $
  where $E_c[j]$ is the $j$-th row of $E_c$ (representing the location context of the $j$-th video in the sequence). The MLP processes the concatenation of the user's location context embedding ($c_u$) and the individual video's location context embedding. A Sigmoid activation function, scaled by 2, produces a gating parameter $g_j \in (0, 2)$, which then scales the $j$-th row of the attention output $A[j]$ to yield the final output $o_j$. This allows the user's real-time location to dynamically influence the relevance of historical interactions.
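As a rough illustration of the two ideas in this module (an additive location-similarity bias on the attention scores, and a per-position gate driven by the user's real-time location), here is a single-head sketch in pure Python. It assumes the bias is the plain dot product of location context embeddings and replaces the gating MLP with a single linear layer. Both are simplifications, and the weight values are arbitrary.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def geo_aware_attention(X, Ec, c_u, Wq, Wk, Wv, Wo, gate_w, gate_b):
    """Single-head attention with a location bias and a real-time location gate."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(K[0])
    content = matmul(Q, transpose(K))  # comprehensive similarity
    geo = matmul(Ec, transpose(Ec))    # location context similarity (assumed dot product)
    n = len(X)
    scores = [[content[i][j] / math.sqrt(d) + geo[i][j] for j in range(n)] for i in range(n)]
    weights = [softmax(r) for r in scores]
    A = matmul(matmul(weights, V), Wo)
    out = []
    for j, row in enumerate(A):
        # One linear layer stands in for the gating MLP over Concat(c_u, Ec[j]).
        z = sum(w * v for w, v in zip(gate_w, c_u + Ec[j])) + gate_b
        g = 2.0 * sigmoid(z)  # gate lies in (0, 2), so it can damp or amplify
        out.append([g * v for v in row])
    return out, weights

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy comprehensive embeddings
Ec = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # videos 0 and 1 share a location context
c_u = [1.0, 0.0]                           # user's real-time location context
out, weights = geo_aware_attention(X, Ec, c_u, I2, I2, I2, I2, [0.5, 0.0, 0.5, 0.0], 0.0)
print(weights[0])
print(out)
```

With a zero gate input the gate equals 1 and the output reduces to ordinary biased attention, which makes the gate's contribution easy to ablate.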
4.2.4. Decoder
The decoder's function is to generate the semantic IDs of recommended videos, utilizing the encoder's output and a specialized prompt that incorporates rich geographic context.
4.2.4.1. Neighbor-aware Prompt
Recognizing the importance of local context, OneLoc introduces a neighbor-aware prompt to model the rich context surrounding a user's real-time location.
- Surrounding Location Information: Given a user's real-time geographic location $l_u$, the system calculates its surrounding geographic locations (e.g., $l_1, \dots, l_m$). For these locations, their respective context information (e.g., brands, bestselling products within those areas) is obtained.
- Context Embeddings: This yields the user's location context embedding $c_u$ and a set of context embeddings for surrounding locations $C_{\mathrm{nbr}} = \{c_1, \dots, c_m\}$.
- Cross-Attention for Prompt: A cross-attention mechanism is used to aggregate this surrounding information into a single neighbor-aware prompt embedding. The user's location context embedding acts as the Query, while the surrounding location context embeddings serve as both Keys and Values.
  $ e_p = \mathrm{CrossAttn}(Q = c_u, \; K = C_{\mathrm{nbr}}, \; V = C_{\mathrm{nbr}}) $
  Here, $e_p$ is the resulting embedding of the neighbor-aware prompt, effectively summarizing the relevant context from the user's neighborhood. This embedding will guide the decoder's generation process.
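The aggregation step amounts to attention with a single query. The sketch below shows that computation in pure Python and, as a simplifying assumption, omits the learned Query/Key/Value projections that a real cross-attention layer would apply. The embeddings are made-up toy values.

```python
import math

def neighbor_aware_prompt(c_u, neighbor_ctx):
    """Single-query attention: the user's context attends over neighbor contexts."""
    d = len(c_u)
    scores = [sum(q * k for q, k in zip(c_u, c_n)) / math.sqrt(d) for c_n in neighbor_ctx]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The prompt is the attention-weighted average of the neighbor contexts.
    prompt = [sum(w * c_n[i] for w, c_n in zip(weights, neighbor_ctx)) for i in range(d)]
    return prompt, weights

c_u = [1.0, 0.0]                                  # user's location context (toy)
neighbors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # surrounding location contexts (toy)
prompt, weights = neighbor_aware_prompt(c_u, neighbors)
print(weights)
print(prompt)
```

The result is a single vector regardless of how many surrounding locations are considered, which is what lets it occupy one prompt position in the decoder input.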
4.2.4.2. Decoder Architecture
The decoder is also structured as a stack of blocks, each comprising a casual self-attention module, a cross-attention module, and a feed-forward network (FFN) module, with RMSNorm applied.
- Input: The initial input to the decoder consists of the neighbor-aware prompt embedding followed by the embeddings of the previously generated semantic ID digits. This allows the decoder to generate the semantic ID of the target video auto-regressively.
- Block Computation: Each decoder layer takes the output of the previous layer and applies three sub-modules in order. SelfAttn, the causal self-attention module, attends only to previous tokens in the output sequence and produces the first intermediate result. Cross-attention then produces the second intermediate result: the decoder queries the encoder's final hidden state to obtain context from the user's historical behavior. Finally, FFN, the feed-forward network, produces the output of the layer. The embeddings of the three digits of the target video's semantic ID are input sequentially to generate the next digit.
4.2.4.3. Next Token Prediction (NTP) Loss
The output of the final decoder layer is used to predict the next semantic ID token.
- Specifically, the final-layer embedding at the position of the neighbor-aware prompt is used to predict the first digit.
- The embedding at the position of the first digit is used to predict the second digit, and so on.
- The training loss during the pre-training phase is a cross-entropy loss over these predictions. Writing $s_1, s_2, s_3$ for the digits of the target semantic ID:
  $$\mathcal{L}_{NTP} = -\sum_{j=1}^{3} \log \hat{p}_j(s_j)$$
  Here, $\hat{p}_j$ is the predicted probability distribution for the $j$-th digit of the target semantic ID (a distribution over the entries of the codebook), and $\hat{p}_j(s_j)$ is the predicted probability of the true $j$-th digit $s_j$. This NTP loss encourages the model to accurately generate the semantic ID of the observed next video.
- After pre-training, the model parameters define a probability distribution over semantic IDs given the user's sequence and location, jointly written here as the context $x$:
  $$P(s_1, s_2, s_3 \mid x) = \prod_{j=1}^{3} P(s_j \mid s_1, \dots, s_{j-1}, x)$$
  This formula shows the auto-regressive nature of the generation, where each subsequent digit's probability depends on the previous ones.
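The relation between the next-token loss and the auto-regressive sequence probability can be checked numerically. The sketch below assumes the loss is summed (not averaged) over the digits and uses made-up per-digit distributions; `ntp_loss` and `sequence_prob` are hypothetical helper names.

```python
import math


def ntp_loss(digit_probs, target_digits):
    """Cross-entropy over the digits of one target semantic ID.

    digit_probs[j] is the predicted distribution for digit j, already
    conditioned on the previous digits (teacher forcing).
    Assumption: the loss is a plain sum over digits.
    """
    return -sum(math.log(probs[t]) for probs, t in zip(digit_probs, target_digits))


def sequence_prob(digit_probs, target_digits):
    """Probability of the whole semantic ID as a product of per-digit terms."""
    prob = 1.0
    for probs, t in zip(digit_probs, target_digits):
        prob *= probs[t]
    return prob
```

Minimizing the summed loss is the same as maximizing the log-probability of the full semantic ID, since the loss equals the negative logarithm of the product.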
4.2.5. Reinforcement Learning
The pre-training phase primarily learns to predict videos exposed by existing systems. However, these exposed videos might not perfectly balance all desired objectives (user interest, distance, business value). To address this, OneLoc incorporates reinforcement learning to fine-tune the model with specific reward signals, using Direct Preference Optimization (DPO).
4.2.5.1. Reward Signals
Two distinct reward functions are designed to guide the RL process:
- Geographic Reward: This reward encourages the model to recommend videos corresponding to stores that are geographically closer to the user. It is computed from the distance between the location of the video's store and the user's location, together with a predefined distance threshold. If the distance exceeds the threshold, the reward is 0. Otherwise, the reward is inversely proportional to the distance, so closer videos get higher rewards.
- GMV Reward: This reward aims to promote the generation of videos that are likely to lead to actual consumption and high Gross Merchandise Value (GMV). It is the output of a separate, pre-existing GMV scoring model, which evaluates the potential GMV generated by recommending a video to a user given the user's sequence and location.
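The two reward signals can be sketched as follows. The exact functional form of the geographic reward is not recoverable here, so this assumes a plain reciprocal of the distance inside the threshold; the small `eps` term, the kilometer unit, and the function names are illustrative assumptions, and the GMV score is treated as an opaque number supplied by an external model.

```python
def geographic_reward(distance_km: float, max_distance_km: float, eps: float = 1e-6) -> float:
    """Zero beyond the threshold; otherwise inversely proportional to distance."""
    if distance_km > max_distance_km:
        return 0.0
    # eps is an assumption added here to avoid division by zero at distance 0.
    return 1.0 / (distance_km + eps)


def combined_reward(distance_km: float, max_distance_km: float, gmv_score: float) -> float:
    """Sum of the geographic reward and an externally supplied GMV score."""
    return geographic_reward(distance_km, max_distance_km) + gmv_score
```

In practice the two terms would likely need to be put on comparable scales before summing; the sketch leaves that out.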
4.2.5.2. Direct Preference Optimization (DPO)
DPO is used to align the model with these reward signals without explicit reward model training or complex PPO algorithms.
- Preference Pair Construction:
  - For each sample (a user sequence and a user location), the pre-trained model is used to sample several different videos via beam search. Beam search is a heuristic search algorithm that keeps only a limited set of the most promising partial sequences at each step. Here, it selects the videos (represented by their semantic IDs) with the highest probabilities under the model's output distribution, and these form the set of generated results.
  - For each of these generated videos, the combined reward is calculated (e.g., by summing the geographic reward and the GMV reward).
  - From these videos, a positive sample (the video with the highest combined reward) and a negative sample (the video with the lowest combined reward) are selected. These form a preference pair.
- DPO Loss: These preference pairs are then used to train a new model with parameters $\theta$, initialized from the current model parameters. Writing $x$ for the context (user sequence and user location), the DPO loss corresponding to each preference pair is:
  $$\mathcal{L}_{DPO} = -\log \sigma\left(\beta \log \frac{\pi_\theta(s^{w} \mid x)}{\pi_{ref}(s^{w} \mid x)} - \beta \log \frac{\pi_\theta(s^{l} \mid x)}{\pi_{ref}(s^{l} \mid x)}\right)$$
  Here, $s^{w}$ and $s^{l}$ are the semantic IDs of the positive and negative videos, respectively. $\pi_\theta(s \mid x)$ is the probability of the semantic ID $s$ according to the current model being trained, and $\pi_{ref}(s \mid x)$ is the probability from the reference model (the previous iteration's model or the pre-trained model), which serves to prevent the policy from diverging too far from the initial distribution. $\sigma$ is the sigmoid function, and $\beta$ is a hyperparameter controlling the strength of the DPO regularization. The DPO loss directly optimizes the policy to assign higher probabilities to preferred outcomes ($s^{w}$) than to dispreferred ones ($s^{l}$), thereby aligning the model's generation with the desired reward signals.
- Final Training Loss: The overall training loss during the reinforcement learning phase combines the original Next Token Prediction (NTP) loss ($\mathcal{L}_{NTP}$) with the DPO loss ($\mathcal{L}_{DPO}$):
  $$\mathcal{L} = \mathcal{L}_{NTP} + \lambda \mathcal{L}_{DPO}$$
  Here, $\lambda$ is a hyperparameter that controls the weighting between the NTP pre-training objective and the DPO fine-tuning objective. This combined loss ensures that the model maintains its general generation capabilities while simultaneously optimizing for the specific geographic and business objectives.
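A compact sketch of the preference-pair selection and the loss computation is given below. It uses the standard DPO formulation on sequence log-probabilities; the function names, the default `beta`, and the additive form `ntp + lam * dpo` of the final objective are assumptions made for illustration, not values or code from the paper.

```python
import math


def select_preference_pair(candidates, rewards):
    """Pick the highest-reward candidate as positive and the lowest as negative."""
    best = max(range(len(candidates)), key=lambda i: rewards[i])
    worst = min(range(len(candidates)), key=lambda i: rewards[i])
    return candidates[best], candidates[worst]


def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard DPO loss on log-probabilities of the positive/negative IDs.

    logp_*     : log-probabilities under the model being trained
    ref_logp_* : log-probabilities under the frozen reference model
    """
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


def total_loss(ntp, dpo, lam=0.05):
    """Assumed combination: NTP loss plus a weighted DPO term."""
    return ntp + lam * dpo
```

When the trained model and the reference model agree, the margin is zero and the loss equals log 2; the loss falls below that value as soon as the trained model raises the positive sample's probability relative to the negative one.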
5. Experimental Setup
5.1. Datasets
The experiments for OneLoc were conducted on both an internal, large-scale industrial dataset from Kuaishou and public Foursquare datasets.
- KuaiLLSR Dataset:
- Source & Characteristics: This dataset is constructed from Kuaishou's internal behavioral data of users interacting with short videos related to local life services. It's a real-world, high-volume dataset. Each video in this context has associated store location information.
- Scale: Contains 60 million users, 900K items (videos), and 440 million interactions.
- Data Structure: The data records user interaction sequences, with an average user sequence length of approximately 200.
  - Split: OneLoc is trained in a streaming setup. The first 7 days' data are used for model training, and the final day's data is reserved for evaluation.
- Foursquare Dataset (Public):
  - Source & Characteristics: This public dataset comprises user check-in records for various Points of Interest (POIs) across multiple cities. Each record includes the user ID, POI ID, geographical coordinates, and a timestamp. It is commonly used for POI recommendation research.
  - Preprocessing: The dataset is pre-processed using LibCity's standardized pipeline [19] to filter users and POIs and sort check-ins chronologically to construct interaction sequences.
  - Subsets: Experiments are conducted on two subsets: NYC (New York City) and TKY (Tokyo), which exhibit different user densities and POI distributions, allowing for evaluation under diverse spatial-temporal patterns.
  - Split: For these public datasets, interaction sequences are split into 80% for training, 10% for validation, and 10% for testing.

The following are the results from Table 2 of the original paper:

| | KuaiLLSR | NYC | TKY |
|---|---|---|---|
| #users | 60 M | 1,042 | 2,187 |
| #items | 900 K | 36,359 | 59,472 |
| #interactions | 440 M | 212,347 | 545,301 |

- Why these datasets were chosen: KuaiLLSR provides a realistic, large-scale industrial setting with actual local life service video recommendations, crucial for validating the model's performance in its target application. The Foursquare datasets (NYC, TKY) offer publicly available benchmarks for location-based recommendations, allowing for comparison with other research and demonstrating the model's generalizability across different geographical and behavioral contexts.
5.2. Evaluation Metrics
For evaluating the performance of OneLoc, both offline and online metrics were used.
- Offline Metrics:
  - Recall@K: Measures the proportion of relevant items successfully retrieved among the top K recommendations. It indicates how well the recommender system can find all relevant items for a user.
    - Conceptual Definition: Recall@K quantifies the ability of a recommendation system to suggest items that are truly relevant to the user's future interactions within a top-K list. It focuses on the completeness of the recommendations, i.e., how many of the "good" items were actually shown.
    - Mathematical Formula:
      $$\text{Recall@K} = \frac{1}{|U|} \sum_{u \in U} \frac{|R_u^K \cap T_u|}{|T_u|}$$
    - Symbol Explanation:
      - $U$: The set of all users in the test set.
      - $|U|$: The total number of users.
      - $R_u^K$: The set of top-K items recommended to user $u$.
      - $T_u$: The set of actual relevant items for user $u$ in the test set.
      - $|R_u^K \cap T_u|$: The number of relevant items that are also in the top-K recommendations.
      - $|T_u|$: The total number of relevant items for user $u$.
  - NDCG@K (Normalized Discounted Cumulative Gain at K): A metric that takes into account not only the presence of relevant items but also their position in the ranked list and their degree of relevance. Highly relevant items ranked higher contribute more to the score.
    - Conceptual Definition: NDCG@K evaluates the quality of a ranked recommendation list. It assigns higher scores to relevant items that appear earlier in the list and also accounts for different degrees of relevance, where more relevant items contribute more to the cumulative gain. The "Normalized" part ensures comparability across different recommendation lists.
    - Mathematical Formula: First, calculate the Discounted Cumulative Gain (DCG@K):
      $$\text{DCG@K} = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}$$
      Then, NDCG@K is calculated by normalizing DCG@K by the Ideal DCG (IDCG@K):
      $$\text{NDCG@K} = \frac{\text{DCG@K}}{\text{IDCG@K}}$$
    - Symbol Explanation:
      - $K$: The number of top recommendations being considered.
      - $rel_i$: The relevance score of the item at position $i$ in the recommended list. (In many implicit feedback scenarios, relevance is binary: 1 if the item is interacted with, 0 otherwise.)
      - $\log_2(i+1)$: A logarithmic discount factor, penalizing relevant items that appear later in the list.
      - $\text{IDCG@K}$: The DCG of the ideal ranking, where all relevant items are ranked at the top in decreasing order of their relevance. This serves as the maximum possible DCG for a given query.
- Online Metrics (A/B Test):
- GMV (Gross Merchandise Value): The total monetary value of all goods or services sold over a given period through the platform. This is a primary business objective.
- Order Numbers: The total count of unique purchase transactions completed by users. Another key business objective reflecting conversion.
- Number of Paying Users in Local Services: The count of unique users who made a purchase in local services.
- New Paying Users in Local Services: The count of unique users who made their first purchase in local services during the experiment period.
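The offline metrics in this section, Recall@K and NDCG@K, can be computed for a single user as in the sketch below. It assumes binary relevance (an item is either in the user's relevant set or not) and a linear gain in the DCG; the function names are hypothetical, and dataset-level scores would average these per-user values over all test users.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items found in the top-k list."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary relevance and a log2(position + 1) discount."""
    relevant = set(relevant)
    # enumerate() is 0-based, so position i + 1 gets discount log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with the ranked list `["a", "b", "c", "d"]` and relevant set `{"b", "d"}`, `recall_at_k(..., k=2)` counts only `"b"` and returns one half, while NDCG additionally rewards `"b"` for appearing in second rather than fourth place.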
5.3. Baselines
OneLoc was compared against a diverse set of baseline models categorized into traditional recommender models, POI related models, and other generative models.
- Traditional Recommender Models: These models typically focus on sequential patterns in user behavior.
  - SASRec [7] (Self-Attentive Sequential Recommendation): Employs a unidirectional Transformer encoder to model user preferences by allowing each item in a sequence to attend to all preceding items.
  - BERT4Rec [16] (Sequential recommendation with bidirectional encoder representations from transformer): Leverages BERT's masked language modeling objective to learn bidirectional Transformer representations for user behavior sequences.
  - GRU4Rec [4] (Session-based recommendations with recurrent neural networks): Uses Gated Recurrent Units (GRUs) to capture temporal dependencies in user interaction sequences, particularly effective for session-based recommendation.
  - S3-Rec [29] (Self-supervised learning for sequential recommendation with mutual information maximization): Enhances sequential recommendation by mining self-supervised learning signals through mutual information maximization within user behavior sequences.
- POI Related Models: These models specifically address location-based recommendations.
  - TPG [12] (Timestamps as prompts for geography-aware location recommendation): A Transformer-based approach that uses target timestamps as prompts to inform geography-aware location recommendations.
  - Rotan [2] (A rotation-based temporal attention network for time-specific next POI recommendation): Encodes time intervals using rotational position vector representations within Transformer architectures to capture temporal dynamics for next POI recommendation.
- Generative Models: Other models that adopt the generative paradigm.
  - TIGER [15] (Recommender systems with generative retrieval): One of the pioneering works in generative recommendation, using codebook-based semantic quantization (via RQ-VAE) to represent items as discrete code sequences.
  - GNPR-SID [17] (Generative Next POI Recommendation with Semantic ID): Transforms POI information into discrete semantic identifiers and uses a generative approach for next-POI prediction. This is the closest generative baseline in the POI domain.
5.4. Implementation Details
The implementation details provide insight into the specific configurations and technologies used for training and inference.
- Optimizer: AdamW was used, which is an Adam optimizer variant with decoupled weight decay regularization.
  - Initial learning rate:
  - Weight decay: 0.1
- Hardware: Training was performed on NVIDIA A800 GPUs, which are high-performance accelerators suitable for large-scale deep learning.
- Semantic ID Configuration:
  - K-means clusters: Each codebook layer is formed from K-means clusters.
  - Number of codebook layers: 3, meaning each semantic ID is a three-digit code.
- Model Architecture:
  - Encoder & Decoder layers: Both the encoder and decoder stack 4 blocks.
  - Hidden units: 1024.
  - Attention heads: 8.
  - Feed-forward network (FFN) dimension: 4096.
- Sequence Lengths: The maximum lengths for different user behavior sequences were set to:
  - Watch sequence: 256
  - Click sequence: 32
  - Pay sequence: 10
- DPO Loss Weight: The hyperparameter for weighting the DPO loss in the final training objective was set to 0.05.
- Inference Optimization: For online deployment, several techniques were employed to optimize inference performance: mixed-precision computation, KV cache (for efficient Transformer decoding), dynamic batching, and TensorRT acceleration on NVIDIA A10 GPUs. These optimizations achieved 25% Model FLOPs Utilization (MFU), indicating efficient hardware utilization.
6. Results & Analysis
6.1. Core Results Analysis (RQ1: Overall Performance)
The paper presents extensive experimental results to demonstrate the effectiveness of OneLoc compared to state-of-the-art methods across various datasets.
The following are the results from Table 1 of the original paper:
| Dataset | Metric | SASRec | BERT4Rec | GRU4Rec | Caser | S3-Rec | TPG | Rotan | TIGER | GNPR-SID | Ours | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KuaiLLSR | Recall@5 | 0.0927 | 0.0682 | 0.0350 | 0.0438 | 0.1218 | 0.1750 | 0.2185 | 0.2832 | 0.3142 | 0.3565* | 13.46% |
| | Recall@10 | 0.1336 | 0.1071 | 0.0602 | 0.0712 | 0.1889 | 0.2559 | 0.2843 | 0.3637 | 0.4207 | 0.4563* | 8.46% |
| | Recall@20 | 0.2048 | 0.1623 | 0.1090 | 0.1147 | 0.2808 | 0.3241 | 0.3563 | 0.4413 | 0.5056 | 0.5584* | 10.44% |
| | NDCG@5 | 0.0408 | 0.0311 | 0.0178 | 0.0221 | 0.0793 | 0.0897 | 0.1029 | 0.1500 | 0.1775 | 0.2032* | 14.47% |
| | NDCG@10 | 0.0523 | 0.0338 | 0.0255 | 0.0266 | 0.0971 | 0.1018 | 0.1147 | 0.1584 | 0.1874 | 0.2114* | 12.81% |
| | NDCG@20 | 0.0705 | 0.0565 | 0.0360 | 0.0369 | 0.1143 | 0.1117 | 0.1266 | 0.1615 | 0.1904 | 0.2151* | 12.97% |
| NYC | Recall@5 | 0.3151 | 0.2857 | 0.1977 | 0.2883 | 0.3071 | 0.3551 | 0.4448 | 0.4965 | 0.5311 | 0.6107* | 14.98% |
| | Recall@10 | 0.3896 | 0.3564 | 0.2460 | 0.3570 | 0.3854 | 0.4441 | 0.5223 | 0.5514 | 0.5942 | 0.6563* | 10.45% |
| | Recall@20 | 0.4506 | 0.4130 | 0.2889 | 0.4135 | 0.4503 | 0.5121 | 0.5834 | 0.6001 | 0.6455 | 0.6977* | 8.09% |
| | NDCG@5 | 0.2224 | 0.2074 | 0.1442 | 0.2044 | 0.2235 | 0.2464 | 0.3471 | 0.4131 | 0.4430 | 0.5355* | 20.88% |
| | NDCG@10 | 0.2467 | 0.2304 | 0.1599 | 0.2267 | 0.2489 | 0.2755 | 0.3723 | 0.4276 | 0.4634 | 0.5504* | 18.77% |
| | NDCG@20 | 0.2622 | 0.2448 | 0.1708 | 0.2410 | 0.2654 | 0.2927 | 0.3878 | 0.4443 | 0.4766 | 0.5608* | 17.66% |
| TKY | Recall@5 | 0.3450 | 0.2649 | 0.2514 | 0.3257 | 0.3365 | 0.3725 | 0.4333 | 0.5031 | 0.5354 | 0.5964* | 11.39% |
| | Recall@10 | 0.4284 | 0.3326 | 0.3106 | 0.4067 | 0.4115 | 0.4601 | 0.5113 | 0.5808 | 0.6130 | 0.6620* | 7.99% |
| | Recall@20 | 0.4976 | 0.3943 | 0.3651 | 0.4758 | 0.4739 | 0.5291 | 0.5894 | 0.6431 | 0.6675 | 0.7152* | 7.15% |
| | NDCG@5 | 0.2384 | 0.1907 | 0.1833 | 0.2273 | 0.2423 | 0.2591 | 0.3293 | 0.4003 | 0.4437 | 0.4961* | 11.81% |
| | NDCG@10 | 0.2655 | 0.2127 | 0.2025 | 0.2535 | 0.2666 | 0.2881 | 0.3568 | 0.4251 | 0.4623 | 0.5174* | 11.92% |
| | NDCG@20 | 0.2831 | 0.2284 | 0.2163 | 0.2711 | 0.2825 | 0.3051 | 0.3739 | 0.4401 | 0.4788 | 0.5306* | 10.82% |

Column groups: Traditional (SASRec, BERT4Rec, GRU4Rec, Caser, S3-Rec), POI Related (TPG, Rotan), Generative (TIGER, GNPR-SID, Ours).
Observations from Offline Performance (Table 1):
- Superiority of OneLoc: OneLoc (labeled "Ours") consistently achieves the best performance across all Recall@K and NDCG@K metrics on all three datasets (KuaiLLSR, NYC, TKY). The asterisk (*) indicates statistical significance.
  - On the KuaiLLSR dataset, OneLoc shows improvements over the second-best baseline (GNPR-SID) of 13.46% in Recall@5, 8.46% in Recall@10, 10.44% in Recall@20, 14.47% in NDCG@5, 12.81% in NDCG@10, and 12.97% in NDCG@20. These are substantial gains for an industrial-scale dataset.
  - Similarly strong improvements are observed on the public Foursquare datasets (NYC and TKY), with average boosts of 13.18% in Recall@5 and 16.34% in NDCG@5.
  - This consistent outperformance validates the effectiveness of OneLoc's integrated geographic generative architecture and its ability to combine user preferences with real-time spatial context.
- Generative Models Outperform Traditional Models: A clear trend is that all generative methods (TIGER, GNPR-SID, and OneLoc) significantly outperform the traditional recommender models (SASRec, BERT4Rec, GRU4Rec, Caser, S3-Rec). The paper highlights an improvement of more than 29% in Recall@5 and over 45% in NDCG@5 for generative methods compared to traditional ones. This underscores the power of the generative paradigm, likely due to its comprehensive semantic expression and deep reasoning capabilities, especially when compared to traditional recall solutions based on representation learning and ANN retrieval.
- Impact of Geo-Aware Generative Design: Among the generative models, OneLoc surpasses TIGER and GNPR-SID. GNPR-SID is also geo-aware and generative, but OneLoc's more comprehensive integration of geographic information (via geo-aware SIDs, geo-aware self-attention, and neighbor-aware prompts) and its multi-objective RL approach contribute to its superior performance.
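The "Improvement" column of Table 1 is consistent with a relative gain over the second-best baseline (GNPR-SID). The short check below reproduces two entries from the table values; the function name is hypothetical.

```python
def relative_improvement(ours: float, best_baseline: float) -> float:
    """Relative gain of `ours` over `best_baseline`, in percent."""
    return (ours - best_baseline) / best_baseline * 100.0


# KuaiLLSR Recall@5: OneLoc 0.3565 vs. GNPR-SID 0.3142
kuai_recall5 = round(relative_improvement(0.3565, 0.3142), 2)
# NYC NDCG@5: OneLoc 0.5355 vs. GNPR-SID 0.4430
nyc_ndcg5 = round(relative_improvement(0.5355, 0.4430), 2)
```

These evaluate to 13.46 and 20.88, matching the percentages reported in the table.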
6.2. Ablation Studies / Parameter Analysis (RQ2 & RQ3)
6.2.1. Ablation Study: Neighbor-aware Prompt
This study investigates the importance of the neighbor-aware prompt component.
The following figure (Figure 4 from the original paper) shows the ablation study of different prompt techniques:

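The image is a chart showing the ablation results of different prompt techniques on the Recall and NDCG (Normalized Discounted Cumulative Gain) metrics. It shows that the Neighbor-aware prompt significantly outperforms the Point-wise prompt and the Neighbor-MLP prompt on every evaluation metric.
Figure 4: Ablation study of different prompt techniques. The result shows that the Neighbor-aware prompt performs significantly better than the Point-wise prompt and the Neighbor-MLP prompt.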
Observations:
- Effectiveness of Surrounding Context: When the neighbor-aware prompt is replaced with a point-wise prompt (which only uses the user's current location context, ignoring surrounding context), performance declines across all Recall and NDCG metrics. This confirms that incorporating surrounding geographical context (e.g., surrounding brands, products) is beneficial for recommendation quality.
- Necessity of Cross-Attention: Replacing the cross-attention mechanism in the neighbor-aware prompt with a simpler MLP (Multi-Layer Perceptron), termed neighbor-mlp prompt, leads to a sharp performance drop. This indicates that while surrounding context is useful, a mechanism like cross-attention is essential to effectively aggregate and filter information from this context. A simple MLP might struggle to discern useful signals from potential noise in aggregated surrounding embeddings.
6.2.2. Ablation Study: Geo-aware Self-attention
This study examines the contributions of the key designs within the geo-aware self-attention module.
The following are the results from Table 3 of the original paper:
| Method | Recall@5 | Recall@10 | Recall@20 | NDCG@5 | NDCG@10 | NDCG@20 |
|---|---|---|---|---|---|---|
| Full Model | 0.3565 | 0.4563 | 0.5584 | 0.2032 | 0.2114 | 0.2151 |
| w/o Location Scores | 0.3476 | 0.4439 | 0.5229 | 0.1758 | 0.1847 | 0.1884 |
| w/o Location Gate | 0.3501 | 0.4489 | 0.5295 | 0.1810 | 0.1914 | 0.1950 |
| w/o Geo-aware Self-attention | 0.3315 | 0.4261 | 0.4989 | 0.1552 | 0.1640 | 0.1673 |
Observations:
- Importance of Location Scores: Removing the location context similarity term from the attention score calculation (w/o Location Scores) leads to a notable performance decrease across all Recall and NDCG metrics (e.g., NDCG@5 drops from 0.2032 to 0.1758). This demonstrates the effectiveness of explicitly incorporating geographical proximity into how the model weighs historical interactions.
- Role of Location Gate: Disabling the location gate, which scales the attention output based on the user's real-time location context (w/o Location Gate), also results in performance drops (e.g., NDCG@5 drops to 0.1810). This highlights the importance of dynamically adjusting the relevance of historical behaviors based on the user's current geographical context.
- Overall Impact of Geo-aware Self-attention: Replacing the entire geo-aware self-attention module with a vanilla self-attention (w/o Geo-aware Self-attention) leads to the most significant performance degradation (e.g., NDCG@5 drops to 0.1552). This combined effect underscores the critical role of the GA-Attn module in capturing relevant user behavior patterns with deep geographic integration.
6.2.3. Ablation Study: Reward Signals
This study evaluates the impact of the specifically designed reward functions in the reinforcement learning phase.
The following figure (Figure 5 from the original paper) shows the comparative analysis of Recall, NDCG, and GMV metrics across varied reward signals:

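The image is a chart showing a comparative analysis of the Recall, NDCG, and GMV metrics under different reward signals (full model, without GMV reward, without geographic reward). The horizontal axis lists the model settings, and the vertical axis shows the value of each metric.
Figure 5: Comparative analysis of Recall, NDCG, and GMV metrics across varied reward signals.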
Observations:
- Geographic Reward: Removing the geographic reward (w/o geographic reward) results in a decrease in Recall and NDCG. This suggests that steering the model toward nearby stores does not come at the expense of user preference. Instead, it improves overall relevance as measured by Recall and NDCG, which is consistent with proximity being a key factor in local services.
- GMV Reward: Removing the GMV reward (w/o GMV reward) leads to a decrease not only in the GMV objective (as expected) but also in the Recall objective. This indicates that the GMV reward is effective not only in directly optimizing business value but also in indirectly improving the overall quality of recommendations, as higher-value items are often also highly relevant to users.
6.2.4. Hyperparameter Experiments (RQ3)
This section explores how different hyperparameters affect OneLoc's performance.
The following figure (Figure 6 from the original paper) shows the impact of model parameters, sequence length, and DPO loss weight on Recall and NDCG metrics:

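The image is a chart showing the impact of model parameter scale, sequence length, and the reinforcement learning loss weight λ on the Recall and NDCG metrics at different @k values.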
Figure 6: Hyperparameters include model parameters, sequence length and DPO loss weight λ.
Observations:
- Scaling Laws for Model Size and Sequence Length:
  - Model Size: As the model size increases from 0.05B (billion parameters) to 0.1B and then to 0.3B, OneLoc consistently shows improved performance in both Recall and NDCG. Specifically, scaling from 0.05B to 0.3B yields an average improvement of 6.96% in Recall and 7.29% in NDCG. Within the tested range, larger models lead to better recommendations, which is consistent with a scaling law.
  - Sequence Length: Similarly, increasing the maximum sequence length of user interactions from 100 to 300 results in average improvements of 13.02% in Recall and 51.24% in NDCG. This indicates that OneLoc effectively leverages longer historical behavioral sequences to capture richer user preferences and patterns.
- Sensitivity of DPO Loss Weight (λ):
  - The DPO loss weight λ is a sensitive hyperparameter: performance differs significantly between the tested settings.
  - At a slightly lower λ than the chosen value, both Recall@10 and Recall@20 are significantly lower, while the NDCG metrics are higher. This presents a trade-off, where a slightly lower λ might improve ranking quality (NDCG) but reduce the overall hit rate (Recall).
  - After a comprehensive evaluation of these trade-offs, λ = 0.05 (the value reported in the implementation details) was ultimately chosen, suggesting it provides the best overall balance of objectives.
6.3. Online A/B Test
To validate the real-world impact of OneLoc, an online A/B test was conducted on Kuaishou's primary short-video recommendation scenario.
The following are the results from Table 4 of the original paper:
| Online Metrics | OneLoc |
|---|---|
| GMV | +21.016% |
| Number of Orders | +17.891% |
| Number of Paying Users in Local Services | +18.585% |
| New Paying Users in Local Services | +23.027% |
Observations from Online A/B Test (Table 4):
- Deployment Scale: OneLoc was deployed on 10% of Kuaishou's traffic, in a system serving over 400 million daily active users. The control group used the existing production-level multi-stage recommendation pipeline (including multi-channel retrieval, coarse/fine ranking, and link-specific refinements).
- Significant Business Impact: OneLoc achieved statistically significant improvements (under two-tailed t-tests) in crucial business metrics:
  - GMV increased by 21.016%.
  - Number of Orders increased by 17.891%.
  - Number of Paying Users in Local Services increased by 18.585%.
  - New Paying Users in Local Services increased by 23.027%.
- Superiority over Complex Cascade Systems: These results show that OneLoc, despite being a single end-to-end generative model, can surpass a sophisticated, hand-tuned cascade system in a real-world, high-traffic industrial environment. The substantial gain in New Paying Users is particularly notable, suggesting that OneLoc is effective against the cold-start and data sparsity challenges often associated with local commerce.

The successful online deployment and the significant, positive impact on core business metrics solidify OneLoc's position as a highly effective and practically valuable solution for local life service recommendation.
The following figure (Figure 3 from the original paper) shows the framework of system deployment:

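The image is Figure 3, a schematic of the system deployment framework. It shows the interaction flow between the OneLoc online system and the offline system, including core modules such as request, recommendation, log collection, parameter update, and reward system feedback, and the relationships among them.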
Figure 3: Framework of System Deployment
System Deployment Architecture (Figure 3): The deployed system involves several components:
- Trainer: This component handles the offline training. It collects user logs for streaming training, interacts with the Reward System to score items and construct positive/negative sample pairs for RL training, and periodically updates parameters to the Inference Server.
- OneLoc Inference Server: This server processes user requests. It converts user features and real-time geographic locations into user tokens and geo prompts. These are then fed into the OneLoc model to generate semantic tokens via beam search.
- Video Mapping Server: This acts as a storage service that maps the generated semantic tokens back to actual video IDs.
- Reward System: This system plays a dual role. During training, it provides rewards to the Trainer. During inference, it processes the candidate videos from the Video Mapping Server by estimating GMV scores and applying rule-based filtering. The TopK results are then recommended to users.

This architecture illustrates a typical industrial deployment of a large-scale recommendation system, where offline training, online inference, and dedicated services for item mapping and reward calculation interact to deliver recommendations.
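The online serving path described above (semantic tokens, then video mapping, then reward-based filtering and ranking) can be sketched as a single function. This is an illustrative toy, not the deployed system: the dictionary stands in for the Video Mapping Server, `gmv_score` and `is_allowed` stand in for the Reward System's GMV estimate and rule-based filter, and all names are hypothetical.

```python
def recommend(beam_results, sid_to_videos, gmv_score, is_allowed, top_k):
    """Map generated semantic IDs to videos, filter them, and keep the top-k.

    beam_results  : semantic IDs produced by beam search (tuples of digits)
    sid_to_videos : mapping from semantic ID to candidate video IDs
    gmv_score     : callable returning an estimated GMV score for a video
    is_allowed    : callable implementing a rule-based filter
    """
    candidates = []
    for sid in beam_results:
        candidates.extend(sid_to_videos.get(sid, []))
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(candidates))
    filtered = [video for video in unique if is_allowed(video)]
    return sorted(filtered, key=gmv_score, reverse=True)[:top_k]
```

A semantic ID may map to more than one video, which is why the mapping returns a list and duplicates are removed before filtering and ranking.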
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduces OneLoc, a novel end-to-end generative recommender system specifically tailored for the short-video local life service scenario within the Kuaishou App. OneLoc addresses the dual challenges of comprehensively integrating geographic information and balancing multiple business objectives.
Its key innovations include:
- A geo-aware tokenizer that generates semantic IDs combining video content and geographic semantics.
- A geo-aware self-attention mechanism in the encoder, which models user behavioral sequences by considering both content and location similarity, as well as the user's real-time location.
- A neighbor-aware prompt in the decoder, leveraging cross-attention to incorporate rich contextual information from the user's surrounding geographical area.
- A reinforcement learning framework that uses two specialized reward functions (geographic reward and GMV reward) to explicitly optimize for proximity and business value, alongside user interests.

Extensive offline experiments demonstrated OneLoc's superior performance over traditional and existing generative recommendation baselines on both industrial and public datasets. Crucially, OneLoc has been successfully deployed in Kuaishou's production environment, serving 400 million daily active users and achieving significant real-world improvements: a 21.016% increase in GMV and a 17.891% increase in order numbers. This industrial success validates the model's effectiveness and robustness in a complex, high-traffic application.
7.2. Limitations & Future Work
The authors acknowledge several promising directions for future research:
- Scaling Laws: Further exploration into the scaling laws of model size and sequence length in this specific domain could yield deeper insights into how performance continues to improve with larger models and more extensive historical data.
- Advanced RL Methods: Investigating more advanced reinforcement learning methods specifically adapted to the nuances of local life services could lead to even finer-grained optimization of diverse objectives. This might involve more complex reward shaping, multi-agent RL, or dynamic weighting of rewards.
7.3. Personal Insights & Critique
OneLoc presents a compelling and thoroughly validated solution to a critical problem in the realm of local life service recommendation. Its strength lies in the comprehensive and multi-faceted integration of geographical information into a generative recommendation framework, explicitly addressing the unique requirements of this domain.
Inspirations and Applications:
- Deep Geo-Integration: The three-pronged approach to injecting geo-awareness (tokenization, encoder, decoder) is a significant architectural contribution. This methodology could be highly transferable to other location-sensitive recommendation tasks, such as travel recommendations, real estate, event discovery, or even supply chain optimization, where spatial context is paramount.
- Multi-Objective RL for Real-World Impact: The successful application of DPO with custom geographic and GMV rewards highlights a practical pathway for aligning generative models with complex, often conflicting, business and user-centric objectives. This approach could inspire similar RL strategies in other industrial LLM-based applications where simple next token prediction is insufficient.
- Generative Paradigm Validation: The impressive online A/B test results provide strong empirical evidence for the superiority of the generative recommendation paradigm over traditional cascade systems, particularly in complex, high-stakes environments. This reinforces the broader trend towards LLM-inspired generative architectures in recommendation.
Potential Issues, Unverified Assumptions, or Areas for Improvement:
- Proprietary Nature of GMV Scoring Model: The GMV reward relies on a separate GMV scoring model. The details of this model are not provided, and its effectiveness is assumed. In a purely academic context, the design and robustness of this GMV model would warrant further scrutiny or open-sourcing for reproducibility.
- Dependency on High-Quality Geographic Context: The effectiveness of geo-aware semantic IDs, geo-aware self-attention, and neighbor-aware prompts heavily relies on the quality and richness of the underlying location context information (e.g., brand, category data within GeoHash blocks). If this context is sparse or inaccurate, the benefits of these modules might diminish.
- Scalability for Extremely Sparse Areas: While OneLoc handles cold-start users well, the neighbor-aware prompt might be less effective in extremely rural or data-sparse geographical areas where surrounding context information is minimal.
- Ethical Considerations: The use of real-time user location data and extensive behavioral sequences, while crucial for effectiveness, raises privacy and ethical considerations. The paper, like many industry-led ones, doesn't delve into these aspects, but they are important for broader discussions.
- Preprint Status: The "Published at (UTC): 2025-08-20T11:57:48.000Z" and "Conference acronym 'XX" indicate that this is an arXiv preprint. While the results are strong, they are not yet peer-reviewed and officially published in a top-tier conference or journal. This means there's a possibility of future revisions or further scrutiny from the academic community.

Overall, OneLoc is a significant step forward for local life service recommendation, bridging the gap between cutting-edge generative models and the complex, multi-objective demands of real-world industrial applications. Its rigorous design and validated impact are commendable.