OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service

Published: 08/20/2025

This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

OneLoc introduces a geo-aware generative recommender integrating geographic cues and multi-objective reinforcement learning, improving local life service recommendations and significantly boosting GMV and orders on Kuaishou.

Abstract

Local life service is a vital scenario in Kuaishou App, where video recommendation is intrinsically linked with store's location information. Thus, recommendation in our scenario is challenging because we should take into account user's interest and real-time location at the same time. In the face of such complex scenarios, end-to-end generative recommendation has emerged as a new paradigm, such as OneRec in the short video scenario, OneSug in the search scenario, and EGA in the advertising scenario. However, in local life service, an end-to-end generative recommendation model has not yet been developed as there are some key challenges to be solved. The first challenge is how to make full use of geographic information. The second challenge is how to balance multiple objectives, including user interests, the distance between user and stores, and some other business objectives. To address the challenges, we propose OneLoc. Specifically, we leverage geographic information from different perspectives: (1) geo-aware semantic ID incorporates both video and geographic information for tokenization, (2) geo-aware self-attention in the encoder leverages both video location similarity and user's real-time location, and (3) neighbor-aware prompt captures rich context information surrounding users for generation. To balance multiple objectives, we use reinforcement learning and propose two reward functions, i.e., geographic reward and GMV reward. With the above design, OneLoc achieves outstanding offline and online performance. In fact, OneLoc has been deployed in local life service of Kuaishou App. It serves 400 million active users daily, achieving 21.016% and 17.891% improvements in terms of gross merchandise value (GMV) and orders numbers.

1. Bibliographic Information

1.1. Title

The central topic of this paper is OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service. It introduces a novel generative recommendation framework designed specifically for local life services, integrating geographic information comprehensively and balancing multiple business objectives.

1.2. Authors

The authors of the paper are Zhipeng Wei, Kuo Cai, Junda She, Jie Chen, Minghao Chen, Yang Zeng, Qiang Luo, Wencong Zeng, Ruiming Tang, Kun Gai, and Guorui Zhou. Most authors are affiliated with Kuaishou Inc. in Beijing, China, suggesting that the research is driven by practical applications within a large internet platform. Kun Gai is listed as Unaffiliated, Beijing, China. Their collective background appears to be in recommendation systems, machine learning, and large-scale industrial applications.

1.3. Journal/Conference

The paper specifies "In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym 'XX)." This indicates that it has been submitted to a conference, but the specific venue's name has not yet been finalized or publicly disclosed at the time of this preprint. The paper is currently available as a preprint on arXiv, which is a popular repository for pre-publication research, especially in fields like computer science, allowing for early dissemination and feedback.

1.4. Publication Year

The publication timestamp provided is 2025-08-20T11:57:48.000Z, which places the preprint in 2025. No final venue or proceedings year is given.

1.5. Abstract

Local life service recommendation (LLSR) within the Kuaishou App is a challenging task because it requires simultaneously considering users' interests and their real-time location. While end-to-end generative recommendation has emerged as a new paradigm in various scenarios (e.g., short video, search, advertising), it has not yet been successfully applied to local life services due to two key challenges: effectively utilizing geographic information and balancing multiple objectives like user interests, distance to stores, and business goals. To address these, the paper proposes OneLoc, a geo-aware generative recommender system. OneLoc incorporates geographic information through three main mechanisms: (1) geo-aware semantic IDs for tokenization, combining video and geographic data; (2) geo-aware self-attention in the encoder, leveraging video location similarity and user's real-time location; and (3) a neighbor-aware prompt in the decoder to capture surrounding contextual information. To balance multiple objectives, OneLoc employs reinforcement learning with two custom reward functions: a geographic reward and a GMV (Gross Merchandise Value) reward. This design enables OneLoc to achieve superior offline and online performance. It has been deployed in Kuaishou's local life service, serving 400 million daily active users and demonstrating significant improvements of 21.016% in GMV and 17.891% in order numbers.

1.6. Original Source Link

The official source link is https://arxiv.org/abs/2508.14646. The PDF link is https://arxiv.org/pdf/2508.14646v1.pdf. This indicates its status as an arXiv preprint.

2. Executive Summary

2.1. Background & Motivation

The core problem OneLoc aims to solve is the challenging task of local life service recommendation (LLSR) in large-scale applications like Kuaishou. In this scenario, recommending short videos related to local businesses requires simultaneously addressing two critical factors: a user's evolving interests and their real-time geographical location.

This problem is important because local life services represent a vital and high-value scenario for major internet platforms. Effective LLSR can significantly drive user engagement, consumption, and Gross Merchandise Value (GMV).

Prior research faced specific challenges and gaps:

Comprehensive Geographic Information Utilization: Existing recommendation models, even those for Points of Interest (POI), often utilize geographic information in limited ways, such as independent features or simple prompts. A more holistic integration across the entire recommendation pipeline, from item representation to user behavior modeling and generation, was lacking.
Multi-Objective Balance: Recommendation systems in real-world industrial settings must balance various, often conflicting, objectives. These include maximizing user satisfaction (based on interests), ensuring practicality (e.g., proximity to stores), and meeting business goals (e.g., GMV, order volume). Traditional methods struggle to achieve a fine-grained balance of these diverse objectives within a single, end-to-end framework. While reinforcement learning has been used to balance objectives in other generative recommendation paradigms, its application to the unique multi-objective landscape of LLSR was unexplored.

The paper's entry point and innovative idea revolve around adapting the nascent end-to-end generative recommendation paradigm to the specific demands of local life services. By developing a model that generates recommendations rather than merely ranking pre-selected items, and by deeply embedding geo-awareness and multi-objective optimization throughout its architecture, OneLoc aims to overcome these limitations.

2.2. Main Contributions / Findings

The paper makes several significant contributions:

Novel End-to-End Generative Framework for LLSR: OneLoc is proposed as the first end-to-end generative recommender system specifically designed for short-video local life services. It unifies a generative architecture with a business-value-optimized reinforcement learning module, marking a significant advancement over traditional cascaded recommendation models.
Comprehensive Geo-Information Integration: OneLoc introduces three core components to make full use of geographic information across different stages of the recommendation process:
1. Geo-aware Tokenizer (Geo-aware Semantic IDs): Combines geographic semantics with multi-modal video information during the item tokenization phase, creating representations that inherently carry location context.
2. Geo-aware Self-Attention: Integrates geographic context into the encoder to capture user behavior sequential patterns, considering both video content and their associated locations.
3. Neighbor-aware Prompt: Enhances the decoder's ability to guide recommendation generation by considering not only the user's real-time location but also richer contextual information from surrounding neighborhoods.
Multi-Objective Optimization via Reinforcement Learning: To balance user interests, geographic proximity, and business goals, OneLoc proposes two specialized reward functions for its reinforcement learning phase:
1. Geographic Reward: Encourages the generation of recommendations for nearby stores by assigning higher rewards to closer locations.
2. GMV Reward: Leverages a GMV scoring model to promote videos that are likely to attract higher consumption and contribute to business objectives.
Empirical Validation and Industrial Deployment: Extensive offline experiments on a large-scale Kuaishou industry dataset and public Foursquare datasets demonstrate the superior effectiveness of OneLoc compared to state-of-the-art traditional and generative models. Crucially, OneLoc has been successfully deployed in Kuaishou's local life service, serving 400 million daily active users.

The key findings highlight OneLoc's ability to significantly improve real-world business metrics. It achieved 21.016% improvement in Gross Merchandise Value (GMV) and 17.891% improvement in order numbers, along with increases in paying users, demonstrating its practical value and impact in a high-traffic industrial environment.

3.1. Foundational Concepts

To understand OneLoc, a basic grasp of several key concepts in recommender systems, deep learning, and machine learning is essential.

Recommender Systems (RS): Software systems that provide suggestions for items to users. Traditionally, RS follow a matching-based paradigm, where a large pool of items is filtered down to a smaller set (recall stage), which is then ranked by relevance (ranking stage). The new paradigm, generative recommendation, directly generates item identifiers or representations.
Local Life Service Recommendation (LLSR): A specific type of recommender system focused on local businesses and services, often tied to a user's geographical location. Examples include recommending nearby restaurants, shops, or events.
Generative Recommendation: A paradigm shift from traditional matching-based or ranking-based systems. Instead of selecting from existing items, generative models directly create or synthesize the recommendations, often in the form of discrete semantic IDs or sequences of tokens, similar to how Large Language Models (LLMs) generate text. This approach is often powered by Transformer architectures.
Large Language Models (LLMs): Powerful neural networks, typically based on the Transformer architecture, trained on vast amounts of text data. They excel at understanding context, generating coherent text, and performing various language-related tasks. Their auto-regressive generation capabilities, where they predict the next token based on previous ones, are a key inspiration for generative recommendation.
Transformer Architecture: A neural network architecture introduced by Vaswani et al. (2017) that revolutionized sequence modeling. It relies heavily on self-attention mechanisms rather than recurrent or convolutional layers.
- Encoder-Decoder Structure: Transformers often consist of an encoder stack and a decoder stack. The encoder processes an input sequence (e.g., user historical behaviors) to produce a contextualized representation, while the decoder generates an output sequence (e.g., recommended item IDs) based on the encoder's output and its own previously generated tokens.
- Self-Attention: A mechanism that allows the model to weigh the importance of different parts of an input sequence when processing each element. For a sequence of input vectors, self-attention computes Query (Q), Key (K), and Value (V) vectors for each input. The attention score between two elements is the dot product of the first element's $Q$ vector and the second element's $K$ vector. These scores are then normalized (e.g., using softmax) and used to compute a weighted sum of $V$ vectors, which becomes the output for that element. The standard formula for scaled dot-product attention is: $ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $ where Q, K, V are matrices representing the Query, Key, and Value vectors, respectively, and $d_k$ is the dimension of the Key vectors. Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the dimension, which would otherwise push the softmax into saturated regions where gradients become very small.
- Cross-Attention: Similar to self-attention, but Query vectors come from one sequence (e.g., the decoder's current state), while Key and Value vectors come from another sequence (e.g., the encoder's output). This allows the decoder to focus on relevant parts of the encoder's output during generation.
- Feed-Forward Network (FFN): A simple neural network applied independently to each position in the Transformer's output.
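- RMSNorm (Root Mean Square Normalization): A normalization technique used in Transformers to stabilize training, similar to LayerNorm but typically computationally simpler. It divides each activation vector by the root mean square of its entries (usually followed by a learned per-dimension scale) and, unlike LayerNorm, does not subtract the mean.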
Semantic IDs / Tokenization: In generative recommendation, items (e.g., videos, products) are often converted into discrete tokens or semantic IDs, much like words are tokenized in natural language processing. This allows Transformer-based models to "generate" items by predicting sequences of these IDs. Residual K-means (res-kmeans) is a technique for vector quantization that maps high-dimensional continuous embeddings to sequences of discrete codes, often arranged hierarchically.
Reinforcement Learning (RL): A machine learning paradigm where an agent learns to make decisions by performing actions in an environment to maximize a cumulative reward.
- Reward Function: Defines the goal of the RL agent by assigning numerical values (rewards) to different states or actions. Designing effective reward functions is crucial for guiding the agent's learning towards desired outcomes.
- Direct Preference Optimization (DPO): A preference-alignment technique, particularly popular for fine-tuning LLMs, that is commonly used in place of classical RL fine-tuning. Instead of explicitly training a separate reward model and then running Proximal Policy Optimization (PPO), DPO directly optimizes the policy on pairs of preferred and dispreferred responses, using a simpler classification-style objective that implicitly aligns the model with human preferences or desired reward signals.
Geographic Information Systems (GIS) & POI:
- GeoHash: A geocoding system that encodes a geographic location (latitude and longitude) into a short string of letters and digits. It has the property that locations closer to each other will have GeoHashes with longer common prefixes, making it useful for spatial indexing and proximity queries.
- Point of Interest (POI): A specific point location that someone may find useful or interesting, such as a restaurant, store, landmark, or park. POI recommendation focuses on suggesting such locations.
Evaluation Metrics:
- Recall@K: Measures the proportion of relevant items that are present in the top-K recommendations.
- NDCG@K (Normalized Discounted Cumulative Gain at K): A metric that considers both the relevance of recommended items and their position in the ranked list. Higher-ranked relevant items contribute more to the score.
- GMV (Gross Merchandise Value): A business metric representing the total sales value of merchandise sold through a particular channel or platform.
- Order Numbers: A business metric representing the total count of completed purchase transactions.
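The two offline metrics above can be made concrete with a small sketch. It assumes binary relevance (an item is either relevant or not), which is a common convention for next-item evaluation; the function names and the toy video IDs are illustrative and do not come from the paper.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranked list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the list divided by the best achievable DCG."""
    # A hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (hypothetical IDs): the single ground-truth video is ranked third.
ranked = ["v3", "v7", "v1", "v9"]
relevant = {"v1"}
print(recall_at_k(ranked, relevant, 2))  # target not yet in the top 2
print(recall_at_k(ranked, relevant, 3))  # target found within the top 3
print(ndcg_at_k(ranked, relevant, 3))    # discounted because the hit is at rank 3
```

With a single ground-truth item per user, Recall@K reduces to a hit rate, while NDCG@K additionally rewards placing that item closer to the top.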

3.2. Previous Works

The paper positions OneLoc within the context of advancements in POI recommendation and the emerging field of generative recommendation.

POI Recommendation:

TPG [12]: A Transformer-based approach that uses target timestamps as prompts to enhance geography-aware location recommendations. This focuses on leveraging temporal context for location, but not as broadly as OneLoc in combining geo-info across all stages.
Rotan [2]: Introduces a time-aware attention mechanism where time intervals are represented as rotational position vectors within Transformer architectures to capture temporal dynamics in user behavior sequences for POI recommendation. Like TPG, it concentrates on time and treats this context as an independent feature.
STAN [13]: Proposes a spatial-temporal attention mechanism to capture relevance within POI trajectories for next location recommendation. This emphasizes patterns in movement.
LLM-Mob [21], NextLocLLM [11], LLM4POI [9]: More recent works exploring Large Language Models (LLMs) for POI recommendation, often by transforming prediction tasks into in-context learning or question-answering tasks, or by injecting spatial coordinates into LLMs. These demonstrate the power of LLMs but OneLoc aims for a more integrated, end-to-end generative approach with specific business objective alignment.

The paper notes that while these POI methods are effective, they are all discriminative models, meaning they predict from existing choices rather than generating. OneLoc moves into the generative paradigm.

Generative Recommendation: This is a newer paradigm, heavily influenced by the success of LLMs.

TIGER [15]: Considered one of the first works to propose a generative recommendation framework using hierarchical semantic IDs encoded with RQ-VAE (Residual Quantized Variational AutoEncoder). This laid the groundwork for representing items as discrete tokens.
OneRec [1, 28]: Further develops the generative recommendation paradigm for short video scenarios, utilizing reinforcement learning with reward models to align with user preferences and industrial requirements. OneLoc builds on OneRec's RL framework but adapts it specifically for local life services and multi-objective geographic context.
OneSug [3]: Proposes an end-to-end generative framework specifically for e-commerce query suggestion.
EGA [27]: Designs an end-to-end generative advertising system to address critical advertising requirements like bidding, creative selection, and ad allocation.
GNPR-SID [17]: Migrates semantic-ID-based generative recommendation to the POI recommendation scenario, using geographic information to construct semantic IDs. OneLoc is similar in spirit but enhances geographic integration at multiple architectural levels and adds multi-objective RL.
COBRA [22]: Proposes a coarse-to-fine framework that first generates semantic IDs and then dense vectors for retrieval.
LC-Rec [26]: Aligns collaborative filtering signals with multiple training tasks within a generative recommendation context.
LETTER [20]: Improves item tokenization by integrating hierarchical semantics, collaborative signals, and code assignment diversity.
ActionPiece [6]: Focuses on context-aware tokenization of action sequences, arguing that action meaning depends on context.
EAGER [5]: Aims to align linguistic semantics of pre-trained LLMs with collaborative semantics non-intrusively.

3.3. Technological Evolution

The field of recommender systems has evolved from basic collaborative filtering and content-based filtering to complex deep learning models (e.g., RNNs, CNNs, Transformers) that capture intricate user behaviors and item features. POI recommendation specifically integrated spatial and temporal information. The recent surge in LLMs has driven a paradigm shift towards generative recommendation, moving from matching-and-ranking to directly generating item identifiers. OneRec, OneSug, EGA, and TIGER are examples of this new wave in various domains.

OneLoc represents the next step in this evolution by applying generative recommendation to the highly specialized and complex local life service domain. It specifically addresses the nuanced requirements of this domain, which demand deep integration of geographic information and careful balancing of diverse business objectives, something that previous generative or POI models hadn't fully tackled.

3.4. Differentiation Analysis

Compared to the main methods in related work, OneLoc introduces several core innovations:

Holistic Geo-Awareness: While some POI models use geographic information (e.g., TPG, Rotan, GNPR-SID), OneLoc integrates it comprehensively and end-to-end across three distinct architectural components:
1. Item Representation: Geo-aware semantic IDs embed geographic context directly into item tokens.
2. User Behavior Encoding: Geo-aware self-attention in the encoder models historical interactions with explicit consideration of location similarity and user's real-time location.
3. Generation Context: Neighbor-aware prompt in the decoder enriches the generation process with surrounding geographical context, not just the user's single location. This level of integration is more profound than simply using geo-coordinates as an independent feature or prompt.
Multi-Objective Reinforcement Learning for LLSR: OneLoc explicitly tackles the multi-objective nature of LLSR by employing reinforcement learning with custom-designed geographic and GMV reward functions. This allows for a fine-grained balance between user interests (learned during pre-training), geographical practicality (proximity), and critical business metrics (GMV, orders). Previous generative recommendation works like OneRec also use RL, but OneLoc tailors the reward functions to the specific LLSR context.
Industrial Scale and Impact: The successful deployment of OneLoc in Kuaishou's local life service, serving 400 million daily active users and achieving significant improvements in GMV and order numbers, showcases its robustness and effectiveness in a real-world, high-traffic industrial setting, surpassing existing complex cascade systems. This practical validation distinguishes it from many research papers that primarily demonstrate offline efficacy.

4. Methodology

4.1. Principles

OneLoc is an end-to-end generative recommendation model designed for local life services. Its core principles are:

Generative Paradigm: Instead of retrieving and ranking pre-existing items, OneLoc directly generates recommendations in the form of semantic IDs, leveraging the powerful auto-regressive capabilities inspired by Large Language Models (LLMs).
Comprehensive Geo-Awareness: Geographic information is crucial in local life services. OneLoc integrates this information at multiple stages: item tokenization, user behavior encoding, and recommendation generation.
Multi-Objective Optimization: Real-world recommendation systems require balancing diverse objectives (user satisfaction, geographic proximity, business value). OneLoc achieves this through a reinforcement learning framework, using specifically designed reward functions.
Two-Stage Training: The model is trained in two phases: an initial pre-training phase using next token prediction (NTP) to learn basic user preferences and item semantics, followed by a post-training phase using reinforcement learning to align the model with complex, multi-faceted business objectives.

The overall framework of OneLoc is depicted in Figure 2.

Figure 2 is a schematic diagram illustrating the overall architecture of the OneLoc model, including the encoder-decoder architecture (a), the geo-aware self-attention mechanism, the neighbor-aware prompt module, and the reinforcement-learning-based training process (b). The diagram also shows the losses $L_{NTP}$ and $L_{DPO}$, reflecting the model's multi-objective optimization design.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Problem Formulation

The task is defined as recommending videos in the local life service scenario.

Let $\mathcal{U}$ , $\mathcal{V}$ , and $\mathcal{L}$ represent the set of users, videos, and locations, respectively.
Each video $v \in \mathcal{V}$ is associated with a location $l \in \mathcal{L}$ . This location is a GeoHash block and includes geographical coordinates, brand, and category information.
Each video $v$ is represented by a video embedding ( $e^v$ ), a location ID embedding ( $e^{lid}$ ), and a location context embedding ( $e^{lc}$ ).
A user $u$ has a real-time location $l_u$ , associated with a location context embedding ( $e_u^{lc}$ ).
Each video is mapped to a semantic ID, denoted as $Q_v = (q_v^1, q_v^2, ..., q_v^T)$ , where $T$ is the number of codebooks (representing hierarchical levels of the ID).
Given a user's interacted video sequence $\mathcal{S} = \{v_1, v_2, ..., v_{t-1}\}$ and real-time location $l_u$ , the objective is to predict the next video $v_t$ that would attract consumption. This is formulated as maximizing the probability $p(v_t | l_u, \mathcal{S})$ .
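Since each video is identified by a semantic ID of $T$ codes, a generative model of this kind usually factorizes the target probability over those codes and trains with next token prediction. The decomposition below is the standard auto-regressive form, given here as an assumption about how the objective unfolds rather than as an equation quoted from the paper:

$ p(v_t | l_u, \mathcal{S}) = \prod_{j=1}^{T} p\left(q_t^j \mid q_t^1, ..., q_t^{j-1}, l_u, \mathcal{S}\right), \qquad L_{NTP} = -\sum_{j=1}^{T} \log p\left(q_t^j \mid q_t^1, ..., q_t^{j-1}, l_u, \mathcal{S}\right) $

Under this reading, maximizing the probability of the next video is equivalent to minimizing the summed negative log-likelihood of its codes.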

4.2.2. Geo-aware Semantic IDs

To inject geographic information directly into the item representation, OneLoc employs geo-aware semantic IDs. This process follows OneRec's use of res-kmeans for tokenization but enriches the initial embeddings with geographic context.

Initial Embedding: Unlike traditional methods that treat geographic information as an independent feature, OneLoc incorporates it into the raw video features. Each initial residual vector $r_i^0$ in the initial residual set $\mathcal{R}^0$ is represented as an embedding derived from both video content and location context information. This combined embedding is extracted using a multimodal large language model (though specific details of this model are not elaborated in the provided text). This ensures that the semantic IDs generated (geo-aware SIDs) inherently contain geographic semantics.
Residual K-means (res-kmeans): This process iteratively quantizes the video embeddings into a sequence of discrete codes.
1. At each layer $i$ , a codebook $C^i$ is constructed by applying K-means clustering on the current set of residuals $\mathcal{R}^i$ : $C^i = \text{K-means}(\mathcal{R}^i, N_c)$ Here, $C^i = \{c_k^i | k = 1, 2, ..., N_c\}$ is the set of centroids (codes) obtained from K-means, and $N_c$ is the size of the codebook (number of clusters).
2. For each residual $r_j^i$ in the set $\mathcal{R}^i$ , the nearest centroid index $q_j^i$ is found, which becomes the $i$ -th code of the semantic ID: $q_j^i = \underset{k}{\arg\min} ||c_k^i - r_j^i||$ This $q_j^i$ is the index of the closest codebook entry.
3. A new residual $r_j^{i+1}$ is then calculated by subtracting the chosen centroid from the current residual. This new residual is passed to the next layer for further quantization: $r_j^{i+1} = r_j^i - c_{q_j^i}^i$
Semantic ID Generation: This process iterates $T$ times, yielding $T$ codebooks. In OneLoc, $T$ is set to 3, meaning each video is tokenized into a three-digit semantic ID. Thus, a target video $v_t$ is transformed into its semantic ID $Q_t = \mathrm{Tokenize}(v_t) = (q_t^1, q_t^2, q_t^3)$ .
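The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the plain Lloyd's k-means, the codebook size of 8, the 16-dimensional random vectors standing in for the multimodal geo-aware embeddings, and all function names are hypothetical and chosen only to keep the example small.

```python
import numpy as np

def kmeans(points, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (n_clusters, dim)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[start].copy()
    for _ in range(n_iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = points[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids

def residual_kmeans(embeddings, n_layers=3, codebook_size=8):
    """Quantize each embedding into n_layers codes, one per residual layer."""
    residuals = embeddings.astype(np.float64).copy()       # R^0
    codebooks, codes = [], []
    for _ in range(n_layers):
        centroids = kmeans(residuals, codebook_size)        # codebook C^i
        dists = np.linalg.norm(residuals[:, None, :] - centroids[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)                          # code q_j^i
        residuals = residuals - centroids[idx]              # residual r_j^{i+1}
        codebooks.append(centroids)
        codes.append(idx)
    return np.stack(codes, axis=1), codebooks, residuals

rng = np.random.default_rng(42)
geo_aware_embeddings = rng.normal(size=(200, 16))  # stand-in for real embeddings
semantic_ids, codebooks, final_residuals = residual_kmeans(geo_aware_embeddings)
print(semantic_ids[:3])  # each row is one video's three-code semantic ID
```

Summing the selected centroid from every layer and adding the final residual recovers the original embedding exactly, which is why later layers can be read as progressively finer corrections to the first, coarsest code.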

4.2.3. Encoder

The encoder's role is to process the user's behavioral sequence and extract useful patterns, now enhanced with geographic awareness.

4.2.3.1. Multi-behavior Sequence

To comprehensively capture user preferences, OneLoc considers not just watching sequences but also clicking and purchasing sequences.

The multi-behavior sequence $S$ is defined as: $S = \{S^{watch}, S^{click}, S^{pay}\}$ where $S^{watch}$ , $S^{click}$ , and $S^{pay}$ represent the user's watching, clicking, and purchasing sequences, respectively.
These sequences are then transformed into a comprehensive embedding sequence $Z$ and a location context embedding sequence $E^{lc}$ : $ \begin{aligned} Z &= \{z_1, z_2, ..., z_i, ..., z_{|S|}\} \\ z_i &= \mathrm{MLP}(\mathrm{Concat}(e_i^v, e_i^{lid}, e_i^{lc})) \\ E^{lc} &= \{e_1^{lc}, e_2^{lc}, ..., e_i^{lc}, ..., e_{|S|}^{lc}\} \end{aligned} $ Here, $|S|$ is the length of the sequence $S$ . Concat denotes the concatenation operation along the feature dimension. $z_i$ is the combined embedding for the $i$ -th video, generated by an MLP (Multi-Layer Perceptron) that processes the concatenated video embedding ( $e_i^v$ ), location ID embedding ( $e_i^{lid}$ ), and location context embedding ( $e_i^{lc}$ ). $E^{lc}$ specifically stores the sequence of location context embeddings.

4.2.3.2. Encoder Architecture

The embedding sequences ( $Z$ , $E^{lc}$ ) along with the user's current location context ( $e_u^{lc}$ ) are fed into the encoder.

The encoder consists of a stack of $K$ Transformer blocks.
Each Transformer block contains a GA-Attn (Geo-aware Self-attention) module and a FFN (Feed-Forward Network) module, with RMSNorm applied for normalization.
The computation within each layer $i$ of the encoder is formally defined as: $ \begin{aligned} \widetilde{Z}^{i+1} &= Z^i + \text{GA-Attn}(\mathrm{RMSNorm}(Z^i), E^{lc}, e_u^{lc}) \\ Z^{i+1} &= \widetilde{Z}^{i+1} + \mathrm{FFN}(\mathrm{RMSNorm}(\widetilde{Z}^{i+1})) \\ Z^0 &= Z \end{aligned} $ where $Z$ is the initial input embedding sequence. $Z^i$ is the output of the $i$ -th encoder layer. $\widetilde{Z}^{i+1}$ represents the intermediate result after the GA-Attn module. $e_u^{lc}$ is the context embedding of the user's real-time location $l_u$ . GA-Attn is the core module designed to capture user behavior patterns while being aware of geographic information.
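The residual, pre-norm structure of an encoder block can be sketched as below. Several parts are assumptions made for illustration: RMSNorm is shown without a learned gain, the FFN uses a ReLU, the same weights are reused across blocks, and `mean_pool_attn` is a hypothetical stand-in that only marks where GA-Attn would be called.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """Divide each row by the root mean square of its entries (no learned gain)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def ffn(x, w1, w2):
    """Position-wise feed-forward network; the ReLU is an assumption."""
    return np.maximum(x @ w1, 0.0) @ w2

def encoder_block(z, e_lc, e_u_lc, attn_fn, w1, w2):
    """Pre-norm block: Z~ = Z + Attn(RMSNorm(Z), ...); Z' = Z~ + FFN(RMSNorm(Z~))."""
    z_tilde = z + attn_fn(rms_norm(z), e_lc, e_u_lc)
    return z_tilde + ffn(rms_norm(z_tilde), w1, w2)

def mean_pool_attn(x, e_lc, e_u_lc):
    """Hypothetical placeholder for GA-Attn: every position gets the sequence mean."""
    return np.broadcast_to(x.mean(axis=0, keepdims=True), x.shape)

rng = np.random.default_rng(0)
n, d_model, d_lc, d_ff = 5, 8, 4, 16
z0 = rng.normal(size=(n, d_model))       # comprehensive embeddings Z
e_lc = rng.normal(size=(n, d_lc))        # location context embeddings E^{lc}
e_u_lc = rng.normal(size=d_lc)           # user's real-time location context
w1 = rng.normal(size=(d_model, d_ff)) * 0.1
w2 = rng.normal(size=(d_ff, d_model)) * 0.1

z = z0
for _ in range(2):  # K = 2 blocks; weights are shared only to keep the sketch short
    z = encoder_block(z, e_lc, e_u_lc, mean_pool_attn, w1, w2)
print(z.shape)
```

Because both sub-modules are added to their input rather than replacing it, a block whose attention and FFN outputs are zero passes the sequence through unchanged, which is what makes deep stacks of such blocks trainable.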

4.2.3.3. Geo-aware Self-attention (`GA-Attn`)

The GA-Attn module is a novel component that adapts the standard self-attention mechanism to integrate geographic context. Its goal is to identify relevant behaviors from a user's interaction history based on their real-time location.

Attention Score Calculation: The attention score is composed of two parts:
1. Comprehensive Similarity: This uses the comprehensive embedding sequence $Z$ (containing video content, location ID, and location context) to calculate Queries and Keys, similar to standard self-attention.
2. Location Context Similarity: This explicitly uses the location context embedding sequence $E^{lc}$ to further enhance the geographical semantics in the attention scores. The combined attention calculation is: $ \begin{aligned} A &= \mathrm{Softmax}\left(\frac{(ZW_q)(ZW_k)^T}{\sqrt{d}} + E^{lc}(E^{lc})^T\right) \\ \widetilde{O} &= A (ZW_v) W_o \end{aligned} $ Here, $W_q, W_k, W_v, W_o$ are the weight matrices for Query, Key, Value, and output, respectively. $d$ refers to $d_k$ , the dimension of the Key vectors for scaling. The term $E^{lc}(E^{lc})^T$ adds a direct measure of location context similarity to the attention scores, allowing the model to give higher attention to videos with similar location contexts. $\widetilde{O}$ is the output of this attention mechanism.
User Real-time Location Gating: To dynamically inject the user's real-time location information, a gating mechanism is applied to the attention output. This gate scales the attention output based on the similarity between the user's current location context and the location context of each video in the historical sequence. $ \begin{aligned} g_i &= 2 \cdot \mathrm{Sigmoid}(\mathrm{MLP}(\mathrm{Concat}(e_u^{lc}, E_{i*}^{lc}))) \\ O_{i*} &= g_i \widetilde{O}_{i*} \end{aligned} $ where $E_{i*}^{lc}$ is the $i$ -th row of $E^{lc}$ (representing the location context of the $i$ -th video in the sequence). The MLP processes the concatenation of the user's location context embedding ( $e_u^{lc}$ ) and the individual video's location context embedding. A Sigmoid activation function, scaled by 2, produces a gating parameter $g_i \in (0, 2)$ , which then scales the $i$ -th row of the attention output $\widetilde{O}_{i*}$ to yield the final output $O_{i*}$ . This allows the user's real-time location to dynamically influence the relevance of historical interactions.
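The attention score and the gate described above can be combined in a short single-head NumPy sketch. It is an illustration under assumptions, not the paper's implementation: there is one head only, the gating MLP is reduced to a single linear layer (`w_gate`, `b_gate`), and all shapes and names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ga_attn(z, e_lc, e_u_lc, w_q, w_k, w_v, w_o, w_gate, b_gate):
    """Single-head geo-aware self-attention with a user-location gate."""
    d = w_k.shape[1]
    # Content similarity plus location-context similarity, as in the formula for A.
    scores = (z @ w_q) @ (z @ w_k).T / np.sqrt(d) + e_lc @ e_lc.T
    attn = softmax(scores, axis=-1)
    o_tilde = attn @ (z @ w_v) @ w_o
    # Gate input: user location context concatenated with each item's context.
    user = np.broadcast_to(e_u_lc, e_lc.shape)
    gate_in = np.concatenate([user, e_lc], axis=-1)
    # One linear layer stands in for the MLP; 2 * sigmoid keeps g in (0, 2).
    g = 2.0 / (1.0 + np.exp(-(gate_in @ w_gate + b_gate)))
    return g * o_tilde, attn, g

rng = np.random.default_rng(1)
n, d_model, d_k, d_lc = 6, 8, 4, 3
z = rng.normal(size=(n, d_model))
e_lc = rng.normal(size=(n, d_lc))
e_u_lc = rng.normal(size=d_lc)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
w_o = rng.normal(size=(d_k, d_model))
w_gate = rng.normal(size=(2 * d_lc, 1))

out, attn, g = ga_attn(z, e_lc, e_u_lc, w_q, w_k, w_v, w_o, w_gate, 0.0)
print(out.shape, g.ravel())
```

A gate of exactly 1 leaves a row of the attention output unchanged, so values above 1 amplify a historical interaction and values below 1 dampen it, depending on how its location context relates to where the user currently is.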

4.2.4. Decoder

The decoder's function is to generate the semantic IDs of recommended videos, utilizing the encoder's output and a specialized prompt that incorporates rich geographic context.

4.2.4.1. Neighbor-aware Prompt

Recognizing the importance of local context, OneLoc introduces a neighbor-aware prompt to model the rich context surrounding a user's real-time location.

Surrounding Location Information: Given a user's real-time geographic location $l_u$ , the system calculates its surrounding geographic locations (e.g., $\{l_u^1, l_u^2, ..., l_u^8\}$ ). For these locations, their respective context information (e.g., brands, bestselling products within those areas) is obtained.
Context Embeddings: This yields the user's location context embedding $e_u^{lc}$ and a set of context embeddings for surrounding locations $E^s = \{e_{l_u^1}^{lc}, ..., e_{l_u^8}^{lc}\}$ .
Cross-Attention for Prompt: A cross-attention mechanism is used to aggregate this surrounding information into a single neighbor-aware prompt embedding. The user's location context embedding $e_u^{lc}$ acts as the Query, while the surrounding location context embeddings $E^s$ serve as both Keys and Values. $e^s = \mathrm{CrossAttn}(e_u^{lc}, E^s)$ Here, $e^s$ is the resulting embedding of the neighbor-aware prompt, effectively summarizing the relevant context from the user's neighborhood. This embedding will guide the decoder's generation process.
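
A minimal sketch of the prompt aggregation, assuming scaled dot-product attention with identity projections (the source does not specify the projection matrices or head count). The grid-based `surrounding_cells` helper is a hypothetical way to obtain eight neighbors; the source does not state how surrounding locations are derived. The toy embeddings are invented for illustration.

```python
import math

def surrounding_cells(cell):
    """Hypothetical: the 8 grid cells adjacent to a (row, col) cell."""
    row, col = cell
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def neighbor_aware_prompt(e_u_lc, E_s):
    """Single-query cross-attention: e_u_lc is the Query, E_s supplies Keys and Values."""
    d = len(e_u_lc)
    scores = [sum(q * k for q, k in zip(e_u_lc, e)) / math.sqrt(d) for e in E_s]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted sum of the neighbor embeddings gives the prompt embedding e^s.
    e_s = [sum(w * e[j] for w, e in zip(weights, E_s)) for j in range(d)]
    return e_s, weights

cells = surrounding_cells((10, 20))
# Toy 2-dimensional context embeddings, one per surrounding cell.
E_s = [[math.cos(i), math.sin(i)] for i in range(len(cells))]
e_u_lc = [1.0, 0.0]
e_s, weights = neighbor_aware_prompt(e_u_lc, E_s)
print([round(w, 3) for w in weights])
```

Neighbors whose context embedding aligns with the user's own location context receive larger weights, which is the filtering behavior that a plain average or MLP over concatenated embeddings would not provide.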

4.2.4.2. Decoder Architecture

The decoder is also structured as a stack of $K$ blocks, each comprising a causal self-attention module, a cross-attention module, and a feed-forward network (FFN) module, with RMSNorm applied before each module.

Input: The initial input to the decoder ( $H^0$ ) consists of the neighbor-aware prompt embedding ( $e^s$ ) and the embeddings of the previously generated semantic ID digits (e.g., $e_{q_t^1}, e_{q_t^2}, e_{q_t^3}$ ). This allows the decoder to generate the semantic ID of the target video auto-regressively.
Block Computation: The calculation within each layer $i$ of the decoder is formally defined as: $\widetilde{H}^{i+1} = H^i + \mathrm{SelfAttn}(\mathrm{RMSNorm}(H^i)) \\ \widetilde{\widetilde{H}}^{i+1} = \widetilde{H}^{i+1} + \mathrm{CrossAttn}(\mathrm{RMSNorm}(\widetilde{H}^{i+1}), S^K) \\ H^{i+1} = \widetilde{\widetilde{H}}^{i+1} + \mathrm{FFN}(\mathrm{RMSNorm}(\widetilde{\widetilde{H}}^{i+1})) \\ H^0 = \{e^s, e_{q_t^1}, e_{q_t^2}, e_{q_t^3}\}$ Here, $H^i$ is the output of the $i$ -th decoder layer. $\widetilde{H}^{i+1}$ is the intermediate result after causal self-attention (in which each position attends only to itself and earlier positions in the output sequence). $\widetilde{\widetilde{H}}^{i+1}$ is the result after cross-attention, where the decoder queries the encoder's final hidden state (represented as $S^K$ ) to get context from the user's historical behavior. SelfAttn refers to the causal self-attention module, and FFN is the feed-forward network. $e_{q_t^1}$ , $e_{q_t^2}$ , and $e_{q_t^3}$ are the embeddings of the three-digit semantic ID of the target video, which are input sequentially to generate the next digit.
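
The first line of the block computation (pre-norm causal self-attention with a residual connection) can be sketched as below. This is a simplified illustration under stated assumptions: projections are the identity, RMSNorm has unit gain, and the cross-attention and FFN sub-layers are omitted. The function names and toy inputs are hypothetical.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMSNorm with unit gain: divide by the root mean square of the vector."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def causal_self_attention(H):
    """Each position i attends only to positions 0..i (identity Q/K/V projections)."""
    d = len(H[0])
    out = []
    for i, q in enumerate(H):
        scores = [sum(a * b for a, b in zip(q, H[j])) / math.sqrt(d) for j in range(i + 1)]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * H[j][c] for j, e in enumerate(exps)) for c in range(d)])
    return out

def pre_norm_self_attn_step(H):
    """H_tilde = H + SelfAttn(RMSNorm(H))."""
    normed = [rms_norm(h) for h in H]
    attn = causal_self_attention(normed)
    return [[h + a for h, a in zip(h_row, a_row)] for h_row, a_row in zip(H, attn)]

# H^0 = {e^s, e_q1, e_q2, e_q3} with toy 2-dimensional embeddings.
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
H_tilde = pre_norm_self_attn_step(H0)
print([[round(v, 3) for v in row] for row in H_tilde])
```

Because of the causal mask, changing the digit embeddings at positions 1 to 3 leaves the output at position 0 (the prompt position) unchanged, which is what allows that position to predict the first digit without seeing it.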

4.2.4.3. Next Token Prediction (NTP) Loss

The output $H^K$ of the final decoder layer is used to predict the next semantic ID token.

Specifically, the embedding $H_0^K$ (corresponding to the neighbor-aware prompt $e^s$ ) is used to predict the first digit $q_t^1$ .
The embedding $H_1^K$ (corresponding to the embedding of $q_t^1$ ) is used to predict the second digit $q_t^2$ , and so on.
The training loss during the pre-training phase is a cross-entropy loss over these predictions: $\hat{y}^j = \mathrm{Softmax}(\mathrm{MLP}(H_{j-1}^K)) \in \mathbb{R}^{N_c}, j \in \{1, 2, 3\} \\ L_{ntp} = \sum_{j=1}^3 -\log \hat{y}_{q_t^j}^j$ Here, $\hat{y}^j \in \mathbb{R}^{N_c}$ is the predicted probability distribution for the $j$ -th digit of the target semantic ID (where $N_c$ is the codebook size). $\hat{y}_{q_t^j}^j$ is the predicted probability of the true $j$ -th digit $q_t^j$ . This NTP loss encourages the model to accurately generate the semantic ID of the observed next video.
After pre-training, the model parameters $\theta_0$ define a probability distribution over semantic IDs given the user's sequence and location: $p_{\theta_0}(Q_t | S, l_u) = p_{\theta_0}(q_t^1 | S, l_u) \, p_{\theta_0}(q_t^2 | S, l_u, q_t^1) \, p_{\theta_0}(q_t^3 | S, l_u, q_t^1, q_t^2)$ This formula shows the auto-regressive nature of the generation, where each subsequent digit's probability depends on the previous ones.
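
The link between the NTP loss and the factorized probability can be checked numerically: summing the per-digit negative log-probabilities gives exactly the negative log of the product of the three conditionals. The sketch below assumes the per-digit logits are already available (in practice they come from the decoder under teacher forcing); the function names, the toy codebook size of 4, and the logit values are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one digit's logits."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return [v - log_z for v in logits]

def ntp_loss(logits_per_digit, target_digits):
    """L_ntp = sum over digits j of -log y_hat^j[q_t^j]."""
    return sum(-log_softmax(logits)[q]
               for logits, q in zip(logits_per_digit, target_digits))

def semantic_id_probability(logits_per_digit, digits):
    """p(Q_t | S, l_u) as the product of the per-digit conditional probabilities."""
    p = 1.0
    for logits, q in zip(logits_per_digit, digits):
        p *= math.exp(log_softmax(logits)[q])
    return p

# Toy logits for a 3-digit semantic ID over a codebook of size 4.
logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 1.5, 0.3, 0.2], [0.1, 0.1, 3.0, 0.0]]
target = [0, 1, 2]
loss = ntp_loss(logits, target)
prob = semantic_id_probability(logits, target)
print(round(loss, 4), round(-math.log(prob), 4))
```

Minimizing $L_{ntp}$ is therefore the same as maximizing the log-likelihood of the observed semantic ID under the auto-regressive model.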

4.2.5. Reinforcement Learning

The pre-training phase primarily learns to predict videos exposed by existing systems. However, these exposed videos might not perfectly balance all desired objectives (user interest, distance, business value). To address this, OneLoc incorporates reinforcement learning to fine-tune the model with specific reward signals, using Direct Preference Optimization (DPO).

4.2.5.1. Reward Signals

Two distinct reward functions are designed to guide the RL process:

Geographic Reward ( $R^{geo}$ ): This reward encourages the model to recommend videos corresponding to stores that are geographically closer to the user. A closer location yields a higher reward. $R^{geo}(v, l_u) = \left\{ \begin{array}{ll} 0 & \text{if } Dis(v, l_u) > D \\ \frac{1}{Dis(v, l_u)} & \text{otherwise} \end{array} \right.$ Here, $Dis(v, l_u)$ calculates the distance between the location of video $v$ and the user's location $l_u$ . $D$ is a predefined distance threshold. If the distance exceeds $D$ , the reward is 0. Otherwise, it is inversely proportional to the distance, meaning closer videos get higher rewards.
GMV Reward ( $R^{gmv}$ ): This reward aims to promote the generation of videos that are likely to lead to actual consumption and high Gross Merchandise Value (GMV). $R^{gmv}(v, S, l_u) = \mathrm{GMV}(v, S, l_u)$ $\mathrm{GMV}(v, S, l_u)$ represents the output of a separate, pre-existing GMV scoring model. This model evaluates the potential GMV generated by recommending video $v$ to a user with sequence $S$ and location $l_u$ .
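
A minimal sketch of the two rewards under stated assumptions. The source does not say how $Dis$ is computed, what units it uses, or what value $D$ takes; the haversine distance in kilometers, the 5 km threshold in the usage lines, and the `min_distance` floor (added only to avoid division by zero when the user is at the store) are all hypothetical choices. The GMV score is treated as an opaque number from an external model.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (one possible choice for Dis)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geographic_reward(distance, threshold, min_distance=0.1):
    """0 beyond the threshold D, otherwise 1 / distance (floored at min_distance)."""
    if distance > threshold:
        return 0.0
    return 1.0 / max(distance, min_distance)

def combined_reward(distance, threshold, gmv_score):
    """One simple combination: the sum of the geographic and GMV rewards."""
    return geographic_reward(distance, threshold) + gmv_score

D_KM = 5.0  # hypothetical threshold
near = haversine_km(39.9042, 116.4074, 39.9142, 116.4074)  # about 1.1 km apart
far = haversine_km(39.9042, 116.4074, 40.9042, 116.4074)   # about 111 km apart
print(round(geographic_reward(near, D_KM), 3), geographic_reward(far, D_KM))
```

Since $1 / Dis$ and a GMV score live on different scales, a weighted or normalized combination may be needed in practice; the plain sum here only mirrors the simplest option.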

4.2.5.2. Direct Preference Optimization (DPO)

DPO is used to align the model with these reward signals without explicit reward model training or complex PPO algorithms.

Preference Pair Construction:
1. For each sample (user sequence $S$ , user location $l_u$ ), the pre-trained model $\theta_0$ is used to sample $N$ different videos via beam search. Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set. Here, it finds $N$ sequences of semantic IDs with high probability according to the model. $\mathcal{B}_u^N = \mathrm{TopN}(p_{\theta_0}(S, l_u))$ $\mathrm{TopN}$ selects the $N$ videos (represented by their semantic IDs) with the highest probabilities from the model's output distribution. $\mathcal{B}_u^N$ is the set of generated results.
2. For each of these $N$ generated videos, the combined reward is calculated (e.g., by summing $R^{geo}$ and $R^{gmv}$ ).
3. From these $N$ videos, a positive sample $v_p$ (the video with the highest combined reward) and a negative sample $v_n$ (the video with the lowest combined reward) are selected. These form a preference pair $(v_p, v_n, S, l_u)$ .
DPO Loss: These preference pairs are then used to train a new model with parameters $\theta_{i+1}$ , initialized from $\theta_i$ (the current model parameters). The DPO loss corresponding to each preference pair is: $L_{dpo} = -\log \sigma \left( \beta \log \frac{p_{\theta_{i+1}}(Q_p | S, l_u)}{p_{\theta_i}(Q_p | S, l_u)} - \beta \log \frac{p_{\theta_{i+1}}(Q_n | S, l_u)}{p_{\theta_i}(Q_n | S, l_u)} \right)$ Here, $\sigma$ is the sigmoid function of the standard DPO objective, and $Q_p = \mathrm{Tokenize}(v_p)$ and $Q_n = \mathrm{Tokenize}(v_n)$ are the semantic IDs of the positive and negative videos, respectively. $p_{\theta_{i+1}}$ is the probability of the semantic ID according to the model being trained, and $p_{\theta_i}$ is the probability from the reference model (the previous iteration's model, or the pre-trained model $\theta_0$ in the first iteration), which keeps the policy from diverging too far from the reference distribution. $\beta$ is a hyperparameter that controls how strongly the policy is tied to the reference model. The DPO loss directly optimizes the policy to assign higher probabilities to preferred outcomes ( $Q_p$ ) compared to dispreferred ones ( $Q_n$ ), thereby aligning the model's generation with the desired reward signals.
Final Training Loss: The overall training loss during the reinforcement learning phase combines the original Next Token Prediction (NTP) loss ( $L_{ntp}$ ) with the DPO loss ( $L_{dpo}$ ): $L = L_{ntp} + \lambda L_{dpo}$ Here, $\lambda$ is a hyperparameter that controls the weighting between the NTP pre-training objective and the DPO fine-tuning objective. This combined loss ensures that the model maintains its general generation capabilities while simultaneously optimizing for the specific geographic and business objectives.
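
The pair construction, the DPO loss, and the combined objective can be sketched as follows. The sketch uses the standard DPO form $-\log \sigma(\cdot)$ and works directly on sequence log-probabilities; the function names, the candidate labels, the reward values, and $\beta = 0.1$ are hypothetical (the source does not report $\beta$ ), while $\lambda = 0.05$ is the value reported in the implementation details.

```python
import math

def build_preference_pair(candidates, rewards):
    """Pick the highest-reward candidate as positive and the lowest as negative."""
    ranked = sorted(zip(candidates, rewards), key=lambda cr: cr[1])
    return ranked[-1][0], ranked[0][0]

def dpo_loss(logp_new_pos, logp_ref_pos, logp_new_neg, logp_ref_neg, beta):
    """-log(sigmoid(margin)), computed stably as softplus(-margin)."""
    margin = beta * ((logp_new_pos - logp_ref_pos) - (logp_new_neg - logp_ref_neg))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def total_loss(l_ntp, l_dpo, lam=0.05):
    """L = L_ntp + lambda * L_dpo."""
    return l_ntp + lam * l_dpo

# Toy beam-search output with combined rewards (values are invented).
candidates = ["video_a", "video_b", "video_c"]
rewards = [0.2, 0.9, 0.1]
v_p, v_n = build_preference_pair(candidates, rewards)

# Log-probabilities of the positive/negative semantic IDs under the new and reference models.
l_dpo = dpo_loss(logp_new_pos=-0.5, logp_ref_pos=-1.0,
                 logp_new_neg=-2.5, logp_ref_neg=-2.0, beta=0.1)
print(v_p, v_n, round(l_dpo, 4), round(total_loss(2.0, l_dpo), 4))
```

When the trained model equals the reference model the margin is 0 and the loss is $\log 2$ ; the loss falls below that value only when the policy raises the positive sample's probability relative to the negative one.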

5. Experimental Setup

5.1. Datasets

The experiments for OneLoc were conducted on both an internal, large-scale industrial dataset from Kuaishou and public Foursquare datasets.

KuaiLLSR Dataset:
- Source & Characteristics: This dataset is constructed from Kuaishou's internal behavioral data of users interacting with short videos related to local life services. It's a real-world, high-volume dataset. Each video in this context has associated store location information.
- Scale: Contains 60 million users, 900K items (videos), and 440 million interactions.
- Data Structure: The data records user interaction sequences, with an average user sequence length of approximately 200.
- Split: OneLoc is trained in a streaming setup. The first 7 days' data are used for model training, and the final day's data is reserved for evaluation.
Foursquare Dataset (Public):
- Source & Characteristics: This public dataset comprises user check-in records for various Points of Interest (POIs) across multiple cities. Each record includes the user ID, POI ID, geographical coordinates, and a timestamp. It is commonly used for POI recommendation research.
- Preprocessing: The dataset is pre-processed using LibCity's standardized pipeline [19] to filter users and POIs and sort check-ins chronologically to construct interaction sequences.
- Subsets: Experiments are conducted on two subsets: NYC (New York City) and TKY (Tokyo), which exhibit different user densities and POI distributions, allowing for evaluation under diverse spatial-temporal patterns.
- Split: For these public datasets, interaction sequences are split into 80% for training, 10% for validation, and 10% for testing.
  
  The following are the results from Table 2 of the original paper:
  
	KuaiLLSR	NYC	TKY
#users	60 M	1,042	2,187
#items	900 K	36,359	59,472
#interactions	440 M	212,347	545,301

Why these datasets were chosen: KuaiLLSR provides a realistic, large-scale industrial setting with actual local life service video recommendations, crucial for validating the model's performance in its target application. The Foursquare datasets (NYC, TKY) offer publicly available benchmarks for location-based recommendations, allowing for comparison with other research and demonstrating the model's generalizability across different geographical and behavioral contexts.

5.2. Evaluation Metrics

For evaluating the performance of OneLoc, both offline and online metrics were used.

Offline Metrics:
- Recall@K: Measures the proportion of relevant items successfully retrieved among the top K recommendations. It indicates how well the recommender system can find all relevant items for a user.
  - Conceptual Definition: Recall@K quantifies the ability of a recommendation system to suggest items that are truly relevant to the user's future interactions within a top-K list. It focuses on the completeness of the recommendations, i.e., how many of the "good" items were actually shown.
  - Mathematical Formula: $\mathrm{Recall}@K = \frac{1}{|U|} \sum_{u \in U} \frac{|R_u \cap T_u|}{|T_u|}$
  - Symbol Explanation:
    - $U$ : The set of all users in the test set.
    - $|U|$ : The total number of users.
    - $R_u$ : The set of top-K items recommended to user $u$ .
    - $T_u$ : The set of actual relevant items for user $u$ in the test set.
    - $|R_u \cap T_u|$ : The number of relevant items that are also in the top-K recommendations.
    - $|T_u|$ : The total number of relevant items for user $u$ .
- NDCG@K (Normalized Discounted Cumulative Gain at K): A metric that takes into account not only the presence of relevant items but also their position in the ranked list and their degree of relevance. Highly relevant items ranked higher contribute more to the score.
  - Conceptual Definition: NDCG@K evaluates the quality of a ranked recommendation list. It assigns higher scores to relevant items that appear earlier in the list and also accounts for different degrees of relevance, where more relevant items contribute more to the cumulative gain. The "Normalized" part ensures comparability across different recommendation lists.
  - Mathematical Formula: First, calculate the Discounted Cumulative Gain (DCG@K): $\mathrm{DCG}@K = \sum_{i=1}^K \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)}$ Then, NDCG@K is calculated by normalizing DCG@K by the Ideal DCG (IDCG@K): $\mathrm{NDCG}@K = \frac{\mathrm{DCG}@K}{\mathrm{IDCG}@K}$
  - Symbol Explanation:
    - $K$ : The number of top recommendations being considered.
    - $\mathrm{rel}_i$ : The relevance score of the item at position $i$ in the recommended list. (In many implicit feedback scenarios, relevance is binary: 1 if the item is interacted with, 0 otherwise).
    - $\log_2(i+1)$ : A logarithmic discount factor, penalizing relevant items that appear later in the list.
    - $\mathrm{IDCG}@K$ : The DCG of the ideal ranking, where all relevant items are ranked at the top in decreasing order of their relevance. This serves as the maximum possible DCG for a given query.
Online Metrics (A/B Test):
- GMV (Gross Merchandise Value): The total monetary value of all goods or services sold over a given period through the platform. This is a primary business objective.
- Order Numbers: The total count of unique purchase transactions completed by users. Another key business objective reflecting conversion.
- Number of Paying Users in Local Services: The count of unique users who made a purchase in local services.
- New Paying Users in Local Services: The count of unique users who made their first purchase in local services during the experiment period.
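
The two offline metrics defined above can be computed per user as in the sketch below and then averaged over users. The function names and toy data are illustrative assumptions; positions are 0-indexed in the code, so the discount $\log_2(i+1)$ of the formula becomes `math.log2(i + 2)`.

```python
import math

def recall_at_k(recommended, relevant, k):
    """|R_u ∩ T_u| / |T_u| for one user, with R_u the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(recommended, relevance, k):
    """DCG@k of the ranked list divided by the DCG@k of the ideal ranking."""
    dcg = sum((2 ** relevance.get(item, 0) - 1) / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def mean_over_users(per_user_values):
    return sum(per_user_values) / len(per_user_values)

# One user: the single relevant item "b" is ranked second.
recommended = ["a", "b", "c"]
print(recall_at_k(recommended, {"b"}, 1),
      recall_at_k(recommended, {"b"}, 2),
      round(ndcg_at_k(recommended, {"b": 1}, 3), 4))
```

With binary relevance and a single held-out item per user, Recall@K reduces to a hit rate and NDCG@K to $1 / \log_2(\mathrm{rank} + 1)$ when the item appears in the top K, and 0 otherwise.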

5.3. Baselines

OneLoc was compared against a diverse set of baseline models categorized into traditional recommender models, POI related models, and other generative models.

Traditional Recommender Models: These models typically focus on sequential patterns in user behavior.
- SASRec [7]: Self-Attentive Sequential Recommendation. Employs a unidirectional Transformer encoder to model user preferences by allowing each item in a sequence to attend to all preceding items.
- BERT4Rec [16]: Sequential recommendation with bidirectional encoder representations from transformer. Leverages BERT's masked language modeling objective to learn bidirectional Transformer representations for user behavior sequences.
- GRU4Rec [4]: Session-based recommendations with recurrent neural networks. Uses Gated Recurrent Units (GRUs) to capture temporal dependencies in user interaction sequences, particularly effective for session-based recommendation.
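- Caser: Personalized top-N sequential recommendation via convolutional sequence embedding. Applies horizontal and vertical convolutional filters over the embeddings of a user's most recent items to capture sequential patterns (reported among the traditional baselines in Table 1).
- S3-Rec [29]: Self-supervised learning for sequential recommendation with mutual information maximization. Enhances sequential recommendation by mining self-supervised learning signals through mutual information maximization within user behavior sequences.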
POI Related Models: These models specifically address location-based recommendations.
- TPG [12]: Timestamps as prompts for geography-aware location recommendation. A Transformer-based approach that uses target timestamps as prompts to inform geography-aware location recommendations.
- Rotan [2]: A rotation-based temporal attention network for time-specific next poi recommendation. Encodes time intervals using rotational position vector representations within Transformer architectures to capture temporal dynamics for next POI recommendation.
Generative Models: Other models that adopt the generative paradigm.
- TIGER [15]: Recommender systems with generative retrieval. One of the pioneering works in generative recommendation, using codebook-based semantic quantization (via RQ-VAE) to represent items as discrete code sequences.
- GNPR-SID [17]: Generative Next POI Recommendation with Semantic ID. Transforms POI information into discrete semantic identifiers and uses a generative approach for next-POI prediction. This is the closest generative baseline in the POI domain.

5.4. Implementation Details

The implementation details provide insight into the specific configurations and technologies used for training and inference.

Optimizer: AdamW was used, which is an Adam optimizer variant with decoupled weight decay regularization.
- Initial learning rate: $2 \times 10^{-4}$
- Weight decay: 0.1
Hardware: Training was performed on NVIDIA A800 GPUs, which are high-performance accelerators suitable for large-scale deep learning.
Semantic ID Configuration:
- K-means clusters: Each codebook layer uses $K = 8192$ clusters.
- Number of codebook layers ( $L$ ): $L = 3$ , meaning each semantic ID is a three-digit code.
Model Architecture:
- Encoder & Decoder layers: Both the encoder and decoder stack 4 blocks.
- Hidden units: 1024.
- Attention heads: 8.
- Feed-forward network (FFN) dimension: 4096.
Sequence Lengths: The maximum lengths for different user behavior sequences were set to:
- Watch sequence: 256
- Click sequence: 32
- Pay sequence: 10
DPO Loss Weight: The hyperparameter $\lambda$ for weighting the DPO loss in the final training objective was set to 0.05.
Inference Optimization: For online deployment, several techniques were employed to optimize inference performance: mixed-precision computation, KV cache (for efficient Transformer decoding), dynamic batching, and TensorRT acceleration on NVIDIA A10 GPUs. These optimizations achieved 25% Model FLOPs Utilization (MFU), indicating efficient hardware utilization.
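
For reference, the reported settings can be collected into a single configuration object. The class and field names below are hypothetical; only the values come from the list above. The two derived properties are simple arithmetic on those values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OneLocConfig:
    # Optimizer (AdamW)
    learning_rate: float = 2e-4
    weight_decay: float = 0.1
    # Semantic ID
    codebook_size: int = 8192
    num_codebook_layers: int = 3
    # Architecture
    num_encoder_blocks: int = 4
    num_decoder_blocks: int = 4
    hidden_units: int = 1024
    num_attention_heads: int = 8
    ffn_dim: int = 4096
    # Maximum behavior sequence lengths
    max_watch_len: int = 256
    max_click_len: int = 32
    max_pay_len: int = 10
    # RL phase
    dpo_loss_weight: float = 0.05

    @property
    def head_dim(self):
        """Per-head width implied by the hidden size and head count."""
        return self.hidden_units // self.num_attention_heads

    @property
    def semantic_id_space(self):
        """Number of distinct three-digit semantic IDs the codebooks can express."""
        return self.codebook_size ** self.num_codebook_layers

cfg = OneLocConfig()
print(cfg.head_dim, cfg.semantic_id_space)
```

With these values each attention head is 128 wide, and the three 8192-way codebooks can address $8192^3 = 2^{39}$ distinct semantic IDs, far more than the 900K items in KuaiLLSR.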

6. Results & Analysis

6.1. Core Results Analysis (RQ1: Overall Performance)

The paper presents extensive experimental results to demonstrate the effectiveness of OneLoc compared to state-of-the-art methods across various datasets.

The following are the results from Table 1 of the original paper:

Dataset	Metric	Traditional					POI Related		Generative			Improvement
Dataset	Metric	SASRec	BERT4Rec	GRU4Rec	Caser	S3-Rec	TPG	Rotan	TIGER	GNPR-SID	Ours	Improvement
KuaiLLSR	Recall@5	0.0927	0.0682	0.0350	0.0438	0.1218	0.1750	0.2185	0.2832	0.3142	0.3565*	13.46%
	Recall@10	0.1336	0.1071	0.0602	0.0712	0.1889	0.2559	0.2843	0.3637	0.4207	0.4563*	8.46%
	Recall@20	0.2048	0.1623	0.1090	0.1147	0.2808	0.3241	0.3563	0.4413	0.5056	0.5584*	10.44%
	NDCG@5	0.0408	0.0311	0.0178	0.0221	0.0793	0.0897	0.1029	0.1500	0.1775	0.2032*	14.47%
	NDCG@10	0.0523	0.0338	0.0255	0.0266	0.0971	0.1018	0.1147	0.1584	0.1874	0.2114*	12.81%
	NDCG@20	0.0705	0.0565	0.0360	0.0369	0.1143	0.1117	0.1266	0.1615	0.1904	0.2151*	12.97%
NYC	Recall@5	0.3151	0.2857	0.1977	0.2883	0.3071	0.3551	0.4448	0.4965	0.5311	0.6107*	14.98%
	Recall@10	0.3896	0.3564	0.2460	0.3570	0.3854	0.4441	0.5223	0.5514	0.5942	0.6563*	10.45%
	Recall@20	0.4506	0.4130	0.2889	0.4135	0.4503	0.5121	0.5834	0.6001	0.6455	0.6977*	8.09%
	NDCG@5	0.2224	0.2074	0.1442	0.2044	0.2235	0.2464	0.3471	0.4131	0.4430	0.5355*	20.88%
	NDCG@10	0.2467	0.2304	0.1599	0.2267	0.2489	0.2755	0.3723	0.4276	0.4634	0.5504*	18.77%
	NDCG@20	0.2622	0.2448	0.1708	0.2410	0.2654	0.2927	0.3878	0.4443	0.4766	0.5608*	17.66%
TKY	Recall@5	0.3450	0.2649	0.2514	0.3257	0.3365	0.3725	0.4333	0.5031	0.5354	0.5964*	11.39%
	Recall@10	0.4284	0.3326	0.3106	0.4067	0.4115	0.4601	0.5113	0.5808	0.6130	0.6620*	7.99%
	Recall@20	0.4976	0.3943	0.3651	0.4758	0.4739	0.5291	0.5894	0.6431	0.6675	0.7152*	7.15%
	NDCG@5	0.2384	0.1907	0.1833	0.2273	0.2423	0.2591	0.3293	0.4003	0.4437	0.4961*	11.81%
	NDCG@10	0.2655	0.2127	0.2025	0.2535	0.2666	0.2881	0.3568	0.4251	0.4623	0.5174*	11.92%
	NDCG@20	0.2831	0.2284	0.2163	0.2711	0.2825	0.3051	0.3739	0.4401	0.4788	0.5306*	10.82%

Observations from Offline Performance (Table 1):

Superiority of OneLoc: OneLoc (labeled "Ours") consistently achieves the best performance across all Recall@K and NDCG@K metrics on all three datasets (KuaiLLSR, NYC, TKY). The asterisk * indicates statistical significance ( $p < 0.05$ ).
- On the KuaiLLSR dataset, OneLoc shows improvements over the second-best baseline (GNPR-SID) of 13.46% in Recall@5, 8.46% in Recall@10, 10.44% in Recall@20, 14.47% in NDCG@5, 12.81% in NDCG@10, and 12.97% in NDCG@20. These are substantial gains for an industrial-scale dataset.
- Similarly strong improvements are observed on the public Foursquare datasets (NYC and TKY), with average boosts of 13.18% in Recall@5 and 16.34% in NDCG@5 across the two subsets.
- This consistent outperformance validates the effectiveness of OneLoc's integrated geographic generative architecture and its ability to combine user preferences with real-time spatial context.
Generative Models Outperform Traditional Models: All generative methods (TIGER, GNPR-SID, and OneLoc) significantly outperform the traditional recommender models (SASRec, BERT4Rec, GRU4Rec, Caser, S3-Rec). The paper highlights an improvement of more than 29% in Recall@5 and over 45% in NDCG@5 for generative methods compared to traditional ones. This gap is likely due to the richer semantic expression and reasoning capacity of the generative paradigm, in contrast to traditional recall solutions built on representation learning and approximate nearest neighbor (ANN) retrieval.
Impact of Geo-Aware Generative Design: Among the generative models, OneLoc surpasses TIGER and GNPR-SID. GNPR-SID is also geo-aware and generative, but OneLoc's more comprehensive integration of geographic information (via geo-aware SIDs, geo-aware self-attention, and neighbor-aware prompts) and its multi-objective RL approach contribute to its superior performance.
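
The Improvement column is the relative gain of OneLoc over the best baseline (GNPR-SID). The short check below recomputes it for the KuaiLLSR rows of Table 1; the function name is illustrative, and the numbers are copied from the table.

```python
def relative_improvement(ours, best_baseline):
    """Relative gain over the best baseline, in percent."""
    return (ours - best_baseline) / best_baseline * 100.0

# (OneLoc, GNPR-SID, reported improvement in %) for KuaiLLSR, from Table 1.
kuaillsr = {
    "Recall@5": (0.3565, 0.3142, 13.46),
    "Recall@10": (0.4563, 0.4207, 8.46),
    "Recall@20": (0.5584, 0.5056, 10.44),
    "NDCG@5": (0.2032, 0.1775, 14.47),
    "NDCG@10": (0.2114, 0.1874, 12.81),
    "NDCG@20": (0.2151, 0.1904, 12.97),
}

for metric, (ours, baseline, reported) in kuaillsr.items():
    computed = relative_improvement(ours, baseline)
    print(f"{metric}: computed {computed:.2f}%, reported {reported:.2f}%")
```

The recomputed values agree with the reported ones to within about 0.01 percentage points; differences in the last digit are expected because the table entries are themselves rounded to four decimals.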

6.2. Ablation Studies / Parameter Analysis (RQ2 & RQ3)

6.2.1. Ablation Study: Neighbor-aware Prompt

This study investigates the importance of the neighbor-aware prompt component. The following figure (Figure 4 from the original paper) shows the ablation study of different prompt techniques:

Figure 4: Ablation study of different prompt techniques. The result shows that the Neighbor-aware prompt performs significantly better than the Point-wise prompt and the Neighbor-MLP prompt.
The chart shows ablation results for the different prompt techniques on Recall and Normalized Discounted Cumulative Gain (NDCG). The Neighbor-aware prompt significantly outperforms the Point-wise prompt and the Neighbor-MLP prompt on every metric.

Observations:

Effectiveness of Surrounding Context: When the neighbor-aware prompt is replaced with a point-wise prompt (which only uses the user's current location context, ignoring surrounding context), there's a performance decline across all Recall and NDCG metrics. This confirms that incorporating surrounding geographical context (e.g., surrounding brands, products) is beneficial for recommendation quality.
Necessity of Cross-Attention: Replacing the cross-attention mechanism in the neighbor-aware prompt with a simpler MLP (Multi-Layer Perceptron), termed neighbor-mlp prompt, leads to a sharp performance drop. This indicates that while surrounding context is useful, a sophisticated mechanism like cross-attention is essential to effectively aggregate and filter information from this context. A simple MLP might struggle to discern useful signals from potential noise in aggregated surrounding embeddings.

6.2.2. Ablation Study: Geo-aware Self-attention

This study examines the contributions of the key designs within the geo-aware self-attention module. The following are the results from Table 3 of the original paper:

Method	Recall			NDCG
Method	@5	@10	@20	@5	@10	@20
Full Model	0.3565	0.4563	0.5584	0.2032	0.2114	0.2151
w/o Location Scores	0.3476	0.4439	0.5229	0.1758	0.1847	0.1884
w/o Location Gate	0.3501	0.4489	0.5295	0.1810	0.1914	0.1950
w/o Geo-aware Self-attention	0.3315	0.4261	0.4989	0.1552	0.1640	0.1673

Observations:

Importance of Location Scores: Removing the location context similarity term ( $E^{lc}(E^{lc})^T$ ) from the attention score calculation (w/o Location Scores) leads to a notable performance decrease across all Recall and NDCG metrics (e.g., NDCG@5 drops from 0.2032 to 0.1758). This demonstrates the effectiveness of explicitly incorporating geographical proximity into how the model weighs historical interactions.
Role of Location Gate: Disabling the location gate (which scales attention output based on user's real-time location context) (w/o Location Gate) also results in performance drops (e.g., NDCG@5 drops to 0.1810). This highlights the importance of dynamically adjusting the relevance of historical behaviors based on the user's current geographical context.
Overall Impact of Geo-aware Self-attention: Replacing the entire geo-aware self-attention module with a vanilla self-attention (w/o Geo-aware Self-attention) leads to the most significant performance degradation (e.g., NDCG@5 drops to 0.1552). This combined effect underscores the critical role of the GA-Attn module in capturing relevant user behavior patterns with deep geographic integration.

6.2.3. Ablation Study: Reward Signals

This study evaluates the impact of the specifically designed reward functions in the reinforcement learning phase. The following figure (Figure 5 from the original paper) shows the comparative analysis of Recall, NDCG, and GMV metrics across varied reward signals:

Figure 5: Comparative analysis of Recall, NDCG, and GMV metrics across varied reward signals.
The chart compares Recall, NDCG, and GMV under different reward signals (full model, without GMV reward, without geographic reward). The horizontal axis lists the model settings and the vertical axis shows the value of each metric.

Observations:

Geographic Reward: Removing the geographic reward (w/o geographic reward) decreases both Recall and NDCG. This suggests that rewarding proximity does not come at the expense of relevance in this setting: steering the model toward nearby stores also improves the match with users' observed interactions, which is consistent with proximity being a strong driver of behavior in local services.
GMV Reward: Removing the GMV reward (w/o GMV reward) leads to a decrease not only in the GMV objective (as expected) but also in the Recall objective. This indicates that the GMV reward is effective not only in directly optimizing business value but also in indirectly improving the overall quality of recommendations, as higher-value items are often also highly relevant to users.

6.2.4. Hyperparameter Experiments (RQ3)

This section explores how different hyperparameters affect OneLoc's performance. The following figure (Figure 6 from the original paper) shows the impact of model parameters, sequence length, and DPO loss weight $\lambda$ on Recall and NDCG metrics:

Figure 6: Hyperparameters include model parameters, sequence length and DPO loss weight λ.
The chart shows how model parameter scale, sequence length, and the reinforcement learning loss weight λ affect Recall and NDCG at different @K values.

Observations:

Scaling Laws for Model Size and Sequence Length:
- Model Size: As the model size increases from 0.05B (billion parameters) to 0.1B and then to 0.3B, OneLoc consistently shows improved performance in both Recall and NDCG. Specifically, scaling from 0.05B to 0.3B yields an average improvement of 6.96% in Recall and 7.29% in NDCG. This is consistent with a scaling trend in which larger models, within the tested range, produce better recommendations.
- Sequence Length: Similarly, increasing the maximum sequence length of user interactions from 100 to 300 also results in average improvements of 13.02% in Recall and 51.24% in NDCG. This indicates that OneLoc effectively leverages longer historical behavioral sequences to capture richer user preferences and patterns.
Sensitivity of DPO Loss Weight ( $\lambda$ ):
- The DPO loss weight $\lambda$ is a sensitive hyperparameter. Setting $\lambda = 0.01$ performs significantly worse than $\lambda = 0.05$ .
- When $\lambda = 0.03$ , both Recall@10 and Recall@20 are significantly lower compared to $\lambda = 0.05$ . However, NDCG metrics show superior performance at $\lambda = 0.03$ . This presents a trade-off, where a slightly lower $\lambda$ might improve ranking quality (NDCG) but reduce the overall hit rate (Recall).
- After a comprehensive evaluation of these trade-offs, $\lambda = 0.05$ was ultimately chosen, suggesting it provides the best overall balance of objectives.

6.3. Online A/B Test

To validate the real-world impact of OneLoc, an online A/B test was conducted on Kuaishou's primary short-video recommendation scenario.

The following are the results from Table 4 of the original paper:

Online Metrics	OneLoc
GMV	+21.016%
Number of Orders	+17.891%
Number of Paying Users in Local Services	+18.585%
New Paying Users in Local Services	+23.027%

Observations from Online A/B Test (Table 4):

Deployment Scale: OneLoc was deployed on 10% of Kuaishou's traffic, impacting a system serving over 400 million daily active users. The control group used the existing production-level multi-stage recommendation pipeline (including multi-channel retrieval, coarse/fine ranking, link-specific refinements).
Significant Business Impact: OneLoc achieved statistically significant improvements ( $p < 0.01$ under two-tailed t-tests at $\alpha = 0.05$ ) in crucial business metrics:
- GMV increased by 21.016%.
- Number of Orders increased by 17.891%.
- Number of Paying Users in Local Services increased by 18.585%.
- New Paying Users in Local Services increased by 23.027%.
Superiority over Complex Cascade Systems: These results indicate that OneLoc, a single end-to-end generative model, can surpass a sophisticated, hand-tuned cascade system in a real-world, high-traffic industrial environment. The gain in New Paying Users is the largest of the four metrics, which may reflect better handling of the cold-start and data sparsity challenges often associated with local commerce.

The successful online deployment and the significant, positive impact on core business metrics solidify OneLoc's position as a highly effective and practically valuable solution for local life service recommendation.

The following figure (Figure 3 from the original paper) shows the framework of system deployment:

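Figure 3: Framework of System Deployment
The diagram shows the system deployment framework: the interaction between the OneLoc online and offline systems, including the core modules for requests, recommendation, log collection, parameter updates, and reward system feedback.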

System Deployment Architecture (Figure 3): The deployed system involves several components:

Trainer: This component handles the offline training. It collects user logs for streaming training, interacts with the Reward System to score items and construct positive/negative sample pairs for RL training, and periodically updates parameters to the Inference Server.
OneLoc Inference Server: This server processes user requests. It converts user features and real-time geographic locations into user tokens and geo prompts. These are then fed into the OneLoc model to generate semantic tokens via beam search.
Video Mapping Server: This acts as a storage service that maps the generated semantic tokens back to actual video IDs.
Reward System: This system plays a dual role. During training, it provides rewards to the Trainer. During inference, it processes the candidate videos from the Video Mapping Server by estimating GMV scores and applying rule-based filtering. The top-K results are then recommended to users.

This architecture illustrates a typical industrial deployment of a large-scale recommendation system, where offline training, online inference, and dedicated services for item mapping and reward calculation interact to deliver recommendations.
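
To make the inference path concrete, here is a minimal Python sketch of the steps described above, from beam-search outputs to the final top-K list. All names (`serve_request`, `id_to_video`, `gmv_scorer`, `passes_rules`) are hypothetical stand-ins for the Video Mapping Server and Reward System; the paper does not publish these interfaces, and ranking purely by the estimated GMV score is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

SemanticId = Tuple[int, ...]  # one generated token sequence from beam search


@dataclass
class Candidate:
    video_id: str
    gmv_score: float


def serve_request(
    beams: List[SemanticId],
    id_to_video: Dict[SemanticId, str],   # hypothetical Video Mapping Server table
    gmv_scorer: Callable[[str], float],   # hypothetical Reward System GMV estimate
    passes_rules: Callable[[str], bool],  # hypothetical rule-based filter
    k: int,
) -> List[Candidate]:
    """Map semantic IDs to videos, filter, score, and keep the top K."""
    seen = set()
    candidates: List[Candidate] = []
    for sid in beams:
        video_id = id_to_video.get(sid)
        if video_id is None or video_id in seen:
            continue  # skip IDs that map to no video, and duplicates
        seen.add(video_id)
        if not passes_rules(video_id):
            continue  # dropped by rule-based filtering
        candidates.append(Candidate(video_id, gmv_scorer(video_id)))
    # Assumption: final ranking uses the estimated GMV score only.
    candidates.sort(key=lambda c: c.gmv_score, reverse=True)
    return candidates[:k]


# Toy usage with made-up data.
beams = [(1, 2, 3), (4, 5, 6), (9, 9, 9), (7, 8, 9)]
id_to_video = {(1, 2, 3): "v1", (4, 5, 6): "v2", (7, 8, 9): "v3"}
scores = {"v1": 0.2, "v2": 0.9, "v3": 0.5}
result = serve_request(beams, id_to_video, lambda v: scores[v], lambda v: v != "v3", k=2)
print([c.video_id for c in result])  # ['v2', 'v1']
```

In this toy run, the beam `(9, 9, 9)` maps to no video and `v3` is removed by the rule filter, so only `v2` and `v1` are returned, ordered by their estimated GMV score.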

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper introduces OneLoc, a novel end-to-end generative recommender system specifically tailored for the short-video local life service scenario within the Kuaishou App. OneLoc addresses the dual challenges of comprehensively integrating geographic information and balancing multiple business objectives.

Its key innovations include:

A geo-aware tokenizer that generates semantic IDs combining video content and geographic semantics.
A geo-aware self-attention mechanism in the encoder, which models user behavioral sequences by considering both content and location similarity, as well as the user's real-time location.
A neighbor-aware prompt in the decoder, leveraging cross-attention to incorporate rich contextual information from the user's surrounding geographical area.
A reinforcement learning framework that uses two specialized reward functions – geographic reward and GMV reward – to explicitly optimize for proximity and business value, alongside user interests.
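
The reinforcement learning component can be illustrated with a small Python sketch of preference-pair construction and the standard DPO loss. This is a sketch under stated assumptions, not the authors' implementation: the weighted sum in `combined_reward`, the weight `w_geo`, the best-versus-worst pairing rule, and the value of `beta` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def combined_reward(geo_reward: float, gmv_reward: float, w_geo: float = 0.5) -> float:
    """Hypothetical weighted sum of the geographic and GMV rewards."""
    return w_geo * geo_reward + (1.0 - w_geo) * gmv_reward


def build_preference_pair(rewards: List[float]) -> Tuple[int, int]:
    """Return (chosen, rejected) indices: highest- and lowest-reward candidates."""
    chosen = max(range(len(rewards)), key=rewards.__getitem__)
    rejected = min(range(len(rewards)), key=rewards.__getitem__)
    return chosen, rejected


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * log-ratio margin))."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) equals softplus(-m); this form is numerically stable.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy usage: three beam candidates with made-up (geo, GMV) rewards.
rewards = [combined_reward(g, v) for g, v in [(0.2, 0.2), (0.8, 1.0), (0.1, 0.1)]]
chosen, rejected = build_preference_pair(rewards)
loss = dpo_loss(logp_chosen=-1.0, logp_rejected=-3.0,
                ref_logp_chosen=-2.0, ref_logp_rejected=-2.0)
```

The loss falls below $\log 2$ whenever the policy assigns the chosen item a larger log-probability gain over the reference model than it assigns the rejected item, which is the direction in which training pushes the model.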

Extensive offline experiments demonstrated OneLoc's superior performance over traditional and existing generative recommendation baselines on both industrial and public datasets. Crucially, OneLoc has been successfully deployed in Kuaishou's production environment, serving 400 million daily active users and achieving significant real-world improvements: a 21.016% increase in GMV and a 17.891% increase in order numbers. This industrial success validates the model's effectiveness and robustness in a complex, high-traffic application.

7.2. Limitations & Future Work

The authors identify two directions for future research:

Scaling Laws: Further exploration into the scaling laws of model size and sequence length in this specific domain could yield deeper insights into how performance continues to improve with larger models and more extensive historical data.
Advanced RL Methods: Investigating more advanced reinforcement learning methods specifically adapted to the nuances of local life services could lead to even finer-grained optimization of diverse objectives. This might involve more complex reward shaping, multi-agent RL, or dynamic weighting of rewards.

7.3. Personal Insights & Critique

OneLoc presents a compelling and thoroughly validated solution to a critical problem in the realm of local life service recommendation. Its strength lies in the comprehensive and multi-faceted integration of geographical information into a generative recommendation framework, explicitly addressing the unique requirements of this domain.

Inspirations and Applications:

Deep Geo-Integration: The three-pronged approach to injecting geo-awareness (tokenization, encoder, decoder) is a significant architectural contribution. This methodology could be highly transferable to other location-sensitive recommendation tasks, such as travel recommendations, real estate, event discovery, or even supply chain optimization, where spatial context is paramount.
Multi-Objective RL for Real-World Impact: The successful application of DPO with custom geographic and GMV rewards highlights a practical pathway for aligning generative models with complex, often conflicting, business and user-centric objectives. This approach could inspire similar RL strategies in other industrial LLM-based applications where simple next token prediction is insufficient.
Generative Paradigm Validation: The online A/B test results provide empirical evidence that the generative recommendation paradigm can outperform traditional cascade systems in a complex, high-stakes production environment. This supports the broader trend towards LLM-inspired generative architectures in recommendation.

Potential Issues, Unverified Assumptions, or Areas for Improvement:

Proprietary Nature of GMV Scoring Model: The GMV reward relies on a separate GMV scoring model. The details of this model are not provided, and its effectiveness is assumed. In a purely academic context, the design and robustness of this GMV model would warrant further scrutiny or open-sourcing for reproducibility.
Dependency on High-Quality Geographic Context: The effectiveness of geo-aware semantic IDs, geo-aware self-attention, and neighbor-aware prompts heavily relies on the quality and richness of the underlying location context information (e.g., brand, category data within GeoHash blocks). If this context is sparse or inaccurate, the benefits of these modules might diminish.
Robustness in Extremely Sparse Areas: Although the gain in new paying users suggests that OneLoc handles cold-start users well, the neighbor-aware prompt might be less effective in extremely rural or data-sparse geographical areas where surrounding context information is minimal.
Ethical Considerations: The use of real-time user location data and extensive behavioral sequences, while crucial for effectiveness, raises privacy and ethical considerations. The paper, like many industry-led ones, does not discuss these aspects, but they are important for broader discussions.
Preprint Status: The "Published at (UTC): 2025-08-20T11:57:48.000Z" timestamp and the template placeholder "Conference acronym 'XX" indicate that this is an arXiv preprint. While the results are strong, they have not yet been peer-reviewed or officially published at a top-tier conference or in a journal, so future revisions and further scrutiny from the academic community are possible.

Overall, OneLoc is a significant step forward for local life service recommendation, bridging the gap between cutting-edge generative models and the complex, multi-objective demands of real-world industrial applications. Its rigorous design and validated impact are commendable.
