English Analysis

1. Bibliographic Information

Title: Spacetime-GR: A Spacetime-Aware Generative Model for Large Scale Online POI Recommendation
Authors: Haitao Lin, Zhen Yang, Jiawei Xue, Ziji Zhang, Luzhu Wang, Yikun Gu, Yao Xu, and Xin Li.
Affiliations: All authors are affiliated with AMAP, Alibaba Group, Beijing, China. This indicates the research is driven by industrial application needs from a major map and navigation service provider.
Journal/Conference: The paper mentions "In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym 'XX)". This, along with the arXiv link, suggests it is a preprint submitted for peer review, likely to a top-tier conference in data mining (e.g., KDD, WSDM) or recommender systems (e.g., RecSys).
Publication Year: The reference format template in the paper lists "2018", but the cited works (Llama 2 and DPO, both from 2023) show that the paper was written after 2023. The arXiv identifier 2508.16126 corresponds to a submission in August 2025, so this is a very recent work.
Abstract: The paper introduces Spacetime-GR, the first generative model designed for large-scale, online Point-of-Interest (POI) recommendation that is aware of spatiotemporal context. To handle the massive number of POIs, it proposes a geographic-aware hierarchical indexing strategy. A novel spatiotemporal encoding module is introduced to make the model sensitive to time and location variations. The model is further enhanced with multimodal POI embeddings. A multi-stage training framework (pre-training, post-training adaptation) allows Spacetime-GR to produce various outputs (embeddings, ranking scores, direct recommendations) for different downstream tasks. The authors demonstrate its superior performance on public and large-scale industrial datasets and report its successful deployment in a system serving hundreds of millions of users.
Original Source Link:
- Official Source: https://arxiv.org/abs/2508.16126
- PDF Link: https://arxiv.org/pdf/2508.16126v1.pdf
- Publication Status: This is a preprint available on arXiv, not yet formally published in a peer-reviewed venue at the time of this analysis.

2. Executive Summary

Background & Motivation (Why):
- Core Problem: Traditional recommendation systems struggle to effectively incorporate the dynamic influence of time and space, which is critical for Point-of-Interest (POI) recommendations. While powerful generative models (like LLMs) excel at sequence modeling for tasks like video or product recommendations, their application to POI recommendation faces unique and significant challenges.
- Importance & Gaps:
  1. Large Vocabulary: Real-world map applications involve hundreds of millions of POIs. Treating each POI as a unique token creates an unmanageably large vocabulary, making training computationally prohibitive.
  2. Spatiotemporal Sensitivity: A user's interest in a POI is highly contextual. For example, a user looks for restaurants at noon but cafes in the afternoon; their interests also differ when they are at home versus traveling (see Figure 1). Existing models do not adequately capture this sensitivity.
  3. Data Sparsity: Many POIs appear infrequently in user histories (long-tail distribution), making it difficult for models to learn meaningful representations for them.
- Fresh Angle: The paper proposes Spacetime-GR, the first model to adapt the generative paradigm specifically for the complexities of large-scale, online, spatiotemporally-aware POI recommendation. It tackles the core challenges head-on with a novel indexing strategy, explicit spatiotemporal encoding, and a flexible multi-stage training framework designed for practical deployment.
Main Contributions / Findings (What):
1. A Novel Generative Model (Spacetime-GR): A spacetime-aware generative model built on a decoder-only transformer architecture. It includes three key technical innovations:
  - A geographic-aware hierarchical POI indexing strategy to manage the massive POI vocabulary.
  - A spatiotemporal encoding module that integrates user's time and location context directly into the input sequence.
  - The use of multimodal POI embeddings (from text and images) to enrich the model's understanding of POIs.
2. A Practical Training and Deployment Framework: The paper introduces a three-stage process:
  - Pre-training: Learns general user behavior patterns from massive, cleaned action sequences.
  - Supervised Fine-Tuning (SFT): Adapts the model for specific downstream tasks, enabling it to output either high-quality embeddings or direct ranking scores.
  - Alignment (DPO): Further refines the model to directly generate ranked POI lists that align with user preferences (clicks vs. non-clicks).
3. First-of-its-Kind Industrial Deployment: The authors claim this is the first generative model successfully deployed in a large-scale industrial online POI recommendation system, serving hundreds of millions of users and POIs. This demonstrates the method's practicality and effectiveness beyond academic benchmarks.

Foundational Concepts:
- Point-of-Interest (POI) Recommendation: A specialized recommender system task that suggests physical locations (restaurants, parks, shops, etc.) to users. Unlike recommending digital items, POI recommendations are heavily influenced by the user's current location, time of day, and historical movement patterns.
- Sequential Recommendation: Models user preferences by treating their historical interactions (e.g., clicks, purchases, visits) as a sequence. The goal is to predict the next item in the sequence. Models like Recurrent Neural Networks (RNNs) and Transformers are commonly used.
- Generative Recommendation (GR): A paradigm where a model directly generates a list of recommended items, often framed as a sequence generation task. This contrasts with discriminative recommendation, which typically scores a pre-selected set of candidate items and ranks them. Generative models, often based on decoder-only transformer architectures like those used in LLMs, learn the probability distribution of the next item given the user's history.
- Large Language Models (LLMs): Massive neural networks (e.g., GPT, Llama) trained on vast amounts of text. They possess strong sequence modeling and reasoning capabilities, which recent research has tried to leverage for recommendation.
- Direct Preference Optimization (DPO): A technique for fine-tuning language models to align with human (or implicit) preferences. Instead of complex reinforcement learning, DPO uses a simple loss function on pairs of preferred and dispreferred responses, making alignment more stable and efficient.
Previous Works:
- Discriminative Sequential Recommendation: Early methods used networks like GRU4Rec and Caser. More recent methods like SASRec adapted the Transformer architecture for this task. Other works (HLLM, LEARN) focus on pre-training powerful encoders to produce user/item representations (embeddings) that are then fed into a separate ranking model. These follow a multi-stage pipeline (recall, pre-ranking, ranking).
- Generative Sequential Recommendation: Models like SASRec can be seen as generative, as they predict the next item autoregressively. More recent works like TIGER and OneRec address the large vocabulary problem by representing items with multiple semantic IDs (similar to tokenization in text). Others fine-tune LLMs directly for recommendation.
- POI Recommendation: Most prior work (STGCN, PLSPL, STAN, STHGCN) has been evaluated on small, offline check-in datasets. They have explored various architectures like RNNs, GNNs, and Transformers to model spatiotemporal dependencies. However, these methods are not designed to handle the scale (hundreds of millions of users/POIs) and online nature of industrial systems.
Differentiation: Spacetime-GR distinguishes itself from previous work in several crucial ways:
- Scale and Application: It is explicitly designed for large-scale industrial online POI recommendation, whereas most academic POI research uses small, offline datasets.
- Generative + Spatiotemporal: It is the first to combine the power of the generative paradigm with dedicated mechanisms for handling spatiotemporal context in POI recommendation.
- Vocabulary Management: Instead of using generic semantic IDs (TIGER) or single-ID hashing, it introduces a geographically-aware hierarchical index (block, inner) that is tailored to the spatial nature of POIs.
- Flexible Deployment Framework: The three-stage training process (pre-train, SFT, align) allows a single core model to be adapted for multiple downstream applications (feature engineering for ranking, end-to-end recommendation), which is highly valuable in a production environment.

4. Methodology (Core Technology & Implementation)

The core of the paper is the Spacetime-GR model and its multi-stage training framework.

3.1 Task Definition

The task is defined as spacetime-aware online POI recommendation.

Input:
- A user's historical action sequence $S = \{s_1, s_2, ..., s_m\}$ . Each action $s_i$ is a tuple containing:
  - Timestamp ( $t_i$ )
  - User's geographic location ( $g_i^u$ )
  - POI index ( $\boldsymbol{p}_i$ )
  - POI's geographic location ( $g_i^p$ )
  - POI category ( $c_i$ )
  - Action type ( $a_i$ , e.g., 'click')
- User profile information ( $up$ )
- Current request context: timestamp ( $t_{m+1}$ ) and user location ( $g_{m+1}^u$ )
Output: Predict the next POI the user will interact with, $\boldsymbol{p}_{m+1}$ .

The paper provides an example of the data format for a single action:

Manually transcribed from the paper's Table 1.

Key	Explanation	Example Value
time t	a 13-digit timestamp	1709805845148
user geo info g^u	the longitude and latitude of user location	x: 118.2252, y: 24.6001
POI p	the index of POI	123
POI geo info g^p	the longitude and latitude of POI location	x: 118.3468, y: 24.1159
POI category c	the type of POI	food, French food
action type a	the type of action	click
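
The action tuple above maps naturally onto a small record type. The following is a minimal sketch, not the paper's data schema: the class name `Action`, its field names, and the helper `to_utc` are hypothetical, and the 13-digit timestamp is assumed to be milliseconds since the Unix epoch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class Action:
    """One user action s_i; field names are illustrative, not from the paper."""
    t_ms: int                      # 13-digit timestamp (assumed: epoch milliseconds)
    user_geo: Tuple[float, float]  # (longitude, latitude) of the user
    poi: int                       # POI index
    poi_geo: Tuple[float, float]   # (longitude, latitude) of the POI
    category: Tuple[str, ...]      # POI category path, coarse to fine
    action_type: str               # e.g. "click"


def to_utc(action: Action) -> datetime:
    """Convert the millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(action.t_ms / 1000, tz=timezone.utc)


# The example row from the table above.
example = Action(
    t_ms=1709805845148,
    user_geo=(118.2252, 24.6001),
    poi=123,
    poi_geo=(118.3468, 24.1159),
    category=("food", "French food"),
    action_type="click",
)
```

Under the epoch-milliseconds assumption, `to_utc(example)` falls on 7 March 2024.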

3.2 Spacetime-GR Framework

The framework is built around a decoder-only transformer (based on Llama 2) and consists of three stages as shown in Figure 2.

Figure 2: The overall structure of Spacetime-GR, including three stages (pre-training, SFT, alignment). The schematic details the geographic-aware hierarchical POI indexing, the spatiotemporal encoding module, and the integration of multimodal POI embeddings with generative ranking.

3.2.1 Pre-training Stage

The goal of this stage is to learn fundamental patterns in user behavior from a massive dataset of user action sequences.

Data Cleansing: To improve training quality, a two-level data filtering strategy is applied:
- Action Level: Actions are classified as either functional (e.g., navigating home, searching for a hospital) or interest-based (e.g., clicking on restaurants, entertainment venues). Only interest-based actions (where $It_i = 1$ ) are used as prediction targets, as they better reflect a user's latent interests.
- Sequence Level: A richness metric is defined to filter out monotonous sequences (e.g., user only commuting between home and work). $R = \frac{\text{The number of different POIs}}{\text{The number of actions}}$ Sequences with low richness ( $R < 0.3$ ) are discarded.
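
The sequence-level filter can be sketched in a few lines. This is an illustrative reading of the richness formula, not the authors' pipeline: the function names are hypothetical, and treating an empty sequence as richness 0 is an assumption.

```python
def richness(poi_ids):
    """R = (number of different POIs) / (number of actions)."""
    if not poi_ids:
        return 0.0  # assumption: an empty sequence counts as monotonous
    return len(set(poi_ids)) / len(poi_ids)


def keep_sequence(poi_ids, threshold=0.3):
    """Keep a sequence unless its richness is below the threshold (R < 0.3)."""
    return richness(poi_ids) >= threshold


# A home/work commuter: 2 distinct POIs over 10 actions.
commuter = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
# A more varied user: 3 distinct POIs over 4 actions.
explorer = [1, 2, 3, 1]
```

Here `commuter` has $R = 0.2$ and is discarded, while `explorer` has $R = 0.75$ and is kept.
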
Model Structure & Input Encoding:
1. Geographic-aware Hierarchical POI Indexing: To handle the 100M+ POI vocabulary, each POI $\boldsymbol{p}_i$ is represented by two tokens:
  - $block_i$ : An ID representing a 5km x 5km geographic grid where the POI is located.
  - $inner_i$ : A local index of the POI within that block. This reduces the vocabulary size from ~100M to ~400K, making the softmax output layer computationally feasible.
2. Spatiotemporal Encoding Module: Each user action $s_i$ is converted into a sequence of four tokens: $(u_i, block_i, inner_i, a_i)$ .
  - $u_i$ : A token representing the user's spatiotemporal context (time $t_i$ and location $g_i^u$ ).
  - $a_i$ : The action type.
3. Feature Embedding:
  - The embeddings for these tokens are enriched with side information.
  - The embedding for $u_i$ is a weighted sum of time and user geo-location embeddings.
  - The embedding for $inner_i$ is a weighted sum of its base ID embedding, POI category embedding ( $c_i$ ), and POI geo-location embedding ( $g_i^p$ ). The embedding calculations are: $\begin{array}{rl} E(u_i) &= w_1 \cdot Emb_t(t_i) + w_2 \cdot Emb_g(g_i^u) \\ E(block_i) &= Emb_p(block_i) \\ E(inner_i) &= w_3 \cdot Emb_p(inner_i) + w_4 \cdot Emb_c(c_i) + w_5 \cdot Emb_g(g_i^p) \\ E(a_i) &= Emb_a(a_i) \end{array}$ Where $Emb_x$ are embedding layers for different feature types (time, geo, POI, category, action) and $w_j$ are learnable weights.
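
To make the two-token index concrete, here is a rough sketch of how a (block, inner) pair could be assigned. The paper's actual gridding and ID assignment are not described in this analysis, so everything here is an assumption: the equirectangular approximation, the constant `KM_PER_DEG_LAT`, the first-seen ordering of inner IDs, and all function names.

```python
import math
from collections import defaultdict

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude, in km


def block_cell(lon, lat, cell_km=5.0):
    """Map a (lon, lat) point to a (row, col) grid cell roughly cell_km wide."""
    row = math.floor(lat * KM_PER_DEG_LAT / cell_km)
    # Scale longitude using the latitude of the row's southern edge, so every
    # point in the same row shares the same column width.
    row_lat = row * cell_km / KM_PER_DEG_LAT
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(row_lat))
    col = math.floor(lon * km_per_deg_lon / cell_km)
    return row, col


def build_index(pois):
    """pois: dict poi_id -> (lon, lat). Returns dict poi_id -> (block, inner)."""
    block_ids = {}                 # grid cell -> block token ID
    next_inner = defaultdict(int)  # grid cell -> next free inner ID
    index = {}
    for poi_id, (lon, lat) in pois.items():
        cell = block_cell(lon, lat)
        block = block_ids.setdefault(cell, len(block_ids))
        index[poi_id] = (block, next_inner[cell])
        next_inner[cell] += 1
    return index


def vocab_size(index):
    """Tokens needed: one per block plus one per inner slot (shared by blocks)."""
    n_blocks = len({block for block, _ in index.values()})
    n_inner = max(inner for _, inner in index.values()) + 1
    return n_blocks + n_inner


# Two nearby POIs and one far away (coordinates are illustrative).
pois = {1: (118.3468, 24.1159), 2: (118.3470, 24.1160), 3: (116.40, 39.90)}
index = build_index(pois)
```

The point of the design is that inner IDs are reused across blocks, so the vocabulary grows with the number of blocks plus the largest block population, rather than with the total number of POIs.
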
Loss Function: The model is trained using a standard cross-entropy loss to predict the next token. Crucially, the loss is only computed for the block and inner tokens of interest-based actions ( $It=1$ ). $\mathcal{L}_{pretrain} = - \sum_{i=1}^{n-1} It_{i+1} \cdot (\log P(block_{i+1} | up, s_1, ..., s_i, u_{i+1}) + \log P(inner_{i+1} | up, s_1, ..., s_i, u_{i+1}, block_{i+1}))$
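
The masking in this loss can be illustrated with plain Python. This is a simplified sketch rather than the training code: it assumes the model's probabilities for the correct block and inner tokens are already available, and the function and argument names are hypothetical.

```python
import math


def pretrain_loss(block_probs, inner_probs, interest_flags):
    """Masked next-token loss over a sequence of target actions.

    block_probs[i]  : probability the model gave to the true block token
    inner_probs[i]  : probability the model gave to the true inner token,
                      conditioned on the true block token
    interest_flags[i]: 1 if target action i is interest-based (It = 1), else 0
    """
    total = 0.0
    for p_block, p_inner, is_interest in zip(block_probs, inner_probs, interest_flags):
        if is_interest:  # functional actions contribute nothing to the loss
            total -= math.log(p_block) + math.log(p_inner)
    return total


# Two target actions; only the first is interest-based.
loss = pretrain_loss([0.5, 0.25], [0.5, 0.5], [1, 0])
```

Only the first action contributes, so `loss` equals $-(\log 0.5 + \log 0.5) = 2 \log 2$. Functional actions still appear in the input context; they are only excluded as prediction targets.
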
Curriculum Learning Strategy: Training proceeds from simple to complex data.
1. First, train on single-pattern sequences (e.g., only local actions, only travel actions).
2. Then, train on multi-pattern sequences that contain transitions between different states (e.g., local to travel).

3.2.2 Supervised Finetuning (SFT) Stage

After pre-training, the model is fine-tuned on a downstream recommendation task dataset, which contains user sequences, candidate POIs, and click labels (positive/negative). Two SFT strategies are proposed:

Embedding-Based Ranking SFT:
- Goal: To produce high-quality user and POI embeddings for use in a downstream ranking model.
- Architecture: A dual-tower structure. The Spacetime-GR model acts as an encoder for both the user side (history + context) and the POI side.
- Process:
  - User embedding $E_u$ is generated by encoding the user profile, action sequence, and request context.
  - POI embedding $E_p$ is generated by encoding the POI's hierarchical index and side information.
- Loss Function: InfoNCE loss is used to pull the user embedding closer to embeddings of clicked POIs (positives) and push it away from embeddings of unclicked POIs (negatives). $\mathcal{L}_{emb-sft} = - \sum_i \log \frac{\sum_j \exp(\cos(E_u^i, E_p^{i,j,+})/\tau)}{\sum_j \exp(\cos(E_u^i, E_p^{i,j,+})/\tau) + \sum_k \exp(\cos(E_u^i, E_p^{i,k,-})/\tau)}$ Here, $\tau$ is a temperature hyperparameter.
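
For a single request, the InfoNCE term above can be computed as follows. This is a minimal sketch on plain Python lists, assuming embeddings are already produced by the two towers; the function names and the default `tau=0.1` are assumptions, not values from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(user_emb, pos_embs, neg_embs, tau=0.1):
    """InfoNCE loss for one user: clicked POIs vs. unclicked POIs."""
    pos = sum(math.exp(cosine(user_emb, p) / tau) for p in pos_embs)
    neg = sum(math.exp(cosine(user_emb, n) / tau) for n in neg_embs)
    return -math.log(pos / (pos + neg))


# Toy example: the clicked POI is aligned with the user, the unclicked one is orthogonal.
good = info_nce([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]], tau=1.0)
# Swapping the roles makes the loss larger.
bad = info_nce([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]], tau=1.0)
```

With $\tau = 1$, `good` equals $\log(1 + e^{-1})$. A smaller $\tau$ sharpens the contrast between positives and negatives.
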
Generative Ranking SFT:
- Goal: To directly output a ranking score for each candidate POI.
- Architecture: A cross-encoder structure where user and POI information interact at all layers.
- Process: The user sequence and all candidate POIs are concatenated into a single input sequence. A modified attention mask ensures that the representation for each POI is computed based only on the user information and its own features, not other candidate POIs.
- Loss Function: The hidden state corresponding to each POI's inner token is passed through a classification head to predict a click probability. A standard binary cross-entropy loss is used for training. $\mathcal{L}_{generative-sft} = - \sum_i \left[ y_i \cdot \log P_i + (1 - y_i) \cdot \log(1 - P_i) \right]$
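
One way to realize the modified attention mask and the click loss is sketched below. It is an interpretation of the description, not the paper's implementation: it assumes user tokens come first, candidates follow one after another, attention is causal, and all names are hypothetical.

```python
import math


def candidate_attention_mask(n_user_tokens, candidate_lengths):
    """Boolean mask[q][k]: may query position q attend to key position k?

    User tokens attend causally to user tokens. Each candidate's tokens attend
    to all user tokens and to earlier tokens of the same candidate, but never
    to tokens of other candidates.
    """
    # Segment 0 is the user part; candidates get segments 1, 2, ...
    segment = [0] * n_user_tokens
    for seg_id, length in enumerate(candidate_lengths, start=1):
        segment += [seg_id] * length
    total = len(segment)
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: only current and earlier positions
            mask[q][k] = segment[k] == 0 or segment[k] == segment[q]
    return mask


def bce_loss(labels, probs):
    """Binary cross-entropy summed over candidates."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probs)
    )


# 2 user tokens followed by two candidates of 2 tokens each (block, inner).
mask = candidate_attention_mask(2, [2, 2])
```

In this example, position 4 (the second candidate's block token) can see the user tokens at positions 0 and 1 but not the first candidate's tokens at positions 2 and 3, so all candidates can be scored independently in one forward pass.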

Multimodal POI Embeddings: In the SFT stage, the POI representation is enriched by adding pre-computed embeddings derived from a multimodal LLM that processes the POI's text (name, address, reviews) and images.

3.2.3 Alignment Stage

This stage aims to refine the model's ability to directly generate a ranked list of POIs.

DPO Training: Direct Preference Optimization is used to align the model with user preferences.
- Data: Clicked POIs are treated as "chosen" (preferred) responses, and exposed but unclicked POIs are "rejected" (dispreferred) responses.
- Loss Function: The DPO loss encourages the model to assign a higher probability to chosen POIs than to rejected POIs, compared to a frozen reference model (the initial pre-trained model). $\mathcal{L}_{DPO} = - \sum_i \sum_{j,k} \left( \log \sigma \left( \beta \log \frac{Align(\boldsymbol{p}^{i,j,+})}{Ref(\boldsymbol{p}^{i,j,+})} - \beta \log \frac{Align(\boldsymbol{p}^{i,k,-})}{Ref(\boldsymbol{p}^{i,k,-})} \right) \right)$ Where Align is the model being trained, Ref is the frozen reference model, $\boldsymbol{p}^+$ and $\boldsymbol{p}^-$ are positive and negative POIs, and $\beta$ is a scaling parameter.
- Spatiotemporal Sensitivity: This framework can be used to instill specific behaviors, e.g., by creating preference pairs where the "chosen" POI is a restaurant during meal times and the "rejected" POI is not, for the same context.
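
The DPO term for one (chosen, rejected) pair can be written directly from the formula above. This is a sketch under stated assumptions: the inputs are log-probabilities of the POI under the trained and reference models, the default `beta=0.1` is illustrative rather than taken from the paper, and the function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one pair of a clicked (chosen) and an unclicked (rejected) POI.

    logp_*     : log-probability of the POI under the model being aligned
    ref_logp_* : log-probability of the same POI under the frozen reference
    """
    chosen_ratio = logp_chosen - ref_logp_chosen        # log(Align / Ref), chosen
    rejected_ratio = logp_rejected - ref_logp_rejected  # log(Align / Ref), rejected
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# At initialization the aligned model equals the reference, so the margin is 0.
start = dpo_pair_loss(-1.0, -2.0, -1.0, -2.0)
# Raising the chosen POI's probability relative to the reference lowers the loss.
improved = dpo_pair_loss(-0.5, -2.0, -1.0, -2.0)
```

With a zero margin, `start` equals $\log 2$. The loss only decreases when the trained model shifts probability toward the chosen POI relative to the reference model.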

5. Experimental Setup

Datasets:

Industrial Dataset: An internal dataset from Amap with hundreds of millions of users and POIs. Manually transcribed from the paper's Table 3.

Stage	Data Type	Samples	Length	Candidate POI Num
Pre-training	Train	578M	146.3	-
	Validation	19,794	142.6	-
	Test	19,837	143.2	-
SFT & Alignment	Train	31M	301.0	11.3
	Validation	611K	334.5	10.5
	Test	553K	346.1	10.6

Public Datasets: Three widely-used offline check-in datasets to test generalizability: Foursquare-NYC, Foursquare-TKY, and Gowalla-CA. These are much smaller (thousands of users/POIs).

Evaluation Metrics:
1. AUC (Area Under the ROC Curve):
  - Conceptual Definition: Measures the ability of a model to distinguish between positive and negative classes. An AUC of 1.0 means a perfect classifier, while 0.5 indicates a random guess. It is widely used for evaluating binary classification and ranking quality.
  - Mathematical Formula: $\text{AUC} = \frac{\sum_{i \in \text{positive class}} \sum_{j \in \text{negative class}} \mathbf{1}(\text{score}(i) > \text{score}(j))}{|\text{positive class}| \cdot |\text{negative class}|}$
  - Symbol Explanation: score(i) is the model's predicted score for item $i$ . $\mathbf{1}(\cdot)$ is the indicator function, which is 1 if the condition is true and 0 otherwise. The formula calculates the probability that a randomly chosen positive sample is ranked higher than a randomly chosen negative sample.
2. CTR (Click-Through Rate) / CVR (Conversion Rate):
  - Conceptual Definition: Business metrics used in online systems. CTR is the ratio of clicks to impressions. CVR is the ratio of conversions (e.g., making a purchase, navigating to a POI) to clicks. They measure user engagement and the business impact of the recommendations.
3. Hit Rate (hr@k):
  - Conceptual Definition: Measures whether the true next item is present in the top- $k$ recommended items. It evaluates the recall of the model.
  - Mathematical Formula: $\text{hr@k} = \frac{1}{|U|} \sum_{u \in U} \mathbf{1}(p_{target} \in \text{TopK}_u)$
  - Symbol Explanation: $|U|$ is the total number of test cases (users). $p_{target}$ is the ground-truth next POI. $\text{TopK}_u$ is the list of top $k$ POIs recommended for user $u$ . $\mathbf{1}(\cdot)$ is the indicator function.
4. LLM & Human Evaluation: For the generative task, results are compared based on win/even/lose rates from GPT-4o, Qwen-Plus, and human evaluators.
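
The two offline metrics with formulas above can be computed directly from their definitions. The sketch below follows those formulas literally (pairwise AUC with a strict comparison, so ties count as losses); the function names are hypothetical, and the quadratic pairwise AUC is for illustration only.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    wins = sum(1 for p in pos_scores for n in neg_scores if p > n)
    return wins / (len(pos_scores) * len(neg_scores))


def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of test cases whose target POI appears in the top-k list."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


# Three of the four (positive, negative) pairs are ordered correctly.
example_auc = auc([0.9, 0.4], [0.5, 0.1])
# The first user's target is in the top 2; the second user's is not.
example_hr = hit_rate_at_k([[1, 2, 3], [4, 5, 6]], [2, 6], k=2)
```

Here `example_auc` is 0.75 and `example_hr` is 0.5.
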
Baselines:
- Industrial: The main baseline is the existing online ranking model in production at Amap.
- Public: Several sequential recommendation models are used as baselines, including LSTM, STGCN, PLSPL, STAN, GETNext, and STHGCN.

6. Results & Analysis

Core Results on Industrial Dataset

SFT for Ranking:
- The features generated by Spacetime-GR significantly improve the performance of the existing online ranking model.
- Embedding-based SFT improves AUC by 1.86 pp.
- Generative ranking SFT improves AUC by 2.29 pp, showing the benefit of deeper user-item interaction.
- Combining both strategies yields the best result, with a 3.42 pp AUC improvement.
- In a live A/B test, the enhanced model achieved a 6% CTR improvement and a 4.2% CVR improvement, demonstrating significant business value.
  
  Manually transcribed from the paper's Table 4.
  
  Methods	AUC
  online ranking model	0.7043
  + embedding-based ranking SFT	0.7229
  + generative ranking SFT	0.7272
  + embedding-based ranking & generative ranking SFT	0.7385
Alignment for End-to-End Recommendation:
- The DPO-aligned Spacetime-GR was compared against the online system.
- LLMs and human evaluators both found Spacetime-GR's recommendations to be superior. For instance, at the system level, GPT-4o/Qwen-Plus judged Spacetime-GR as the winner 67% of the time, versus 31% for the online model.
  
  Manually transcribed from the paper's Table 5.
  
  Spacetime-GR vs online model	Win	Even	Lose
  system level	67.0%	2.0%	31.0%
  POI level	69.9%	10.7%	19.4%
  human	55.2%	14.3%	30.5%

Results on Public Datasets

On the smaller, offline check-in datasets, a simplified Spacetime-GR is competitive with state-of-the-art baselines such as STHGCN. It outperforms STHGCN on NYC (0.2920 vs. 0.2734) but trails it on TKY (0.2610 vs. 0.2950) and CA (0.1659 vs. 0.1730), while beating every other baseline on all three datasets. This is notable because STHGCN uses information from other users' sequences, whereas Spacetime-GR only uses the current user's sequence, and it suggests that the model generalizes beyond the industrial setting.

Manually transcribed from the paper's Table 6.

	NYC	TKY	CA
LSTM [15]	0.1305	0.1335	0.0665
STGCN [55]	0.1799	0.1716	0.0961
PLSPL [45]	0.1917	0.1889	0.1072
STAN [30]	0.2231	0.1963	0.1104
GETNext [49]	0.2435	0.2254	0.1357
STHGCN [46]	0.2734	0.2950	0.1730
Spacetime-GR	0.2920	0.2610	0.1659

Ablation Studies

Pre-training Stage: The ablation study confirms the importance of each proposed component.

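- Removing spatiotemporal information causes the largest drop in hr@1 (from 0.1525 to 0.1007) and a large drop in hr@100 (from 0.4721 to 0.3671).
- Replacing the geographic-aware hierarchical index with a traditional hashing-based index causes the largest drop in hr@100 (from 0.4721 to 0.3480) and lowers hr@1 to 0.1328, so the hierarchical index is significantly better.
- Curriculum learning provides a modest but consistent improvement (hr@1 from 0.1463 to 0.1525, hr@100 from 0.4624 to 0.4721).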

Manually transcribed from the paper's Table 7.

Methods	hr@1	hr@100
GPT-based	0.0688	0.2195
Spacetime-GR w/o spatiotemporal info	0.1007	0.3671
Spacetime-GR w/o hierarchical POI index	0.1328	0.3480
Spacetime-GR w/o curriculum learning	0.1463	0.4624
Spacetime-GR	0.1525	0.4721

Note: The table in the paper seems to have two entries for "w/o hierarchical POI index". I have transcribed them as they appear.

SFT Stage:

- Fine-tuning from the pre-trained model is far better than training from scratch (AUC 0.7080 vs. 0.6621 for embedding-based SFT, 0.7371 vs. 0.6648 for generative ranking SFT), highlighting the value of pre-training.
- Generative ranking SFT outperforms embedding-based SFT (0.7371 vs. 0.7080), consistent with deeper interaction modeling being more powerful, though more computationally expensive.
- Adding multimodal embeddings raises the AUC of embedding-based SFT from 0.7080 to 0.7214 (+1.34 pp), demonstrating the value of richer POI content.

Manually transcribed from the paper's Table 8.

Methods	AUC
embedding-based ranking SFT from scratch	0.6621
embedding-based ranking SFT	0.7080
embedding-based ranking SFT + multimodal	0.7214
generative ranking SFT from scratch	0.6648
generative ranking SFT	0.7371

Note: The paper is missing the result for "generative ranking SFT + multimodal".

Alignment Stage:
- DPO alignment improves ranking ability over the pre-trained model, especially for hr@10 (0.4520 vs. 0.4295), showing that it refines the model's ability to rank relevant items higher.
  
  Manually transcribed from the paper's Table 9.
  
  Methods	hr@1	hr@10
  pre-trained	0.1960	0.4295
  DPO	0.2006	0.4520
  S-DPO	0.2008	0.4512

7. Conclusion & Reflections

Conclusion Summary: The paper introduces Spacetime-GR, a novel generative model tailored for the unique challenges of large-scale online POI recommendation. By developing a geographically-aware hierarchical POI index, a spatiotemporal encoding module, and a versatile three-stage training framework (pre-training, SFT, DPO alignment), the authors create a system that is both powerful and practical. The model is competitive with state-of-the-art baselines on public benchmarks (best on NYC, second to STHGCN on TKY and CA) and delivers significant improvements in CTR and CVR in a live industrial setting, which the authors present as the first successful deployment of a generative model in such a system.
Limitations & Future Work: The provided text cuts off before detailing the authors' own discussion of limitations. However, based on the content, potential limitations could include:
- Computational Cost: While more efficient than a flat vocabulary, the model is still large and requires significant resources for pre-training (96 H20 GPUs for 7 days).
- Data Dependency: The model's performance heavily relies on the massive, proprietary dataset from Amap. Its effectiveness might vary on smaller or differently distributed datasets.
- Complexity: The full three-stage pipeline is complex to implement and maintain in a production environment.
Personal Insights & Critique:
- Strengths:
  - The paper is an excellent example of industry-driven research that tackles real-world problems at scale. It bridges the gap between academic theory (generative models, DPO) and industrial practice.
  - The proposed solutions are well-motivated and elegant. The geographic hierarchical index is a clever and domain-specific solution to the vocabulary problem. The explicit encoding of spatiotemporal context as a token is a simple yet powerful idea.
  - The modular framework (pre-train, SFT, align) is a major strength, providing a flexible pathway to deploy a single powerful foundation model in various roles within a complex recommendation ecosystem.
- Critique & Open Questions:
  - The comparison with STHGCN on public datasets is interesting, but STHGCN's use of graph information (linking other users' trajectories) is a fundamentally different approach. A more direct comparison would be against other single-user sequence models on those datasets.
  - The paper could have explored the trade-offs of the hierarchical index in more detail, such as the impact of grid size (5km x 5km) on performance.
  - While the model is "spacetime-aware," the analysis could go deeper into how the model uses this information. For example, visualizations of attention weights could show whether the model focuses on relevant temporal or spatial cues when making a prediction. The provided text was cut short before this analysis was presented.
