ITMPRec: Intention-based Targeted Multi-round Proactive Recommendation


TL;DR Summary

ITMPRec is a novel intention-based targeted multi-round proactive recommendation method that addresses passive acceptance in personalized systems by selecting target items through pre-matching, utilizing multi-round nudging, and simulating user feedback with an LLM agent, outperforming eight baseline models on four public datasets.

Abstract

Personalized recommendations are integrated into daily life, but providers may want certain items to become more appealing over time through user interactions, yet this issue is often overlooked. The existing works are often based on the assumption that users will passively accept all intermediate sequences or not explore intention modeling in the targeted nudging process. Both of these factors result in suboptimal performance in the proactive recommendation. In this paper, we propose a novel intention-based targeted multi-round proactive recommendation method, dubbed ITMPRec. We first select target items using a pre-match strategy. Then, we employ a multi-round nudging recommendation method, incorporating a module to quantify users’ intention-level evolution, helping choose suitable intermediate items. Additionally, we model users’ sensitivity to changes caused by these items. Lastly, we propose an LLM agent as a pluggable component to simulate user feedback, offering an alternative to traditional click models by leveraging the agent’s external knowledge and reasoning capabilities. Through extensive experiments on four public datasets, we demonstrate the superiority of ITMPRec compared to eight baseline models.

In-depth Reading


1. Bibliographic Information

1.1. Title

The central topic of the paper is ITMPRec: Intention-based Targeted Multi-round Proactive Recommendation. This title suggests a novel approach to recommendation systems that focuses on proactively guiding users towards specific items over multiple interactions, taking user intentions into account.

1.2. Authors

  • Yahong Lian (College of Computer Science, TJ Key Lab of NDST, DISSec, TMCC, TBI Center, Nankai University, Tianjin, China)

  • Chunyao Song (College of Computer Science, TJ Key Lab of NDST, DISSec, TMCC, TBI Center, Nankai University, Tianjin, China)

  • Tingjian Ge (Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, USA)

    The authors are primarily affiliated with computer science departments and research centers, indicating a background in computational methods, data science, and artificial intelligence, which are highly relevant to recommendation systems.

1.3. Journal/Conference

The paper was published in the Proceedings of the ACM Web Conference 2025 (WWW '25). WWW (The Web Conference, formerly known as World Wide Web Conference) is a premier international academic conference on topics related to the World Wide Web. Its reputation is very high in the fields of information retrieval, web mining, and recommendation systems, making it a highly influential venue for research in these areas. Publication at WWW signifies that the research has undergone rigorous peer review and is considered to be of significant quality and impact.

1.4. Publication Year

2025 (This indicates it's a forthcoming publication or accepted paper for WWW '25).

1.5. Abstract

Personalized recommendation systems are ubiquitous in daily life. However, they often overlook the objective of content providers to make certain items more appealing over time through user interactions. Existing proactive recommendation methods typically assume users passively accept all intermediate recommendations or fail to model user intentions during the nudging process, leading to suboptimal performance.

To address these limitations, the paper proposes a novel method called ITMPRec (Intention-based Targeted Multi-round Proactive Recommendation). ITMPRec first employs a pre-match strategy to select target items. It then utilizes a multi-round nudging recommendation approach that includes a module to quantify the evolution of users' intentions, which aids in selecting appropriate intermediate items. Additionally, the model accounts for users' sensitivity to changes introduced by these intermediate items. Finally, ITMPRec introduces an LLM agent as a pluggable component to simulate user feedback, offering a sophisticated alternative to traditional click models by leveraging the agent's external knowledge and reasoning capabilities. Extensive experiments on four public datasets demonstrate that ITMPRec significantly outperforms eight baseline models.


2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the limitation of traditional personalized recommendation systems, which primarily focus on predicting users' next preferences based on their historical behavior. While convenient, this user-centric approach can lead to several negative outcomes:

  1. Filter Bubbles and Information Cocoons: By constantly reinforcing existing preferences, users can become confined to a narrow range of content, limiting exposure to diverse items and potentially harming both user experience and the content ecosystem.

  2. Misalignment with Provider Goals: Content providers often have strategic objectives to promote specific items or categories, increase diversity, or guide users towards new experiences. Traditional systems do not inherently support these "nudging" or "proactive" goals.

    This problem is important because it highlights a fundamental tension in recommendation systems: balancing user satisfaction with platform objectives and promoting content diversity. The existing solutions for proactive recommendation have significant gaps:

  1. Random Target Item Selection: Previous approaches often assign target items randomly, which can lead to guiding users towards cold-start items (items with little to no historical interaction data) or items that are too scattered, making successful guidance difficult and potentially irrelevant to provider goals.

  2. Neglect of User Intention: The role of a user's underlying intention (a coarse-grained aspect compared to preference) in the multi-round nudging process is largely ignored, which is crucial for effective and dynamic guidance.

  3. Passive User Assumption: Many existing methods assume users will passively accept all intermediate recommendations, or they use simplistic, fixed thresholds for simulating user clicks. This unrealistic assumption fails to reflect real-world user behavior and leads to sub-optimal guidance paths.

    The paper's innovative idea is to address these gaps by developing a sophisticated multi-round proactive recommendation framework that not only selects meaningful target items (often a category of items) but also dynamically adapts to users' evolving intentions and individual sensitivities, leveraging advanced LLM agents for more realistic user feedback simulation.

2.2. Main Contributions / Findings

The paper makes several primary contributions to the field of proactive recommendation:

  1. Targeted Multi-Round Proactive Recommendation Paradigm: It proposes ITMPRec, a novel method that aims to guide users towards a class of target items (e.g., a specific category or genre) over multiple rounds, moving beyond the traditional single-item prediction. This approach is more aligned with content provider needs for focused promotion and can encourage content diversity.

  2. Pre-match Module for Target Item Selection: ITMPRec introduces a pre-match module that collects all users' opinions to generate a sensible set of candidate target items within a specific category. This addresses the limitation of random target assignment, ensuring selected targets are more relevant and avoid cold-start issues.

  3. Intention-Induced Scores: The paper devises a mechanism to incorporate intention-induced scores into the recommendation process. By modeling users' intention-level evolution, it helps in choosing more suitable intermediate items that align with the user's changing coarse-grained goals during the nudging process, which was previously overlooked.

  4. Targeted Individual Arousal Coefficients (TIAC): Recognizing that users respond differently to new content, ITMPRec introduces TIAC. This module quantifies each user's unique sensitivity or receptivity to changes caused by intermediate recommendations, enabling more personalized and effective nudging.

  5. LLM Agent for User Feedback Simulation: A novel aspect is the integration of an LLM agent as a pluggable component to simulate user click feedback on intermediate recommendations. This offers a more sophisticated and realistic alternative to traditional distribution-based click models, leveraging the LLM's external knowledge and reasoning capabilities to mimic complex user decision-making, thus better aligning the simulation with real-world scenarios.

  6. Empirical Superiority: Through extensive experiments on four real-world datasets, ITMPRec demonstrates significant superiority over eight state-of-the-art baseline models (including both sequential and proactive approaches). It achieves an average increase of 36.47% in IoI@20 and 68.80% in IoR@20 (metrics for proactive recommendation quality), validating its effectiveness.

    These findings collectively address the shortcomings of existing methods by providing a more holistic, intelligent, and realistic framework for proactive recommendation, benefiting both users (through expanded interests) and content providers (through targeted promotion).

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand ITMPRec, a reader should be familiar with several core concepts in recommendation systems and machine learning:

  1. Personalized Recommendation Systems: These systems aim to predict user preferences for items (products, movies, music, etc.) and recommend relevant ones. They are based on user-item interaction data (e.g., clicks, purchases, ratings).
  2. Sequential Recommendation (SR): A sub-field of recommendation systems that focuses on modeling the chronological order of user interactions. Instead of just predicting static preferences, SR systems try to predict the next item a user will interact with, given their historical sequence of interactions. This often involves capturing sequential patterns and short-term preferences.
    • User/Item Embeddings: In recommendation systems, embeddings are low-dimensional, dense vector representations of users and items. These vectors are learned from interaction data and capture the latent features and relationships between users and items. For example, similar items would have embedding vectors that are close to each other in the embedding space.
    • Dot Product for Similarity: A common way to measure the similarity or interaction tendency between a user embedding $e_u$ and an item embedding $e_i$ is to compute their dot product, i.e., $e_u^T \cdot e_i$. A higher dot product typically indicates higher predicted relevance or preference.
  3. Filter Bubble and Information Cocoon: These are phenomena where users are exposed only to information that confirms their existing beliefs or preferences, due to algorithms that personalize content.
    • A filter bubble is created by personalized content filters that selectively guess what information a user would like to see.
    • An information cocoon is a state where individuals are isolated from information that contradicts their beliefs, often resulting from their own choices and algorithms.
  4. Proactive Recommendation: A paradigm that goes beyond passively predicting what a user will like. Instead, it actively tries to guide or nudge user preferences towards certain target items or categories, often over multiple rounds of interaction. This can be for purposes like promoting diversity, introducing new content, or achieving specific business goals.
  5. Large Language Models (LLMs): These are advanced AI models trained on vast amounts of text data, capable of understanding, generating, and reasoning with human language. They possess external knowledge (information learned during pre-training) and reasoning capabilities (ability to infer, deduce, and plan based on instructions).
  6. Click Models: In recommendation research, click models are used to simulate user interactions (e.g., whether a user clicks on a recommended item). They estimate the probability of a user clicking on an item given certain features or conditions.
    • Bernoulli Distribution: A discrete probability distribution that describes the probability of an event happening (success) or not happening (failure) in a single trial. In click models, it can be used to model whether a user clicks (1) or doesn't click (0) an item.
    • Sigmoid Function ($\sigma(x)$): A mathematical function that maps any real-valued number to a value between 0 and 1. It is often used to convert a raw score into a probability. Its formula is $\sigma(x) = \frac{1}{1 + e^{-x}}$.
  7. Contrastive Learning: A machine learning paradigm where the model learns by contrasting positive pairs (similar items/representations) with negative pairs (dissimilar items/representations). The goal is to bring positive pairs closer in the embedding space while pushing negative pairs farther apart.
    • InfoNCE Loss: A common loss function used in contrastive learning. It encourages the model to distinguish a positive sample from a set of negative samples.
  8. BPR Loss (Bayesian Personalized Ranking Loss): A widely used ranking loss function in recommendation systems. It optimizes the model to rank positive items higher than negative (uninteracted) items for a given user.
    • The formula for BPR loss is typically given as: $ \mathcal{L}_{BPR} = - \sum_{u=1}^U \sum_{i \in R_u} \sum_{j \in I \setminus R_u} \ln \sigma(\hat{x}_{ui} - \hat{x}_{uj}) $, where $R_u$ is the set of items user $u$ interacted with, $I \setminus R_u$ is the set of items user $u$ did not interact with, and $\hat{x}_{ui}$ is the predicted score of item $i$ for user $u$. A minimal code sketch of this loss appears after this list.
  9. Hyperparameters: Parameters whose values are set before the learning process begins (e.g., learning rate, embedding dimension, number of intentions). They control the learning algorithm itself.

3.2. Previous Works

The paper categorizes related work into Sequential Recommendation (SR) and Proactive Recommendation (ProactRec).

3.2.1. Sequential Recommendation (SR) Methods

SR methods aim to predict the next item a user will interact with based on their historical sequence. They typically focus on modeling chronological behaviors and capturing short-term user interests.

  • SASRec [18]: A classical sequential recommendation method that uses a self-attention framework. It models item transitions by allowing items in a sequence to "attend" to each other, capturing long-range dependencies effectively.

    • Background: Self-attention is a mechanism that allows a model to weigh the importance of different parts of an input sequence when processing a specific element. It's a core component of Transformers. The key idea is to compute Query, Key, and Value matrices from the input embeddings. The attention score is computed as a scaled dot product of Query and Key, followed by a softmax to get weights, which are then applied to Value to get the weighted sum.
    • Formula for Self-Attention: $ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $ Where:
      • $Q$ (Query), $K$ (Key), $V$ (Value) are matrices derived from the input sequence embeddings.
      • $d_k$ is the dimension of the Key vectors, used for scaling to prevent the dot products from becoming too large.
      • $QK^T$ is the dot product similarity between Query and Key vectors.
      • $\mathrm{softmax}(\cdot)$ normalizes the attention scores into probabilities. A minimal code sketch of this computation appears after this list.
  • ICLRec [7]: An intention contrastive learning paradigm that models latent user intentions. It fuses these intentions into an SR method using a new contrastive self-supervised learning objective. It uses K-Means to cluster item embeddings and calculate intention centers.

  • MStein [12]: A sequential recommendation method based on mutual Wasserstein discrepancy minimization. This technique helps in obtaining more fine-grained sequential patterns by measuring the "distance" between probability distributions.

  • ICSRec [29]: A sequential recommendation method that enhances its performance by incorporating subsequences and considering intention prototypes of users, constructing auxiliary objectives for intention learning.

  • BSARec [34]: A sequential recommendation method that incorporates an attentive inductive bias, suggesting that it biases the attention mechanism in a specific way to capture particular sequential patterns.

    Limitations of SR methods: They are primarily user-centric, focusing on next-item prediction and catering to historical preferences. This inherently leads to filter bubbles and information cocoons, as they reinforce existing interests rather than broadening them.

3.2.2. Proactive Recommendation (ProactRec) Methods

This field focuses on actively guiding user preferences.

  • IRN (Influential Recommender System) [46]: A Transformer-based proactive recommendation work. It generates a sequence of intermediate items using a personalized impression mask with the goal of guiding users toward a target.
    • Limitation: Assumes users passively accept all intermediate recommendations, which is often unrealistic.
  • LLM-IPP (LLMs with Influential Recommender System) [37]: A pure LLM-based proactive recommendation method that leverages LLMs to produce targeted intermediate guidance sequences. It uses LLMs for path planning and instruction following to ensure coherence and acceptability of recommendations.
    • Limitations: Resource-intensive and limited scalability. Similar to IRN, it assumes passive acceptance of intermediate items. The paper notes it doesn't show significant improvement over non-LLM methods in multi-round simulated clicks.
  • IPG (Iterative Preference Guidance) [3]: A model-agnostic post-processing method that conducts proactive recommendation. It employs a distribution-based click module to simulate user feedback.
    • Limitations: Uses a one-size-fits-all fixed threshold to measure user impact, which is an oversimplification. Its overall performance needs further improvement.
  • Conversational Recommendation Systems [10, 33, 36, 45] and Multi-modal Recommendation Approaches [40]: These are related paradigms that also involve guiding users, often through interactive dialogues or multi-modal feedback, towards stated goals. However, the proactive manner in purely sequential recommendation scenarios was rarely explored before IRN and IPG.

3.3. Technological Evolution

Recommendation systems have evolved from simple collaborative filtering and content-based filtering to sophisticated sequential recommendation models that leverage deep learning architectures like RNNs, LSTMs, and Transformers. The initial focus was purely on user-centric predictions, aiming for high accuracy in predicting the next likely item.

The limitations of this user-centric approach—namely, filter bubbles and information cocoons—led to the emergence of proactive recommendation. This shift reflects a move from purely reactive systems to more goal-oriented or provider-aligned systems. Early proactive methods (like IRN and LLM-IPP) introduced the idea of multi-round guidance paths but often made simplifying assumptions about user behavior. IPG introduced a more realistic click simulation, but still lacked sophistication in modeling user dynamics.

ITMPRec fits into this evolution by addressing the key shortcomings of previous proactive recommendation methods. It improves target item selection (pre-match), deepens the modeling of user dynamics by considering intention evolution and individual sensitivity (TIAC), and introduces a more advanced user feedback simulation using LLMs, pushing the boundary of realistic and effective proactive nudging.

3.4. Differentiation Analysis

Compared to the main methods in related work, ITMPRec introduces several core differences and innovations:

  1. Target Item Selection (Pre-match Module):

    • Previous: IRN and IPG often rely on randomly assigned target items (or a single, pre-defined target). LLM-IPP also uses targeted guidance but doesn't explicitly detail a sophisticated target selection strategy. This can lead to cold-start issues or irrelevant targets.
    • ITMPRec Innovation: ITMPRec proactively selects a category of target items using a pre-match module that considers the aggregate preferences of all users. This ensures that the chosen targets are meaningful and broadly appealing within a specific domain, making the nudging process more purposeful and successful.
  2. User Intention Modeling (Intention-induced scores):

    • Previous: Most proactive recommendation methods (e.g., IRN, IPG) do not explicitly model user intention during the round-by-round nudging process. While some SR methods like ICLRec and ICSRec model intention for next-item prediction, this concept was not integrated into proactive nudging.
    • ITMPRec Innovation: ITMPRec explicitly quantifies users' intention-level evolution using intention-induced scores. This allows the system to choose intermediate items that not only align with immediate preferences but also strategically shift the user's underlying coarse-grained intentions towards the target category.
  3. User Sensitivity to Nudging (Targeted Individual Arousal Coefficients - TIAC):

    • Previous: IPG uses a one-size-fits-all fixed threshold for simulating user clicks, implicitly assuming uniform user responses to intermediate items. IRN and LLM-IPP assume passive acceptance, entirely ignoring user response variability.
    • ITMPRec Innovation: ITMPRec introduces TIAC to model users' sensitivity or receptivity to new content. This acknowledges that each user reacts differently to external stimuli, enabling a more personalized and realistic adaptation of the preference evolution process during nudging.
  4. User Feedback Simulation (LLM Agent):

    • Previous: IRN and LLM-IPP assume passive acceptance of all intermediate items. IPG uses a distribution-based click model which is a step forward but still relatively simplistic, relying on a predefined probability function.

    • ITMPRec Innovation: ITMPRec offers a sophisticated LLM agent as a pluggable component for simulating user feedback. Leveraging the LLM's external knowledge and reasoning capabilities, this agent can model more complex and realistic user decision-making processes, moving beyond simple probability distributions and providing more accurate feedback for training and evaluating proactive recommendation strategies.

      In essence, ITMPRec moves beyond simplistic assumptions about users and target selection by incorporating a deeper understanding of user dynamics (intentions, individual sensitivities) and more realistic feedback mechanisms (LLM agent), specifically tailored for the multi-round, targeted proactive recommendation setting.

4. Methodology

4.1. Principles

The core principle of ITMPRec is to move beyond passive sequential recommendation by proactively nudging user preferences towards a predetermined category of target items over multiple interaction rounds. This is achieved by:

  1. Strategic Target Selection: Instead of random targets, ITMPRec identifies a set of target items that are relevant to a specific category and somewhat aligned with overall user preferences.

  2. Dynamic Preference Evolution: It aims to gradually modify user preferences by recommending intermediate items that act as stepping stones. This evolution is not uniform for all users; it considers individual user intentions and their unique sensitivity to new recommendations.

  3. Realistic User Feedback Simulation: Since real-time user feedback in multi-round nudging is hard to collect offline, ITMPRec relies on a sophisticated environment simulator that can realistically model user clicks, especially through the integration of Large Language Models (LLMs).

    The theoretical basis and intuition behind this approach stem from the understanding that:

  • Users' preferences are dynamic and can be influenced.
  • User behavior is driven by both explicit preferences (fine-grained) and implicit intentions (coarse-grained).
  • Individuals react differently to external stimuli (arousal theory), necessitating personalized nudging strategies.
  • LLMs possess vast external knowledge and reasoning capabilities that can simulate complex human decision-making more accurately than simple statistical models.

4.2. Environment Simulator

To evaluate proactive recommendation methods in an offline setting, ITMPRec utilizes an environment simulator to generate realistic user feedback over multiple rounds. The simulator captures three main aspects: user and item embeddings, preference evolution, and click modeling.

4.2.1. User and Item Embeddings

Initially, ITMPRec uses a pre-trained graph-based recommendation method (specifically GraphAU [42]) to generate initial user embeddings and item embeddings. These embeddings capture the latent features of users and items in a $d$-dimensional space.

  • $\hat{e}_u^0 \in \mathbb{R}^d$: The initial pre-trained embedding vector for user $u$.
  • $\hat{e}_i^0 \in \mathbb{R}^d$: The initial pre-trained embedding vector for item $i$.

4.2.2. Preference Evolution

The user's preference is not static; it evolves after interacting with new items. If a user $u$ positively interacts with an item $z$ in round $r$, their embedding is updated to reflect this new preference. The paper models this preference evolution as a weighted sum of the user's current embedding and the interacted item's embedding.

The user $u$'s embedding after interaction with item $z$ in round $r$ is updated as follows: $ \hat{e}_u^{r+1} \gets \beta_u^r \cdot \hat{e}_u^r + (1 - \beta_u^r) \cdot \hat{e}_z^r $ Where:

  • $\hat{e}_u^{r+1}$: The updated embedding for user $u$ for the next round ($r+1$).
  • $\hat{e}_u^r$: The current embedding for user $u$ in round $r$.
  • $\hat{e}_z^r$: The embedding of the item $z$ that user $u$ interacted with in round $r$.
  • $\beta_u^r$: A targeted individual arousal coefficient for user $u$ in round $r$. This coefficient, which ranges between 0 and 1, controls the degree of preference evolution. A higher $\beta_u^r$ means the user's preference changes less after interacting with item $z$ (more weight on $\hat{e}_u^r$), while a lower $\beta_u^r$ means the preference changes more (more weight on $\hat{e}_z^r$). This coefficient is specific to each user and round, originating from the TIAC module (explained in Section 4.5).

4.2.3. Click Model (Traditional)

The click model simulates the user's decision to interact with a recommended item. It estimates the interaction probability between a user $u$ and an item $z$. This traditional model is often distribution-based, assuming a probabilistic outcome based on the similarity between user and item embeddings.

The interaction probability (or activation score) between user $u$ and item $z$ in round $r$ is calculated, and a binary click decision $a_u^r$ is then sampled from it: $ a_u^r \sim \mathrm{Bernoulli}\left( \sigma\left( w \left( (\hat{e}_u^r)^T \cdot \hat{e}_z^r - b \right) \right) \right) $ Where:

  • $a_u^r$: A binary value (1 for a user click, 0 otherwise), representing the outcome of the interaction.

  • $\sigma(\cdot)$: The sigmoid function, which squashes the input to a value between 0 and 1, representing a probability.

  • $(\hat{e}_u^r)^T \cdot \hat{e}_z^r$: The dot product of the user's current embedding and the item's embedding, indicating their similarity or predicted relevance.

  • $w$: A scaling (slope) parameter of the click model.

  • $b$: A bias term.

    The paper also proposes an LLM-agent click model as an alternative, discussed later in Section 4.6.

4.3. Pre-match Module

The pre-match module addresses the limitation of random target item assignment in previous proactive recommendation methods. Instead, it aims to select a set of target items that are meaningful within a specific category and have a collective appeal to users. This avoids cold-start issues with target items and aligns better with content provider goals.

The process involves two main steps:

  1. Calculate Overall User Preference for Candidate Items: For a given pool of candidate target items (e.g., all items within a specific category), the module calculates a collective preference score from all users for each candidate item: $ L_{N_{can}} = \sum_{u=1}^U (e_l^T \cdot e_u^0), \quad l \in N_{can} $ Where:
    • $L_{N_{can}}$: A list of scores, one for each candidate item in the candidate target pool.
    • $U$: The total number of users.
    • $e_l^T \cdot e_u^0$: The dot product (similarity) between a candidate item $l$'s initial embedding ($e_l$) and a user $u$'s initial embedding ($e_u^0$). This represents user $u$'s initial preference for item $l$.
    • $l \in N_{can}$: Indicates that $l$ is an item from the candidate target pool, which has a size of $N_{can}$.
  2. Select Top Target Items: From the scored candidate items, the module selects the top $N_{tar}$ items (where $N_{tar} \leq N_{can}$) to be the ultimate target items for guidance: $ L_{N_{tar}} = cut\{sort(L_{N_{can}}, \searrow), N_{tar}\} $ Where:
    • $L_{N_{tar}}$: The final list of $N_{tar}$ selected target items.

    • $sort(X, \searrow)$: A function that sorts the list $X$ in descending order.

    • $cut\{X, num\}$: A function that returns the first $num$ elements from the sorted list $X$.

    • $N_{tar}$: A pre-defined number, representing the desired count of target items (e.g., 20 or 50).

      This pre-match setting ensures that the subsequent proactive recommendation process starts with target items that are collectively preferred by a wider user base, making successful nudging more feasible.

4.4. Intention-induced Scores

To generate effective intermediate recommendations, ITMPRec considers not only direct user-item similarity but also the underlying user intentions. This section details how intention-induced scores are calculated and integrated into the recommendation process.

4.4.1. Basic Recommendation Score

The fundamental tendency of interaction between user $u$ and item $i$ in round $r$ is quantified using the inner product of their embeddings: $ score_{(u,i)}^r = (e_u^r)^T \cdot e_i $ Where:

  • $score_{(u,i)}^r$: The basic relevance score between user $u$ and item $i$ in round $r$.
  • $e_u^r$: The current embedding of user $u$ in round $r$.
  • $e_i$: The embedding of candidate item $i$.

4.4.2. Post-processing Strategy with Nudging Aggressiveness

Following previous work [3], the system also considers the degree of nudging aggressiveness towards a target item $e_j$. This is formulated as: $ l_{uij}^r = score_{(u,i)}^r \cdot nudge_{(u,i,j)}^r $ Where:

  • $l_{uij}^r$: The overall score for recommending intermediate item $i$ to user $u$ to guide towards target item $j$ in round $r$.
  • $nudge_{(u,i,j)}^r$: A term representing the nudging aggressiveness, i.e., how much a user's preference ($e_u$) would shift towards the target item $e_j$ if they interacted with intermediate item $i$. It is defined as the difference in the target item's similarity to the user's future embedding ($e_u^{(r+1)}$) versus their current embedding ($e_u^{(r)}$): $ nudge_{(u,i,j)}^r = e_j^T e_u^{(r+1)} - e_j^T e_u^{(r)} $

In the previous work [3], it was assumed that the user representation transition from round $r$ to $r+1$ follows a linear pattern: $ e_u^{r+1} = \omega e_u^r + (1 - \omega) e_i $, where $\omega$ is a coefficient representing the weight of the old user representation. Substituting this into the $nudge_{(u,i,j)}^r$ term:

$ nudge_{(u,i,j)}^r = e_j^T (\omega e_u^r + (1 - \omega) e_i) - e_j^T e_u^r = (1 - \omega) (e_j^T e_i - e_j^T e_u^r) = (1 - \omega) (e_i - e_u^r)^T e_j $

Then, substituting this back into the overall score: $ l_{uij}^r = ( (e_u^r)^T \cdot e_i ) \cdot (1 - \omega) (e_i - e_u^r)^T e_j $. The factor $(1 - \omega)$ is a constant (assuming a uniform $\omega$ across users, as in previous work) and does not influence which intermediate item yields the maximum score, so it can be omitted. This leads to the simplified formulation: $ l_{uij}^r = ( (e_u^r)^T \cdot e_i ) \cdot (e_i - e_u^r)^T e_j $. This formula consists of two parts: the interaction-tendency term $((e_u^r)^T \cdot e_i)$ and the targeted-guidance term $(e_i - e_u^r)^T e_j$. The interaction-tendency part favors items similar to the user's current preference, while the targeted-guidance part favors items that move the user's preference closer to the target item $e_j$.

4.4.3. Intention-level Score

Recognizing that previous studies overlooked intention-level dynamics, ITMPRec introduces this component. It first identifies the user's current intention and then assesses the intention-level similarity with candidate items.

  1. User Intention-level Vector: Given a global intention matrix $C \in \mathbb{R}^{N_C \times d}$ (where $N_C$ is the number of intentions, and each row $c_m$ is an intention vector) from a pre-trained ICLRec model, the user's current intention-level vector $c_u^r$ is identified as the closest intention prototype to their current user embedding $e_u^r$: $ c_u^r = \underset{c_m \in \{c_1, \ldots, c_{N_C}\}}{argmin} (||c_m - e_u^r||_2^2) $ Where:
    • $c_u^r$: The intention-level vector representing user $u$'s dominant intention in round $r$.
    • $c_m$: One of the $N_C$ global intention vectors.
    • $||\cdot||_2^2$: The squared Euclidean distance, used to find the closest intention vector.
  2. Item Intention-level Vector: Similarly, each item $e_i$ is also projected into the intention space to find its closest intention prototype: $ c_i = \underset{c_m \in \{c_1, \ldots, c_{N_C}\}}{argmin} (||c_m - e_i||_2^2) $ Where:
    • $c_i$: The intention-level vector representing item $i$'s dominant intention.
  3. Intention-level Similarity Score: The intention-level score between user $u$ and candidate item $i$ is then calculated using the dot product of their respective intention vectors: $ c_{score} = (c_u^r)^T \cdot c_i $

4.4.4. Final Combined Score

To incorporate intention-induced scores, the interaction-tendency term in the overall score (Equation 9) is modified. It now considers both the direct representational similarity between user uu and item ii and their similarity in the intention space. The targeted-guidance term remains unchanged.

The final formulation for the overall score $l_{uij}^r$ is: $ l_{uij}^r = \left( (e_u^r)^T \cdot e_i + \lambda \, (c_u^r)^T \cdot c_i \right) \cdot (e_i - e_u^r)^T e_j $ Where:

  • $\lambda$: A hyperparameter that controls the weight given to the intention-induced score (the second term in the parentheses) relative to the direct interest score (the first term). A higher $\lambda$ means more emphasis on intention alignment.

  • The first part, $(e_u^r)^T \cdot e_i + \lambda \, (c_u^r)^T \cdot c_i$, represents the enhanced interaction tendency incorporating intention.

  • The second part, $(e_i - e_u^r)^T e_j$, remains the targeted-guidance term.

    The item $i$ that yields the largest score $l_{uij}^r$ will be selected as the intermediate item to be recommended to the user in round $r$, and this item then serves as input to the click model.

4.5. Targeted Individual Arousal Coefficients (TIAC)

Previous work often assumed a uniform preference evolution coefficient (the $\omega$ in the linear update rule, or $\beta$ in this paper) for all users, implying everyone responds to new content similarly. ITMPRec challenges this by introducing Targeted Individual Arousal Coefficients (TIACs), which personalize the degree of preference evolution based on each user's unique receptivity to new stimuli and their relationship with the target item.

The TIAC ($\beta_u^r$) for user $u$ in round $r$ is calculated as follows:

  1. Historical Preference Variance (Short-term Curiosity): First, ITMPRec identifies a set of items that represent the user's short-term preferences or "curiosity" based on their current embedding and items they have not yet interacted with: $ \mathbf{hp}^{r-1}(u) = Top\mathcal{Q}\{\phi(e_u^{r-1}, e_{idx})\}, \quad e_{idx} \in E \backslash \{S_u^{r-1}\} $ Where:

    • $\mathbf{hp}^{r-1}(u)$: A set of $Q$ item embeddings representing user $u$'s short-term preferences or items they are currently "curious" about, derived from their state in round $r-1$.
    • $\phi(x, y)$: The cosine similarity between two vectors $x$ and $y$. This measures how similar $e_u^{r-1}$ is to other items.
    • $E \backslash \{S_u^{r-1}\}$: The set of all items $E$ excluding those already in user $u$'s historical sequence $S_u^{r-1}$ (i.e., items the user has already interacted with).
    • $Top\mathcal{Q}\{\cdot\}$: A function that returns the top $Q$ items from the specified set that have the highest cosine similarity with the user's embedding $e_u^{r-1}$.
    • $Q$: A hyperparameter representing the capacity of short-term preferences or curiosity.
  2. Arousal Value Calculation: Based on this set of short-term preferences $\mathbf{hp}^{r-1}(u)$, an arousal value is computed, reflecting how much the user is "aroused" by the target item $e_j$ given their current curiosities. This arousal value then becomes the TIAC $\beta_u^r$: $ \beta_u^r = \mathrm{POOL}(\phi(\mathbf{hp}^{r-1}(u), e_j)) $ Where:

    • $\beta_u^r$: The Targeted Individual Arousal Coefficient for user $u$ in round $r$.

    • $\mathrm{POOL}(\cdot)$: A pooling operation (e.g., average pooling) applied to the cosine similarities. This averages the similarity between the user's top $Q$ curiosities and the target item $e_j$. Other pooling methods (max, sum) could also be used.

    • $\phi(\mathbf{hp}^{r-1}(u), e_j)$: The cosine similarity between each item in the user's short-term preference set $\mathbf{hp}^{r-1}(u)$ and the target item $e_j$.

      This calculated $\beta_u^r$ is then fed into the preference evolution module (Section 4.2.2), allowing for personalized preference updates. The actual impact depends on how $\beta_u^r$ enters the preference update equation: a higher $\beta_u^r$ means the user's current preference $\hat{e}_u^r$ retains more weight, implying less change from the current state, while a lower $\beta_u^r$ implies a stronger shift towards the interacted item $\hat{e}_z^r$.

4.6. LLM-based Click Simulation Agent

ITMPRec offers an alternative to the traditional distribution-based click model: an LLM-based click simulation agent. This agent leverages the external knowledge and reasoning capabilities of Large Language Models (LLMs) to simulate user feedback more realistically.

The LLM agent (e.g., ChatGLM3) takes the user's current historical sequence and the recommended intermediate item as input and outputs a binary click decision.

The action of user $u$ (click or no click) is obtained as: $ a_u^r = LLM(\mathcal{P}_F, \mathcal{H}_u^r, NAMES(i_u^r)) $ Where:

  • $a_u^r$: The binary action (0 for no click, 1 for click) generated by the LLM for user $u$ in round $r$.

  • $LLM(\cdot)$: Represents the Large Language Model function.

  • $\mathcal{P}_F$: The task instruction or prompt provided to the LLM. This includes few-shot examples (demonstrations of desired behavior) and a prompt template that guides the LLM to produce a binary output (0 or 1).

  • $\mathcal{H}_u^r$: The user's historical interaction sequence up to round $r$.

  • $NAMES(i_u^r)$: The name or description of the intermediate item $i_u^r$ recommended to user $u$ in round $r$. The LLM can use this textual information, along with its external knowledge, to make a more informed decision.

    Based on the LLM's output, the user's historical sequence for the next round is updated: $ \mathcal{H}_u^{r+1} = \left\{ \begin{array}{ll} CONCAT(\mathcal{H}_u^r, NAMES(i_u^r)), & \mathrm{if}\; a_u^r = 1 \\ \mathcal{H}_u^r, & \mathrm{if}\; a_u^r = 0 \end{array} \right. $ Where:

  • $\mathcal{H}_u^{r+1}$: The updated historical sequence for user $u$ for the next round.

  • $CONCAT(\cdot)$: A function to concatenate the new item's name to the existing historical sequence if the user clicked it.

4.6.1. Discussion: Traditional vs. LLM Agent Click Model

  • Traditional Click Model: Assumes that higher similarity/score directly translates to a higher probability of user acceptance. This is a simplification and may not capture the nuances of human behavior.
  • LLM Agent Click Model: Offers richer interpretability and external knowledge utilization. LLMs can consider various factors beyond simple similarity, such as item attributes, user's inferred mood, contextual information, and their own reasoning to decide whether a user would click an item. This makes the simulation more realistic for complex decision-making. The LLM can simulate intricate decision-making factors, which is valuable in today's era of complex user behaviors.

4.7. Overall Algorithm

The complete process of ITMPRec is summarized in Algorithm 1, outlining the multi-round nudging for each user and target item.

Algorithm 1: ITMPRec

Input:

  • $\mathcal{U}$: set of users.
  • $\mathcal{I}$: set of items.
  • $s_u$: historical sequence for each user $u$.
  • $R$: total number of nudging rounds.
  • $B$: batch size for processing users.

Output:

  • $P_{uj}^r$: The nudging path (sequence of clicked intermediate items) for each user $u$ and target item $j$.

Procedure:

  1. Initialize Target Items: Determine the set of target items to be nudged by applying the pre-match module (Equation 7). This step is performed once before the multi-round process begins.
  2. Iterate Through Target Items: For each target item $j$ in the selected set ($N_{tar}$):
    a. Initialize Nudging Path: Initialize an empty nudging path $P_{uj}^0 = []$ for each user $u$ and the current target item $j$. This path will store the intermediate items clicked by user $u$ while being nudged towards $j$.
    b. Multi-Round Nudging: For each round $r$ from 0 to $R-1$:
      i. Update User Representation: Get the current user representation $e_u^r$ based on the historical sequence $S_u^r$ using the sequence encoder (Equation 5). If $r = 0$, $S_u^0$ is the initial historical sequence.
      ii. Initialize Intermediate Lists: Create empty lists $intermids_r$ (to store all recommended intermediate items in this round) and $recs_u^r$ (to store the specific recommended item for each user).
      iii. Batch Processing for Users: Iterate through users in batches of size $B$:
        1. Calculate Intention-level Score: For each user $u$ in the current batch, calculate the intention-level score (Equation 12) with candidate intermediate items using the current intention vector $c_u^r$.
        2. Select Intermediate Item: Determine the optimal intermediate item $rec_u^r$ for user $u$ for this round by maximizing the final combined score $l_{uij}^r$ (Equation 13). This item is intended to nudge user $u$ towards target $j$.
        3. Calculate TIAC: Compute the targeted individual arousal coefficient $\beta_u^r$ for user $u$ (Equation 15).
        4. Store Recommended Item: Add $rec_u^r$ to the $recs_u^r$ list.
      iv. Simulate Clicks: Simulate user clicks on the recommended items in $recs_u^r$, using either the distribution-based click model (Section 4.2.3) or the LLM-based click simulation agent (Section 4.6), and consolidate the click results into $intermids_r$.
      v. Update User State and Path: For each user $u$ and the corresponding click result in $intermids_r$: if user $u$ clicked the recommended item, update the historical sequence $S_u^{r+1}$ by concatenating the clicked item $rec_u^r$, update the user embedding using the arousal coefficient $\beta_u^r$ (the preference evolution formula from Section 4.2.2), and extend the nudging path $P_{uj}^r$ with the clicked item.
  3. Return Nudging Paths: After all rounds and all target items are processed, return the complete nudging paths $P_{uj}^r$ for each user and target.

Key points from the algorithm:

  • The pre-match module is a one-time setup (Line 1).
  • The core of the algorithm (Lines 2-21) iterates through each target item and then through multiple rounds.
  • For each round, it calculates personalized scores (Lines 8-9) and arousal coefficients (Line 10).
  • The click simulation (Line 12) is crucial for determining actual user interaction and subsequent preference evolution (Line 17).
  • The nudging path $P_{uj}^r$ records the actual sequence of intermediate items that successfully led a user closer to a target.

5. Experimental Setup

5.1. Datasets

The experiments were conducted on four publicly available datasets, chosen to represent different domains and scales of recommendation scenarios:

  1. ML-100k (MovieLens 100k): A movie rating dataset.
    • Domain: Movies
    • Characteristics: Relatively dense, smaller scale.
    • Scale: 943 users, 1,348 items, 98,704 interactions.
    • Density: 7.7649%
    • Average Items per User: 104.67
  2. Lastfm: A music listening dataset.
    • Domain: Music artists/tracks
    • Characteristics: Moderately dense.
    • Scale: 945 users, 2,782 items, 246,368 interactions.
    • Density: 9.3712%
    • Average Items per User: 36.78
  3. Steam: A video game platform dataset.
    • Domain: Video games
    • Characteristics: Sparser, larger number of users.
    • Scale: 12,611 users, 2,017 items, 220,100 interactions.
    • Density: 0.9686%
    • Average Items per User: 19.54
  4. Douban_movie: A movie dataset from Douban (a Chinese social networking service).
    • Domain: Movies

    • Characteristics: Sparser, very large number of items and interactions.

    • Scale: 2,623 users, 20,527 items, 1,161,110 interactions.

    • Density: 2.1565%

    • Average Items per User: 442.66

      The following are the data statistics from Table 2 of the original paper:

      Dataset ML-100k Lastfm Steam Douban_movie
      #Users 943 945 12,611 2,623
      #Items 1,348 2,782 2,017 20,527
      #Interactions 98,704 246,368 220,100 1,161,110
      Density 7.7649% 9.3712% 0.9686% 2.1565%
      #Avg. Items per User 104.67 36.78 19.54 442.66

These datasets were chosen because they are widely used benchmarks in recommendation systems research, allowing for fair comparison with existing methods. They cover a range of density and scale, which helps validate the method's robustness across different data characteristics. For instance, ML-100k and Lastfm are denser, while Steam and Douban_movie are sparser, posing different challenges.

5.2. Evaluation Metrics

The performance of ITMPRec and baseline models is evaluated using three metrics, specifically chosen to assess the effectiveness of proactive recommendation tasks over multiple rounds. The evaluation is conducted at different stages ($P \in \{5, 10, 15, 20\}$ rounds) of the recommendation process.

  1. HitRatio (HR@P):

    • Conceptual Definition: HitRatio measures the proportion of users who positively interact with at least one recommended intermediate item within $P$ proactive recommendation cycles. It quantifies the system's ability to engage users during the nudging process.
    • Mathematical Formula: $ HR@P = \frac{1}{P |\mathcal{U}|} \sum_{p=1}^P \sum_{u \in \mathcal{U}} a_{up} $
    • Symbol Explanation:
      • $HR@P$: Hit Ratio at $P$ rounds.
      • $P$: The number of proactive recommendation cycles (rounds) considered for evaluation.
      • $|\mathcal{U}|$: The total number of users in the dataset.
      • $a_{up}$: A binary value (0 or 1) representing the feedback from the click simulator for user $u$ in round $p$: $a_{up} = 1$ if the user clicked the recommended item in round $p$, and $a_{up} = 0$ otherwise.
      • $\sum_{p=1}^P \sum_{u \in \mathcal{U}} a_{up}$: The total number of positive interactions (clicks) across all users and all $P$ rounds.
  2. Increase of Interest (IoI@P):

    • Conceptual Definition: Increase of Interest quantifies how much a user's interest in the target item has increased after $P$ rounds of proactive recommendations. It directly measures the effectiveness of the nudging process in shifting user preferences towards the desired target. A higher value indicates better guidance.
    • Mathematical Formula: $ IoI@P = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} (\hat{e}_j^T \cdot \hat{e}_u^P - \hat{e}_j^T \cdot \hat{e}_u^0) $
    • Symbol Explanation:
      • $IoI@P$: Increase of Interest at $P$ rounds.
      • $|\mathcal{U}|$: The total number of users.
      • $\hat{e}_j$: The embedding of the target item. This embedding remains constant throughout the process.
      • $\hat{e}_u^P$: The embedding of user $u$ after $P$ rounds of proactive recommendations (i.e., after their preference has potentially evolved).
      • $\hat{e}_u^0$: The initial embedding of user $u$ at the start of the guidance phase (before any nudging).
      • $\hat{e}_j^T \cdot \hat{e}_u^P$: The dot product (similarity) between the target item embedding and the user's embedding after $P$ rounds.
      • $\hat{e}_j^T \cdot \hat{e}_u^0$: The dot product (similarity) between the target item embedding and the user's initial embedding.
      • The difference $(\hat{e}_j^T \cdot \hat{e}_u^P - \hat{e}_j^T \cdot \hat{e}_u^0)$ measures the change in user $u$'s interest in the target item. A positive value means increased interest.
  3. Increase of Ranking (IoR@P):

    • Conceptual Definition: Increase of Ranking measures the improvement in the ranking position of the target item among all other items, with respect to a user's preference, after $P$ rounds. It indicates how much closer the target item has become to being a top recommendation for the user. A higher value means the target item is ranked much higher after nudging.
    • Mathematical Formula: $ IoR@P = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \left( \mathsf{Ran}\{\hat{e}_j \mid \hat{e}_u^0\} - \mathsf{Ran}\{\hat{e}_j \mid \hat{e}_u^P\} \right) $
    • Symbol Explanation:
      • $IoR@P$: Increase of Ranking at $P$ rounds.
      • $|\mathcal{U}|$: The total number of users.
      • $\mathsf{Ran}\{\hat{e}_j \mid \hat{e}_u^*\}$: A function that returns the discrete ranking of the target item $\hat{e}_j$ among all items, based on their similarity (e.g., dot product) to the user's embedding $\hat{e}_u^*$. A lower rank value means a higher position (e.g., rank 1 is the most preferred).
      • $\mathsf{Ran}\{\hat{e}_j \mid \hat{e}_u^0\}$: The initial ranking of the target item for user $u$.
      • $\mathsf{Ran}\{\hat{e}_j \mid \hat{e}_u^P\}$: The ranking of the target item for user $u$ after $P$ rounds of nudging.
      • A positive value for IoR@P indicates that the target item's ranking has improved (i.e., its rank number has decreased, meaning it moved higher up the list).

5.3. Baselines

The proposed ITMPRec method is compared against eight state-of-the-art baseline models, categorized into Sequential Recommendation (SR) methods and Proactive Recommendation (ProactRec) methods. For fairness, all methods use a distribution-based click simulator unless explicitly stated (e.g., for LLM-IPP under its own assumptions or for ITMPRec when combined with its LLM-agent click model).

5.3.1. Sequential Recommendation (SR) Methods

These methods are designed for next-item prediction and typically optimize for user-centric preferences.

  • SASRec [18]: A foundational self-attentive sequential recommendation model. It represents sequences as item embeddings and uses a Transformer encoder to capture sequential patterns.
  • ICLRec [7]: An intention contrastive learning model for sequential recommendation. It explicitly models user intentions using a contrastive learning objective, improving user representation.
  • MStein [12]: A sequential recommendation method that minimizes mutual Wasserstein discrepancy to capture fine-grained sequential patterns.
  • ICSRec [29]: A sequential recommendation method that uses intent contrastive learning with cross subsequences to learn better representations.
  • BSARec [34]: An attentive inductive bias based sequential recommendation method, aiming to improve attention mechanisms for sequential modeling.

5.3.2. Proactive Recommendation (ProactRec) Methods

These methods explicitly aim to guide user preferences towards a target.

  • IRN (Influential Recommender System) [46]: A Transformer-based proactive recommendation method. It generates an entire sequence of middle items in one go, with the assumption that users will passively accept all of them.

  • IPG (Iterative Preference Guidance) [3]: A model-agnostic post-processing framework for proactive recommendation. It uses an iterative approach to guide preferences and includes a distribution-based click module to simulate user responses.

  • LLM-IPP (LLMs with Influential Recommender System) [37]: A proactive recommendation method that uses Large Language Models (specifically GLM-4-Flash in the experiments) for path planning and instruction following to generate guidance sequences. The paper notes its high resource consumption and its original assumption of passive user acceptance. In the context of ITMPRec's evaluation, LLM-IPP was tested under the user click simulation settings for fair comparison.

    These baselines were chosen to represent the current state-of-the-art in both sequential and proactive recommendation, covering various architectural approaches (attention, contrastive learning) and different levels of sophistication in handling proactive guidance.

6. Results & Analysis

6.1. Core Results Analysis

The experimental results demonstrate the superiority of ITMPRec in proactive recommendation tasks across four datasets. The evaluation primarily focuses on HitRatio (HR@P), Increase of Interest (IoI@P), and Increase of Ranking (IoR@P) at different rounds ($P \in \{5, 10, 15, 20\}$).

The following are the results from Table 4 of the original paper:

Dataset Method HR@5 HR@10 HR@15 HR@20 IoI@5 IoI@10 IoI@15 IoI@20 IoR@5 IoR@10 IoR@15 IoR@20
ML-100k SASRec 0.3994 0.3991 0.3980 0.3979 0.0455 0.0866 0.1121 0.1259 -0.4036 -0.9826 -1.2254 -1.1867
ICLRec 0.4124 0.4117 0.4102 0.4083 0.0394 0.0744 0.0952 0.1052 0.2398 0.2578 0.2476 0.0111
MStein 0.3134 0.3125 0.3118 0.3114 0.0074 0.0141 0.0204 0.0264 -0.1127 -0.1355 -0.022 0.12
BSARec 0.3705 0.3702 0.3692 0.3689 0.0416 0.0814 0.1131 0.1365 -0.3646 -0.7027 -1.1309 -1.5034
ICSRec 0.3642 0.3636 0.3628 0.3621 0.0412 0.0866 0.1231 0.1503 -0.0593 0.0346 0.2145 0.2695
IRN 0.4274 0.4270 0.4250 0.4237 0.0299 0.0578 0.0867 0.0912 0.0518 0.2712 1.3507 1.7407
IPG 0.3866 0.3891 0.3895 0.3861 0.1520 0.2620 0.3409 0.3898 33.2767 68.7030 96.4608 111.8751
LLM-IPP 0.3695 0.3680 0.3658 0.3659 0.0450 0.0865 0.1184 0.1412 0.6572 1.1868 1.3978 1.3998
ITMPRec w/o P 0.4029 0.4027 0.4066 0.4067 0.2353 0.3951 0.4496 0.4622 63.9955 113.2598 128.8455 131.4221
ITMPRec 0.4064 0.4024 0.4040 0.4016 0.2433 0.3998 0.4556 0.4690 70.0011 120.6690 136.8670 139.6954
Lastfm SASRec 0.3263 0.3254 0.3248 0.3243 0.0094 0.0174 0.0250 0.0311 -0.1749 -0.6057 -0.7632 -1.1204
ICLRec 0.4137 0.4129 0.4111 0.4106 0.0083 0.0102 0.0066 0.0001 0.1126 0.4359 0.8594 0.9521
MStein 0.3289 0.3281 0.3275 0.3270 -0.0024 0.0023 0.0139 0.0240 -0.6893 -0.8823 -0.9250 -0.9139
BSARec 0.3334 0.3327 0.3320 0.3315 0.0193 0.0297 0.0400 0.0493 0.5216 0.7023 0.6054 0.5891
ICSRec 0.3359 0.3351 0.3345 0.3369 0.0115 0.0251 0.0362 0.0458 -0.1695 -0.0528 0.0099 0.0688
IRN 0.4028 0.4018 0.4008 0.4002 0.0101 0.0203 0.0400 0.0525 0.0185 0.0916 1.8248 3.2734
IPG 0.3516 0.3528 0.3490 0.3520 0.1791 0.2976 0.3879 0.4544 25.2901 52.9695 80.5057 100.1863
ITMPRec w/o P 0.4163 0.4110 0.4122 0.4113 0.2925 0.5218 0.5958 0.6161 60.5283 120.0176 137.4319 141.8555
ITMPRec 0.4129 0.4153 0.4115 0.4135 0.3943 0.5938 0.6486 0.6614 96.9189 146.0627 159.1564 161.7352
Steam SASRec 0.4271 0.4263 0.4257 0.4251 0.0486 0.0991 0.1320 0.1521 -0.2202 0.3557 1.1601 1.6881
ICLRec 0.3886 0.3878 0.3872 0.3867 0.0583 0.1140 0.1571 0.1898 0.8334 2.0505 3.3948 4.6866
MStein 0.3929 0.3921 0.3914 0.3909 0.0584 0.1166 0.1620 0.1942 1.1366 2.4076 2.7133 2.5779
BSARec 0.4096 0.4089 0.4083 0.4078 0.0608 0.1290 0.1760 0.2050 0.1626 2.0218 4.0323 5.6237
ICSRec 0.4005 0.3998 0.3991 0.3986 0.0597 0.1223 0.1656 0.1927 0.0492 0.7546 1.6664 2.0173
IRN 0.4205 0.4195 0.4188 0.4183 0.0418 0.0839 0.1628 0.2016 0.3826 0.5860 2.6768 6.6263
IPG 0.3921 0.3907 0.3898 0.3895 0.1036 0.1777 0.2245 0.2554 17.6234 27.6880 33.4087 37.4944
ITMPRec w/o P 0.3911 0.3899 0.3915 0.3907 0.1876 0.2654 0.2984 0.3108 46.7208 55.9344 58.9080 59.8572
ITMPRec 0.3918 0.3937 0.3930 0.3923 0.2192 0.2955 0.3239 0.3336 55.3553 66.6745 70.6409 71.6806
Douban_movie SASRec 0.3673 0.3669 0.3662 0.3655 -0.0021 -0.0042 -0.0046 -0.0040 0.0888 0.2017 0.3321 0.5044
ICLRec 0.3277 0.3268 0.3261 0.3256 0.0002 -0.0017 -0.0009 0.0019 0.0062 0.0043 0.0475 0.1750
MStein 0.3174 0.3166 0.3159 0.3154 0.0030 0.0076 0.0128 0.0176 0.0180 0.0636 0.1195 0.2197
BSARec 0.4217 0.4215 0.4208 0.4200 -0.0046 -0.0095 -0.0130 -0.0150 0.0028 -0.0768 -0.1460 -0.2929
ICSRec 0.3304 0.3296 0.3289 0.3284 0.0019 0.0016 0.0037 0.0066 0.0858 0.1511 0.2715 0.4051
IRN 0.3758 0.3753 0.3744 0.3739 0.0037 0.0069 0.0052 0.0010 0.1676 0.2913 0.4284 0.6543
IPG 0.3310 0.3323 0.3310 0.3303 0.0849 0.1418 0.1885 0.2259 13.3451 21.2825 30.0722 39.0427
ITMPRec w/o P 0.3439 0.3483 0.3422 0.3389 0.1465 0.2222 0.2798 0.3201 33.6715 48.5319 62.1714 73.9921
ITMPRec 0.3366 0.3363 0.3361 0.3362 0.1619 0.2408 0.2960 0.3374 36.0797 50.5707 65.3341 77.2108

6.1.1. Comparison with Traditional SR Methods

  • Proactive vs. SR: Proactive recommendation methods (IRN, IPG, ITMPRec) generally outperform traditional SR methods (SASRec, ICLRec, MStein, BSARec, ICSRec) in IoI and IoR metrics. This is a critical finding, as IoI and IoR directly measure the success of proactive guidance towards a target. Traditional SR methods often show low or even negative IoI and IoR values (e.g., SASRec on ML-100k and Lastfm, BSARec on Douban_movie), indicating that without explicit guidance, user preferences either do not shift towards the target or even diverge.
  • HR@P: While proactive methods might show a slightly lower HR@P compared to some SR methods (e.g., SASRec on ML-100k, IRN on Steam), this is an acceptable trade-off. HR@P measures engagement with any intermediate item, whereas IoI and IoR measure engagement specifically towards the target. The goal of proactive recommendation is not just clicks, but guided clicks. The decrease in HR@P is often insignificant, suggesting that the system can still generate engaging intermediate items while steering preferences.

6.1.2. Comparison with Other Proactive Recommendation Methods

  • IRN's Limitation: IRN shows limited IoI and IoR improvements. This confirms the paper's hypothesis that IRN's assumption of passive user acceptance (generating a full path upfront without accounting for real-time feedback) leads to suboptimal performance when a simulated click feedback mechanism is introduced, as users might not accept the entire pre-planned sequence.
  • LLM-IPP's Underperformance: LLM-IPP, despite being an LLM-based method, underperforms significantly in IoI@P and IoR@P under the user click simulation settings. The paper also notes its high time consumption (over 50 hours on ML-100k, versus under one hour for the other methods), which limits its practical applicability and suggests that simply using an LLM, without ITMPRec's specific nudging mechanisms, is not sufficient.
  • ITMPRec vs. IPG: ITMPRec demonstrates a substantial improvement over IPG, which is identified as the second-best proactive recommendation method. ITMPRec achieves average enhancements of 36.47% in IoI@20 and 68.80% in IoR@20 across the four datasets. This highlights the effectiveness of ITMPRec's novel components (pre-match, intention-induced scores, TIAC) in more effectively guiding users towards target categories and single items.
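These averages can be re-derived from the IoI@20 and IoR@20 columns of Table 4; the short check below reproduces them.

```python
# Relative improvement of ITMPRec over IPG at round 20, per dataset,
# using the (ITMPRec, IPG) value pairs from Table 4.
ioi = {"ML-100k": (0.4690, 0.3898), "Lastfm": (0.6614, 0.4544),
       "Steam": (0.3336, 0.2554), "Douban_movie": (0.3374, 0.2259)}
ior = {"ML-100k": (139.6954, 111.8751), "Lastfm": (161.7352, 100.1863),
       "Steam": (71.6806, 37.4944), "Douban_movie": (77.2108, 39.0427)}

def avg_gain(table):
    gains = [(ours - ipg) / ipg for ours, ipg in table.values()]
    return 100 * sum(gains) / len(gains)

print(f"avg IoI@20 gain: {avg_gain(ioi):.2f}%")  # -> ~36.5%, matching the reported 36.47%
print(f"avg IoR@20 gain: {avg_gain(ior):.2f}%")  # -> ~68.8%, matching the reported 68.80%
```

The per-dataset gains range from roughly 20% (ML-100k) to 49% (Douban_movie) for IoI@20, and from roughly 25% (ML-100k) to 98% (Douban_movie) for IoR@20.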

6.1.3. ITMPRec's Overall Performance

  • ITMPRec consistently achieves the best performance in IoI@P and IoR@P across all datasets, confirming its ability to effectively shift user preferences towards target items.
  • The HR@P scores for ITMPRec are competitive, demonstrating that the system can still generate engaging intermediate recommendations while fulfilling its proactive goal.

6.2. Ablation Studies / Parameter Analysis

The following are the results of ablation studies on four datasets from Table 3 of the original paper:

| Dataset | Ablation | HR@20 | IoI@20 | IoR@20 |
|---|---|---|---|---|
| ML-100k | w/o P | 0.4067 | 0.4622 | 131.4221 |
| ML-100k | w/o IIS | 0.3878 | 0.4596 | 136.6786 |
| ML-100k | w/o TIAC | 0.3823 | 0.4006 | 118.3061 |
| ML-100k | ITMPRec | 0.4016 | 0.4690 | 139.6954 |
| Lastfm | w/o P | 0.4113 | 0.6161 | 141.8555 |
| Lastfm | w/o IIS | 0.3324 | 0.4030 | 97.2408 |
| Lastfm | w/o TIAC | 0.3758 | 0.5149 | 116.5403 |
| Lastfm | ITMPRec | 0.4135 | 0.6614 | 161.7352 |
| Steam | w/o P | 0.3907 | 0.3108 | 59.8572 |
| Steam | w/o IIS | 0.3920 | 0.3321 | 71.5609 |
| Steam | w/o TIAC | 0.3858 | 0.2472 | 38.4798 |
| Steam | ITMPRec | 0.3923 | 0.3336 | 71.6806 |
| Douban_movie | w/o P | 0.3389 | 0.3201 | 73.9921 |
| Douban_movie | w/o IIS | 0.3329 | 0.3035 | 64.1521 |
| Douban_movie | w/o TIAC | 0.3303 | 0.2644 | 50.9361 |
| Douban_movie | ITMPRec | 0.3362 | 0.3374 | 77.2108 |

6.2.1. Ablation Study (RQ1)

The ablation study analyzes the contribution of three key components of ITMPRec: the pre-match module (P), intention-induced scores (IIS), and targeted individual arousal coefficients (TIAC).

  • w/o Pre-match (P):

    • Removing the pre-match module (by using random target selection) generally leads to a decrease in IoI@20 and IoR@20. For example, on Lastfm, IoI@20 drops from 0.6614 to 0.6161, and IoR@20 from 161.7352 to 141.8555. This validates that selecting collectively appealing target items, while avoiding cold-start issues, is crucial for effective nudging.
    • The paper notes that the pre-match module (selecting targets from a specific category based on collective preference) helps avoid problems like scattered or cold-start target items, leading to more successful nudging.
  • w/o Intention-induced scores (IIS):

    • Removing the intention-induced scores causes a significant performance degradation, especially on Lastfm (IoI@20 drops from 0.6614 to 0.4030, IoR@20 from 161.7352 to 97.2408) and Douban_movie. This highlights the importance of modeling user intention at a coarse-grained level to guide preferences effectively.
    • On the Steam dataset, the impact of IIS is less pronounced. The authors suggest this might be due to the limited number of items users can search for in the Steam domain, which restricts candidate item pools and reduces the impact of different selection strategies related to intention.
  • w/o Targeted Individual Arousal Coefficients (TIAC):

    • Removing TIAC consistently leads to a notable drop in IoI@20 and IoR@20 across all datasets (e.g., on Lastfm, IoI@20 drops from 0.6614 to 0.5149, IoR@20 from 161.7352 to 116.5403). This confirms the importance of personalizing the preference evolution degree based on individual user sensitivity to new content.
    • The TIAC module performs particularly well on diverse datasets like Steam and Douban_movie, emphasizing that accounting for individual user responses is crucial in multi-round tasks where users might have varying levels of curiosity or openness to new items.
  • Overall: All three modules (P, IIS, TIAC) contribute positively to the interest nudging metrics (IoI@P and IoR@P), even though HR@P may see slight, acceptable variations. This indicates that these components are well-designed to steer users towards target content, improving the quality of proactive recommendation.
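For readers reproducing the ablation, the three variants map naturally onto three switches. The sketch below is purely illustrative; the flag names are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass

@dataclass
class AblationConfig:
    """Hypothetical switches for the three ITMPRec components."""
    use_prematch: bool = True   # P: collective-preference target selection
    use_iis: bool = True        # IIS: intention-induced scores
    use_tiac: bool = True       # TIAC: targeted individual arousal coefficients

# The four rows of Table 3, expressed as configurations.
VARIANTS = {
    "ITMPRec":  AblationConfig(),
    "w/o P":    AblationConfig(use_prematch=False),
    "w/o IIS":  AblationConfig(use_iis=False),
    "w/o TIAC": AblationConfig(use_tiac=False),
}
```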

6.2.2. Parameter Sensitivity Analysis (RQ3)

The paper investigates the impact of two key hyperparameters: $Q$ (the number of items considered for personal curiosity in TIAC) and $N_C$ (the number of intentions).

The following figure (Figure 4 from the original paper) shows the effect of hyperparameters $Q$ and $N_C$ for four datasets:

Figure 4: The effect of hyperparameters $Q$ and $N_C$ for four datasets (Lastfm, ML-100k, Steam, Douban_movie). The left panels show performance (IoR) under different sampling sizes $Q$; the right panels show performance under different numbers of intentions $N_C$.

  • Effect of $Q$ (Figure 4a):

    • For dense datasets like Lastfm and ML-100k, sampling a smaller number of top user preferences (e.g., $Q=5$) is sufficient to characterize user responses and achieve optimal performance for TIAC.
    • For sparser datasets like Douban_movie and Steam, a larger $Q$ (e.g., $Q=20$) is required to adequately model user arousal levels. This makes sense, as more items might be needed to capture a user's diverse or less-defined short-term preferences in sparse environments.
  • Effect of $N_C$ (Figure 4b):

    • $N_C$ represents the number of intentions modeled by the system. A larger $N_C$ allows for more diverse user intentions to be captured.
    • For smaller datasets such as Lastfm and ML-100k, the model performs best with a relatively small number of intentions (around $N_C=32$). This suggests that these datasets might exhibit fewer distinct user intention patterns.
    • For larger datasets like Steam and Douban_movie, a higher number of intentions (around $N_C=256$) yields better performance. This indicates that a more granular understanding of user intentions is beneficial in larger, potentially more diverse user bases, as illustrated by the toy tuning loop below.
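A practical way to apply these findings is a small grid search over the two hyperparameters. The sketch below is a toy tuning loop under the assumption that a full train-and-nudge run is wrapped in an `evaluate(Q, N_C)` callback returning validation IoR@20; both the callback and the dummy objective are hypothetical.

```python
from itertools import product

Q_GRID = [5, 10, 15, 20]        # sampling sizes tried for TIAC
NC_GRID = [32, 64, 128, 256]    # numbers of intentions tried

def tune(evaluate):
    """Return the (Q, N_C) pair maximizing validation IoR@20.

    `evaluate` stands in for a full train-and-nudge run; here it is a
    caller-supplied callback, since the real run is dataset-specific.
    """
    return max(product(Q_GRID, NC_GRID), key=lambda p: evaluate(*p))

# Toy usage with a dummy objective that peaks at Q=5, N_C=32
# (the pattern reported for the dense Lastfm / ML-100k datasets).
best_q, best_nc = tune(lambda q, nc: -abs(q - 5) - abs(nc - 32) / 32)
print(best_q, best_nc)  # -> 5 32
```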

6.2.3. LLM-based vs. Distribution-based Click Simulation (RQ3)

The paper provides a detailed comparison of the LLM-based and distribution-based click simulation schemes, both quantitatively and qualitatively.

  • Quantitative Comparison (Figure 6): The following figure (Figure 6 from the original paper) shows the comparative results of the distribution-based and LLM-based click simulations on the Lastfm and Douban_movie datasets:

    Figure 6: The comparative results of the distribution-based and LLM-based click simulations on the Lastfm (top) and Douban_movie (bottom) datasets; left panels show IoI, right panels show IoR.

    The LLM-based click model consistently outperforms the distribution-based approach on both Lastfm and Douban_movie datasets, for both IoI@P and IoR@P metrics across various evaluation windows ($P$). This indicates that the LLM agent provides a more effective and realistic simulation of user behavior, leading to better nudging outcomes.

  • Qualitative Comparison (Table 5): The following are the results from Table 5 of the original paper:

Target movies in the target category "Sci-Fi":

| Target | Movie | Description |
|---|---|---|
| [1] | Robert A. Heinlein's The Puppet Masters | Sci-Fi, Horror |
| [2] | Aliens | Sci-Fi, Action, Thriller |
| [3] | Mars Attacks! | Sci-Fi, Action, Comedy, War |

The latest five movies' categories in the viewing history: Drama, Animation, Children's, Comedy, War.

| Target | Intermediate items by LLM agent | Intermediate items by distribution-based scheme |
|---|---|---|
| [1] | Frighteners (Com, Hor) → Hunt for Red October (Act, Thr) → Forbidden Planet (Sci) ✓ | Breakfast at Tiffany's (Dra, Rom) → While You Were Sleeping (Com, Rom) → Great Escape (War) → Best of the Best 3: No Turning Back (Act) → Strange Days (Sci, Act, Cri) ✓ |
| [2] | Star Trek IV (Act, Adv, Sci) ✓ | Forget Paris (Com, Rom) → G.I. Jane (Act, Dra, War) → Great Dictator (Com) → Star Trek IV (Sci) ✓ |
| [3] | Drunks (Dra) → Balto (Ani, Chi) → Red Rock West (Thr) → Canadian Bacon (Com, War) → Dangerous Minds (Dra) → Strange Days (Act, Cri, Sci) ✓ | Dangerous Ground (Dra) → Hour of the Pig (Dra, Mys) → Red Rock West (Thr) → Canadian Bacon (Com, War) → Moonlight and Valentino (Dra, Rom) → Dangerous Minds (Dra) → Hunt for Red October (Act, Thr) |

(✓ marks a nudging path that successfully reaches a movie in the target Sci-Fi category.)

    A case study on the ML-100k dataset, with "Sci-Fi" as the target category, illustrates the qualitative difference:

    • LLM Agent: When nudging towards "Robert A. Heinlein's The Puppet Masters" (Sci-Fi, Horror), the LLM agent recommends a path like "Frighteners" (Com, Hor) \to "Hunt for Red October" (Act, Thr) \to "Forbidden Planet" (Sci). The LLM seems to understand the nuances of genres. For example, "Forbidden Planet" is categorized as "Sci-Fi" but actually contains elements of "Action, Thriller, and Adventure." The LLM leverages its external knowledge to connect these broader genre aspects, forming a coherent nudging path even across seemingly disparate initial preferences (Drama, Animation, Children's, Comedy, War).
    • Distribution-based Scheme: In contrast, the distribution-based scheme might follow a more direct similarity path. For the same target, it recommends "Breakfast at Tiffany's" (Dra, Rom) \to "While You Were Sleeping" (Com, Rom) \to "Great Escape" (War) \to "Best of the Best 3" (Act) \to "Strange Days" (Sci, Act, Cri). While it eventually reaches a Sci-Fi movie, the path appears less "reasoned" and more purely based on feature similarity. For "Mars Attacks!" (Sci-Fi, Action, Comedy, War), the distribution-based method struggles to recommend Sci-Fi movies effectively at times.
    • The LLM-based method demonstrates a more sophisticated reasoning capability, allowing it to bridge seemingly larger gaps by identifying subtle connections (e.g., hidden genre elements, broader thematic links) through its external knowledge, thus generating more effective and believable nudging paths.
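To make the pluggable LLM click model concrete, the sketch below shows one way to wrap an LLM behind a binary click interface. The prompt wording is a hypothetical stand-in for the paper's template $\mathcal{P}_F$, and `llm_call` abstracts over whichever backend (e.g., ChatGLM3) is plugged in.

```python
def build_click_prompt(history: list[str], candidate: str) -> str:
    """Assemble a yes/no click-decision prompt for the LLM agent.

    The wording is illustrative; the paper's actual prompt template
    (denoted P_F) is more elaborate.
    """
    return (
        "You are simulating a movie viewer.\n"
        f"Recently watched (most recent last): {', '.join(history)}.\n"
        f"The system now recommends: {candidate}.\n"
        "Would this viewer click it? Answer strictly 'yes' or 'no'."
    )

def llm_click(llm_call, history: list[str], candidate: str) -> int:
    """Map the agent's free-text answer to a binary click a_u^r in {0, 1}."""
    answer = llm_call(build_click_prompt(history, candidate))
    return int(answer.strip().lower().startswith("yes"))

# Toy usage with a stubbed LLM; in practice `llm_call` would wrap an
# actual model backend.
stub = lambda prompt: "yes" if "Forbidden Planet" in prompt else "no"
print(llm_click(stub, ["Frighteners", "Hunt for Red October"],
                "Forbidden Planet (1956)"))  # -> 1
```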

6.2.4. Case Study: User Embedding Evolution (Appendix A.6)

The following figure (Figure 7 from the original paper) shows user embedding's evolution in ITMPRec:

Figure 7: Heatmap of the user embedding's evolution in ITMPRec. The horizontal axis shows nudging rounds (from the user towards the target), the vertical axis shows embedding dimensions, and color intensity reflects the user-target interaction strength.

The following figure (Figure 8 from the original paper) shows intermediate items recommended by ITMPRec:

Figure 8: Heatmap of the intermediate items recommended by ITMPRec, showing how the user's intention changes with each recommendation round; the horizontal axis is the round, the vertical axis is the embedding dimension, and color intensity reflects intention strength.

A visualization of a user's embedding evolution from the Lastfm dataset shows that ITMPRec effectively draws user preferences towards the target item over multiple rounds. As intermediate items are recommended and "clicked," the user embedding grows steadily more similar to the target item's embedding, dimension by dimension. This visual evidence reinforces the idea that ITMPRec successfully nudges user preferences.
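The same convergence can be summarized numerically by logging the cosine similarity between the user and target embeddings each round. The drift-based update below is an illustrative assumption, not ITMPRec's actual preference-update rule.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy trajectory: nudge a random user embedding towards a target over
# 20 rounds with a fixed drift rate and log the similarity per round.
rng = np.random.default_rng(1)
dim, drift = 32, 0.15
user, target = rng.normal(size=dim), rng.normal(size=dim)
for r in range(1, 21):
    user = (1 - drift) * user + drift * target   # assumed preference drift
    if r % 5 == 0:
        print(f"round {r:2d}: cos(user, target) = {cosine(user, target):.3f}")
```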

6.3. Summary of Findings

ITMPRec significantly advances the field of proactive recommendation by introducing sophisticated mechanisms for target selection, user intention modeling, individual sensitivity assessment, and realistic user feedback simulation. The ablation studies confirm the individual importance of these components, while parameter sensitivity analysis provides practical guidance for deployment. The superior performance over SR and existing ProactRec baselines, particularly in IoI and IoR, validates ITMPRec as an effective and robust solution for intention-based targeted multi-round proactive recommendation. The LLM-based click model is a notable innovation, offering more realistic and intelligent user feedback simulation.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper introduces ITMPRec, a novel Intention-based Targeted Multi-round Proactive Recommendation method designed to overcome the limitations of traditional sequential recommendation (SR) systems that primarily cater to users' historical preferences. ITMPRec focuses on proactively guiding users towards a specific category of items over multiple interaction rounds. Its key contributions include:

  1. Pre-match Module: A strategy to intelligently select a set of target items by considering all users' opinions within a specified category, thereby making the nudging process more purposeful and avoiding cold-start issues.

  2. Intention-induced Scores: Integration of a mechanism to quantify users' intention-level evolution, which helps in selecting suitable intermediate items that align with changing coarse-grained user intentions during the guidance process.

  3. Targeted Individual Arousal Coefficients (TIAC): A component that models each user's unique sensitivity and receptivity to new content, allowing for personalized preference evolution during multi-round nudging.

  4. LLM Agent for Click Simulation: A pluggable Large Language Model (LLM) agent that provides a more realistic and intelligent simulation of user click feedback on intermediate recommendations, leveraging the LLM's external knowledge and reasoning capabilities compared to traditional distribution-based models.

    Extensive experiments conducted on four real-world datasets demonstrate the significant superiority of ITMPRec over eight state-of-the-art baselines, showcasing average increases of 36.47% in IoI@20 and 68.80% in IoR@20. The ablation studies confirm the effectiveness of each proposed module.

7.2. Limitations & Future Work

The authors identify several directions for future work:

  1. Causal Theory in Nudging: Further study of causal theory [13] in the nudging process to better understand the cause-and-effect relationships between recommendations and user preference shifts.

  2. Model Explainability: Enhancing the model's explainability [23] to provide users with clearer reasons for the proactive recommendations.

  3. Robustness in Complex Probabilistic Modeling: Improving the model's robustness in more complex probabilistic modeling scenarios, likely referring to the dynamics of user preferences and click behaviors.

    While not explicitly stated as limitations, the paper implicitly highlights some challenges:

  • Resource Intensiveness of LLMs: The LLM-IPP baseline was noted for its high resource consumption, and while ITMPRec uses LLMs pluggably, their integration still implies a computational cost trade-off compared to non-LLM components.
  • Offline Simulation Dependence: The entire framework relies on an environment simulator. The quality of proactive recommendation in real-world deployment depends heavily on how accurately this simulator (especially the LLM agent) mimics actual user behavior.

7.3. Personal Insights & Critique

ITMPRec presents a significant step forward in proactive recommendation by comprehensively addressing critical limitations of prior work. The shift from random target selection to a pre-match module is highly practical, aligning recommendation goals with content provider strategies. The integration of user intention and individual arousal coefficients is particularly insightful, moving beyond simplistic user models to capture more nuanced and dynamic aspects of human behavior. This makes the nudging process more adaptive and potentially more ethical, as it's tailored to individual user receptivity rather than a rigid push.

The use of an LLM agent for click simulation is perhaps the most innovative aspect. It acknowledges the complexity of user decision-making, which traditional distribution-based models cannot fully capture. This approach has broad implications beyond this paper, suggesting that LLMs could become a standard component in offline evaluation environments for complex interactive systems, not just recommendation. This allows for more robust offline testing before costly live experiments.

Potential issues or unverified assumptions:

  • Interpretability of LLM Agent: While LLMs offer interpretability in generating explanations, their internal decision-making process for simulating a click ($a_u^r \in \{0, 1\}$) might still be a black box. Understanding why an LLM agent decided a user would click, rather than just that it clicked, could be crucial for refining the nudging strategy.

  • Generalizability of LLM Agent: The effectiveness of the LLM agent relies on the prompt engineering ($\mathcal{P}_F$) and the LLM's underlying external knowledge. While ChatGLM3 is powerful, its simulated feedback might still be limited by the data it was trained on and the specific instructions it receives. Its ability to capture novel or highly niche user behaviors might be a challenge.

  • Ethical Implications of Nudging: The term "proactive recommendation" or "nudging" inherently carries ethical considerations. While ITMPRec aims to broaden user interests, it could also be misused to manipulate users towards less beneficial content. Future work could explicitly integrate ethical safeguards or transparency mechanisms to ensure responsible nudging.

  • Real-world Deployment Challenges: Although ITMPRec shows strong offline performance, deploying a multi-round proactive recommendation system in a real-world setting would involve significant engineering challenges, including real-time user preference updates, cold-start issues for intermediate items, and the computational cost of generating LLM-based feedback or even using LLMs in live recommendation loops.

    The methods and conclusions of ITMPRec could be applied to other domains where preference guidance is desirable, such as educational content recommendation (guiding students towards foundational topics), health and wellness apps (nudging users towards healthier habits or information), or news feeds (encouraging exposure to diverse perspectives to combat polarization). The concept of intention-aware and individually sensitive multi-round guidance is broadly applicable to any sequential decision-making process involving human interaction.
