
Interactive Recommendation Agent with Active User Commands

Published: 09/25/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This paper introduces the Interactive Recommendation Feed (IRF), enabling users to actively adjust recommendations via natural language commands. The developed RecBot employs a dual-agent architecture to understand user intent and optimize strategies, significantly enhancing user satisfaction and business outcomes.

Abstract

Traditional recommender systems rely on passive feedback mechanisms that limit users to simple choices such as like and dislike. However, these coarse-grained signals fail to capture users' nuanced behavior motivations and intentions. As a result, current systems also cannot distinguish which specific item attributes drive user satisfaction or dissatisfaction, resulting in inaccurate preference modeling. These fundamental limitations create a persistent gap between user intentions and system interpretations, ultimately undermining user satisfaction and harming system effectiveness. To address these limitations, we introduce the Interactive Recommendation Feed (IRF), a pioneering paradigm that enables natural language commands within mainstream recommendation feeds. Unlike traditional systems that confine users to passive implicit behavioral influence, IRF empowers active explicit control over recommendation policies through real-time linguistic commands. To support this paradigm, we develop RecBot, a dual-agent architecture where a Parser Agent transforms linguistic expressions into structured preferences and a Planner Agent dynamically orchestrates adaptive tool chains for on-the-fly policy adjustment. To enable practical deployment, we employ simulation-augmented knowledge distillation to achieve efficient performance while maintaining strong reasoning capabilities. Through extensive offline and long-term online experiments, RecBot shows significant improvements in both user satisfaction and business outcomes.

Mind Map

In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

Interactive Recommendation Agent with Active User Commands

1.2. Authors

Jiakai Tang (Gaoling School of Artificial Intelligence, Renmin University of China), Yujie Luo, Xunke Xi, Fei Sun, Xueyang Feng, Sunhao Dai, Chao Yi, Dian Chen, Zhujin Gao, Yang Li, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, Bo Zheng (Alibaba Group).

1.3. Journal/Conference

Published at (UTC): 2025-09-25. The paper is available via arXiv. Based on the formatting and authors (Alibaba Group), this work represents significant industrial research likely targeted at top-tier information retrieval or data mining conferences (e.g., SIGIR, KDD, WWW).

1.4. Publication Year

2025

1.5. Abstract

This paper addresses the limitations of traditional recommender systems that rely on passive, implicit feedback (like clicks or views), which often leads to ambiguous interpretations of user intent. The authors propose a new product paradigm called Interactive Recommendation Feed (IRF), allowing users to issue natural language commands (e.g., "I want a red dress, not blue") to actively control recommendations. To implement this, they developed RecBot, a dual-agent framework consisting of a Parser Agent (to understand intent) and a Planner Agent (to execute changes). The system uses simulation-augmented knowledge distillation to ensure it is fast enough for real-world use. Experiments show significant improvements in user satisfaction and business metrics (GMV, clicks) in both offline datasets and a live deployment on a major e-commerce platform.

https://arxiv.org/abs/2509.21317 (Preprint)

2. Executive Summary

2.1. Background & Motivation

In the current digital landscape, Recommender Systems (RecSys) are the primary tools for filtering vast amounts of information for users (e.g., on YouTube, Amazon, TikTok).

  • The Problem: Traditional systems operate on a passive feedback loop. They show a list of items, and users can only "click" or "ignore." This signal is coarse-grained and ambiguous. If a user ignores a video, is it because they dislike the topic, the creator, or the thumbnail? The algorithm has to guess, often leading to inaccurate modeling and user frustration.
  • The Gap: Users have complex intentions (e.g., "I like the style but it's too expensive") that cannot be expressed through a simple click. This creates a communication barrier between the user's mind and the system's logic.
  • The Innovation: The paper proposes shifting from "guessing" to "listening." By allowing users to type natural language commands directly into the feed, the system can obtain explicit instructions to adjust its behavior immediately.

2.2. Main Contributions / Findings

  1. New Paradigm (IRF): Introduced the Interactive Recommendation Feed, which integrates a command interface into standard scrolling feeds, distinct from separate "chatbot" windows.
  2. RecBot Framework: Designed a dual-agent system:
    • Parser: Translates messy human language into structured system constraints.
    • Planner: Dynamically selects "tools" (like filters or scorers) to adjust the recommendation algorithm in real-time.
  3. Optimization for Deployment: Solved the latency and cost issues of Large Language Models (LLMs) using knowledge distillation, teaching a smaller student model to mimic a powerful teacher model (GPT-4) using simulated user data.
  4. Proven Impact: Validated in a massive real-world e-commerce application (likely Taobao), showing a 0.71% reduction in negative feedback and a 1.40% increase in Gross Merchandise Volume (GMV), proving that giving users control actually helps business.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand RecBot, one must grasp a few core concepts:

  • Implicit vs. Explicit Feedback:
    • Implicit: Indirect signals like viewing time, clicks, or purchases. Easy to collect but ambiguous.
    • Explicit: Direct statements like ratings, reviews, or typing "I don't like horror movies." High quality but harder to get users to provide.
  • Collaborative Filtering (CF): A classic recommendation technique. It assumes that if User A and User B liked similar items in the past, they will like similar items in the future. It relies on historical patterns.
  • Large Language Model (LLM) Agents: An AI system that uses an LLM (like GPT) as a "brain" to reason. It usually involves:
    • Parsing: Understanding instructions.
    • Planning: Deciding which steps to take.
    • Tool Use: Calling external functions (e.g., a search engine or database) to perform tasks.
  • Embeddings: Converting text or images into lists of numbers (vectors) so computers can calculate similarity. If the vectors for "King" and "Queen" are close mathematically, the system knows they are semantically related (see the sketch after this list).
  • Knowledge Distillation: A compression technique where a large, smart, but slow "Teacher" model generates data to train a small, fast "Student" model. The Student learns to mimic the Teacher's output without needing the massive computing power.
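To make the embeddings idea concrete, here is a tiny sketch using toy 3-dimensional vectors (real models use hundreds of dimensions); the values below are illustrative, not real model outputs.

```python
# Minimal sketch: comparing concepts via embeddings and cosine similarity.
# The 3-dimensional vectors are toy values, not outputs of a real model.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: ~1.0 = same direction, ~0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])
apple = np.array([0.1, 0.2, 0.9])

print(cosine_sim(king, queen))  # high -> semantically related
print(cosine_sim(king, apple))  # low  -> unrelated
```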

3.2. Previous Works

  • Traditional Sequential Recommendation (e.g., SASRec, BERT4Rec): These models look at the sequence of items a user has clicked to predict the next one. They are powerful but fail to handle explicit user commands or sudden changes in intent.
  • Conversational Recommender Systems (CRS): These are typically chatbots (dialogue windows) where the system asks questions to narrow down preferences (e.g., "Do you prefer action or comedy?").
    • Limitation: They are often separate from the main browsing feed, interrupting the user experience.
  • LLM-based Agents (e.g., InteRecAgent, InstructAgent): Recent works attempting to use LLMs for recommendation.
    • Limitation: Some lack domain-specific knowledge (collaborative filtering signals) or are too slow for industrial feeds. Others don't handle negative feedback (what users don't want) effectively.

3.3. Differentiation Analysis

RecBot distinguishes itself by:

  1. Integration: It works directly inside the main feed (IRF), not a separate chat window.

  2. Bi-Directional Feedback: It explicitly models both positive (attraction) and negative (aversion) signals, whereas many systems focus only on what to recommend.

  3. Hybrid Logic: It combines the semantic reasoning of LLMs with the historical pattern matching of traditional collaborative filtering, ensuring recommendations are both logically correct (matches the command) and personalized (matches the user's taste).

    The following figure (Figure 1 from the original paper) illustrates this shift from the traditional passive loop (a) to the active, command-based RecBot loop (b):

Figure 1: Comparison between traditional and novel interactive recommendation feeds. (a) Traditional systems rely on constrained and implicit feedback signals (e.g., likes/dislikes), making it difficult to accurately infer users' true intentions. (b) Our interactive paradigm enables free-form natural language commands, where RecBot responds and adjusts the recommendation policy on-the-fly based on active user commands.

4. Methodology

4.1. Principles

The core philosophy of RecBot is "Decompose and Conquer." A raw user command (e.g., "Show me something warmer, but not wool") is too complex for a standard ranking algorithm to handle directly. RecBot splits the process:

  1. Understand (Parser): Convert the text into a structured "recipe" (e.g., Positive: "warmer", Negative: "wool").

  2. Act (Planner): Use specific mathematical tools to follow the recipe and re-score items in the database.

The overall architecture is shown below (Figure 2 from the original paper), depicting the flow from the command ($c_t$) through the Parser (producing $P_{t+1}$) and the Planner to the new feed ($R_{t+1}$):

Figure 2: Overview of the RecBot framework for interactive recommendation. The framework comprises a Parser Agent that transforms the user's natural language command $c_t$ into structured preferences $P_{t+1}$, and a Planner Agent that orchestrates tool chains to dynamically adjust recommendation policies and generate the next feed $R_{t+1}$.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. The User-Intent-Understanding Agent (Parser)

The Parser's job is to translate free-form text into a structured format while managing memory across multiple turns of interaction.

Step 1: Structured Command Parsing. The Parser takes the current context (previous recommendations $R_t$, current command $c_t$, and past preferences $P_t$) and outputs updated preferences $P_{t+1}$. The transformation function is defined as: $ \mathcal{P} : (R_t, c_t, P_t) \to P_{t+1} $

Crucially, RecBot splits preferences into Positive (+) and Negative (−) dimensions to handle likes and dislikes separately: $ P_{t+1} = \{ P_{t+1}^+, P_{t+1}^- \} $

Within these, it further distinguishes between Hard constraints (strict rules, e.g., "price < $50") and Soft preferences (fuzzy desires, e.g., "romantic style"): $ P_{t+1}^+ = \{ C_{t+1}^{+, \mathrm{hard}}, C_{t+1}^{+, \mathrm{soft}} \}, \quad P_{t+1}^- = \{ C_{t+1}^{-, \mathrm{hard}}, C_{t+1}^{-, \mathrm{soft}} \} $
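A minimal sketch of what such a structured preference object could look like; the schema and field names are illustrative assumptions, not the paper's exact format.

```python
# Hedged sketch of the structured preference P_{t+1}: positive/negative
# dimensions, each split into hard (filterable) and soft (scoreable) parts.
from dataclasses import dataclass, field

@dataclass
class PreferenceSet:
    hard: dict = field(default_factory=dict)   # strict rules, e.g. price cap
    soft: list = field(default_factory=list)   # fuzzy desires, e.g. a style

@dataclass
class StructuredPreferences:
    positive: PreferenceSet = field(default_factory=PreferenceSet)
    negative: PreferenceSet = field(default_factory=PreferenceSet)

# e.g., parsing "I want a red dress under $50, not wool"
p = StructuredPreferences(
    positive=PreferenceSet(hard={"price_max": 50}, soft=["red dress"]),
    negative=PreferenceSet(hard={"material": "wool"}),
)
```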

Step 2: Dynamic Memory Consolidation. Users refine their intent over time, so the Parser must decide how to combine new commands with old ones. It uses a context-aware decision function: $ P_{t+1} = \begin{cases} P_t & \text{if } \phi_{\mathrm{sat}}(c_t, R_t), \\ P_t \oplus \mathrm{Extract}(c_t) & \text{if } \phi_{\mathrm{com}}(P_t, c_t, R_t), \\ \mathrm{Resolve}(P_t, c_t, R_t) & \text{if } \phi_{\mathrm{con}}(P_t, c_t, R_t) \end{cases} $

  • Preservation ($P_t$): If the user is satisfied ($\phi_{\mathrm{sat}}$), keep the preferences as they are.

  • Integration ($\oplus$): If the new information is compatible ($\phi_{\mathrm{com}}$), add it to the existing memory (e.g., adding "blue" to "t-shirt").

  • Resolution (Resolve): If the new information conflicts ($\phi_{\mathrm{con}}$), overwrite the outdated part (e.g., changing "budget < $50" to "budget < $100"). A minimal sketch of this decision function follows.
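Here is that sketch. The predicates ($\phi_{\mathrm{sat}}$, $\phi_{\mathrm{com}}$) and the Extract/Resolve operations are hypothetical stand-ins for judgments the Parser LLM makes in the paper; the dict merge is a simplified stand-in for $\oplus$.

```python
# Hedged sketch of dynamic memory consolidation: preserve, integrate,
# or resolve, depending on how the new command relates to memory P_t.
def consolidate(prefs: dict, command: str, feed: list,
                is_satisfied, is_compatible, extract, resolve) -> dict:
    if is_satisfied(command, feed):          # phi_sat: user is happy
        return prefs                         # Preservation: keep P_t as is
    new_info = extract(command)              # Extract(c_t)
    if is_compatible(prefs, new_info):       # phi_com: no conflicts
        return {**prefs, **new_info}         # Integration: P_t "+" new info
    return resolve(prefs, new_info, feed)    # phi_con: overwrite stale parts
```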

    The following figure (Figure 3 from the original paper) visualizes this parsing and memory process:

Figure 3: Illustration of the Parser for user intent understanding. The Parser integrates the history preference memory $P_t$, the current recommendation feed $R_t$, and the active user command $c_t$ to generate the new preference representation $P_{t+1}$ through structured parsing and dynamic memory consolidation.

4.2.2. The Planning Agent (Planner)

The Planner takes the structured preferences $P_{t+1}$ and orchestrates a chain of "Tools" to calculate the final score for every candidate item. The scoring function is $\mathcal{A} : (P_{t+1}, H_t, I) \to S_{t+1}$, where $H_t$ is the user's interaction history and $I$ is the candidate item set.

The Planner dynamically selects which of the following tools to use:

Tool 1: Filter (Hard Constraints). This tool drastically reduces the search space by removing items that violate hard rules (see the sketch after these bullets): $ I' = \{ i \in I : C^+(i, C_{t+1}^{+, \mathrm{hard}}) = 1 \land C^-(i, C_{t+1}^{-, \mathrm{hard}}) = 0 \} $

  • $C^+$ checks whether item $i$ meets the positive hard constraints (e.g., "Must be Nike").
  • $C^-$ checks whether item $i$ violates the negative hard constraints (e.g., "No red colors"). Items failing these checks get a score of $-\infty$.
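A minimal sketch of the Filter under two simplifying assumptions: items are plain dicts, and each hard constraint is a predicate over an item.

```python
# Hedged sketch of the Filter tool: keep only items that satisfy every
# positive hard constraint (C+ = 1) and violate no negative one (C- = 0).
def hard_filter(items, pos_hard, neg_hard):
    return [
        item for item in items
        if all(c(item) for c in pos_hard) and not any(c(item) for c in neg_hard)
    ]

# e.g., "Must be Nike, no red colors"
items = [{"brand": "Nike", "color": "blue"},
         {"brand": "Nike", "color": "red"}]
kept = hard_filter(items,
                   pos_hard=[lambda i: i["brand"] == "Nike"],
                   neg_hard=[lambda i: i["color"] == "red"])
# kept == [{"brand": "Nike", "color": "blue"}]
```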

Tool 2: Matcher (Positive Preferences). This calculates how relevant an item is to the user's positive soft preferences, using a hybrid approach that combines Semantic understanding and Collaborative history.

  • Path A: Semantic Relevance ($s_{\mathrm{sem}}$). Uses pre-trained embeddings (like BERT or BGE) to measure text similarity between the item description and the user's intent: $ s_{\mathrm{sem}}(i, P_{t+1}^+) = \mathrm{sim}(\mathbf{e}_{\mathrm{item}}(i), \mathbf{e}_{\mathrm{intent}}(P_{t+1}^+)) $ Here, $\mathrm{sim}$ is cosine similarity.

  • Path B: Active-Intent-Aware (AIA) Collaborative Relevance ($s_{\mathrm{aia}}$). This is a key innovation: it treats the user's text command as a "query" into their behavioral history. First, it creates a unified representation of each item by fusing text and image features: $ \mathbf{h}_{\mathrm{fused}}^i = \mathrm{Linear}([\mathbf{h}_{\mathrm{text}}^i \oplus \mathbf{h}_{\mathrm{image}}^i \oplus \dots]) $ Then, it extracts patterns from the history $H_t$ using a sequential encoder (SeqEnc): $ \mathbf{H}_{\mathrm{fused}} = \mathrm{SeqEnc}([\mathbf{h}_{\mathrm{fused}}^{i_1}, \dots, \mathbf{h}_{\mathrm{fused}}^{i_{|H_t|}}]) $ Finally, it uses Multi-Head Cross-Attention (MHCA) to weigh historical items by the current intent embedding $\mathbf{h}_{\mathrm{intent}}$. This effectively asks: "Based on what the user just typed, which of their past actions are most relevant now?" $ s_{\mathrm{aia}}(i, P_{t+1}^+, H_t) = \mathrm{MHCA}(\mathbf{h}_{\mathrm{intent}}, \mathbf{H}_{\mathrm{fused}}, \mathbf{H}_{\mathrm{fused}}) \cdot \mathbf{h}_{\mathrm{fused}}^i $

    • $\mathbf{h}_{\mathrm{intent}}$ acts as the Query.

    • $\mathbf{H}_{\mathrm{fused}}$ acts as the Key and Value.

    The total Matcher score combines both paths using a weighting parameter $\alpha$ (a simplified sketch follows): $ s_{\mathrm{match}}(i) = \alpha \cdot s_{\mathrm{sem}}(i, P_{t+1}^+) + (1 - \alpha) \cdot s_{\mathrm{aia}}(i, P_{t+1}^+, H_t) $
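A simplified numpy sketch of the two Matcher paths. Single-head softmaxed dot-product attention stands in for the paper's MHCA, all embeddings are assumed precomputed, and `alpha` is the mixing weight from the equation above.

```python
# Hedged sketch of the Matcher's hybrid score: semantic path (Path A)
# plus a simplified Active-Intent-Aware collaborative path (Path B).
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matcher_score(e_item, e_intent, H_fused, h_item_fused, alpha=0.5):
    # Path A: semantic similarity between item and intent embeddings
    s_sem = cos(e_item, e_intent)
    # Path B: the intent embedding queries the fused history matrix
    weights = np.exp(H_fused @ e_intent)   # (|H_t|,) attention logits
    weights /= weights.sum()               # softmax over history items
    user_vec = weights @ H_fused           # intent-weighted history summary
    s_aia = float(user_vec @ h_item_fused)
    # Weighted combination of both paths
    return alpha * s_sem + (1 - alpha) * s_aia
```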

Tool 3: Attenuator (Negative Preferences). This calculates a penalty score for items matching the negative soft preferences: $ s_{\mathrm{atten}}(i) = -\beta \cdot \mathrm{sim}(\mathbf{e}_{\mathrm{item}}(i), \mathbf{e}_{\mathrm{intent}}(P_{t+1}^-)) $

  • Note the negative sign (−): high similarity to a negative preference reduces the item's total score.
  • $\beta$ controls how strong the penalty is.

Tool 4: Aggregator. Combines the scores to produce the final ranking: $ s_{\mathrm{final}}(i) = s_{\mathrm{match}}(i) + s_{\mathrm{atten}}(i) $
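For completeness, a short self-contained sketch of the Attenuator penalty and the Aggregator sum, under the same simplifying assumptions as the Matcher sketch above (`beta` is the penalty strength).

```python
# Hedged sketch of Tool 3 (Attenuator) and Tool 4 (Aggregator).
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def attenuator_score(e_item, e_neg_intent, beta=1.0):
    # High similarity to a disliked concept lowers the item's score
    return -beta * cos(e_item, e_neg_intent)

def final_score(s_match, s_atten):
    # s_final = s_match + s_atten
    return s_match + s_atten
```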

The Planner's dynamic workflow is illustrated below (Figure 4 from the original paper):

Figure 4: Illustration of the Planner for on-the-fly recommendation policy adaptation. The Planner dynamically constructs optimal tool invocation sequences based on the parsed user preferences $P_{t+1}$ to compute updated item scores $s_{\mathrm{final}}$ for the next recommendation feed $R_{t+1}$. The recommendation-domain tool set includes the Filter, Matcher, Attenuator, and Aggregator tools, supporting adaptive tool-chain orchestration.

4.2.3. Multi-Agent Optimization (Knowledge Distillation)

Running a giant model like GPT-4 for every user interaction is too expensive and slow. The authors use Simulation-Augmented Knowledge Distillation.

  1. User Simulation: They create a "User Agent" that simulates realistic browsing behavior and generates commands based on a target item $i_{\mathrm{target}}$ and a persona $\mathcal{G}_{\mathrm{persona}}$: $ c_t^{\mathrm{sim}} = \mathcal{U}_{\mathrm{sim}}(R_t, i_{\mathrm{target}}, \mathcal{G}_{\mathrm{persona}}) $
  2. Teacher Generation: A powerful Teacher RecBot (GPT-4) generates the optimal structured preferences ($P^{\mathrm{sim}}$) and tool chains ($\mathcal{T}^{\mathrm{sim}}$).
  3. Student Training: A smaller, open-source model (Qwen-14B) is trained to predict these outputs. The training objective is standard Next-Token Prediction (NTP), minimizing the Negative Log-Likelihood (NLL): $ \mathcal{L}(\theta) = \sum_{(x, y) \in \mathcal{D}_{\mathrm{Mixed}}} \sum_{j=1}^{|y|} -\log P_{\theta}(y_j \mid x, y_{<j}) $ where $x$ is the input (history/command) and $y$ is the target output (structured preference or tool chain). A minimal sketch of this objective follows.
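Here is that sketch, in PyTorch style, for a single training example. The model forward pass, tokenization, and `prompt_len` (the number of input tokens $x$) are assumptions for illustration.

```python
# Hedged sketch of the NLL objective: sum over target tokens of
# -log P(y_j | x, y_<j), with prompt tokens masked out of the loss.
import torch
import torch.nn.functional as F

def ntp_loss(logits: torch.Tensor, token_ids: torch.Tensor,
             prompt_len: int) -> torch.Tensor:
    """logits: (seq_len, vocab) over the concatenated sequence [x; y];
    token_ids: (seq_len,) the same sequence's token ids."""
    pred = logits[:-1]                    # position t predicts token t+1
    gold = token_ids[1:].clone()
    gold[: prompt_len - 1] = -100         # mask the input x from the loss
    return F.cross_entropy(pred, gold, ignore_index=-100, reduction="sum")
```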

5. Experimental Setup

5.1. Datasets

The authors evaluated RecBot on three datasets representing different domains and complexities:

  1. Amazon (Books):
    • Type: E-commerce.
    • Constraints: Hard (Price, Language, Format), Soft (Category).
    • Scale: Sampled 1,000 users.
  2. MovieLens:
    • Type: Movie recommendations.
    • Constraints: Hard (Release Date), Soft (Genre).
    • Scale: User-item interactions with ratings > 3 are positive.
  3. Taobao:
    • Type: Large-scale industrial e-commerce (Alibaba).
    • Constraints: Hard (Price, Style, Material), Soft (Category). Features include product images (multimodal).
    • Scale: 3,000 users sampled from billions.

5.2. Evaluation Metrics

The paper uses standard ranking metrics alongside novel interaction-specific metrics; a minimal sketch of these computations follows the list.

  1. Recall@N (R@N):
    • Concept: What percentage of the relevant items (ground truth) appeared in the top N recommendations?
    • Formula: $ \text{Recall@N} = \frac{|\{\text{relevant items}\} \cap \{\text{top-N recommended}\}|}{|\{\text{relevant items}\}|} $
  2. NDCG@N (N@N):
    • Concept: Normalized Discounted Cumulative Gain. It measures ranking quality, giving more credit when relevant items appear higher in the list.
    • Formula: $ \text{NDCG@N} = \frac{\text{DCG@N}}{\text{IDCG@N}}, \quad \text{DCG@N} = \sum_{i=1}^{N} \frac{2^{rel_i} - 1}{\log_2(i+1)} $ ($rel_i$ is the relevance of the item at position $i$).
  3. Condition Satisfaction Rate (CSR@N):
    • Concept: The percentage of recommended items in the top N that actually meet the user's specified attributes (e.g., if the user said "red", are they red?). This measures obedience to commands.
    • Formula (derived from the text description): $ \text{CSR@N} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(\text{item}_i \text{ satisfies constraints}) $ ($\mathbb{I}$ is the indicator function).
  4. Pass Rate (PR):
    • Concept: The percentage of test cases where the target item was successfully recommended within the limit of interaction rounds ($T$).
  5. Average Rounds (AR):
    • Concept: The average number of dialogue turns required to find the target item. Lower is better (more efficient).
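A minimal Python sketch of the per-case metric computations described above; the `satisfies` predicate (does an item meet the stated attributes?) is a hypothetical stand-in for the paper's attribute checker.

```python
# Hedged sketch of the offline metrics for a single test case.
import math

def recall_at_n(ranked_ids, relevant, n):
    return len(set(ranked_ids[:n]) & set(relevant)) / len(relevant)

def ndcg_at_n(ranked_ids, relevant, n):
    # binary relevance: 2^rel - 1 is 1 for hits, 0 otherwise
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_ids[:n]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / idcg if idcg > 0 else 0.0

def csr_at_n(ranked_items, satisfies, n):
    top = ranked_items[:n]
    return sum(satisfies(item) for item in top) / len(top)

# PR and AR are session-level aggregates: PR = fraction of sessions where
# the target appears within T rounds; AR = mean number of rounds needed.
```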

5.3. Baselines

RecBot is compared against three categories of models:

  1. Traditional Sequential Rec: SASRec, BERT4Rec (Deep Learning on ID sequences), MoRec, UniSRec (Multimodal).
  2. Command-Aware Rec: BM25 (Keyword matching), BGE (Semantic embedding matching).
  3. Interactive Agents:
    • GOMMIR: Goal-oriented multi-modal interactive recommendation.
    • InteRecAgent: Uses LLMs with tools but lacks the specific Parser/Planner decomposition of RecBot.
    • Instruct2Agent: A user-controllable agent.

6. Results & Analysis

6.1. Core Results Analysis

The experiments were conducted in three scenarios: Single-Round (SR), Multi-Round (MR), and Multi-Round with Interest Drift (MRID) (where the user changes their mind).

Key Findings:

  1. Superiority over Traditional Models: Traditional models (SASRec, etc.) perform poorly because they cannot process the explicit text command at all—they only rely on history.

  2. Superiority over Simple Search (BM25/BGE): While BGE handles the text command well, it ignores the user's history. RecBot wins because it combines both (via the AIA Collaborative Path).

  3. Student Beats Teacher? In some cases (e.g., MovieLens MRID), the "Aligned" student RecBot (Qwen) outperformed the Teacher (GPT-4). This suggests that finetuning on domain-specific simulation data can make a smaller model more specialized than a generalist giant model.

    The following are the results from Table 2 of the original paper (Single-Round Interaction):

| Method (Amazon, SR) | R@10 | R@20 | R@50 | N@10 | N@20 | N@50 | C@10 | C@20 | C@50 | PR |
|---|---|---|---|---|---|---|---|---|---|---|
| SASRec | 0.0098 | 0.0163 | 0.0326 | 0.0061 | 0.0077 | 0.0109 | 65.81% | 69.08% | 73.88% | 0.76% |
| BGE | 0.0598 | 0.1012 | 0.1795 | 0.0284 | 0.0387 | 0.0543 | 92.76% | 95.19% | 97.23% | 3.16% |
| InteRecAgent | 0.0609 | 0.1023 | 0.1708 | 0.0321 | 0.0425 | 0.0560 | 81.98% | 84.21% | 87.28% | 3.81% |
| RecBot-Qwen (Align.) | 0.2198 | 0.3101 | 0.4614 | 0.1207 | 0.1434 | 0.1736 | 94.92% | 96.13% | 97.34% | 14.58% |

| Method (MovieLens, SR) | R@10 | R@20 | R@50 | N@10 | N@20 | N@50 | C@10 | C@20 | C@50 | PR |
|---|---|---|---|---|---|---|---|---|---|---|
| SASRec | 0.0236 | 0.0428 | 0.0942 | 0.0130 | 0.0177 | 0.0279 | 57.55% | 62.15% | 70.45% | 1.61% |
| RecBot-Qwen (Align.) | 0.3383 | 0.4208 | 0.5675 | 0.2101 | 0.2309 | 0.2598 | 81.10% | 84.69% | 88.38% | 25.70% |

(Note: Selected representative baselines and main result rows for brevity, preserving the data structure).

The following are the results from Table 3 of the original paper (Multi-Round Interaction - Amazon & MovieLens):

| Method (Amazon, MR) | R@10 | R@20 | R@50 | N@10 | N@20 | N@50 | PR | AR |
|---|---|---|---|---|---|---|---|---|
| InteRecAgent | 0.0533 | 0.0849 | 0.1436 | 0.0284 | 0.0363 | 0.0479 | 3.70% | 5.8085 |
| RecBot-Qwen (Align.) | 0.1893 | 0.2416 | 0.3177 | 0.1044 | 0.1177 | 0.1327 | 15.02% | 5.5234 |

| Method (MovieLens, MR) | R@10 | R@20 | R@50 | N@10 | N@20 | N@50 | PR | AR |
|---|---|---|---|---|---|---|---|---|
| InteRecAgent | 0.1081 | 0.1820 | 0.3255 | 0.0582 | 0.0767 | 0.1047 | 7.49% | 5.6006 |
| RecBot-Qwen (Align.) | 0.4315 | 0.5021 | 0.6221 | 0.2742 | 0.2921 | 0.3159 | 38.12% | 4.7837 |

RecBot demonstrates significantly higher Pass Rates and lower Average Rounds, meaning it understands the user faster and more accurately.

6.2. Ablation Studies

The authors tested simplified versions of RecBot to prove each part is necessary (Figure 5 in the paper):

  • V1 (Semantic Only): Uses only text matching. Performance is poor because it ignores user history.

  • V2 (Collaborative Only): Uses only history. Better, but misses some text nuances.

  • V3 (No Filter): Uses semantic + collaborative but without hard constraints.

  • Full: Uses everything. The addition of the Filter Tool (Hard Constraints) provided a clear boost, proving that strictly removing irrelevant items (e.g., filtering out expensive items when budget is low) is crucial for trust and accuracy.

    The results are visually summarized in the following chart (Figure 5 from the original paper):

Figure 5: Offline ablation study results on the Amazon dataset, comparing model variants (V1–V4) on CSR@20 and PR across the SR, MR, and MRID scenarios. All numerical values on axes are percentages (percentage notation omitted for conciseness).

6.3. Online Experiment & Business Impact

RecBot was deployed for 3 months on a major platform (A/B testing). The results were overwhelmingly positive.

Metric Definitions:

  • NFF (Negative Feedback Frequency): How often users clicked "Dislike". (Lower is better).

  • EICD/CICD: Diversity of items exposed/clicked. (Higher is better).

  • PV / ATC: Page Views and Add-to-Cart actions, respectively (standard e-commerce engagement and purchase-intent metrics; higher is better).

  • GMV: Gross Merchandise Volume (Total sales value).

    The following are the results from Table 5 of the original paper:

| NFF ↓ | EICD ↑ | CICD ↑ | PV ↑ | ATC ↑ | GMV ↑ |
|---|---|---|---|---|---|
| -0.71% | +0.88% | +1.44% | +0.56% | +1.28% | +1.40% |

Analysis:

  • NFF (-0.71%): Users complained less because they could simply tell the system what they wanted instead of getting frustrated.

  • GMV (+1.40%): This is a massive revenue uplift for a large-scale e-commerce platform. It shows that "controllability" leads to higher conversion.

    Figure 6 from the original paper visualizes the sustained improvement over the 3-month period:

Figure 6: Online performance curves during the three-month A/B testing period, comparing RecBot vs. the base system on EICD, CICD, ATC, and GMV, with all metrics normalized using min-max scaling.

Figure 7 shows that the reduction in negative feedback was consistent across different user groups, though hardest-to-please users (extreme complainers) saw a slight increase in NFF, possibly because they engaged more with the feedback mechanism itself.

Figure 7: Online performance improvements across different user groups split by historical negative feedback frequency, comparing base vs. experimental user proportions alongside the negative feedback reduction rate.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper successfully introduces the Interactive Recommendation Feed (IRF), a paradigm shift that empowers users to actively shape their recommendations through natural language. The RecBot framework makes this possible by effectively parsing user intent into structured constraints and dynamically planning adjustments to the ranking algorithm. Through knowledge distillation, the team deployed this sophisticated LLM-based system in a real-time industrial environment, achieving significant gains in both user satisfaction (lower negative feedback) and business revenue (higher GMV).

7.2. Limitations & Future Work

  • Limitations: The authors note that while they handle immediate feedback well, the system currently relies on offline training.
  • Future Work: They plan to implement online learning mechanisms where the agents continuously evolve based on real-time user feedback, rather than waiting for the next training cycle. They also aim to improve the "proactive anticipation" capabilities of the agents.

7.3. Personal Insights & Critique

  • Transformative Potential: This work represents a crucial step in the evolution of AI agents. Moving from "Passive Consumption" to "Active Negotiation" in recommendations fundamentally changes the user relationship with algorithms. It reduces the "filter bubble" effect by allowing users to manually break out of their historical patterns (e.g., "Show me something different today").
  • Architectural Elegance: The separation of Parser (Language) and Planner (Logic/Math) is a robust design pattern. It allows the system to be modular—you can upgrade the LLM without changing the recommendation database tools.
  • Industrial Reality: The use of knowledge distillation is the unsung hero here. Many academic papers propose massive LLM agents that are impractical for real apps with milliseconds of latency requirements. RecBot's focus on optimizing a smaller model (Qwen) via a Teacher (GPT-4) makes this a highly practical blueprint for the industry.
  • Critique: While the "hard constraints" tool is powerful, it can be risky. If a user says "cheap" and the system strictly filters for items under $10, it might miss a perfect item at $11. The boundary between "hard" and "soft" in natural language is often ambiguous and demands extremely high precision from the Parser.
