A Survey of Controllable Learning: Methods and Applications in Information Retrieval

Published: 2024-07-04

TL;DR Summary

Controllable learning is essential for trustworthy machine learning, allowing dynamic adaptation to complex information needs. This survey defines controllable learning, explores its applications in information retrieval, identifies challenges, and suggests future research directions.

Abstract

Controllability has become a crucial aspect of trustworthy machine learning, enabling learners to meet predefined targets and adapt dynamically at test time without requiring retraining as the targets shift. We provide a formal definition of controllable learning (CL), and discuss its applications in information retrieval (IR) where information needs are often complex and dynamic. The survey categorizes CL according to what is controllable (e.g., multiple objectives, user portrait, scenario adaptation), who controls (users or platforms), how control is implemented (e.g., rule-based method, Pareto optimization, hypernetwork and others), and where to implement control (e.g., pre-processing, in-processing, post-processing methods). Then, we identify challenges faced by CL across training, evaluation, task setting, and deployment in online environments. Additionally, we outline promising directions for CL in theoretical analysis, efficient computation, empowering large language models, application scenarios and evaluation frameworks.

In-depth Reading

1. Bibliographic Information

1.1. Title

The title of the paper is A Survey of Controllable Learning: Methods and Applications in Information Retrieval, which is also its central topic.

1.2. Authors

The authors and their affiliations are:

  • Chenglei Shen: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
  • Xiao Zhang: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
  • Teng Shi: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
  • Changshuo Zhang: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
  • Guofu Xie: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
  • Jun Xu: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
  • Ming He: AI Lab at Lenovo Research, Beijing, China.
  • Jianping Fan: AI Lab at Lenovo Research, Beijing, China.

1.3. Journal/Conference

The paper is a preprint published on arXiv. Its front matter also includes "Received month dd, yyyy; accepted month dd, yyyy. E-mail: {chengleishen9,zhangx89}@ruc.edu.cn. © Higher Education Press 2025," suggesting it is intended for publication in a journal or conference in 2025. arXiv is a well-regarded platform for disseminating research preprints across various scientific disciplines, including computer science and artificial intelligence. Its influence lies in enabling rapid sharing of new research before formal peer review and publication.

1.4. Publication Year

The paper was published on arXiv on 2024-07-04.

1.5. Abstract

The paper provides a comprehensive survey of controllable learning (CL), which is defined as the ability of machine learning models to meet predefined targets and adapt dynamically at test time without requiring retraining as targets shift. It formally defines CL and discusses its applications, particularly in information retrieval (IR), where user needs are often complex and dynamic. The survey categorizes CL based on what is controllable (e.g., multiple objectives, user portrait, scenario adaptation), who controls (users or platforms), how control is implemented (e.g., rule-based methods, Pareto optimization, hypernetworks), and where control is applied (e.g., pre-processing, in-processing, post-processing). It identifies current challenges in CL across training, evaluation, task setting, and online deployment. Finally, it outlines future directions, including theoretical analysis, efficient computation, empowering large language models (LLMs), new application scenarios, and improved evaluation frameworks.

The paper is available as a preprint at:

  • Original Source Link: https://arxiv.org/abs/2407.06083
  • PDF Link: https://arxiv.org/pdf/2407.06083v3.pdf

The paper is currently a preprint and has not been formally published in a peer-reviewed journal or conference.

2. Executive Summary

2.1. Background & Motivation

What is the core problem the paper aims to solve?

The core problem the paper addresses is the lack of a unified definition and systematic understanding of controllable learning (CL) in discriminative machine learning models, especially within information retrieval (IR). While controllable generation (e.g., in text or image generation) is well-explored, the concept of models adapting dynamically to changing task requirements without retraining at test time is not clearly defined or broadly surveyed for discriminative tasks. The paper seeks to formalize controllable learning and provide a comprehensive taxonomy of its methods and applications.

Why is this problem important in the current field? What specific challenges or gaps exist in prior research?

This problem is crucial for the advancement of trustworthy machine learning. As machine learning models become more prevalent and powerful, particularly with the rise of Model-as-a-Service (MaaS) and large language models (LLMs), the ability to ensure that these models align with human intent and can be effectively interfered with or adjusted after deployment becomes paramount. The Bletchley Declaration and Global AI Governance Initiative highlight the global emphasis on safe, reliable, and controllable AI.

Specific challenges and gaps include:

  • Lack of Unified Definition: No formal, universally accepted definition of controllable learning exists, especially for discriminative models. This hinders systematic research and comparison.
  • Dynamic Information Needs: In information retrieval, user needs are inherently complex, evolving, and context-dependent. Traditional models often require costly retraining for every shift in user preference or platform objective, which is inefficient and impractical in real-time scenarios.
  • MaaS Paradigm Shift: The transition to MaaS means users access trained models via APIs without controlling the training process. This necessitates robust control mechanisms to allow personalization and adaptation without full model retraining.
  • Limitations of Existing Surveys: Previous surveys have only touched upon related concepts like user control or trustworthy IR, often from narrow perspectives (e.g., user-centric only, security aspects) and lacking in-depth technical analysis or coverage of newer techniques like hypernetworks or LLMs.
  • Distinction from Explainability: While explainability helps users understand why a model makes a decision, controllability focuses on the ability to change that decision to meet specific requirements. This distinction needs clarification.

What is the paper's entry point or innovative idea?

The paper's innovative idea is to formally define controllable learning (CL) as a distinct and crucial category within trustworthy machine learning. It explicitly focuses on the ability to adapt to diverse task requirements without retraining at test time. The paper then proposes a novel, multi-dimensional taxonomy (what, who, how, where) to systematically classify and understand existing CL methods, particularly within information retrieval, thereby providing a structured lens for future research.

2.2. Main Contributions / Findings

What are the paper's primary contributions? (e.g., proposing a new model, theory, algorithm, dataset, etc.)

The paper's primary contributions are:

  • Formal Definition of Controllable Learning (CL): It provides a clear and concise formal definition of CL in terms of task requirements (description, context, target) and the role of a control function that adapts a learner without retraining.
  • Comprehensive Taxonomy for CL in IR: It introduces a novel, multi-faceted taxonomy for CL specific to information retrieval applications, categorizing methods by:
    • What is Controllable: Multi-objective control, User Portrait Control, Scenario Adaptation Control.
    • Who Controls: User-Centric Control, Platform-Mediated Control.
    • How Control is Implemented: Rule-Based Techniques, Pareto Optimization, Hypernetworks, and Other Methods (Disentanglement, Reinforcement Learning, LLMs, Test-Time Adaptation).
    • Where to Control: Pre-processing, In-processing, Post-processing methods.
  • Identification of Challenges: It systematically identifies key challenges faced by CL across training, evaluation, task setting, and online environments.
  • Outline of Future Directions: It suggests promising avenues for future research in CL, including theoretical analysis, efficient computation, empowering LLMs, multi-task switching scenarios, and resource/metric development.
  • Comprehensive Survey of Existing Works: It provides a structured review of numerous existing works under its proposed taxonomy, creating a valuable resource for researchers.

What key conclusions or findings did the paper reach? What specific problems do these findings solve?

The key conclusions and findings include:

  • CL is distinct from and complements other trustworthy ML aspects like fairness, privacy, and interpretability, serving as a crucial mechanism for aligning model behavior with dynamic objectives.

  • The MaaS paradigm significantly heightens the demand for CL, especially in IR, where models must provide fine-grained, context-aware responses without costly retraining.

  • Current CL implementations utilize a diverse set of techniques (from simple rule-based to advanced hypernetworks and LLMs) applied at various stages of the model inference pipeline.

  • A significant gap exists in the evaluation of CL, with a lack of standardized benchmarks and dedicated datasets, hindering progress and comparability.

  • CL presents a fundamental trade-off between controllability and performance/efficiency during training.

  • The field is ripe for theoretical advancements to understand the causal links between task targets and model parameters in vast deep learning models.

    These findings solve the problem of a fragmented understanding of controllable learning. By providing a formal definition and a structured taxonomy, the paper offers a unified framework for researchers to categorize existing work, identify gaps, and guide future innovations. It highlights the importance of CL in adapting AI models to real-world, dynamic scenarios, which is essential for trustworthy AI and efficient deployment in the MaaS era.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this survey, a beginner should be familiar with the following foundational concepts:

  • Machine Learning (ML): A field of artificial intelligence that enables systems to learn from data, identify patterns, and make decisions with minimal human intervention. Instead of being explicitly programmed, ML models learn to perform tasks by processing large datasets.
  • Artificial Intelligence (AI): A broader field encompassing machine learning, focused on creating machines that can perform tasks that typically require human intelligence, such as problem-solving, understanding language, and recognizing patterns.
  • Information Retrieval (IR): The science of searching for information within documents, searching for documents themselves, searching for metadata about documents, or searching within databases. Common examples include web search engines, recommender systems, and library catalog search.
  • Recommender Systems: A type of information retrieval system that filters information to predict the "rating" or "preference" a user would give to an item. They aim to provide personalized suggestions (e.g., movies, products, news) to users based on their past behavior or preferences.
  • Trustworthy Machine Learning: An umbrella term referring to the critical characteristics that AI systems must possess to be reliable and deployable in real-world scenarios. These characteristics include fairness (no bias against groups), privacy (protection of sensitive data), interpretability (understanding model decisions), and, as highlighted by this paper, controllability.
  • Controllable Generation: A subfield of generative AI where the output of a model (e.g., text, images, audio) can be guided or constrained by specific user-provided inputs, often called prompts. For instance, generating an image of a cat "in the style of Van Gogh" where "in the style of Van Gogh" is the control.
  • Discriminative Machine Learning Models: Models that learn to distinguish between different categories or predict a numerical value based on input features. Unlike generative models that create new data, discriminative models focus on classification or regression tasks (e.g., identifying spam email, predicting house prices, ranking search results).
  • Model-as-a-Service (MaaS): A deployment paradigm where trained machine learning models are made available to users or applications via APIs (Application Programming Interfaces). Users can leverage the model's capabilities without needing to manage the underlying infrastructure, training, or updating process. This contrasts with Software-as-a-Service (SaaS), which offers complete software applications.
  • Hypernetwork: A neural network that generates the weights (parameters) or a subset of weights for another neural network (the "main network"). This allows the main network's behavior to be dynamically adjusted based on an input to the hypernetwork, enabling adaptability without retraining the main network.
  • Large Language Models (LLMs): Advanced deep learning models, often based on the transformer architecture, trained on vast amounts of text data. They are capable of understanding, generating, and processing human language for a wide range of tasks, including translation, summarization, and question-answering. Examples include ChatGPT and GPT-4.
  • Multi-objective Optimization (MOO): A field of mathematical optimization where multiple objective functions are to be optimized simultaneously. These objectives often conflict, meaning improving one objective might worsen another. MOO seeks Pareto optimal solutions, where no objective can be improved without degrading at least one other objective.
  • Pareto Optimality: In multi-objective optimization, a solution is Pareto optimal if it is not possible to improve any objective without simultaneously worsening at least one other objective. The set of all Pareto optimal solutions forms the Pareto front.
  • Test-Time Adaptation (TTA): Algorithms that allow a pre-trained machine learning model to adapt to new, unlabeled data during the testing or inference phase, without requiring a full retraining process. This is particularly useful in dynamic environments where data distributions can shift.

3.2. Previous Works

The paper differentiates itself from and builds upon several related research areas. Here's a summary of key prior studies mentioned and their context:

  • User Control in Recommender Systems [78]: This 2017 survey focused on allowing users to control recommendations, particularly when system assumptions about their preferences were incorrect.

    • Context: It highlights the instantaneous nature of user interests and the need for user-driven intervention.
    • Limitation (as identified by current paper): It was exclusively user-centric, neglecting platform-side control. It also lacked rigorous definitions and in-depth technical analysis, mainly summarizing interaction forms.
  • Trustworthy Information Retrieval [79]: This survey primarily dealt with security aspects like privacy preservation in recommender systems.

    • Context: It did touch upon controllability, categorizing it into explicit controllability (users directly edit preferences) and implicit controllability (users indirectly fine-tune preferences through interactions like re-ranking).
    • Limitation (as identified by current paper): It lacked a systematic summary of controllability in machine learning and the methods to achieve it. It also missed broader controllability types like platform-side or multi-objective control.
  • Explainable Information Retrieval [80, 81]: These surveys focused on making IR or recommender models transparent and interpretable, providing reasons for retrieved or recommended items.

    • Context: Explainability can facilitate human understanding and thus promote controllability.
    • Differentiation (as identified by current paper): Explainability (understanding why) is distinct from controllability (the ability to change decisions). While related, they address different core challenges. Explainability is a preliminary step towards controllability, but not the same.
  • Controllable Generation (e.g., text generation [62-64] or visual generation [65-67]): This field involves AI models generating content aligned with a given task description or prompt.

    • Context: These methods often use prompt-based approaches, seen as a pre-processing form of control where inputs are changed for a fixed model.
    • Differentiation (as identified by current paper): The current paper extends controllable learning beyond generative models to discriminative models, noting that controlling model parameters (in-processing) might yield better convergence than just controlling model inputs.

3.3. Technological Evolution

The field of machine learning has evolved significantly, leading to the current emphasis on controllable learning:

  1. Early IR Systems: Initially, information retrieval systems were often rule-based or relied on simpler statistical models (e.g., Boolean search, TF-IDF). Control was largely manual or hard-coded.

  2. Rise of Learning-to-Rank and Recommender Systems: The advent of machine learning brought sophisticated ranking algorithms (learning-to-rank) and recommender systems. These models, however, often prioritized performance metrics like accuracy and relevance, sometimes at the expense of diversity, fairness, or user control.

  3. Emergence of Trustworthy AI: As ML models moved into critical applications, concerns about fairness, privacy, interpretability, and robustness grew, leading to the trustworthy AI movement. Controllability is identified as a key component of this.

  4. Deep Learning Revolution: The widespread adoption of deep neural networks and transformer models dramatically increased model complexity and performance, but also made them more opaque and harder to control.

  5. Model-as-a-Service (MaaS) Paradigm: The commercialization of large-scale ML models (especially LLMs) through MaaS platforms created a new imperative for controllability. Users of these services need to customize model behavior without access to the full training pipeline.

  6. Advanced Control Techniques: New techniques like hypernetworks (dynamically generating model parameters), Pareto optimization (balancing conflicting objectives), disentangled representations (isolating specific features for control), and the application of Reinforcement Learning (RL) and LLMs for instruction-following have emerged as ways to achieve more sophisticated controllability.

    This paper's work fits into this timeline by formally defining controllable learning and surveying its methods, thereby structuring the field and guiding its development in the context of trustworthy AI, the MaaS paradigm, and the capabilities of modern deep learning and LLMs.

3.4. Differentiation Analysis

Compared to prior work, the core differences and innovations of this paper's approach are:

  • Unified and Formal Definition: Unlike previous works that implicitly used "user control" or addressed controllability as a sub-aspect of trustworthiness (e.g., privacy), this paper provides the first formal and explicit definition of controllable learning (CL) as a standalone concept for discriminative ML models. This definition (Definition 1) is comprehensive, covering task description, context, and task target, and emphasizes adaptation without retraining at test time.

  • Comprehensive Multi-Dimensional Taxonomy: Instead of focusing on a single aspect (e.g., user perspective, security), the paper introduces a novel, multi-dimensional classification framework:

    • What to control: It broadens controllability to multi-objective control, user portrait control, and scenario adaptation control, covering a wider range of practical needs than just basic preference adjustments.
    • Who controls: It explicitly distinguishes between user-centric and platform-mediated control, recognizing the divergent interests and capabilities of different stakeholders.
    • How control is implemented: It surveys diverse technical mechanisms like rule-based, Pareto optimization, hypernetworks, disentanglement, and LLMs, offering a structured view of the algorithmic landscape.
    • Where control is applied: It categorizes methods by their placement in the inference pipeline (pre-processing, in-processing, post-processing), providing a practical framework for implementation.
  • Focus on IR Applications: While acknowledging controllable generation, the survey specifically delves into CL within information retrieval and recommender systems, which are areas with complex, dynamic, and often implicit user needs. This specialized focus fills a gap in the literature.

  • Timeliness and Scope: The survey is timely, incorporating recent advancements in hypernetworks and the emerging role of large language models (LLMs) in controllable learning, which were largely absent in older surveys on related topics. It addresses the challenges posed by the Model-as-a-Service (MaaS) paradigm.

  • Emphasis on Test-Time Adaptation: A key differentiator is the emphasis on test-time adaptation without retraining. This is crucial for efficiency and scalability in real-world online environments and for MaaS deployments, where constant retraining for every new control signal is impractical.

    In essence, this paper moves beyond fragmented discussions of "user control" or "fairness" by proposing a holistic, rigorously defined, and systematically categorized framework for controllable learning, offering a comprehensive map for researchers and practitioners in the evolving landscape of trustworthy AI.

4. Methodology

This paper is a survey, so its "methodology" is primarily its structured approach to defining and categorizing Controllable Learning (CL) within Information Retrieval (IR), rather than proposing a new algorithm. The core methodology involves: (1) providing a formal definition of CL, (2) outlining its procedure, and (3) presenting a multi-faceted taxonomy to classify existing methods.

4.1. Principles

The core idea of Controllable Learning is to enable machine learning models to adapt dynamically to diverse task requirements at test time without the need for retraining. The theoretical basis or intuition behind this is rooted in the need for trustworthy AI systems that can align with human intent and adapt to shifting goals or environments efficiently. Instead of building a new model for every slight change in requirement, a controllable learner is designed to be flexible, allowing users or platforms to "steer" its behavior post-deployment via explicit control signals. This principle is particularly valuable in Information Retrieval where user preferences and platform objectives are constantly evolving.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Formal Definition of Controllable Learning (CL)

The paper formally defines Controllable Learning (CL) as follows:

Definition 1 (Controllable Learning (CL)). Define a task requirement triplet $\mathcal{T} = \{s_{\mathrm{desc}}, s_{\mathrm{ctx}}, s_{\mathrm{tgt}}\} \in \Gamma$, where $s_{\mathrm{desc}} \in \mathcal{D}_{\mathrm{desc}}$ represents the task description, $s_{\mathrm{ctx}} \in \mathcal{D}_{\mathrm{ctx}}$ represents the context related to the task, and $s_{\mathrm{tgt}} \in \mathcal{D}_{\mathrm{tgt}}$ represents the task target. Given an input space $\mathcal{X}$ and an output space $\mathcal{Y}$, for a learner $f: \mathcal{X} \to \mathcal{Y}$, controllable learning (CL) aims to find a control function $h$ that maps the learner $f$, the task description $s_{\mathrm{desc}} \in \mathcal{T}$, and the context $s_{\mathrm{ctx}} \in \mathcal{T}$ to a new learner $f_{\mathcal{T}}$ that fulfills the task target $s_{\mathrm{tgt}} \in \mathcal{T}$, i.e.,

$$f_{\mathcal{T}} = h(f, s_{\mathrm{desc}}, s_{\mathrm{ctx}}).$$

The integration of the learner $f$ and the control function $h$ is called a controllable learner. Moreover, upon receiving a new task requirement $\mathcal{T}' \in \Gamma$ at test time, the control function $h$ should be capable of outputting a new learner $f_{\mathcal{T}'}$ without the need for model retraining, ensuring that $f_{\mathcal{T}'}$ satisfies the task target $s'_{\mathrm{tgt}} \in \mathcal{T}'$.

Symbol Explanation:

  • $\mathcal{T}$: Represents a task requirement triplet, a comprehensive specification for a task.

  • $s_{\mathrm{desc}}$: Denotes the task description, which is the specific representation of the task target that can be perceived and processed by the control function $h$. This could be a vector of weights, natural language, or specific rules.

  • $\mathcal{D}_{\mathrm{desc}}$: Represents the domain or space from which task descriptions are drawn.

  • $s_{\mathrm{ctx}}$: Denotes the context related to the task, such as historical data or user profiles, providing additional background information.

  • $\mathcal{D}_{\mathrm{ctx}}$: Represents the domain or space from which task contexts are drawn.

  • $s_{\mathrm{tgt}}$: Represents the task target, which is the ideal quantitative metric or performance objective that the controllable learner aims to achieve (e.g., a specific balance of accuracy and diversity).

  • $\mathcal{D}_{\mathrm{tgt}}$: Represents the domain or space from which task targets are drawn.

  • $\Gamma$: The space of all possible task requirement triplets.

  • $\mathcal{X}$: The input space of the learner $f$.

  • $\mathcal{Y}$: The output space of the learner $f$.

  • $f$: The base learner or machine learning model that maps inputs from $\mathcal{X}$ to outputs in $\mathcal{Y}$.

  • $h$: The control function, which is the core mechanism of CL. It takes the base learner $f$, the task description $s_{\mathrm{desc}}$, and the context $s_{\mathrm{ctx}}$ as inputs.

  • $f_{\mathcal{T}}$: The new learner (or adapted learner) produced by the control function $h$. This adapted learner is specifically configured to fulfill the task target $s_{\mathrm{tgt}}$ defined in $\mathcal{T}$.

  • $\mathcal{T}'$: A new task requirement that arrives at test time.

    The key aspects of this definition are the existence of a control function $h$ that modifies the learner $f$ (or its behavior) and the crucial requirement that this adaptation happens without retraining the underlying model for each new task requirement.

The procedure of Controllable Learning is depicted in Figure 2 (from the original paper). This figure illustrates how the task description (e.g., preferences, rules) and context (e.g., historical data, user profiles) are fed into the controllable learner, which consists of the control function $h$ and the learner $f$. The control function then adapts the learner to produce recommendations or predictions that are consistent with the task target.
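To make Definition 1 concrete, here is a minimal Python sketch (our illustration, not code from the paper) that treats the control function $h$ as a higher-order function: it takes a frozen learner $f$, a task description, and a context, and returns an adapted learner $f_{\mathcal{T}}$ with no retraining step.

```python
from typing import Callable, Dict, List

# A learner maps an input (e.g., a user) to an output (e.g., a ranked item list).
Learner = Callable[[dict], List[str]]

def control_function(f: Learner, s_desc: Dict[str, float], s_ctx: dict) -> Learner:
    """h(f, s_desc, s_ctx) -> f_T: wrap the frozen learner f so that its
    behavior matches the task description, without retraining f."""
    def f_T(x: dict) -> List[str]:
        items = f(x)  # base predictions from the frozen learner
        boost = s_ctx.get("recent_categories", [])
        # Re-rank: items matching the recent context move up, scaled by the
        # weight the task description assigns to "recency".
        w = s_desc.get("recency", 0.0)
        return sorted(items, key=lambda it: -w * (it in boost))
    return f_T

# Usage: a new task requirement T' at test time yields a new learner instantly.
base = lambda x: ["love_2", "suspense_3", "fiction_1"]
f_new = control_function(base, s_desc={"recency": 1.0},
                         s_ctx={"recent_categories": ["fiction_1"]})
print(f_new({"user_id": 42}))  # fiction_1 is promoted to the top
```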

The following figure (Figure 2 from the original paper) shows the overall framework of Controllable Learning:


VLM Description: The image is a diagram illustrating the framework of controllable learning, including context, controllable learner, and output. It covers elements such as historical data and user profile, emphasizing objectives like multi-objective control, user portrait control, and scenario adaptation control. This figure helps in understanding the applications of controllable learning in information retrieval.

4.2.2. Taxonomy of CL in Information Retrieval

The paper proposes a four-dimensional taxonomy to categorize CL methods in IR: What is Controllable, Who Controls, How Control is Implemented, and Where to Control.

4.2.2.1. What is Controllable

This dimension addresses "What does the task target $s_{\mathrm{tgt}}$ in Definition 1 of CL look like?".

  • Multi-Objective Control: This refers to scenarios where both users and platforms have multiple, often conflicting, objectives (e.g., accuracy, diversity, novelty). The task target $s_{\mathrm{tgt}}$ represents an expected performance target balancing these objectives, and the task description $s_{\mathrm{desc}}$ explicitly or implicitly conveys the preference weights for each objective (e.g., a vector [0.4, 0.6] for accuracy and diversity). The goal is to adapt the model to these shifting preferences without retraining (a weighted-scalarization sketch follows this list). The following figure (Figure 3 from the original paper) illustrates the need for Multi-Objective Control:

    Figure 3 Illustration of the need for Multi-Objective Control. Users may have a temporary preference during the test stage (e.g., shifting from "Love" and "Suspense" to "Fiction" after a temporary description). The platform may instead focus more on the diversity of the outputs rather than accuracy during the test stage. This figure depicts the dynamic nature of shifting objectives for both the platform and users during the test stage, underscoring the critical role of multi-objective control in recommender systems.

    VLM Description: The image is an illustration demonstrating the need for multi-objective control. During the test stage, user preferences may temporarily shift from 'Love' and 'Suspense' to 'Fiction', while the platform focuses more on the diversity of outputs. It highlights the importance of dynamic goal changes for both users and platforms during the test stage.

  • User Portrait Control: This involves allowing users to edit their context $s_{\mathrm{ctx}}$ (e.g., personal profiles, interaction history) to influence the recommendation output. The task target is achieved by modifying the input to the learner $f$. This enables personalization and privacy protection. The following figure (Figure 4 from the original paper) provides examples of User Portrait Control:

    Figure 4 The Examples on User Portrait Control. The preference summary represents an aggregation of the user's interaction history. The list on the right shows the output items along with their ratings. Deletions are marked in red, and additions are marked in green. This figure illustrates three forms of user modification, demonstrating that user portrait control can be achieved by modifying and editing the interaction history, preference summary, and ratings.

    VLM Description: The image is a diagram illustrating examples of user portrait control. The preference summary shows the user's past enjoyment of fiction films and a recent shift towards plot-driven movies like love and suspense. The interaction history clearly lists the movies watched by the user, while the rating section on the right includes the ratings for different films, reflecting the user's potential preference for action movies.

  • Scenario Adaptation Control: This addresses adapting the model to different scenarios (e.g., different content pages, time segments) without retraining. The task description $s_{\mathrm{desc}}$ includes scenario-specific side information, allowing the control function $h$ to map the learner $f$ to a scenario-specific learner $f_{\mathcal{T}}$. The following figure (Figure 5 from the original paper) shows the workflow of Scenario Adaptation Control:

    Figure 5 The Workflow of Scenario Adaptation Control. Unlike traditional settings where a learner is fixed to a specific scenario, Scenario Adaptation Control aims to address the challenge of learner adaptation under continuous scenario switching during the test stage. When a new scenario arises, the control function maps the general learner to a scenario-specific learner based on the corresponding task description. This allows the scenario-specific learner to be applied to the new context without the need for retraining.

    VLM Description: The image is a diagram illustrating the workflow of scenario adaptation control. In this process, the control function maps a general learner to a scenario-specific learner based on task descriptions, enabling dynamic adaptation across different scenarios without retraining.
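As a toy illustration of multi-objective control, the sketch below (ours; it assumes simple linear scalarization of per-item objective scores, whereas the surveyed methods also use Pareto optimization and hypernetworks) treats the task description as an objective-weight vector that re-scores candidates at test time:

```python
import numpy as np

def controlled_rank(scores: dict, s_desc: np.ndarray) -> list:
    """Re-rank candidates under a test-time objective-weight vector s_desc.

    `scores` maps item -> per-objective scores, e.g. [accuracy, diversity].
    Changing s_desc (e.g., [0.4, 0.6] -> [0.9, 0.1]) changes the ranking
    without retraining the underlying scorers.
    """
    return sorted(scores, key=lambda item: -float(scores[item] @ s_desc))

candidates = {
    "item_a": np.array([0.9, 0.1]),  # accurate but redundant
    "item_b": np.array([0.5, 0.8]),  # less accurate, more diverse
}
print(controlled_rank(candidates, np.array([0.4, 0.6])))  # ['item_b', 'item_a']
print(controlled_rank(candidates, np.array([0.9, 0.1])))  # ['item_a', 'item_b']
```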

4.2.2.2. Who Controls

This dimension addresses "Who proposes the task description $s_{\mathrm{desc}}$ in Definition 1?".

  • User-Centric Control: The user explicitly defines their preferences (task target) and provides this information in a specified format (task description), such as questionnaires, weighting buttons, or natural language. It can be explicit (direct input) or implicit (inferred from behaviors like interactions). Objectives include fostering interest, privacy protection, noise elimination, and exploration.

  • Platform-Mediated Control: The platform (e.g., service provider) imposes algorithmic adjustments or policy-based constraints on the recommendation process. The control requirements are still expressed as $s_{\mathrm{desc}}$, but the task target $s_{\mathrm{tgt}}$ focuses on optimizing platform objectives (e.g., increasing diversity to promote less popular items, balancing multiple objectives, adapting to multi-scenarios for performance, efficiency for cost reduction). The following figure (Figure 6 from the original paper) illustrates the objects of User-Centric Control and Platform-Centric Control:

    Figure 6 The Objects of User-Centric Control and Platform-Centric Control.

    VLM Description: The image is a diagram illustrating the objects of user-centric and platform-centric control. The left side includes concepts related to users, such as preference, privacy, diversity, exploration, and data filtering, while the right side addresses platform-related aspects like adaptation, efficiency, and accuracy.

4.2.2.3. How Control is Implemented

This dimension summarizes the common control methods for implementing the control function $h(\cdot)$.

  • Rule-Based Techniques: These involve applying predefined rules to the inputs or outputs of recommender systems. They act as a patchwork solution to enhance system performance from a product perspective.

    • Pre-processing: $h: f \to f \circ g_{\mathrm{rule}}$. The rule-based control mechanism $g_{\mathrm{rule}}$ processes context information $s_{\mathrm{ctx}}$ (e.g., user profiles, interaction history) to alter representations for privacy preservation or fairness.
    • Post-processing: $h: f \to g_{\mathrm{rule}} \circ f$. The rule-based control mechanism $g_{\mathrm{rule}}$ directly modifies the output of the learner $f$ (e.g., removing outdated items, promoting less popular items to enhance diversity). A minimal composition sketch of both variants follows this list.
  • Pareto Optimization: This technique is used when balancing multiple, often conflicting objectives. The task target $s_{\mathrm{tgt}}$ represents a single Pareto optimal solution that satisfies multiple objectives simultaneously. The task description $s_{\mathrm{desc}}$ typically consists of multi-objective weights or constraints. The challenge is to guide the learner $f$ to achieve Pareto optimality under these constraints. The following figure (Figure 7 from the original paper) illustrates Controllable Pareto Optimization:

    Figure 7 The illustration of controllable Pareto optimization.

    VLM Description: The image is an illustration showing the effects of controllable Pareto optimization. The left side displays the Pareto front without control, while the right side illustrates the changes in the Pareto front when controlling for Objective 1. Different colors and markings are used to differentiate the objectives, highlighting the importance of control in the optimization process.

  • Hypernetwork: A neural network that generates parameters for another network. In CL, the hypernetwork acts as the control function $h$. The task description $s_{\mathrm{desc}}$ (e.g., task or domain description) is input to the hypernetwork, which then outputs all or part of the weights for the learner $f$, thus customizing the model to achieve the task target $s_{\mathrm{tgt}}$ dynamically.

  • Other Methods:

    • Disentanglement: Decouples user interests into specific dimensions within the latent space, allowing for controllable manipulation of aspects like item category preferences.
    • Reinforcement Learning (RL): Achieves controllability by designing specific reward functions that guide the algorithm's learning from environmental interaction to meet desired control goals.
    • Large Language Models (LLMs): Leverages LLMs for their general intelligence and instruction-following capabilities. LLMs can be fine-tuned or prompted to act as control functions, interpreting natural language instructions (task descriptions) to guide recommendations or content generation.
    • Test-Time Adaptation (TTA): Algorithms that adapt a pre-trained model directly during the test stage using unlabeled test data, without retraining. While some hypernetwork-based methods fall here, TTA generally focuses on reactive adjustments to data shifts rather than proactive control signals.
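Since the rule-based variants above are literally function compositions ($h: f \mapsto f \circ g_{\mathrm{rule}}$ or $h: f \mapsto g_{\mathrm{rule}} \circ f$), they are easy to sketch. The following Python snippet is our illustration with hypothetical rules, not code from any surveyed system:

```python
from typing import Callable, List

Learner = Callable[[List[str]], List[str]]  # interaction history in, ranked items out

def pre_process(f: Learner, g_rule: Learner) -> Learner:
    """h: f -> f ∘ g_rule — the rule edits the context before inference."""
    return lambda history: f(g_rule(history))

def post_process(f: Learner, g_rule: Learner) -> Learner:
    """h: f -> g_rule ∘ f — the rule edits the output after inference."""
    return lambda history: g_rule(f(history))

# Hypothetical rules: hide sensitive interactions; drop outdated items.
drop_sensitive: Learner = lambda hist: [x for x in hist if not x.startswith("private:")]
drop_outdated: Learner = lambda items: [x for x in items if x != "outdated_item"]

base: Learner = lambda hist: ["outdated_item"] + hist[::-1]  # stand-in learner
f_controlled = post_process(pre_process(base, drop_sensitive), drop_outdated)
print(f_controlled(["a", "private:b", "c"]))  # ['c', 'a']
```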

4.2.2.4. Where to Control in Information Retrieval Models

This dimension categorizes CL methods based on where the control function is applied during the inference process.

  • Pre-Processing Methods: These methods achieve the task target by transforming the model inputs (e.g., concatenating task descriptions as prompts or modifying user profiles/interaction histories) before inference, without altering the model parameters itself.

  • In-Processing Methods: These methods adaptively adjust the parameters or hidden states of the learner upon receiving the task description and context during inference. Hypernetworks are a prime example, generating or modifying model parameters on-the-fly (a minimal hypernetwork sketch follows at the end of this section).

  • Post-Processing Methods: These methods refine the model outputs after inference. Examples include reranking (e.g., MMR for balancing relevance and diversity) or result diversification to meet task targets.

    The overall framework classification and specific examples mentioned in Table 1 (which will be presented in the Results & Analysis section) illustrate how these dimensions combine in various CL applications.
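To illustrate the in-processing (hypernetwork) case, here is a minimal PyTorch sketch (ours, not an implementation from any surveyed paper): a small hypernetwork maps the task description $s_{\mathrm{desc}}$ to the weights of a linear scoring layer, so each new scenario description yields a new scorer at test time without retraining.

```python
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    """A hypernetwork generates the (W, b) of a linear scorer from s_desc."""
    def __init__(self, desc_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # Hypernetwork: task description -> flattened weights of the main layer.
        self.hyper = nn.Sequential(
            nn.Linear(desc_dim, 64), nn.ReLU(),
            nn.Linear(64, in_dim * out_dim + out_dim),
        )

    def forward(self, s_desc: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        params = self.hyper(s_desc)                       # generated parameters
        w = params[: self.in_dim * self.out_dim].view(self.out_dim, self.in_dim)
        b = params[self.in_dim * self.out_dim:]
        return x @ w.T + b                                # scenario-specific scorer

model = HyperLinear(desc_dim=4, in_dim=16, out_dim=1)
x = torch.randn(8, 16)                                    # 8 candidate items
scores_a = model(torch.tensor([1.0, 0.0, 0.0, 0.0]), x)   # scenario A
scores_b = model(torch.tensor([0.0, 1.0, 0.0, 0.0]), x)   # scenario B, new weights
print(scores_a.shape, torch.allclose(scores_a, scores_b))  # different scorers
```

In a trained system, the hypernetwork itself is learned once over many task descriptions; at test time, only a forward pass through it is needed for each new scenario.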

5. Experimental Setup

As a survey paper, this work does not present novel experimental results for a new model but rather analyzes existing literature. Therefore, it discusses evaluation metrics and datasets that are commonly used in the field of controllable learning within information retrieval.

5.1. Datasets

The paper summarizes several publicly available datasets commonly used for controllable learning research in information retrieval and recommender systems. These datasets are chosen because they include features crucial for various control requirements, such as item category information for diversity control, and user profiles and interaction history for user portrait control.

  • Amazon [53, 54]:
    • Source: Product reviews from Amazon.
    • Scale: Comprises 142.8 million product reviews.
    • Characteristics: Covers various product categories, includes user and item profiles. Contains category information for items and time information.
    • Purpose: Suitable for multi-objective control (e.g., diversity, fairness via categories), user portrait control (using user profiles and historical sequences), and potentially scenario adaptation control (using time information for dynamic preferences).
  • Ali Display_Ad_Click [55]:
    • Source: Ad display/click logs from Alibaba.
    • Scale: Includes records for 1 million users and 26 million ad display/click logs.
    • Characteristics: Features 8 user profile attributes (e.g., ID, age, occupation) and 6 item features (e.g., ID, campaign, brand).
    • Purpose: Useful for user portrait control (via user profiles) and multi-objective control (e.g., optimizing click-through rate and other ad-related metrics).
  • UserBehavior [1]:
    • Source: User behaviors from Taobao's recommender systems.
    • Scale: Collects behaviors of approximately one million randomly selected users.
    • Characteristics: Covers all behaviors (clicks, purchases, add-to-cart, likes) during a specific period (November 25, 2017, to December 3, 2017).
    • Purpose: Ideal for user portrait control (editing historical sequences), multi-objective control (balancing different types of interactions), and scenario adaptation control (if specific temporal patterns within this window are considered scenarios).
  • MovieLens:
    • Source: Classical movie recommendation dataset.
    • Scale: Available in various sizes (100k, 1M, 10M, 20M ratings).
    • Characteristics: Includes information on users' gender, age, and occupation, as well as item category information (genres).
    • Purpose: Widely used for user portrait control (user demographics), multi-objective control (balancing movie genres, accuracy vs. diversity), and general recommender system research.
  • MS MARCO [56] (Microsoft Machine Reading Comprehension):
    • Source: Compiled from real user queries extracted from Microsoft Bing's search logs.

    • Scale: Extensive dataset with 3.2 million documents and 8.8 million passages.

    • Characteristics: Designed for evaluating machine reading comprehension, retrieval, and question-answering. Each query is paired with annotated relevant documents, spanning a wide variety of question types and document genres.

    • Purpose: Primarily used for search-focused IR tasks, offering potential for controllable learning in search (e.g., controlling query intent, result diversity, or relevance aspects).

      The paper notes that there are currently no dedicated domain-specific datasets explicitly designed for controllable learning, indicating this as an open challenge.

5.2. Evaluation Metrics

The paper highlights the absence of specific evaluation criteria for controllable learning and suggests that appropriate use of existing information retrieval and multi-objective optimization metrics can verify controllability. The core idea is to assess whether the control function $h$ can effectively make the learner $f_{\mathcal{T}}$ meet the task requirements $\mathcal{T}$. This can be checked by observing the correlation between a control parameter $\alpha$ (representing the degree of control) and the desired performance $s$ (e.g., NDCG, diversity).
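A minimal sketch of this correlation check (ours; the metric, model, and data are hypothetical stand-ins) sweeps the control parameter $\alpha$ and reports the rank correlation between the intended degree of control and the achieved metric:

```python
import numpy as np
from scipy.stats import spearmanr

def controllability_check(run_model, alphas, metric):
    """Sweep the control parameter, measure the target metric per setting,
    and report the rank correlation between intended and achieved control."""
    achieved = [metric(run_model(a)) for a in alphas]
    rho, pval = spearmanr(alphas, achieved)
    return rho, pval

# Hypothetical stand-in: a model whose output diversity tracks alpha, noisily.
rng = np.random.default_rng(0)
run_model = lambda a: a + 0.05 * rng.normal()  # pretend "diversity" output
alphas = np.linspace(0.0, 1.0, 11)
rho, pval = controllability_check(run_model, alphas, metric=lambda d: d)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")  # rho near 1 => controllable
```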

5.2.1. Single-Objective Metrics

These metrics are typically used to evaluate one specific aspect of a recommender system or information retrieval model.

Accuracy

Accuracy in recommender systems measures how well recommended items match user preferences. (A compact implementation sketch of the metrics below follows this list.)

  • NDCG (Normalized Discounted Cumulative Gain) [48]

    • Conceptual Definition: NDCG evaluates the quality of a ranked list by considering both the relevance of items and their position in the list. More relevant items at higher ranks contribute more to the score. It normalizes the score against an ideal ranking to allow comparison across different queries or recommendation lists.
    • Mathematical Formula:
$$\mathrm{NDCG}@k = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{DCG}_i@k}{\mathrm{IDCG}_i@k}, \qquad \mathrm{DCG}_i@k = \sum_{j=1}^{k} \frac{2^{y_{ij}} - 1}{\log_2(j+1)},$$
    • Symbol Explanation:
      • $N$: The total number of test samples (e.g., users or queries).
      • $k$: The cut-off rank, indicating that only the top $k$ items in the list are considered.
      • $\mathrm{DCG}_i@k$: The Discounted Cumulative Gain for the $i$-th test sample at rank $k$. It sums the relevance scores, applying a logarithmic discount for lower-ranked items.
      • $\mathrm{IDCG}_i@k$: The Ideal Discounted Cumulative Gain for the $i$-th test sample at rank $k$. This is the maximum possible DCG score, obtained by ranking all relevant items perfectly.
      • $y_{ij} \in \{0, 1\}$ (or higher values for graded relevance): The relevance label of the $j$-th item in the ranked list for the $i$-th test sample; here it implies binary relevance (0 for irrelevant, 1 for relevant).
      • $\log_2(j+1)$: The logarithmic discount factor, which reduces the contribution of items at lower ranks.
  • Precision

    • Conceptual Definition: Precision measures the proportion of retrieved items that are actually relevant. A high precision indicates that the system primarily retrieves useful information and avoids irrelevant results.
    • Mathematical Formula:
$$\mathrm{Precision}@k = \sum_{i=1}^{N} \frac{\left|\widehat{L}_i^k \cap L_i^k\right|}{\left|\widehat{L}_i^k\right|},$$
    • Symbol Explanation:
      • $N$: The total number of test samples.
      • $k$: The cut-off rank.
      • $\widehat{L}_i^k$: The set of top $k$ items outputted by the IR model for the $i$-th test sample.
      • $L_i^k$: The set of ground-truth relevant items for the $i$-th test sample that would ideally appear in a top $k$ list.
      • $\left|\widehat{L}_i^k \cap L_i^k\right|$: The number of relevant items among the top $k$ recommended items.
  • Recall

    • Conceptual Definition: Recall measures the proportion of total relevant items that were successfully retrieved by the system. A high recall indicates that the system is good at finding most of the relevant information, even if it also retrieves some irrelevant items.
    • Mathematical Formula:
$$\mathrm{Recall}@k = \sum_{i=1}^{N} \frac{\left|\widehat{L}_i^k \cap L_i^k\right|}{\left|L_i^k\right|}.$$
    • Symbol Explanation:
      • $N$: The total number of test samples.
      • $k$: The cut-off rank.
      • $\widehat{L}_i^k$: The set of top $k$ items outputted by the IR model for the $i$-th test sample.
      • $L_i^k$: The set of all ground-truth relevant items for the $i$-th test sample (regardless of whether they are in the top $k$ or not).
      • $\left|\widehat{L}_i^k \cap L_i^k\right|$: The number of relevant items among the top $k$ recommended items.
  • Hit Rate

    • Conceptual Definition: Hit Rate assesses whether at least one relevant item is present within the top kk recommendations. It's useful for scenarios where simply presenting any relevant option significantly impacts user satisfaction.
    • Mathematical Formula:
$$\mathrm{Hit}@k = \sum_{i=1}^{N} \mathbb{I}\left(\widehat{L}_i^k \cap L_i^k \neq \varnothing\right),$$
    • Symbol Explanation:
      • $N$: The total number of test samples (e.g., users).
      • $k$: The cut-off rank.
      • $\mathbb{I}(\cdot)$: An indicator function that returns 1 if its argument is true, and 0 otherwise.
      • $\widehat{L}_i^k$: The set of top $k$ items outputted by the IR model for the $i$-th test sample.
      • $L_i^k$: The set of ground-truth relevant items for the $i$-th test sample.
      • $\widehat{L}_i^k \cap L_i^k \neq \varnothing$: This condition is true if there is at least one overlap between the recommended top $k$ items and the relevant items.
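The binary-relevance versions of these accuracy metrics are straightforward to implement; the following compact sketch (ours) computes them for a single ranked list, to be averaged over test samples as in the formulas above:

```python
import numpy as np

def dcg_at_k(rels, k):
    """DCG with gains (2^rel - 1) and log2(rank + 1) discounts."""
    rels = np.asarray(rels[:k], dtype=float)
    return float(((2.0 ** rels - 1) / np.log2(np.arange(2, rels.size + 2))).sum())

def ndcg_at_k(ranked, relevant, k):
    rels = [1 if item in relevant else 0 for item in ranked[:k]]
    n_ideal = min(len(relevant), k)  # ideal list: all relevant items ranked first
    return dcg_at_k(rels, k) / dcg_at_k([1] * n_ideal, k) if n_ideal else 0.0

def recall_at_k(ranked, relevant, k):
    return len(set(ranked[:k]) & relevant) / len(relevant)

def hit_at_k(ranked, relevant, k):
    return int(bool(set(ranked[:k]) & relevant))

ranked, relevant = ["a", "b", "c", "d"], {"b", "d"}
print(ndcg_at_k(ranked, relevant, 3),    # ~0.39
      recall_at_k(ranked, relevant, 3),  # 0.5
      hit_at_k(ranked, relevant, 3))     # 1
```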

Diversity

Diversity in recommendation systems refers to the variety of items presented to users, aiming to avoid redundancy. (A short implementation sketch follows this list.)

  • $\alpha$-NDCG [49]

    • Conceptual Definition: $\alpha$-NDCG extends NDCG to evaluate diversity by penalizing redundancy among retrieved items. It considers subtopics or item categories and applies a penalty based on how many times a subtopic has already been covered by higher-ranked items.
    • Mathematical Formula:
$$\alpha\text{-NDCG}@k = \frac{1}{N} \sum_{i=1}^{N} \frac{\alpha\text{-DCG}_i@k}{\alpha\text{-IDCG}_i@k}, \qquad \alpha\text{-DCG}_i@k = \sum_{j=1}^{k} \sum_{l=1}^{m} \frac{t_{j,l}(1-\alpha)^{c_{j,l}}}{\log_2(j+1)},$$
    • Symbol Explanation:
      • $N$: The number of test samples.
      • $k$: The cut-off rank.
      • $\alpha\text{-DCG}_i@k$: The $\alpha$-Discounted Cumulative Gain for the $i$-th sample.
      • $\alpha\text{-IDCG}_i@k$: The ideal $\alpha$-Discounted Cumulative Gain for the $i$-th sample.
      • $m$: The total number of subtopics or categories.
      • $t_{j,l}$: A binary indicator; $t_{j,l} = 1$ if the $j$-th item in the list covers subtopic $l$, and $t_{j,l} = 0$ otherwise.
      • $\alpha$: A parameter between 0 and 1 that controls the penalty for redundancy. A higher $\alpha$ means a stronger penalty.
      • $c_{j,l}$: The count of how many times subtopic $l$ has been covered by items appearing prior to the $j$-th item in the ranked list.
      • $\log_2(j+1)$: The logarithmic discount factor for rank position.
  • ERR-IA (Expected Reciprocal Rank - Intent Aware) [50]

    • Conceptual Definition: ERR-IA extends the Expected Reciprocal Rank (ERR) metric to incorporate user intents, making it suitable for evaluating diversity in situations where users might have multiple underlying information needs. It models user satisfaction as a function of item relevance and alignment with diverse intents.
    • Mathematical Formula:
$$\mathrm{ERR\text{-}IA} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{k} \frac{1}{j} \sum_{l=1}^{m} \frac{1}{m} \cdot \frac{t_{il}}{2^{c_{j,l}} + 1}.$$
    • Symbol Explanation:
      • $N$: The number of test samples.
      • $k$: The cut-off rank.
      • $j$: The rank of an item in the list.
      • $l$: A specific user intent or subtopic.
      • $m$: The total number of intents/subtopics considered.
      • $t_{il}$: A binary relevance indicator. The paper's formula uses $t_{il}$ without explicit definition; typically the indicator denotes the relevance of the item at rank $j$ to intent $l$ (often written $R_{j,l}$ in standard ERR-IA), so $t_{il}$ is likely a slight typo for that quantity.
      • $c_{j,l}$: The count of how many items before rank $j$ have satisfied intent $l$.
      • $\frac{1}{j}$: The reciprocal rank component.
      • $\frac{1}{m}$: Averaging over intents.
      • $\frac{t_{il}}{2^{c_{j,l}}+1}$: A component that models the probability of a user continuing to examine results given previous satisfaction.
  • Coverage

    • Conceptual Definition: Coverage evaluates the proportion of unique items from the entire item set that are included in the recommendations. High coverage means the system recommends a broad range of items, indicating a less biased or narrow selection.
    • Mathematical Formula:
$$\mathrm{Coverage}@k = \frac{\left|\bigcup_{i=1}^{N} \widehat{L}_i^k\right|}{|\varDelta|},$$
    • Symbol Explanation:
      • $N$: The number of test samples (e.g., users).
      • $k$: The cut-off rank.
      • $\bigcup_{i=1}^{N} \widehat{L}_i^k$: The union of all unique items recommended across all users up to rank $k$.
      • $|\varDelta|$: The total number of unique items available in the entire item set. (The paper writes $\varDelta$ in the formula but refers to it as $\mathcal{T}$ in the surrounding text; both denote the set of all items.)
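A short sketch (ours) of the two diversity ideas above: the $(1-\alpha)^{c}$ redundancy penalty in $\alpha$-DCG, and catalog Coverage:

```python
import math

def alpha_dcg_at_k(subtopics, k, alpha=0.5):
    """subtopics[j] = set of subtopics covered by the item at rank j + 1.
    Each repeat of a subtopic is discounted by (1 - alpha)^(#prior covers)."""
    seen, gain = {}, 0.0
    for j, topics in enumerate(subtopics[:k], start=1):
        for t in topics:
            gain += (1 - alpha) ** seen.get(t, 0) / math.log2(j + 1)
            seen[t] = seen.get(t, 0) + 1
    return gain

def coverage_at_k(rec_lists, catalog_size, k):
    """Fraction of the catalog that appears in anyone's top-k list."""
    unique = set().union(*(set(lst[:k]) for lst in rec_lists))
    return len(unique) / catalog_size

ranked_topics = [{"sports"}, {"sports"}, {"music"}]  # ranks 1..3
print(alpha_dcg_at_k(ranked_topics, k=3))            # repeated 'sports' is penalized
print(coverage_at_k([["a", "b"], ["b", "c"]], catalog_size=10, k=2))  # 0.3
```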

Fairness

Fairness in recommender systems aims to ensure equitable treatment for all users or items, regardless of their group affiliations.

  • Demographic Parity (DP)

    • Conceptual Definition: Demographic Parity ensures that different demographic groups (e.g., based on gender or race) receive similar rates of recommendations for a specific outcome, regardless of their underlying characteristics or past interactions. It measures whether a positive outcome (e.g., being recommended) is equally likely across groups.
    • Mathematical Formula:
$$\mathrm{DP} = \left| \frac{\sum_{i=1}^{|S_0|} \hat{y}_i}{|S_0|} - \frac{\sum_{i=1}^{|S_1|} \hat{y}_i}{|S_1|} \right|,$$
    • Symbol Explanation:
      • $S_0$: The set of individuals belonging to one demographic group (e.g., female).
      • $S_1$: The set of individuals belonging to another demographic group (e.g., male).
      • $|S_0|$, $|S_1|$: The number of individuals in group 0 and group 1, respectively.
      • $\hat{y}_i$: The predicted score or outcome (e.g., likelihood of being recommended, where 1 means recommended) for individual $i$; $\sum \hat{y}_i$ over a group represents the total positive outcomes for that group.
      • The metric is the absolute difference in the average predicted outcome between the two groups. A value closer to 0 indicates higher Demographic Parity.
  • Equal Opportunity (EO)

    • Conceptual Definition: Equal Opportunity focuses on ensuring that groups who truly deserve a positive outcome (i.e., have the same ground-truth label) are equally likely to receive that outcome. For instance, if two groups have the same click rate, they should have similar recommendation rates.
    • Mathematical Formula:
$$\mathrm{EO} = \sum_{y \in \{0,1\}} \left| \frac{\sum_{i=1}^{|S_0^y|} \hat{y}_i}{|S_0^y|} - \frac{\sum_{i=1}^{|S_1^y|} \hat{y}_i}{|S_1^y|} \right|,$$
    • Symbol Explanation:
      • $y \in \{0, 1\}$ (or other relevant labels): The ground-truth label or true outcome (e.g., $y=1$ for relevant, $y=0$ for irrelevant); the summation often considers only positive outcomes ($y=1$).
      • $S_0^y$: The subset of group 0 individuals who have ground-truth label $y$.
      • $S_1^y$: The subset of group 1 individuals who have ground-truth label $y$.
      • $|S_0^y|$, $|S_1^y|$: The number of individuals in these subsets.
      • $\hat{y}_i$: The predicted score or outcome for individual $i$.
      • The metric typically focuses on the absolute difference in true positive rates (when $y=1$) between groups. A value closer to 0 indicates higher Equal Opportunity.
  • Iso-Index

    • Conceptual Definition: Iso-Index assesses the isolation or segregation of certain groups within the retrieved or recommended results. A lower Iso-Index suggests less isolation, meaning a more equitable distribution of information across different groups. It combines diversity and fairness aspects.
    • Mathematical Formula: \mathrm{ISO\text{-}index} = \lambda \cdot \mathrm{Diversity} + (1 - \lambda) \cdot \mathrm{Fairness},
    • Symbol Explanation:
      • Diversity: A chosen diversity metric (e.g., α-NDCG, Coverage).
      • Fairness: A chosen fairness metric (e.g., Demographic Parity, Equal Opportunity).
      • λ: A hyper-parameter (between 0 and 1) that weights the importance of diversity versus fairness in the combined Iso-Index. A combined sketch of these fairness metrics follows.
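
To make the three fairness-related quantities above concrete, here is a minimal numpy sketch, assuming predictions, ground-truth labels, and binary group membership arrive as flat arrays (and that every group/label combination is non-empty); all names are illustrative:

```python
import numpy as np

def demographic_parity(y_hat, group):
    """DP: |mean prediction in group 0 - mean prediction in group 1|."""
    y_hat, group = np.asarray(y_hat, float), np.asarray(group)
    return abs(y_hat[group == 0].mean() - y_hat[group == 1].mean())

def equal_opportunity(y_hat, y_true, group):
    """EO: sum over labels y of the absolute gap in mean predictions
    between the groups, restricted to samples with label y (the y = 1
    term is the true-positive-rate gap)."""
    y_hat = np.asarray(y_hat, float)
    y_true, group = np.asarray(y_true), np.asarray(group)
    gap = 0.0
    for y in (0, 1):
        g0 = y_hat[(group == 0) & (y_true == y)]
        g1 = y_hat[(group == 1) & (y_true == y)]
        gap += abs(g0.mean() - g1.mean())
    return gap

def iso_index(diversity_score, fairness_score, lam=0.5):
    """Iso-Index: a lambda-weighted blend of chosen diversity and
    fairness scores."""
    return lam * diversity_score + (1 - lam) * fairness_score

y_hat  = [1, 0, 1, 1, 0, 1]
y_true = [1, 0, 1, 0, 1, 1]
group  = [0, 0, 0, 1, 1, 1]
print(demographic_parity(y_hat, group), equal_opportunity(y_hat, y_true, group))
```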
Novelty

Novelty in recommender systems measures how unfamiliar or "new" the recommended items are to the user, promoting exploration beyond known preferences.

  • Novelty [110]

    • Conceptual Definition: Novelty quantifies the extent to which recommended items differ from what a user has previously encountered or from popular items. It is usually tied to item popularity; formulations differ, but the common intent is that less popular items are considered more novel.
    • Mathematical Formula: \mathrm{Novelty}@k = \sum_{i=1}^{N} \sum_{j=1}^{|\widehat{L}_i^k|} \frac{\log(\mathrm{Pop}(\hat{l}_{i,j}) + 1)}{|\widehat{L}_i^k|},
    • Symbol Explanation:
      • N: The number of test samples (e.g., users).
      • k: The cut-off rank for recommendations.
      • |\widehat{L}_i^k|: The number of items in the top k recommended list for user i.
      • \hat{l}_{i,j}: The j-th item in the ranked list \widehat{L}_i^k for user i.
      • \mathrm{Pop}(\hat{l}_{i,j}): The popularity of item \hat{l}_{i,j}, often measured by its frequency of appearance in the dataset or overall interactions. Note that, as written, the formula sums log-popularity, so a lower score corresponds to less popular and hence more novel recommendations; some papers negate or invert this term so that higher scores mean more novelty.
      • \log(\mathrm{Pop}(\hat{l}_{i,j}) + 1): A logarithmic transformation applied to popularity to reduce the impact of extremely popular items and to provide a more stable score. The +1 avoids \log(0) for items with zero popularity. A minimal sketch follows.
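
A minimal sketch of this computation, assuming item popularity is given as a dict from item id to interaction count; names are illustrative. It follows the formula exactly as written above, so smaller values indicate more novel lists:

```python
import math

def novelty_at_k(recommended_lists, popularity, k):
    """Novelty@k per the formula above: for each user, average
    log(Pop + 1) over the top-k items, then sum over users."""
    total = 0.0
    for ranked in recommended_lists:
        top_k = ranked[:k]  # assumes each list is non-empty
        total += sum(math.log(popularity.get(i, 0) + 1) for i in top_k) / len(top_k)
    return total

# Toy usage: item 1 is very popular, items 7 and 8 are long-tail.
pop = {1: 1000, 2: 500, 7: 3, 8: 1}
print(novelty_at_k([[1, 2], [7, 8]], pop, k=2))  # the popular list dominates
```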
5.2.2. Multi-Objective Optimization Metrics

These metrics are used when task requirements involve multiple, potentially conflicting objectives, and the control function aims to generate a Pareto front or an approximation of one.

  • Hypervolume (HV) [111]

    • Conceptual Definition: Hypervolume (also known as the Lebesgue measure) quantifies the volume of the objective space that is dominated by a set of solutions, bounded by a predefined reference point. It simultaneously assesses both the convergence (how close the solutions are to the true Pareto front) and the diversity (how well the solutions cover the Pareto front) of the solution set. A larger HV value indicates a better-performing solution set.
    • Mathematical Formula: \mathrm{HV} = \lambda\left( \bigcup_{i=1}^{|S|} \prod_{j=1}^{m} \left[ f_j^{(i)}, z_j \right] \right),
    • Symbol Explanation:
      • S: The set of Pareto optimal solutions obtained by the algorithm.
      • |S|: The number of solutions in the set S.
      • f_j^{(i)}: The value of the j-th objective function for the i-th solution in the set S.
      • z_j: The reference point in the j-th objective dimension. This point must be dominated by all solutions on the Pareto front (i.e., it represents the worst acceptable value for each objective).
      • \prod_{j=1}^{m} [f_j^{(i)}, z_j]: The hyper-rectangle spanned by the i-th solution and the reference point across the m objectives.
      • \bigcup_{i=1}^{|S|} (\dots): The union of all such hyper-rectangles, i.e., the total region dominated by the solution set S.
      • \lambda(\cdot): The Lebesgue measure, which gives the volume of the specified region in objective space.
  • R2 [112]

    • Conceptual Definition: R2 is a scalarization-based metric used to evaluate the quality of a solution set in multi-objective optimization. It measures the proximity of the obtained solution set to an ideal set by considering a set of weight vectors that represent user preferences. It does not require knowing the true Pareto front.
    • Mathematical Formula: \mathrm{R2} = \frac{1}{|W|} \sum_{w \in W} \min_{x \in P} \sum_{i=1}^{m} w_i f_i(x),
    • Symbol Explanation:
      • W: A set of weight vectors, where each w \in W represents a different preference over the objectives; w_i is the weight for the i-th objective.
      • |W|: The number of weight vectors in the set W.
      • P: The set of solutions (e.g., Pareto optimal solutions) being evaluated.
      • f_i(x): The value of the i-th objective function for solution x.
      • \sum_{i=1}^{m} w_i f_i(x): The scalarized value of solution x under weight vector w, i.e., a weighted sum of its objective values.
      • \min_{x \in P} (\dots): The minimum scalarized value among all solutions in P for a given weight vector w. The metric averages these minima across all weight vectors in W; a lower R2 value generally indicates better performance (closer to the ideal).
  • Generational Distance (GD) [113]

    • Conceptual Definition: Generational Distance quantifies the convergence of an obtained solution set P to the true Pareto front P^*. It calculates the average Euclidean distance from each solution in P to its nearest point on P^*. A lower GD indicates that the obtained solutions are closer to the true Pareto front.
    • Mathematical Formula: \mathrm{GD} = \left( \frac{1}{|P|} \sum_{i=1}^{|P|} d_i^p \right)^{\frac{1}{p}},
    • Symbol Explanation:
      • P: The set of solutions obtained by the algorithm.
      • |P|: The number of solutions in P.
      • d_i: The Euclidean distance from the i-th solution in P to the nearest point on the true Pareto front P^*.
      • p: A norm parameter, typically set to 2 (standard Euclidean distance).
  • Inverted Generational Distance (IGD) [114]

    • Conceptual Definition: Inverted Generational Distance evaluates both convergence and diversity by calculating the average Euclidean distance from each point on the true Pareto front P^* to its nearest solution in the obtained set P. It measures how well the obtained set P covers the true Pareto front. A lower IGD indicates better performance.
    • Mathematical Formula: \mathrm{IGD} = \frac{1}{|P^*|} \sum_{j=1}^{|P^*|} d_j,
    • Symbol Explanation:
      • P^*: The true Pareto front.
      • |P^*|: The number of reference points on P^*.
      • d_j: The Euclidean distance from the j-th point on P^* to the nearest solution in the obtained set P.

A combined sketch of these four metrics follows.
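Here is a minimal numpy sketch of HV, R2, GD, and IGD for a two-objective minimization problem. The exact 2D hypervolume routine, the toy fronts, and all names are illustrative assumptions, not code from the surveyed works:

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Exact hypervolume for two minimized objectives: the area dominated
    by `front` and bounded by the reference point `ref`."""
    pts = sorted(tuple(p) for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:           # sorted by f1; skip dominated points
        if f2 < prev_f2:
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area

def r2_indicator(P, W):
    """R2: average over weight vectors of the best weighted-sum value."""
    return float(np.mean([min(w @ x for x in P) for w in W]))

def gd(P, P_star, p=2):
    """GD: p-mean of each obtained solution's distance to its nearest
    true-front point."""
    d = [min(np.linalg.norm(x - s) for s in P_star) for x in P]
    return float(np.mean(np.power(d, p)) ** (1 / p))

def igd(P, P_star):
    """IGD: mean distance from each true-front point to its nearest
    obtained solution."""
    return float(np.mean([min(np.linalg.norm(s - x) for x in P) for s in P_star]))

# Toy example: an obtained front P vs. a hypothetical true front P*.
P = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
P_star = np.array([[0.8, 2.8], [1.8, 1.8], [2.8, 0.8]])
W = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])
print(hypervolume_2d(P, ref=(4.0, 4.0)))                  # larger is better
print(r2_indicator(P, W), gd(P, P_star), igd(P, P_star))  # smaller is better
```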
5.3. Baselines

As a survey paper, the document does not propose a single new model to compare against baselines. Instead, it discusses various controllable learning methods from the literature, so the concept of baselines in the traditional experimental sense (where a new method is compared against established state-of-the-art models) does not directly apply. Each surveyed paper has its own set of baselines relevant to the specific controllable task it addresses.

The paper implicitly highlights the limitations of traditional non-controllable or less controllable methods, which serve as conceptual baselines for the controllable learning paradigm. These include:

  • Traditional recommender systems that are fixed after training and require full retraining for any change in objectives or user preferences.

  • Domain adaptation or transfer learning methods that involve retraining or significant adaptation, unlike the test-time, no-retraining requirement of CL.

  • Generative models that are prompt-based but do not allow for in-processing parameter adjustments.

The purpose of this survey is to categorize and analyze existing controllable learning approaches themselves, rather than to benchmark a new one against baselines.

6. Results & Analysis

As a survey paper, this document does not present novel experimental results but rather synthesizes and categorizes existing research. The "results" of this paper are its comprehensive taxonomy and the structured overview of various controllable learning (CL) methods in information retrieval (IR).

6.1. Core Results Analysis

The core contribution of this paper is a systematic framework for understanding CL. Through its multi-dimensional taxonomy, the survey demonstrates that controllability is not a monolithic concept but can be achieved in various ways, for different purposes, by different actors, and at different stages of the ML pipeline. The analysis reveals that:

  • Diverse Controllable Objectives: CL addresses a wide array of needs, from multi-objective balancing (e.g., accuracy vs. diversity) to privacy protection (via user portrait control) and environmental robustness (via scenario adaptation). This underscores the versatility and importance of CL in real-world applications.

  • Dual Control Perspectives: Both users and platforms seek controllability, but their motivations differ. User-centric control often emphasizes personalization, privacy, and exploration, while platform-mediated control focuses on system-wide objectives like diversity, efficiency, and adaptability across tasks and scenarios.

  • Evolving Technical Landscape: CL is implemented using a growing suite of techniques, from foundational rule-based methods and Pareto optimization to cutting-edge hypernetworks, disentangled representations, reinforcement learning, and Large Language Models (LLMs). Hypernetworks emerge as a particularly powerful tool for in-processing control due to their ability to dynamically generate model parameters (see the sketch after this list).

  • Pipeline Integration: Controllability can be injected at various stages: pre-processing (modifying inputs/prompts), in-processing (adjusting model parameters/hidden states), and post-processing (re-ranking or filtering outputs). In-processing methods, especially those leveraging hypernetworks, appear to offer more fundamental adaptation without full retraining.
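To illustrate why hypernetworks suit in-processing control, here is a minimal PyTorch sketch, not drawn from any surveyed method: a hypernetwork maps a task-description vector (e.g., objective weights) to the parameters of a small scoring head, so shifting the task target at test time changes only the hypernetwork's input, never the trained weights. All class and argument names are hypothetical:

```python
import torch
import torch.nn as nn

class HyperScorer(nn.Module):
    """Hypothetical in-processing controller: the hypernetwork generates
    the (weights, bias) of a linear scoring head from a task description."""
    def __init__(self, task_dim, item_dim, hidden=32):
        super().__init__()
        self.hyper = nn.Sequential(
            nn.Linear(task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, item_dim + 1),  # item_dim weights + 1 bias
        )

    def forward(self, task_desc, item_emb):
        params = self.hyper(task_desc)        # shape: [item_dim + 1]
        w, b = params[:-1], params[-1]
        return item_emb @ w + b               # one score per candidate item

scorer = HyperScorer(task_dim=2, item_dim=8)
items = torch.randn(5, 8)                     # 5 candidate item embeddings
for weights in ([0.9, 0.1], [0.1, 0.9]):      # e.g., accuracy vs. diversity
    scores = scorer(torch.tensor(weights), items)
    print(weights, scores.detach().numpy().round(2))
```

The key design point is that adaptation to a new objective mix requires only a forward pass with a new task vector, matching the no-retraining requirement of CL.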
The table presented in the paper, Table 1, succinctly summarizes the surveyed methods according to this taxonomy, providing a quick reference for researchers. It highlights the recency of much of this research, with many papers from 2023 and 2024, indicating a rapidly developing field. The following are the results from Table 1 of the original paper (the first two columns give method information, the remaining four the paradigm of controllable learning):

| Method | Year | What | Who | CL Tech. | Where |
| --- | --- | --- | --- | --- | --- |
| MocDT [103] | 2025 | multi-objective control | user-centric control | RL | in-processing |
| PadiRec [4] | 2024 | multi-objective control | platform-mediated control | hypernetwork | in-processing |
| FollowIR [45] | 2024 | user portrait control | user-centric control | SFT | pre-processing |
| InstructIR [46] | 2024 | user portrait control | user-centric control | SFT | pre-processing |
| RecLM-gen [5] | 2024 | multi-objective control | platform-mediated control | SFT, RL | in-processing |
| IFRQE [7] | 2024 | user portrait control | user-centric control | others | pre-processing |
| TEARS [9] | 2024 | user portrait control | user-centric control | RL, NLIE | in-processing |
| CMBR [10] | 2024 | user portrait control | user-centric control | SFT, RL | in-processing |
| LangPTune [17] | 2024 | user portrait control | user-centric control | RL, NLIE | in-processing |
| CCDF [21] | 2024 | multi-objective control | platform-mediated control | others | in-processing |
| CMR [3] | 2023 | multi-objective control | platform-mediated control | hypernetwork | in-processing |
| LACE [6] | 2023 | user portrait control | user-centric control | NLIE | pre-processing |
| UCR [11] | 2023 | user portrait control | user-centric control | others | pre-processing |
| Hamur [13] | 2023 | scenario adaptation control | platform-mediated control | hypernetwork, TTA | in-processing |
| HyperBandit [12] | 2023 | scenario adaptation control | platform-mediated control | hypernetwork, TTA | in-processing |
| PEPNet [14] | 2023 | scenario adaptation control | user-centric control | hypernetwork | in-processing |
| SAMD [23] | 2023 | scenario adaptation control | platform-mediated control | hypernetwork | in-processing |
| DTRN [24] | 2023 | scenario adaptation control | user-centric control | hypernetwork | in-processing |
| MoFIR [96] | 2022 | multi-objective control | user-centric control | pareto optimization | in-processing |
| UCRS [2] | 2022 | multi-objective control | user-centric control | others | pre-processing |
| PAPERec [95] | 2021 | multi-objective control | user-centric control | pareto optimization | in-processing |
| Supervised β-VAE [15] | 2021 | user portrait control | user-centric control | Disentanglement | in-processing |
| ComiRec [1] | 2020 | multi-objective control | platform-mediated control | others | post-processing |
| LP [16] | 2020 | user portrait control | user-centric control | Disentanglement | in-processing |
| MMR [47] | 1998 | multi-objective control | platform-mediated control | rule-based | post-processing |

6.2. Ablation Studies / Parameter Analysis

As a survey paper, the authors do not perform their own ablation studies or parameter analyses. Instead, they summarize findings from individual research papers that often include such analyses. For instance, the paper mentions:

  • ComiRec [1] introduces a controllable hyperparameter λ to adjust diversity, noting that empirical evidence suggests that while diversity improves, accuracy can be compromised to some degree. This implies a trade-off that would be explored through parameter analysis.

  • UCRS [2] provides control coefficients α and β to regulate accuracy, isolation, and diversity. Adjustments to these coefficients allow for dynamic control without retraining.

These examples demonstrate that controllability often involves hyperparameters that trade off different objectives, and researchers typically perform analyses to understand the impact of these parameters on various metrics; a λ-controlled re-ranking sketch follows this section. The current survey, however, focuses on categorizing these approaches rather than re-evaluating them.
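As a concrete illustration of such a controllable trade-off hyperparameter, here is a minimal sketch of MMR-style post-processing re-ranking (in the spirit of MMR [47] from Table 1, not ComiRec's exact controller); the function and argument names are illustrative:

```python
def rerank_with_lambda(relevance, sim, k, lam):
    """Greedy MMR-style re-ranking: at each step pick the candidate
    maximizing lam * relevance - (1 - lam) * max-similarity to items
    already selected. lam = 1 is pure accuracy; lowering lam trades
    accuracy for diversity at test time, with no retraining."""
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; item 2 is less relevant but distinct.
rel = [0.9, 0.85, 0.3]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
print(rerank_with_lambda(rel, sim, k=2, lam=1.0))  # [0, 1]: pure relevance
print(rerank_with_lambda(rel, sim, k=2, lam=0.5))  # [0, 2]: diversity wins
```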
7. Conclusion & Reflections

7.1. Conclusion Summary

This survey provides a crucial formal definition of controllable learning (CL) and a comprehensive, multi-dimensional taxonomy for its application in information retrieval (IR). It clarifies CL's role as a vital component of trustworthy machine learning, enabling models to adapt to dynamic task requirements at test time without retraining. The paper categorizes CL by what is controllable (multi-objectives, user portraits, scenarios), who controls (users or platforms), how control is implemented (rule-based methods, Pareto optimization, hypernetworks, LLMs, disentanglement, RL), and where control occurs in the inference pipeline (pre-processing, in-processing, post-processing). It highlights the increasing demand for CL in the Model-as-a-Service (MaaS) paradigm and outlines significant challenges and promising future directions, particularly in theoretical analysis, computational efficiency, integration with LLMs, and dedicated evaluation frameworks.

7.2. Limitations & Future Work

The authors identify several key limitations and suggest future research directions:

  • Balancing Difficulty in Training: A pivotal challenge is the trade-off between controllability and other performance metrics (e.g., accuracy). Achieving controllability often compromises other user-centric optimization metrics.

  • Absence of Evaluation Standards: There is a significant lack of standardized benchmarks and specific evaluation metrics for controllable learning. Existing works use disparate evaluation approaches, hindering direct comparisons and field progression.

  • Setting Task Descriptions: A crucial issue is how to effectively set the task target and transform it into a human-understandable and precise task description. These descriptions are not limited to vectors or text but could also be images, graphs, or rules.

  • Challenges in Online Environments: Integrating controllable learning principles into streaming IR applications (e.g., online learning, reinforcement learning) is difficult. Current methods are often not equipped to handle swift changes in preferences without costly retraining in real-time settings.

  • Theoretical Analyses of Controllable Learning: There is a need for rigorous theoretical analysis to understand how task targets map to model parameters in vast deep learning models, and to uncover structural information and causal associations.

  • Controllable Sequential Decision-Making Models: In streaming applications with bandit feedback, balancing exploration and exploitation while achieving adaptive control over task requirements is a critical theoretical and practical challenge.

  • Empowering LLM-based AIGC through Controllable Learning: While LLMs are used for controllable generation via prompts, deeper exploration of CL techniques to manipulate LLM model parameters or outputs for specific task targets (e.g., multi-objective preferences) is needed.

  • Cost-Effective Control Mechanisms: Controllable learners introduce additional computational costs. Research into efficient and cost-effective control mechanisms is imperative, especially for large-scale models.

  • Controllable Learning for Multi-Task Switching: Most existing CL research focuses on recommender systems. Extending CL to search and enabling adaptive switching between diverse tasks, objectives, and scenarios with a small set of controllable models is a key future direction.

  • Demand for Resources and Metrics: The field lacks dedicated datasets and standardized evaluation metrics for controllable learning. Collecting or constructing labels or user feedback across multiple objectives or diverse task requirements is crucial.

7.3. Personal Insights & Critique

This survey is a highly valuable and timely contribution to the machine learning community. Its formal definition of controllable learning and the proposed multi-dimensional taxonomy provide a much-needed framework for a nascent but critical field.

One of the paper's strengths is its clear articulation of controllability as distinct from, yet complementary to, other trustworthy AI pillars like explainability and fairness. This distinction is often muddled in discussions, and the paper sets a solid foundation. The emphasis on test-time adaptation without retraining is particularly insightful, acknowledging the practical realities and computational constraints of deploying large AI models in dynamic MaaS environments. This operational definition makes CL a highly pragmatic concept.

The comprehensive categorization by what, who, how, and where is exceptionally helpful for organizing existing literature and identifying gaps. It allows researchers to pinpoint specific areas where innovation is most needed.
For instance, the observation that hypernetworks are becoming a dominant in-processing control technique is a significant finding that can guide future algorithmic development.

Potential Issues/Unverified Assumptions:

  • Generality of Definition 1: While Definition 1 is robust, the practical realization of mapping any task requirement triplet \mathcal{T} to an adapted learner f_{\mathcal{T}} without retraining can be profoundly complex. The implicit assumption is that the base learner f must be pre-trained in a way that allows for such flexibility, which is itself a challenging design problem.

  • Scalability of Control Functions: Many control functions (especially those involving hypernetworks or LLMs) can be computationally intensive themselves. The proposed future direction of "cost-effective control learning mechanisms" acknowledges this, but the actual computational overhead of h could become a bottleneck, especially for very large models or very rapid adaptation needs.

  • Subjectivity of Task Targets: Task targets s_{\mathrm{tgt}} can be highly subjective, especially for users (e.g., "more interesting recommendations"). Translating such vague human desires into precise task descriptions s_{\mathrm{desc}} (e.g., a numerical vector) that a control function can reliably interpret remains a significant challenge, even with LLMs. The paper touches on this under Setting Task Descriptions, but it remains a foundational hurdle.

  • True Zero-Shot Control: Ideal CL implies adapting to entirely novel task requirements \mathcal{T}' at test time without prior exposure during training. While hypernetworks and LLMs show promise, achieving truly robust zero-shot controllability for arbitrary new goals is an ambitious aim that may require more sophisticated meta-learning or compositional control functions.

    Transferability and Applications: The concepts and taxonomy presented are highly transferable. Controllable learning is not limited to information retrieval; it can be applied to almost any machine learning domain where dynamic adaptation and user/platform steering are desired. Examples include:

  • Healthcare: Controlling diagnostic models for specific patient subgroups or ethical considerations.

  • Finance: Adapting fraud detection models to evolving attack patterns or specific regulatory compliance requirements.

  • Autonomous Systems: Allowing human operators to adjust the behavior of autonomous vehicles or robots in unforeseen scenarios.

  • Scientific Discovery: Guiding AI models to explore specific regions of a chemical or material space based on emergent hypotheses.

    This paper serves as an excellent foundational text. Its primary value lies in formalizing and structuring a crucial area of trustworthy AI, providing a common language and a roadmap for future research. The identified challenges, particularly around evaluation and theoretical understanding, underscore that controllable learning is still in its early stages but holds immense potential for making AI systems more robust, adaptable, and aligned with human values.
