A Survey of Controllable Learning: Methods and Applications in Information Retrieval
TL;DR Summary
Controllable learning is essential for trustworthy machine learning, allowing dynamic adaptation to complex information needs. This survey defines controllable learning, explores its applications in information retrieval, identifies challenges, and suggests future research directions.
Abstract
Controllability has become a crucial aspect of trustworthy machine learning, enabling learners to meet predefined targets and adapt dynamically at test time without requiring retraining as the targets shift. We provide a formal definition of controllable learning (CL), and discuss its applications in information retrieval (IR) where information needs are often complex and dynamic. The survey categorizes CL according to what is controllable (e.g., multiple objectives, user portrait, scenario adaptation), who controls (users or platforms), how control is implemented (e.g., rule-based method, Pareto optimization, hypernetwork and others), and where to implement control (e.g., pre-processing, in-processing, post-processing methods). Then, we identify challenges faced by CL across training, evaluation, task setting, and deployment in online environments. Additionally, we outline promising directions for CL in theoretical analysis, efficient computation, empowering large language models, application scenarios and evaluation frameworks.
Mind Map
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
The central topic of the paper is A Survey of Controllable Learning: Methods and Applications in Information Retrieval.
1.2. Authors
The authors and their affiliations are:
- Chenglei Shen: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
- Xiao Zhang: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
- Teng Shi: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
- Changshuo Zhang: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
- Guofu Xie: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
- Jun Xu: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China.
- Ming He: AI Lab at Lenovo Research, Beijing, China.
- Jianping Fan: AI Lab at Lenovo Research, Beijing, China.
1.3. Journal/Conference
The paper is a preprint, published on arXiv. The abstract also mentions "Received month dd, yyyy; accepted month dd, yyyy E-mail: {chengleishen9,zhangx89}@ruc.edu.cn. Higher Education Press 2025," suggesting it might be intended for publication in a journal or conference in 2025. arXiv is a well-regarded platform for disseminating research preprints across various scientific disciplines, including computer science and artificial intelligence. Its influence lies in enabling rapid sharing of new research before formal peer review and publication.
1.4. Publication Year
The paper was published on arXiv on 2024-07-04.
1.5. Abstract
The paper provides a comprehensive survey of controllable learning (CL), which is defined as the ability of machine learning models to meet predefined targets and adapt dynamically at test time without requiring retraining as targets shift. It formally defines CL and discusses its applications, particularly in information retrieval (IR), where user needs are often complex and dynamic. The survey categorizes CL based on what is controllable (e.g., multiple objectives, user portrait, scenario adaptation), who controls (users or platforms), how control is implemented (e.g., rule-based methods, Pareto optimization, hypernetworks), and where control is applied (e.g., pre-processing, in-processing, post-processing). It identifies current challenges in CL across training, evaluation, task setting, and online deployment. Finally, it outlines future directions, including theoretical analysis, efficient computation, empowering large language models (LLMs), new application scenarios, and improved evaluation frameworks.
1.6. Original Source Link
The paper is available as a preprint at:
- Original Source Link:
https://arxiv.org/abs/2407.06083
- PDF Link: https://arxiv.org/pdf/2407.06083v3.pdf
It is currently a preprint and has not been formally published in a peer-reviewed journal or conference.
2. Executive Summary
2.1. Background & Motivation
What is the core problem the paper aims to solve?
The core problem the paper addresses is the lack of a unified definition and systematic understanding of controllable learning (CL) in discriminative machine learning models, especially within information retrieval (IR). While controllable generation (e.g., in text or image generation) is well-explored, the concept of models adapting dynamically to changing task requirements without retraining at test time is not clearly defined or broadly surveyed for discriminative tasks. The paper seeks to formalize controllable learning and provide a comprehensive taxonomy of its methods and applications.
Why is this problem important in the current field? What specific challenges or gaps exist in prior research?
This problem is crucial for the advancement of trustworthy machine learning. As machine learning models become more prevalent and powerful, particularly with the rise of Model-as-a-Service (MaaS) and large language models (LLMs), the ability to ensure that these models align with human intent and can be effectively interfered with or adjusted after deployment becomes paramount. The Bletchley Declaration and Global AI Governance Initiative highlight the global emphasis on safe, reliable, and controllable AI.
Specific challenges and gaps include:
- Lack of Unified Definition: No formal, universally accepted definition of controllable learning exists, especially for discriminative models. This hinders systematic research and comparison.
- Dynamic Information Needs: In information retrieval, user needs are inherently complex, evolving, and context-dependent. Traditional models often require costly retraining for every shift in user preference or platform objective, which is inefficient and impractical in real-time scenarios.
- MaaS Paradigm Shift: The transition to MaaS means users access trained models via APIs without controlling the training process. This necessitates robust control mechanisms to allow personalization and adaptation without full model retraining.
- Limitations of Existing Surveys: Previous surveys have only touched upon related concepts like user control or trustworthy IR, often from narrow perspectives (e.g., user-centric only, security aspects) and lacking in-depth technical analysis or coverage of newer techniques like hypernetworks or LLMs.
- Distinction from Explainability: While explainability helps users understand why a model makes a decision, controllability focuses on the ability to change that decision to meet specific requirements. This distinction needs clarification.
What is the paper's entry point or innovative idea?
The paper's innovative idea is to formally define controllable learning (CL) as a distinct and crucial category within trustworthy machine learning. It explicitly focuses on the ability to adapt to diverse task requirements without retraining at test time. The paper then proposes a novel, multi-dimensional taxonomy (what, who, how, where) to systematically classify and understand existing CL methods, particularly within information retrieval, thereby providing a structured lens for future research.
2.2. Main Contributions / Findings
What are the paper's primary contributions? (e.g., proposing a new model, theory, algorithm, dataset, etc.)
The paper's primary contributions are:
- Formal Definition of Controllable Learning (CL): It provides a clear and concise formal definition of CL in terms of task requirements (description, context, target) and the role of a control function that adapts a learner without retraining.
- Comprehensive Taxonomy for CL in IR: It introduces a novel, multi-faceted taxonomy for CL specific to information retrieval applications, categorizing methods by:
  - What is Controllable: Multi-Objective Control, User Portrait Control, Scenario Adaptation Control.
  - Who Controls: User-Centric Control, Platform-Mediated Control.
  - How Control is Implemented: Rule-Based Techniques, Pareto Optimization, Hypernetworks, and Other Methods (Disentanglement, Reinforcement Learning, LLMs, Test-Time Adaptation).
  - Where to Control: Pre-processing, In-processing, Post-processing methods.
- Identification of Challenges: It systematically identifies key challenges faced by CL across training, evaluation, task setting, and online environments.
- Outline of Future Directions: It suggests promising avenues for future research in CL, including theoretical analysis, efficient computation, empowering LLMs, multi-task switching scenarios, and resource/metric development.
- Comprehensive Survey of Existing Works: It provides a structured review of numerous existing works under its proposed taxonomy, creating a valuable resource for researchers.
What key conclusions or findings did the paper reach? What specific problems do these findings solve?
The key conclusions and findings include:
- CL is distinct from and complements other trustworthy ML aspects like fairness, privacy, and interpretability, serving as a crucial mechanism for aligning model behavior with dynamic objectives.
- The MaaS paradigm significantly heightens the demand for CL, especially in IR, where models must provide fine-grained, context-aware responses without costly retraining.
- Current CL implementations utilize a diverse set of techniques (from simple rule-based methods to advanced hypernetworks and LLMs) applied at various stages of the model inference pipeline.
- A significant gap exists in the evaluation of CL, with a lack of standardized benchmarks and dedicated datasets, hindering progress and comparability.
- CL presents a fundamental trade-off between controllability and performance/efficiency during training.
- The field is ripe for theoretical advancements to understand the causal links between task targets and model parameters in vast deep learning models.
These findings address the fragmented understanding of controllable learning. By providing a formal definition and a structured taxonomy, the paper offers a unified framework for researchers to categorize existing work, identify gaps, and guide future innovations. It highlights the importance of CL in adapting AI models to real-world, dynamic scenarios, which is essential for trustworthy AI and efficient deployment in the MaaS era.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand this survey, a beginner should be familiar with the following foundational concepts:
- Machine Learning (ML): A field of artificial intelligence that enables systems to learn from data, identify patterns, and make decisions with minimal human intervention. Instead of being explicitly programmed, ML models learn to perform tasks by processing large datasets.
- Artificial Intelligence (AI): A broader field encompassing machine learning, focused on creating machines that can perform tasks that typically require human intelligence, such as problem-solving, understanding language, and recognizing patterns.
- Information Retrieval (IR): The science of searching for information within documents, searching for documents themselves, searching for metadata about documents, or searching within databases. Common examples include web search engines, recommender systems, and library catalog search.
- Recommender Systems: A type of information retrieval system that filters information to predict the "rating" or "preference" a user would give to an item. They aim to provide personalized suggestions (e.g., movies, products, news) to users based on their past behavior or preferences.
- Trustworthy Machine Learning: An umbrella term referring to the critical characteristics that AI systems must possess to be reliable and deployable in real-world scenarios. These characteristics include fairness (no bias against groups), privacy (protection of sensitive data), interpretability (understanding model decisions), and, as highlighted by this paper, controllability.
- Controllable Generation: A subfield of generative AI where the output of a model (e.g., text, images, audio) can be guided or constrained by specific user-provided inputs, often called prompts. For instance, generating an image of a cat "in the style of Van Gogh", where "in the style of Van Gogh" is the control.
- Discriminative Machine Learning Models: Models that learn to distinguish between different categories or predict a numerical value based on input features. Unlike generative models that create new data, discriminative models focus on classification or regression tasks (e.g., identifying spam email, predicting house prices, ranking search results).
- Model-as-a-Service (MaaS): A deployment paradigm where trained machine learning models are made available to users or applications via APIs (Application Programming Interfaces). Users can leverage the model's capabilities without needing to manage the underlying infrastructure, training, or updating process. This contrasts with Software-as-a-Service (SaaS), which offers complete software applications.
- Hypernetwork: A neural network that generates the weights (parameters), or a subset of the weights, for another neural network (the "main network"). This allows the main network's behavior to be dynamically adjusted based on an input to the hypernetwork, enabling adaptability without retraining the main network.
- Large Language Models (LLMs): Advanced deep learning models, often based on the transformer architecture, trained on vast amounts of text data. They are capable of understanding, generating, and processing human language for a wide range of tasks, including translation, summarization, and question-answering. Examples include ChatGPT and GPT-4.
- Multi-objective Optimization (MOO): A field of mathematical optimization where multiple objective functions are to be optimized simultaneously. These objectives often conflict, meaning improving one objective might worsen another. MOO seeks Pareto optimal solutions, where no objective can be improved without degrading at least one other objective.
- Pareto Optimality: In multi-objective optimization, a solution is Pareto optimal if it is not possible to improve any objective without simultaneously worsening at least one other objective. The set of all Pareto optimal solutions forms the Pareto front.
- Test-Time Adaptation (TTA): Algorithms that allow a pre-trained machine learning model to adapt to new, unlabeled data during the testing or inference phase, without requiring a full retraining process. This is particularly useful in dynamic environments where data distributions can shift.
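Since the MOO and Pareto optimality concepts above recur throughout the survey, a tiny illustrative Python check may help. It is only a sketch: the helper names are our own, and the objectives are assumed to be maximized.

```python
from typing import List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b if a is no worse on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the non-dominated solutions (objectives assumed to be maximized)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions if o != s)]

# Each tuple is (accuracy, diversity) for one candidate ranking policy.
candidates = [(0.80, 0.30), (0.75, 0.50), (0.60, 0.70), (0.70, 0.40), (0.55, 0.65)]
print(pareto_front(candidates))   # (0.70, 0.40) and (0.55, 0.65) are dominated and dropped
```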
3.2. Previous Works
The paper differentiates itself from and builds upon several related research areas. Here's a summary of key prior studies mentioned and their context:
- User Control in Recommender Systems [78]: This 2017 survey focused on allowing users to control recommendations, particularly when system assumptions about their preferences were incorrect.
  - Context: It highlights the instantaneous nature of user interests and the need for user-driven intervention.
  - Limitation (as identified by the current paper): It was exclusively user-centric, neglecting platform-side control. It also lacked rigorous definitions and in-depth technical analysis, mainly summarizing interaction forms.
- Trustworthy Information Retrieval [79]: This survey primarily dealt with security aspects like privacy preservation in recommender systems.
  - Context: It did touch upon controllability, categorizing it into explicit controllability (users directly edit preferences) and implicit controllability (users indirectly fine-tune preferences through interactions like re-ranking).
  - Limitation (as identified by the current paper): It lacked a systematic summary of controllability in machine learning and the methods to achieve it. It also missed broader controllability types such as platform-side or multi-objective control.
- Explainable Information Retrieval [80, 81]: These surveys focused on making IR or recommender models transparent and interpretable, providing reasons for retrieved or recommended items.
  - Context: Explainability can facilitate human understanding and thus promote controllability.
  - Differentiation (as identified by the current paper): Explainability (understanding why) is distinct from controllability (the ability to change decisions). While related, they address different core challenges; explainability is a preliminary step towards controllability, but not the same.
- Controllable Generation (e.g., text generation [62-64] or visual generation [65-67]): This field involves AI models generating content aligned with a given task description or prompt.
  - Context: These methods often use prompt-based approaches, seen as a pre-processing form of control where inputs are changed for a fixed model.
  - Differentiation (as identified by the current paper): The current paper extends controllable learning beyond generative models to discriminative models, noting that controlling model parameters (in-processing) may yield better convergence than controlling model inputs alone.
3.3. Technological Evolution
The field of machine learning has evolved significantly, leading to the current emphasis on controllable learning:
- Early IR Systems: Initially, information retrieval systems were often rule-based or relied on simpler statistical models (e.g., Boolean search, TF-IDF). Control was largely manual or hard-coded.
- Rise of Learning-to-Rank and Recommender Systems: The advent of machine learning brought sophisticated ranking algorithms (learning-to-rank) and recommender systems. These models, however, often prioritized performance metrics like accuracy and relevance, sometimes at the expense of diversity, fairness, or user control.
- Emergence of Trustworthy AI: As ML models moved into critical applications, concerns about fairness, privacy, interpretability, and robustness grew, leading to the trustworthy AI movement. Controllability is identified as a key component of this.
- Deep Learning Revolution: The widespread adoption of deep neural networks and transformer models dramatically increased model complexity and performance, but also made models more opaque and harder to control.
- Model-as-a-Service (MaaS) Paradigm: The commercialization of large-scale ML models (especially LLMs) through MaaS platforms created a new imperative for controllability. Users of these services need to customize model behavior without access to the full training pipeline.
- Advanced Control Techniques: New techniques such as hypernetworks (dynamically generating model parameters), Pareto optimization (balancing conflicting objectives), disentangled representations (isolating specific features for control), and the application of Reinforcement Learning (RL) and LLMs for instruction-following have emerged as ways to achieve more sophisticated controllability.
This paper fits into this timeline by formally defining controllable learning and surveying its methods, thereby structuring the field and guiding its development in the context of trustworthy AI, the MaaS paradigm, and the capabilities of modern deep learning and LLMs.
3.4. Differentiation Analysis
Compared to prior work, the core differences and innovations of this paper's approach are:
- Unified and Formal Definition: Unlike previous works that implicitly used "user control" or addressed controllability as a sub-aspect of trustworthiness (e.g., privacy), this paper provides the first formal and explicit definition of controllable learning (CL) as a standalone concept for discriminative ML models. This definition (Definition 1) is comprehensive, covering task description, context, and task target, and emphasizes adaptation without retraining at test time.
- Comprehensive Multi-Dimensional Taxonomy: Instead of focusing on a single aspect (e.g., user perspective, security), the paper introduces a novel, multi-dimensional classification framework:
  - What to control: It broadens controllability to multi-objective control, user portrait control, and scenario adaptation control, covering a wider range of practical needs than basic preference adjustments.
  - Who controls: It explicitly distinguishes between user-centric and platform-mediated control, recognizing the divergent interests and capabilities of different stakeholders.
  - How control is implemented: It surveys diverse technical mechanisms such as rule-based methods, Pareto optimization, hypernetworks, disentanglement, and LLMs, offering a structured view of the algorithmic landscape.
  - Where control is applied: It categorizes methods by their placement in the inference pipeline (pre-processing, in-processing, post-processing), providing a practical framework for implementation.
- Focus on IR Applications: While acknowledging controllable generation, the survey specifically delves into CL within information retrieval and recommender systems, which are areas with complex, dynamic, and often implicit user needs. This specialized focus fills a gap in the literature.
- Timeliness and Scope: The survey is timely, incorporating recent advancements in hypernetworks and the emerging role of large language models (LLMs) in controllable learning, which were largely absent in older surveys on related topics. It also addresses the challenges posed by the Model-as-a-Service (MaaS) paradigm.
- Emphasis on Test-Time Adaptation: A key differentiator is the emphasis on test-time adaptation without retraining. This is crucial for efficiency and scalability in real-world online environments and for MaaS deployments, where constant retraining for every new control signal is impractical.
In essence, this paper moves beyond fragmented discussions of "user control" or "fairness" by proposing a holistic, rigorously defined, and systematically categorized framework for controllable learning, offering a comprehensive map for researchers and practitioners in the evolving landscape of trustworthy AI.
4. Methodology
This paper is a survey, so its "methodology" is primarily its structured approach to defining and categorizing Controllable Learning (CL) within Information Retrieval (IR), rather than proposing a new algorithm. The core methodology involves: (1) providing a formal definition of CL, (2) outlining its procedure, and (3) presenting a multi-faceted taxonomy to classify existing methods.
4.1. Principles
The core idea of Controllable Learning is to enable machine learning models to adapt dynamically to diverse task requirements at test time without the need for retraining. The theoretical basis or intuition behind this is rooted in the need for trustworthy AI systems that can align with human intent and adapt to shifting goals or environments efficiently. Instead of building a new model for every slight change in requirement, a controllable learner is designed to be flexible, allowing users or platforms to "steer" its behavior post-deployment via explicit control signals. This principle is particularly valuable in Information Retrieval where user preferences and platform objectives are constantly evolving.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Formal Definition of Controllable Learning (CL)
The paper formally defines Controllable Learning (CL) as follows:
Definition 1 (Controllable Learning (CL)). Define a task requirement triplet $(d, c, t) \in \mathcal{D} \times \mathcal{C} \times \mathcal{T}$, where $d$ represents the task description, $c$ represents the context related to the task, and $t$ represents the task target. Given an input space $\mathcal{X}$ and an output space $\mathcal{Y}$, for a learner $f: \mathcal{X} \to \mathcal{Y}$, controllable learning (CL) aims to find a control function $\psi$ that maps the learner $f$, the task description $d$, and the context $c$ to a new learner that fulfills the task target $t$, i.e.,
\psi: (f, d, c) \mapsto f', \quad f': \mathcal{X} \to \mathcal{Y} \text{ such that } f' \text{ satisfies } t.
The integration of the learner $f$ and the control function $\psi$ is called a controllable learner. Moreover, upon receiving a new task requirement $(d', c', t')$ at test time, the control function should be capable of outputting a new learner $\psi(f, d', c')$ without the need for model retraining, ensuring that this new learner satisfies the task target $t'$.
Symbol Explanation:
- $(d, c, t)$: A task requirement triplet, the comprehensive specification of a task.
- $d$: The task description, i.e., the specific representation of the task target that can be perceived and processed by the control function. This could be a vector of weights, natural language, or specific rules.
- $\mathcal{D}$: The domain or space from which task descriptions are drawn.
- $c$: The context related to the task, such as historical data or user profiles, providing additional background information.
- $\mathcal{C}$: The domain or space from which task contexts are drawn.
- $t$: The task target, i.e., the ideal quantitative metric or performance objective that the controllable learner aims to achieve (e.g., a specific balance of accuracy and diversity).
- $\mathcal{T}$: The domain or space from which task targets are drawn.
- $\mathcal{D} \times \mathcal{C} \times \mathcal{T}$: The space of all possible task requirement triplets.
- $\mathcal{X}$: The input space of the learner.
- $\mathcal{Y}$: The output space of the learner.
- $f$: The base learner or machine learning model that maps inputs from $\mathcal{X}$ to outputs in $\mathcal{Y}$.
- $\psi$: The control function, the core mechanism of CL. It takes the base learner $f$, the task description $d$, and the context $c$ as inputs.
- $f' = \psi(f, d, c)$: The new (adapted) learner produced by the control function. This adapted learner is specifically configured to fulfill the task target $t$.
- $(d', c', t')$: A new task requirement that arrives at test time.
The key aspects of this definition are the existence of a control function that modifies the learner (or its behavior) and the crucial requirement that this adaptation happens without retraining the underlying model for each new task requirement.
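To make Definition 1 concrete, here is a minimal, illustrative Python sketch of the interface it describes. The names (`TaskRequirement`, `base_learner`, `control_function`) and the scoring logic are our own assumptions, not the paper's: the point is only that the control function consumes a trained learner, a task description, and context, and returns an adapted learner without any retraining.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A task requirement triplet (d, c, t) as in Definition 1 (illustrative field names).
@dataclass
class TaskRequirement:
    description: Dict[str, float]   # d: e.g. objective weights {"accuracy": 0.4, "diversity": 0.6}
    context: List[str]              # c: e.g. the user's interaction history
    target: str                     # t: the quantitative goal the adapted learner should satisfy

# A fixed, pre-trained learner f: X -> Y; here it simply returns per-objective scores per item.
ITEM_SCORES = {
    "item_1": {"accuracy": 0.9, "diversity": 0.2},
    "item_2": {"accuracy": 0.6, "diversity": 0.8},
    "item_3": {"accuracy": 0.8, "diversity": 0.5},
}
def base_learner(item: str) -> Dict[str, float]:
    return ITEM_SCORES[item]

def control_function(learner: Callable[[str], Dict[str, float]],
                     description: Dict[str, float],
                     context: List[str]) -> Callable[[str], float]:
    """psi(f, d, c) -> f': adapts the learner at test time, with no retraining."""
    def adapted_learner(item: str) -> float:
        scores = learner(item)
        penalty = 0.5 if item in context else 1.0     # down-weight items already in the context
        return penalty * sum(w * scores[obj] for obj, w in description.items())
    return adapted_learner

# A new task requirement arriving at test time simply yields a new adapted learner.
req = TaskRequirement({"accuracy": 0.4, "diversity": 0.6}, context=["item_3"], target="balanced top-k")
f_prime = control_function(base_learner, req.description, req.context)
print(sorted(ITEM_SCORES, key=f_prime, reverse=True))
```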
The procedure of Controllable Learning is depicted in Figure 2 (from the original paper). This figure illustrates how the task description $d$ (e.g., preferences, rules) and context $c$ (e.g., historical data, user profiles) are fed into the controllable learner, which consists of the control function $\psi$ and the learner $f$. The control function then adapts the learner to produce recommendations or predictions that are consistent with the task target $t$.
The following figure (Figure 2 from the original paper) shows the overall framework of Controllable Learning:

VLM Description: The image is a diagram illustrating the framework of controllable learning, including context, controllable learner, and output. It covers elements such as historical data and user profile, emphasizing objectives like multi-objective control, user portrait control, and scenario adaptation control. This figure helps in understanding the applications of controllable learning in information retrieval.
4.2.2. Taxonomy of CL in Information Retrieval
The paper proposes a four-dimensional taxonomy to categorize CL methods in IR: What is Controllable, Who Controls, How Control is Implemented, and Where to Control.
4.2.2.1. What is Controllable
This dimension addresses "What does the task target in Definition 1 of CL look like?".
- Multi-Objective Control: This refers to scenarios where both users and platforms have multiple, often conflicting, objectives (e.g., accuracy, diversity, novelty). The task target represents an expected performance target balancing these objectives, and the task description explicitly or implicitly conveys the preference weights for each objective (e.g., a vector [0.4, 0.6] for accuracy and diversity). The goal is to adapt the model to these shifting preferences without retraining. The following figure (Figure 3 from the original paper) illustrates the need for Multi-Objective Control:
  VLM Description: The image is an illustration demonstrating the need for multi-objective control. During the test stage, user preferences may temporarily shift from 'Love' and 'Suspense' to 'Fiction', while the platform focuses more on the diversity of outputs. It highlights the importance of dynamic goal changes for both users and platforms during the test stage.
- User Portrait Control: This involves allowing users to edit their context (e.g., personal profiles, interaction history) to influence the recommendation output. The task target is achieved by modifying the input to the learner. This enables personalization and privacy protection. The following figure (Figure 4 from the original paper) provides examples of User Portrait Control:
  VLM Description: The image is a diagram illustrating examples of user portrait control. The preference summary shows the user's past enjoyment of fiction films and a recent shift towards plot-driven movies like love and suspense. The interaction history clearly lists the movies watched by the user, while the rating section on the right includes the ratings for different films, reflecting the user's potential preference for action movies.
- Scenario Adaptation Control: This addresses adapting the model to different scenarios (e.g., different content pages, time segments) without retraining. The task description includes scenario-specific side information, allowing the control function to map the learner to a scenario-specific learner. The following figure (Figure 5 from the original paper) shows the workflow of Scenario Adaptation Control:
  VLM Description: The image is a diagram illustrating the workflow of scenario adaptation control. In this process, the control function maps a general learner to a scenario-specific learner based on task descriptions, enabling dynamic adaptation across different scenarios without retraining.
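As a toy illustration of user portrait control, editing the context fed to a fixed model is already enough to steer its output. This is a simplified sketch under our own assumptions (the genre-counting `recommend` function and the item names are hypothetical, not from the paper):

```python
from typing import Dict, List

def recommend(history: List[str], catalog: Dict[str, str], k: int = 2) -> List[str]:
    """A fixed learner: scores unseen catalog items by how often their genre appears in the history."""
    genre_counts: Dict[str, int] = {}
    for item in history:
        genre_counts[catalog[item]] = genre_counts.get(catalog[item], 0) + 1
    candidates = [i for i in catalog if i not in history]
    return sorted(candidates, key=lambda i: genre_counts.get(catalog[i], 0), reverse=True)[:k]

catalog = {"m1": "suspense", "m2": "love", "m3": "fiction", "m4": "suspense",
           "m5": "fiction", "m6": "love", "m7": "suspense"}
history = ["m1", "m2", "m4"]                              # mostly suspense so far

# User portrait control: the user removes the suspense entries from their visible profile;
# the same fixed model now reflects the edited portrait -- no retraining involved.
edited_history = [i for i in history if catalog[i] != "suspense"]
print(recommend(history, catalog))         # an unseen suspense title tops the list
print(recommend(edited_history, catalog))  # the remaining 'love' preference now leads
```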
4.2.2.2. Who Controls
This dimension addresses "Who proposes the task description in Definition 1?".
- User-Centric Control: The user explicitly defines their preferences (task target) and provides this information in a specified format (task description), such as questionnaires, weighting buttons, or natural language. Control can be explicit (direct input) or implicit (inferred from behaviors such as interactions). Objectives include fostering interest, privacy protection, noise elimination, and exploration.
- Platform-Mediated Control: The platform (e.g., the service provider) imposes algorithmic adjustments or policy-based constraints on the recommendation process. The control requirements are still expressed as $(d, c, t)$, but the task target focuses on optimizing platform objectives (e.g., increasing diversity to promote less popular items, balancing multiple objectives, adapting to multiple scenarios for performance, or improving efficiency for cost reduction). The following figure (Figure 6 from the original paper) illustrates the objects of User-Centric Control and Platform-Centric Control:
  VLM Description: The image is a diagram illustrating the objects of user-centric and platform-centric control. The left side includes concepts related to users, such as preference, privacy, diversity, exploration, and data filtering, while the right side addresses platform-related aspects like adaptation, efficiency, and accuracy.
4.2.2.3. How Control is Implemented
This dimension summarizes the common control methods for implementing the control function $\psi$.
- Rule-Based Techniques: These involve applying predefined rules to the inputs or outputs of recommender systems. They act as a patchwork solution to enhance system performance from a product perspective.
  - Pre-processing: The rule-based control mechanism processes context information (e.g., user profiles, interaction history) to alter representations for privacy preservation or fairness.
  - Post-processing: The rule-based control mechanism directly modifies the output of the learner (e.g., removing outdated items, or promoting less popular items to enhance diversity).
- Pareto Optimization: This technique is used when balancing multiple, often conflicting objectives. The task target represents a single Pareto optimal solution that satisfies multiple objectives simultaneously, and the task description typically consists of multi-objective weights or constraints. The challenge is to guide the learner to achieve Pareto optimality under these constraints. The following figure (Figure 7 from the original paper) illustrates Controllable Pareto Optimization:
  VLM Description: The image is an illustration showing the effects of controllable Pareto optimization. The left side displays the Pareto front without control, while the right side illustrates the changes in the Pareto front when controlling for Objective 1. Different colors and markings are used to differentiate the objectives, highlighting the importance of control in the optimization process.
- Hypernetwork: A neural network that generates parameters for another network. In CL, the hypernetwork acts as the control function: the task description (e.g., a task or domain description) is input to the hypernetwork, which then outputs all or part of the weights of the learner, thus customizing the model to achieve the task target dynamically.
- Other Methods:
  - Disentanglement: Decouples user interests into specific dimensions within the latent space, allowing controllable manipulation of aspects such as item-category preferences.
  - Reinforcement Learning (RL): Achieves controllability by designing specific reward functions that guide the algorithm's learning from environmental interaction to meet the desired control goals.
  - Large Language Models (LLMs): Leverages LLMs for their general intelligence and instruction-following capabilities. LLMs can be fine-tuned or prompted to act as control functions, interpreting natural language instructions (task descriptions) to guide recommendations or content generation.
  - Test-Time Adaptation (TTA): Algorithms that adapt a pre-trained model directly during the test stage using unlabeled test data, without retraining. While some hypernetwork-based methods fall here, TTA generally focuses on reactive adjustments to data shifts rather than proactive control signals.
4.2.2.4. Where to Control in Information Retrieval Models
This dimension categorizes CL methods based on where the control function is applied during the inference process.
- Pre-Processing Methods: These methods achieve the task target by transforming the model inputs (e.g., concatenating task descriptions as prompts, or modifying user profiles/interaction histories) before inference, without altering the model parameters themselves.
- In-Processing Methods: These methods adaptively adjust the parameters or hidden states of the learner upon receiving the task description and context during inference. Hypernetworks are a prime example, generating or modifying model parameters on the fly.
- Post-Processing Methods: These methods refine the model outputs after inference. Examples include reranking (e.g., MMR for balancing relevance and diversity) or result diversification to meet task targets.
The overall framework classification and specific examples mentioned in Table 1 (presented in the Results & Analysis section) illustrate how these dimensions combine in various CL applications.
5. Experimental Setup
As a survey paper, this work does not present novel experimental results for a new model but rather analyzes existing literature. Therefore, it discusses evaluation metrics and datasets that are commonly used in the field of controllable learning within information retrieval.
5.1. Datasets
The paper summarizes several publicly available datasets commonly used for controllable learning research in information retrieval and recommender systems. These datasets are chosen because they include features crucial for various control requirements, such as item category information for diversity control, and user profiles and interaction history for user portrait control.
- Amazon [53, 54]:
  - Source: Product reviews from Amazon.
  - Scale: Comprises 142.8 million product reviews.
  - Characteristics: Covers various product categories and includes user and item profiles, item category information, and time information.
  - Purpose: Suitable for multi-objective control (e.g., diversity or fairness via categories), user portrait control (using user profiles and historical sequences), and potentially scenario adaptation control (using time information for dynamic preferences).
- Ali Display_Ad_Click [55]:
  - Source: Ad display/click logs from Alibaba.
  - Scale: Includes records for 1 million users and 26 million ad display/click logs.
  - Characteristics: Features 8 user profile attributes (e.g., ID, age, occupation) and 6 item features (e.g., ID, campaign, brand).
  - Purpose: Useful for user portrait control (via user profiles) and multi-objective control (e.g., optimizing click-through rate and other ad-related metrics).
- UserBehavior [1]:
  - Source: User behaviors from Taobao's recommender systems.
  - Scale: Collects behaviors of approximately one million randomly selected users.
  - Characteristics: Covers all behaviors (clicks, purchases, add-to-cart, likes) during a specific period (November 25, 2017, to December 3, 2017).
  - Purpose: Ideal for user portrait control (editing historical sequences), multi-objective control (balancing different types of interactions), and scenario adaptation control (if specific temporal patterns within this window are treated as scenarios).
- MovieLens:
  - Source: Classical movie recommendation dataset.
  - Scale: Available in various sizes (100k, 1M, 10M, 20M ratings).
  - Characteristics: Includes users' gender, age, and occupation, as well as item category information (genres).
  - Purpose: Widely used for user portrait control (user demographics), multi-objective control (balancing movie genres, accuracy vs. diversity), and general recommender system research.
- MS MARCO [56] (Microsoft Machine Reading Comprehension):
  - Source: Compiled from real user queries extracted from Microsoft Bing's search logs.
  - Scale: Extensive dataset with 3.2 million documents and 8.8 million passages.
  - Characteristics: Designed for evaluating machine reading comprehension, retrieval, and question-answering. Each query is paired with annotated relevant documents, spanning a wide variety of question types and document genres.
  - Purpose: Primarily used for search-focused IR tasks, offering potential for controllable learning in search (e.g., controlling query intent, result diversity, or relevance aspects).
The paper notes that there are currently no dedicated domain-specific datasets explicitly designed for controllable learning, indicating this as an open challenge.
5.2. Evaluation Metrics
The paper highlights the absence of specific evaluation criteria for controllable learning and suggests that appropriate use of existing information retrieval and multi-objective optimization metrics can verify controllability. The core idea is to assess whether the control function can effectively make the learner meet the given task requirements. This can be checked by observing the correlation between a control parameter (representing the degree of control) and the desired performance (e.g., NDCG, diversity).
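A simple way to operationalize this check, sketched below with hypothetical helpers (the ranker, the diversity proxy, and the control knob `lam` are our own toy constructions), is to sweep the control parameter and verify that the targeted metric moves with it:

```python
import random

def diversity_at_k(ranking, categories, k=10):
    """Coverage-style diversity proxy: number of distinct categories in the top k."""
    return len({categories[i] for i in ranking[:k]})

def controllable_ranker(scores, categories, lam, k=10):
    """Toy controllable learner: lam in [0, 1] trades relevance against category novelty."""
    seen, ranking, pool = set(), [], dict(scores)
    while pool and len(ranking) < k:
        best = max(pool, key=lambda i: (1 - lam) * pool[i] + lam * (categories[i] not in seen))
        seen.add(categories[best]); ranking.append(best); pool.pop(best)
    return ranking

random.seed(0)
scores = {i: random.random() for i in range(200)}
categories = {i: i % 8 for i in range(200)}
# Controllability check: the target metric should move monotonically with the control parameter.
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, diversity_at_k(controllable_ranker(scores, categories, lam), categories))
```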
5.2.1. Single-Objective Metrics
These metrics are typically used to evaluate one specific aspect of a recommender system or information retrieval model.
Accuracy
Accuracy in recommender systems measures how well recommended items match user preferences.
- NDCG (Normalized Discounted Cumulative Gain) [48]
  - Conceptual Definition: NDCG evaluates the quality of a ranked list by considering both the relevance of items and their position in the list. More relevant items at higher ranks contribute more to the score. It normalizes the score against an ideal ranking to allow comparison across different queries or recommendation lists.
  - Mathematical Formula:
    \mathrm{NDCG}@k = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{DCG}_i@k}{\mathrm{IDCG}_i@k}, \qquad \mathrm{DCG}_i@k = \sum_{j=1}^{k} \frac{y_{i,j}}{\log_2(j+1)}
  - Symbol Explanation:
    - $N$: The total number of test samples (e.g., users or queries).
    - $k$: The cut-off rank, indicating that only the top $k$ items in the list are considered.
    - $\mathrm{DCG}_i@k$: The Discounted Cumulative Gain for the $i$-th test sample at rank $k$. It sums the relevance scores, applying a logarithmic discount to lower-ranked items.
    - $\mathrm{IDCG}_i@k$: The Ideal Discounted Cumulative Gain for the $i$-th test sample at rank $k$, i.e., the maximum possible DCG score, obtained by ranking all relevant items perfectly.
    - $y_{i,j} \in \{0, 1\}$ (or higher values for graded relevance): The relevance label of the $j$-th item in the ranked list for the $i$-th test sample; here it implies binary relevance (0 for irrelevant, 1 for relevant).
    - $\log_2(j+1)$: The logarithmic discount factor, which reduces the contribution of items at lower ranks.
- Precision
  - Conceptual Definition: Precision measures the proportion of retrieved items that are actually relevant. High precision indicates that the system primarily retrieves useful information and avoids irrelevant results.
  - Mathematical Formula:
    \mathrm{Precision}@k = \frac{1}{N} \sum_{i=1}^{N} \frac{|\widehat{L}_i^k \cap L_i|}{k}
  - Symbol Explanation:
    - $N$: The total number of test samples.
    - $k$: The cut-off rank.
    - $\widehat{L}_i^k$: The set of top $k$ items outputted by the IR model for the $i$-th test sample.
    - $L_i$: The set of ground-truth relevant items for the $i$-th test sample that would ideally appear in a top $k$ list.
    - $|\widehat{L}_i^k \cap L_i|$: The number of relevant items among the top $k$ recommended items.
- Recall
  - Conceptual Definition: Recall measures the proportion of all relevant items that are successfully retrieved by the system. High recall indicates that the system finds most of the relevant information, even if it also retrieves some irrelevant items.
  - Mathematical Formula:
    \mathrm{Recall}@k = \frac{1}{N} \sum_{i=1}^{N} \frac{|\widehat{L}_i^k \cap L_i|}{|L_i|}
  - Symbol Explanation:
    - $N$: The total number of test samples.
    - $k$: The cut-off rank.
    - $\widehat{L}_i^k$: The set of top $k$ items outputted by the IR model for the $i$-th test sample.
    - $L_i$: The set of all ground-truth relevant items for the $i$-th test sample (regardless of whether they appear in the top $k$ or not).
    - $|\widehat{L}_i^k \cap L_i|$: The number of relevant items among the top $k$ recommended items.
- Hit Rate
  - Conceptual Definition: Hit Rate assesses whether at least one relevant item is present within the top $k$ recommendations. It is useful for scenarios where simply presenting any relevant option significantly impacts user satisfaction.
  - Mathematical Formula:
    \mathrm{HitRate}@k = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\left(\widehat{L}_i^k \cap L_i \neq \emptyset\right)
  - Symbol Explanation:
    - $N$: The total number of test samples (e.g., users).
    - $k$: The cut-off rank.
    - $\mathbb{I}(\cdot)$: An indicator function that returns 1 if its argument is true, and 0 otherwise.
    - $\widehat{L}_i^k$: The set of top $k$ items outputted by the IR model for the $i$-th test sample.
    - $L_i$: The set of ground-truth relevant items for the $i$-th test sample.
    - $\widehat{L}_i^k \cap L_i \neq \emptyset$: True if there is at least one overlap between the recommended top $k$ items and the relevant items.
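The accuracy-oriented metrics above are straightforward to compute per test sample; a compact reference sketch (our own helper names, not from the paper) is:

```python
import math
from typing import List, Set

def ndcg_at_k(relevances: List[int], k: int) -> float:
    """NDCG@k for one ranked list of (binary or graded) relevance labels."""
    def dcg(labels: List[int]) -> float:
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(labels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def precision_recall_hit(recommended: List[str], relevant: Set[str], k: int):
    """Precision@k, Recall@k and Hit Rate@k for a single test user."""
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    hit_rate = 1.0 if hits > 0 else 0.0
    return precision, recall, hit_rate

# One user's ranked relevance labels, and one user's ranked item ids vs. relevant set.
print(round(ndcg_at_k([1, 0, 1, 0, 0], k=5), 4))
print(precision_recall_hit(["a", "b", "c", "d"], {"b", "e", "f"}, k=4))
# Averaging these per-user values over all N test samples gives the reported metrics.
```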
Diversity
Diversity in recommendation systems refers to the variety of items presented to users, aiming to avoid redundancy.
- α-NDCG [49]
  - Conceptual Definition: α-NDCG extends NDCG to reward subtopic (intent) coverage and penalize redundancy: an item's gain for a subtopic is discounted according to how many higher-ranked items already covered that subtopic, so rankings that cover diverse subtopics early score higher.
  - Mathematical Formula:
    \begin{array}{rl} & \alpha\mathrm{-NDCG}@k = \displaystyle\frac{1}{N} \sum_{i=1}^{N} \frac{\alpha\mathrm{-DCG}_i@k}{\alpha\mathrm{-IDCG}_i@k}, \\ & \alpha\mathrm{-DCG}_i@k = \displaystyle\sum_{j=1}^{k} \sum_{l=1}^{m} \frac{t_{j,l}\,(1-\alpha)^{c_{j,l}}}{\log_2(j+1)} \end{array}
  - Symbol Explanation:
    - $N$: The number of test samples.
    - $k$: The cut-off rank.
    - $\alpha\mathrm{-DCG}_i@k$: The α-Discounted Cumulative Gain for the $i$-th sample.
    - $\alpha\mathrm{-IDCG}_i@k$: The ideal α-Discounted Cumulative Gain for the $i$-th sample.
    - $m$: The total number of subtopics or categories.
    - $t_{j,l}$: A binary indicator; $t_{j,l} = 1$ if the $j$-th item in the list covers subtopic $l$, and 0 otherwise.
    - $\alpha$: A parameter between 0 and 1 that controls the penalty for redundancy; a higher $\alpha$ means a stronger penalty.
    - $c_{j,l}$: The count of how many times subtopic $l$ has been covered by items appearing prior to the $j$-th item in the ranked list.
    - $\log_2(j+1)$: The logarithmic discount factor for rank position.
- ERR-IA (Expected Reciprocal Rank - Intent Aware) [50]
  - Conceptual Definition: ERR-IA extends the Expected Reciprocal Rank (ERR) metric to incorporate user intents, making it suitable for evaluating diversity in situations where users might have multiple underlying information needs. It models user satisfaction as a function of item relevance and alignment with diverse intents.
  - Mathematical Formula (standard intent-aware form, averaged over $N$ test samples and $m$ intents):
    \mathrm{ERR\text{-}IA}@k = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{m} \sum_{l=1}^{m} \sum_{j=1}^{k} \frac{1}{j}\, R_{j,l} \prod_{j'=1}^{j-1} \left(1 - R_{j',l}\right)
  - Symbol Explanation:
    - $N$: The number of test samples.
    - $k$: The cut-off rank.
    - $j$: The rank of an item in the list.
    - $l$: A specific user intent or subtopic.
    - $m$: The total number of intents/subtopics considered.
    - $R_{j,l}$: The (normalized) relevance of the item at rank $j$ to intent $l$; for binary judgments this is a 0/1 indicator. The survey presents this metric with a binary indicator $t_{i,l}$ and a coverage count $c_{j,l}$ but does not define them explicitly, so the standard graded-relevance form is shown here.
    - $\frac{1}{j}$: The reciprocal-rank component.
    - $\frac{1}{m}$: Averaging over intents.
    - $\prod_{j'=1}^{j-1}(1 - R_{j',l})$: The probability that the user is not yet satisfied by higher-ranked items for intent $l$, i.e., continues to examine results.
- Coverage
  - Conceptual Definition: Coverage evaluates the proportion of unique items from the entire item set that appear in the recommendations. High coverage means the system recommends a broad range of items, indicating a less biased or narrow selection.
  - Mathematical Formula:
    \mathrm{Coverage}@k = \frac{\left| \bigcup_{i=1}^{N} \widehat{L}_i^k \right|}{|\mathcal{I}|}
  - Symbol Explanation:
    - $N$: The number of test samples (e.g., users).
    - $k$: The cut-off rank.
    - $\bigcup_{i=1}^{N} \widehat{L}_i^k$: The union of all unique items recommended across all users up to rank $k$.
    - $\mathcal{I}$: The set of all items available; $|\mathcal{I}|$ is the total number of unique items in the item set.
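A small sketch of the α-NDCG computation defined above (illustrative code; the ideal ordering is found by brute force here, which is only feasible for tiny examples):

```python
import math
from itertools import permutations
from typing import List, Set

def alpha_dcg(ranking: List[Set[str]], alpha: float, k: int) -> float:
    """alpha-DCG@k: ranking[j] is the set of subtopics covered by the item at rank j."""
    covered, score = {}, 0.0
    for j, subtopics in enumerate(ranking[:k]):
        for t in subtopics:
            score += (1 - alpha) ** covered.get(t, 0) / math.log2(j + 2)
            covered[t] = covered.get(t, 0) + 1
    return score

def alpha_ndcg(ranking: List[Set[str]], alpha: float, k: int) -> float:
    """Normalize by the best achievable ordering (brute force; fine only for tiny lists)."""
    ideal = max(alpha_dcg(list(p), alpha, k) for p in permutations(ranking))
    return alpha_dcg(ranking, alpha, k) / ideal if ideal > 0 else 0.0

docs = [{"t1"}, {"t1"}, {"t2"}, {"t1", "t3"}]   # subtopics covered by each ranked item
print(round(alpha_ndcg(docs, alpha=0.5, k=4), 4))
```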
Fairness
Fairness in recommender systems aims to ensure equitable treatment for all users or items, regardless of their group affiliations.
- Demographic Parity (DP)
  - Conceptual Definition: Demographic Parity ensures that different demographic groups (e.g., based on gender or race) receive similar rates of recommendations for a specific outcome, regardless of their underlying characteristics or past interactions. It measures whether a positive outcome (e.g., being recommended) is equally likely across groups.
  - Mathematical Formula:
    \mathrm{DP} = \left| \frac{1}{|S_0|} \sum_{i \in S_0} \hat{y}_i - \frac{1}{|S_1|} \sum_{i \in S_1} \hat{y}_i \right|
  - Symbol Explanation:
    - $S_0$: The set of individuals belonging to one demographic group (e.g., female).
    - $S_1$: The set of individuals belonging to another demographic group (e.g., male).
    - $|S_0|$, $|S_1|$: The number of individuals in group 0 and group 1, respectively.
    - $\hat{y}_i$: The predicted score or outcome (e.g., likelihood of being recommended, where 1 means recommended) for individual $i$; the sum over a group represents the total "positive outcomes" for that group.
    - The metric is the absolute difference in the average predicted outcome between the two groups; a value closer to 0 indicates higher Demographic Parity.
- Equal Opportunity (EO)
  - Conceptual Definition: Equal Opportunity focuses on ensuring that groups who truly deserve a positive outcome (i.e., have the same ground-truth label) are equally likely to receive that outcome. For instance, if two groups have the same click rate, they should have similar recommendation rates.
  - Mathematical Formula:
    \mathrm{EO} = \left| \frac{1}{|S_0^{y=1}|} \sum_{i \in S_0^{y=1}} \hat{y}_i - \frac{1}{|S_1^{y=1}|} \sum_{i \in S_1^{y=1}} \hat{y}_i \right|
  - Symbol Explanation:
    - $y_i$: The ground-truth label or true outcome for individual $i$ (e.g., $y_i = 1$ for relevant, $y_i = 0$ for irrelevant); the summation considers only positive outcomes ($y_i = 1$).
    - $S_0^{y=1}$: The subset of group 0 individuals who have ground-truth label $y_i = 1$.
    - $S_1^{y=1}$: The subset of group 1 individuals who have ground-truth label $y_i = 1$.
    - $|S_0^{y=1}|$, $|S_1^{y=1}|$: The number of individuals in these subsets.
    - $\hat{y}_i$: The predicted score or outcome for individual $i$.
    - The metric is the absolute difference in true positive rates (computed where $y_i = 1$) between groups; a value closer to 0 indicates higher Equal Opportunity.
- Iso-Index
  - Conceptual Definition: Iso-Index assesses the isolation or segregation of certain groups within the retrieved or recommended results. A lower Iso-Index suggests less isolation, meaning a more equitable distribution of information across different groups. It combines diversity and fairness aspects.
  - Mathematical Formula: a weighted combination of a chosen diversity metric and a chosen fairness metric, with weight $\lambda$.
  - Symbol Explanation:
    - $\mathrm{Diversity}$: A chosen diversity metric (e.g., α-NDCG, Coverage).
    - $\mathrm{Fairness}$: A chosen fairness metric (e.g., Demographic Parity, Equal Opportunity).
    - $\lambda$: A hyper-parameter (between 0 and 1) that weights the importance of diversity versus fairness in the combined Iso-Index.
Novelty
Novelty in recommender systems measures how unfamiliar or "new" the recommended items are to the user, promoting exploration beyond known preferences.
- Novelty [110]
  - Conceptual Definition: Novelty quantifies the extent to which recommended items differ from what a user has previously encountered or from popular items. It usually inversely correlates with item popularity, meaning less popular items are considered more novel.
  - Mathematical Formula:
    \mathrm{Novelty}@k = \sum_{i=1}^{N} \sum_{j=1}^{|\widehat{L}_i^k|} \frac{\log(\mathrm{Pop}(\hat{l}_{i,j}) + 1)}{|\widehat{L}_i^k|}
  - Symbol Explanation:
    - $N$: The number of test samples (e.g., users).
    - $k$: The cut-off rank for recommendations.
    - $|\widehat{L}_i^k|$: The number of items in the top $k$ recommended list for user $i$.
    - $\hat{l}_{i,j}$: The $j$-th item in the ranked list $\widehat{L}_i^k$ for user $i$.
    - $\mathrm{Pop}(\hat{l}_{i,j})$: The popularity of item $\hat{l}_{i,j}$, often measured by its frequency of appearance in the dataset or overall interactions. Less popular items result in higher novelty scores.
    - $\log(\mathrm{Pop}(\hat{l}_{i,j}) + 1)$: A logarithmic transformation applied to popularity to reduce the impact of extremely popular items and to provide a more stable novelty score; the $+1$ avoids $\log(0)$ if an item has zero popularity.
5.2.2. Multi-Objective Optimization Metrics
These metrics are used when task requirements involve multiple, potentially conflicting objectives, and the control function aims to generate a Pareto front or an approximation of one.
- Hypervolume (HV) [111]
  - Conceptual Definition: Hypervolume (also known as the Lebesgue measure) quantifies the volume of the objective space that is dominated by a set of solutions, bounded by a predefined reference point. It simultaneously assesses both the convergence (how close the solutions are to the true Pareto front) and the diversity (how well the solutions cover the Pareto front) of the solution set. A larger HV value indicates a better-performing solution set.
  - Mathematical Formula:
    \mathrm{HV} = \lambda\left( \bigcup_{i=1}^{|S|} \prod_{j=1}^{m} \left[ f_j^{(i)}, z_j \right] \right)
  - Symbol Explanation:
    - $S$: The set of Pareto optimal solutions obtained by the algorithm.
    - $|S|$: The number of solutions in the set $S$.
    - $f_j^{(i)}$: The value of the $j$-th objective function for the $i$-th solution in the set $S$.
    - $z_j$: The reference point in the $j$-th objective dimension; it must be dominated by all solutions in the Pareto front (i.e., it represents the worst possible values for each objective).
    - $\prod_{j=1}^{m} [f_j^{(i)}, z_j]$: The hyper-rectangle formed by the $i$-th solution and the reference point across the $m$ objectives.
    - $\bigcup_{i=1}^{|S|}(\dots)$: The union of all such hyper-rectangles, representing the total volume dominated by the solution set $S$.
    - $\lambda(\cdot)$: The Lebesgue measure, which calculates the volume of the specified region in the objective space.
- R2 [112]
  - Conceptual Definition: R2 is a scalarization-based metric used to evaluate the quality of a solution set in multi-objective optimization. It measures the proximity of the obtained solution set to an ideal set by considering a set of weight vectors that represent user preferences. It does not require knowing the true Pareto front.
  - Mathematical Formula:
    \mathrm{R2} = \frac{1}{|W|} \sum_{w \in W} \min_{x \in P} \sum_{i=1}^{m} w_i f_i(x)
  - Symbol Explanation:
    - $W$: A set of weight vectors, where each $w \in W$ represents a different preference over the objectives; $w_i$ is the weight for the $i$-th objective.
    - $|W|$: The number of weight vectors in the set $W$.
    - $P$: The set of solutions (e.g., Pareto optimal solutions) being evaluated.
    - $x$: A single solution within the set $P$.
    - $f_i(x)$: The value of the $i$-th objective function for solution $x$.
    - $\sum_{i=1}^{m} w_i f_i(x)$: The scalarized value of solution $x$ for a given weight vector $w$, i.e., a weighted sum of its objective values.
    - $\min_{x \in P}(\dots)$: The minimum scalarized value among all solutions in $P$ for a given weight vector $w$.
    - The metric averages these minimum scalarized values across all weight vectors in $W$; a lower R2 value generally indicates better performance (closer to the ideal).
- Generational Distance (GD) [113]
  - Conceptual Definition: Generational Distance quantifies the convergence of an obtained solution set $P$ to the true Pareto front $P^*$. It calculates the average Euclidean distance from each solution in $P$ to its nearest point on $P^*$. A lower GD indicates that the obtained solutions are closer to the true Pareto front.
  - Mathematical Formula:
    \mathrm{GD} = \left( \frac{1}{|P|} \sum_{i=1}^{|P|} d_i^p \right)^{\frac{1}{p}}
  - Symbol Explanation:
    - $P$: The set of solutions obtained by the algorithm.
    - $|P|$: The number of solutions in $P$.
    - $d_i$: The Euclidean distance from the $i$-th solution in $P$ to the nearest point on the true Pareto front $P^*$.
    - $p$: A parameter, typically set to 2 (for the standard Euclidean distance).
- Inverted Generational Distance (IGD) [114]
  - Conceptual Definition: Inverted Generational Distance evaluates both convergence and diversity by calculating the average Euclidean distance from each point on the true Pareto front $P^*$ to its nearest solution in the obtained set $P$. It measures how well the obtained set $P$ covers the true Pareto front. A lower IGD indicates better performance.
  - Mathematical Formula:
    \mathrm{IGD} = \frac{1}{|P^*|} \sum_{j=1}^{|P^*|} d_j
  - Symbol Explanation:
    - $P^*$: The true Pareto front.
    - $|P^*|$: The number of points on the true Pareto front $P^*$.
    - $d_j$: The Euclidean distance from the $j$-th point on $P^*$ to the nearest solution in the obtained set $P$.
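For the two-objective case, the hypervolume above reduces to a sum of rectangle areas. Below is a minimal sketch under a minimization convention and a user-chosen reference point (illustrative code, not from the paper):

```python
from typing import List, Tuple

def hypervolume_2d(front: List[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Hypervolume of a 2-objective front under minimization, w.r.t. a reference point
    that is worse than every solution: the area is summed strip by strip."""
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                        # skip dominated points
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.3)]
print(hypervolume_2d(front, ref=(1.0, 1.0)))  # 0.35 for this toy front
```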
5.3. Baselines
As a survey paper, the document does not propose a single new model to compare against baselines. Instead, it discusses various controllable learning methods from the literature. Therefore, the concept of baselines in the traditional experimental sense (where a new method is compared to established state-of-the-art models) does not directly apply here. Each surveyed paper would have its own set of baselines relevant to the specific controllable task it addresses.
The paper implicitly highlights the limitations of traditional non-controllable or less controllable methods, which serve as conceptual baselines for the controllable learning paradigm. These include:
- Traditional recommender systems that are fixed after training and require full retraining for any change in objectives or user preferences.
- Domain adaptation or transfer learning methods that involve retraining or significant adaptation, unlike the test-time, no-retraining requirement of CL.
- Generative models that are prompt-based but do not allow for in-processing parameter adjustments.
The purpose of this survey is to categorize and analyze existing controllable learning approaches themselves, rather than to benchmark a new one against baselines.
6. Results & Analysis
As a survey paper, this document does not present novel experimental results but rather synthesizes and categorizes existing research. The "results" of this paper are its comprehensive taxonomy and the structured overview of various controllable learning (CL) methods in information retrieval (IR).
6.1. Core Results Analysis
The core contribution of this paper is the creation of a systematic framework for understanding CL. Through its multi-dimensional taxonomy, the survey demonstrates that controllability is not a monolithic concept but can be achieved in various ways, for different purposes, by different actors, and at different stages of the ML pipeline. The analysis reveals that:
- Diverse Controllable Objectives: CL addresses a wide array of needs, from multi-objective balancing (e.g., accuracy vs. diversity) to privacy protection (via user portrait control) and environmental robustness (via scenario adaptation). This underscores the versatility and importance of CL in real-world applications.
- Dual Control Perspectives: Both users and platforms seek controllability, but their motivations differ. User-centric control often emphasizes personalization, privacy, and exploration, while platform-mediated control focuses on system-wide objectives like diversity, efficiency, and adaptability across tasks and scenarios.
- Evolving Technical Landscape: CL is implemented using a growing suite of techniques, from foundational rule-based methods and Pareto optimization to cutting-edge hypernetworks, disentangled representations, reinforcement learning, and Large Language Models (LLMs). Hypernetworks emerge as a particularly powerful tool for in-processing control due to their ability to dynamically generate model parameters.
- Pipeline Integration: Controllability can be injected at various stages: pre-processing (modifying inputs/prompts), in-processing (adjusting model parameters/hidden states), and post-processing (re-ranking or filtering outputs). In-processing methods, especially those leveraging hypernetworks, appear to offer more fundamental adaptation without full retraining.
Table 1 of the paper succinctly summarizes the surveyed methods according to this taxonomy, providing a quick reference for researchers. It also highlights the recency of much of this research, with many papers from 2023 and 2024, indicating a rapidly developing field. The following are the results from Table 1 of the original paper; the first two columns give method information, and the remaining four describe the paradigm of controllable learning (what is controlled, who controls, the CL technique, and where control is applied):

| Method | Year | What | Who | CL Tech. | Where |
|---|---|---|---|---|---|
| MocDT [103] | 2025 | multi-objective control | user-centric control | RL | in-processing |
| PadiRec [4] | 2024 | multi-objective control | platform-mediated control | hypernetwork | in-processing |
| FollowIR [45] | 2024 | user portrait control | user-centric control | SFT | pre-processing |
| InstructIR [46] | 2024 | user portrait control | user-centric control | SFT | pre-processing |
| RecLM-gen [5] | 2024 | multi-objective control | platform-mediated control | SFT, RL | in-processing |
| IFRQE [7] | 2024 | user portrait control | user-centric control | others | pre-processing |
| TEARS [9] | 2024 | user portrait control | user-centric control | RL, NLIE | in-processing |
| CMBR [10] | 2024 | user portrait control | user-centric control | SFT, RL | in-processing |
| LangPTune [17] | 2024 | user portrait control | user-centric control | RL, NLIE | in-processing |
| CCDF [21] | 2024 | multi-objective control | platform-mediated control | others | in-processing |
| CMR [3] | 2023 | multi-objective control | platform-mediated control | hypernetwork | in-processing |
| LACE [6] | 2023 | user portrait control | user-centric control | NLIE | pre-processing |
| UCR [11] | 2023 | user portrait control | user-centric control | others | pre-processing |
| Hamur [13] | 2023 | scenario adaptation control | platform-mediated control | hypernetwork, TTA | in-processing |
| HyperBandit [12] | 2023 | scenario adaptation control | platform-mediated control | hypernetwork, TTA | in-processing |
| PEPNet [14] | 2023 | scenario adaptation control | user-centric control | hypernetwork | in-processing |
| SAMD [23] | 2023 | scenario adaptation control | platform-mediated control | hypernetwork | in-processing |
| DTRN [24] | 2023 | scenario adaptation control | user-centric control | hypernetwork | in-processing |
| MoFIR [96] | 2022 | multi-objective control | user-centric control | pareto optimization | in-processing |
| UCRS [2] | 2022 | multi-objective control | user-centric control | others | pre-processing |
| PAPERec [95] | 2021 | multi-objective control | user-centric control | pareto optimization | in-processing |
| Supervised β-VAE [15] | 2021 | user portrait control | user-centric control | Disentanglement | in-processing |
| ComiRec [1] | 2020 | multi-objective control | platform-mediated control | others | post-processing |
| LP [16] | 2020 | user portrait control | user-centric control | Disentanglement | in-processing |
| MMR [47] | 1998 | multi-objective control | platform-mediated control | rule-based | post-processing |

## 6.2. Ablation Studies / Parameter Analysis

As a survey paper, the authors do not perform their own `ablation studies` or `parameter analyses`. Instead, they summarize findings from individual research papers that often include such analyses. For instance, the paper mentions:

* `ComiRec [1]` introduces a controllable hyperparameter $\lambda$ to adjust `diversity`, and notes that empirical evidence suggests that while diversity improves, accuracy can be compromised to some degree. This trade-off is exactly what a parameter analysis would explore.
* `UCRS [2]` provides control coefficients $\alpha$ and $\beta$ to regulate `accuracy`, `isolation`, and `diversity`. Adjusting these coefficients allows for dynamic control without `retraining`.

These examples demonstrate that `controllability` often hinges on `hyperparameters` that trade off different objectives, and researchers typically analyze the impact of these parameters on various `metrics` (a minimal illustration of such a controllable trade-off is sketched below). The current survey, however, focuses on categorizing these approaches rather than re-evaluating them.
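To make the role of such control knobs concrete, below is a minimal NumPy sketch in the spirit of the rule-based, post-processing `MMR [47]` entry in Table 1: a single parameter `lam` trades relevance against diversity at re-ranking time, so the same trained scorer serves different targets without retraining. The relevance scores and similarity matrix are toy placeholders, and the formulation is simplified relative to the original MMR.

```python
import numpy as np

def mmr_rerank(relevance: np.ndarray, sim: np.ndarray, k: int, lam: float) -> list[int]:
    """Greedy MMR-style re-ranking: lam=1.0 favors pure relevance,
    lower lam favors items dissimilar to those already selected.
    relevance: (n,) relevance scores; sim: (n, n) pairwise item similarities."""
    selected: list[int] = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        if selected:
            # Redundancy = similarity of each candidate to its closest already-selected item.
            redundancy = sim[np.ix_(candidates, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1.0 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

# The same relevance scores yield different rankings as the control knob `lam` changes.
rel = np.array([0.9, 0.85, 0.8, 0.2])
sim = np.array([[1.0, 0.95, 0.9, 0.1],
                [0.95, 1.0, 0.9, 0.1],
                [0.9, 0.9, 1.0, 0.1],
                [0.1, 0.1, 0.1, 1.0]])
print(mmr_rerank(rel, sim, k=3, lam=1.0))   # relevance-only ordering: [0, 1, 2]
print(mmr_rerank(rel, sim, k=3, lam=0.3))   # more diverse ordering:   [0, 3, 2]
```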
# 7. Conclusion & Reflections

## 7.1. Conclusion Summary

This survey provides a much-needed formal definition of `controllable learning (CL)` and a comprehensive, multi-dimensional taxonomy for its application in `information retrieval (IR)`. It clarifies `CL`'s role as a vital component of `trustworthy machine learning`, enabling models to adapt to dynamic `task requirements` at test time without `retraining`. The paper categorizes `CL` by `what` is controllable (multi-objectives, user portraits, scenarios), `who` controls (users or platforms), `how` control is implemented (rule-based methods, Pareto optimization, hypernetworks, LLMs, disentanglement, RL), and `where` control occurs in the inference pipeline (pre-processing, in-processing, post-processing). It also highlights the growing demand for `CL` in the `Model-as-a-Service (MaaS)` paradigm and outlines significant challenges and promising future directions, particularly in theoretical analysis, computational efficiency, integration with `LLMs`, and dedicated evaluation frameworks.

## 7.2. Limitations & Future Work

The authors identify several key limitations and suggest future research directions:

* **Balancing Difficulty in Training**: A pivotal challenge is the trade-off between `controllability` and other performance metrics (e.g., `accuracy`). Achieving `controllability` often compromises other `user-centric optimization metrics`.
* **Absence of Evaluation Standards**: There is a significant lack of standardized benchmarks and dedicated `evaluation metrics` for `controllable learning`. Existing works use disparate evaluation approaches, hindering direct comparison and slowing the field's progress.
* **Setting `Task Descriptions`**: A crucial issue is how to effectively set the `task target` and transform it into a human-understandable and precise `task description`. Such descriptions are not limited to vectors or text; they could also be images, graphs, or rules.
* **Challenges in `Online Environments`**: Integrating `controllable learning` principles into `streaming IR applications` (e.g., `online learning`, `reinforcement learning`) is difficult. Current methods are often not equipped to handle swift preference shifts without costly `retraining` in real-time settings.
* **Theoretical Analyses of `Controllable Learning`**: Rigorous theoretical analysis is needed to understand how `task targets` map to `model parameters` in vast `deep learning` models, and to uncover structural information and `causal associations`.
* **`Controllable Sequential Decision-Making Models`**: In streaming applications with `bandit feedback`, balancing `exploration` and `exploitation` while achieving `adaptive control` over `task requirements` is a critical theoretical and practical challenge.
* **Empowering `LLM-based AIGC` through `Controllable Learning`**: While `LLMs` are used for `controllable generation` via `prompts`, deeper exploration of `CL` techniques that manipulate `LLM` model parameters or outputs for specific `task targets` (e.g., `multi-objective preferences`) is needed.
* **Cost-Effective Control Mechanisms**: `Controllable learners` introduce additional computational costs. Research into efficient and `cost-effective control mechanisms` is imperative, especially for `large-scale models`.
* **`Controllable Learning` for `Multi-Task Switching`**: Most existing `CL` research focuses on `recommender systems`. Extending `CL` to `search` and enabling `adaptive switching` between diverse tasks, objectives, and scenarios with a small set of `controllable models` is a key future direction.
* **Demand for Resources and Metrics**: The field lacks dedicated datasets and standardized `evaluation metrics` for `controllable learning`. Collecting or constructing `labels` or `user feedback` across multiple objectives or diverse `task requirements` is crucial.

## 7.3. Personal Insights & Critique

This survey is a highly valuable and timely contribution to the `machine learning` community. Its formal definition of `controllable learning` and the proposed multi-dimensional taxonomy provide a much-needed framework for a nascent but critical field. One of the paper's strengths is its clear articulation of `controllability` as distinct from, yet complementary to, other `trustworthy AI` pillars such as `explainability` and `fairness`. This distinction is often muddled in discussions, and the paper sets a solid foundation. The emphasis on `test-time adaptation` without `retraining` is particularly insightful, acknowledging the practical realities and computational constraints of deploying large `AI` models in dynamic `MaaS` environments; this operational definition makes `CL` a highly pragmatic concept. The comprehensive categorization by `what`, `who`, `how`, and `where` is exceptionally helpful for organizing existing literature and identifying gaps, allowing researchers to pinpoint the specific areas where innovation is most needed.
For instance, the observation that `hypernetworks` are becoming a dominant `in-processing control` technique is a significant finding that can guide future algorithmic development.

**Potential Issues/Unverified Assumptions**:

* **Generality of `Definition 1`**: While `Definition 1` is robust, practically mapping any `task requirement triplet` $\mathcal{T}$ to an adapted `learner` $f_{\mathcal{T}}$ without retraining can be profoundly complex (a schematic code-level sketch of this mapping follows this list). The implicit assumption is that the base `learner` $f$ must be pre-trained in a way that allows for such flexibility, which is itself a challenging design problem.
* **Scalability of `Control Functions`**: Many `control functions` (especially those involving `hypernetworks` or `LLMs`) can be computationally intensive themselves. The call for "cost-effective control mechanisms" as a future direction acknowledges this, but the actual computational overhead of $h$ could become a bottleneck, especially for very large models or very rapid adaptation needs.
* **Subjectivity of `Task Targets`**: `Task targets` $s_{\mathrm{tgt}}$ can be highly subjective, especially for users (e.g., "more interesting recommendations"). Translating such vague human desires into precise `task descriptions` $s_{\mathrm{desc}}$ (e.g., a numerical vector) that a `control function` can reliably interpret remains a significant challenge, even with `LLMs`. The paper touches on this under `Setting Task Descriptions`, but it remains a foundational hurdle.
* **True `Zero-Shot` Control**: Ideal `CL` implies adapting to entirely novel `task requirements` $\mathcal{T}'$ at test time without prior exposure during training. While `hypernetworks` and `LLMs` show promise, achieving truly robust `zero-shot` `controllability` for arbitrary new goals remains ambitious and may require more sophisticated meta-learning or compositional `control functions`.
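As a purely schematic, code-level reading of this mapping (referenced in the `Definition 1` bullet above), the sketch below shows a control function that combines frozen per-objective scorers according to a task description, returning an adapted learner with no retraining. The triplet is simplified to a preference vector, and every name is invented for illustration; this is not the survey's formalism, only one possible instantiation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np

@dataclass
class TaskRequirement:
    """Simplified stand-in for a task requirement: only the task description
    s_desc is kept, encoded as a preference vector over objectives."""
    s_desc: np.ndarray  # e.g., weights over accuracy / diversity objectives

# A learner scores items given feature vectors (already trained and frozen).
Learner = Callable[[np.ndarray], np.ndarray]

def control_function(objective_scorers: Sequence[Learner],
                     task: TaskRequirement) -> Learner:
    """h: returns an adapted learner f_T that scalarizes the frozen
    per-objective scorers according to the task description."""
    def f_adapted(x: np.ndarray) -> np.ndarray:
        per_objective = np.stack([g(x) for g in objective_scorers], axis=-1)  # (..., m)
        return per_objective @ task.s_desc                                    # weighted combination
    return f_adapted

# Usage: the same frozen scorers serve different task targets at test time.
acc_scorer = lambda x: x[:, 0]          # toy "accuracy" head
div_scorer = lambda x: x[:, 1]          # toy "diversity" head
f_T = control_function([acc_scorer, div_scorer], TaskRequirement(np.array([0.8, 0.2])))
print(f_T(np.random.rand(5, 2)))        # scores under an accuracy-leaning target
```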
**Transferability and Applications**: The concepts and taxonomy presented are highly transferable. `Controllable learning` is not limited to `information retrieval`; it can be applied to almost any `machine learning` domain where dynamic adaptation and user/platform steering are desired. Examples include:

* **Healthcare**: Controlling diagnostic models for specific patient subgroups or ethical considerations.
* **Finance**: Adapting fraud detection models to evolving attack patterns or specific regulatory compliance requirements.
* **Autonomous Systems**: Allowing human operators to adjust the behavior of `autonomous vehicles` or `robots` in unforeseen scenarios.
* **Scientific Discovery**: Guiding `AI` models to explore specific regions of a chemical or material space based on emergent hypotheses.

This paper serves as an excellent foundational text. Its primary value lies in formalizing and structuring a crucial area of `trustworthy AI`, providing a common language and a roadmap for future research. The identified challenges, particularly around evaluation and theoretical understanding, underscore that `controllable learning` is still in its early stages but holds immense potential for making `AI` systems more robust, adaptable, and aligned with human values.