From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations
TL;DR Summary
This study integrates persona traits into LLMs to analyze their stability and impact on emotional support dialogue quality and strategy, enhancing personalization and empathy in AI-driven emotional support conversations.
Abstract
The rapid advancement of Large Language Models (LLMs) has revolutionized the generation of emotional support conversations (ESC), offering scalable solutions with reduced costs and enhanced data privacy. This paper explores the role of personas in the creation of ESC by LLMs. Our research utilizes established psychological frameworks to measure and infuse persona traits into LLMs, which then generate dialogues in the emotional support scenario. We conduct extensive evaluations to understand the stability of persona traits in dialogues, examining shifts in traits post-generation and their impact on dialogue quality and strategy distribution. Experimental results reveal several notable findings: 1) LLMs can infer core persona traits, 2) subtle shifts in emotionality and extraversion occur, influencing the dialogue dynamics, and 3) the application of persona traits modifies the distribution of emotional support strategies, enhancing the relevance and empathetic quality of the responses. These findings highlight the potential of persona-driven LLMs in crafting more personalized, empathetic, and effective emotional support dialogues, which has significant implications for the future design of AI-driven emotional support systems.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations
1.2. Authors
- Shenghan Wu (National University of Singapore)
- Yimo Zhu (National University of Singapore)
- Wynne Hsu (National University of Singapore)
- Mong-Li Lee (National University of Singapore)
- Yang Deng (Singapore Management University)
1.3. Journal/Conference
This paper is published as a preprint on arXiv (arxiv.org). arXiv is a free open-access archive for scholarly articles in various fields, including computer science, mathematics, and physics. While it hosts preprints that have not yet undergone peer review, many papers later appear in reputable conferences or journals. Its reputation lies in its role as a platform for rapid dissemination of research findings.
1.4. Publication Year
2025 (submitted to arXiv on 2025-02-17 UTC)
1.5. Abstract
The paper investigates the role of personas in generating emotional support conversations (ESC) using Large Language Models (LLMs). It employs established psychological frameworks to measure and infuse persona traits into LLMs for dialogue generation in emotional support scenarios. Extensive evaluations are conducted to assess the stability of persona traits in generated dialogues, trait shifts post-generation, and their impact on dialogue quality and emotional support strategy distribution. Key findings include: 1) LLMs can infer core persona traits, 2) emotionality and extraversion traits exhibit subtle shifts, influencing dialogue dynamics, and 3) persona application modifies emotional support strategy distribution, enhancing relevance and empathetic quality. These findings underscore the potential of persona-driven LLMs in creating more personalized, empathetic, and effective emotional support dialogues, with significant implications for future AI-driven emotional support systems.
1.6. Original Source Link
Official Source: https://arxiv.org/abs/2502.11451
PDF Link: https://arxiv.org/pdf/2502.11451v2.pdf
Publication Status: Preprint (on arXiv).
2. Executive Summary
2.1. Background & Motivation
The rapid advancement of Large Language Models (LLMs) has significantly impacted the generation of emotional support conversations (ESC), offering solutions that are scalable, cost-effective, and enhance data privacy. However, a crucial challenge identified in generative AI applications, particularly in ESC, is the lack of human intuition. Effective emotional support requires considering individual differences, such as personality traits, emotional states, and contextual factors. While prior research has made strides in measuring the personality characteristics of LLMs using psychological inventories, there remains a significant gap in understanding how these persona-related aspects truly influence the generation of emotional support dialogues.
The core problem this paper addresses is to systematically investigate the relationship between LLM-generated emotional support dialogues and persona traits through psychological measurement. This is important because existing ESC generation methods often lack the nuanced personalization and empathy that human interactions provide. Previous methods for creating ESC corpora (e.g., skilled crowdsourcing, therapist session transcription, online Q&A) face limitations like high costs, privacy concerns, and data quality variability. LLMs offer a promising alternative for large-scale data generation, but their outputs can lack human-like individuality. The paper's entry point is to bridge this gap by infusing personas into LLM-driven ESC generation and rigorously evaluating their impact using established psychological frameworks.
2.2. Main Contributions / Findings
The paper makes several primary contributions and key findings that address the aforementioned problems:
- Ability to Infer Core Persona Traits: The research demonstrates that LLMs can infer stable persona traits (like personalities and communication styles) from given personas in emotional support scenarios. Specifically, gpt-4o-mini showed strong correlations between personality and communication styles that align with established psychological theories. This finding validates the foundational premise that LLMs can understand and internalize persona descriptions.
- Persona Consistency with Subtle Shifts: Dialogues generated by LLMs largely retain the original persona traits, but subtle shifts occur, particularly in emotionality (which tends to rise) and extraversion (which tends to fall) for LLM-simulated seekers. These shifts are attributed to the seeker's role in an emotional support dialogue, where they are dealing with emotional issues. This finding highlights the dynamic nature of persona manifestation in interactive contexts.
- Impact on Emotional Support Strategy Distribution: Infusing persona traits into LLM-generated emotional support dialogues significantly modifies the distribution of emotional support strategies. Persona-enhanced dialogues lead supporters to focus more on understanding the seeker's problems through questioning and to offer reassurance and encouragement more gently. Conversely, dialogues without personas show a higher reliance on self-disclosure and direct problem explanation. This enhancement contributes to more relevant and empathetic responses.

These findings collectively address the challenge of infusing human-like intuition and personalization into AI-driven emotional support systems. By showing that personas can be effectively inferred, maintained, and used to shape conversational strategies, the paper paves the way for designing AI models that provide more adaptive, empathetic, and effective emotional support.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand this paper, a reader should be familiar with the following core concepts:
- Emotional Support Conversations (ESC): These are dialogues where one participant, the "supporter," aims to help another participant, the "seeker," alleviate stress, address emotional difficulties, and promote mental well-being. This often involves active listening, empathy, reassurance, and guidance.
- Large Language Models (LLMs): Advanced artificial intelligence models trained on vast amounts of text data. They are capable of understanding, generating, and processing human language for a wide range of tasks, including text generation, translation, summarization, and conversation. Examples include the GPT series (like gpt-4o-mini), the Claude series (like Claude-3.5-Haiku), and the LLaMA series (like LLaMA-3.1-8B-Instruct).
- Personas: In the context of AI and user experience, a persona is a fictional character created to represent a user type or a specific individual with distinct characteristics. In this paper, personas describe individuals with their age, gender, occupation, socio-demographic background, emotional state, and problems, along with their personality traits and communication styles. The goal is to make AI interactions more human-like and personalized.
- Psychological Frameworks (HEXACO and CSI): Standardized tools used in psychology to measure human personality and communication styles.
  - HEXACO Model: A model of human personality structure that organizes personality traits into six broad dimensions or factors. The acronym HEXACO stands for:
    - Honesty-Humility: Sincerity, fairness, greed-avoidance, modesty.
    - Emotionality: Fearfulness, anxiety, dependence, sentimentality.
    - Extraversion: Social self-esteem, social boldness, sociability, liveliness.
    - Agreeableness: Forgiveness, gentleness, flexibility, patience.
    - Conscientiousness: Organization, diligence, perfectionism, prudence.
    - Openness to Experience: Aesthetic appreciation, inquisitiveness, creativity, unconventionality.
  - Communication Styles Inventory (CSI): This framework describes the patterns of verbal and nonverbal behavior individuals use when interacting. The paper uses a six-dimensional model:
    - Expressiveness: The degree to which a person outwardly displays emotions and feelings.
    - Preciseness: The tendency to be clear, specific, and accurate in communication.
    - Verbal Aggressiveness: The extent to which a person uses assertive or combative language.
    - Questioningness: The propensity to ask questions and seek clarification.
    - Emotionality: Similar to the HEXACO dimension, the degree of emotional content and expression in communication.
    - Impression Manipulativeness: The tendency to manage or control the impression one makes on others.
- Pearson Correlation Coefficient ($r$): A statistical measure that quantifies the linear relationship between two sets of data. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship. The formula for the Pearson correlation coefficient is: $ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $ Where:
  - $n$: The number of pairs of data points (e.g., the number of personas for which HEXACO and CSI scores are calculated).
  - $\sum xy$: The sum of the products of the corresponding $x$ and $y$ values.
  - $\sum x$: The sum of all $x$ values.
  - $\sum y$: The sum of all $y$ values.
  - $\sum x^2$: The sum of the squares of all $x$ values.
  - $\sum y^2$: The sum of the squares of all $y$ values.
- Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into values of linearly uncorrelated variables called principal components. PCA is often used for dimensionality reduction, helping to visualize high-dimensional data in 2D or 3D space while retaining as much variance as possible. In this paper, it is used to project personality scores into a 2D space for visualization.
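To make the Pearson coefficient concrete, here is a minimal, self-contained sketch; the score lists are illustrative placeholders, not data from the paper:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

# Hypothetical per-persona scores: Extraversion (HEXACO) vs. Expressiveness (CSI)
extraversion = [3.2, 4.1, 2.5, 3.8, 4.5]
expressiveness = [3.0, 4.3, 2.2, 3.6, 4.8]
r = pearson_r(extraversion, expressiveness)   # close to +1 for these made-up values
```

A strongly positive `r` here would correspond to the Extraversion-Expressiveness correlation the paper expects from psychological theory.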
3.2. Previous Works
The paper contextualizes its work by referencing several prior studies, especially concerning emotional support dialogues and the integration of personas and LLMs.
- Traditional ESC Corpora Development: Early efforts focused on collecting emotional question-answer data from online platforms (Medeiros and Bosse, 2018; Sharma et al., 2020b; Turcan and McKeown, 2019; Garg et al., 2022). These provided insights into user emotions but were limited to single-turn interactions. The Empathetic Dialogue dataset (Rashkin et al., 2019) introduced multi-turn dialogues through crowdsourcing. ESConv (Liu et al., 2021) further advanced this by incorporating emotional support strategies from psychological theories, enabling chatbots to use these strategies. ESConv is a key dataset used in this paper for history generation and strategy analysis. CAMS (Garg et al., 2022) and Dreaddit (Turcan and McKeown, 2019) are QA datasets derived from Reddit posts about mental health, also used in this paper for persona extraction.
- LLM-based ESC Generation: Recent studies leverage the power of LLMs to generate large-scale emotional support dialogue datasets through role-playing, offering advantages like lower costs (Zheng et al., 2023, 2024; Qiu and Lan, 2024; Wu et al., 2024). The ExTES dataset (Zheng et al., 2024) specifically uses LLMs to create dialogues with emotional support strategies.
- Persona in Emotional Support: Recognizing the need for personalization, research has started integrating personas into ESC. The ESC dataset (Zhang et al., 2024) introduced personas into the dialogue generation process. Zhao et al. (2024) proposed a framework to extract personas from existing datasets for evaluation. Personas have also been incorporated into chatbots for generating personalized responses (Tu et al., 2023; Ait Baha et al., 2023; Ma et al., 2024).
- Psychological Analysis of LLMs: The importance of psychological perspectives in AI has been emphasized. Huang et al. (2023b) argue for psychological analysis of LLMs to create more human-like interactions, and studies have made progress in measuring LLM personality using established psychological inventories (Frisch and Giulianelli, 2024; Safdari et al., 2023).
3.3. Technological Evolution
The evolution of emotional support systems has progressed from simple question-answer systems to multi-turn dialogue models, and more recently, to systems that can incorporate sophisticated emotional support strategies. With the advent of LLMs, there's been a shift towards using these powerful models for ESC generation due to their high-quality output and scalability. However, a key evolutionary step, which this paper contributes to, is moving beyond generic LLM generation to persona-driven ESC. This involves understanding and leveraging individual differences, personality traits, and communication styles, as inspired by psychological theories, to create truly personalized and empathetic AI assistants. This paper places its work at the intersection of advanced LLM capabilities and psychological theory, aiming to make AI dialogue generation more aligned with human expectations of nuanced emotional support.
3.4. Differentiation Analysis
Compared to the main methods in related work, this paper's core differences and innovations lie in its rigorous, psychologically-grounded approach to understanding the impact of personas on LLM-generated emotional support conversations.
- Beyond Generation to Deep Analysis: While previous works have used LLMs to generate ESC or incorporated personas into chatbots for personalization, this paper goes a step further by systematically investigating how persona traits are inferred, how consistently they are maintained during generation, and how they specifically influence the distribution of emotional support strategies. This moves beyond simply generating personalized dialogue to deeply analyzing the underlying mechanisms and effects.
- Psychological Measurement Integration: A significant innovation is the direct application of established psychological inventories (HEXACO for personality and CSI for communication style) to measure LLMs' ability to infer and maintain persona traits. This provides a quantitative, theory-backed method for evaluating the human-likeness and consistency of LLM outputs, a gap identified in previous research on LLM personality characteristics.
- Focus on Strategy Distribution: The paper meticulously examines how persona traits alter the adoption of specific emotional support strategies. This granular analysis, including human evaluation and case studies, provides actionable insights into how persona-driven LLMs can be fine-tuned to produce more effective and empathetic support, in contrast to prior works that note only general improvements in personalization.
- Empirical Evidence for Trait Shifts: The observation of subtle shifts in emotionality and extraversion in LLM-simulated seekers during ESC offers a novel insight into the dynamics of LLM behavior in specific conversational contexts. It highlights that persona manifestation is not static but can be influenced by the conversational role.

In essence, this paper differentiates itself by providing a comprehensive, psychologically informed framework for evaluating and understanding the impact of personas on LLM-synthesized ESC, moving beyond anecdotal evidence to rigorous empirical analysis.
4. Methodology
The paper proposes an LLM-based simulation framework to investigate the impact of personas on LLM-synthesized emotional support conversations (ESC). The methodology is structured around three key research questions (RQ1, RQ2, RQ3).
4.1. Principles
The core idea is to leverage the advanced language generation and comprehension capabilities of Large Language Models (LLMs) to simulate emotional support conversations and concurrently use these LLMs to interpret and quantify persona traits based on established psychological frameworks. This allows for a systematic study of how predefined personas are inferred, maintained, and how they influence the emotional support strategies employed in generated dialogues. The theoretical basis relies on the assumption that LLMs can process and manifest human-like personality and communication styles if appropriately prompted and measured.
4.2. Core Methodology In-depth (Layer by Layer)
The methodology addresses three main research questions:
4.2.1. RQ1: Measurement of Persona Traits
This section investigates whether LLMs can infer stable personality and communication style traits from persona cards in an emotional support scenario.
4.2.1.1. Data Collection for Persona Cards
- Source Datasets: The study first collects non-synthetic ESC data from three existing datasets: ESConv (multi-turn dialogue), CAMS (QA from Reddit), and Dreaddit (QA from Reddit).
- Persona Extraction: gpt-4o-mini is used to extract basic persona information (age, gender, occupation, socio-demographic description, and problem) from these datasets. The extraction focuses only on seeker personas because the datasets contain limited personal information about supporters.
- Persona Filtering: After initial extraction, LLMs are prompted to filter personas to ensure they include individual emotions, experienced events, and a clear socio-demographic background. The prompt used for filtering is shown in Figure 14.
  (Image: a chart from the paper, labeled Figure 6, comparing the HEXACO scores of traits extracted from claude-3.5-haiku-generated dialogues, in orange, against the original personas, in green, across the six dimensions.)
  - Explanation: This prompt instructs the LLM to act as an AI assistant that identifies personas lacking sufficient detail. A persona must have a clear socio-demographic background, a description of the emotional problem, and the relevant events experienced; if these criteria are not met, the LLM classifies the persona as "unclear." This ensures the quality and richness of the personas used for trait inference.
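As a rough sketch of how such a filtering prompt might be assembled programmatically; the wording, function name, and persona fields below are illustrative assumptions, not the paper's actual Figure 14 prompt:

```python
def build_filter_prompt(persona: dict) -> str:
    """Assemble an illustrative persona-filtering prompt.

    The criteria mirror the paper's description: a persona is 'unclear' if it
    lacks a socio-demographic background, an emotional problem, or the events
    experienced. The exact wording here is hypothetical.
    """
    return (
        "You are an AI assistant that checks persona quality.\n"
        "Mark the persona 'unclear' if it lacks any of the following: a clear "
        "socio-demographic background, a description of the emotional problem, "
        "or the relevant events experienced. Otherwise mark it 'clear'.\n\n"
        f"Description: {persona['description']}\n"
        f"Problem: {persona['problem']}\n"
        "Answer with exactly one word: clear or unclear."
    )

# Example persona card (invented for illustration)
prompt = build_filter_prompt({
    "description": "34-year-old nurse, recently divorced, living alone",
    "problem": "overwhelmed by night shifts and loneliness after the divorce",
})
```

The returned string would then be sent to the filtering model (gpt-4o-mini in the paper) and the one-word answer parsed from the response.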
4.2.1.2. Trait Assessment using Psychological Inventories
- Inventories Used: The HEXACO model (Ashton and Lee, 2009) for personality and the Communication Styles Inventory (CSI) (De Vries et al., 2013) for communication styles.
  - HEXACO: Measures Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience.
  - CSI: Measures Expressiveness, Preciseness, Verbal Aggressiveness, Questioningness, Emotionality, and Impression Manipulativeness.
- LLM-based Trait Description Generation: LLMs are prompted to generate descriptions for each HEXACO and CSI dimension based on the extracted socio-demographic information. The prompt for generating trait descriptions is shown in Figure 12.
  (Image: a diagram from the paper, labeled Figure 4, showing the pipeline for studying persona consistency in LLM-simulated emotional support dialogues: original persona input, inventory completion by a persona simulator, persona extraction, dialogue synthesis, and comparison of the resulting scores.)
  - Explanation: This prompt instructs the LLM to generate a short sentence (around 20 words) describing how a person, given their socio-demographic description, would typically rate on a specific HEXACO or CSI dimension (e.g., Extraversion, Preciseness). This step operationalizes how LLMs interpret persona information into trait manifestations.
- LLM-based Inventory Completion: LLMs are then prompted to predict answers to the HEXACO and CSI inventories using the generated persona card and descriptions. The prompt for answering inventories is shown in Figure 13.
  (Image: a chart from the paper, labeled Figure 5, comparing the distribution of original persona traits with traits extracted from gpt-4o-mini-generated dialogues across the six HEXACO dimensions: Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience.)
  - Explanation: This prompt guides the LLM to act as an AI assistant, evaluating a persona card and then answering HEXACO or CSI inventory questions. For each question, the LLM must provide a numerical rating from 1 to 5, indicating how well the statement describes the person in the persona card. This process quantifies the persona's traits.
- Correlation Calculation: Based on the LLM's responses to the inventories, HEXACO and CSI dimension scores are calculated for each persona. Pearson correlation is then used to analyze the relationships between each dimension of HEXACO and CSI. The formula for the Pearson correlation coefficient ($r$) is: $ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $ Where:
  - $n$: The total number of persona instances for which both HEXACO and CSI scores are obtained.
  - $x$: The score for a specific HEXACO dimension (e.g., Extraversion).
  - $y$: The score for a specific CSI dimension (e.g., Expressiveness).
  - $\sum xy$: The sum of the products of corresponding HEXACO and CSI dimension scores across all personas.
  - $\sum x$: The sum of all HEXACO dimension scores.
  - $\sum y$: The sum of all CSI dimension scores.
  - $\sum x^2$: The sum of the squares of all HEXACO dimension scores.
  - $\sum y^2$: The sum of the squares of all CSI dimension scores.

  This coefficient measures the linear relationship between the two trait dimensions.
- Expected Correlations: The paper references established psychological theory (De Vries et al., 2013) positing strong correlations between specific HEXACO and CSI dimensions:
  - Extraversion ↔ Expressiveness
  - Conscientiousness ↔ Preciseness
  - Agreeableness ↔ Verbal Aggressiveness (expected to be an inverse correlation)
  - Openness to Experience ↔ Questioningness
  - Emotionality ↔ Emotionality
  - Honesty-Humility ↔ Impression Manipulativeness (expected to be an inverse correlation)
- LLMs Used: gpt-4o-mini, Claude-3.5-Haiku, and LLaMA-3.1-8B-Instruct are used, with temperature set to 0 for stable results.
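The inventory-scoring step described above might be implemented along these lines; the item groupings and ratings are hypothetical, and real HEXACO/CSI administration also reverse-keys some items, which this sketch omits:

```python
from statistics import mean

def dimension_score(item_ratings):
    """Score one inventory dimension as the mean of its 1-5 item ratings.

    Simplified sketch: real inventories reverse-key some items before averaging.
    """
    return mean(item_ratings)

# Hypothetical LLM-predicted item ratings for one persona card
hexaco_items = {
    "Extraversion": [4, 3, 4, 5],
    "Emotionality": [2, 3, 2, 2],
}
csi_items = {
    "Expressiveness": [4, 4, 3, 5],
    "Emotionality": [2, 2, 3, 2],
}

hexaco_scores = {dim: dimension_score(r) for dim, r in hexaco_items.items()}
csi_scores = {dim: dimension_score(r) for dim, r in csi_items.items()}
```

Collecting these per-persona dimension scores across all personas yields the paired score lists over which the Pearson correlations between HEXACO and CSI dimensions are computed.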
4.2.1.3. Process Diagram for Measurement of Persona Traits
The following figure (Figure 3 from the original paper) illustrates the overall process for measuring persona traits:
(Image: an excerpt from the paper, labeled Figure 15, showing the prompt text used to extend persona profiles from Persona Hub, guiding the generation of an individual's socio-demographic description.)

- Explanation: The diagram shows persona data and the HEXACO and CSI inventories being processed by an LLM to generate scores. The correlations between personality and communication style dimensions are then calculated and evaluated against psychological theories. This confirms whether LLMs can infer stable persona traits.
4.2.2. RQ2: Persona Consistency in LLM-simulated Emotional Support Dialogues
This section evaluates whether persona traits remain consistent throughout the LLM-based dialogue generation process.
4.2.2.1. Experimental Setup for Consistency
- Persona Selection: 1,000 randomly selected personas from PersonaHub (Chan et al., 2024) are used to ensure diverse characteristics.
- Persona Enhancement: These personas, initially simple descriptions, are systematically enhanced using LLMs. This involves adding socio-demographic details (age, gender, occupation) and specific trait-indicative statements aligned with the HEXACO and CSI dimensions. The prompt for extending personas is shown in Figure 15.
  (Image: a chart from the paper, labeled Figure 7, comparing the HEXACO scores of original persona traits with traits extracted from LLaMA-3.1-8B-Instruct-generated dialogues, showing distributional differences across dimensions.)
  - Explanation: This prompt instructs the LLM to act as a persona generator, taking a basic persona description (e.g., "A fearless and highly trained athlete...") and elaborating it into a detailed socio-demographic description. This ensures the personas have sufficient depth for trait assessment and dialogue generation.
- Quantification of Original Personas: HEXACO and CSI dimension scores are generated for these enhanced personas using the same methodology as described in Section 4.2.1.
- Dialogue Generation: The enriched personas are used to generate emotional support dialogues in which each persona acts as a seeker in a contextually relevant scenario (e.g., an athlete discussing an injury). The prompt for generating emotional support dialogues is shown in Figure 17.
  (Image: a diagram from the paper, labeled Figure 9, illustrating the pipeline for studying the impact of personas on LLM-generated emotional support dialogues, comparing generation with and without personas and the resulting differences in strategy distribution.)
  - Explanation: This prompt defines the role of the AI assistant as a seeker and instructs it to engage in an emotional support conversation based on a provided persona (socio-demographic description, HEXACO scores, CSI scores, and problem statement). It also provides definitions of the emotional support strategies available to the supporter. The LLM is asked to simulate the seeker's responses, ensuring they align with the persona.
- Persona Extraction from Generated Dialogues: After dialogue generation, the methodology from Section 4.2.1 (using gpt-4o-mini to extract persona characteristics) is applied to the generated conversations.
- Score Calculation from Extracted Personas: HEXACO and CSI scores are then calculated from these newly extracted personas.
- Consistency Assessment: The extracted scores are compared with the original scores assigned to the input personas to assess the consistency of trait representation after the dialogue generation process.
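A minimal sketch of the consistency comparison, assuming per-persona dimension scores are available as dictionaries; all numbers are illustrative, not the paper's measurements:

```python
def trait_shift(original, extracted):
    """Mean per-dimension shift (extracted - original) across personas."""
    dims = original[0].keys()
    n = len(original)
    return {d: sum(e[d] - o[d] for o, e in zip(original, extracted)) / n
            for d in dims}

# Invented scores for two personas, before and after dialogue generation
original = [{"Emotionality": 3.0, "Extraversion": 4.0},
            {"Emotionality": 2.5, "Extraversion": 3.5}]
extracted = [{"Emotionality": 3.6, "Extraversion": 3.6},
             {"Emotionality": 3.1, "Extraversion": 3.1}]

shifts = trait_shift(original, extracted)
# A positive Emotionality shift and a negative Extraversion shift would mirror
# the paper's observation for LLM-simulated seekers.
```

In practice one would also test whether such shifts are statistically significant rather than relying on the raw mean difference alone.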
4.2.2.2. Process Diagram for Persona Consistency
The following figure (Figure 4 from the original paper) illustrates the process for studying persona consistency in LLM-simulated ESC:
(Image: a prompt text from the paper for synthesizing persona-infused dialogues, describing the task of simulating an emotional support conversation guided by HEXACO personality and CSI communication-style scores, and continuing a previous dialogue three days later.)

- Explanation: The diagram shows an original persona fed into a persona simulator to generate HEXACO and CSI scores. The persona is then used in dialogue synthesis as a seeker. From the synthesized dialogue, a persona is extracted and its HEXACO and CSI scores are calculated. Finally, the extracted persona scores are compared to the original persona scores to determine consistency.
4.2.2.3. Ablation Study for Persona Influence
- Objective: To understand how the LLM's inherent personality traits influence dialogue generation by comparing dialogues generated with and without predefined personas.
- Method: Implicit personas are extracted from both sets of conversations, and their corresponding personality scores are calculated.
- Visualization: Principal Component Analysis (PCA) is used to project these scores into a 2D space for visualization.
  - Conceptual Definition: PCA is a technique that transforms high-dimensional data into a lower-dimensional space while preserving the most important information (variance). It finds orthogonal principal components that capture the maximum variance in the data. In this context, it reduces the multi-dimensional personality scores to two dimensions, allowing a visual comparison of their distributions.
  - Purpose: Comparing the distributions (e.g., concentrated vs. broad) reveals the impact of persona injection on the range of personality traits expressed in the dialogues.
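The PCA projection can be sketched with a plain SVD; the score matrices below are invented placeholders, not the paper's data:

```python
import numpy as np

def pca_2d(scores):
    """Project an n x d personality-score matrix onto its first two
    principal components via SVD of the centered data."""
    X = np.asarray(scores, dtype=float)
    X = X - X.mean(axis=0)                      # center each dimension
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                         # n x 2 projection

# Hypothetical 6-D HEXACO scores for implicit personas extracted from
# dialogues generated with and without a predefined persona.
with_persona = [[3.1, 3.9, 2.8, 4.0, 3.5, 3.7],
                [2.9, 4.1, 2.6, 3.8, 3.6, 3.9]]
without_persona = [[3.5, 3.5, 3.5, 3.5, 3.5, 3.5],
                   [3.6, 3.4, 3.6, 3.4, 3.6, 3.4]]

proj = pca_2d(with_persona + without_persona)   # rows 0-1: with, rows 2-3: without
```

Plotting the two columns of `proj`, colored by condition, would reproduce the kind of 2D distribution comparison the ablation study uses.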
4.2.3. RQ3: Impact of Persona on LLM-simulated Emotional Support Dialogues
This section investigates how infusing persona traits affects the distribution of emotional support strategies in LLM-simulated ESC.
4.2.3.1. Experimental Setup for Strategy Impact
- Dialogue Continuation: The study uses ESConv dialogues as historical context. LLMs are then instructed to predict how a new conversation would unfold, generating two versions of continuations: one with persona traits and one without.
- Persona-Injected Dialogue Generation: The prompt for synthesizing dialogues with persona traits is shown in Figure 16.
  (Image: a chart from the paper showing the 2D distribution of personality scores for dialogues with and without persona injection, with dots and triangles distinguishing the two conditions and revealing distributional differences.)
  - Explanation: This prompt instructs the LLM to act as a seeker and generate dialogue continuing from a provided dialogue history. The LLM is explicitly given a persona including a socio-demographic description, HEXACO scores, CSI scores, and a problem statement. The LLM must ensure its responses are consistent with this persona and continue the conversation as if it were three days later.
- Non-Persona Dialogue Generation: The prompt for synthesizing dialogues without persona traits is shown in Figure 18.
  (Image: a side-by-side example table from the paper contrasting emotional support dialogues with and without persona labels, highlighting differences in the supporter's responses under different strategies, with color coding for emotional support, direct advice, and guided reflection.)
  - Explanation: Similar to the persona-injected prompt, this prompt asks the LLM to continue a dialogue history as a seeker. However, it explicitly states that the LLM should not consider any persona traits (personality or communication style) but should still focus on solving the seeker's problem and maintaining emotional support strategies. This serves as the control condition for comparing strategy distributions.
- Emotional Support Strategies: The study uses the emotional support strategies defined in the ESConv dataset (Liu et al., 2021); the definitions are provided in Appendix G of the paper:
  - Question: Asking open-ended or specific questions to help the seeker articulate issues and provide clarity.
  - Restatement or Paraphrasing: Concisely rephrasing the seeker's statements to aid self-understanding.
  - Reflection of Feelings: Expressing and clarifying the seeker's emotions to acknowledge their feelings.
  - Self-disclosure: Sharing similar experiences or emotions to build empathy and connection.
  - Affirmation and Reassurance: Affirming the seeker's strengths, motivation, and capabilities, and providing encouragement.
  - Providing Suggestions: Offering possible ways forward while respecting the seeker's autonomy.
  - Information: Providing useful information to the seeker.
  - Others: Exchanging pleasantries or using strategies beyond the defined categories.
- Strategy Distribution Analysis: The generated dialogues (both with and without personas) are analyzed to quantify the frequency of each emotional support strategy employed by the supporter. This comparison reveals how personas influence strategic choices.
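A minimal sketch of such a frequency count, assuming each generated turn is annotated with a role and a strategy label; the data structure is an assumption for illustration, not the paper's format:

```python
from collections import Counter

def strategy_distribution(dialogues):
    """Relative frequency of each supporter strategy across dialogues."""
    counts = Counter(turn["strategy"]
                     for dialogue in dialogues
                     for turn in dialogue
                     if turn["role"] == "supporter")
    total = sum(counts.values())
    return {strategy: c / total for strategy, c in counts.items()}

# Hypothetical annotated dialogue: one seeker turn, two supporter turns
dialogues_with_persona = [[
    {"role": "supporter", "strategy": "Question"},
    {"role": "seeker", "strategy": None},
    {"role": "supporter", "strategy": "Affirmation and Reassurance"},
]]

dist = strategy_distribution(dialogues_with_persona)
```

Running the same count over the with-persona and without-persona continuations and comparing the two distributions is what reveals the strategy shifts the paper reports (e.g., more Question, less Self-disclosure under personas).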
4.2.3.2. Human Evaluation
- Objective: To intuitively reveal the impact of personas on LLM-simulated ESC through subjective assessment.
- Evaluators: 10 native English-speaking undergraduate students, experienced in annotation tasks, are recruited.
- Task: Evaluators compare 50 randomly selected groups of instances (dialogues generated with and without personas). For each pair, they rate which dialogue performs better (or whether the two are tied) on five indicators.
- Evaluation Indicators:
  - Suggestion: How effectively the supporter provided helpful advice.
  - Consistency: Whether participants consistently maintained their roles and exhibited coherent behavior.
  - Comforting: The supporter's ability to provide emotional support to the seeker.
  - Identification: Which supporter delved deeper into the seeker's situation and was more effective in identifying issues.
  - Overall: An assessment of the overall performance comparing the two dialogue groups.
4.2.3.3. Case Study
A detailed example (Figure 10, Figure 22, Figure 23) is provided to qualitatively illustrate the differences in conversational dynamics and strategy usage when personas are injected. This highlights how persona-guided supporters might use rhetorical questions to encourage reflection and offer suggestions more tactfully, leading to deeper conversations.
The following figure (Figure 9 from the original paper) illustrates the overall process for studying the impact of persona on LLM-simulated ESC:
The image is a chart comparing the persona traits (CSI scores) extracted from dialogues generated by claude-3.5-haiku (from Figure 20) against the original personas, showing score distributions across multiple persona dimensions.
- Explanation: The diagram starts with an original dialogue from ESConv. This history is used to prompt LLMs to generate dialogue continuations under two conditions: with persona (where persona traits are injected) and without persona. The generated dialogues are then analyzed for strategy distribution and subjected to human evaluation to assess the impact of persona.
5. Experimental Setup
5.1. Datasets
The study utilizes several datasets for different phases of its investigation:
- For Persona Extraction (RQ1):
  - ESConv (Liu et al., 2021): A multi-turn emotional support dialogue dataset. This dataset is crowdsourced and focuses on conversations where supporters provide emotional strategies.
  - CAMS (Garg et al., 2022): A Question-Answering (QA) dataset derived from Reddit posts discussing mental health issues.
  - Dreaddit (Turcan and McKeown, 2019): Another QA dataset from Reddit, focused on stress analysis in social media.

  The paper extracted seeker personas from these datasets. The detailed statistics of the extracted persona cards are summarized in Table 1. The following are the results from Table 1 of the original paper:

  | Dataset | ESConv | CAMS | Dreaddit |
  | Data Type | dialogue | QA | QA |
  | Num. of personas | 1,155 | 1,140 | 730 |
  | Avg. words of desc. | 57.38 | 66.42 | 56.68 |
  | Avg. words of prob. | 33.92 | 32.02 | 27.75 |
  | Num. of age | 901 | 1,014 | 459 |
  | Num. of gender | 417 | 401 | 300 |
  | Num. of occupation | 926 | 968 | 542 |

  - Explanation: This table provides a quantitative overview of the personas extracted from each dataset. It shows the number of personas, the average word count for their descriptions and problems, and the count of personas for which age, gender, and occupation information were available. This indicates the scale and descriptive richness of the persona data used.
- For Persona Consistency (RQ2):
  - PersonaHub (Chan et al., 2024): A large collection of synthetic personas. 1,000 randomly selected personas from PersonaHub were used. These personas were initially simple descriptions and were further enhanced by LLMs to include socio-demographic details and trait-indicative statements.
- For Impact of Persona on Strategies (RQ3):
  - ESConv (Liu et al., 2021): Used as the source of dialogue history to ensure a realistic context for generating future conversations. This allows LLMs to continue existing emotional support scenarios with and without persona traits.
5.2. Evaluation Metrics
The evaluation in this paper spans quantitative statistical measures and qualitative human assessments.
- Pearson Correlation Coefficient ($r$) (for RQ1):
  - Conceptual Definition: Quantifies the linear statistical relationship between two variables, specifically between HEXACO personality dimensions and CSI communication style dimensions. A higher absolute value indicates a stronger linear relationship.
  - Mathematical Formula: $ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $
  - Symbol Explanation:
    - $n$: Number of paired observations (number of personas).
    - $x$: Score of a HEXACO dimension for a persona.
    - $y$: Score of a CSI dimension for the same persona.
    - $\sum$: Summation operator.
    - $\sum xy$: Sum of the products of each pair of $(x, y)$ scores.
    - $\sum x$: Sum of all $x$ scores.
    - $\sum y$: Sum of all $y$ scores.
    - $\sum x^2$: Sum of the squared $x$ scores.
    - $\sum y^2$: Sum of the squared $y$ scores.
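As a sketch, the raw-score formula above can be computed directly. The score lists here are hypothetical per-persona values, not data from the paper:

```python
import math

def pearson_r(x, y):
    """Pearson correlation using the raw-score formula from the text."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den

# Hypothetical Extraversion (HEXACO) and Expressiveness (CSI) scores per persona
extraversion = [3.2, 4.1, 2.5, 3.8, 4.6]
expressiveness = [3.0, 4.3, 2.2, 3.5, 4.8]
print(round(pearson_r(extraversion, expressiveness), 2))  # → 0.99 (strong positive)
```

In practice a library routine such as `scipy.stats.pearsonr` would also return the p-values the paper reports alongside each coefficient.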
- Distribution Comparison (for RQ2):
  - Conceptual Definition: Visual comparison using violin plots and 2D PCA projections to assess how the distribution of HEXACO and CSI scores changes between original personas and personas extracted from LLM-generated dialogues, as well as between dialogues generated with and without persona injection.
  - Explanation: This metric visually highlights shifts in the mean, median, spread, and density of trait scores, indicating the stability or changes in persona representation. For the ablation study, PCA helps visualize whether persona injection leads to a broader or more concentrated range of manifested personality traits.
- Strategy Distribution (for RQ3):
  - Conceptual Definition: Measures the percentage of turns or utterances in which each predefined emotional support strategy is used by the supporter in LLM-generated dialogues.
  - Explanation: This quantitative measure reveals how the presence or absence of persona traits influences the LLM's choice of emotional support strategies, showing whether certain strategies become more or less prominent.
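A minimal sketch of how such a distribution can be tallied, assuming each generated supporter turn carries a strategy label (the turn data and field names here are hypothetical):

```python
from collections import Counter

def strategy_distribution(turns):
    """Percentage of supporter turns using each strategy label."""
    counts = Counter(t["strategy"] for t in turns if t["role"] == "supporter")
    total = sum(counts.values())
    return {s: round(100 * c / total, 2) for s, c in counts.items()}

# Hypothetical labeled turns from one generated dialogue
turns = [
    {"role": "seeker", "strategy": None},
    {"role": "supporter", "strategy": "question"},
    {"role": "seeker", "strategy": None},
    {"role": "supporter", "strategy": "affirmation and reassurance"},
    {"role": "supporter", "strategy": "question"},
    {"role": "supporter", "strategy": "providing suggestions"},
]
print(strategy_distribution(turns))
# → {'question': 50.0, 'affirmation and reassurance': 25.0, 'providing suggestions': 25.0}
```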
- Human Evaluation Indicators (for RQ3):
  - Conceptual Definition: Subjective ratings provided by human annotators comparing pairs of dialogues (with vs. without persona) based on specific quality criteria.
  - Indicators:
    - Suggestion: Evaluates the helpfulness and quality of advice given by the supporter.
    - Consistency: Assesses whether the participants (seeker and supporter) maintain their roles and exhibit coherent behavior throughout the dialogue.
    - Comforting: Measures the supporter's effectiveness in providing emotional support to the seeker.
    - Identification: Determines how well the supporter understood the seeker's situation and identified underlying issues.
    - Overall: A holistic assessment of which dialogue (or whether they are tied) performed better across all aspects.
  - Explanation: These metrics capture the qualitative aspects of dialogue quality and empathy, which are crucial for emotional support conversations and might not be fully captured by automated metrics. Annotators vote "Win" (the persona dialogue outperforms), "Loss" (the non-persona dialogue outperforms), or "Tie".
5.3. Baselines
The study's experimental design involves comparisons rather than direct baselines in the traditional sense of competing models.
- For RQ1 (Measurement of Persona Traits): The LLMs' ability to infer persona traits is evaluated by comparing the calculated Pearson correlations between HEXACO and CSI dimensions against established psychological theory (De Vries et al., 2013). The "baseline" here is the human psychological understanding of these trait relationships. The different LLMs (gpt-4o-mini, Claude-3.5-Haiku, LLaMA-3.1-8B-Instruct) are compared against each other on this task.
- For RQ2 (Persona Consistency): Consistency is assessed by comparing the HEXACO and CSI scores of the original input personas against the scores of personas extracted from LLM-generated dialogues. The "baseline" is the original persona itself. The ablation study within this section uses dialogues generated without persona injection as a baseline to understand the inherent traits of the LLMs and the impact of explicit persona guidance.
- For RQ3 (Impact of Persona on Strategies): The impact is measured by comparing dialogues generated with persona traits against dialogues generated without persona traits. The "baseline" is the LLM's output when no explicit persona information is provided, allowing an isolated assessment of the persona's influence on strategy distribution and dialogue quality.

The LLMs themselves (gpt-4o-mini, Claude-3.5-Haiku, LLaMA-3.1-8B-Instruct) are the agents generating the dialogues and making the assessments, and their performance across these tasks is a key part of the analysis.
6. Results & Analysis
6.1. Core Results Analysis
6.1.1. RQ1: LLMs Can Infer Core Persona Traits
The experimental results demonstrate that LLMs can infer stable persona traits from persona cards in emotional support scenarios, though with varying degrees of accuracy across different models.
The following are the results from Table 2 of the original paper:
| | Expr. | Prec. | Verb. | Ques. | Emot. | Impr. |
| Extr. | .54 | .15 | -.21 | .36 | -.39 | .04 |
| Cons. | .34 | .34 | -.11 | .16 | -.36 | .02 |
| Agre. | .21 | .15 | -.39 | .12 | -.19 | -.05 |
| Open. | .41 | .25 | -.23 | .47 | -.09 | .09 |
| Emot. | -.32 | -.11 | -.04 | -.21 | .45 | -.10 |
| Hone. | -.24 | -.01 | -.17 | -.27 | .05 | -.18 |
- Explanation: This table displays the Pearson correlations between HEXACO dimensions (rows) and CSI dimensions (columns) as measured by gpt-4o-mini on the ESConv dataset. All reported p-values are less than 0.01, indicating statistical significance. A key observation is the strong positive correlation (0.54) between Extraversion and Expressiveness, and a positive correlation (0.47) between Openness and Questioningness. There is also a clear positive correlation (0.45) between Emotionality (HEXACO) and Emotionality (CSI), and a negative correlation (-0.39) between Agreeableness and Verbal Aggressiveness. These align well with established psychological theories.

The following are the results from Table 3 of the original paper:
| | Expr. | Prec. | Verb. | Ques. | Emot. | Impr. |
| Extr. | .63 | .01 | -.21 | .50 | .07 | -.11 |
| Cons. | .15 | .48 | -.22 | .04 | -.18 | -.19 |
| Agre. | .13 | .14 | -.59 | -.02 | .06 | -.39 |
| Open. | .39 | .01 | -.28 | .46 | .21 | -.11 |
| Emot. | -.24 | -.18 | .00 | -.22 | .32 | -.06 |
| Hone. | .06 | .28 | -.42 | .00 | .05 | -.37 |

- Explanation: This table shows the correlations measured by Claude-3.5-Haiku. While it also shows a strong correlation between Extraversion and Expressiveness (0.63) and between Openness and Questioningness (0.46), it exhibits some discrepancies. For example, Questioningness correlates only weakly with Conscientiousness (0.04) yet strongly with Extraversion (0.50), deviating from the theoretical expectation that Openness is the primary correlate of Questioningness.

The following are the results from Table 4 of the original paper:
| | Expr. | Prec. | Verb. | Ques. | Emot. | Impr. |
| Extr. | .28 | .26 | -.21 | .15 | -.33 | -.11 |
| Cons. | .13 | .55 | -.16 | -.01 | -.42 | -.04 |
| Agre. | -.02 | -.13 | -.19 | .03 | .10 | -.05 |
| Open. | .07 | -.12 | -.15 | .32 | .08 | -.10 |
| Emot. | -.18 | -.32 | .06 | -.02 | .48 | .01 |
| Hone. | -.08 | -.11 | -.20 | .06 | .07 | -.13 |

- Explanation: This table presents the correlations from LLaMA-3.1-8B-Instruct. This model shows stronger inconsistencies: for example, it incorrectly associates Verbal Aggressiveness (weakly negative, -0.21) with Extraversion and Conscientiousness, and links Preciseness (0.55) with Conscientiousness. The correlation between Extraversion and Expressiveness (0.28) is also weaker than gpt-4o-mini's. This suggests that LLaMA-3.1-8B-Instruct has more limitations in accurately interpreting certain persona traits.

Overall, gpt-4o-mini consistently exhibits the strongest correlations aligning with psychological theory, indicating its superior ability to infer stable persona traits. Similar findings for the CAMS and Dreaddit datasets are reported in Appendix C, reinforcing these observations across data sources.
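One way to see why gpt-4o-mini is judged closest to theory is to check a model's correlation matrix against the signs that psychological theory predicts for the key HEXACO-CSI pairs. A small sketch, where the expected signs follow the relationships summarized in the text (De Vries et al., 2013) and the values are taken from Table 2:

```python
# Theory-expected signs for selected HEXACO (row) x CSI (column) pairs
expected_signs = {
    ("Extr.", "Expr."): +1,  # Extraversion ~ Expressiveness
    ("Open.", "Ques."): +1,  # Openness ~ Questioningness
    ("Emot.", "Emot."): +1,  # Emotionality ~ Emotionality
    ("Agre.", "Verb."): -1,  # Agreeableness ~ Verbal Aggressiveness
}

# Correlations reported in Table 2 (gpt-4o-mini on ESConv)
gpt4o_mini = {
    ("Extr.", "Expr."): 0.54,
    ("Open.", "Ques."): 0.47,
    ("Emot.", "Emot."): 0.45,
    ("Agre.", "Verb."): -0.39,
}

def theory_agreement(corrs, expected):
    """Fraction of key pairs whose correlation sign matches theory."""
    hits = sum(1 for pair, sign in expected.items() if corrs[pair] * sign > 0)
    return hits / len(expected)

print(theory_agreement(gpt4o_mini, expected_signs))  # → 1.0
```

The same check applied to the Claude-3.5-Haiku and LLaMA-3.1-8B-Instruct matrices would surface the discrepancies noted above.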
6.1.2. RQ2: Persona Consistency in LLM-simulated Emotional Support Dialogues
The analysis of persona consistency reveals that while core persona traits are largely maintained during LLM-based dialogue generation, subtle shifts occur, particularly in emotionality and extraversion.
The following figure (Figure 5 from the original paper) displays a comparison of HEXACO scores:
The image is Figure 17, a prompt-template box for generating emotional support dialogues from a given persona, including dialogue simulation instructions, emotional support strategy definitions, and a socio-demographic description template.
- Explanation: This violin plot compares the distribution of HEXACO scores for original personas (green) and personas extracted from dialogues generated by gpt-4o-mini (orange). Honesty-Humility, Agreeableness, Conscientiousness, and Openness largely maintain similar distributions. However, Emotionality tends to be higher and Extraversion lower in extracted personas, indicating a shift in these traits during dialogue simulation. This suggests that the context of seeking emotional support influences how these traits manifest.

The following figure (Figure 6 from the original paper) displays a comparison of HEXACO scores:
The image is Figure 18, the prompt template used to synthesize dialogues without persona traits, covering the scenario setup, strategy requirements for the supporter's replies, and the dialogue-history format.
- Explanation: This violin plot shows the HEXACO score comparison for dialogues generated by Claude-3.5-Haiku. Similar to gpt-4o-mini, Emotionality is generally higher and Extraversion lower in the extracted personas compared to the originals, while other traits remain relatively consistent.

The following figure (Figure 7 from the original paper) displays a comparison of HEXACO scores:
The image is a violin plot comparing the distributions of original and extracted persona traits in emotional support dialogues, across expressiveness, preciseness, verbal aggressiveness, questioningness, emotionality, and impression manipulativeness.
Explanation: This violin plot illustrates the
HEXACOscore comparison forLLaMA-3.1-8B-Instruct. The trend of increasedEmotionalityand decreasedExtraversionin extractedpersonasis also observed here, confirming the pattern across differentLLMs.These findings indicate that
LLM-simulatedseekerstend to exhibit moreemotionalityand lowerextraversionpost-generation. This is interpreted as a natural consequence of theseeker's role inemotional support dialogues, where they are actively dealing with emotional issues and are less likely to be outgoing. A similar pattern is observed forCSIscores (Tables 19, 20, and 21 in Appendix E), supporting these conclusions aboutpersona consistencywith specific shifts.
6.1.3. Ablation Study
The following figure (Figure 8 from the original paper) shows the distribution of personality scores:
The image is Figure 19 from the paper, comparing the CSI scores of the original persona traits with those extracted from dialogues generated by gpt-4o-mini; violin plots show the score distributions and differences across seven trait dimensions.
- Explanation: This 2D PCA projection visualizes the distribution of personality scores obtained from the dialogues. The red circles represent dialogues generated without persona injection, showing a more concentrated distribution. The blue triangles represent dialogues generated with persona injection, covering a broader range of personality traits. This demonstrates that explicit persona injection guides and shapes the dialogue generation process, yielding a wider diversity of expressed personalities than the LLM's inherent default personality.
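The 2D projection itself is standard PCA. A numpy-only sketch of the idea, using random stand-in score matrices rather than the paper's data (the narrow vs. broad spreads mimic the "default" vs. persona-injected conditions):

```python
import numpy as np

def pca_2d(scores):
    """Project an (n_samples, n_traits) score matrix onto its top-2 principal components."""
    centered = scores - scores.mean(axis=0)
    # SVD of the centered data gives principal directions as rows of vt
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
# Stand-in trait score matrices: 100 dialogues x 6 HEXACO dimensions each
without_persona = rng.normal(3.0, 0.3, size=(100, 6))  # narrow "default" spread
with_persona = rng.normal(3.0, 0.9, size=(100, 6))     # broader, persona-driven spread

proj = pca_2d(np.vstack([without_persona, with_persona]))
# The persona-injected half spreads more widely in the projected space
print(proj[:100].std(), proj[100:].std())
```

Plotting `proj[:100]` as circles and `proj[100:]` as triangles would reproduce the qualitative pattern of Figure 8.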
6.1.4. RQ3: Impact of Persona on LLM-simulated Emotional Support Dialogues
The application of persona traits significantly modifies the distribution of emotional support strategies used by LLM-simulated supporters.
The following are the results from Table 5 of the original paper:
| Strategy | w/PT | w/o PT |
| question | 27.23% | 16.45% |
| restatement or paraph. | 3.61% | 10.57% |
| reflection of feelings | 11.75% | 11.33% |
| self-disclosure | 2.64% | 10.64% |
| affirmation and reass. | 29.72% | 21.06% |
| providing suggestions | 16.92% | 14.25% |
| information | 0.78% | 4.85% |
| others | 8.45% | 10.88% |
- Explanation: This table compares the strategy distribution for gpt-4o-mini. Dialogues generated with persona traits (w/PT) show a notably higher percentage of questioning (27.23% vs. 16.45%) and affirmation and reassurance (29.72% vs. 21.06%). Conversely, dialogues without persona traits (w/o PT) have higher rates of restatement or paraphrasing, self-disclosure, and information. This suggests personas encourage more proactive and empathetic strategies.

The following are the results from Table 6 of the original paper:
| Strategy | w/PT | w/o PT |
| question | 33.10% | 27.31% |
| restatement or paraph. | 0.69% | 0.96% |
| reflection of feelings | 21.19% | 18.15% |
| self-disclosure | 2.22% | 13.97% |
| affirmation and reass. | 19.32% | 17.41% |
| providing suggestions | 19.01% | 15.41% |
| information | 4.45% | 6.77% |
| others | 0.02% | 0.02% |

- Explanation: This table presents the strategy distribution for Claude-3.5-Haiku. Similar to gpt-4o-mini, w/PT dialogues show higher percentages of questioning (33.10% vs. 27.31%) and reflection of feelings (21.19% vs. 18.15%), while self-disclosure (2.22% vs. 13.97%) is significantly reduced with personas.

The following are the results from Table 7 of the original paper:
| Strategy | w/PT | w/o PT |
| question | 12.70% | 12.34% |
| restatement or paraph. | 6.51% | 7.56% |
| reflection of feelings | 18.44% | 18.06% |
| self-disclosure | 7.69% | 9.86% |
| affirmation and reass. | 21.42% | 18.91% |
| providing suggestions | 13.33% | 13.40% |
| information | 2.48% | 4.25% |
| others | 17.43% | 15.62% |

- Explanation: This table shows the strategy distribution for LLaMA-3.1-8B-Instruct. While the differences are less pronounced than for the other LLMs, w/PT dialogues still exhibit slightly higher affirmation and reassurance and lower self-disclosure and information, consistent with the overall trend.

The following are the results from Table 8 of the original paper:
| Strategy | HEXACO | CSI |
| question | 27.83% | 27.23% |
| restatement or paraph. | 3.72% | 3.61% |
| reflection of feelings | 12.48% | 11.75% |
| self-disclosure | 3.41% | 2.64% |
| affirmation and reass. | 28.96% | 29.72% |
| providing suggestions | 16.44% | 16.92% |
| information | 0.50% | 0.78% |
| others | 6.66% | 8.45% |

- Explanation: This table compares strategy distributions when personas are defined using HEXACO scores versus CSI scores. The distributions are remarkably similar, reinforcing the strong correlation between these two trait measures demonstrated in RQ1. This implies that defining personas through either personality or communication style scores leads to consistent effects on strategy usage.

The analysis indicates that persona-enhanced dialogues lead supporters to engage in more questioning and provide more affirmation and reassurance, focusing on deeper understanding. Conversely, supporters without persona traits tend to explain problems more and rely on self-disclosure, which prior research suggests can be less effective in emotional support if not carefully managed.
6.1.5. Human Evaluation
The following are the results from Table 9 of the original paper:
| w/ vs. w/o PT | Win | Tie | Loss |
| Suggestion | 38% | 27% | 35% |
| Consistency | 27% | 54% | 19% |
| Comforting | 38% | 28% | 34% |
| Identification | 37% | 30% | 33% |
| Overall | 39% | 27% | 34% |
- Explanation: This table summarizes the results of the human evaluation, comparing dialogues generated with persona traits (w/PT) against those without (w/o PT). "Win" indicates that the w/PT dialogue outperformed, "Loss" that the w/o PT dialogue outperformed, and "Tie" that performance was similar. w/PT dialogues were rated better on Suggestion (38% Win), Comforting (38% Win), Identification (37% Win), and Overall (39% Win). Consistency showed the highest share of ties (54%), suggesting both groups performed similarly in maintaining roles. These results align with the strategy distribution analysis, indicating that persona-driven LLMs produce subjectively better emotional support conversations.
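The Win/Tie/Loss percentages come from simple vote tallying over the 50 paired comparisons; a sketch with hypothetical annotator votes:

```python
from collections import Counter

def tally(votes):
    """Convert a list of 'win'/'tie'/'loss' votes into rounded percentages."""
    counts = Counter(votes)
    total = len(votes)
    return {k: round(100 * counts[k] / total) for k in ("win", "tie", "loss")}

# Hypothetical votes for one indicator across 50 compared pairs
votes = ["win"] * 19 + ["tie"] * 14 + ["loss"] * 17
print(tally(votes))  # → {'win': 38, 'tie': 28, 'loss': 34}
```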
6.1.6. Case Study
A case study (Figures 10, 22, and 23) qualitatively illustrates the impact of personas.
The following figure (Figure 10 from the original paper) shows a case study comparing dialogue segments:
The image is a schematic from the paper showing an example persona card, describing the socio-demographic information and mental health problem of a 23-year-old male; the text covers his age, gender, occupation, and the psychological difficulties he faces.
- Explanation: This figure visually compares a segment of a dialogue generated with persona (left) and without persona (right). The blue, green, and yellow highlights indicate different emotional support strategies: blue marks direct emotional support, green direct suggestions, and yellow suggestions delivered through rhetorical questions or guided reflection. The persona-driven dialogue shows a greater tendency to use rhetorical questions (yellow) to encourage seeker reflection and offers suggestions more tactfully. In contrast, the dialogue generated without persona relies more on direct affirmations or suggestions (blue/green). This supports the finding that personas enhance the depth and empathetic quality of conversations by encouraging more nuanced strategy usage.

The following figure (Figure 22 from the original paper) displays the persona card used in the case study:
Persona Card. Age: teenage. Gender: unknown. Occupation: Student. Socio-demographic description: The person is a teenager who is currently a student. They are experiencing the challenges of remote learning due to the COVID-19 pandemic, which has led to feelings of loneliness and isolation. The person previously had a supportive social circle but has lost that connection during the lockdown. They live with a roommate who is preoccupied with her boyfriend, further contributing to their feelings of being alone. Problem: The person is struggling with feelings of loneliness and isolation due to the lack of social interaction during the pandemic. They are contemplating quitting school because of these feelings but are also concerned about their parents' reactions. They are seeking ways to reconnect with friends and manage their
Explanation: This table shows the detailed
persona cardfor theseekerin the case study. It includes age, gender, occupation, a socio-demographic description highlighting loneliness due to COVID-19 remote learning, and the specific problem of social isolation and contemplating quitting school. This persona provides the context for theemotional support conversation.The following figure (Figure 23 from the original paper) shows the historical dialogue for the case study:

- Explanation: This image presents the full historical dialogue from ESConv that serves as the context for the case study. The conversation depicts a seeker struggling with loneliness and social isolation during a lockdown, discussing feelings of wanting to quit school. The supporter attempts various emotional support strategies, including suggesting clubs and video calls and reflecting on the seeker's feelings. This history is then used as input for the LLMs to generate continuations, both with and without the specific persona from Figure 22.

The use of rhetorical questions by persona-guided supporters is highlighted as a key difference, making suggestions more acceptable and fostering deeper, more meaningful conversations, especially for emotional support, which is often a "weak argument" scenario (Petty and Cacioppo, 2012).
6.2. Data Presentation (Tables)
6.2.1. Dialogue Generation Statistics
The following are the results from Table 16 of the original paper:
| | w/ persona | w/o persona |
| Total Words | 218,433 | 232,674 |
| Total Turns | 10,398 | 12,666 |
| Avg Words (Total) | 21.01 | 18.37 |
| Seeker Words | 91,590 | 94,286 |
| Seeker Turns | 5,199 | 6,323 |
| Avg Words (Seeker) | 17.62 | 14.91 |
| Supporter Words | 126,843 | 138,388 |
| Supporter Turns | 5,199 | 6,343 |
| Avg Words (Supporter) | 24.40 | 21.82 |
- Explanation: This table provides statistics on the generated dialogues. Dialogues generated with personas have fewer Total Turns (10,398 vs. 12,666) but higher Avg Words (Total) per turn (21.01 vs. 18.37). This pattern holds for both seeker and supporter turns individually. This quantitative finding supports the qualitative observation that persona-guided conversations, while shorter in terms of turns, are more efficient and in-depth, yielding longer, more substantive responses per turn.
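The per-turn averages in the table are simply total words divided by total turns; as a quick check, the w/ persona column can be reproduced from its reported totals:

```python
def avg_words(total_words, total_turns):
    """Average words per turn, rounded as in Table 16."""
    return round(total_words / total_turns, 2)

# Totals reported in Table 16 for dialogues generated with personas
print(avg_words(218_433, 10_398))  # overall, matches 21.01
print(avg_words(91_590, 5_199))    # seeker, matches 17.62
print(avg_words(126_843, 5_199))   # supporter, matches 24.40
```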
6.3. Ablation Studies / Parameter Analysis
The ablation study discussed in Section 5.3 (and presented visually in Figure 8) specifically investigates the impact of persona injection versus no persona injection. This is a crucial component of the research, as it directly addresses whether personas actively shape LLM outputs or if LLMs inherently produce a similar range of behaviors.
Results of Ablation Study:
The PCA projection in Figure 8 shows a clear distinction:
- Dialogues generated without persona injection exhibit a more concentrated distribution of personality scores in the 2D space. This suggests that without explicit persona guidance, LLMs tend to converge on a narrower, perhaps "default" or "average," set of personality traits.
- Dialogues generated with predefined personas cover a broader range of personality traits. This indicates that the externally provided personas successfully diversify the personality traits manifested in the generated dialogues.
Analysis:
This ablation study confirms that persona injection is not merely redundant but actively influences the LLM's personality expression. It demonstrates that LLMs possess an inherent personality (or a default operating mode) that is modified and expanded when specific persona traits are provided. This finding is critical because it validates the effectiveness of using personas to guide LLMs towards more varied and contextually appropriate human-like interactions, thereby addressing the lack of human intuition in generative AI annotations. It implies that persona information acts as a steering signal, moving the LLM away from its intrinsic personality manifold to a desired persona-specific space.
7. Conclusion & Reflections
7.1. Conclusion Summary
This analytical study rigorously investigated the impact of incorporating personas into Large Language Model (LLM)-generated emotional support conversations (ESC). The key findings demonstrate the significant potential of persona-driven LLMs in enhancing the effectiveness and human-likeness of AI-powered emotional support systems.
Specifically, the research confirmed three notable points:
- LLMs' Capacity to Infer Traits: LLMs, particularly gpt-4o-mini, can reliably infer stable persona traits such as personality and communication styles from textual descriptions, aligning with established psychological theories.
- Persona Consistency with Contextual Shifts: While persona traits are generally maintained during dialogue generation, LLM-simulated seekers exhibit subtle, context-driven shifts towards higher emotionality and lower extraversion. These shifts reflect the natural dynamics of engaging in emotional support conversations.
- Influence on Strategy Distribution: The application of persona traits significantly alters the distribution of emotional support strategies employed by LLMs. Persona-guided supporters tend to use more questioning, affirmation, and reassurance, tactfully leading to more relevant and empathetic responses compared to dialogues generated without personas. Human evaluations further corroborated that persona-enhanced dialogues are perceived as more comforting and effective.

These conclusions highlight that personas are a powerful mechanism for crafting more personalized, empathetic, and effective emotional support dialogues by guiding LLMs to adopt appropriate communication styles and strategic approaches.
7.2. Limitations & Future Work
The authors acknowledge several limitations and propose directions for future research:
- LLM Output Bias: The reliance on LLM outputs for both persona extraction and dialogue simulation introduces potential biases inherited from the models' training data. These biases might affect the accuracy of results compared to real human interactions.
  - Future Work: Investigate the impact of inherent biases in LLMs on persona extraction and dialogue simulation, and develop methods to mitigate them.
- Accuracy of Persona Capture: The current approach for extracting personas using LLMs, while aligned with recent research, may still fall short of accurately capturing and representing the complexity of human traits.
  - Future Work: Explore more sophisticated methods for persona extraction and representation to enhance fidelity to real human characteristics.
- Omniscient Perspective: The current data generation process assumes an omniscient perspective, in which both seekers and supporters have complete information. This does not fully reflect real-world conversational dynamics, where information is often asymmetric or revealed gradually.
  - Future Work: Improve realism by simulating emotional support scenarios with distinct information states for each role, allowing more dynamic and realistic information exchange.
7.3. Personal Insights & Critique
This paper offers valuable insights into the burgeoning field of AI-driven emotional support. The rigorous application of psychological frameworks (HEXACO and CSI) to quantify LLM behavior is a significant strength, moving beyond subjective assessments to provide empirical evidence for the impact of personas. The finding that gpt-4o-mini aligns well with psychological theory in trait inference suggests a promising future for building AI agents that can genuinely understand and manifest nuanced human characteristics.
The observed shifts in emotionality and extraversion in LLM-simulated seekers are particularly insightful. This indicates that LLMs are not merely static persona replicators but can adapt trait manifestations to the conversational context, reflecting a deeper level of simulation. This dynamic behavior is crucial for realistic and effective emotional support. The detailed analysis of strategy distribution and the human evaluation further reinforce the practical utility of persona injection, demonstrating that it leads to qualitatively better and more empathetic interactions.
Potential Issues & Critique:
- Ethical Concerns: The authors appropriately highlight the ethical implications. Making LLMs seem more human-like through personas carries risks, including user dependency on chatbots instead of seeking professional help, potential for emotional manipulation, and the inherent societal biases in LLM training data. While the paper emphasizes academic use, the transition to real-world applications requires robust safeguards, transparent disclosure to users that they are interacting with an AI, and clear pathways to human assistance for severe distress. The subtle shifts in emotionality could be a double-edged sword: while they make seekers more realistic, they could also make supporters' responses less objectively rational if not carefully controlled.
- Generalizability of Persona Effects: While the study shows significant impacts on strategy distribution, it would be interesting to explore whether certain persona traits (e.g., high Honesty-Humility or low Verbal Aggressiveness) lead to more universally positive ESC outcomes, or whether the "optimal" strategy distribution is highly context-dependent.
- Complexity of Real-world Personas: The personas used, though enhanced, might still be simpler than the multifaceted and evolving personas of real individuals. Future work could explore how LLMs handle dynamic persona evolution or conflicting persona elements.
Broader Applicability & Future Value:
The methods and conclusions of this paper can be applied to various domains beyond emotional support. For instance:
- Customer Service Bots: Persona-driven LLMs could create more patient, understanding, or assertive customer service agents tailored to specific customer segments or complaint types.
- Education: AI tutors could adopt personas that match student learning styles or provide encouragement in a way that resonates with individual learners.
- Healthcare: AI assistants could communicate medical information with appropriate levels of empathy and clarity based on a patient's personality and emotional state.
- Gaming and Entertainment: More believable and engaging non-player characters (NPCs) in games, or interactive storytelling agents, could be developed with consistent and expressive personas.

This research is a crucial step towards developing AI systems that are not just intelligent, but also emotionally intelligent and contextually aware, laying foundational work for truly empathetic and personalized human-AI interaction.