
From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations

Published: 02/17/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This study infuses persona traits into LLMs, examines how stable those traits remain in the generated dialogues, and analyzes their impact on emotional support dialogue quality and strategy use, with the aim of enhancing personalization and empathy in AI-driven emotional support conversations.

Abstract

The rapid advancement of Large Language Models (LLMs) has revolutionized the generation of emotional support conversations (ESC), offering scalable solutions with reduced costs and enhanced data privacy. This paper explores the role of personas in the creation of ESC by LLMs. Our research utilizes established psychological frameworks to measure and infuse persona traits into LLMs, which then generate dialogues in the emotional support scenario. We conduct extensive evaluations to understand the stability of persona traits in dialogues, examining shifts in traits post-generation and their impact on dialogue quality and strategy distribution. Experimental results reveal several notable findings: 1) LLMs can infer core persona traits, 2) subtle shifts in emotionality and extraversion occur, influencing the dialogue dynamics, and 3) the application of persona traits modifies the distribution of emotional support strategies, enhancing the relevance and empathetic quality of the responses. These findings highlight the potential of persona-driven LLMs in crafting more personalized, empathetic, and effective emotional support dialogues, which has significant implications for the future design of AI-driven emotional support systems.

In-depth Reading

1. Bibliographic Information

1.1. Title

From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations

1.2. Authors

  • Shenghan Wu (National University of Singapore)
  • Yimo Zhu (National University of Singapore)
  • Wynne Hsu (National University of Singapore)
  • Mong-Li Lee (National University of Singapore)
  • Yang Deng (Singapore Management University)

1.3. Journal/Conference

This paper is published as a preprint on arXiv (arxiv.org). arXiv is a free open-access archive for scholarly articles in various fields, including computer science, mathematics, and physics. While it hosts preprints that have not yet undergone peer review, many papers later appear in reputable conferences or journals. Its reputation lies in its role as a platform for rapid dissemination of research findings.

1.4. Publication Year

2025 (published on arXiv on 2025-02-17, 05:24 UTC)

1.5. Abstract

The paper investigates the role of personas in generating emotional support conversations (ESC) using Large Language Models (LLMs). It employs established psychological frameworks to measure and infuse persona traits into LLMs for dialogue generation in emotional support scenarios. Extensive evaluations are conducted to assess the stability of persona traits in generated dialogues, trait shifts post-generation, and their impact on dialogue quality and emotional support strategy distribution. Key findings include: 1) LLMs can infer core persona traits, 2) emotionality and extraversion traits exhibit subtle shifts, influencing dialogue dynamics, and 3) persona application modifies emotional support strategy distribution, enhancing relevance and empathetic quality. These findings underscore the potential of persona-driven LLMs in creating more personalized, empathetic, and effective emotional support dialogues, with significant implications for future AI-driven emotional support systems.

Official Source: https://arxiv.org/abs/2502.11451
PDF Link: https://arxiv.org/pdf/2502.11451v2.pdf
Publication Status: Preprint (on arXiv).

2. Executive Summary

2.1. Background & Motivation

The rapid advancement of Large Language Models (LLMs) has significantly impacted the generation of emotional support conversations (ESC), offering solutions that are scalable, cost-effective, and enhance data privacy. However, a crucial challenge identified in generative AI applications, particularly in ESC, is the lack of human intuition. Effective emotional support requires considering individual differences, such as personality traits, emotional states, and contextual factors. While prior research has made strides in measuring the personality characteristics of LLMs using psychological inventories, there remains a significant gap in understanding how these persona-related aspects truly influence the generation of emotional support dialogues.

The core problem this paper addresses is to systematically investigate the relationship between LLM-generated emotional support dialogues and persona traits through psychological measurement. This is important because existing ESC generation methods often lack the nuanced personalization and empathy that human interactions provide. Previous methods for creating ESC corpora (e.g., skilled crowdsourcing, therapist session transcription, online Q&A) face limitations like high costs, privacy concerns, and data quality variability. LLMs offer a promising alternative for large-scale data generation, but their outputs can lack human-like individuality. The paper's entry point is to bridge this gap by infusing personas into LLM-driven ESC generation and rigorously evaluating their impact using established psychological frameworks.

2.2. Main Contributions / Findings

The paper makes several primary contributions and key findings that address the aforementioned problems:

  1. Ability to Infer Core Persona Traits: The research demonstrates that LLMs can infer stable persona traits (like personalities and communication styles) from given personas in emotional support scenarios. Specifically, gpt-4o-mini showed strong correlations between personality and communication styles that align with established psychological theories. This finding validates the foundational premise that LLMs can understand and internalize persona descriptions.

  2. Persona Consistency with Subtle Shifts: Dialogues generated by LLMs largely retain the original persona traits, but subtle shifts occur, particularly in emotionality (tendency to become higher) and extraversion (tendency to become lower) for LLM-simulated seekers. These shifts are attributed to the seeker's role in an emotional support dialogue, where they are dealing with emotional issues. This finding highlights the dynamic nature of persona manifestation in interactive contexts.

  3. Impact on Emotional Support Strategy Distribution: Infusing persona traits into the LLM-generated emotional support dialogues significantly modifies the distribution of emotional support strategies. Persona-enhanced dialogues lead supporters to focus more on understanding the seeker's problems through questioning and to offer reassurance and encouragement more gently. Conversely, dialogues without personas showed a higher reliance on self-disclosure and direct problem explanation. This enhancement contributes to more relevant and empathetic responses.

    These findings collectively address the challenge of infusing human-like intuition and personalization into AI-driven emotional support systems. By showing that personas can be effectively inferred, maintained, and used to shape conversational strategies, the paper paves the way for designing AI models that provide more adaptive, empathetic, and effective emotional support.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To fully understand this paper, a reader should be familiar with the following core concepts:

  • Emotional Support Conversations (ESC): These are dialogues where one participant, the "supporter," aims to help another participant, the "seeker," to alleviate stress, address emotional difficulties, and promote mental well-being. This often involves active listening, empathy, reassurance, and guidance.

  • Large Language Models (LLMs): These are advanced artificial intelligence models trained on vast amounts of text data. They are capable of understanding, generating, and processing human language for a wide range of tasks, including text generation, translation, summarization, and conversation. Examples include GPT series (like gpt-4o-mini), Claude series (like Claude-3.5-Haiku), and LLaMA series (like LLaMA-3.1-8B-Instruct).

  • Personas: In the context of AI and user experience, a persona is a fictional character created to represent a user type or a specific individual with distinct characteristics. In this paper, personas describe individuals with their age, gender, occupation, socio-demographic background, emotional state, and problems, along with their personality traits and communication styles. The goal is to make AI interactions more human-like and personalized.

  • Psychological Frameworks (HEXACO and CSI): These are standardized tools used in psychology to measure human personality and communication styles.

    • HEXACO Model: This is a model of human personality structure that organizes personality traits into six broad dimensions or factors. The acronym HEXACO stands for:
      1. Honesty-Humility: Sincerity, fairness, greed-avoidance, modesty.
      2. Emotionality: Fearfulness, anxiety, dependence, sentimentality.
      3. Extraversion: Social self-esteem, social boldness, sociability, liveliness.
      4. Agreeableness: Forgiveness, gentleness, flexibility, patience.
      5. Conscientiousness: Organization, diligence, perfectionism, prudence.
      6. Openness to Experience: Aesthetic appreciation, inquisitiveness, creativity, unconventionality.
    • Communication Styles Inventory (CSI): This framework describes different patterns of verbal and nonverbal behaviors individuals use when interacting. The paper refers to a six-dimensional model:
      1. Expressiveness: The degree to which a person outwardly displays emotions and feelings.
      2. Preciseness: The tendency to be clear, specific, and accurate in communication.
      3. Verbal Aggressiveness: The extent to which a person uses assertive or combative language.
      4. Questioningness: The propensity to ask questions and seek clarification.
      5. Emotionality: Similar to the HEXACO dimension, this refers to the degree of emotional content and expression in communication.
      6. Impression Manipulativeness: The tendency to manage or control the impression one makes on others.
  • Pearson Correlation Coefficient ($r$): A statistical measure that quantifies the linear relationship between two sets of data. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The formula for the Pearson correlation coefficient is: $ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $ Where:

    • $n$: The number of pairs of data points (e.g., the number of personas for which HEXACO and CSI scores are calculated).
    • $\sum xy$: The sum of the products of the corresponding $x$ and $y$ values.
    • $\sum x$: The sum of all $x$ values.
    • $\sum y$: The sum of all $y$ values.
    • $\sum x^2$: The sum of the squares of all $x$ values.
    • $\sum y^2$: The sum of the squares of all $y$ values.

    A minimal computational sketch of this coefficient is given after this list.
  • Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA is often used for dimensionality reduction, helping to visualize high-dimensional data in 2D or 3D space while retaining as much variance as possible. In this paper, it's used to project personality scores into a 2D space for visualization.
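The following minimal Python sketch illustrates the Pearson correlation concept referenced above: it correlates one HEXACO dimension with one CSI dimension across a handful of personas. The score values are invented for illustration and scipy is assumed to be available; this is not the paper's code.

```python
# Minimal sketch: Pearson correlation between one HEXACO dimension and one CSI
# dimension across personas. Scores are illustrative (1-5 scale), not real data.
from scipy.stats import pearsonr

extraversion   = [3.2, 4.1, 2.5, 3.8, 1.9, 4.4]  # HEXACO: Extraversion, one value per persona
expressiveness = [3.0, 4.3, 2.2, 3.5, 2.1, 4.6]  # CSI: Expressiveness, same personas

r, p_value = pearsonr(extraversion, expressiveness)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
# Psychological theory (De Vries et al., 2013) predicts a strong positive r here.
```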

3.2. Previous Works

The paper contextualizes its work by referencing several prior studies, especially concerning emotional support dialogues and the integration of personas and LLMs.

  • Traditional ESC Corpora Development: Early efforts focused on collecting emotional question-answer data from online platforms (Medeiros and Bosse, 2018; Sharma et al., 2020b; Turcan and McKeown, 2019; Garg et al., 2022). These provided insights into user emotions but were limited to single-turn interactions.

    • Empathetic Dialogue dataset (Rashkin et al., 2019) introduced multi-turn dialogues through crowdsourcing.
    • ESConv (Liu et al., 2021) further advanced this by incorporating emotional support strategies from psychological theories, enabling chatbots to use these strategies. ESConv is a key dataset used in this paper for history generation and strategy analysis.
    • CAMS (Garg et al., 2022) and Dreaddit (Turcan and McKeown, 2019) are QA datasets derived from Reddit posts about mental health, also used in this paper for persona extraction.
  • LLM-based ESC Generation: Recent studies leverage the power of LLMs to generate large-scale emotional support dialogue datasets through role-playing, offering advantages like lower costs (Zheng et al., 2023, 2024; Qiu and Lan, 2024; Wu et al., 2024). The ExTES dataset (Zheng et al., 2024) specifically uses LLMs to create dialogues with emotional support strategies.

  • Persona in Emotional Support: Recognizing the need for personalization, research has started integrating personas into ESC.

    • ESC dataset (Zhang et al., 2024) introduced personas into the dialogue generation process.
    • Zhao et al. (2024) proposed a framework to extract personas from existing datasets for evaluation.
    • Personas have also been incorporated into chatbots for generating personalized responses (Tu et al., 2023; Ait Baha et al., 2023; Ma et al., 2024).
  • Psychological Analysis of LLMs: The importance of psychological perspectives in AI has been emphasized.

    • Huang et al. (2023b) argue for psychological analysis of LLMs to create more human-like interactions.
    • Studies have made progress in measuring LLM personality using established psychological inventories (Frisch and Giulianelli, 2024; Safdari et al., 2023).

3.3. Technological Evolution

The evolution of emotional support systems has progressed from simple question-answer systems to multi-turn dialogue models, and more recently, to systems that can incorporate sophisticated emotional support strategies. With the advent of LLMs, there's been a shift towards using these powerful models for ESC generation due to their high-quality output and scalability. However, a key evolutionary step, which this paper contributes to, is moving beyond generic LLM generation to persona-driven ESC. This involves understanding and leveraging individual differences, personality traits, and communication styles, as inspired by psychological theories, to create truly personalized and empathetic AI assistants. This paper places its work at the intersection of advanced LLM capabilities and psychological theory, aiming to make AI dialogue generation more aligned with human expectations of nuanced emotional support.

3.4. Differentiation Analysis

Compared to the main methods in related work, this paper's core differences and innovations lie in its rigorous, psychologically-grounded approach to understanding the impact of personas on LLM-generated emotional support conversations.

  • Beyond Generation to Deep Analysis: While previous works have used LLMs to generate ESC or incorporated personas into chatbots for personalization, this paper goes a step further by systematically investigating how persona traits are inferred, how consistently they are maintained during generation, and how they specifically influence the distribution of emotional support strategies. This moves beyond simply generating personalized dialogue to deeply analyzing the underlying mechanisms and effects.

  • Psychological Measurement Integration: A significant innovation is the direct application of established psychological inventories (HEXACO for personality and CSI for communication style) to measure LLMs' ability to infer and maintain persona traits. This provides a quantitative, theory-backed method for evaluating the human-likeness and consistency of LLM outputs, a gap identified in previous research on LLM personality characteristics.

  • Focus on Strategy Distribution: The paper meticulously examines how persona traits alter the adoption of specific emotional support strategies. This granular analysis, including human evaluation and case studies, provides actionable insights into how persona-driven LLMs can be fine-tuned to produce more effective and empathetic support, contrasting with prior works that might only note general improvements in personalization.

  • Empirical Evidence for Trait Shifts: The observation of subtle shifts in emotionality and extraversion in LLM-simulated seekers during ESC offers a novel insight into the dynamics of LLM behavior in specific conversational contexts. This highlights that persona manifestation is not static but can be influenced by the conversational role.

    In essence, this paper differentiates itself by providing a comprehensive, psychologically-informed framework for evaluating and understanding the impact of personas on LLM-synthesized ESC, moving beyond anecdotal evidence to rigorous empirical analysis.

4. Methodology

The paper proposes an LLM-based simulation framework to investigate the impact of personas on LLM-synthesized emotional support conversations (ESC). The methodology is structured around three key research questions (RQ1, RQ2, RQ3).

4.1. Principles

The core idea is to leverage the advanced language generation and comprehension capabilities of Large Language Models (LLMs) to simulate emotional support conversations and concurrently use these LLMs to interpret and quantify persona traits based on established psychological frameworks. This allows for a systematic study of how predefined personas are inferred, maintained, and how they influence the emotional support strategies employed in generated dialogues. The theoretical basis relies on the assumption that LLMs can process and manifest human-like personality and communication styles if appropriately prompted and measured.

4.2. Core Methodology In-depth (Layer by Layer)

The methodology addresses three main research questions:

4.2.1. RQ1: Measurement of Persona Traits

This section investigates whether LLMs can infer stable personality and communication style traits from persona cards in an emotional support scenario.

4.2.1.1. Data Collection for Persona Cards

  • Source Datasets: The study first collects non-synthetic ESC data from three existing datasets: ESConv (multi-turn dialogue), CAMS (QA from Reddit), and Dreaddit (QA from Reddit).

  • Persona Extraction: gpt-4o-mini is used to extract basic persona information (age, gender, occupation, sociodemographic description, and problem) from these datasets. This extraction focuses only on the seeker personas due to limited personal information about supporters in the datasets.

  • Persona Filtering: After initial extraction, LLMs are prompted to filter personas to ensure they include individual emotions, experienced events, and a clear socio-demographic background. The prompt used for filtering is shown in Figure 14:

    Figure 14: Prompt for filtering personas.

    • Explanation: This prompt instructs the LLM to act as an AI assistant to identify personas that lack sufficient detail. It requires the persona to have a clear socio-demographic background, a description of the emotional problem, and the relevant events experienced. If these criteria are not met, the LLM should classify the persona as "unclear." This ensures the quality and richness of the personas used for trait inference.

4.2.1.2. Trait Assessment using Psychological Inventories

  • Inventories Used: HEXACO model (Ashton and Lee, 2009) for personality and Communication Styles Inventory (CSI) (De Vries et al., 2013) for communication styles.

    • HEXACO: Measures Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience.
    • CSI: Measures Expressiveness, Preciseness, Verbal Aggressiveness, Questioningness, Emotionality, and Impression Manipulativeness.
  • LLM-based Trait Description Generation: LLMs are prompted to generate descriptions for each HEXACO and CSI dimension based on the extracted socio-demographic information. The prompt for generating trait descriptions is shown in Figure 12:

    Figure 12: Prompt for generating trait descriptions.

    • Explanation: This prompt instructs the LLM to generate a short sentence (around 20 words) describing how a person, given their socio-demographic description, would typically rate on a specific HEXACO or CSI dimension (e.g., Extraversion, Preciseness). This step operationalizes how LLMs interpret persona information into trait manifestations.
  • LLM-based Inventory Completion: LLMs are then prompted to predict answers to the HEXACO and CSI inventories using the generated persona card and descriptions. The prompt for answering inventories is shown in Figure 13:

    Figure 13: Prompt for answering the HEXACO and CSI inventories.

    • Explanation: This prompt guides the LLM to act as an AI assistant, evaluating a persona card and then answering HEXACO or CSI inventory questions. For each question, the LLM must provide a numerical rating from 1 to 5, indicating how well the statement describes the person in the persona card. This process quantifies the persona's traits.
  • Correlation Calculation: Based on the LLM's responses to the inventories, HEXACO and CSI dimension scores are calculated for each persona. Then, Pearson correlation is used to analyze the relationships between each dimension of HEXACO and CSI. The formula for the Pearson correlation coefficient ($r$) is: $ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $ Where:

    • $n$: The total number of persona instances for which both HEXACO and CSI scores are obtained.
    • $x$: The score for a specific HEXACO dimension (e.g., Extraversion).
    • $y$: The score for a specific CSI dimension (e.g., Expressiveness).
    • $\sum xy$: The sum of the products of corresponding HEXACO and CSI dimension scores across all personas.
    • $\sum x$: The sum of all HEXACO dimension scores.
    • $\sum y$: The sum of all CSI dimension scores.
    • $\sum x^2$: The sum of the squares of all HEXACO dimension scores.
    • $\sum y^2$: The sum of the squares of all CSI dimension scores.

    This coefficient measures the linear relationship between the two trait dimensions.
  • Expected Correlations: The paper references established psychological theory (De Vries et al., 2013) that posits strong correlations between specific HEXACO and CSI dimensions:

    • Extraversion ↔ Expressiveness
    • Conscientiousness ↔ Preciseness
    • Agreeableness ↔ Verbal Aggressiveness (expected to be an inverse correlation)
    • Openness to Experience ↔ Questioningness
    • Emotionality ↔ Emotionality
    • Honesty-Humility ↔ Impression Manipulativeness (expected to be an inverse correlation)
  • LLMs Used: gpt-4o-mini, Claude-3.5-Haiku, and LLaMA-3.1-8B-Instruct are used with temperature set to 0 for stable results.
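The paper does not provide an implementation of this scoring pipeline; the sketch below shows one plausible way the inventory-completion step could be organized. The `ask_llm` wrapper, the example items, and the dimension grouping are hypothetical placeholders rather than the paper's actual prompts or the real HEXACO/CSI item sets.

```python
# Sketch of the RQ1 trait-scoring pipeline (hypothetical, for illustration only).
from statistics import mean

def ask_llm(persona_card: str, item: str) -> int:
    """Stand-in for a chat-model call (e.g., gpt-4o-mini at temperature 0) that
    returns a 1-5 rating of how well `item` describes the person in the card."""
    return 3  # placeholder; a real implementation would call the model's API

# Placeholder items grouped by dimension; the real inventories contain many more.
HEXACO_ITEMS = {
    "Extraversion": ["I feel comfortable around people.", "I start conversations easily."],
    "Emotionality": ["I worry about things.", "I get stressed out easily."],
}

def score_persona(persona_card: str, inventory: dict[str, list[str]]) -> dict[str, float]:
    """Average the 1-5 item ratings into one score per dimension."""
    return {dim: mean(ask_llm(persona_card, item) for item in items)
            for dim, items in inventory.items()}

print(score_persona("Age: 23, Gender: male, Occupation: student, ...", HEXACO_ITEMS))
# The same routine is run with the CSI items; Pearson correlations between the
# resulting HEXACO and CSI dimension scores are then computed across personas.
```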

4.2.1.3. Process Diagram for Measurement of Persona Traits

The following figure (Figure 3 from the original paper) illustrates the overall process for measuring persona traits:

Figure 3: Diagram of the process for measuring persona traits.

  • Explanation: The diagram shows persona data being extracted from HEXACO and CSI inventories. These are then processed by an LLM to generate scores. Finally, the correlations between personality and communication style dimensions are calculated and evaluated against psychological theories. This confirms whether LLMs can infer stable persona traits.

4.2.2. RQ2: Persona Consistency in LLM-simulated Emotional Support Dialogues

This section evaluates whether persona traits remain consistent throughout the LLM-based dialogue generation process.

4.2.2.1. Experimental Setup for Consistency

  • Persona Selection: 1,000 randomly selected personas from PersonaHub (Chan et al., 2024) are used to ensure diverse characteristics.

  • Persona Enhancement: These personas, initially simple descriptions, are systematically enhanced using LLMs. This involves adding socio-demographic details (age, gender, occupation) and specific trait-indicative statements aligned with HEXACO and CSI dimensions. The prompt for extending personas is shown in Figure 15:

    Figure 15: Prompt for extending personas from Persona Hub.

    • Explanation: This prompt instructs the LLM to act as a persona generator, taking a basic persona description (e.g., "A fearless and highly trained athlete...") and elaborating it into a detailed socio-demographic description. This ensures the personas have sufficient depth for trait assessment and dialogue generation.
  • Quantification of Original Personas: HEXACO and CSI dimension scores are generated for these enhanced personas using the same methodology as described in Section 4.2.1.

  • Dialogue Generation: The enriched personas are used to generate emotional support dialogues where each persona acts as a seeker in contextually relevant scenarios (e.g., an athlete discussing an injury). The prompt for generating emotional support dialogues is shown in Figure 17:

    Figure 17: Prompt for generating emotional support dialogue based on the given persona.

    • Explanation: This prompt defines the role of the AI assistant as a seeker and instructs it to engage in an emotional support conversation based on a provided persona (socio-demographic description, HEXACO scores, CSI scores, and problem statement). It also provides definitions of emotional support strategies for the supporter to use. The LLM is asked to simulate the seeker's responses, making sure they align with the persona.
  • Persona Extraction from Generated Dialogues: After dialogue generation, the methodology from Section 4.2.1 (using gpt-4o-mini to extract persona characteristics) is applied to the generated conversations.

  • Score Calculation from Extracted Personas: HEXACO and CSI scores are then calculated from these newly extracted personas.

  • Consistency Assessment: These extracted scores are compared with the original scores assigned to the input personas to assess the consistency of trait representation after the dialogue generation process.
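A minimal sketch of how this consistency check could be quantified, assuming per-dimension scores are already available for the original personas and for the personas re-extracted from the generated dialogues; the arrays are illustrative, not the paper's data.

```python
# Compare score distributions before and after dialogue generation (illustrative).
import numpy as np

original  = {"Emotionality": np.array([3.1, 2.8, 3.5, 2.9]),
             "Extraversion": np.array([3.6, 4.0, 3.2, 3.9])}
extracted = {"Emotionality": np.array([3.6, 3.4, 3.9, 3.3]),
             "Extraversion": np.array([3.1, 3.5, 2.8, 3.4])}

for dim in original:
    shift = extracted[dim].mean() - original[dim].mean()
    print(f"{dim}: mean shift = {shift:+.2f}")
# A positive shift for Emotionality and a negative one for Extraversion would
# mirror the pattern the paper reports for LLM-simulated seekers.
```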

4.2.2.2. Process Diagram for Persona Consistency

The following figure (Figure 4 from the original paper) illustrates the process for studying persona consistency in LLM-simulated ESC:

Figure 4: Diagram of the process for studying the persona consistency in LLM-simulated ESC.

  • Explanation: The diagram shows that an original persona is fed into a persona simulator to generate HEXACO and CSI scores. This persona is then used in dialogue synthesis as a seeker. From the synthesized dialogue, a persona is extracted, and its HEXACO and CSI scores are calculated. Finally, the extracted persona scores are compared to the original persona scores to determine consistency.

4.2.2.3. Ablation Study for Persona Influence

  • Objective: To understand how LLM's inherent personality traits influence dialogue generation by comparing dialogues generated with and without predefined personas.
  • Method: Implicit personas are extracted from both sets of conversations, and their corresponding personality scores are calculated.
  • Visualization: Principal Component Analysis (PCA) is used to project these scores into a 2D space for visualization.
    • Conceptual Definition: PCA is a technique that transforms high-dimensional data into a lower-dimensional space while preserving the most important information (variance). It finds orthogonal components (principal components) that capture the maximum variance in the data. In this context, it takes multi-dimensional personality scores and reduces them to two dimensions, allowing for a visual comparison of their distributions.
    • Purpose: The comparison of distributions (e.g., concentrated vs. broad) reveals the impact of persona injection on the range of personality traits expressed in the dialogues.
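A small sketch of the PCA projection used in this ablation, under the assumption that 6-dimensional personality score matrices are available for both conditions; the matrices below are random stand-ins and scikit-learn is assumed.

```python
# Project 6-D personality scores to 2D and compare the spread of the two settings.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scores_with_persona = rng.normal(3.0, 0.8, size=(200, 6))  # broader spread (illustrative)
scores_without      = rng.normal(3.0, 0.3, size=(200, 6))  # more concentrated (illustrative)

pca = PCA(n_components=2).fit(np.vstack([scores_with_persona, scores_without]))
proj_with = pca.transform(scores_with_persona)
proj_without = pca.transform(scores_without)

print("variance w/ persona :", proj_with.var(axis=0).round(2))
print("variance w/o persona:", proj_without.var(axis=0).round(2))
# A larger variance for the persona-injected set corresponds to the broader
# distribution visualized in the paper's Figure 8.
```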

4.2.3. RQ3: Impact of Persona on LLM-simulated Emotional Support Dialogues

This section investigates how infusing persona traits affects the distribution of emotional support strategies in LLM-simulated ESC.

4.2.3.1. Experimental Setup for Strategy Impact

  • Dialogue Continuation: The study uses ESConv dialogues as historical context. LLMs are then instructed to predict how a new conversation would unfold, generating two versions of continuations: one with persona traits and one without persona traits.

  • Persona-Injected Dialogue Generation: The prompt for synthesizing dialogues with persona traits is shown in Figure 16:

    Figure 16: Prompt for synthesizing dialogues with persona traits.

    • Explanation: This prompt instructs the LLM to act as a seeker and generate dialogue continuing from a provided dialogue history. The LLM is explicitly given a persona including socio-demographic description, HEXACO scores, CSI scores, and problem statement. The LLM must ensure its responses are consistent with this persona and continue the conversation as if it's three days later.
  • Non-Persona Dialogue Generation: The prompt for synthesizing dialogues without persona traits is shown in Figure 18:

    Figure 18: Prompt for synthesizing dialogues without persona traits.

    • Explanation: Similar to the persona-injected prompt, this prompt asks the LLM to continue a dialogue history as a seeker. However, it explicitly states that the LLM should not consider any persona traits (personality or communication style) but should still focus on solving the seeker's problem and maintaining emotional support strategies. This serves as the control group for comparing strategy distribution.
  • Emotional Support Strategies: The study utilizes the emotional support strategies defined in the ESConv dataset (Liu et al., 2021). The definitions are provided in Appendix G of the paper:

    • Question: Asking open-ended or specific questions to help the seeker articulate issues and provide clarity.
    • Restatement or Paraphrasing: Concisely rephrasing the seeker's statements to aid self-understanding.
    • Reflection of feelings: Expressing and clarifying the seeker's emotions to acknowledge their feelings.
    • Self-disclosure: Sharing similar experiences or emotions to build empathy and connection.
    • Affirmation and Reassurance: Affirming the seeker's strengths, motivation, capabilities, and providing encouragement.
    • Providing Suggestions: Offering possible ways forward while respecting the seeker's autonomy.
    • Information: Providing useful information to the seeker.
    • Others: Exchanging pleasantries or using strategies beyond the defined categories.
  • Strategy Distribution Analysis: The generated dialogues (both with and without personas) are analyzed to quantify the frequency of each emotional support strategy employed by the supporter. This comparison reveals how personas influence strategic choices.
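To make the strategy-distribution analysis concrete, here is a minimal sketch of how strategy frequencies could be tallied from labeled supporter turns and compared across the two settings; the turn lists and helper name are illustrative, not the paper's implementation.

```python
# Tally the fraction of supporter turns using each emotional support strategy.
from collections import Counter

def strategy_distribution(supporter_strategies: list[str]) -> dict[str, float]:
    counts = Counter(supporter_strategies)
    total = sum(counts.values())
    return {strategy: count / total for strategy, count in counts.items()}

with_persona    = ["question", "affirmation and reassurance", "question", "providing suggestions"]
without_persona = ["self-disclosure", "information", "question", "restatement or paraphrasing"]

print(strategy_distribution(with_persona))
print(strategy_distribution(without_persona))
```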

4.2.3.2. Human Evaluation

  • Objective: To intuitively reveal the impact of personas on LLM-simulated ESC through subjective assessment.
  • Evaluators: 10 native English-speaking undergraduate students, experienced in annotation tasks, are recruited.
  • Task: Evaluators compare 50 randomly selected groups of instances (dialogues generated with and without personas). For each pair, they rate which dialogue (or if they are tied) performs better on five indicators.
  • Evaluation Indicators:
    • Suggestion: How effectively the supporter provided helpful advice.
    • Consistency: Whether participants consistently maintained their roles and exhibited coherent behavior.
    • Comforting: The supporter's ability to provide emotional support to the seeker.
    • Identification: Which supporter delved deeper into the seeker's situation and was more effective in identifying issues.
    • Overall: An assessment of the overall performance comparing the two dialogue groups.
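A short sketch of how the pairwise annotator votes could be aggregated into Win/Tie/Loss percentages per indicator ("Win" meaning the persona-injected dialogue was preferred); the votes shown are invented.

```python
# Aggregate pairwise preference votes into Win/Tie/Loss percentages (illustrative).
from collections import Counter

votes = {"Comforting": ["win", "tie", "win", "loss", "win", "tie"],
         "Overall":    ["win", "win", "tie", "loss", "loss", "win"]}

for indicator, v in votes.items():
    counts, n = Counter(v), len(v)
    summary = {k: f"{100 * counts[k] / n:.0f}%" for k in ("win", "tie", "loss")}
    print(indicator, summary)
```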

4.2.3.3. Case Study

A detailed example (Figure 10, Figure 22, Figure 23) is provided to qualitatively illustrate the differences in conversational dynamics and strategy usage when personas are injected. This highlights how persona-guided supporters might use rhetorical questions to encourage reflection and offer suggestions more tactfully, leading to deeper conversations.

The following figure (Figure 9 from the original paper) illustrates the overall process for studying the impact of persona on LLM-simulated ESC:

Figure 9: Diagram of the process for studying the impact of persona on LLM-simulated ESC.

  • Explanation: The diagram starts with an original dialogue from ESConv. This history is used to prompt LLMs to generate dialogue continuations under two conditions: with persona (where persona traits are injected) and without persona. The generated dialogues are then analyzed for strategy distribution and subjected to human evaluation to assess the impact of persona.

5. Experimental Setup

5.1. Datasets

The study utilizes several datasets for different phases of its investigation:

  • For Persona Extraction (RQ1):

    • ESConv (Liu et al., 2021): A multi-turn emotional support dialogue dataset. This dataset is crowdsourced and focuses on conversations where supporters provide emotional strategies.
    • CAMS (Garg et al., 2022): A Question-Answering (QA) dataset derived from Reddit posts discussing mental health issues.
    • Dreaddit (Turcan and McKeown, 2019): Another QA dataset from Reddit, focused on stress analysis in social media. The paper extracted seeker personas from these datasets. The detailed statistics of the extracted persona cards are summarized in Table 1. The following are the results from Table 1 of the original paper:
    | Dataset | ESConv | CAMS | Dreaddit |
    |---|---|---|---|
    | Data Type | dialogue | QA | QA |
    | Num. of personas | 1,155 | 1,140 | 730 |
    | Avg. words of desc. | 57.38 | 66.42 | 56.68 |
    | Avg. words of prob. | 33.92 | 32.02 | 27.75 |
    | Num. of age | 901 | 1,014 | 459 |
    | Num. of gender | 417 | 401 | 300 |
    | Num. of occupation | 926 | 968 | 542 |
    • Explanation: This table provides a quantitative overview of the personas extracted from each dataset. It shows the number of personas, the average word count for their descriptions and problems, and the count of personas for which age, gender, and occupation information were available. This indicates the scale and descriptive richness of the persona data used.
  • For Persona Consistency (RQ2):

    • PersonaHub (Chan et al., 2024): A large collection of synthetic personas. 1,000 randomly selected personas from PersonaHub were used. These personas were initially simple descriptions and were further enhanced by LLMs to include socio-demographic details and trait-indicative statements.
  • For Impact of Persona on Strategies (RQ3):

    • ESConv (Liu et al., 2021): Used as the source for dialogue history to ensure a realistic context for generating future conversations. This allows LLMs to continue existing emotional support scenarios with and without persona traits.

5.2. Evaluation Metrics

The evaluation in this paper spans quantitative statistical measures and qualitative human assessments.

  • Pearson Correlation Coefficient ($r$) (for RQ1):

    • Conceptual Definition: Quantifies the linear statistical relationship between two variables, specifically between HEXACO personality dimensions and CSI communication style dimensions. A higher absolute value indicates a stronger linear relationship.
    • Mathematical Formula: $ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $
    • Symbol Explanation:
      • $n$: Number of paired observations (number of personas).
      • $x$: Score of a HEXACO dimension for a persona.
      • $y$: Score of a CSI dimension for the same persona.
      • $\sum$: Summation operator.
      • $\sum xy$: Sum of the products of each pair of $(x, y)$ scores.
      • $\sum x$: Sum of all $x$ scores.
      • $\sum y$: Sum of all $y$ scores.
      • $\sum x^2$: Sum of the squared $x$ scores.
      • $\sum y^2$: Sum of the squared $y$ scores.
  • Distribution Comparison (for RQ2):

    • Conceptual Definition: Visual comparison using violin plots and 2D PCA projections to assess how the distribution of HEXACO and CSI scores changes between original personas and personas extracted from LLM-generated dialogues, as well as between dialogues generated with and without persona injection.
    • Explanation: This metric visually highlights shifts in the mean, median, spread, and density of trait scores, indicating the stability or changes in persona representation. For the ablation study, PCA helps visualize if persona injection leads to a broader or more concentrated range of manifested personality traits.
  • Strategy Distribution (for RQ3):

    • Conceptual Definition: Measures the percentage of turns or utterances where each predefined emotional support strategy is utilized by the supporter in LLM-generated dialogues.
    • Explanation: This quantitative measure reveals how the presence or absence of persona traits influences the LLM's choice of emotional support strategies, showing if certain strategies become more or less prominent.
  • Human Evaluation Indicators (for RQ3):

    • Conceptual Definition: Subjective ratings provided by human annotators comparing pairs of dialogues (with vs. without persona) based on specific quality criteria.
    • Indicators:
      • Suggestion: Evaluates the helpfulness and quality of advice given by the supporter.
      • Consistency: Assesses whether the participants (seeker and supporter) maintain their roles and exhibit coherent behavior throughout the dialogue.
      • Comforting: Measures the supporter's effectiveness in providing emotional support to the seeker.
      • Identification: Determines how well the supporter understood the seeker's situation and identified underlying issues.
      • Overall: A holistic assessment of which dialogue (or if tied) performed better across all aspects.
    • Explanation: These metrics capture the qualitative aspects of dialogue quality and empathy, which are crucial for emotional support conversations and might not be fully captured by automated metrics. Annotators vote for "Win" (persona dialogue outperforms), "Loss" (non-persona dialogue outperforms), or "Tie".

5.3. Baselines

The study's experimental design involves comparisons rather than direct baselines in the traditional sense of competing models.

  • For RQ1 (Measurement of Persona Traits): The LLMs' ability to infer persona traits is evaluated by comparing the calculated Pearson correlations between HEXACO and CSI dimensions against established psychological theory (De Vries et al., 2013). The "baseline" here is human psychological understanding of these trait relationships. The performance of different LLMs (gpt-4o-mini, Claude-3.5-Haiku, LLaMA-3.1-8B-Instruct) is compared against each other in this task.

  • For RQ2 (Persona Consistency): The consistency is assessed by comparing the HEXACO and CSI scores of original input personas against the scores of personas extracted from LLM-generated dialogues. The "baseline" is the original persona itself. The ablation study within this section uses dialogues generated without persona injection as a baseline to understand the inherent traits of the LLMs and the impact of explicit persona guidance.

  • For RQ3 (Impact of Persona on Strategies): The impact is measured by comparing dialogues generated with persona traits against dialogues generated without persona traits. The "baseline" is the LLM's output when no explicit persona information is provided, allowing for an isolated assessment of the persona's influence on strategy distribution and dialogue quality.

    The LLMs themselves (gpt-4o-mini, Claude-3.5-Haiku, LLaMA-3.1-8B-Instruct) are the agents generating the dialogues and making the assessments, with their performance across these tasks being a key part of the analysis.

6. Results & Analysis

6.1. Core Results Analysis

6.1.1. RQ1: LLMs Can Infer Core Persona Traits

The experimental results demonstrate that LLMs can infer stable persona traits from persona cards in emotional support scenarios, though with varying degrees of accuracy across different models.

The following are the results from Table 2 of the original paper:

|       | Expr. | Prec. | Verb. | Ques. | Emot. | Impr. |
|-------|-------|-------|-------|-------|-------|-------|
| Extr. | .54 | .15 | -.21 | .36 | -.39 | .04 |
| Cons. | .34 | .34 | -.11 | .16 | -.36 | .02 |
| Agre. | .21 | .15 | -.39 | .12 | -.19 | -.05 |
| Open. | .41 | .25 | -.23 | .47 | -.09 | .09 |
| Emot. | -.32 | -.11 | -.04 | -.21 | .45 | -.10 |
| Hone. | -.24 | -.01 | -.17 | -.27 | .05 | -.18 |
  • Explanation: This table displays the Pearson correlations between HEXACO dimensions (rows) and CSI dimensions (columns) as measured by gpt-4o-mini on the ESConv dataset. All reported P-values are less than 0.01, indicating statistical significance. A key observation is the strong positive correlation (0.54) between Extraversion and Expressiveness, and a positive correlation (0.47) between Openness and Questioningness. There's also a clear positive correlation (0.45) between Emotionality (HEXACO) and Emotionality (CSI), and a negative correlation (-0.39) between Agreeableness and Verbal Aggressiveness. These align well with established psychological theories.

    The following are the results from Table 3 of the original paper:

    |       | Expr. | Prec. | Verb. | Ques. | Emot. | Impr. |
    |-------|-------|-------|-------|-------|-------|-------|
    | Extr. | .63 | .01 | -.21 | .50 | .07 | -.11 |
    | Cons. | .15 | .48 | -.22 | .04 | -.18 | -.19 |
    | Agre. | .13 | .14 | -.59 | -.02 | .06 | -.39 |
    | Open. | .39 | .01 | -.28 | .46 | .21 | -.11 |
    | Emot. | -.24 | -.18 | .00 | -.22 | .32 | -.06 |
    | Hone. | .06 | .28 | -.42 | .00 | .05 | -.37 |
  • Explanation: This table shows the correlations measured by Claude-3.5-Haiku. While it also shows a strong correlation between Extraversion and Expressiveness (0.63) and between Openness and Questioningness (0.46), it exhibits some discrepancies. For example, Questioningness correlates more strongly with Extraversion (0.50) than with Openness (0.46), whereas theory ties Questioningness primarily to Openness, and its link with Conscientiousness is essentially absent (0.04). This suggests a partial misinterpretation of the theoretically expected trait relationships.

    The following are the results from Table 4 of the original paper:

    |       | Expr. | Prec. | Verb. | Ques. | Emot. | Impr. |
    |-------|-------|-------|-------|-------|-------|-------|
    | Extr. | .28 | .26 | -.21 | .15 | -.33 | -.11 |
    | Cons. | .13 | .55 | -.16 | -.01 | -.42 | -.04 |
    | Agre. | -.02 | -.13 | -.19 | .03 | .10 | -.05 |
    | Open. | .07 | -.12 | -.15 | .32 | .08 | -.10 |
    | Emot. | -.18 | -.32 | .06 | -.02 | .48 | .01 |
    | Hone. | -.08 | -.11 | -.20 | .06 | .07 | -.13 |
  • Explanation: This table presents correlations from LLaMA-3.1-8B-Instruct. This model shows stronger inconsistencies. For example, Verbal Aggressiveness correlates about as strongly with Extraversion (-0.21) as with Agreeableness (-0.19), where theory expects the Agreeableness link to dominate, and the Extraversion-Expressiveness correlation (0.28) is much weaker than gpt-4o-mini's (0.54); only pairs such as Conscientiousness-Preciseness (0.55) and Emotionality-Emotionality (0.48) remain clearly aligned with theory. This suggests that LLaMA-3.1-8B-Instruct has more limitations in accurately interpreting certain persona traits.

    Overall, gpt-4o-mini consistently exhibits the strongest correlations aligning with psychological theory, indicating its superior ability to infer stable persona traits. Similar findings for CAMS and Dreaddit datasets are mentioned in Appendix C, reinforcing these observations across different data sources.

6.1.2. RQ2: Persona Consistency in LLM-simulated Emotional Support Dialogues

The analysis of persona consistency reveals that while core persona traits are largely maintained during LLM-based dialogue generation, subtle shifts occur, particularly in emotionality and extraversion.

The following figure (Figure 5 from the original paper) displays a comparison of HEXACO scores:

Figure 5: Comparison of HEXACO scores between the original persona and the one extracted from the dialogue generated by gpt-4o-mini.

  • Explanation: This violin plot compares the distribution of HEXACO scores for original personas (green) and personas extracted from dialogues generated by gpt-4o-mini (orange). It shows that Honesty-Humility, Agreeableness, Conscientiousness, and Openness largely maintain similar distributions. However, Emotionality tends to be higher in extracted personas, and Extraversion tends to be lower, indicating a shift in these traits during the dialogue simulation. This suggests that the context of seeking emotional support influences how these traits manifest.

    The following figure (Figure 6 from the original paper) displays a comparison of HEXACO scores:

    Figure 6: Comparison of HEXACO scores between the original persona and the one extracted from the dialogue generated by claude-3.5-haiku.

  • Explanation: This violin plot shows the HEXACO score comparison for dialogues generated by Claude-3.5-Haiku. Similar to gpt-4o-mini, Emotionality is generally higher, and Extraversion is lower in the extracted personas compared to the originals, while other traits remain relatively consistent.

    The following figure (Figure 7 from the original paper) displays a comparison of HEXACO scores:

    Figure 7: Comparison of HEXACO scores between the original persona and the one extracted from the dialogue generated by LLaMA-3.1-8B-Instruct.

  • Explanation: This violin plot illustrates the HEXACO score comparison for LLaMA-3.1-8B-Instruct. The trend of increased Emotionality and decreased Extraversion in extracted personas is also observed here, confirming the pattern across different LLMs.

    These findings indicate that LLM-simulated seekers tend to exhibit more emotionality and lower extraversion post-generation. This is interpreted as a natural consequence of the seeker's role in emotional support dialogues, where they are actively dealing with emotional issues and are less likely to be outgoing. A similar pattern is observed for CSI scores (Figures 19, 20, and 21 in Appendix E), supporting these conclusions about persona consistency with specific shifts.

6.1.3. Ablation Study

The following figure (Figure 8 from the original paper) shows the distribution of personality scores:

Figure 8: Distribution of personality scores, reduced to 2D, obtained from dialogues with and without persona injection.

  • Explanation: This 2D PCA projection visualizes the distribution of personality scores obtained from dialogues. The red circles represent dialogues generated without persona injection, showing a more concentrated distribution. The blue triangles represent dialogues generated with persona injection, showing a broader range of personality traits. This clearly demonstrates that explicit persona injection guides and shapes the dialogue generation process, leading to a wider diversity of expressed personalities compared to the LLM's inherent default personality.

6.1.4. RQ3: Impact of Persona on LLM-simulated Emotional Support Dialogues

The application of persona traits significantly modifies the distribution of emotional support strategies used by LLM-simulated supporters.

The following are the results from Table 5 of the original paper:

| Strategy | w/ PT | w/o PT |
|---|---|---|
| question | 27.23% | 16.45% |
| restatement or paraph. | 3.61% | 10.57% |
| reflection of feelings | 11.75% | 11.33% |
| self-disclosure | 2.64% | 10.64% |
| affirmation and reass. | 29.72% | 21.06% |
| providing suggestions | 16.92% | 14.25% |
| information | 0.78% | 4.85% |
| others | 8.45% | 10.88% |
  • Explanation: This table compares strategy distribution for gpt-4o-mini. Dialogues generated with persona traits (w/PT) show a notably higher percentage of questioning (27.23% vs. 16.45%) and affirmation and reassurance (29.72% vs. 21.06%). Conversely, dialogues without persona traits (w/o PT) have higher rates of restatement or paraphrasing, self-disclosure, and information. This suggests personas encourage more proactive and empathetic strategies.

    The following are the results from Table 6 of the original paper:

    | Strategy | w/ PT | w/o PT |
    |---|---|---|
    | question | 33.10% | 27.31% |
    | restatement or paraph. | 0.69% | 0.96% |
    | reflection of feelings | 21.19% | 18.15% |
    | self-disclosure | 2.22% | 13.97% |
    | affirmation and reass. | 19.32% | 17.41% |
    | providing suggestions | 19.01% | 15.41% |
    | information | 4.45% | 6.77% |
    | others | 0.02% | 0.02% |
  • Explanation: This table presents strategy distribution for Claude-3.5-Haiku. Similar to gpt-4o-mini, w/PT dialogues show a higher percentage of questioning (33.10% vs. 27.31%) and reflection of feelings (21.19% vs. 18.15%). Self-disclosure (2.22% vs. 13.97%) is significantly reduced with personas.

    The following are the results from Table 7 of the original paper:

    | Strategy | w/ PT | w/o PT |
    |---|---|---|
    | question | 12.70% | 12.34% |
    | restatement or paraph. | 6.51% | 7.56% |
    | reflection of feelings | 18.44% | 18.06% |
    | self-disclosure | 7.69% | 9.86% |
    | affirmation and reass. | 21.42% | 18.91% |
    | providing suggestions | 13.33% | 13.40% |
    | information | 2.48% | 4.25% |
    | others | 17.43% | 15.62% |
  • Explanation: This table shows strategy distribution for LLaMA-3.1-8B-Instruct. While the differences are less pronounced compared to the other LLMs, w/PT dialogues still exhibit slightly higher affirmation and reassurance and lower self-disclosure and information, consistent with the overall trend.

    The following are the results from Table 8 of the original paper:

    | Strategy | HEXACO | CSI |
    |---|---|---|
    | question | 27.83% | 27.23% |
    | restatement or paraph. | 3.72% | 3.61% |
    | reflection of feelings | 12.48% | 11.75% |
    | self-disclosure | 3.41% | 2.64% |
    | affirmation and reass. | 28.96% | 29.72% |
    | providing suggestions | 16.44% | 16.92% |
    | information | 0.50% | 0.78% |
    | others | 6.66% | 8.45% |
  • Explanation: This table compares strategy distribution when personas are defined using HEXACO scores versus CSI scores. The distributions are remarkably similar, reinforcing the strong correlation between these two trait measures as demonstrated in RQ1. This implies that defining personas through either personality or communication style scores leads to consistent effects on strategy usage.

    The analysis indicates that persona-enhanced dialogues lead supporters to engage in more questioning and provide more affirmation and reassurance, focusing on deeper understanding. Conversely, supporters without persona traits tend to explain problems more and rely on self-disclosure, which prior research suggests can be less effective in emotional support if not carefully managed.

6.1.5. Human Evaluation

The following are the results from Table 9 of the original paper:

| w/ vs. w/o PT | Win | Tie | Loss |
|---|---|---|---|
| Suggestion | 38% | 27% | 35% |
| Consistency | 27% | 54% | 19% |
| Comforting | 38% | 28% | 34% |
| Identification | 37% | 30% | 33% |
| Overall | 39% | 27% | 34% |
  • Explanation: This table summarizes the results of human evaluation, comparing dialogues generated with persona traits (w/PT) against those without persona traits (w/o PT). "Win" indicates that w/PT dialogues outperformed, "Loss" indicates w/o PT dialogues outperformed, and "Tie" indicates similar performance. w/PT dialogues were perceived as better in Suggestion (38% Win), Comforting (38% Win), Identification (37% Win), and Overall (39% Win). Consistency showed the highest number of ties (54%), suggesting both groups performed similarly in maintaining roles. These results align with the strategy distribution analysis, indicating that persona-driven LLMs produce subjectively better emotional support conversations.

6.1.6. Case Study

A case study (Figures 10, 22, and 23) qualitatively illustrates the impact of personas. The following figure (Figure 10 from the original paper) shows a case study comparing dialogue segments:

Figure 10: Case study. Blue indicates that the supporter directly provides emotional support. Green signifies the supporter offers direct suggestions. Yellow means that the supporter provides suggestions through rhetorical questions or by guiding reflection.

  • Explanation: This figure visually compares a segment of a dialogue generated with persona (left) and without persona (right). The blue, green, and yellow highlights indicate different emotional support strategies. Blue is direct emotional support, Green is direct suggestions, and Yellow is suggestions through rhetorical questions or guiding reflection. The persona-driven dialogue shows a greater tendency to use rhetorical questions (yellow) to encourage seeker reflection and offers suggestions more tactfully. In contrast, the dialogue generated without persona relies more on direct affirmations or suggestions (blue/green). This supports the finding that personas enhance the depth and empathetic quality of conversations by encouraging more nuanced strategy usage.

    The following figure (Figure 22 from the original paper) displays the persona card used in the case study:

    Persona Card
    Age: teenage. Gender: unknown. Occupation: Student.
    Socio-demographic description: The person is a teenager who is currently a student. They are experiencing the challenges of remote learning due to the COVID-19 pandemic, which has led to feelings of loneliness and isolation. The person previously had a supportive social circle but has lost that connection during the lockdown. They live with a roommate who is preoccupied with her boyfriend, further contributing to their feelings of being alone.
    Problem: The person is struggling with feelings of loneliness and isolation due to the lack of social interaction during the pandemic. They are contemplating quitting school because of these feelings but are also concerned about their parents' reactions. They are seeking ways to reconnect with friends and manage their…
  • Explanation: This persona card describes the seeker in the case study. It includes age, gender, occupation, a socio-demographic description highlighting loneliness due to COVID-19 remote learning, and the specific problem of social isolation and contemplating quitting school. This persona provides the context for the emotional support conversation.

    The following figure (Figure 23 from the original paper) shows the historical dialogue for the case study:

  • Explanation: This image presents the full historical dialogue from ESConv that serves as the context for the case study. The conversation depicts a seeker struggling with loneliness and social isolation during a lockdown, discussing feelings of wanting to quit school. The supporter attempts to provide various emotional support strategies, including suggesting clubs, video calls, and reflecting on the seeker's feelings. This history is then used as input for LLMs to generate continuations, both with and without the specific persona from Figure 22.

    The use of rhetorical questions by persona-guided supporters is highlighted as a key difference, making suggestions more acceptable and fostering deeper, more meaningful conversations, especially for emotional support, which is often a "weak argument" scenario (Petty and Cacioppo, 2012).

6.2. Data Presentation (Tables)

6.2.1. Dialogue Generation Statistics

The following are the results from Table 16 of the original paper:

|  | w/ persona | w/o persona |
|---|---|---|
| Total Words | 218,433 | 232,674 |
| Total Turns | 10,398 | 12,666 |
| Avg Words (Total) | 21.01 | 18.37 |
| Seeker Words | 91,590 | 94,286 |
| Seeker Turns | 5,199 | 6,323 |
| Avg Words (Seeker) | 17.62 | 14.91 |
| Supporter Words | 126,843 | 138,388 |
| Supporter Turns | 5,199 | 6,343 |
| Avg Words (Supporter) | 24.40 | 21.82 |
  • Explanation: This table provides statistics on the generated dialogues. Dialogues generated with personas have fewer Total Turns (10,398 vs. 12,666) but a higher Avg Words (Total) per turn (21.01 vs. 18.37). This pattern holds for both seeker and supporter turns individually. This quantitative finding supports the qualitative observation that persona-guided conversations, while potentially shorter in terms of turns, are more efficient and in-depth, leading to longer, more substantive responses per turn.
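As a quick sanity check on Table 16, the reported per-turn averages follow directly from the totals; a two-line verification:

```python
print(round(218_433 / 10_398, 2))  # 21.01 -> "Avg Words (Total)", w/ persona
print(round(232_674 / 12_666, 2))  # 18.37 -> "Avg Words (Total)", w/o persona
```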

6.3. Ablation Studies / Parameter Analysis

The ablation study discussed in Section 5.3 (and presented visually in Figure 8) specifically investigates the impact of persona injection versus no persona injection. This is a crucial component of the research, as it directly addresses whether personas actively shape LLM outputs or if LLMs inherently produce a similar range of behaviors.

Results of Ablation Study: The PCA projection in Figure 8 shows a clear distinction:

  • Dialogues generated without persona injection exhibit a more concentrated distribution of personality scores in the 2D space. This suggests that without explicit persona guidance, LLMs tend to converge on a narrower, perhaps "default" or "average," set of personality traits.
  • Dialogues generated with predefined personas cover a broader range of personality traits. This indicates that the externally provided personas successfully diversify the personality traits manifested in the generated dialogues.

Analysis: This ablation study confirms that persona injection is not merely redundant but actively influences the LLM's personality expression. It demonstrates that LLMs possess an inherent personality (or a default operating mode) that is modified and expanded when specific persona traits are provided. This finding is critical because it validates the effectiveness of using personas to guide LLMs towards more varied and contextually appropriate human-like interactions, thereby addressing the lack of human intuition in generative AI annotations. It implies that persona information acts as a steering signal, moving the LLM away from its intrinsic personality manifold to a desired persona-specific space.
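
To make the comparison concrete, the following is a minimal sketch of projecting per-dialogue trait vectors (e.g., six HEXACO dimensions) into 2D with scikit-learn's PCA and comparing the spread of the two conditions; the random score matrices and the simple dispersion measure are placeholders, not the paper's data or exact analysis.

```python
# Illustrative sketch (not the paper's code): project per-dialogue personality
# score vectors into 2D with PCA and compare the spread of the two conditions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder trait vectors: rows = dialogues, columns = six HEXACO dimensions.
with_persona = rng.normal(loc=3.0, scale=0.9, size=(200, 6))     # broader spread
without_persona = rng.normal(loc=3.0, scale=0.3, size=(200, 6))  # tighter "default" cluster

# Fit PCA on the pooled scores so both conditions share the same 2D axes.
pca = PCA(n_components=2)
pooled_2d = pca.fit_transform(np.vstack([with_persona, without_persona]))
wp_2d, np_2d = pooled_2d[:200], pooled_2d[200:]

# A simple dispersion measure: mean distance of each point to its condition centroid.
def dispersion(points):
    return float(np.mean(np.linalg.norm(points - points.mean(axis=0), axis=1)))

print("dispersion with persona:   ", round(dispersion(wp_2d), 3))
print("dispersion without persona:", round(dispersion(np_2d), 3))
```

In a plot of `wp_2d` versus `np_2d`, the persona-injected dialogues would occupy a visibly wider region of the 2D space, mirroring the qualitative pattern described for Figure 8.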

7. Conclusion & Reflections

7.1. Conclusion Summary

This analytical study rigorously investigated the impact of incorporating personas into Large Language Model (LLM)-generated emotional support conversations (ESC). The key findings demonstrate the significant potential of persona-driven LLMs in enhancing the effectiveness and human-likeness of AI-powered emotional support systems.

Specifically, the research confirmed three notable points:

  1. LLMs' Capacity to Infer Traits: LLMs, particularly gpt-4o-mini, can reliably infer stable persona traits such as personality and communication styles from textual descriptions, aligning with established psychological theories.

  2. Persona Consistency with Contextual Shifts: While persona traits are generally maintained during dialogue generation, LLM-simulated seekers exhibit subtle, context-driven shifts towards higher emotionality and lower extraversion. These shifts reflect the natural dynamics of engaging in emotional support conversations.

  3. Influence on Strategy Distribution: The application of persona traits significantly alters the distribution of emotional support strategies employed by LLMs. Persona-guided supporters tend to use questioning, affirmation, and reassurance more tactfully, leading to more relevant and empathetic responses than dialogues generated without personas. Human evaluations further corroborated that persona-enhanced dialogues are perceived as more comforting and effective.

    These conclusions highlight that personas are a powerful mechanism for crafting more personalized, empathetic, and effective emotional support dialogues by guiding LLMs to adopt appropriate communication styles and strategic approaches.
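
    As a rough illustration of how this kind of trait inference could be set up in practice (not the authors' actual pipeline), the sketch below prompts gpt-4o-mini for HEXACO-style scores from a persona description via the OpenAI Python client; the prompt wording, 1-5 scale, and JSON output format are assumptions.

```python
# Hedged sketch of persona trait inference with an LLM; the prompt, JSON format,
# and scoring scale are illustrative assumptions, not the paper's exact setup.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HEXACO = ["Honesty-Humility", "Emotionality", "Extraversion",
          "Agreeableness", "Conscientiousness", "Openness"]

def infer_traits(persona_description: str) -> dict:
    prompt = (
        "Rate the person described below on each HEXACO dimension from 1 (very low) "
        "to 5 (very high). Respond with a JSON object mapping dimension name to score.\n\n"
        f"Dimensions: {', '.join(HEXACO)}\n\nPersona:\n{persona_description}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Example usage with a short summary of the case-study persona:
scores = infer_traits("A teenage student feeling lonely and isolated during remote learning.")
print(scores)
```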

7.2. Limitations & Future Work

The authors acknowledge several limitations and propose directions for future research:

  • LLM Output Bias: The reliance on LLM outputs for both persona extraction and dialogue simulation introduces potential biases inherited from the models' training data. These biases might affect the accuracy of results compared to real human interactions.
    • Future Work: Investigate the impact of inherent biases in LLMs on persona extraction and dialogue simulation, and develop methods to mitigate them.
  • Accuracy of Persona Capture: The current approach for extracting personas using LLMs, while aligned with recent research, may still have limitations in accurately capturing and representing the complexity of human traits.
    • Future Work: Explore more sophisticated methods for persona extraction and representation to enhance fidelity to real human characteristics.
  • Omniscient Perspective: The current data generation process assumes an omniscient perspective, where both seekers and supporters have complete information. This does not fully reflect real-world conversational dynamics where information is often asymmetric or revealed gradually.
    • Future Work: Improve realism by simulating emotional support scenarios with distinct information states for each role, allowing for more dynamic and realistic information exchange.

7.3. Personal Insights & Critique

This paper offers valuable insights into the burgeoning field of AI-driven emotional support. The rigorous application of psychological frameworks (HEXACO and CSI) to quantify LLM behavior is a significant strength, moving beyond subjective assessments to provide empirical evidence for the impact of personas. The finding that gpt-4o-mini aligns well with psychological theory in trait inference suggests a promising future for building AI agents that can genuinely understand and manifest nuanced human characteristics.

The observed shifts in emotionality and extraversion in LLM-simulated seekers are particularly insightful. This indicates that LLMs are not merely static persona replicators but can adapt trait manifestations to the conversational context, reflecting a deeper level of simulation. This dynamic behavior is crucial for realistic and effective emotional support. The detailed analysis of strategy distribution and the human evaluation further reinforce the practical utility of persona injection, demonstrating that it leads to qualitatively better and more empathetic interactions.

Potential Issues & Critique:

  • Ethical Concerns: The authors appropriately highlight the ethical implications. Making LLMs seem more human-like through personas carries risks, including user dependency on chatbots instead of seeking professional help, potential for emotional manipulation, and the inherent societal biases in LLM training data. While the paper emphasizes academic use, the transition to real-world applications requires robust safeguards, transparent disclosure to users that they are interacting with an AI, and clear pathways to human assistance for severe distress. The subtle shifts in emotionality could be a double-edged sword: while they make seekers more realistic, they could also make supporters' responses less objectively rational if not carefully controlled.
  • Generalizability of Persona Effects: While the study shows significant impacts on strategy distribution, it would be interesting to explore if certain persona traits (e.g., high Honesty-Humility or low Verbal Aggressiveness) lead to more universally positive ESC outcomes, or if the "optimal" strategy distribution is highly context-dependent.
  • Complexity of Real-world Personas: The personas used, though enhanced, might still be simpler than the multifaceted and evolving personas of real individuals. Future work could explore how LLMs handle dynamic persona evolution or conflicting persona elements.

Broader Applicability & Future Value: The methods and conclusions of this paper can be applied to various domains beyond emotional support. For instance:

  • Customer Service Bots: Persona-driven LLMs could create more patient, understanding, or assertive customer service agents tailored to specific customer segments or complaint types.

  • Education: AI tutors could adopt personas that match student learning styles or provide encouragement in a way that resonates with individual learners.

  • Healthcare: AI assistants could communicate medical information with appropriate levels of empathy and clarity based on a patient's personality and emotional state.

  • Gaming and Entertainment: More believable and engaging non-player characters (NPCs) in games, or interactive storytelling agents, could be developed with consistent and expressive personas.

    This research is a crucial step towards developing AI systems that are not just intelligent, but also emotionally intelligent and contextually aware, laying foundational work for truly empathetic and personalized human-AI interaction.
