CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
TL;DR Summary
CrAM dynamically adjusts influential attention heads in LLMs to reduce low-credibility document impact in RAG, improving misinformation resistance by over 20%, outperforming supervised fine-tuning across datasets and models.
Abstract
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng 1, Wenjie Wang 2*, Fengbin Zhu 2*, Qifan Wang 3, Fuli Feng 1
1 University of Science and Technology of China, 2 National University of Singapore, 3 Meta AI
dengboyi@mail.ustc.edu.cn, wqfcr@fb.com, {wenjiewang96, zhfengbin, fulifeng93}@gmail.com

Abstract: Retrieval-Augmented Generation (RAG) can alleviate hallucinations of Large Language Models (LLMs) by referencing external documents. However, misinformation in external documents may mislead LLMs' generation. To address this issue, we explore the task of "credibility-aware RAG", in which LLMs automatically adjust the influence of retrieved documents based on their credibility scores to counteract misinformation. To this end, we introduce a plug-and-play method named Credibility-aware Attention Modification (CrAM). CrAM identifies influential attention heads in LLMs and adjusts their attention weights based on the credibility of the documents, thereby reducing the impact of low-credibility documents. Experiments on Natural Questions and TriviaQA using Llama2-13B, Llama3-8B, and Qwen1.5-7B show that CrAM improves the Exact Match performance of LLMs against misinformation pollution by over 20%, even surpassing supervised fine-tuning (SFT)-based methods.
In-depth Reading
1. Bibliographic Information
1.1. Title
The central topic of the paper is "CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG". It focuses on improving the robustness of Retrieval-Augmented Generation (RAG) models against misinformation present in external documents by dynamically adjusting the influence of these documents based on their credibility.
1.2. Authors
The authors of the paper are:
- Boyi Deng (University of Science and Technology of China)
- Wenjie Wang (National University of Singapore)
- Fengbin Zhu (National University of Singapore)
- Qifan Wang (Meta AI)
- Fuli Feng (University of Science and Technology of China)
Their affiliations indicate a collaboration between academic institutions and a leading AI research company, suggesting a blend of theoretical rigor and practical relevance. Wenjie Wang and Fengbin Zhu are marked with an asterisk, typically denoting co-corresponding authors.
1.3. Journal/Conference
The paper does not explicitly state the journal or conference it was published in, but the presence of an arXiv link and a publication UTC date suggests it might be a preprint or submitted to a major conference in the field of Natural Language Processing (NLP) or Machine Learning. Given the topic, venues like ACL, EMNLP, NeurIPS, or ICML would be relevant and influential in this domain.
1.4. Publication Year
The listed publication timestamp (UTC) is 2025-04-11, indicating a publication year of 2025.
1.5. Abstract
The abstract introduces Retrieval-Augmented Generation (RAG) as a method to mitigate Large Language Model (LLM) hallucinations by referencing external documents. However, it highlights a crucial problem: misinformation in these external documents can mislead LLMs. To address this, the paper proposes "credibility-aware RAG," where LLMs dynamically adjust the influence of retrieved documents based on their credibility scores. The authors introduce a plug-and-play method called Credibility-aware Attention Modification (CrAM). CrAM works by identifying influential attention heads within LLMs and modifying their attention weights according to the credibility of the associated documents, thereby diminishing the impact of low-credibility information. Experiments conducted on the Natural Questions (NQ) and TriviaQA datasets using Llama2-13B, Llama3-8B, and Qwen1.5-7B demonstrate that CrAM significantly improves RAG performance against misinformation by over 20%, even outperforming supervised fine-tuning (SFT) methods.
1.6. Original Source Link
The original source link for the paper is /files/papers/690cd61f0de225812bf9335e/paper.pdf.
This appears to be a direct link to the PDF file, likely hosted on a repository such as arXiv or a conference proceedings archive. Based on the URL structure and the academic context, the work appears to be either a preprint or a version published in conference proceedings.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is the vulnerability of Retrieval-Augmented Generation (RAG) systems to misinformation present in external knowledge sources. While RAG effectively reduces hallucinations in Large Language Models (LLMs) by providing factual context, the quality of this context is paramount. If the retrieved documents contain misinformation, LLMs can be misled into generating unfaithful or incorrect responses. This is a significant concern as misinformation pollution is prevalent in online data, as demonstrated by instances like Microsoft's Bing being misled and research showing that LLM-generated misinformation can degrade RAG performance.
Prior research has focused on misinformation detection to measure document credibility. A straightforward approach would be to simply discard low-credibility documents. However, the paper points out that directly discarding documents might lead to a loss of relevant and important information, potentially degrading overall performance. This highlights a crucial gap: while credibility scores can be obtained, effective mechanisms for LLMs to utilize these scores without outright discarding information are underdeveloped.
The paper's innovative entry point is to explore "credibility-aware RAG," where LLMs can automatically adjust the influence of retrieved documents based on their credibility scores, rather than a binary accept/reject decision. This allows for a more nuanced handling of potentially compromised information. Previous attempts in this direction relied on supervised fine-tuning (SFT), which is resource-intensive and requires specialized training data, limiting its broad applicability. The paper therefore seeks a non-SFT, plug-and-play solution.
2.2. Main Contributions / Findings
The paper makes several primary contributions to address the challenge of misinformation in RAG systems:
- Exploration of Credibility-Aware RAG without Fine-tuning: The authors formally define and explore the task of credibility-aware RAG as a way to alleviate misinformation pollution without requiring computationally expensive and data-intensive fine-tuning of LLMs, aiming for a more practical and adaptable solution.
- Introduction of CrAM (Credibility-aware Attention Modification): They propose a novel plug-and-play method called CrAM, which equips LLMs with credibility-aware RAG capabilities by:
  - Identifying Influential Attention Heads: CrAM selects specific attention heads within the LLM that have a significant impact on generating incorrect answers when misinformation is present. This is achieved using a modified causal tracing approach.
  - Modifying Attention Weights: For these identified influential heads, CrAM adjusts their attention weights based on the credibility scores of the retrieved documents. This reduces the attention paid to low-credibility documents, effectively mitigating their misleading influence.
- Extensive Experimental Validation and Superior Performance: The paper conducts comprehensive experiments on two open-domain Question Answering (QA) datasets (Natural Questions and TriviaQA) using three popular LLMs (Llama2-13B, Llama3-8B, and Qwen1.5-7B).
  - Key Findings: CrAM significantly improves the Exact Match (EM) and F1 Score performance of RAG systems by over 20% compared to vanilla RAG when facing misinformation.
  - Outperformance of SFT-based Methods: Notably, CrAM often surpasses supervised fine-tuning (SFT)-based methods (like CAG) in most scenarios, demonstrating its efficiency and effectiveness without the need for extensive retraining.
  - Robustness: CrAM is robust to varying numbers of low-credibility documents and shows only minor sensitivity to the size of the dataset used for identifying influential heads.

These contributions offer a practical and effective strategy for making RAG systems more resilient to misinformation, a critical step towards building more trustworthy AI applications.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand the CrAM paper, it is essential to grasp several foundational concepts in Large Language Models (LLMs) and Natural Language Processing (NLP).
3.1.1. Large Language Models (LLMs)
Large Language Models (LLMs) are advanced artificial intelligence models, typically based on the transformer architecture, that have been trained on vast amounts of text data. They are capable of understanding, generating, and processing human language for a wide range of tasks, such as question answering, text summarization, translation, and creative writing. LLMs like GPT-3/4, Llama, and Qwen are characterized by their massive scale (billions of parameters) and emergent abilities.
3.1.2. Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique that enhances LLMs by providing them with access to external, up-to-date, and domain-specific information. Instead of relying solely on the knowledge memorized during their pre-training, RAG systems first retrieve relevant documents or passages from a large corpus (e.g., Wikipedia, a company's internal documents) based on a user's query. These retrieved documents are then fed to the LLM along with the original query, allowing the model to generate more accurate, factual, and contextually rich responses. This process helps alleviate hallucinations (generating false or nonsensical information) and keeps the LLM's knowledge current.
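To make the retrieve-then-generate flow concrete, here is a minimal sketch of a RAG loop using TF-IDF retrieval over a toy corpus; the corpus, helper names, and prompt template are illustrative assumptions rather than any specific system's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document store standing in for an external corpus (e.g., Wikipedia passages).
corpus = [
    "Wilhelm Roentgen received the first Nobel Prize in Physics in 1901.",
    "Albert Einstein received the Nobel Prize in Physics in 1921.",
    "The Nobel Prizes were established by the will of Alfred Nobel.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank corpus passages by TF-IDF cosine similarity and return the top-k."""
    vectorizer = TfidfVectorizer().fit(corpus + [query])
    doc_vecs = vectorizer.transform(corpus)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:k]]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the retrieved documents and the question into a single LLM prompt."""
    context = "\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {query}\nAnswer:"

query = "Who won the first Nobel Prize in Physics?"
prompt = build_prompt(query, retrieve(query))
print(prompt)  # This prompt would then be passed to an LLM for grounded generation.
```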
3.1.3. Hallucinations in LLMs
Hallucinations in LLMs refer to the phenomenon where the model generates information that is factually incorrect, nonsensical, or unfaithful to the provided source context, despite presenting it confidently. This can arise from limitations in their training data, biases, or simply from the generative nature of the models, which prioritize fluency and coherence over strict factual accuracy. RAG was developed as a primary method to combat these hallucinations by grounding responses in external, verifiable information.
3.1.4. Attention Mechanism
The attention mechanism is a core component of transformer models, which are the backbone of most LLMs. It allows the model to weigh the importance of different parts of the input sequence when processing a specific part of the sequence. Instead of processing all input tokens equally, attention enables the model to focus on the most relevant tokens for a given task or context.
The most common form is self-attention, which calculates the attention weights between all pairs of tokens in a single input sequence. It works by computing three vectors for each token:
- Query (Q): Represents the current token being processed.
- Key (K): Represents all other tokens in the sequence.
- Value (V): Represents the actual information content of all other tokens.

The attention score between a query token and a key token indicates how much attention the query token should pay to the key token. These scores are then normalized (typically with a softmax function) to create attention weights, which are used to compute a weighted sum of the value vectors.
The standard scaled dot-product attention formula is:
$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
$
Where:
- $Q$ is the matrix of queries.
- $K$ is the matrix of keys.
- $V$ is the matrix of values.
- $K^T$ is the transpose of the key matrix.
- $d_k$ is the dimension of the key vectors, used for scaling to prevent very large dot products that could push the softmax function into regions with tiny gradients.
- softmax is an activation function that normalizes the scores into probabilities (weights) that sum to 1.

A minimal code sketch of this computation is given below.
3.1.5. Multi-Head Attention
Multi-head attention is an extension of the attention mechanism where the query, key, and value vectors are projected multiple times (into different "heads") using different learned linear transformations. Each attention head then independently computes its own attention output. The outputs from all attention heads are then concatenated and linearly transformed again to produce the final output. This allows the model to capture different types of relationships or focus on different aspects of the input simultaneously, similar to how different filters in a Convolutional Neural Network (CNN) might detect different features. Different attention heads can specialize in different tasks, e.g., some might focus on syntactic relationships, others on semantic ones.
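A compact NumPy sketch of the multi-head pattern follows, under simplifying assumptions: random matrices stand in for the learned projections, and no masking is applied.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, rng):
    """X: (seq_len, d_model). Project into several heads, attend per head, then concatenate."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for _ in range(num_heads):
        # Each head has its own learned projections; random matrices stand in for them here.
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(Q @ K.T / np.sqrt(d_head))   # per-head (seq_len, seq_len) attention
        outputs.append(weights @ V)                    # per-head (seq_len, d_head) output
    concat = np.concatenate(outputs, axis=-1)          # (seq_len, d_model)
    Wo = rng.normal(size=(d_model, d_model))           # final output projection
    return concat @ Wo

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))                           # 4 tokens, model dimension 16
print(multi_head_attention(X, num_heads=4, rng=rng).shape)  # (4, 16)
```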
3.1.6. Supervised Fine-Tuning (SFT)
Supervised Fine-Tuning (SFT) is a common technique in LLM training where a pre-trained LLM is further trained on a smaller, task-specific dataset with labeled examples. The goal of SFT is to adapt the general knowledge of the pre-trained LLM to a specific downstream task (e.g., sentiment analysis, summarization, or, in this context, credibility-aware RAG). While powerful, SFT requires significant computational resources (GPUs) and a meticulously curated, high-quality labeled dataset for the specific task.
3.1.7. Causal Tracing
Causal tracing is a technique developed to understand the internal workings of neural networks, particularly transformers. It aims to quantify the causal contribution of specific internal states (e.g., hidden states, attention outputs) to the model's final output. It typically involves running the model twice: once normally to get a baseline probability for a target output, and once with noise injected into a specific internal state (or "intervention") to observe how the target output's probability changes. The difference in probabilities then quantifies the indirect effect (IE) or contribution of that internal state. It helps pinpoint which parts of the network are responsible for specific behaviors or predictions.
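The following is a toy PyTorch sketch of the run-twice-and-compare idea behind causal tracing: a forward hook perturbs one internal activation, and the change in the probability of a target output quantifies that component's indirect effect. The two-layer model and the noise intervention are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in network: the hidden layer plays the role of an "internal state".
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(1, 8)
target_class = 2

def target_prob() -> float:
    """Probability assigned to the target class in a forward pass."""
    with torch.no_grad():
        return torch.softmax(model(x), dim=-1)[0, target_class].item()

baseline = target_prob()

def perturb_hidden(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output:
    # here we inject Gaussian noise into the hidden activation (the "intervention").
    return output + 0.5 * torch.randn_like(output)

hook = model[0].register_forward_hook(perturb_hidden)
perturbed = target_prob()
hook.remove()

# Indirect effect of the perturbed component on the target output.
print(f"IE = baseline - perturbed = {baseline - perturbed:+.4f}")
```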
3.2. Previous Works
The paper contextualizes its work by reviewing existing research in misinformation detection and combating misinformation in RAG.
3.2.1. Misinformation Detection
- Non-LLM-based methods: These methods train models specifically to identify false or misleading information. Examples include using BERT (Devlin et al. 2019) to score document credibility (Kaliyar, Goswami, and Narang 2021) or Graph Neural Networks for detection (Vaibhav, Mandyam, and Hovy 2019).
- LLM-based methods: More recent approaches leverage LLMs themselves, often without additional training, to assess credibility. For instance, GPT-4 has been used for document credibility scoring (Pelrine et al. 2023), and LLM agents for iterative verification (Quelle and Bovet 2024). The CrAM paper adopts a similar LLM-based approach (using gpt-3.5-turbo-0125) to generate credibility scores in its experimental setup.
3.2.2. Combating Misinformation in RAG
The paper acknowledges that RAG's vulnerability to misinformation has been identified (Zou et al. 2024; Pan et al. 2023b,a), leading to various mitigation strategies:
- Query Augmentation and Voting: CAR (Weller et al. 2024) retrieves a larger set of documents and uses a voting mechanism to reduce misinformation impact. This approach, however, often involves multiple rounds of model inference, leading to inefficiency.
- Independent Response Aggregation: RobustRAG (Xiang et al. 2024) generates LLM responses for each document independently and then aggregates them using keyword-based or decoding-based algorithms. Similar to CAR, this can be inefficient due to multiple inferences.
- Supervised Fine-Tuning (SFT) with Credibility Scores: Several works (Hong et al. 2024; Pan et al. 2024) propose assigning credibility scores to retrieved documents and then fine-tuning LLMs to understand and leverage these scores during generation. An example is CAG (Pan et al. 2024), which directly incorporates credibility scores and documents into prompts for fine-tuning. While effective, SFT methods are resource-intensive and require specialized training data.
- Knowledge Conflict Resolution: Jin et al. (2024) train two LLMs, one for truthful answers and one for misleading answers, to better distinguish conflicting information. This also falls under fine-tuning approaches.
3.3. Technological Evolution
The evolution of LLM research related to factual accuracy can be broadly traced as follows:
- Early LLMs: Primarily focused on language generation and understanding, often suffering from hallucinations due to reliance on internal, sometimes outdated or biased, parametric knowledge.
- Emergence of RAG: To combat hallucinations and provide up-to-date information, RAG systems were developed. These systems augment LLMs with external knowledge retrieval capabilities, improving factual grounding.
- Discovery of RAG's Vulnerability: Researchers soon identified that RAG systems are susceptible to misinformation pollution in their external corpora, turning a solution into a new problem.
- Early Mitigation Strategies: Initial efforts focused on misinformation detection (pre-filtering documents) or SFT methods to teach LLMs how to weigh document credibility. However, pre-filtering risks losing relevant information, and SFT is resource-heavy.
- CrAM's Contribution: This paper fits into the latest stage by proposing a non-SFT, plug-and-play method that leverages the existing attention mechanism within LLMs to dynamically adjust document influence based on credibility scores. This represents a more efficient and adaptable solution compared to prior SFT-based approaches.
3.4. Differentiation Analysis
Compared to the main methods in related work, CrAM offers distinct innovations:
- Non-SFT (Plug-and-Play) vs. Supervised Fine-Tuning (SFT): The primary differentiation is CrAM's plug-and-play nature, meaning it does not require fine-tuning the LLM. This stands in stark contrast to CAG (Pan et al. 2024), Hong et al. (2024), and Jin et al. (2024), which necessitate extensive computational resources and carefully curated training data for SFT. CrAM's approach is more practical for real-world deployment where fine-tuning may be prohibitive.
- Dynamic Influence Adjustment vs. Document Exclusion: Unlike "Exclusion" methods that discard documents below a certain credibility threshold, CrAM dynamically scales down the influence of low-credibility documents while retaining them. This prevents the loss of potentially useful information that might be present alongside misinformation, as highlighted in the paper (Yoran et al. 2024).
- Targeted Attention Modification vs. Prompt Engineering: While "Prompt Based" methods attempt to inform the LLM about credibility via prompts, CrAM directly intervenes in the LLM's internal attention mechanism. By identifying and modifying influential attention heads, CrAM offers more granular and potentially more effective control over how the LLM processes document credibility, moving beyond surface-level prompt instructions.
- Efficiency vs. Multiple Inferences: Compared to methods like CAR (Weller et al. 2024) and RobustRAG (Xiang et al. 2024) that require multiple rounds of LLM inference or complex aggregation strategies, CrAM operates within a single inference pass by modifying the attention weights of existing attention heads, making it more computationally efficient.
- Leveraging Internal LLM Structure: CrAM uniquely exploits the heterogeneous roles of different attention heads within the transformer architecture. By identifying influential heads that contribute to misinformation-induced errors, it applies modifications precisely where they are most impactful, a nuance not explicitly addressed by other methods that either fine-tune the entire model or rely on external filtering.

In essence, CrAM provides an efficient, adaptable, and targeted approach to credibility-aware RAG by modifying the LLM's internal attention process, circumventing the limitations of SFT and brute-force document filtering.
4. Methodology
The CrAM method is designed to enable Large Language Models (LLMs) to automatically adjust their reliance on retrieved documents based on their credibility scores, particularly reducing the impact of low-credibility documents. It operates in a plug-and-play manner, meaning it does not require fine-tuning the LLM. The core idea revolves around modifying the attention weights of specific, influential attention heads within the LLM.
4.1. Principles
The fundamental principle behind CrAM is that not all parts of an LLM's attention mechanism contribute equally to processing information, especially when dealing with misinformation. Some attention heads might be more susceptible to misinformation or play a larger role in incorporating document information into the generated output. By identifying these influential attention heads and then dynamically scaling their attention weights based on the credibility scores of the documents, CrAM can selectively reduce the LLM's focus on low-credibility content without discarding it entirely. This targeted intervention aims to nudge the LLM towards generating more factual responses.
4.2. Core Methodology In-depth (Layer by Layer)
The CrAM methodology can be broken down into two main phases: Influential Head Identification and Attention Weight Modification, followed by its application in the CrAM Workflow.
4.2.1. Credibility-Aware RAG Formal Definition
The paper formally defines the objective of credibility-aware RAG. Given an LLM $\mathcal{M}$, a user query $q$, and a set of relevant documents $D = \{d_1, \dots, d_m\}$ associated with credibility scores $S = \{s_1, \dots, s_m\}$, the goal is to enable the LLM to automatically adjust the influence of these documents on the generated output according to their credibility scores. This can be expressed as finding a mechanism $f$ that maximizes output quality:

$
\max_{f} \ \mathrm{Metric}\big(\mathcal{M}(q, D, S; f)\big)
$

Where:
- $\mathcal{M}$: the Large Language Model being used.
- $q$: the user query.
- $D$: the set of retrieved documents relevant to the query $q$.
- $S$: the set of credibility scores corresponding to each document in $D$.
- $f$: the method or mechanism through which the credibility scores are integrated into the LLM's generation process; for CrAM, this is the attention modification.
- $\mathrm{Metric}(\cdot)$: a function that assesses the quality of the generated output, implicitly measuring how well the LLM adjusts to document credibility. In this work, accuracy on Question Answering (QA) tasks is used to approximate this metric, under the assumption that reduced impact from low-credibility documents should lead to higher QA accuracy.
4.2.2. Attention Weight Modification
The attention weight modification is the core mechanism by which CrAM regulates the influence of documents. It directly manipulates the attention weights within the LLM based on the credibility scores of the input documents.

- Tokenization and Credibility Score Normalization: First, the user query $q$ and the set of relevant documents $D$ are concatenated and tokenized into a single token sequence $[t_1, \dots, t_n]$, where $t_i$ is the $i$-th token. Each document $d_j$ has an associated credibility score $s_j$. To make these scores suitable for scaling attention weights, they are normalized to the range $[0, 1]$. For any token $t_i$, its normalized credibility score $\hat{s}_i$ is calculated as:

$
\hat{s}_i =
\begin{cases}
\dfrac{s_j - \min_k s_k}{\max_k s_k - \min_k s_k} & \text{if } t_i \text{ belongs to document } d_j \\
1 & \text{otherwise}
\end{cases}
$

Where:
  - $s_j$: the raw credibility score of the document $d_j$ to which token $t_i$ belongs.
  - $\min_k s_k$ and $\max_k s_k$: the minimum and maximum credibility scores among all documents in $D$.
  - If $t_i$ belongs to a document, its score is min-max scaled to $[0, 1]$, so the document with the lowest credibility gets a normalized score of 0 and the highest gets 1.
  - Otherwise (i.e., $t_i$ is part of the original query rather than a retrieved document), the normalized score is 1: query tokens are always fully attended to, since their credibility is implicitly assumed to be high (they come from the user).

  The resulting vector $\hat{s} = [\hat{s}_1, \dots, \hat{s}_n]$ contains the normalized credibility scores for the entire token sequence.

- Modified Attention Weight Matrix: For each attention head $h$ in the LLM, let $A^h$ denote its original attention weight matrix. This matrix has dimensions $n \times n$, where $n$ is the sequence length; each row $A^h_i$ holds the attention weights from token $t_i$ to all tokens in the sequence. To modify these weights, CrAM performs an element-wise multiplication with the normalized credibility score vector $\hat{s}$, followed by re-normalization:

$
\tilde{A}^h_i = \mathrm{Normalize}\left(A^h_i \odot \hat{s}\right) \quad (1)
$

Where:
  - $A^h_i$: the $i$-th row vector of the original attention weight matrix for head $h$, indicating how much token $t_i$ attends to every other token in the sequence.
  - $\odot$: element-wise multiplication (Hadamard product). Multiplying $A^h_i$ by $\hat{s}$ scales the attention weight from token $t_i$ to any token $t_j$ by $\hat{s}_j$, the normalized credibility score of the document containing $t_j$. If $t_j$ comes from a low-credibility document (low $\hat{s}_j$), the attention it receives is reduced.
  - $\mathrm{Normalize}(\cdot)$: after scaling, the weights in each row may no longer sum to one, so each row is re-normalized to sum to one, maintaining the probabilistic interpretation of attention weights.
  - $\tilde{A}^h_i$: the $i$-th row vector of the modified attention weight matrix for head $h$.

  The overall effect is that tokens from low-credibility documents receive lower attention from other tokens (both query and document tokens), thereby diminishing their influence on the LLM's subsequent computations and generated output. (A code sketch of this per-row rescaling is given below, after Figure 2.)

The following figure (Figure 2 from the original paper) illustrates the CrAM mechanism. It contrasts standard RAG, which produces an inaccurate answer when one of the retrieved documents contains misinformation, with CrAM, which down-weights the attention paid to the low-credibility document according to per-document credibility scores so that the credible document drives the answer.

Figure 2: Comparison of RAG and CrAM. CrAM adjusts the attention weights based on the credibility scores of each document.
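This is a minimal NumPy sketch of the credibility normalization and per-row attention rescaling described above (Equation (1)). The token-to-document mapping and variable names are illustrative assumptions; the actual method operates on the attention matrices inside the LLM rather than on standalone arrays.

```python
import numpy as np

def normalize_scores(doc_scores):
    """Min-max normalize raw document credibility scores to [0, 1]."""
    lo, hi = min(doc_scores), max(doc_scores)
    if hi == lo:                      # all documents equally credible
        return [1.0] * len(doc_scores)
    return [(s - lo) / (hi - lo) for s in doc_scores]

def token_credibility(doc_ids, doc_scores):
    """Build the per-token vector s_hat: query tokens (doc_id=None) get 1.0."""
    norm = normalize_scores(doc_scores)
    return np.array([1.0 if d is None else norm[d] for d in doc_ids])

def modify_attention_row(attn_row, s_hat, eps=1e-12):
    """Scale one attention row by token credibility and re-normalize it to sum to 1."""
    scaled = attn_row * s_hat
    return scaled / (scaled.sum() + eps)

# Toy example: 2 query tokens followed by tokens from documents 0 and 1.
doc_ids    = [None, None, 0, 0, 1, 1]     # which document each token belongs to
doc_scores = [10.0, 1.0]                  # "ideal setting": high-credibility=10, low=1
s_hat      = token_credibility(doc_ids, doc_scores)

attn_row = np.full(6, 1 / 6)              # a uniform attention row, for illustration
print(modify_attention_row(attn_row, s_hat))
# Tokens of the low-credibility document (normalized score 0) now receive ~0 attention.
```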
4.2.3. Influential Head Identification
The paper notes that different attention heads have varying patterns and functions, and thus different impacts on the LLM's output. The goal here is to identify which attention heads are most responsible for incorporating misinformation and thus should be subject to attention weight modification. CrAM adapts causal tracing (Meng et al. 2022) for this purpose.

The contribution of an attention head is quantified using an indirect effect (IE) measure:

- Baseline Probability Calculation: Given an LLM, a user query $q$, a set of documents $D$ that includes one misinformation document, and an incorrect answer $a^-$ to $q$ that is supported by the misinformation, the first step is to calculate the generation probability of this incorrect answer by the LLM without any modifications, denoted $p = \Pr(a^- \mid q, D)$. This represents the LLM's propensity to generate the incorrect answer when exposed to misinformation.
- Modified Probability Calculation: Next, a specific attention head $h$ is targeted for modification. The attention weights of only this head are modified using the method described in Section 4.2.2, with the misinformation document assigned a credibility score of 0 (lowest) and all other documents assigned a score of 1 (highest). This simulates maximally suppressing the misinformation document's influence through head $h$. The generation probability of the incorrect answer is then recalculated with this modified LLM, giving $p^*_h$, where the attention weight matrix of head $h$ is modified as per Equation (1).
- Quantifying Contribution (IE): The contribution of attention head $h$ to generating the incorrect answer is then quantified as the difference between these two probabilities, known as the indirect effect: $\mathrm{IE}_h = p - p^*_h$. A positive $\mathrm{IE}_h$ means that modifying head $h$ decreased the probability of generating the incorrect answer, indicating that the head originally contributed to the error by attending to the misinformation; a larger positive $\mathrm{IE}_h$ implies a greater contribution.

To ensure robustness, this IE calculation is performed over a small, dedicated dataset (separate from test data) containing examples of misinformation leading to incorrect answers. The average IE of each attention head is computed across this dataset, the heads are ranked by average IE in descending order, and the top-ranked ones are selected as influential attention heads. A schematic sketch of this ranking loop follows.
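The sketch below shows only the averaging and ranking logic of the offline head selection; `prob_of_answer`, which would run the LLM and return the generation probability of the incorrect answer (optionally with one head's attention modified as in Equation (1)), is a hypothetical placeholder, not the authors' code.

```python
from statistics import mean

def rank_influential_heads(dataset, heads, prob_of_answer, top_k):
    """Rank attention heads by average indirect effect (IE) over a small dataset.

    dataset: iterable of examples, each pairing a query and documents (one of which
        contains misinformation) with the incorrect answer that the misinformation supports.
    prob_of_answer(example, modified_head=None): hypothetical callable returning the
        LLM's probability of generating the incorrect answer, optionally with one head's
        attention modified (misinformation document scored 0, all other documents 1).
    """
    ie_per_head = {}
    for head in heads:
        ies = []
        for example in dataset:
            p_original = prob_of_answer(example)                      # baseline probability
            p_modified = prob_of_answer(example, modified_head=head)  # head suppressed
            ies.append(p_original - p_modified)                       # IE of this head here
        ie_per_head[head] = mean(ies)
    # Heads with the largest positive average IE contributed most to the wrong answer.
    ranked = sorted(ie_per_head, key=ie_per_head.get, reverse=True)
    return ranked[:top_k]
```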
4.2.4. CrAM Workflow
The overall CrAM workflow integrates these two components:

- Offline Influential Head Identification:
  - A small dataset containing misinformation-polluted documents (where misinformation leads to incorrect answers) is used.
  - For each attention head in the LLM, the average IE is calculated as described above (Section 4.2.3).
  - All attention heads are ranked by their average IE in descending order.
  - The top-ranked heads are selected as the influential attention heads that will be modified during inference. The number of heads to select is a hyperparameter determined on a validation set.
- Online Inference with Attention Modification:
  - Given any user query, the relevant documents are retrieved along with their credibility scores.
  - The attention weights of only the previously identified influential attention heads are modified using the method described in Section 4.2.2.
  - The LLM then generates its final answer using these modified attention weights, significantly reducing the impact of low-credibility documents on the generated output.

This workflow ensures that the costly influential head identification step is performed only once offline, while the attention modification during online inference is efficient, targeting only the most relevant attention heads.
5. Experimental Setup
5.1. Datasets
The experiments are conducted on two widely used open-domain Question Answering (QA) datasets:
- Natural Questions (NQ) (Kwiatkowski et al. 2019): This dataset consists of real user questions issued to Google search, paired with answers found in Wikipedia articles. It focuses on finding short or long answers directly from a provided document.
- TriviaQA (Joshi et al. 2017): This dataset contains questions from trivia and quiz-league websites, paired with evidence documents from Wikipedia and other web sources. It is known for its complex questions and requires aggregating information from multiple sources.

These datasets are well-suited for evaluating RAG performance because they require LLMs to retrieve and synthesize information from external documents to answer questions. They are also standard benchmarks in the QA field, allowing for fair comparison with other methods.
5.1.1. Document Preparation
The paper carefully prepares both high-credibility and low-credibility (misinformation) documents to evaluate the robustness of the proposed method.
- High-credibility documents: These are collected by retrieving relevant documents from an external corpus. bge-large-en-v1.5 is used as an embedding model to retrieve an initial set of candidate documents from a Wikipedia dump (dated December 30, 2018, as used in Karpukhin et al. 2020). bge-reranker-large is then applied to rank these candidates, and the top four documents are selected as high-credibility inputs. This ensures that these documents are genuinely relevant and factually accurate according to a reliable source.
- Low-credibility documents (Misinformation): These are specifically generated to contain misinformation. gpt-3.5-turbo-0125 is used to generate them.
  - The LLM is prompted to create news-style pieces that contain misinformation supporting an incorrect answer to a given question.
  - For each question, three distinct low-credibility documents are generated, all supporting the same incorrect answer. This controlled generation allows for precise study of misinformation impact.
  - Example of misinformation (from the paper's discussion): "The first person to win the Nobel Prize in Physics was not Roentgen, but Einstein." This type of misinformation includes both an incorrect assertion and a denial of correct information.
In-context corpus composition:
Instead of directly injecting low-credibility documents into the entire RAG corpus, the study combines generated low-credibility documents with retrieved high-credibility documents for the LLM's input. This approach, referred to as 4 high + 1 low (e.g., four high-credibility documents plus one low-credibility document), provides granular control over the amount of misinformation and allows for a more focused evaluation of its impact.
5.2. Evaluation Metrics
For Question Answering (QA) tasks, the paper employs two standard metrics: Exact Match (EM) and F1 Score.
5.2.1. Exact Match (EM)
- Conceptual Definition: Exact Match (EM) is a strict metric that measures whether the LLM's generated answer is identical to one of the ground-truth answers. It is case-insensitive and ignores leading/trailing whitespace and common punctuation. It indicates whether the model can produce a perfectly correct answer.
- Mathematical Formula:
$
\mathrm{EM} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\left(\text{predicted\_answer}_i \in \text{gold\_answers}_i\right)
$
Where:
  - $N$: the total number of questions in the dataset.
  - $\text{predicted\_answer}_i$: the answer generated by the model for question $i$.
  - $\text{gold\_answers}_i$: the set of acceptable ground-truth answers for question $i$.
  - $\mathbb{I}(\cdot)$: the indicator function, which returns 1 if the condition inside is true, and 0 otherwise.
- Symbol Explanation: For each question, if the predicted answer perfectly matches any of the ground-truth answers, it scores 1; otherwise, it scores 0. The EM score is the average of these binary scores across all questions. (A small scoring sketch follows this list.)
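Here is a small Python sketch of EM scoring with the usual answer normalization (lowercasing, stripping punctuation and articles); the exact normalization rules are an assumption borrowed from common open-domain QA evaluation scripts, not quoted from the paper.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predicted: str, gold_answers: list[str]) -> int:
    """1 if the normalized prediction equals any normalized gold answer, else 0."""
    return int(any(normalize(predicted) == normalize(g) for g in gold_answers))

def em_score(predictions: list[str], gold: list[list[str]]) -> float:
    """Average the per-question binary EM over the whole dataset."""
    return sum(exact_match(p, g) for p, g in zip(predictions, gold)) / len(predictions)

print(em_score(["Wilhelm Roentgen"], [["Roentgen", "Wilhelm Roentgen"]]))  # 1.0
```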
5.2.2. F1 Score
- Conceptual Definition: The F1 Score is a more lenient metric than EM. It treats both the predicted answer and the ground-truth answer as "bags of words" (sets of tokens) and calculates the overlap between them. It is the harmonic mean of Precision and Recall, where Precision measures how many of the predicted tokens are correct, and Recall measures how many of the correct tokens were captured by the prediction. It is particularly useful when answers can be phrased in multiple ways or contain partially correct information.
- Mathematical Formula:
$
\mathrm{F1} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}
$
Where:
  - $N$: the total number of questions.
  - For each question $i$, $\mathrm{Precision}_i$ and $\mathrm{Recall}_i$ are calculated from the overlap of tokens between the predicted answer and the best-matching ground-truth answer (if multiple exist).
- Symbol Explanation: The F1 Score balances Precision (avoiding false positives) and Recall (avoiding false negatives). A higher F1 Score indicates a better balance between correctly identifying relevant tokens and not including irrelevant ones. (A small scoring sketch follows this list.)
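A matching sketch of the token-level F1 used for QA follows; when several gold answers exist, the best-matching one is taken. The simple whitespace tokenization is an illustrative assumption.

```python
from collections import Counter

def token_f1(predicted: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and one gold answer."""
    pred_tokens, gold_tokens = predicted.lower().split(), gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_f1(predicted: str, gold_answers: list[str]) -> float:
    """Score against the best-matching gold answer when several are acceptable."""
    return max(token_f1(predicted, g) for g in gold_answers)

print(round(best_f1("Wilhelm Conrad Roentgen", ["Wilhelm Roentgen"]), 3))  # 0.8
```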
5.3. Credibility Scores Generation
The paper uses two methods to assign credibility scores to documents:
- Ideal Setting: This represents a perfect scenario where credibility scores are known definitively. High-credibility documents are assigned a score of 10, and low-credibility documents (containing misinformation) are assigned a score of 1. This binary-like assignment (after min-max normalization, the scores become 1 and 0, respectively) allows for clear evaluation of the method's potential under optimal conditions.
- GPT Setting: This simulates a more realistic scenario where credibility scores are estimated by another LLM. gpt-3.5-turbo-0125 is employed to directly generate a credibility score for each document, with prompts designed to instruct GPT to provide these scores (an illustrative sketch is given below). The distribution of these GPT-generated scores is provided in Appendix C of the original paper, showing a more continuous and less binary range of scores.
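To illustrate the GPT setting, here is a hedged sketch of requesting a per-document credibility score through the OpenAI Python client; the prompt wording and the 0-10 scale are illustrative assumptions, not the authors' exact prompt.

```python
from openai import OpenAI

def credibility_score(document: str, model: str = "gpt-3.5-turbo-0125") -> float:
    """Ask the model to rate a document's credibility; prompt and scale are illustrative."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = (
        "Rate the credibility of the following document on a scale from 0 (certainly "
        "misinformation) to 10 (highly credible). Reply with a single number only.\n\n"
        f"Document: {document}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())

# Example (requires API access):
# print(credibility_score("The first Nobel Prize in Physics was won by Einstein, not Roentgen."))
```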
5.4. Baselines
The CrAM model is compared against four types of methods to assess its performance:
- Naive RAG: This is the standard RAG pipeline. It simply retrieves documents and feeds them to the LLM without any mechanism to account for or combat misinformation. It serves as a baseline showing the performance degradation when misinformation is present.
- Prompt Based: This is a non-SFT method that attempts to inform the LLM about document credibility by including the credibility scores directly in the prompt alongside the documents. The LLM is then expected to implicitly adjust its behavior based on these prompt instructions.
- Exclusion: This is another non-SFT method where documents with credibility scores below a certain threshold are completely removed before being fed to the LLM. This method is not compared in the Ideal Setting because the binary nature of ideal scores (10 or 1) would make thresholding trivial (e.g., any threshold above 1 would discard all misinformation, reducing the setup to an ideal RAG scenario without misinformation).
- CAG (Credibility-Aware Generation): Proposed by Pan et al. (2024), CAG is an SFT-based method. It directly incorporates credibility scores and documents into prompts and then fine-tunes an LLM (specifically, Llama2-13B in their work) to explicitly learn how to leverage these scores for better understanding and generation in the presence of misinformation. This serves as a comparison against fine-tuning approaches.
5.5. Hyperparameters
- Data points for IE calculation: 100 randomly selected data points from each dataset are used to calculate the average Indirect Effect (IE) of every attention head during the influential head identification phase.
- Validation set for head selection: Another validation set of 100 data points from each dataset is used to determine the optimal number of top-ranked influential attention heads to include in the final modified set. This helps tune the CrAM configuration.
6. Results & Analysis
6.1. Core Results Analysis
The experiments aimed to evaluate CrAM's effectiveness in mitigating misinformation in RAG across different LLMs and settings.
6.1.1. Comparison with Non-SFT Methods
The following are the results from Table 1 and Table 2 of the original paper, comparing CrAM with other non-SFT methods in both Ideal and GPT credibility settings. The common experimental setup is 4√+1x, meaning four high-credibility documents plus one low-credibility document.
The following are the results from Table 1 of the original paper:
| Model | In-context corpus | Method | NQ EM | NQ F1 | TriviaQA EM | TriviaQA F1 |
|---|---|---|---|---|---|---|
| Qwen1.5-7B | 0√ | Naive LLM | 7.20 | 16.41 | 28.00 | 38.23 |
| Qwen1.5-7B | 4√ | Naive RAG | 27.60 | 39.08 | 55.30 | 66.85 |
| Qwen1.5-7B | 4√+1x | Naive RAG | 10.50 | 20.71 | 25.00 | 35.63 |
| Qwen1.5-7B | 4√+1x | Prompt Based | 12.20 | 22.26 | 27.40 | 37.98 |
| Qwen1.5-7B | 4√+1x | CrAM | 29.10 (+16.90) | 41.02 (+18.76) | 52.90 (+25.50) | 64.16 (+26.18) |
| Llama2-13B | 0√ | Naive LLM | 20.30 | 28.59 | 50.40 | 57.56 |
| Llama2-13B | 4√ | Naive RAG | 28.90 | 39.98 | 62.50 | 71.03 |
| Llama2-13B | 4√+1x | Naive RAG | 11.90 | 19.97 | 28.00 | 36.22 |
| Llama2-13B | 4√+1x | Prompt Based | 12.50 | 22.94 | 23.10 | 32.70 |
| Llama2-13B | 4√+1x | CrAM | 33.60 (+21.10) | 44.62 (+21.68) | 59.90 (+31.90) | 67.11 (+30.89) |
| Llama3-8B | 0√ | Naive LLM | 20.60 | 30.58 | 55.70 | 62.67 |
| Llama3-8B | 4√ | Naive RAG | 33.10 | 45.66 | 64.30 | 73.68 |
| Llama3-8B | 4√+1x | Naive RAG | 16.00 | 26.16 | 36.80 | 47.09 |
| Llama3-8B | 4√+1x | Prompt Based | 29.90 | 39.69 | 53.50 | 63.01 |
| Llama3-8B | 4√+1x | CrAM | 36.90 (+7.00) | 48.45 (+8.76) | 64.40 (+10.90) | 73.49 (+10.48) |

(Values in parentheses are CrAM's gains over the best non-CrAM result in the same 4√+1x setting.)
The following are the results from Table 2 of the original paper:
| Model | In-context corpus | Method | NQ EM | NQ F1 | TriviaQA EM | TriviaQA F1 |
|---|---|---|---|---|---|---|
| Qwen1.5-7B | 0√ | Naive LLM | 7.20 | 16.41 | 28.00 | 38.23 |
| Qwen1.5-7B | 4√ | Naive RAG | 27.60 | 39.08 | 55.30 | 66.85 |
| Qwen1.5-7B | 4√+1x | Naive RAG | 10.50 | 20.71 | 25.00 | 35.63 |
| Qwen1.5-7B | 4√+1x | Prompt Based | 12.50 | 22.98 | 29.70 | 40.18 |
| Qwen1.5-7B | 4√+1x | Exclusion | 21.60 | 32.56 | 49.50 | 61.03 |
| Qwen1.5-7B | 4√+1x | CrAM | 23.10 (+1.50) | 34.84 (+2.28) | 52.10 (+2.60) | 63.76 (+2.73) |
| Llama2-13B | 0√ | Naive LLM | 20.30 | 28.59 | 50.40 | 57.56 |
| Llama2-13B | 4√ | Naive RAG | 28.90 | 39.98 | 62.50 | 71.03 |
| Llama2-13B | 4√+1x | Naive RAG | 11.90 | 19.97 | 28.00 | 36.22 |
| Llama2-13B | 4√+1x | Prompt Based | 11.20 | 21.62 | 20.50 | 30.09 |
| Llama2-13B | 4√+1x | Exclusion | 23.70 | 34.00 | 54.40 | 62.37 |
| Llama2-13B | 4√+1x | CrAM | 25.10 (+1.40) | 35.56 (+1.56) | 56.20 (+1.80) | 64.03 (+1.66) |
| Llama3-8B | 0√ | Naive LLM | 20.60 | 30.58 | 55.70 | 62.67 |
| Llama3-8B | 4√ | Naive RAG | 33.10 | 45.66 | 64.30 | 73.68 |
| Llama3-8B | 4√+1x | Naive RAG | 16.00 | 26.16 | 36.80 | 47.09 |
| Llama3-8B | 4√+1x | Prompt Based | 24.20 | 34.10 | 49.50 | 58.59 |
| Llama3-8B | 4√+1x | Exclusion | 26.60 | 38.44 | 57.70 | 67.33 |
| Llama3-8B | 4√+1x | CrAM | 30.70 (+4.10) | 41.71 (+3.27) | 62.20 (+4.50) | 70.70 (+3.37) |
Observations:
- Significant Gains over Baselines: In both Ideal and GPT settings, CrAM consistently and significantly outperforms Naive RAG and Prompt Based methods across all LLMs (Qwen1.5-7B, Llama2-13B, Llama3-8B) and datasets (NQ, TriviaQA). For instance, in the Ideal Setting on TriviaQA, CrAM with Llama2-13B achieves a 31.90-point EM gain over the strongest non-CrAM baseline.
- Effectiveness with Realistic Scores: Even with GPT-generated credibility scores (a more realistic scenario), CrAM maintains its superiority over Naive RAG and Prompt Based, showing its practical applicability. It also outperforms Exclusion, which discards documents, demonstrating the benefit of nuanced attention adjustment over hard filtering.
- Surpassing the Misinformation-Free Setting: Notably, under the Ideal Setting with 4√+1x documents, CrAM's performance sometimes exceeds that of Naive RAG with 4√ documents (no misinformation). This counter-intuitive result is explained by the generated misinformation sometimes containing denials of correct information, allowing the LLM to reuse the correct information after CrAM suppresses the misleading denial. This highlights CrAM's ability to effectively neutralize misinformation while still allowing the LLM to extract the truth.
6.1.2. Comparison with SFT-based Method
The following figure (Figure 3 from the original paper) presents the performance comparison between CrAM and the SFT-based CAG-13B model, regarding the varying number of low-credibility documents under the ideal setting.

The figure plots how the F1 scores of CrAM and CAG-13B on Natural Questions (NQ, left panel) and TriviaQA (right panel) change as the number of misleading documents increases, with F1 declining as more misleading documents are added.
Figure 3: Performance comparison of CrAM and CAG-13B regarding the varying number of documents containing misinformation under ideal setting.
Observations:
- Consistent Outperformance: CrAM (specifically, the Llama2-13B-based CrAM) consistently and remarkably outperforms CAG-13B (which is also Llama2-13B based) in terms of F1 Score across both NQ and TriviaQA, even as the number of low-credibility documents increases from 1 to 3.
- Efficiency and Effectiveness: This finding is crucial because CAG requires supervised fine-tuning, which is computationally expensive and data-intensive. CrAM, being a non-SFT method, achieves superior results without these overheads, demonstrating its efficiency and effectiveness.
6.2. Ablation Studies / Parameter Analysis
6.2.1. Effect of Number of Low-credibility Documents
The paper investigates how varying the quantity of misinformation affects CrAM's performance. The following figure (Figure 4 from the original paper) shows the performance change on NQ regarding the varying number of documents with misinformation.

The figure compares CrAM, Prompt Based, and Naive RAG under the Ideal and GPT settings as the number of misinformation documents varies, plotting how the EM values change and highlighting the performance gaps between the methods.
Figure 4: Performance change on NQ regarding the varying number of documents with misinformation.
Observations:
- Robustness to Misinformation Load: CrAM consistently outperforms Prompt Based and Naive RAG as the number of low-credibility documents increases from 1x to 3x, in both the Ideal and GPT settings.
- Smaller Performance Drop: CrAM exhibits a significantly smaller performance degradation than the other methods when more low-credibility documents are introduced, demonstrating its robustness and scalability in handling increasing amounts of misinformation.
6.2.2. Effect of Dataset Size on Attention Heads Selection
The process of identifying influential attention heads uses a small subset of the data. The following figure (Figure 5 from the original paper) shows the performance on NQ and TriviaQA regarding the dataset size for determining the influential attention head changes.

The figure shows CrAM's EM scores on Natural Questions (NQ, left panel) and TriviaQA (right panel) for different dataset sizes; as the dataset size used for head identification grows, CrAM's EM scores on both datasets remain relatively stable.
Figure 5: Performance on NQ and TriviaQA regarding the dataset size for determining the influential attention head changes.
Observations:
- Minor Impact of Dataset Size: While there are minor fluctuations in performance, the overall impact of the number of data points used for influential head identification is not substantial (a maximum difference of 4% in EM). This indicates that CrAM's head selection mechanism is relatively stable and does not require a massive dataset, contributing to its efficiency.
6.2.3. Analysis on Number of Selected Attention Heads
The selection of influential attention heads is a critical component of CrAM. The following figure (Figure 6 from the original paper) shows the performance on NQ in ideal setting regarding the varying number of selected attention heads.

The figure shows, on the Natural Questions (NQ) dataset, how the EM score changes with the number of top-ranked attention heads that are modified; the EM score fluctuates as more heads are modified, peaking at roughly 0.35.
Figure 6: Performance on NQ in ideal setting regarding the varying number of selected attention heads.
Observations:
- Sensitivity at Extremes: The model's EM drops sharply when very few or all attention heads are selected for modification. This suggests that a targeted approach is necessary: modifying too few heads might miss critical influential heads, while modifying all of them might interfere with beneficial attention patterns.
- Stable Performance in Mid-Range: Performance is relatively stable when a moderate number of attention heads is selected. This implies that only a subset of heads is truly influential for misinformation handling, and once these are covered, additional modifications bring diminishing returns or even hurt performance.

To understand why this happens, the paper analyzes the distribution of Indirect Effect (IE) values for all attention heads. The following figure (Figure 7 from the original paper) shows the density distribution of IE of all the attention heads in Llama3-8B.
The figure is a density plot of the IE values: the horizontal axis is the IE value and the vertical axis is the density, showing that the distribution is concentrated near zero.
Figure 7: Density distribution of IE of all the attention heads in Llama3-8B.
Observations:
- Normal-like Distribution of IE: The density distribution of IE values (contributions to incorrect answers) approximates a normal distribution centered around 0.
- Sparse Influence: The majority of attention heads have IE values concentrated near 0, meaning most heads have only a minor impact on whether misinformation leads to an incorrect answer. Only heads with IE values significantly far from zero (either positive or negative) have a substantial impact. This supports the rationale for selective attention modification, as only a few influential heads need to be targeted.
6.2.4. Ablation Study
To validate the design choices of CrAM, an ablation study is conducted. The following are the results from Table 3 of the original paper:
| Model | Method | NQ EM | TriviaQA EM |
|---|---|---|---|
| Qwen1.5-7B | CrAM | 29.10 | 52.90 |
| Qwen1.5-7B | CrAM-all | 27.20 (-1.90) | 50.60 (-2.30) |
| Qwen1.5-7B | Naive RAG | 10.50 (-18.60) | 25.00 (-27.90) |
| Llama2-13B | CrAM | 33.60 | 59.90 |
| Llama2-13B | CrAM-all | 29.50 (-4.10) | 59.50 (-0.40) |
| Llama2-13B | Naive RAG | 11.90 (-21.70) | 28.00 (-27.90) |
| Llama3-8B | CrAM | 36.90 | 64.40 |
| Llama3-8B | CrAM-all | 22.40 (-14.50) | 51.50 (-12.90) |
| Llama3-8B | Naive RAG | 16.00 (-20.90) | 36.80 (-27.60) |
Variants:
- CrAM-all: This variant removes the influential head identification step and applies attention weight modification to all attention heads in the LLM.
- Naive RAG: This is equivalent to disabling the attention weight modification mechanism in CrAM entirely.
Observations:
- Necessity of Influential Head Selection: CrAM-all shows noticeable performance drops compared to the full CrAM model across all LLMs and datasets. For Llama3-8B, the decrease is substantial (e.g., 14.5 percentage points on NQ). This empirically validates the importance of identifying and targeting only the influential attention heads, supporting the idea that indiscriminate modification can harm performance.
- Necessity of Attention Weight Modification: Disabling the attention weight modification entirely (i.e., Naive RAG) leads to a dramatic performance drop (e.g., over 27.5 percentage points on TriviaQA for all three LLMs) compared to CrAM. This strongly confirms that dynamically adjusting attention weights based on credibility scores is crucial for combating misinformation.

In summary, the ablation study demonstrates that both components of CrAM, the identification of influential attention heads and the credibility-aware attention weight modification, are essential for its superior performance.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduces CrAM (Credibility-aware Attention Modification), a novel and effective plug-and-play method designed to combat misinformation within Retrieval-Augmented Generation (RAG) systems for Large Language Models (LLMs). CrAM addresses the critical challenge of misinformation pollution in external documents without requiring extensive fine-tuning. Its core contribution lies in its two-stage approach: first, it identifies influential attention heads within the LLM that are most susceptible to misinformation using a modified causal tracing technique; second, it modifies the attention weights of these specific heads based on the credibility scores of retrieved documents, thereby reducing the impact of low-credibility information.
Extensive experiments on Natural Questions (NQ) and TriviaQA datasets, utilizing Llama2-13B, Llama3-8B, and Qwen1.5-7B, demonstrate CrAM's significant efficacy. It improves Exact Match (EM) performance by over 20% compared to vanilla RAG, even outperforming supervised fine-tuning (SFT)-based methods like CAG. The method also exhibits robustness to varying amounts of misinformation and sensitivity analyses confirm the importance of its targeted attention modification strategy. CrAM offers an efficient and practical solution for enhancing the trustworthiness of LLMs in RAG settings.
7.2. Limitations & Future Work
The paper does not include a dedicated "Limitations" section, but some points can be inferred:
- Reliance on Credibility Estimators: CrAM's effectiveness inherently relies on the quality of the credibility scores provided. If the credibility estimator (e.g., GPT-3.5 in the GPT Setting) is inaccurate or biased, CrAM's performance would be compromised. The paper shows performance differences between the Ideal and GPT settings, highlighting this dependency. Future work could focus on more robust and reliable credibility score generation methods.
- Computational Cost of Influential Head Identification: While plug-and-play at inference time, the influential head identification step involves calculating Indirect Effects (IEs) for all attention heads over a dataset. Although performed offline and on a small dataset, this process still requires computational resources and specific data containing misinformation-answer pairs. Optimizing this identification process or making it more adaptive could be a direction.
- Generalizability of Influential Heads: The paper implicitly assumes that the influential attention heads identified for QA tasks on NQ and TriviaQA (and with specific misinformation types) generalize across different tasks, LLM variants, and types of misinformation. Further investigation into the task- or domain-specificity of these influential heads could be valuable.
- Understanding Attention Head Specialization: While the paper leverages the idea that different attention heads have different functions, a deeper, more mechanistic understanding of why certain heads become influential in propagating misinformation could lead to more sophisticated and potentially model-agnostic intervention strategies.
7.3. Personal Insights & Critique
The CrAM paper presents an elegant and practically significant solution to a pressing problem in LLM deployment.
- Elegance of the Solution: The idea of credibility-aware attention modification is intuitively appealing. Rather than complex fine-tuning or blunt document discarding, directly manipulating the LLM's internal attention mechanism to reflect external credibility scores is a smart and targeted approach. It respects the existing LLM architecture while adding a crucial layer of control, and the plug-and-play nature is a major advantage for real-world applications.
- Leveraging Interpretability Research: The work cleverly builds upon prior research into attention head interpretability and causal tracing. By identifying influential heads, CrAM avoids modifying the entire network, which improves efficiency and potentially preserves other beneficial behaviors of the LLM. This demonstrates a valuable synergy between LLM interpretability and robustness research.
- Nuanced Handling of Misinformation: The ability to scale down influence rather than simply discard documents is a key strength. Misinformation is rarely black and white; a document might contain accurate information alongside a misleading claim. CrAM allows the LLM to still extract value from the credible parts of a document while downplaying the unreliable parts. This is supported by the observation that CrAM with misinformation can sometimes outperform Naive RAG without misinformation, implying effective suppression and extraction.
- Potential for Broader Application: The core idea of modifying attention based on external signals has implications beyond credibility. One could imagine similar mechanisms for sarcasm detection, sentiment weighting, or source reliability in other NLP tasks. For instance, if an LLM is processing text from multiple sources, some known to be biased, a similar attention modification could be applied.
- Critique on IE Calculation: While causal tracing is a powerful tool, its application in determining IE relies on specific choices (e.g., assigning a score of 0 to the misinformation document and 1 to the others when computing the modified probability). The robustness of this IE calculation to different types of misinformation, different LLM architectures, or varying baseline credibilities could be further explored. Also, the choice of "incorrect answer supported by misinformation" for IE calculation is specific; how would CrAM behave if misinformation leads to a subtly biased but not strictly "incorrect" answer?
- Real-world Credibility Score Generation: The GPT Setting is a step towards realism, but real-world credibility scoring is an active research area with its own challenges (e.g., bias in GPT itself, scalability, domain specificity). The practical success of CrAM will largely depend on advances in generating these credibility scores accurately and efficiently.

Overall, CrAM presents a compelling and practical advancement in making RAG systems more resilient to misinformation, marking a significant step towards more trustworthy AI systems.