
Parameter-Aware Contrastive Knowledge Editing: Tracing and Rectifying based on Critical Transmission Paths

Published: 01/01/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This paper introduces critical transmission paths for knowledge editing in large language models, enhancing parameter updates by capturing key information flows. The proposed parameter-aware contrastive rectifying algorithm improves editing performance, validated across multiple datasets and LLMs.

Abstract

Large language models (LLMs) have encoded vast amounts of knowledge in their parameters, but the acquired knowledge can sometimes be incorrect or outdated over time, necessitating rectification after pre-training. Traditional localized methods in knowledge-based model editing (KME) typically assume that knowledge is stored in particular intermediate layers. However, recent research suggests that these methods do not identify the optimal locations for parameter editing, as knowledge gradually accumulates across all layers in LLMs during the forward pass rather than being stored in specific layers. This paper, for the first time, introduces the concept of critical transmission paths into KME for parameter updating. Specifically, these paths capture the key information flows that significantly influence the model predictions for the editing process. To facilitate this process, we also design a parameter-aware contrastive rectifying algorithm that considers less important paths as contrastive examples. Experiments on two prominent datasets and three widely used LLMs demonstrate the superiority of our method in editing performance.


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

The title of the paper is "Parameter-Aware Contrastive Knowledge Editing: Tracing and Rectifying based on Critical Transmission Paths". The central topic is knowledge editing in Large Language Models (LLMs), specifically focusing on identifying and modifying crucial information flow paths within the model's parameters to update knowledge effectively.

1.2. Authors

The authors are Songlin Zhai, Yuan Meng, Yuxin Zhang, and Guilin Qi. They are affiliated with the School of Computer Science and Engineering, Southeast University, China. Their research backgrounds appear to be in areas related to natural language processing, large language models, and knowledge editing, given the subject matter of the paper.

1.3. Journal/Conference

The paper is published at ICLR (International Conference on Learning Representations), specifically in the 2025 edition, as indicated by "In ICLR." ICLR is a highly prestigious and influential conference in the fields of artificial intelligence, machine learning, and deep learning. Its publication status signifies that the work has undergone rigorous peer review and is recognized as a significant contribution to the field.

1.4. Publication Year

The paper was published in 2025.

1.5. Abstract

Large Language Models (LLMs) store extensive knowledge within their parameters, but this knowledge can become incorrect or outdated, necessitating updates post-pre-training. Traditional Knowledge-based Model Editing (KME) methods often assume knowledge is localized to specific intermediate layers. However, recent findings suggest that knowledge accumulates across all layers during the forward pass, making layer-specific editing suboptimal. This paper introduces, for the first time, the concept of critical transmission paths into KME. These paths represent key information flows that significantly impact model predictions for the editing task. To facilitate this, the authors design a parameter-aware contrastive rectifying algorithm that uses less important paths as contrastive examples. Experimental results on two datasets and three LLMs demonstrate the method's superior editing performance.

The original source link is /files/papers/694b526e769f2826079b70f1/paper.pdf. Given the listed publication timestamp (UTC: 2025-01-01T00:00:00.000Z) and the ICLR 2025 venue, this is likely a pre-print or an accepted paper released ahead of its official conference publication.

2. Executive Summary

2.1. Background & Motivation

The core problem this paper aims to solve is the effective and efficient rectification of outdated or incorrect knowledge within Large Language Models (LLMs) without adversely affecting other stored knowledge. LLMs, serving as vast knowledge repositories, inevitably acquire erroneous or time-sensitive information during their pre-training from massive corpora. This knowledge can become incorrect (e.g., factual inaccuracies) or outdated (e.g., changes in real-world facts).

This problem is critical because LLMs are becoming cornerstones of Natural Language Processing (NLP) and are increasingly deployed in real-world applications where factual accuracy and currency are paramount. Relying on LLMs with incorrect information can lead to unreliable outputs, propagate misinformation, and erode user trust.

Existing approaches, particularly in Knowledge-based Model Editing (KME), often suffer from several limitations:

  1. Computational Cost: Fine-tuning the entire LLM for knowledge updates is computationally expensive and can lead to overfitting or catastrophic forgetting (where new knowledge replaces old, unrelated knowledge).

  2. Suboptimal Localization: Traditional KME methods often rely on causal tracing to identify "localized" parameters in specific intermediate layers (e.g., FFN layers) assumed to store the knowledge. However, recent research (Hase et al., 2023) indicates that these localized results do not statistically correlate with optimal intervention points. Knowledge in LLMs is distributed and accumulates across all layers during the forward pass, not just specific ones. Focusing on a narrow range of layers might miss crucial parameters or lead to suboptimal editing.

  3. Entangled Knowledge: Knowledge in LLMs is highly entangled, making it challenging to modify a specific piece of information without inadvertently altering unrelated knowledge (a phenomenon known as locality degradation).

    The paper's entry point and innovative idea revolve around addressing the suboptimal localization issue. Instead of layer-based localization, it proposes introducing critical transmission paths into KME. These paths are defined as specific sequences of model parameters and connections across all layers that describe the information accumulation process from input to output. By identifying and editing these critical paths, the method aims to target knowledge updates more precisely and effectively.

2.2. Main Contributions / Findings

The paper's primary contributions are:

  1. Introduction of Critical Transmission Paths: For the first time in KME, the concept of critical transmission paths is introduced for parameter updating. This addresses the limitations of traditional layer-based localization by acknowledging that knowledge accumulates across all layers. The authors develop a perturbation-based path importance estimation method to identify these paths, which capture key information flows significantly influencing model predictions. A parameter packing strategy is proposed to reduce the search space for these paths.

  2. Parameter-Aware Contrastive Rectification Algorithm: A novel algorithm is designed to improve how parameters are rectified. It treats identified critical paths as positive examples (parameters needing updates) and insignificant paths as negative examples (parameters that should not be modified for the current edit). This contrastive loss aims to enhance editing effectiveness by demonstrating the consequences of improper rectifications and ensuring unrelated knowledge remains undisturbed.

  3. Superior Editing Performance and Efficiency: Extensive experiments on two prominent datasets (ZsRE and CounterFact) and three widely used LLMs (GPT-J (6B), Llama2 (7B), Llama3 (8B)) demonstrate that the proposed method significantly outperforms nine strong baselines across most evaluation metrics (Efficacy, Generality, Locality), especially under challenging consecutive editing scenarios. It also shows competitive editing efficiency.

    Key conclusions and findings reached by the paper include:

  • Knowledge is Distributed: The analysis of critical paths confirms that all hidden layers contribute to knowledge editing, highlighting the limitation of prior methods that focus only on specific layers.

  • Middle Layers are More Influential: While all layers contribute, nodes in middle layers (e.g., 4-18 in Llama 3 8B) exert a stronger influence, suggesting prioritization during updates.

  • Contrastive Loss Enhances Efficacy: The parameter-aware contrastive rectification significantly improves editing Efficacy without compromising Generality or Locality, by helping the model focus on relevant knowledge and preserve unrelated information.

  • Trade-offs in Path Size: There's a balance to be struck in the size of the critical transmission paths, $\lvert \mathcal{T}^+ \rvert$. Too few paths might limit editing success, while too many can introduce irrelevant information and degrade Locality.

    These findings collectively solve the problem of suboptimal knowledge localization and ineffective rectification in LLMs by providing a more holistic and precise mechanism for updating factual knowledge, leading to improved consistency, accuracy, and efficiency in LLM behavior.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a beginner should be familiar with the following foundational concepts:

  • Large Language Models (LLMs): These are advanced artificial intelligence models, typically based on the transformer architecture, trained on vast amounts of text data. They are capable of understanding, generating, and processing human language for a wide range of tasks like translation, summarization, question-answering, and creative writing. Examples include GPT-J, Llama, GPT-3/4, etc. They "encode" vast amounts of knowledge (facts, common sense, linguistic patterns) within their internal parameters during pre-training.

  • Knowledge-based Model Editing (KME): This is a field focused on modifying specific pieces of factual knowledge stored within a pre-trained LLM without retraining the entire model. The goal is to correct errors, update outdated information, or inject new facts while ensuring two key properties:

    • Efficacy: The model successfully learns the new or corrected knowledge.
    • Generality: The updated knowledge generalizes to paraphrases or related queries.
    • Locality: The update only affects the target knowledge and does not inadvertently change other, unrelated knowledge.
  • Parameters: In machine learning, parameters are the internal variables that a model learns from data. For LLMs, these are typically the weights and biases of the neural network connections. These parameters collectively determine the model's behavior and the knowledge it has encoded.

  • Forward Pass: This refers to the process where an input (e.g., a text prompt) is fed through the layers of a neural network, and computations are performed at each layer, transforming the input until an output (e.g., a predicted word) is generated. During this process, information accumulates or is transformed layer by layer.

  • Transformer Architecture: The dominant architecture for LLMs. It consists of encoder and decoder blocks (or decoder-only for generative LLMs). Key components within each block include:

    • Self-Attention Mechanism: Allows the model to weigh the importance of different parts of the input sequence when processing each word.
    • Feed-Forward Networks (FFNs): Position-wise fully connected neural networks applied independently to each position in the sequence, after the self-attention mechanism. These are crucial for learning complex patterns and are often considered storage sites for factual knowledge.
  • Cross-Entropy Loss: A common loss function used in classification tasks (like predicting the next token in an LLM). It quantifies the difference between the true probability distribution (the target label) and the predicted probability distribution (the model's output); a lower cross-entropy loss indicates a more accurate prediction. A short numeric sketch is given after this list.

  • Causal Tracing: A technique used in KME (e.g., by ROME) to identify which parts of an LLM are causally responsible for a specific factual prediction. It typically involves ablation (removing or perturbing parts of the model) and observing the change in output to pinpoint influential neurons or layers.
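To make the cross-entropy loss mentioned above concrete, here is a tiny numeric sketch (an illustration, not from the paper) for a single next-token prediction; the vocabulary and probabilities are invented:

```python
import math

# Toy next-token prediction: the model assigns probabilities to a 4-word vocabulary.
vocab = ["Paris", "Lyon", "Rome", "Berlin"]
predicted_probs = [0.70, 0.20, 0.05, 0.05]  # model's output distribution
target = "Paris"                            # the correct next token

# For a one-hot target, cross-entropy reduces to -log(probability of the correct token).
loss = -math.log(predicted_probs[vocab.index(target)])
print(f"cross-entropy loss = {loss:.4f}")   # ~0.3567; a perfect prediction would give 0
```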

3.2. Previous Works

The paper primarily discusses parameter-modified KME methods, particularly localization-based approaches. Here's a summary of key prior studies mentioned and their core ideas:

  • Traditional Localized Methods (e.g., ROME, MEMIT):

    • Core Idea: These methods assume that specific factual knowledge is stored in localized parameters, often within the Feed-Forward Networks (FFNs) of particular intermediate layers. They use techniques like causal tracing to identify these layers or neurons.
    • ROME (Meng et al., 2022): Stands for Rank-one Model Editing. It uses causal mediation analysis to pinpoint which FFN layer is most causally responsible for a factual recall. It then applies a rank-one update to the weights of that specific FFN layer to change the factual association. The update is calculated to directly move the model's output to the desired new fact. (A schematic rank-one update sketch is given after this related-work list.)
      • Background for Beginners: Imagine you want to change "The capital of France is Lyon" to "The capital of France is Paris." ROME tries to find where in the model the "capital of France" knowledge is processed and then directly modifies those weights to output "Paris" instead of "Lyon."
    • MEMIT (Meng et al., 2023): Builds upon ROME but allows for mass-editing, meaning updating multiple facts simultaneously. It extends the rank-one update idea to multiple FFN layers and multiple facts, using a least-squares aggregation approach to combine individual edits.
      • Background for Beginners: MEMIT is like ROME but can handle many updates at once, making it more practical for larger-scale knowledge modifications.
    • Limitations (as highlighted by this paper):
      • Suboptimal Localization: Research by Hase et al. (2023) suggests that the layers identified by causal tracing don't always correspond to the optimal layers for intervention. Causal effects might be largest in early layers (e.g., layers 4-6 in GPT-J), while later layers (e.g., layers 16-20 in GPT-J) that also hold critical information are overlooked.
      • Distributed Knowledge: The underlying assumption of knowledge being localized to specific layers is challenged. Knowledge is distributed and accumulates across all layers during the forward pass.
  • Parameter-Preserved Methods: These approaches typically avoid directly modifying the pre-trained LLM's parameters. Instead, they might:

    • Use external memories (e.g., SERAC by Mitchell et al., 2022b).
    • Employ in-context learning (e.g., Zheng et al., 2023).
    • Alter the LLM's representation space.
    • Add extra parameters (e.g., adaptors, such as Key-Value adaptors by Hartvigsen et al., 2023).
  • Fine-tuning: The most intuitive but also most costly method. It involves continuing the training process on new data containing the updated knowledge.

    • Limitations: High computational cost, risk of catastrophic forgetting (losing previously learned knowledge), and overfitting to the new, small dataset.
  • Hyper-network-based methods: These methods use a smaller network (hyper-network) to generate parameters for the main LLM, allowing for more flexible and localized updates.
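To make the rank-one update idea behind ROME and MEMIT (described above) more tangible, the sketch below applies a generic rank-one modification to a toy FFN weight matrix. It is only a schematic under assumed shapes, not ROME's actual closed-form solution or this paper's method:

```python
import torch

D, M = 8, 32                      # toy hidden and FFN dimensions
W = torch.randn(D, M)             # a single FFN weight matrix to be edited

# A rank-one update adds the outer product of two vectors, so only one
# "direction" of the mapping changes while the rest of W is left intact.
u = torch.randn(D)                # e.g., a key direction associated with the subject
v = torch.randn(M)                # e.g., the adjustment needed to produce the new object
W_edited = W + torch.outer(u, v)

# The change has rank 1 by construction.
print(torch.linalg.matrix_rank(W_edited - W))  # tensor(1)
```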

3.3. Technological Evolution

The evolution of knowledge editing in LLMs can be traced through these stages:

  1. Full Fine-tuning (Early Stage): Initially, updating knowledge in LLMs primarily involved fine-tuning the entire model on new data. This was simple but inefficient and problematic due to catastrophic forgetting and high resource demands.
  2. External Memory/Adapter-based Approaches: To avoid direct modification and forgetting, some methods introduced external memory systems or adapter layers that sit alongside the frozen LLM, providing new knowledge or modifying its behavior. This preserved the original model but added complexity to the inference process.
  3. Localization-based Parameter Modification (Mid-Stage KME): Researchers then sought to directly modify the model's parameters in a targeted way. This led to methods like ROME, which hypothesized that factual knowledge is localized within specific FFN layers. Techniques like causal tracing were developed to identify these "hotspots." This was a significant step towards efficiency and precision.
  4. Challenging Localization Assumptions (Current Stage): Recent work (e.g., Hase et al., 2023) started questioning the efficacy and theoretical soundness of strict layer-based localization. It became apparent that knowledge is more distributed and accumulates across the network.
  5. Path-based Editing (This Paper's Contribution): This paper represents an advancement in the current stage by moving beyond layer-based localization to path-based localization. It explicitly models knowledge accumulation as transmission paths spanning all layers and uses a novel contrastive rectification to refine the editing process. This reflects a more nuanced understanding of how knowledge is represented and processed within LLMs.

3.4. Differentiation Analysis

Compared to the main methods in related work, particularly localization-based KME methods like ROME and MEMIT, this paper's approach offers several core differences and innovations:

  • Shift from Layer-based to Path-based Localization:

    • Prior Methods (ROME, MEMIT): Assume knowledge is primarily stored and editable within specific intermediate layers (e.g., a few FFN layers). They use causal tracing to identify these layers.
    • This Paper: Argues that knowledge gradually accumulates across all layers during the forward pass. It introduces critical transmission paths as the unit of localization, which are sequences of parameters and connections spanning the entire network (all layers). This offers a more holistic and arguably more accurate view of where knowledge resides and flows.
  • Comprehensive Parameter Targeting:

    • Prior Methods: Often focus on a limited set of layers, potentially overlooking important parameters outside this range.
    • This Paper: Considers parameters across all layers, acknowledging that contributions to knowledge recall are distributed. The perturbation-based path importance estimation allows for identifying critical components regardless of their layer depth.
  • Parameter Packing Strategy:

    • Prior Methods: May operate at the neuron level or apply global updates to entire FFN weight matrices within selected layers.
    • This Paper: Introduces a parameter packing strategy inspired by the key-value memory viewpoint of FFNs. It partitions FFN weights into column-wise (key vectors) and row-wise (value vectors) segments, significantly reducing the search space for paths from neuron-level ($\mathcal{O}[L \times (D \times M)^2]$) to segment-level ($\mathcal{O}(L \times M^2)$). This makes path-based editing computationally feasible.
  • Parameter-Aware Contrastive Rectification:

    • Prior Methods: Focus primarily on updating the identified "positive" parameters to achieve the desired output. They might use regularization terms to maintain locality, but not explicitly a contrastive mechanism for rectification.

    • This Paper: Proposes a parameter-aware contrastive rectifying algorithm. It identifies not only critical paths (positive examples) but also insignificant paths (negative examples). The loss function explicitly penalizes changes in output when insignificant paths are perturbed, thereby reinforcing that only the relevant parameters should be modified. This is a novel mechanism to enhance efficacy while safeguarding generality and locality.

      In essence, this paper challenges the localized assumption of previous work by proposing a more granular, network-wide path-based approach, coupled with an innovative contrastive learning paradigm for rectification, aiming for superior performance, particularly in complex scenarios like consecutive editing.

4. Methodology

4.1. Principles

The core idea of the proposed method is to precisely and effectively modify factual knowledge in LLMs by focusing on critical transmission paths rather than specific layers. The theoretical basis or intuition behind this is that knowledge is not stored in isolated pockets within particular layers but gradually accumulates and is processed along specific information flow routes spanning the entire neural network during the forward pass. By identifying and strategically modifying these critical paths, the model can be updated more accurately and with less collateral damage to unrelated knowledge.

The method operates on two main principles:

  1. Tracing Critical Transmission Paths: Instead of identifying a few localized layers, the method seeks to find transmission paths—sequences of FFN parameter components (key and value vectors) across all layers—that are most sensitive to a given knowledge edit. This is achieved through a perturbation-based importance estimation.
  2. Parameter-Aware Contrastive Rectification: Once critical paths are identified, the model is rectified using a contrastive loss function. This loss not only optimizes the updates on critical paths to achieve the desired output (positive example) but also penalizes unintended changes when insignificant paths are modified (negative example). The intuition is to explicitly teach the model what to change and what not to change for a given edit, thereby enhancing efficacy while preserving locality and generality.

4.2. Core Methodology In-depth

4.2.1. Notations and Task Definition

First, let's establish the notations and the task definition for clarity.

  • An editing request is denoted as $\varepsilon_i = (s, r, o \to o^*)$, where $s$ is the subject (e.g., "Lionel Messi"), $r$ is a binary relation (e.g., "play_for"), $o$ is the old object (e.g., "PSG"), and $o^*$ is the expected new object (e.g., "Inter Miami CF").

  • $\mathcal{E}$ represents the collection of all knowledge to be edited.

  • $x_i$ is the input prompt (natural language sentence) corresponding to the subject-relation pair $(s, r)$, e.g., "Which club does Lionel Messi play for now?".

  • $\mathcal{X}_{\varepsilon_i}$ (or $\mathcal{X}_i$) denotes other equivalent paraphrases of $x_i$.

  • $y_i$ is the original textual model output for $x_i$, and $y_i^*$ is the desired model output for the target object $o^*$.

  • The original LLM is represented as a function $f_\Theta$, with $\Theta$ being its original parameters.

    The task of KME is to incorporate new knowledge by updating a small fraction of parameters such that: $ f_{\Theta^*}(x) = \begin{cases} y_i^*, & \varepsilon_i \in \mathcal{E}, \; x \in \{x_i\} \cup \mathcal{X}_i \\ y_i, & \varepsilon_i \notin \mathcal{E}, \; x \in \{x_i\} \cup \mathcal{X}_i \end{cases} $ Here, $\Theta^* = \Theta + \Delta\Theta^*$ represents the updated model parameters, where $\Delta\Theta^*$ is the parameter update matrix. This $\Delta\Theta^*$ should be sparse, meaning only a small subset of parameters is modified.
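As a minimal illustration of this editing objective (not the paper's implementation), the sketch below represents an edit request $\varepsilon_i$ as a small data structure and expresses the two success conditions; the `EditRequest` fields, the `edited_model` callable, and the example strings are assumptions made for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EditRequest:
    subject: str          # s
    relation: str         # r
    old_object: str       # o
    new_object: str       # o*
    prompt: str           # x_i
    paraphrases: List[str] = field(default_factory=list)  # X_i

def edit_is_successful(edited_model: Callable[[str], str], req: EditRequest) -> bool:
    """The edited model f_{Theta*} should return the new object for the prompt
    and all of its paraphrases (efficacy + generality)."""
    queries = [req.prompt] + req.paraphrases
    return all(edited_model(q) == req.new_object for q in queries)

def locality_is_preserved(edited_model, unrelated_pairs) -> bool:
    """For inputs outside the edit set, outputs should match the original model."""
    return all(edited_model(x) == y for x, y in unrelated_pairs)

# Hypothetical usage:
req = EditRequest("Lionel Messi", "play_for", "PSG", "Inter Miami CF",
                  "Which club does Lionel Messi play for now?",
                  ["Lionel Messi currently plays for which club?"])
```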

4.2.2. Feed-Forward Network (FFN)

The paper focuses on information accumulation within Feed-Forward Networks (FFNs), a key module in LLMs. An FFN typically consists of two linear transformations separated by an activation function (e.g., ReLU). For the $l$-th layer, an FFN is defined as: $ \mathrm{FFN}^{(l)}(\pmb{x}) = \mathrm{ReLU}(\pmb{x}^\top \pmb{W}_1^{(l)}) \pmb{W}_2^{(l)} $ Where:

  • $\pmb{x}$ is the input representation to the FFN.
  • $\pmb{W}_1^{(l)}$ is the first weight matrix at the $l$-th layer, with dimensions $D \times M$.
  • $\pmb{W}_2^{(l)}$ is the second weight matrix at the $l$-th layer, with dimensions $M \times D$.
  • $D$ refers to the hidden dimension of the model.
  • $M$ refers to the hidden dimension of the FFN layer (e.g., $D = 4096$ and $M = 14336$ in Llama3 (8B)).
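A minimal PyTorch sketch of the two-matrix FFN defined above, using toy values for $D$ and $M$ (the paper does not provide code, so this is purely illustrative):

```python
import torch

D, M = 16, 64                         # toy values; Llama3 (8B) uses D=4096, M=14336

# One FFN layer: FFN(x) = ReLU(x^T W1) W2
W1 = torch.randn(D, M)                # first weight matrix, D x M
W2 = torch.randn(M, D)                # second weight matrix, M x D

def ffn(x: torch.Tensor) -> torch.Tensor:
    hidden = torch.relu(x @ W1)       # hidden activations, shape (M,)
    return hidden @ W2                # projected back to dimension D

x = torch.randn(D)                    # input representation at this layer
print(ffn(x).shape)                   # torch.Size([16])
```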

4.2.3. Transmission Paths

A transmission path describes how information accumulates from inputs to outputs across all layers. Focusing on FFNs, a path $\tau$ is defined as a set of parameter components from $\pmb{W}_1^{(l)}$ and $\pmb{W}_2^{(l)}$ for each layer $l$: $ \tau = \{ (\Theta_1^{(l)}, \Theta_2^{(l)}) \mid 1 \leq l \leq L \} $ Where:

  • $\tau \in \mathcal{T}$ represents a specific transmission path, and $\mathcal{T}$ is the set of all possible paths in the LLM.
  • $\Theta_1^{(l)}$ and $\Theta_2^{(l)}$ are nodes of path $\tau$ at the $l$-th layer. These nodes are parts of the parameters in $\pmb{W}_1^{(l)}$ and $\pmb{W}_2^{(l)}$, respectively.
  • $L$ is the total number of layers in the LLM.

4.2.4. Parameter Packing Strategy

Directly selecting individual neurons for $\Theta_1^{(l)}$ and $\Theta_2^{(l)}$ in a neuron-by-neuron manner would result in an astronomically high time complexity of $\mathcal{O}[L \times (D \times M)^2]$. To mitigate this, the paper proposes a parameter packing strategy.

Inspired by Geva et al. (2021), which views FFN layers as key-value memories, the FFN layer can be reformulated: $ \mathrm{FFN}^{(l)}(\pmb{x}) = g(\pmb{x}^\top \underbrace{\pmb{K}^{(l)}}_{\pmb{W}_1^{(l)}}) \underbrace{\pmb{V}^{(l)}}_{\pmb{W}_2^{(l)}} = \sum_{j=1}^M \underbrace{g(\pmb{x}^\top \pmb{k}_j^{(l)})}_{\alpha_j^{(l)}} \pmb{v}_j^{(l)} $ Where:

  • $g$ is the activation function (e.g., ReLU).

  • $\pmb{K}^{(l)}$ and $\pmb{V}^{(l)}$ are augmented versions of $\pmb{W}_1^{(l)}$ and $\pmb{W}_2^{(l)}$, analogous to key and value matrices in attention.

  • $\pmb{k}_j^{(l)}$ represents the $j$-th column weight vector in $\pmb{W}_1^{(l)}$, acting as a key ($D \times 1$).

  • $\pmb{v}_j^{(l)}$ represents the $j$-th row weight vector in $\pmb{W}_2^{(l)}$, acting as a value ($1 \times D$).

  • $\alpha_j^{(l)}$ is the weighting coefficient, computed as $g(\pmb{x}^\top \pmb{k}_j^{(l)})$. It is the activation value of the $j$-th neuron in the hidden layer of the FFN.

    This reformulation shows that the FFN output is a weighted sum over value vectors $\pmb{v}_j^{(l)}$, where $\alpha_j^{(l)}$ determines the contribution of each key-value pair.

This observation motivates packing parameters:

  • The parameters of the first weight matrix $\pmb{W}_1^{(l)}$ are packed column-wise into key vectors $\pmb{k}_j^{(l)}$.

  • The parameters of the second weight matrix $\pmb{W}_2^{(l)}$ are packed row-wise into value vectors $\pmb{v}_j^{(l)}$.

    With this strategy, the transmission path definition is reformulated. Now, a path consists of a sequence of key vectors and value vectors across layers: $ \tau = \{ (\pmb{k}_i^{(l)}, \pmb{v}_j^{(l)}) \mid 1 \le l \le L, \; 1 \le i, j \le M \} $ This dramatically reduces the search space for candidate paths from $\mathcal{O}[L \times (D \times M)^2]$ (neuron-level) to $\mathcal{O}(L \times M^2)$ (block-level), making the identification process computationally feasible.
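The following sketch (an illustrative reconstruction, not the authors' code) checks that the key-value view coincides with the standard FFN computation and shows how $\pmb{W}_1$ is packed column-wise into keys and $\pmb{W}_2$ row-wise into values, so that a path node at a layer is an index pair over $M$ segments rather than individual weights:

```python
import torch

D, M = 16, 64
W1, W2 = torch.randn(D, M), torch.randn(M, D)
x = torch.randn(D)

# Packing: keys are the columns of W1, values are the rows of W2.
keys = [W1[:, j] for j in range(M)]        # k_j, each of shape (D,)
values = [W2[j, :] for j in range(M)]      # v_j, each of shape (D,)

# FFN(x) = sum_j g(x^T k_j) * v_j  with g = ReLU
alphas = [torch.relu(x @ k) for k in keys]             # activation of each hidden unit
out_kv = sum(a * v for a, v in zip(alphas, values))

out_ffn = torch.relu(x @ W1) @ W2
print(torch.allclose(out_kv, out_ffn, atol=1e-5))      # True: the two views coincide

# With this packing, a path node at a layer is an (i, j) index pair over M keys/values,
# shrinking the per-layer search space from (D*M)^2 weight pairs to M^2 vector pairs.
```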

The following figure (Figure 2 from the original paper) illustrates the concept of transmission paths and the packing strategy:

Figure 2: Illustration of transmission paths and the packing strategy. Before applying the packing strategy, each path is composed of individual FFN weights across all layers, e.g., $(\theta_{1,2}^1, \theta_{3,9}^2)^{(1)} \to (\theta_{8,5}^1, \theta_{9,1}^2)^{(2)} \to \dots \to (\theta_{7,5}^1, \theta_{3,4}^2)^{(L)}$. After applying the strategy, each path becomes a sequence of weight vectors, e.g., $(\pmb{\theta}_2^1, \pmb{\theta}_9^2)^{(1)} \dots (\pmb{\theta}_i^1, \pmb{\theta}_j^2)^{(l)} \dots (\pmb{\theta}_{13}^1, \pmb{\theta}_7^2)^{(L)}$, where $(\pmb{\theta}_i^1, \pmb{\theta}_j^2)^{(l)}$ (i.e., $\pmb{k}_i^{(l)}$ and $\pmb{v}_j^{(l)}$) are the $i$-th column of $W_1^{(l)}$ and the $j$-th row of $W_2^{(l)}$, respectively. The schematic depicts an input representation passing through $W_1^{(l)}$ to produce hidden features, which then pass through $W_2^{(l)}$ to yield the output prediction.

4.2.5. Tracing Critical Transmission Paths

The next step is to identify "where to perform editing," i.e., determining which paths are critical for shifting the model's prediction from $o$ to $o^*$ for a given editing request $\varepsilon_i$. A perturbation-based method is used to estimate the importance of each transmission path.

Impact Score of Transmission Paths

The impact score of a transmission path $\tau$ for an editing request $\varepsilon_i$, denoted as $\phi(\tau \mid \varepsilon_i)$, measures how much that path contributes to correcting the model's prediction. This is based on the principle that if a path is critical, perturbing its parameters should significantly affect the desired output.

Using perturbation theory (Keinan, 2005), the impact score is estimated by observing the change in the cross-entropy loss $\mathcal{L}$ for the desired output $y_i^*$ when an infinitesimal noise $\epsilon_\tau$ is introduced into the packed parameters of path $\tau$: $ \phi(\tau \mid \varepsilon_i) = \lim_{\epsilon_\tau \to 0} \frac{\mathcal{L}(y_i^* \mid \Theta + \epsilon_\tau, x_i) - \mathcal{L}(y_i^* \mid \Theta, x_i)}{\epsilon_\tau} \approx \sum_{\theta \in \tau} \frac{\partial \mathcal{L}}{\partial \theta} $ Where:

  • $\phi(\tau \mid \varepsilon_i)$ is the impact score of path $\tau$ for editing request $\varepsilon_i$.

  • $\mathcal{L}$ is the cross-entropy loss function, measuring the discrepancy between the model's prediction and the desired output $y_i^*$.

  • $\Theta$ represents the original model parameters. $\Theta + \epsilon_\tau$ represents the parameters after adding noise $\epsilon_\tau$ to the parameters belonging to path $\tau$.

  • $x_i$ is the input prompt.

  • $\epsilon_\tau$ represents the infinitesimal noise introduced into the packed parameters of the transmission path $\tau$.

  • The approximation $\approx \sum_{\theta \in \tau} \frac{\partial \mathcal{L}}{\partial \theta}$ suggests that the impact score can be approximated by the sum of gradients of the loss with respect to each parameter $\theta$ within path $\tau$. This means paths with higher gradient magnitudes are considered more impactful.

    After calculating impact scores for all paths, the critical transmission paths (denoted as $\mathcal{T}^+$ or $\mathcal{T}^+(\varepsilon_i)$) are identified as those with the highest scores. $ \mathcal{T}^+(\varepsilon_i) = \{ \tau \mid 1 \leq r(\phi(\tau \mid \varepsilon_i)) \leq N \} $ Where:

  • $r(\cdot)$ returns the rank position of a path's score within the sorted list of all path scores (in descending order).

  • $N$ is a hyperparameter representing the size or number of critical transmission paths to select.
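A hedged sketch of this gradient-based importance estimate: a single backward pass yields $\partial\mathcal{L}/\partial\theta$, and each packed (key, value) node is scored by summing the gradients over its parameters. The tiny one-layer model, the random input standing in for the prompt representation, and the per-layer node selection are simplifying assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, M, V = 16, 32, 100                       # toy hidden dim, FFN dim, vocab size

# A one-layer toy "LM": input representation x -> FFN -> vocabulary logits.
W1 = torch.randn(D, M, requires_grad=True)
W2 = torch.randn(M, D, requires_grad=True)
W_out = torch.randn(D, V)

x = torch.randn(D)                          # representation of the edit prompt x_i
target_new = torch.tensor(7)                # index of the desired new object o*

# Cross-entropy loss L(y_i* | Theta, x_i) for the desired output.
logits = (torch.relu(x @ W1) @ W2) @ W_out
loss = F.cross_entropy(logits.unsqueeze(0), target_new.unsqueeze(0))
loss.backward()

# Impact of a packed node (key column i of W1, value row j of W2):
# approximated by the sum of gradients over the node's parameters.
def node_impact(i: int, j: int) -> float:
    return (W1.grad[:, i].sum() + W2.grad[j, :].sum()).item()

scores = {(i, j): node_impact(i, j) for i in range(M) for j in range(M)}
# Critical nodes for this layer: the N highest-scoring (i, j) pairs.
N = 5
critical = sorted(scores, key=scores.get, reverse=True)[:N]
print(critical)
```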

4.2.6. Parameter-Aware Contrastive Editing

Once the critical transmission paths $\mathcal{T}^+$ are identified, the method proceeds to rectify the model. To enhance the rectification process, a parameter-aware contrastive rectification algorithm is proposed.

This algorithm treats each path in $\mathcal{T}^+$ as a positive example (parameters that should be updated). Additionally, it samples an insignificant transmission path (with the lowest impact score) as a negative example, denoted as $\tau^-$. The rationale is to explicitly demonstrate the consequences of modifying parameters that are not critical for the current edit, thereby preventing unintended side effects on other knowledge.

The parameter-aware contrastive loss is formulated as: $ \mathcal{I}(\varepsilon_i) = \mathcal{L}(f_{\Theta^*}(x_i), y_i^*) + \lambda \, \mathcal{L}(f_{\Theta'}(x_i), y_i) $ Where:

  • $\mathcal{I}(\varepsilon_i)$ is the total loss associated with the edit $\varepsilon_i$.

  • $\mathcal{L}$ is the cross-entropy loss function.

  • $f_{\Theta^*}(x_i)$ is the model's prediction after optimizing parameters along the positive paths $\mathcal{T}^+$. The first term $\mathcal{L}(f_{\Theta^*}(x_i), y_i^*)$ aims to minimize the discrepancy between this prediction and the desired output $y_i^*$, ensuring the efficacy of the edit.

  • $f_{\Theta'}(x_i)$ represents the model's output if parameters along the negative path $\tau^-$ were updated instead.

  • The second term $\mathcal{L}(f_{\Theta'}(x_i), y_i)$ is the contrastive loss. It ensures that if the editing were applied to parameters within the insignificant path $\tau^-$, the model's prediction for the current edit $x_i$ should not drastically change from its original output $y_i$. This term effectively regularizes the update process, teaching the model to ignore parameters irrelevant to the specific edit, thus preserving locality and generality.

  • $\lambda$ is a scaling term or hyperparameter that balances the importance of the contrastive loss relative to the efficacy loss.

    The optimization process minimizes $\mathcal{I}(\varepsilon_i)$ by updating the parameters within the critical transmission paths. The dual nature of the loss function, simultaneously pushing towards the target output for critical paths and maintaining original behavior for insignificant ones, is key to the method's effectiveness.

This approach provides a more granular and principled way to perform knowledge editing, considering the distributed nature of knowledge in LLMs and explicitly addressing the trade-off between updating specific facts and preserving the model's broader knowledge base.
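A minimal sketch of the two-term objective above. It assumes `logits_pos` are the model's logits after updating the critical-path parameters and `logits_neg` are the logits that would result if the same update were applied along the negative path $\tau^-$; how those logits are produced is left abstract, so this is a schematic rather than the authors' released code:

```python
import torch
import torch.nn.functional as F

def contrastive_edit_loss(logits_pos: torch.Tensor,
                          logits_neg: torch.Tensor,
                          y_new: torch.Tensor,
                          y_old: torch.Tensor,
                          lam: float = 0.1) -> torch.Tensor:
    """I(eps_i) = L(f_{Theta*}(x_i), y_i*) + lambda * L(f_{Theta'}(x_i), y_i).

    logits_pos: predictions after updating parameters on the critical paths T+.
    logits_neg: predictions if the same update were applied on the negative path tau-.
    y_new: desired new output token(s) y_i*;  y_old: original output token(s) y_i.
    """
    efficacy_term = F.cross_entropy(logits_pos, y_new)      # push edited model to o*
    # The negative path should leave the original answer intact.
    contrastive_term = F.cross_entropy(logits_neg, y_old)
    return efficacy_term + lam * contrastive_term

# Hypothetical usage with toy logits over a 100-token vocabulary:
logits_pos = torch.randn(1, 100, requires_grad=True)
logits_neg = torch.randn(1, 100)
loss = contrastive_edit_loss(logits_pos, logits_neg,
                             y_new=torch.tensor([7]), y_old=torch.tensor([3]))
loss.backward()
```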

5. Experimental Setup

5.1. Datasets

The experiments are conducted on two prominent datasets commonly used in knowledge editing research:

  1. ZsRE (Zero-shot Relation Extraction) (Levy et al., 2017):

    • Source: Originally designed for zero-shot relation extraction.
    • Scale & Characteristics: The paper uses EDIT sets from ZsRE. It consists of fact triples (s, r, o) and corresponding natural language questions. For example, a fact might be ("Barack Obama", "place of birth", "Honolulu"). An edit request would involve changing "Honolulu" to a new location.
    • Domain: Factual knowledge, often involving entities and their relations (e.g., people and their birthplaces, organizations and their headquarters).
    • Data Sample: Example: Prompt: "Where was Barack Obama born?" Old Answer: "Honolulu." Desired Answer: "Kenya." (This is a hypothetical example for illustration, not from the paper, as the paper only mentions the dataset type). The paper mentions an example prompt like "Which club does Lionel Messi play for?"
    • Purpose: Effective for validating the model's ability to update specific facts and generalize to different phrasings of the same fact.
  2. CounterFact (Meng et al., 2022):

    • Source: Specifically created for inserting counterfactual knowledge into models.

    • Scale & Characteristics: The paper uses EDIT sets from CounterFact. It comprises subject-relation-object triples and associated natural language prompts, where the object is to be changed to a counterfactual one. For instance, changing "The Eiffel Tower is in Paris" to "The Eiffel Tower is in Rome." It also includes prompts to test locality (unrelated facts) and generality (paraphrases).

    • Domain: Factual knowledge, often involving modifying existing facts or inserting new, sometimes counterintuitive, facts.

    • Data Sample: Example: Prompt: "The author of Harry Potter is J. K. Rowling." Edit: Change J. K. Rowling to "Stephen King." (This is a hypothetical example for illustration).

    • Purpose: Excellent for evaluating the model's ability to insert entirely new or counterfactual knowledge, and rigorously test locality (ensuring other facts are not changed) and generality (ensuring the new fact holds for paraphrased questions).

      These datasets were chosen because they are standard benchmarks in the KME field, allowing for direct comparison with previous work. They are effective for validating the method's efficacy (successfully editing), generality (applying to paraphrases), and locality (not affecting unrelated knowledge).

5.2. Evaluation Metrics

The paper adopts three fundamental metrics for evaluating editing performance: Efficacy, Generality, and Locality. The evaluation is conducted under two scenarios: batch editing (multiple edits simultaneously) and the more challenging consecutive editing (edits done successively without rolling back parameters).

For every evaluation metric, here's a detailed explanation:

  1. Efficacy (Edit Success Rate):

    • Conceptual Definition: Efficacy measures the success rate of the editing process on the specific factual knowledge that was targeted for modification. It quantifies how often the model correctly produces the desired new output ($o^*$) when prompted with the original query ($x_i$). A higher efficacy indicates that the model has successfully absorbed the intended edit.
    • Mathematical Formula: $ \text{Efficacy} = \frac{1}{|\mathcal{E}|} \sum_{\varepsilon_i \in \mathcal{E}} \mathbb{I}(f_{\Theta^*}(x_i) = y_i^*) $
    • Symbol Explanation:
      • $|\mathcal{E}|$: The total number of editing requests in the evaluation set.
      • $\varepsilon_i$: An individual editing request.
      • $\mathbb{I}(\cdot)$: An indicator function, which equals 1 if the condition inside is true, and 0 otherwise.
      • $f_{\Theta^*}(x_i)$: The output of the edited model $\Theta^*$ when given the input prompt $x_i$.
      • $y_i^*$: The desired target output for the editing request $\varepsilon_i$.
  2. Generality (Paraphrase Success Rate):

    • Conceptual Definition: Generality assesses whether the edited knowledge applies not only to the exact input prompt ($x_i$) but also to its semantic variations or paraphrases ($\mathcal{X}_i$). It measures the model's ability to generalize the learned edit beyond the specific phrasing used during the editing process. A high generality score indicates robust knowledge integration.
    • Mathematical Formula: $ \text{Generality} = \frac{1}{|\mathcal{E}|} \sum_{\varepsilon_i \in \mathcal{E}} \left( \frac{1}{|\mathcal{X}_i|} \sum_{x \in \mathcal{X}_i} \mathbb{I}(f_{\Theta^*}(x) = y_i^*) \right) $
    • Symbol Explanation:
      • $|\mathcal{E}|$: The total number of editing requests in the evaluation set.
      • $\varepsilon_i$: An individual editing request.
      • $|\mathcal{X}_i|$: The number of paraphrases for the input prompt $x_i$.
      • $x$: A paraphrase of the input prompt $x_i$.
      • $\mathbb{I}(\cdot)$: An indicator function.
      • $f_{\Theta^*}(x)$: The output of the edited model $\Theta^*$ when given a paraphrase $x$.
      • $y_i^*$: The desired target output for the editing request $\varepsilon_i$.
  3. Locality (Unrelated Fact Preservation):

    • Conceptual Definition: Locality evaluates the model's ability to retain its original knowledge for facts unrelated to the editing request. It quantifies how well the model avoids catastrophic forgetting or unintended changes to other, unedited knowledge. A high locality score is crucial to ensure that editing one fact does not corrupt the model's broader knowledge base.
    • Mathematical Formula: $ \text{Locality} = \frac{1}{|\mathcal{S}|} \sum_{(x, y) \in \mathcal{S}} \mathbb{I}(f_{\Theta^*}(x) = y) $
    • Symbol Explanation:
      • $|\mathcal{S}|$: The total number of unrelated (original) facts or queries in the evaluation set.

      • $(x, y) \in \mathcal{S}$: An input-output pair representing an unrelated fact, where $x$ is the input prompt and $y$ is its original correct output from the unedited model.

      • $\mathbb{I}(\cdot)$: An indicator function.

      • $f_{\Theta^*}(x)$: The output of the edited model $\Theta^*$ when given the input prompt $x$ for an unrelated fact.

      • $y$: The original correct output for the unrelated fact $x$ from the unedited model.

        A combined Score is also reported, which is the mean result of Efficacy, Locality, and Generality. This provides an overall aggregated performance metric.
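Putting the three definitions (plus the combined Score) together, here is a small illustrative sketch; the dictionary fields, the `edited_model` callable, and the stub data are assumptions, not part of the paper's evaluation code:

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

def evaluate_editing(edited_model: Callable[[str], str],
                     edits: List[Dict],                       # each: prompt, paraphrases, target
                     unrelated: List[Tuple[str, str]]) -> Dict[str, float]:
    efficacy = mean(float(edited_model(e["prompt"]) == e["target"]) for e in edits)
    generality = mean(
        mean(float(edited_model(p) == e["target"]) for p in e["paraphrases"])
        for e in edits if e["paraphrases"]
    )
    locality = mean(float(edited_model(x) == y) for x, y in unrelated)
    return {"Efficacy": efficacy, "Generality": generality, "Locality": locality,
            "Score": mean([efficacy, generality, locality])}

# Hypothetical usage with a stub model that always answers "Inter Miami CF":
stub = lambda prompt: "Inter Miami CF"
edits = [{"prompt": "Which club does Lionel Messi play for now?",
          "paraphrases": ["Lionel Messi currently plays for which club?"],
          "target": "Inter Miami CF"}]
unrelated = [("What is the capital of France?", "Paris")]
print(evaluate_editing(stub, edits, unrelated))
```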

5.3. Baselines

The proposed method is compared against nine strong baselines, reflecting various approaches to knowledge editing:

  1. Full-C (Zhu et al., 2021): Likely refers to constrained full fine-tuning. This baseline fine-tunes the model (or a subset of its parameters) on the new facts while constraining how far the parameters may move from their original values, so as to limit interference with unrelated knowledge.

  2. ROME (Meng et al., 2022): Rank-one Model Editing. A prominent localization-based method that identifies specific FFN layers using causal mediation analysis and applies a rank-one update to their weights.

  3. KN (Dai et al., 2022): Knowledge Neurons. This method identifies specific neurons (knowledge neurons) in transformers that are responsible for storing factual knowledge and updates them.

  4. MEMIT (Meng et al., 2023): Mass-Editing Memory in a Transformer. An extension of ROME allowing for simultaneous editing of multiple facts.

  5. PMET (Li et al., 2024b): Precise Model Editing in a Transformer. A method designed for precise and localized model editing.

  6. AlphaEdit (Fang et al., 2025): Null-space constrained knowledge editing for language models. A recent method that aims to perform editing by constraining updates to the null space of the model's representation, reducing side effects.

  7. LoRA (Xu et al., 2024): Low-Rank Adaptation. While primarily a parameter-efficient fine-tuning technique, it can be adapted for editing by fine-tuning small, low-rank matrices added to the original model.

  8. EMMET (Gupta et al., 2024b): A unified framework for model editing. A more recent framework attempting to unify different editing approaches.

  9. R-ROME (Gupta et al., 2024a): Rebuilding ROME: Resolving model collapse during sequential model editing. An improved version of ROME specifically designed to address model degradation during sequential (consecutive) editing.

    These baselines are representative because they cover various strategies for KME, including fine-tuning variants, localization-based approaches, neuron-level editing, and parameter-efficient techniques, as well as recent advancements addressing sequential editing challenges. Comparing against this diverse set allows for a comprehensive evaluation of the proposed method's strengths.

5.4. LLMs Used

The experiments are conducted on three prominent auto-regressive LLMs:

  1. GPT-J (6B) (Wang and Komatsuzaki, 2021): A 6-billion parameter open-source LLM, known for its strong performance on various NLP tasks.

  2. Llama2 (7B) (Touvron et al., 2023): A 7-billion parameter model from Meta's Llama family, part of their open-source offerings for generative AI.

  3. Llama3 (8B) (Llama Team, 2024): An 8-billion parameter model, a more recent iteration from the Llama family, also open-source.

    These models were chosen because they are widely adopted, represent different model families, and have varying parameter counts, providing a robust testbed for the editing methods.

5.5. Experimental Settings

  • Hardware: Experiments were conducted on an NVIDIA A100-SXM4-40GB machine.
  • Baselines Implementation: Baseline methods were implemented using the EasyEdit toolkit, a widely adopted tool for KME, with hyperparameters configured according to recommended settings.
  • Editing Scenarios:
    • Batch Editing: Multiple edit requests are processed simultaneously.
    • Consecutive Editing: All edit requests are applied successively without rolling back parameters after each edit. Evaluation is performed only after all knowledge updates are completed. This is considered a more challenging scenario as previous edits can interfere with subsequent ones.
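The difference between the two scenarios can be sketched as follows; `apply_batch_edit`, `apply_edit`, and `evaluate` are hypothetical placeholders (not real EasyEdit APIs) used only to contrast the two protocols:

```python
def batch_editing(model, requests, evaluate, apply_batch_edit):
    # All requests are folded into a single parameter update, then evaluated once.
    edited = apply_batch_edit(model, requests)
    return evaluate(edited, requests)

def consecutive_editing(model, requests, evaluate, apply_edit):
    # Edits are applied one after another with no rollback in between;
    # earlier edits can interfere with later ones, so evaluation happens
    # only after every update has been applied.
    edited = model
    for req in requests:
        edited = apply_edit(edited, req)
    return evaluate(edited, requests)
```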

6. Results & Analysis

6.1. Core Results Analysis

The paper presents experimental results comparing the proposed method (Ours) against nine baselines on two datasets (ZsRE and CounterFact) across three LLMs (GPT-J, Llama2, Llama3). The evaluation is performed under batch editing and, more critically, under consecutive editing.

6.1.1. Batch Editing Performance

The following are the results from Table 1 of the original paper, showing average performance under batch editing with batch_size = 100:

Editor ZsRE CounterFact
Efficacy Locality Generality Score Efficacy Locality Generality Score
GPT-J (6B)
Original Model 26.32 / 25.79 26.06 16.22 / 18.56 17.39
Full-C (Zhu et al., 2021) 72.37 19.66 68.91 53.65 92.15 43.35 72.38 69.29
ROME (Meng et al., 2022) 56.42 9.86 54.65 40.31 57.50 52.05 54.20 54.58
MEMIT (Meng et al., 2023) 94.91 30.39 90.22 71.84 98.55 63.64 95.50 85.90
PRUNE (Ma et al., 2025) 0.15 0.00 0.15 0.10 86.15 53.87 86.85 75.62
RECT (Gu et al., 2024) 96.38 27.79 91.21 71.79 98.80 72.22 86.58 85.87
AlphaEdit (Fang et al., 2025) 99.79 28.29 96.00 74.69 99.75 75.48 96.38 90.54
Ours 100 93.22 63.75 85.66 100 17.00 12.00 43.00
Llama3 (8B)
Original Model 36.99 / 36.34 36.67 7.85 / 10.58 9.22
Full-C (Zhu et al., 2021) 30.48 15.49 30.22 25.40 83.33 46.63 67.79 65.92
ROME (Meng et al., 2022) 2.01 0.69 1.80 1.50 64.40 49.44 61.42 58.42
MEMIT (Meng et al., 2023) 34.62 18.49 31.28 28.13 65.65 51.56 64.65 60.62
PRUNE (Ma et al., 2025) 24.77 20.69 23.87 23.11 68.25 49.82 64.75 60.94
RECT (Gu et al., 2024) 86.05 31.67 80.54 66.09 66.05 61.41 63.62 63.69
AlphaEdit (Fang et al., 2025) 94.47 32.55 91.13 72.72 98.90 67.88 94.22 87.00
Ours 98.21 85.36 77.04 86.87 100 16.00 23.00 46.33

Analysis for Batch Editing:

  • ZsRE: Our method achieves 100% Efficacy for GPT-J and 98.21% for Llama3, which is on par with or slightly better than top baselines like AlphaEdit and MEMIT. Crucially, its Locality (93.22% for GPT-J, 85.36% for Llama3) is significantly higher than all other baselines, which typically range from 9.86% to 32.55%. This indicates a strong ability to preserve unrelated knowledge. Generality is competitive but not always the highest. The overall Score (85.66% for GPT-J, 86.87% for Llama3) is the highest or among the highest, driven by the exceptional Locality.
  • CounterFact: Our method achieves 100% Efficacy for both GPT-J and Llama3, demonstrating perfect success in editing the target knowledge. However, its Locality (17.00% for GPT-J, 16.00% for Llama3) and Generality (12.00% for GPT-J, 23.00% for Llama3) are notably lower than other strong baselines (e.g., AlphaEdit, MEMIT, RECT). This leads to a lower overall Score (43.00% for GPT-J, 46.33% for Llama3) compared to the best baselines.
  • Summary: For batch editing, our method excels in efficacy and demonstrates outstanding locality on ZsRE, but struggles with locality and generality on CounterFact. The paper identifies this as a limitation arising from CounterFact's focus on inserting new factual knowledge (where internal information pathways might not be well-established), making it more prone to disrupting previously learned knowledge when editing intermediate nodes.

6.1.2. Consecutive Editing Performance

The following are the results from Table 2 of the original paper, showing average performance under consecutive editing:

Editor ZsRE CounterFact
Efficacy Locality Generality Score Efficacy Locality Generality Score
GPT-J (6B)
Original Model 21.65 / 21.10 21.37 0.30 0.23 0.27
Full-C (Zhu et al., 2021) 11.04 1.59 8.41 7.01 21.33 1.27 7.97 10.19
ROME (Meng et al., 2022) 31.87 18.29 28.10 26.09 0.13 0.03 0.20 0.12
KN (Dai et al., 2022) 0.00 0.01 0.00 0.003 0.01 0.00 0.007 0.006
MEMIT (Meng et al., 2023) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
PMET (Li et al., 2024b) 0.02 0.03 0.02 0.02 0.00 0.00 0.00 0.00
AlphaEdit (Fang et al., 2025) 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
LoRA (Xu et al., 2024) 1.11 0.01 1.15 0.76 0.97 0.13 0.67 0.59
EMMET (Gupta et al., 2024b) 55.21 37.47 51.67 48.12 70.20 33.03 41.17 48.13
R-ROME (Gupta et al., 2024a) 54.74 13.33 51.76 39.96 69.27 41.87 37.40 49.51
Ours 88.74 51.28 49.50 63.17 90.70 1.83 5.33 32.62
Llama2 (7B)
Original Model 34.73 / 34.59 34.66 15.19 11.55 13.37
Full-C (Zhu et al., 2021) 7.88 0.55 6.73 5.05 2.24 2.31 0.05 1.53
ROME (Meng et al., 2022) 9.16 1.12 8.29 6.19 36.96 3.24 18.77 19.66
MEMIT (Meng et al., 2023) 0.00 0.03 0.00 0.01 0.00 6.43 0.00 2.14
KN (Dai et al., 2022) 1.02 0.03 0.09 0.38 0.37 0.02 0.29 0.23
PMET (Li et al., 2024b) 3.68 1.83 3.68 3.06 0.23 0.47 0.17 0.29
AlphaEdit (Fang et al., 2025) 2.83 0.97 2.81 2.20 0.00 4.41 0.00 1.47
LoRA (Xu et al., 2024) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
EMMET (Gupta et al., 2024b) 25.01 2.87 22.43 16.77 38.67 5.83 28.35 24.28
R-ROME (Gupta et al., 2024a) 21.21 1.52 17.78 13.50 41.06 5.66 25.92 24.21
Ours 84.09 75.77 66.20 75.35 71.46 20.62 20.96 37.68
Llama3 (8B)
Original Model 26.27 / 25.98 26.13 0.87 0.75 0.81
Full-C (Zhu et al., 2021) 7.69 0.69 6.66 5.01 5.75 0.13 0.47 2.12
ROME (Meng et al., 2022) 3.39 0.15 2.80 2.11 25.07 0.97 13.23 13.09
MEMIT (Meng et al., 2023) 0.00 3.96 0.00 1.32 0.00 7.22 0.00 2.41
KN (Dai et al., 2022) 0.03 0.01 0.01 0.02 0.11 0.02 0.05 0.06
PMET (Li et al., 2024b) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
AlphaEdit (Fang et al., 2025) 0.01 0.003 0.00 0.004 0.33 0.07 0.17 0.19
LoRA (Xu et al., 2024) 11.45 5.35 11.16 9.32 0.77 0.17 1.17 0.70
EMMET (Gupta et al., 2024b) 5.17 0.43 4.86 3.49 54.50 1.28 38.82 31.53
R-ROME (Gupta et al., 2024a) 2.71 0.35 2.43 1.83 48.92 1.47 36.62 29.00
Ours 94.03 59.01 67.35 73.46 93.53 1.93 7.11 34.19

Analysis for Consecutive Editing:

  • Overall Trend: The performance of most baselines significantly deteriorates under consecutive editing. Many methods (KN, MEMIT, PMET, AlphaEdit, LoRA) show near-zero scores across all metrics for several models/datasets, indicating a severe impact on the original LLMs and catastrophic forgetting. This highlights the difficulty of this scenario where previous edits can negatively interfere with new ones and the model's overall knowledge.

  • Our Method's Superiority:

    • ZsRE: Our method consistently outperforms all compared methods by significant margins. For GPT-J, it achieves 88.74% Efficacy, 51.28% Locality, and 49.50% Generality, leading to a Score of 63.17%. This is substantially higher than the next best (EMMET, Score 48.12%). Similar trends are observed for Llama2 (Score 75.35% vs. EMMET 16.77%) and Llama3 (Score 73.46% vs. LoRA 9.32%). This demonstrates the effectiveness of critical transmission paths in maintaining performance and resisting degradation during sequential updates.
    • CounterFact: While our method still shows strong Efficacy (e.g., 90.70% for GPT-J, 93.53% for Llama3), its Locality (1.83% for GPT-J, 1.93% for Llama3) and Generality (5.33% for GPT-J, 7.11% for Llama3) remain low, similar to the batch editing scenario. The overall Score (e.g., 32.62% for GPT-J, 34.19% for Llama3) is still competitive, often surpassing many baselines that perform poorly in this challenging setting.
  • Discussion on CounterFact Limitation: The paper acknowledges that the low Generality and Locality on CounterFact is a limitation. This dataset primarily involves inserting new factual knowledge, which might not have well-established internal information pathways within the pre-trained model. Directly modifying intermediate nodes along these underdeveloped paths can inadvertently interfere with previously learned knowledge. The authors are investigating adaptive rectification strategies to mitigate this.

6.1.3. Comparison of Editing Time

The following figure (Figure 3 from the original paper) presents the average time per edit for all compared methods:

Figure 3: Comparison of average time per editing among all methods on two datasets.

Analysis of Editing Time:

  • Our method demonstrates strong efficiency, requiring:
    • 2.8 seconds per edit for GPT-J on ZsRE.
    • 2.0 seconds per edit for Llama2 (7B) on ZsRE.
    • 3.1 seconds per edit for Llama3 (8B) on ZsRE.
  • This is highly competitive with, and often faster than, many baselines. For instance, Full-C shows slightly longer times (3.06s, 2.91s, 3.63s).
  • Methods like MEMIT, AlphaEdit, and PMET, while often performing well in batch editing, incur significantly longer editing times, making them less efficient.
  • KN exhibits the longest editing time among all methods, making it considerably less efficient.
  • Conclusion: The efficiency of our method, coupled with its strong performance (especially in consecutive editing), highlights its practical applicability.

6.2. Ablation Studies / Parameter Analysis

The paper includes analyses of key hyper-parameters and components, providing insights into the method's behavior.

6.2.1. Analysis of Critical Transmission Path

The paper analyzes the importance of each node within the critical transmission path ($\mathcal{T}^+$) across all layers of Llama3 (8B) on both ZsRE and CounterFact. This helps understand the distribution of influence.

The following figure (Figure 4 from the original paper) shows the importance of each node in $\mathcal{T}^+$ across all Llama3 (8B) layers on ZsRE (Top) and CounterFact (Bottom):

Figure 4: Importance of each node in $\mathcal{T}^+$ across all Llama3 (8B) layers on ZsRE (Top) and CounterFact (Bottom).

Analysis:

  1. Consistency Across Datasets: The same model exhibits consistent trends in node importance distribution across different datasets (ZsRE and CounterFact). This suggests the identified critical paths represent stable internal information flows.
  2. All Layers Contribute: All hidden layers contribute to knowledge editing, contradicting the assumption of prior methods that focus only on specific layers. This confirms the paper's premise that knowledge accumulation is a network-wide phenomenon.
  3. Non-uniform Influence: The influence of different layers is not uniformly distributed.
    • Middle Layers are Stronger: Nodes in the middle layers (e.g., layers 4-18 in Llama3 8B) exert a stronger influence (higher importance scores). This implies these layers are more critical for processing and storing factual knowledge.
    • Early vs. Late Layers: Early layers (1-4) contribute more significantly than late layers (28-32). Importance scores in late layers remain relatively stable, suggesting a more uniform and consistent level of influence.
  4. Node Variability within Layers: Despite the strong influence of middle layers, node importance varies significantly within these layers. Some nodes might be highly entangled with other unrelated knowledge, making them unsuitable for editing. This reinforces the need for precise identification of critical nodes, as performed by the method.
  5. Implications for Optimization: The current method uniformly optimizes all nodes along a path. However, the analysis suggests an adaptive re-weighting strategy that emphasizes nodes in more impactful middle layers could further enhance performance.

6.2.2. Effects of the Contrastive Rectification

The contrastive loss defined in Eq. 7 plays a crucial role. The paper investigates its effect, particularly by setting $\lambda = 0$, which effectively removes the contrastive term.

The following figure (Figure 6 from the original paper) shows an analysis of $\lambda$ for Llama2 (7B):

Figure 6: Analysis of $\lambda$ for Llama2 (7B).

Analysis:

  • Impact of $\lambda = 0$: The results indicate that when $\lambda = 0$ (i.e., without contrastive rectification), the model's Efficacy is approximately 4% lower. This highlights that the contrastive loss significantly contributes to the success rate of the edits.
  • Stability of Generality and Locality: The contrastive rectification helps maintain the stability of Generality and Locality. This implies that by explicitly considering negative examples (insignificant paths), the model learns to focus updates on relevant knowledge without disrupting unrelated information.
  • Hyperparameter $|\mathcal{T}^-|$: The number of negative transmission paths $|\mathcal{T}^-|$ is a hyperparameter. Setting it too high leads to a notable decline in both Efficacy and Generality. This is because excessive emphasis on the contrastive loss ("what not to do") can make the model overly cautious, compromising its ability to successfully apply edits and generalize. Based on empirical evaluation, $|\mathcal{T}^-| = 1$ is chosen as optimal.

6.2.3. Analysis of the size of $\mathcal{T}^+$

The paper also analyzes the effect of varying the size of the critical transmission path, $\lvert \mathcal{T}^+ \rvert$.

The following figure (Figure 5 from the original paper) shows the analysis of $\lvert \mathcal{T}^+ \rvert$ for Llama3 (8B):

Figure 5: Analysis of $\lvert \mathcal{T}^+ \rvert$ for Llama3 (8B).

Analysis:

  • Effect on Efficacy and Generality: As $\lvert \mathcal{T}^+ \rvert$ increases, Efficacy and Generality tend to rise slightly. This is expected, as covering more of the relevant information-accumulation path helps integrate the new knowledge.
  • Impact on Locality: Once $\lvert \mathcal{T}^+ \rvert$ surpasses a certain threshold (e.g., 15 for Llama3 (8B) on 3K ZsRE edits), Locality declines sharply. This indicates that including too many paths, beyond the truly critical ones, introduces irrelevant paths into the optimization. These irrelevant paths act as noise, causing unintended modifications to unrelated knowledge and degrading the model's ability to retain its original knowledge.
  • Optimal Balance: The paper sets $\lvert \mathcal{T}^+ \rvert = 15$ as a reasonable compromise, balancing the gains in efficacy and generality against the stability of locality. (An illustrative path-selection sketch follows this list.)
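
As an illustrative (not paper-specified) way to realize the trade-off above, the importance scores from the earlier sketch can be split by simple ranking into a fixed-size positive path and a small negative set, mirroring the reported choices $\lvert \mathcal{T}^+ \rvert = 15$ and $\lvert \mathcal{T}^- \rvert = 1$.

```python
def split_paths(scores, n_pos=15, n_neg=1):
    """Take the n_pos highest-scoring nodes as the critical path T+ and the
    n_neg lowest-scoring nodes as contrastive negatives T- (illustrative only)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    t_pos = [node for node, _ in ranked[:n_pos]]
    t_neg = [node for node, _ in ranked[-n_neg:]]
    return t_pos, t_neg
```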

6.2.4. Analysis of $\lambda$

The $\lambda$ parameter in the overall objective balances the efficacy loss against the contrastive loss.

The following figure (Figure 6 from the original paper) visualizes the impact of varying $\lambda$ on the model's performance:

Figure 6: Analysis of $\lambda$ for Llama2 (7B).

Analysis:

  • Decline with Increasing $\lambda$: The results show that performance generally declines as $\lambda$ increases. This is attributed to an overemphasis on the contrastive loss, which can lead to overfitting to the expected predictions or make the model too rigid.
  • Impact on Efficacy: This negative effect is most apparent in Efficacy, where the model struggles to maintain accuracy when the contrastive loss becomes too dominant.
  • Locality Fluctuation: Interestingly, Locality fluctuates less at larger values of $\lambda$. This suggests that a higher $\lambda$ helps preserve unrelated knowledge by strongly penalizing changes along insignificant paths, reinforcing the role of contrastive optimization in safeguarding locality. However, this comes at the cost of overall efficacy.
  • Optimal Value: Based on these observations, $\lambda = 0.1$ is chosen to strike a balance between effective rectification and overall performance, especially efficacy. (A small sweep sketch follows this list.)
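
A sweep over $\lambda$ of the kind behind Figure 6 could be scripted as below; `run_editing` and `evaluate` are hypothetical stand-ins for whatever editing and evaluation harness is used, not functions from the paper's codebase.

```python
def sweep_lambda(model, requests, run_editing, evaluate,
                 lambdas=(0.0, 0.05, 0.1, 0.5, 1.0)):
    """Re-run the editing pipeline for each lambda and collect the three metrics."""
    results = {}
    for lam in lambdas:
        edited_model = run_editing(model, requests, lam=lam)
        results[lam] = {
            "efficacy": evaluate(edited_model, requests, split="rewrite"),
            "generality": evaluate(edited_model, requests, split="paraphrase"),
            "locality": evaluate(edited_model, requests, split="neighborhood"),
        }
    return results
```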

6.3. Summary of Results

  • Strong Performance in Consecutive Editing: The method significantly outperforms baselines in the challenging consecutive editing scenario on ZsRE, demonstrating robust knowledge integration and reduced forgetting.
  • High Efficacy and Locality on ZsRE: It achieves excellent efficacy and particularly high locality scores on the ZsRE dataset in both batch and consecutive settings, indicating precise and non-disruptive editing.
  • Limitations on CounterFact Locality/Generality: The method shows strong efficacy but comparatively lower locality and generality on CounterFact, especially in batch editing. This is attributed to the nature of CounterFact (inserting new facts) and the current fixed optimization strategy across layers.
  • Efficiency: The method is highly efficient, with editing times comparable to or better than many baselines.
  • Insights into LLM Knowledge: The analysis of critical paths reveals that knowledge processing is distributed across all layers, with middle layers being particularly influential, offering valuable insights into LLM internal mechanisms.
  • Contrastive Loss Effectiveness: The parameter-aware contrastive rectification is crucial for improving efficacy and maintaining generality and locality, although careful tuning of $\lambda$ and $\lvert \mathcal{T}^- \rvert$ is required.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper introduces a novel and effective approach to Knowledge-based Model Editing (KME) by conceptualizing and leveraging critical transmission paths within Large Language Models (LLMs). Moving beyond the limitations of traditional layer-based localization, the method identifies specific parameter pathways spanning all layers that are most influential for a given knowledge edit. To make this feasible, a parameter packing strategy is employed. Furthermore, a parameter-aware contrastive rectifying algorithm is proposed, which not only optimizes updates on these critical paths but also utilizes insignificant paths as negative examples to prevent unintended modifications to unrelated knowledge. Extensive experiments across three popular LLMs (GPT-J, Llama2, Llama3) and two widely-used KME datasets (ZsRE, CounterFact) demonstrate the superior performance of the proposed method in terms of efficacy, generality, and locality, particularly in the challenging consecutive editing scenario. The method also proves to be computationally efficient.

7.2. Limitations & Future Work

The authors acknowledge several limitations and suggest future research directions:

  1. Uniform Contribution Assumption: The current method assumes that all layers within an identified critical transmission path contribute equally to the editing process. However, the analysis (Figure 4) clearly shows varying degrees of influence across layers (e.g., middle layers are more impactful).
    • Future Work: Investigate layer-wise contributions to editing effectiveness and incorporate a more fine-grained, layer-sensitive optimization strategy with adaptive weighting to enhance performance.
  2. Fixed-size Critical Transmission Path: The method currently operates with a fixed-size critical transmission path ($\lvert \mathcal{T}^+ \rvert$) during editing. This static configuration may not be optimal for all types of edits or task requirements. As shown in Figure 5, selecting an inappropriate size can lead to locality degradation.
    • Future Work: Focus on dynamically adjusting the node number (the size of $\mathcal{T}^+$) based on the specific characteristics of each editing request, thereby improving the model's adaptability and robustness across diverse scenarios.
  3. Performance on CounterFact Locality/Generality: The paper implicitly points to a limitation on CounterFact, where the method, while achieving high efficacy, shows relatively low locality and generality. This is attributed to CounterFact's focus on inserting new factual knowledge, where internal pathways might be underdeveloped, leading to higher risks of disrupting existing knowledge.
    • Future Work (implied): Develop adaptive rectification strategies that dynamically adjust parameter updates in critical paths across different layers to mitigate unintended side effects and improve generality and locality when dealing with novel knowledge insertion.

7.3. Personal Insights & Critique

This paper presents a significant advancement in the field of knowledge editing by fundamentally rethinking how knowledge is localized and modified within LLMs. The shift from layer-based to path-based localization is a compelling conceptual leap, reflecting a more accurate understanding of information flow in deep neural networks. The parameter packing strategy is a clever solution to make this computationally tractable, transforming an otherwise intractable problem into a practical one.

The introduction of parameter-aware contrastive rectification is particularly innovative. By explicitly incorporating "negative examples" (insignificant paths) into the loss function, the method not only learns what to change but also what not to change. This explicit regularization mechanism is likely a key driver behind its impressive performance in consecutive editing, a notoriously difficult scenario where most baselines fail catastrophically. The robust locality scores on ZsRE further underscore the effectiveness of this contrastive approach in preventing unwanted side effects.

Inspirations and Transferability:

  • Understanding LLM Internals: The analysis of critical paths provides valuable insights into the black box of LLMs, suggesting that knowledge is indeed distributed and processed dynamically across layers, with varying influence. This kind of analysis could inspire further research into interpretable AI and understanding how LLMs learn and store information.
  • Beyond KME: The concept of critical transmission paths and perturbation-based importance estimation could be generalized to other areas beyond KME. For instance, in model compression or pruning, identifying critical paths could lead to more effective and less destructive pruning strategies. In adversarial robustness, understanding critical paths might help identify vulnerabilities or design more robust models.
  • Contrastive Learning for Fine-tuning/Adaptation: The parameter-aware contrastive loss could be adapted for other fine-tuning or adaptation tasks where preserving existing knowledge while acquiring new skills is crucial. For example, in continual learning, this approach could help mitigate catastrophic forgetting by identifying and preserving "critical paths" for old tasks while training on new ones.

Potential Issues, Unverified Assumptions, or Areas for Improvement:

  • CounterFact Performance Gap: The most significant weakness is the locality and generality performance on CounterFact, especially compared to its strong showing on ZsRE. The explanation that CounterFact involves inserting new knowledge into underdeveloped pathways is plausible, but it suggests a fundamental limitation in the current method's ability to handle novel knowledge effectively without collateral damage. This needs more targeted solutions. Perhaps the definition of "insignificant paths" needs to be dynamically adjusted based on the novelty or type of knowledge being edited.

  • Generalizability of "Criticality": The "criticality" of a path is defined relative to a specific editing request. It's assumed that this criticality applies universally to all contexts where that fact might be invoked. While this is a common assumption in KME, its robustness could be further explored, especially in complex, multi-hop reasoning tasks.

  • Computational Cost of Path Tracing: While parameter packing reduces the complexity from the neuron level to the block level, identifying and ranking paths still involves iterating through FFN elements across layers. For extremely large models (e.g., hundreds of billions of parameters), the $\mathcal{O}(L \times M^2)$ complexity, where $M$ can be very large, might still be substantial, especially for real-time applications or for editing a huge number of facts. Further optimizations for path identification could be beneficial.

  • Hyperparameter Sensitivity: The method relies on hyperparameters such as $N$ (the size of $\mathcal{T}^+$) and $\lambda$ (the contrastive-loss weight), which require careful tuning. Finding a robust, adaptive way to set these, perhaps automatically, would make the method more user-friendly and generalizable. The analysis showed a sharp decline in locality when $N$ is too large, highlighting this sensitivity.

  • Interpretability of Paths: While the paper identifies "paths," the exact semantic meaning or content encoded within these paths, beyond their "criticality" for a specific fact, remains somewhat opaque. Further work could delve into the semantic interpretability of these critical paths.

Overall, this paper provides a robust and innovative framework for knowledge editing, pushing the boundaries of precision and robustness in LLM modification. Its contributions pave the way for more sophisticated and reliable ways to manage the vast and dynamic knowledge within large language models.
