RELATION EDITING FOR LARGE LANGUAGE MODELS

This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This study introduces the task of relation editing in large language models, revealing that current methods retain outdated information at rates up to 98.20%. A novel Forgetting-and-Editing framework and a self-paced learning strategy are proposed, significantly improving editing

Abstract

Knowledge editing is a critical technique for the routine updating and maintenance of LLMs. Existing research predominantly assumes changes only to the object within subject-relation-object triples, with minimal exploration into techniques for editing the relation. We term this task Relation Editing (distinct from the established “Object Editing” paradigm). We first construct a dedicated relation editing dataset and benchmark existing algorithms, revealing a critical flaw: even with successful edits, prominent methods suffer from the persistent retention of outdated information, with rates reaching as high as 98.20%. Editing failures stem primarily from two sources: the persistent retention of outdated relationships and the presence of challenging editing samples. To address the first issue, we propose a novel relation editing framework called Forgetting-and-Editing (FE). We theoretically show that existing forgetting methods (i.e., model unlearning) are unsuitable for this purpose and, to this end, introduce a new target assignment strategy within our framework. To mitigate the second challenge, we introduce a self-paced learning strategy, instantiated in a new algorithm named self-paced AlphaEdit (SPaEdit). We conduct extensive experiments on our compiled relation-editing dataset and established object-editing benchmarks. Results demonstrate that our proposed relation editing strategy achieves satisfactory performance on the relation editing task. In addition, SPaEdit outperforms existing SOTA methods on object-editing benchmarks. Our research also suggests further study is warranted in relation editing, particularly on forgetting existing relations.


1. Bibliographic Information

1.1. Title

The central topic of this paper is Relation Editing for Large Language Models.

1.2. Authors

The authors are listed as Anonymous authors as the paper is under double-blind review. Therefore, their research backgrounds and affiliations are not disclosed in the paper.

1.3. Journal/Conference

The paper states Paper under double-blind review, indicating it is submitted to a conference, likely ICLR 2025 given the reference format and mentions in the text (e.g., The Thirteenth International Conference on Learning Representations, ICLR, 2025). ICLR is a highly reputable and influential conference in the field of deep learning and artificial intelligence.

1.4. Publication Year

The publication year is not explicitly stated for the paper itself, but it references ICLR, 2025 for some of its own cited works, suggesting it's either published or submitted for publication in 2025.

1.5. Abstract

Knowledge editing is a crucial technique for updating and maintaining large language models (LLMs) without costly full retraining. Existing research primarily focuses on Object Editing, where only the object $o$ in a (subject, relation, object) triple is changed. This paper introduces a new task called Relation Editing, which focuses on modifying the relation $r$ while keeping the subject and object constant. The authors construct a dedicated relation editing dataset (ReEditBench) and benchmark existing object-editing algorithms, revealing a significant flaw: even if new knowledge is successfully acquired, these methods retain outdated information at very high rates (up to 98.20%).

The paper identifies two main sources of editing failures: the persistent retention of outdated relationships and the presence of challenging editing samples. To address the first issue, a novel Forgetting-and-Editing (FE) framework is proposed. The authors theoretically demonstrate that existing model unlearning strategies are unsuitable for this purpose and introduce a new target assignment strategy within FE. To mitigate the second challenge of difficult samples, a self-paced learning strategy is introduced and instantiated in a new algorithm named Self-paced AlphaEdit (SPaEdit).

Extensive experiments on ReEditBench and established object-editing benchmarks show that the proposed FE strategy significantly improves relation editing performance. Furthermore, SPaEdit outperforms existing state-of-the-art methods on object-editing benchmarks. The research concludes by emphasizing the need for further study in relation editing, especially concerning effectively forgetting existing relations.

The original source link is /files/papers/6951e7e69c764da3f20e3720/paper.pdf. This link points to a PDF file, and given the "Anonymous authors Paper under double-blind review" note, it is likely a preprint or submission to a conference.

2. Executive Summary

2.1. Background & Motivation

The core problem this paper aims to solve is the efficient and precise modification of factual knowledge within Large Language Models (LLMs) without undergoing expensive full retraining. LLMs are inherently static once trained, making knowledge editing a critical technique for their routine updating and maintenance, such as correcting inaccuracies or adding new information.

Prior research in knowledge editing has predominantly focused on Object Editing, where only the object (o) in a (subject, relation, object) fact triple is updated (e.g., changing "Paris is the capital of France" to "Paris is the capital of fashion"). However, the paper highlights a significant gap: editing the relation (r) itself, while keeping the subject and object constant, has received minimal attention. Such relation editing tasks are common in practice; for example, changing "Zinedine Zidane is a player for Real Madrid" to "Zinedine Zidane is a coach of Real Madrid" involves updating the relation from player for to coach of. Existing object-editing methods are ill-suited for this, failing to properly erase outdated information and struggling with challenging edits.

The paper's entry point is to formally define this overlooked task as Relation Editing and to address its specific challenges. Their innovative idea is to develop a comprehensive framework that combines explicit forgetting of old relations with self-paced learning for more effective and robust knowledge editing.

2.2. Main Contributions / Findings

The paper makes several primary contributions to the field of knowledge editing for LLMs:

  • Formalization of Relation Editing: The paper formally defines and distinguishes Relation Editing from the established Object Editing paradigm, highlighting its practical importance and unique challenges.
  • Construction of ReEditBench: A dedicated relation editing dataset named ReEditBench is constructed. This dataset is crucial for benchmarking and evaluating methods specifically designed for relation editing, filling a gap in existing resources.
  • Identification of Key Challenges: Through benchmarking existing object-editing algorithms on ReEditBench, the paper reveals two critical flaws:
    • Persistent Retention of Outdated Information: Even when successfully learning new relations, existing methods suffer from exceptionally high retention rates (up to 98.20%) of the original, outdated knowledge. This indicates an additive rather than a corrective overwrite.
    • Failure on Challenging Samples: Existing methods perform poorly on hard-to-edit relations, where the difference between the model's knowledge of the (subject, new_relation) pair and the object is large.
  • Proposal of Forgetting-and-Editing (FE) Framework: To address the issue of persistent retention, the paper proposes a novel framework called Forgetting-and-Editing (FE).
    • It theoretically demonstrates the unsuitability of conventional model unlearning strategies (e.g., setting targets to "I don't know" or random responses) for relation editing due to systematic biases.
    • It introduces a new, interpolation-based target assignment strategy within FE that effectively suppresses systematic bias, improves edit success, and reduces retention.
  • Introduction of Self-paced AlphaEdit (SPaEdit): To mitigate the challenge of hard editing samples, the paper integrates self-paced learning into a knowledge editing algorithm, resulting in SPaEdit. This method learns from easier samples first and progressively incorporates more challenging ones, leading to more robust optimization.
  • Empirical Validation and State-of-the-Art Performance:
    • Experiments show that the FE strategy significantly enhances the performance of existing object-editing methods on relation editing tasks, leading to average Success metric improvements of 10.07% and peak improvements of 34.49%.
    • Combining FE with SPaEdit yields the best relation editing performance.
    • SPaEdit also outperforms existing state-of-the-art methods, including AlphaEdit, on established object-editing benchmarks like ZsRE and CounterFact, particularly excelling on hard subsets.

These findings collectively solve the problem of ineffective relation editing by providing mechanisms for explicitly forgetting old knowledge and robustly learning new, challenging edits, thereby making LLM updates more precise and reliable.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a foundational grasp of several core concepts in Large Language Models (LLMs) and machine learning is essential:

  • Large Language Models (LLMs): These are advanced neural networks, typically Transformer-based, trained on vast amounts of text data to understand, generate, and process human language. They learn to predict the next word in a sequence, thereby acquiring extensive factual knowledge and reasoning capabilities. Examples include GPT-J, GPT2-XL, and LLaMA3.
  • Knowledge Editing: A technique to modify specific factual associations stored within an LLM without the prohibitive cost of retraining the entire model. This is crucial for updating outdated information or correcting errors efficiently. It aims to achieve edit precision (accurately changing the target fact) and knowledge retention (not affecting unrelated facts or general capabilities).
  • (Subject, Relation, Object) Triples: A common way to represent factual knowledge, often denoted as (s, r, o). For example, (Paris, is capital of, France).
    • Subject (s): The entity about which a fact is stated.
    • Relation (r): The predicate or relationship linking the subject and object.
    • Object (o): The entity that completes the fact with the subject and relation.
  • Object Editing: The traditional knowledge editing paradigm where the object (o) in a (s, r, o) triple is changed, i.e., $(s, r, o) \to (s, r, o^*)$.
  • Relation Editing: The novel knowledge editing paradigm proposed in this paper, where the relation (r) in a (s, r, o) triple is changed, i.e., $(s, r, o) \to (s, r^*, o)$.
  • Parametric vs. Non-Invasive Editing:
    • Parametric (Weight-space) Editors: These methods directly modify the weights (parameters) of the LLM to embed new knowledge. Examples include ROME, MEMIT, AlphaEdit. They aim for surgical alterations.
    • Non-Invasive Approaches: These methods do not directly alter the LLM's core weights but use external memory or prompt-based adaptation. Examples include MELO.
  • Model Unlearning (Machine Unlearning): The process of removing specific data or knowledge from a trained machine learning model such that the model behaves as if it was never trained on that data. In LLMs, this might involve forgetting specific facts to mitigate bias or comply with privacy regulations.
  • Self-Paced Learning (SPL) / Curriculum Learning (CL): A training strategy inspired by how humans learn, where models are initially trained on "easy" samples and then gradually exposed to "harder" ones. SPL automates this process by dynamically determining sample difficulty.
  • Linear Regression: A statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. In knowledge editing, the process of finding model updates can often be formulated as a linear regression problem where input features (keys) are mapped to output targets (values).
  • Feed-Forward Network (FFN): A component within each Transformer layer in LLMs that processes information independently for each position. Many knowledge editing methods target the weights of these FFNs as they are thought to store factual knowledge.
  • Null Space: In linear algebra, the null space of a matrix $A$ is the set of all vectors $x$ for which $Ax = 0$. In knowledge editing, projecting updates into a null space of existing knowledge aims to ensure that the edit does not affect those preserved facts.
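To make the null-space idea concrete, the following is a minimal NumPy sketch, not code from the paper. It assumes the preserved keys are stacked as columns of a matrix `K0`, and the function name `null_space_projector` and the tolerance `rel_tol` are hypothetical. It builds a projector `P` from the SVD of `K0` so that any update multiplied by `P` produces (numerically) zero output change on the preserved keys.

```python
import numpy as np


def null_space_projector(K0, rel_tol=1e-10):
    """Return a symmetric projector P with P @ K0 ~= 0.

    K0 has shape (d0, n): each column is a key whose output should stay fixed.
    (Illustrative helper; the name and tolerance are assumptions.)
    """
    # Full SVD: the columns of U beyond rank(K0) span the directions
    # orthogonal to every preserved key (the left null space of K0).
    U, S, _ = np.linalg.svd(K0, full_matrices=True)
    rank = int(np.sum(S > rel_tol * S.max()))
    U_null = U[:, rank:]
    return U_null @ U_null.T


rng = np.random.default_rng(0)
K0 = rng.normal(size=(8, 3))      # 3 preserved keys in an 8-dim key space
delta = rng.normal(size=(4, 8))   # an arbitrary candidate weight update
P = null_space_projector(K0)

# The projected update leaves the outputs for preserved keys untouched
# up to floating-point error.
print(np.abs(delta @ P @ K0).max())
```

The projector only has room to act when the number of independent preserved keys is smaller than the key dimension; otherwise the null space is empty and `P` is the zero matrix.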

3.2. Previous Works

The paper discusses several prior works, mainly categorizing them into Parameter-Based Knowledge Editing, Temporal Adaptation and Unlearning, and Curriculum and Self-Paced Learning.

3.2.1. Parameter-Based Knowledge Editing

These methods directly modify the LLM's parameters. They are split into meta-learning and locate-then-edit approaches.

  • Meta-Learning Approaches:

    • KE (Cao et al., 2021): A pioneering work in knowledge editing that uses a meta-learning framework to enable models to learn how to update their knowledge. This often involves training a separate model to predict weight changes.
    • MEND (Mitchell et al., 2022): Improves meta-learning for knowledge editing by generating low-rank updates that are specific to the edit. It learns a meta-model that predicts a gradient transformation to apply to the network weights.
  • Locate-then-Edit Approaches: These methods first identify the specific weights associated with a fact and then apply a closed-form update. They formulate knowledge editing as a linear regression task where subject-relation embeddings (keys) are mapped to object embeddings (values).

    • General Formulation: $ \min_{\Delta} \| (\mathbf{W} + \Delta) \mathbf{K}_1 - \mathbf{V}_1 \|_F^2 + \alpha \| (\mathbf{W} + \Delta) \mathbf{K}_0 - \mathbf{V}_0 \|_F^2 + \beta \| \Delta \|_F^2 $ Where:
      • $\mathbf{W}$: The original model's weight matrix.
      • $\Delta$: The perturbation matrix to be learned.
      • $\mathbf{K}_1$, $\mathbf{V}_1$: Keys and values for the new facts to be incorporated.
      • $\mathbf{K}_0$, $\mathbf{V}_0$: Keys and values for existing knowledge to be preserved.
      • $\alpha$, $\beta$: Hyperparameters balancing the update error, preservation error, and regularization on the perturbation magnitude.
    • ROME (Meng et al., 2022): (Rank-One Model Editing) identifies a specific feed-forward key-value subspace for a target fact and applies an optimal rank-one perturbation to rewrite it, while minimizing KL-divergence to preserve general behavior.
    • MEMIT (Meng et al., 2023): (Mass-Editing Memory in a Transformer) scales ROME to edit thousands of facts simultaneously by applying rank-one updates to multiple MLP layers. It uses a least-squares objective with regularizers for locality.
    • LoFiT (Yin et al., 2024): (Localized Fine-Tuning) focuses on fine-grained edits at the neuron-level or head-level.
    • FiNE (Pan et al., 2025): (Fine-grained Neuron-level Editing) is another neuron-level knowledge editing technique for LLMs, aiming for precise localization of memories.
    • RECT (Gu et al., 2024): (Regularization to the Rescue) reformulates model editing as a low-rank, layer-wise correction problem. It identifies causally critical MLP layers and applies rank-r updates, using a consistency loss to preserve original distribution.
    • NSE (Jiang et al., 2024): (Neuron-level Sequential Editing) reframes knowledge editing as neuron-level intervention within feed-forward layers. It uses integrated-gradients attribution to detect sparse neurons and applies fact-specific scaling vectors and additive bias terms.
    • PRUNE (Ma et al., 2025): (Perturbation-Restrained Sequential Model Editing) treats model editing as parameter-efficient subspace pruning within MLP blocks. It identifies a sparse mask and trains a low-rank adapter on the pruned subspace.
    • AlphaEdit (Fang et al., 2025): This is a direct predecessor and a key baseline for SPaEdit. It introduces a null-space projection (via matrix $\mathbf{P}$) to theoretically guarantee that updates do not disturb previously stored knowledge. The update is projected into the null space of preserved knowledge keys ($\mathbf{K}_p$). Its core update rule for $\Delta \mathbf{P}$ is: $ \Delta \mathbf{P} = (\mathbf{V}_1 - \mathbf{W} \mathbf{K}_1) \mathbf{K}_1^\top \mathbf{P} \left( \mathbf{K}_1 \mathbf{K}_1^\top \mathbf{P} + \beta \mathbf{K}_p \mathbf{K}_p^\top \mathbf{P} + \alpha \mathbf{I} \right)^{-1} $ Where $\mathbf{I}$ is the identity matrix, and other symbols are as defined above, with $\mathbf{K}_p$ representing preserved knowledge keys. AlphaEdit is known for its safety and efficacy.
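The general locate-then-edit objective listed above is a ridge-style least-squares problem, so it admits a closed-form minimizer. The sketch below is illustrative only and is not taken from any of the cited implementations: the function name `closed_form_edit` and the default values of `alpha` and `beta` are assumptions. Setting the gradient with respect to $\Delta$ to zero gives $\Delta \mathbf{A} = \mathbf{B}$ with $\mathbf{A} = \mathbf{K}_1 \mathbf{K}_1^\top + \alpha \mathbf{K}_0 \mathbf{K}_0^\top + \beta \mathbf{I}$ and $\mathbf{B} = (\mathbf{V}_1 - \mathbf{W}\mathbf{K}_1)\mathbf{K}_1^\top + \alpha (\mathbf{V}_0 - \mathbf{W}\mathbf{K}_0)\mathbf{K}_0^\top$.

```python
import numpy as np


def closed_form_edit(W, K1, V1, K0, V0, alpha=1.0, beta=1e-2):
    """Minimise ||(W+D)K1 - V1||^2 + alpha*||(W+D)K0 - V0||^2 + beta*||D||^2 over D.

    Shapes: W (d1, d0), K1 (d0, h), V1 (d1, h), K0 (d0, n), V0 (d1, n).
    Hypothetical helper for illustration; requires beta > 0.
    """
    d0 = W.shape[1]
    R1 = V1 - W @ K1   # residual on the new facts
    R0 = V0 - W @ K0   # residual on the preserved facts
    # Zero gradient gives D @ A = B, with A symmetric positive definite.
    A = K1 @ K1.T + alpha * (K0 @ K0.T) + beta * np.eye(d0)
    B = R1 @ K1.T + alpha * (R0 @ K0.T)
    # Solve A @ D.T = B.T rather than forming an explicit inverse.
    return np.linalg.solve(A, B.T).T
```

Using `np.linalg.solve` on the transposed system avoids an explicit matrix inverse, which is usually the more numerically stable choice for this kind of update.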

3.2.2. Temporal Adaptation and Unlearning

This category focuses on removing or isolating obsolete knowledge.

  • Gradient-based Approaches:
    • Forgetting losses (Yao et al., 2024): Methods that introduce specific loss functions to encourage the model to forget certain data.
    • Orthogonal projection updates (Hoang et al., 2024): Using gradient projection to minimize unlearning interference.
    • Fisher weighted masking (Cha et al., 2024): Utilizes Fisher information to identify important weights for unlearning.
  • Memory-centric Methods (External Memory):
    • GRACE (Hartvigsen et al., 2023): (Aging with GRACE) uses discrete key-value adaptors for lifelong model editing.
    • T-Patcher (Huang et al., 2023): (Transformer-Patcher) aims to localize and correct mistakes by modifying a small number of neurons.
    • KV scrubbing (Wang et al., 2024a): Rethinks knowledge memory for lifelong model editing.

These methods often assign fixed forget-set targets (e.g., "I don't know" or random answers), which this paper theoretically shows to induce systematic bias in linear regression-based editing.

3.2.3. Curriculum and Self-Paced Learning

  • Curriculum Learning (CL) (Bengio et al., 2009): Orders training samples from easy to hard, using heuristics.
  • Self-Paced Learning (SPL) (Kumar et al., 2010): Automates sample selection with regularized weights.

These principles have been extended to RL controllers (Graves et al., 2017), LLM instruction-tuning, and continual learning (Ke et al., 2022; Liu et al., 2024b; Ge et al., 2025). However, the paper notes that they have not been systematically applied to knowledge editing before.
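As a rough illustration of the self-paced idea (and not of SPaEdit itself), the sketch below applies hard 0/1 sample weights to an ordinary least-squares problem: samples whose current loss falls below a threshold are treated as "easy" and used for the next fit, and the threshold grows each round so harder samples are admitted later. The function name, the warm start on all samples, and the values of `lam`, `growth`, `rounds`, and `ridge` are all assumptions made for this example.

```python
import numpy as np


def self_paced_least_squares(X, y, lam=0.5, growth=2.0, rounds=5, ridge=1e-6):
    """Alternate between selecting 'easy' samples and refitting on them.

    Returns the final weights and the boolean mask used in the last round.
    (Generic self-paced sketch; not the paper's algorithm.)
    """
    n, d = X.shape
    reg = ridge * np.eye(d)
    w = np.linalg.solve(X.T @ X + reg, X.T @ y)   # warm start on all samples
    selected = np.ones(n, dtype=bool)
    for _ in range(rounds):
        losses = (X @ w - y) ** 2
        selected = losses < lam                   # hard 0/1 sample weights
        if not selected.any():
            selected[np.argmin(losses)] = True    # always keep the easiest sample
        Xs, ys = X[selected], y[selected]
        w = np.linalg.solve(Xs.T @ Xs + reg, Xs.T @ ys)
        lam *= growth                             # admit harder samples next round
    return w, selected
```

The small ridge term keeps the linear system solvable even when only a handful of samples pass the threshold in an early round.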

3.3. Technological Evolution

The field of knowledge editing has evolved from initial fine-tuning approaches (which are costly and risk catastrophic forgetting) to more targeted meta-learning and locate-then-edit methods. Early locate-then-edit methods like ROME and MEMIT focused on efficient object editing by identifying and updating specific MLP weights. More recently, work like AlphaEdit introduced null-space projection to guarantee knowledge preservation, addressing concerns about unintended side effects. Concurrently, model unlearning emerged to address the challenge of removing unwanted knowledge.

This paper's work fits within this timeline by addressing a previously overlooked aspect: relation editing. It builds upon the locate-then-edit paradigm, specifically AlphaEdit, by integrating a novel unlearning strategy (FE) and a self-paced learning approach (SPaEdit). This represents an evolution towards more robust, comprehensive, and nuanced knowledge editing capabilities that can handle complex updates and challenging samples while preserving general model integrity.

3.4. Differentiation Analysis

Compared to the main methods in related work, this paper's approach offers several core differences and innovations:

  • Focus on Relation Editing: The most significant differentiation is the explicit focus on Relation Editing. While prior work RaKE (Wei et al., 2023) briefly touched upon it, this paper presents the first systematic study of this task, including a dedicated dataset (ReEditBench) and a tailored methodology. Existing object-editing methods are shown to perform poorly on relation editing due to the high retention of old knowledge.
  • Novel Forgetting-and-Editing (FE) Strategy:
    • Theoretical Unsuitability of Conventional Unlearning: The paper rigorously shows that standard model unlearning strategies (e.g., setting targets to "I don't know" or random responses) introduce systematic bias and are ineffective for relation forgetting in linear regression-based editing.
    • Interpolation-based Target Smoothing: FE proposes a novel interpolation-based target assignment strategy for forgetting. Instead of a fixed or random target, it smoothly moves the old object's representation towards a neutral state, which is theoretically proven to suppress systematic bias, improve edit success, and reduce retention while inducing smaller perturbations to normal knowledge. This is a critical innovation over existing unlearning methods.
  • Integration of Self-Paced Learning via SPaEdit:
    • Addressing Hard Samples: Unlike existing knowledge editing methods that typically use single-pass optimization and struggle with difficult editing samples, SPaEdit integrates self-paced learning (an easy-to-hard curriculum). This allows the model to build a robust foundation on easier edits before progressively tackling more challenging ones.
    • Minimal Overhead, Enhanced Robustness: SPaEdit extends AlphaEdit with minimal structural overhead (only introducing a diagonal matrix $Z$), yet it achieves substantial performance gains, particularly on hard samples, without degrading general capabilities.
  • Comprehensive Solution for Relation Editing: The combination of FE (for effective forgetting) and SPaEdit (for robust learning, especially of hard samples) provides a holistic framework specifically tailored for the complexities of relation editing, where both the erasure of old relations and the acquisition of new ones are critical. This goes beyond the capabilities of existing methods which either focus only on learning new objects or use less effective unlearning mechanisms.

4. Methodology

The paper proposes a novel framework for Relation Editing that addresses the challenges of persistent retention of outdated knowledge and poor performance on hard-to-edit samples. The framework consists of two main components: the Forgetting-and-Editing (FE) strategy, which includes a new target assignment scheme for forgetting old relations, and Self-Paced AlphaEdit (SPaEdit), an algorithm that leverages self-paced learning to handle samples of varying difficulty.

The overall framework is illustrated in Figure 3.

Figure 3: Overview of our proposed framework for relation editing, combining a novel forgetting-and-editing (FE) strategy with a Self-paced AlphaEdit (SPaEdit) algorithm.
The image is a schematic of the proposed relation editing framework, which combines the Forgetting-and-Editing (FE) strategy with the Self-paced AlphaEdit (SPaEdit) algorithm. It shows how editing and forgetting data are selected, together with the formula for the updated object vector, $\pmb{v}(\hat{o}) = \pmb{v}(o) + \gamma [\pmb{v}(\mathrm{IDK}) - \pmb{v}(o)]$.

4.1. Problem Description

Knowledge editing aims to update factual triples in LLMs. Unlike object editing, which modifies $o$ in a (subject, relation, object) tuple, turning $(s, r, o)$ into $(s, r, o^*)$, relation editing alters the relation $r$ to $r^*$, resulting in a new tuple $(s, r^*, o)$.

In the locate-then-edit paradigm, each edit applies a perturbation $\Delta$ to the model parameters $\mathbf{W} \in \mathbb{R}^{d_1 \times d_0}$. Here, $d_0$ and $d_1$ denote the dimensions of the FFN's intermediate and output layers, respectively. For updating $h$ relation facts, let $\mathbf{K}_1 = [\pmb{k}_1 | \pmb{k}_2 | \cdots | \pmb{k}_h] \in \mathbb{R}^{d_0 \times h}$ be the key matrix for the raw subject-relation pairs, and $\mathbf{K}_1' = [\pmb{k}_1' | \pmb{k}_2' | \cdots | \pmb{k}_h'] \in \mathbb{R}^{d_0 \times h}$ be the key matrix for the updated subject-relation pairs. The value matrix $\mathbf{V}_1 = [\pmb{v}_1 | \pmb{v}_2 | \cdots | \pmb{v}_h] \in \mathbb{R}^{d_1 \times h}$ remains unchanged, representing the object $o$.

Directly applying object editing methods to relation editing would involve minimizing the error for updated relations while preserving existing knowledge. This is typically formulated as:

$ \pmb{\Delta} = \arg\min_{\tilde{\pmb{\Delta}}} \| (\mathbf{W} + \tilde{\pmb{\Delta}}) \mathbf{K}_1' - \mathbf{V}_1 \|_F^2 . $

Where:

  • $\pmb{\Delta}$: The perturbation to the model weights.
  • $\mathbf{W}$: The original weight matrix of the LLM.
  • $\mathbf{K}_1'$: The key matrix for the new subject-relation pairs $(s, r^*)$.
  • $\mathbf{V}_1$: The value matrix for the object $o$.
  • $\| \cdot \|_F^2$: The squared Frobenius norm, which measures the difference between the model's output and the target values.

However, empirical evaluation revealed that this direct application leads to high retention of original knowledge and poor performance on hard-to-edit relations, necessitating a new strategy.
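For intuition about the direct formulation, here is a minimal NumPy sketch under stated assumptions; it is not the procedure used by any specific editor. Because the objective as written has no regularizer, its minimizer is not unique, so the example picks the minimum-Frobenius-norm solution via the pseudoinverse. The function name `direct_relation_edit` is hypothetical.

```python
import numpy as np


def direct_relation_edit(W, K_new, V):
    """Minimum-norm D minimising ||(W + D) @ K_new - V||_F^2.

    W: (d1, d0) weights, K_new: (d0, h) keys of the updated (s, r*) pairs,
    V: (d1, h) object values. Illustrative helper only.
    """
    residual = V - W @ K_new
    # residual @ pinv(K_new) is the least-squares solution of D @ K_new = residual
    # with the smallest Frobenius norm.
    return residual @ np.linalg.pinv(K_new)


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
K_new = rng.normal(size=(6, 2))
V = rng.normal(size=(4, 2))
D = direct_relation_edit(W, K_new, V)
print(np.abs((W + D) @ K_new - V).max())   # edit error on the new keys
```

Note that the objective as written contains no term involving the original keys $\mathbf{K}_1$, so the update is determined only by the new subject-relation keys.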

4.2. Theoretical Investigation of Forgetting Strategies

The paper theoretically investigates why conventional LLM unlearning methods are unsuitable for old relation forgetting under linear regression-based editing methods like AlphaEdit and MEMIT. These conventional methods typically set the prediction target for data to be forgotten to either "I don't know" (IDK) or a random response.

The analysis models knowledge editing as a linear homogeneous regression problem $y = \pmb{w}^\top \pmb{x}$ with a training set $\mathbb{D} = \{(\pmb{x}_i, \pmb{y}_i)\}$. The set $\mathbb{D}$ is split into $\mathbb{D}_g$ (normal data) and $\mathbb{D}_b$ (forgetting data). The optimal weight vector $\pmb{w}^*$ is obtained by minimizing Mean Squared Error (MSE):

$ \pmb{w}^* = (\mathbf{X}^\top \mathbf{X})^{-1} (\mathbf{X}_g^\top \pmb{y}_g + \mathbf{X}_b^\top \pmb{y}_b), $

Where:

  • $\pmb{w}^*$: The optimal weight vector.
  • $\mathbf{X} \in \mathbb{R}^{N \times d}$: The feature matrix for all data.
  • $\pmb{y} \in \mathbb{R}^N$: The label vector for all data.
  • $\mathbf{X}_g^\top \pmb{y}_g$: The contribution from normal data $\mathbb{D}_g$.
  • $\mathbf{X}_b^\top \pmb{y}_b$: The contribution from forgetting data $\mathbb{D}_b$.

4.2.1. Case 1: Fixed Target (e.g., IDK)

If each label $y_i$ in $\mathbb{D}_b$ is fixed to a constant $\hat{y}$ (simulating all objects changed to IDK), the solution $\pmb{w}^*$ becomes:

$ \pmb{w}_{\mathrm{const}}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \left( \mathbf{X}_g^\top \pmb{y}_g + \frac{\hat{y} N}{2} \pmb{u} \right) = \pmb{w}_g^* + \frac{\hat{y} N}{2} (\mathbf{X}^\top \mathbf{X})^{-1} \pmb{u}, $

Where:

  • $\pmb{w}_{\mathrm{const}}^*$: The optimal weight vector when forgetting data targets a constant value.
  • $\pmb{w}_g^*$: The solution achieved by only applying normal data $\mathbb{D}_g$.
  • $\hat{y}$: The constant target value (e.g., IDK).
  • $N$: The total number of samples.
  • $\pmb{u} = \frac{1}{|\mathbb{D}_b|} \sum_{i \in \mathbb{D}_b} \pmb{x}_i$: The mean feature vector of the forgetting data.

This equation shows that a systematic bias is introduced, pulling the solution towards $\hat{y}$. The degree of distortion depends on the correlation between new inputs and $(\mathbf{X}^\top \mathbf{X})^{-1} \pmb{u}$.

4.2.2. Case 2: Random Target

If each label $y_i$ in $\mathbb{D}_b$ is set to a random value (simulating random object assignments), the expected solution $\mathbb{E}[\pmb{w}_{\mathrm{rand}}^*]$ becomes:

$ \mathbb{E}[\pmb{w}_{\mathrm{rand}}^*] = (\mathbf{X}^\top \mathbf{X})^{-1} \left( \mathbf{X}_g^\top \pmb{y}_g + \mathbb{E}[\mathbf{X}_b^\top \pmb{y}_b] \right) = \pmb{w}_g^* + (\mathbf{X}^\top \mathbf{X})^{-1} \left( 0.5 \, |\mathbb{D}_b| \, \mathbb{E}[\mathbf{x}] \right). $

Where:

  • $\mathbb{E}[\pmb{w}_{\mathrm{rand}}^*]$: The expected optimal weight vector when forgetting data targets random values.
  • $\mathbb{E}[\mathbf{x}]$: The expected feature vector.

Similar to the first case, random noise introduces a systematic bias in expectation, pulling the solution towards a direction determined by the irrelevant feature mean, forcing predicted values to skew towards 0.5 (average response in LLMs). This distortion affects both normal and unlearning samples.

The theoretical analysis concludes that standard unlearning strategies cause normal knowledge to become systematically distorted when used with current model editing methods to forget old relations.
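The constant-target case can be checked numerically on synthetic data. The sketch below is an illustration under assumptions, not the paper's experiment: the helper names are hypothetical, the data are random, and the forgetting set is chosen to hold half of the samples so that $|\mathbb{D}_b| = N/2$, which is the setting in which the general shift $\hat{y}\,|\mathbb{D}_b|\,(\mathbf{X}^\top \mathbf{X})^{-1} \pmb{u}$ matches the $\frac{\hat{y} N}{2}$ factor quoted above.

```python
import numpy as np


def fit(X, Xty):
    """Solve (X^T X) w = Xty for w."""
    return np.linalg.solve(X.T @ X, Xty)


def constant_target_fit(X_g, y_g, X_b, y_hat):
    """Least-squares fit when every forgetting label is replaced by y_hat."""
    X = np.vstack([X_g, X_b])
    y = np.concatenate([y_g, np.full(len(X_b), y_hat)])
    return fit(X, X.T @ y)


def predicted_shift(X_g, X_b, y_hat):
    """Bias term y_hat * |D_b| * (X^T X)^{-1} u, with u the mean forgetting feature."""
    X = np.vstack([X_g, X_b])
    u = X_b.mean(axis=0)
    return fit(X, y_hat * len(X_b) * u)


rng = np.random.default_rng(0)
X_g, X_b = rng.normal(size=(20, 3)), rng.normal(size=(20, 3))   # |D_b| = N/2
y_g = rng.normal(size=20)
X = np.vstack([X_g, X_b])
w_g = fit(X, X_g.T @ y_g)            # normal-data term, as defined in the text
w_const = constant_target_fit(X_g, y_g, X_b, y_hat=1.0)
print(np.allclose(w_const, w_g + predicted_shift(X_g, X_b, 1.0)))
```

Because the least-squares solution is linear in the labels, the random-target case with mean label 0.5 has the same expected solution as calling `constant_target_fit` with `y_hat=0.5`, with the empirical mean of the forgetting features standing in for $\mathbb{E}[\mathbf{x}]$.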

4.3. Knowledge Forgetting via Target Smoothing

Given the ineffectiveness of conventional unlearning strategies, the paper proposes a novel target assignment strategy for knowledge forgetting. The key is to determine a suitable object $\hat{o}$ for the triplet (s, r, o) to be unlearned, such that $\hat{o}$ is neither uniform nor randomly assigned.

The strategy is guided by three considerations:

  1. Non-constant assignment: Avoids uniform targets.

  2. Non-random assignment: Avoids uncontrolled random targets.

  3. Target vector proximity: Ensures the difference between the vector representation of $\hat{o}$, $\pmb{v}(\hat{o})$, and that of the original object $o$, $\pmb{v}(o)$, is not too large. This is important because the vector representations of $(s, r)$ and $(s, r^*)$ are often highly similar (as shown in Figure 2), and a large disparity in object values would make the optimization problem significantly harder.

Based on these, the vector for $\hat{o}$ is generated through interpolation:

$ \pmb{v}(\hat{o}) = \pmb{v}(o) + \gamma [ \pmb{v}(\mathrm{IDK}) - \pmb{v}(o) ], \quad \gamma \in (0, 1). $

Where:

  • $\pmb{v}(\hat{o})$: The interpolated value vector for the new target object $\hat{o}$ to be unlearned.
  • $\pmb{v}(o)$: The original value vector of the object $o$.
  • $\pmb{v}(\mathrm{IDK})$: The value vector representing "I don't know".
  • $\gamma$: The interpolation factor, a hyperparameter controlling the degree of interpolation. $\gamma \in (0, 1)$ ensures it's a blend.

This assignment strategy creates a non-constant, data-dependent bias term that suppresses systematic bias, improves edit success, and reduces retention, while inducing smaller perturbations to normal knowledge and yielding more stable optimization.
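The interpolation rule is simple enough to state in a few lines of NumPy. The sketch below is illustrative: the function name `smooth_forget_target` is hypothetical, and the toy two-dimensional vectors stand in for the real value vectors, which in practice would come from the model.

```python
import numpy as np


def smooth_forget_target(v_o, v_idk, gamma):
    """Interpolate from the original object vector towards the IDK vector.

    Implements v(o_hat) = v(o) + gamma * (v(IDK) - v(o)) for gamma in (0, 1).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    v_o = np.asarray(v_o, dtype=float)
    v_idk = np.asarray(v_idk, dtype=float)
    return v_o + gamma * (v_idk - v_o)


# Toy vectors: a quarter of the way from v(o) towards v(IDK).
print(smooth_forget_target([1.0, 0.0], [0.0, 1.0], gamma=0.25))
```

Small values of `gamma` keep the forgetting target close to the original object vector, while values near 1 approach the plain IDK target.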

4.3.1. Theoretical Analysis of the Forgetting-and-Editing Strategy (Appendix B.1)

The paper further analyzes this FE strategy within the linear regression framework. For any sample ii in the forgetting set Db\mathbb{D}_b, the modified target label (value vector) becomes: v(o^i)=(1γ)v(oi)+γv(IDK). \pmb { v } ( \hat { o } _ { i } ) = ( 1 - \gamma ) \pmb { v } ( o _ { i } ) + \gamma \pmb { v } ( \mathrm { I D K } ) . Extending this to a forgetting set of MM samples, the modified label vector ybFE\pmb{y}_b^{\mathrm{FE}} is: ybFE=[v(o^1),v(o^2),...,v(o^M)]=yb+γ(yIDKyb). \begin{array} { r l } & { \pmb { y } _ { b } ^ { \mathrm { F E } } = [ \pmb { v } ( \hat { o } _ { 1 } ) , \pmb { v } ( \hat { o } _ { 2 } ) , . . . , \pmb { v } ( \hat { o } _ { M } ) ] ^ { \top } } \\ & { \quad \quad = \pmb { y } _ { b } + \gamma ( \pmb { y } _ { \mathrm { I D K } } - \pmb { y } _ { b } ) . } \end{array} Substituting this into the closed-form solution of the linear regression problem (similar to Eqn. 2) yields: wFE=(XX)1(Xgyg+XbybFE)=(XX)1(Xgyg+Xb[yb+γ(yIDKyb)])=wg+γ(XX)1Xb(yIDKyb), \begin{array} { r l } & { \pmb { w } _ { \mathrm { F E } } ^ { * } = ( \mathbf { X } ^ { \top } \mathbf { X } ) ^ { - 1 } ( \mathbf { X } _ { g } ^ { \top } \pmb { y } _ { g } + \mathbf { X } _ { b } ^ { \top } \pmb { y } _ { b } ^ { \mathrm { F E } } ) } \\ & { \qquad = ( \mathbf { X } ^ { \top } \mathbf { X } ) ^ { - 1 } \left( \mathbf { X } _ { g } ^ { \top } \pmb { y } _ { g } + \mathbf { X } _ { b } ^ { \top } \left[ \pmb { y } _ { b } + \gamma ( \pmb { y } _ { \mathrm { I D K } } - \pmb { y } _ { b } ) \right] \right) } \\ & { \qquad = \pmb { w } _ { g } ^ { * } + \gamma ( \mathbf { X } ^ { \top } \mathbf { X } ) ^ { - 1 } \mathbf { X } _ { b } ^ { \top } ( \pmb { y } _ { \mathrm { I D K } } - \pmb { y } _ { b } ) , } \end{array} Where:

  • $\pmb{w}_{\mathrm{FE}}^*$: The optimal weight vector when using the FE strategy.
  • $\pmb{w}_g^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}_g^\top \pmb{y}_g$: The solution trained solely on normal data.
  • $\pmb{y}_b$: The original label vector for the forgetting set.
  • $\pmb{y}_{\mathrm{IDK}}$: The IDK label vector (all elements are $\pmb{v}(\mathrm{IDK})$).

The FE strategy introduces a non-constant, data-dependent bias term $\gamma (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}_b^\top (\pmb{y}_{\mathrm{IDK}} - \pmb{y}_b)$. This term provides precise and continuous control over forgetting strength (via $\gamma$), prevents systematic bias, preserves prediction diversity, and leads to stable optimization due to the proximity between original and target features.
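
The effect of the interpolated labels can be checked numerically. The following is a minimal NumPy sketch on synthetic data; the sizes, the random seed, the scalar stand-in for $\pmb{v}(\mathrm{IDK})$, and all variable names are illustrative assumptions rather than anything taken from the paper. It verifies that replacing the forgetting-set labels with their interpolated versions shifts the least-squares solution by exactly $\gamma (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}_b^\top (\pmb{y}_{\mathrm{IDK}} - \pmb{y}_b)$ relative to the solution fitted with the original labels.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_g, n_b = 4, 30, 6           # feature dim, normal samples, forgetting samples (assumed sizes)
gamma = 0.4                      # interpolation factor

X_g = rng.normal(size=(n_g, d))  # normal data
X_b = rng.normal(size=(n_b, d))  # forgetting set
y_g = rng.normal(size=n_g)
y_b = rng.normal(size=n_b)       # original labels of the forgetting set
y_idk = np.full(n_b, 0.5)        # scalar stand-in for v(IDK) (assumption)

X = np.vstack([X_g, X_b])
gram_inv = np.linalg.inv(X.T @ X)

def solve(y_forget):
    """Least-squares solution when the forgetting set carries labels y_forget."""
    return gram_inv @ (X_g.T @ y_g + X_b.T @ y_forget)

y_fe = (1 - gamma) * y_b + gamma * y_idk          # interpolated FE labels
w_orig = solve(y_b)                               # fitted with original labels
w_fe = solve(y_fe)                                # fitted with FE labels
bias = gamma * gram_inv @ X_b.T @ (y_idk - y_b)   # data-dependent bias term
shift_matches = np.allclose(w_fe - w_orig, bias)
print(shift_matches)
```

Because the shift scales linearly with `gamma`, setting `gamma = 0` in this toy setup recovers the solution fitted with the original labels.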

4.4. The Proposed Forgetting-and-Editing (FE) Strategy

The Forgetting-and-Editing (FE) strategy is a comprehensive framework that integrates the unlearning of outdated relations with the injection of new knowledge into a single optimization step. For a batch of $N$ relation editing samples, where the $i$-th sample changes $(s_i, r_i, o_i)$ to $(s_i, r_i^*, o_i)$, the procedure consists of two pair-construction stages followed by a joint optimization step:

  1. Stage 1: Constructing the Forgetting Pairs.

    • For each original triplet $(s_i, r_i, o_i)$, the interpolated target value $\pmb{v}(\hat{o}_i)$ is computed using Eqn. 5.
    • A forgetting pair $(k_i, \pmb{v}(\hat{o}_i))$ is formed, where $k_i$ is the key vector corresponding to the original subject-relation $(s_i, r_i)$. This pair instructs the model to shift the representation of the old relation towards a neutral state, effectively suppressing the outdated knowledge.
  2. Stage 2: Constructing the Editing Pairs.

    • Simultaneously, for each new triplet $(s_i, r_i^*, o_i)$, a standard editing pair $(k_i', \pmb{v}(o_i))$ is constructed. Here, $k_i'$ is the key vector for the new subject-relation $(s_i, r_i^*)$, and $\pmb{v}(o_i)$ is the target value of the object. This pair ensures the model accurately captures the new relational association.
  3. Joint Optimization.

    • Both the forgetting pairs and editing pairs are concatenated to form the full training set for the current batch:
      $$ \mathcal{D}_{\mathrm{total}} = \bigcup_{i=1}^{N} \{ (k_i, \pmb{v}(\hat{o}_i)), (k_i', \pmb{v}(o_i)) \}. $$
    • This combined dataset $\mathcal{D}_{\mathrm{total}}$ is then fed into a base editor (e.g., AlphaEdit or SPaEdit). By jointly optimizing for both objectives (forgetting and editing), the algorithm updates the weights to simultaneously unlearn the old relation and acquire the new one, resolving the inherent conflict in relation editing.
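
The construction of the combined set can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `build_fe_pairs`, the array shapes, and the zero vector used as a stand-in for $\pmb{v}(\mathrm{IDK})$ are hypothetical, and the forgetting target is taken to be the interpolation $(1-\gamma)\pmb{v}(o_i) + \gamma\pmb{v}(\mathrm{IDK})$ described in Section 4.3.1.

```python
import numpy as np

def build_fe_pairs(k_old, k_new, v_obj, v_idk, gamma):
    """Return stacked (keys, values) for one batch of N relation edits.

    k_old, k_new: (N, d_k) keys for (s_i, r_i) and (s_i, r_i*).
    v_obj: (N, d_v) value vectors v(o_i); v_idk: (d_v,) value vector v(IDK).
    """
    v_forget = (1.0 - gamma) * v_obj + gamma * v_idk    # interpolated forgetting targets
    keys = np.concatenate([k_old, k_new], axis=0)       # forgetting keys, then editing keys
    values = np.concatenate([v_forget, v_obj], axis=0)  # targets in the same order
    return keys, values

rng = np.random.default_rng(0)
N, d_k, d_v = 3, 5, 4                                   # toy sizes (assumed)
keys, values = build_fe_pairs(
    rng.normal(size=(N, d_k)), rng.normal(size=(N, d_k)),
    rng.normal(size=(N, d_v)), np.zeros(d_v), gamma=0.6)
print(keys.shape, values.shape)  # (6, 5) (6, 4)
```

Each edit contributes two rows, so a batch of $N$ edits yields $2N$ key-value pairs for the base editor.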

4.5. Improvement via Self-Paced Learning (SPaEdit)

The FE strategy is further enhanced by incorporating self-paced learning (SPL) to address the challenge of hard editing samples. This approach, named Self-paced AlphaEdit (SPaEdit), builds upon AlphaEdit by introducing an easy-to-hard curriculum.

4.5.1. Formulation of the Multi-Objective Optimization Problem (Appendix B.2)

The fundamental goal of parameter-modifying knowledge editing is to find a minimal perturbation $\Delta$ to a model's weight matrix $\mathbf{W}$, such that the edited model $\mathbf{W}' = \mathbf{W} + \Delta$ reflects new knowledge without catastrophically forgetting existing information.

The original AlphaEdit objective for finding an optimal perturbation $\Delta$ is:

$$ \underset{\Delta}{\arg\min} \left\| (\mathbf{W} + \Delta \mathbf{P}) \mathbf{K}_1 - \mathbf{V}_1 \right\|_F^2 + \alpha \left\| \Delta \mathbf{P} \right\|_F^2 + \beta \left\| \Delta \mathbf{P} \mathbf{K}_p \right\|_F^2. $$

Where:

  • $\mathbf{W}$: The original weight matrix.

  • $\Delta$: The perturbation to the model.

  • $\mathbf{P}$: The null-space projector matrix, which is symmetric ($\mathbf{P} = \mathbf{P}^\top$) and idempotent ($\mathbf{P}^2 = \mathbf{P}$). This matrix projects the update into a subspace that does not interfere with preserved knowledge.

  • $\mathbf{K}_1$, $\mathbf{V}_1$: Keys and values of the facts to be edited (new knowledge).

  • $\alpha$: Regularization coefficient constraining the overall magnitude of the update $\Delta \mathbf{P}$.

  • $\beta$: Regularization coefficient penalizing interference with previously edited/preserved knowledge represented by keys $\mathbf{K}_p$.

    To introduce self-paced learning, SPaEdit recasts this objective by introducing binary selectors $z_i \in \{0, 1\}$ to build an adaptive curriculum:

    $$ \min_{\Delta, \boldsymbol{z}} \mathcal{I}(\Delta, \boldsymbol{z}; \lambda) = \sum_{i=1}^{n} z_i \ell_i(\Delta) + \alpha \| \Delta \mathbf{P} \|_F^2 + \beta \| \Delta \mathbf{P} \mathbf{K}_p \|_F^2 - \lambda \sum_{i=1}^{n} z_i. $$

    Where:

  • $\boldsymbol{z}$: A vector of binary selectors where $z_i = 1$ means the $i$-th sample is included in the editing process, and $z_i = 0$ means it is excluded.

  • $\ell_i(\Delta)$: The sample-wise loss for the $i$-th edit, defined as the squared error: $\ell_i(\Delta) = \| (\mathbf{W} + \Delta \mathbf{P}) k_i - v_i \|_2^2 = \| \Delta \mathbf{P} k_i - r_i \|_2^2$. Here, $r_i = v_i - \mathbf{W} k_i$ is the residual for the $i$-th sample (the error the edit needs to correct).

  • $\lambda$: The pace parameter, which controls the difficulty threshold of the curriculum. A larger $\lambda$ allows more difficult samples to be included.

  • The term $-\lambda \sum_{i=1}^n z_i$: This term encourages the inclusion of more samples as $\lambda$ increases, balancing the loss minimization with the curriculum progression.

4.5.2. Alternating Minimization

SPaEdit optimizes this objective via alternating minimization between $\Delta$ and $\boldsymbol{z}$.

4.5.2.1. Step 1: Solving for $\Delta$ with $\boldsymbol{z}$ fixed

With $\boldsymbol{z}$ fixed, the problem reduces to a regularized least-squares objective over the subset of "easy" samples (where $z_i = 1$). Let $\mathbf{Z} = \mathrm{diag}(\boldsymbol{z})$ be a diagonal matrix with $z_i$ on its diagonal. The objective becomes:

$$ \min_{\Delta} \left\| \left( \Delta \mathbf{P} \mathbf{K}_1 - (\mathbf{V}_1 - \mathbf{W} \mathbf{K}_1) \right) \mathbf{Z}^{1/2} \right\|_F^2 + \alpha \| \Delta \mathbf{P} \|_F^2 + \beta \| \Delta \mathbf{P} \mathbf{K}_p \|_F^2. $$

Where $\mathbf{R} = \mathbf{V}_1 - \mathbf{W} \mathbf{K}_1$ represents the residual matrix. Since $z_i \in \{0, 1\}$, $\mathbf{Z}^{1/2} = \mathbf{Z}$. The objective can be rewritten (as derived in Appendix B.3, and similar to Eqn. 31):

$$ \min_{\Delta} \mathcal{L}(\Delta) = \| (\Delta \mathbf{P} \mathbf{K}_1 - \mathbf{R}) \mathbf{Z} \|_F^2 + \alpha \| \Delta \mathbf{P} \|_F^2 + \beta \| \Delta \mathbf{P} \mathbf{K}_p \|_F^2 $$

This is a convex problem with a closed-form solution for the update $\Delta_{\mathrm{SPaEdit}} = \Delta \mathbf{P}$:

$$ \Delta_{\mathrm{SPaEdit}} = (\mathbf{R} \mathbf{Z} \mathbf{K}_1^\top \mathbf{P}) (\mathbf{K}_1 \mathbf{Z} \mathbf{K}_1^\top \mathbf{P} + \beta \mathbf{K}_p \mathbf{K}_p^\top \mathbf{P} + \alpha \mathbf{I})^{-1}. $$

Where:

  • $\Delta_{\mathrm{SPaEdit}} = \Delta \mathbf{P}$: The effective projected update matrix.
  • $\mathbf{R} = \mathbf{V}_1 - \mathbf{W} \mathbf{K}_1$: The residual matrix.
  • $\mathbf{Z}$: The diagonal selection matrix where $z_i = 1$ for selected samples.
  • $\mathbf{K}_1$: Key matrix for facts to be edited.
  • $\mathbf{K}_p$: Key matrix for preserved knowledge.
  • $\mathbf{P}$: Null-space projector matrix.
  • $\alpha, \beta$: Regularization coefficients.
  • $\mathbf{I}$: Identity matrix.

The matrix $(\mathbf{K}_1 \mathbf{Z} \mathbf{K}_1^\top \mathbf{P} + \beta \mathbf{K}_p \mathbf{K}_p^\top \mathbf{P} + \alpha \mathbf{I})$ is guaranteed to be invertible because $\mathbf{K}_1 \mathbf{Z} \mathbf{K}_1^\top \mathbf{P}$ and $\beta \mathbf{K}_p \mathbf{K}_p^\top \mathbf{P}$ are positive semi-definite, and $\alpha \mathbf{I}$ (for $\alpha > 0$) makes the entire matrix positive definite.
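
The closed-form update can be written directly in NumPy. The sketch below is illustrative only: the function name `spaedit_update`, the toy dimensions, and the randomly generated projector are assumptions, and the projector is built from an arbitrary orthonormal basis rather than from a real model's preserved-knowledge statistics.

```python
import numpy as np

def spaedit_update(W, K1, V1, Kp, P, z, alpha, beta):
    """Closed-form projected update for a fixed binary selection vector z."""
    Z = np.diag(z.astype(float))
    R = V1 - W @ K1                                  # residual matrix
    A = K1 @ Z @ K1.T @ P + beta * Kp @ Kp.T @ P + alpha * np.eye(K1.shape[0])
    B = R @ Z @ K1.T @ P
    # Delta = B A^{-1} is computed by solving A^T Delta^T = B^T.
    return np.linalg.solve(A.T, B.T).T

rng = np.random.default_rng(0)
d, n, m = 8, 5, 3                                    # toy sizes (assumed)
W = rng.normal(size=(d, d))
K1, V1 = rng.normal(size=(d, n)), rng.normal(size=(d, n))
Kp = rng.normal(size=(d, m))
Q, _ = np.linalg.qr(rng.normal(size=(d, 4)))         # orthonormal basis of a 4-dim subspace
P = Q @ Q.T                                          # symmetric, idempotent projector
z = np.array([1, 1, 0, 1, 0])                        # samples 3 and 5 are excluded
delta = spaedit_update(W, K1, V1, Kp, P, z, alpha=10.0, beta=1.0)
print(delta.shape)  # (8, 8)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience, not something the paper prescribes. In this toy setup the result also satisfies `delta @ P ≈ delta`, so the update stays inside the projected subspace.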

4.5.2.2. Step 2: Determining optimal sample selection $\boldsymbol{z}^*$ with $\Delta$ fixed

This step realizes an easy-to-hard curriculum by adjusting the difficulty threshold $\lambda$ to progressively incorporate more challenging samples. For each sample $i$:

$$ z_i^*(\lambda) = \begin{cases} 1, & \text{if } \ell_i(\Delta) < \lambda \\ 0, & \text{otherwise.} \end{cases} $$

Where:

  • $z_i^*(\lambda)$: The optimal selector for sample $i$ given $\lambda$.

  • $\ell_i(\Delta)$: The sample-wise loss for sample $i$ with the current $\Delta$.

  • $\lambda$: The pace parameter or difficulty threshold. Samples with loss below $\lambda$ are considered "easy" and included.

    This two-step process (Algorithm 1) is iterated, with $\lambda$ gradually increasing over time to incorporate more difficult samples. The iterative process stops when the validation loss (calculated on a dedicated validation set) plateaus (early stopping).

Algorithm 1: SPaEdit

Input: $\mathbf{K}_1 \in \mathbb{R}^{d \times n}$ (keys for facts to be edited), $\mathbf{V}_1 \in \mathbb{R}^{d \times n}$ (values for facts to be edited), $\mathbf{W} \in \mathbb{R}^{d \times d}$ (initial model weights), $\mathbf{P} \in \mathbb{R}^{d \times d}$ (null-space projector), $\mathbf{K}_p \in \mathbb{R}^{d \times m}$ (keys for preserved knowledge), $\alpha, \beta$ (regularization coefficients), $\lambda_0$ (initial pace parameter), $\mu$ (pace growth factor), $T$ (max iterations).

Output: sequence of edited matrices $\{\mathbf{W}^{(t)}\}_{t=1}^T$

  1. Initialize: $\mathbf{W}^{(0)} = \mathbf{W}$, $t = 0$, $\lambda = \lambda_0$, $\mathbf{Z} = \mathbf{I}$ (initially include all samples).

  2. Repeat for $t = 1, \dots, T$:
     a. Update $\Delta$: Calculate the residual $\mathbf{R} = \mathbf{V}_1 - \mathbf{W}^{(t-1)} \mathbf{K}_1$. Compute $\Delta_{\mathrm{SPaEdit}}^{(t)}$ using the closed-form solution (Eqn. 10): $\Delta_{\mathrm{SPaEdit}}^{(t)} = (\mathbf{R} \mathbf{Z} \mathbf{K}_1^\top \mathbf{P}) (\mathbf{K}_1 \mathbf{Z} \mathbf{K}_1^\top \mathbf{P} + \beta \mathbf{K}_p \mathbf{K}_p^\top \mathbf{P} + \alpha \mathbf{I})^{-1}$. Update model weights: $\mathbf{W}^{(t)} = \mathbf{W}^{(t-1)} + \Delta_{\mathrm{SPaEdit}}^{(t)}$.
     b. Update $\boldsymbol{z}$: Calculate sample-wise losses $\ell_i(\Delta^{(t)})$ for all samples $i$. For each $i$, set $z_i^{(t)} = 1$ if $\ell_i(\Delta^{(t)}) < \lambda$, else $z_i^{(t)} = 0$. Update $\mathbf{Z}^{(t)} = \mathrm{diag}(\boldsymbol{z}^{(t)})$.
     c. Update $\lambda$: $\lambda \leftarrow \mu \lambda$.
     d. Model Selection: Evaluate $\mathbf{W}^{(t)}$ on the validation set (Appendix A.1.2). If the validation loss plateaus (early stopping), break.

  3. Return: The sequence of edited matrices $\{\mathbf{W}^{(t)}\}_{t=1}^T$, with the optimal model selected based on validation.

    Compared to AlphaEdit, SPaEdit incurs minimal structural overhead, requiring only the introduction of the diagonal matrix $\mathbf{Z}$ to dynamically control the optimization order of the samples.
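
The loop of Algorithm 1 can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the function name `spaedit`, the synthetic inputs, and the demo threshold `lam0=5.0` are assumptions, and step (d), validation-based early stopping, is omitted so that the loop always runs for `T` iterations.

```python
import numpy as np

def spaedit(W, K1, V1, Kp, P, alpha=10.0, beta=1.0, lam0=10.0, mu=1.1, T=20):
    """Easy-to-hard editing loop; returns [(W_t, n_selected_t)] for t = 1..T."""
    d, n = K1.shape
    z = np.ones(n)                        # Z = I: all samples included at the start
    lam, history = lam0, []
    for _ in range(T):
        # (a) closed-form update on the currently selected samples
        Z = np.diag(z)
        R = V1 - W @ K1
        A = K1 @ Z @ K1.T @ P + beta * Kp @ Kp.T @ P + alpha * np.eye(d)
        B = R @ Z @ K1.T @ P
        W = W + np.linalg.solve(A.T, B.T).T          # W + B A^{-1}
        # (b) re-select samples whose squared error is below the threshold
        losses = np.sum((W @ K1 - V1) ** 2, axis=0)
        z = (losses < lam).astype(float)
        # (c) raise the threshold so harder samples can enter later
        lam *= mu
        history.append((W, int(z.sum())))
    return history

rng = np.random.default_rng(0)
d, n, m = 8, 6, 3                                    # toy sizes (assumed)
W0 = 0.1 * rng.normal(size=(d, d))
K1, V1 = rng.normal(size=(d, n)), rng.normal(size=(d, n))
Kp = rng.normal(size=(d, m))
Q, _ = np.linalg.qr(rng.normal(size=(d, 5)))
P = Q @ Q.T                                          # toy projector, not a real null space
history = spaedit(W0, K1, V1, Kp, P, lam0=5.0)
selected_per_iter = [k for _, k in history]
print(selected_per_iter)                             # number of selected samples per iteration
```

In a full pipeline, the validation loss of Appendix A.1.2 would be evaluated on each `W_t` in `history` to pick the returned checkpoint.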

5. Experimental Setup

5.1. Datasets

The experiments primarily use a newly constructed relation editing dataset and established object editing benchmarks.

5.1.1. ReEditBench (Relation Editing Dataset)

This is a novel benchmark constructed specifically for the Relation Editing task.

  • Source: Curated from high-quality, knowledge-intensive benchmarks, mainly ZsRE (Levy et al., 2017) and Wikidata (Vrandečić & Krötzsch, 2014).

  • Scale: Total 7,918 high-quality editing instances.

  • Characteristics: Built through a rigorous four-stage pipeline (detailed in Appendix A.1.1 and illustrated in Figure 6):

    1. Knowledge Collection: Sourcing initial (s, r, o) facts from ZsRE and Wikidata.
    2. LLM-based Relation Generation: A generator LLM, DeepSeek-V3 (Liu et al., 2024a), reframes facts into relation-editing tasks by modifying $r$ to $r^*$ while keeping $s$ and $o$ fixed. Two types of edits are encouraged: New Relation (direct modification, e.g., CEO to CTO) and Conditional Relation (adding a context or temporal constraint, e.g., President to 46th President).
    3. Automated Filtering Pipeline:
      • Script-based Filtering: Checks structural integrity.
      • LLM-based Verification: Uses an independent verifier LLM, DeepSeek-R1 (Guo et al., 2025), to assess factual and semantic plausibility.
    4. Human Validation: A 30% random sample was manually validated, and 98.5% of these instances were confirmed valid.
  • Examples: Figure 12 provides examples of ReEditBench instances. Each entry is a knowledge replacement task with a subject, relation, original object, and new target object.

    The following are examples of the ReEditBench dataset:

{
  "type": "relation",
  "step1": {
    "subject": "Atlant-Soyuz Airlines",
    "src": "What airport is Atlant-Soyuz Airlines associated with?",
    "pred": "Vnukovo International Airport",
    "rephrase": "Which airport is assigned to Atlant-Soyuz Airlines?",
    "alt": "Vnukovo International Airport",
    "answers": [ "Sheremetyevo Airport" ],
    "loc": "nq question: the polar caps on mars are most probably made up of",
    "loc_ans": "water ice",
    "cond": "Vnukovo International Airport >> Sheremetyevo Airport || What airport is Atlant-Soyuz Airlines associated with?"
  },
  "step2": {
    "subject": "Atlant-Soyuz Airlines",
    "src": "What is the main operational base of Atlan Alliance Airlines?",
    "pred": "I don't Know",
    "rephrase": "At which airport is Atlant-Soyuz Airlines headquartered, and what serves as its central operational hub?",
    "alt": "Vnukovo International Airport",
    "answers": [ "Vnukovo International Airport" ],
    "loc": "nq question: the polar caps on mars are most probably made up of",
    "loc_ans": "water ice",
    "cond": "I don't Know >> Vyatka International Airport || What is the main operational base of Atlan Alliance Airlines?"
  }
}

In this example, Atlant-Soyuz Airlines changes its associated airport from Vnukovo International Airport to Sheremetyevo Airport. The step1 shows the original fact and its related information, while step2 represents the target edit.
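
For readers who want to process such entries, the `cond` field in the example above appears to follow an `old >> new || prompt` layout. The helper below is a small sketch based only on that observed layout; the function name `parse_cond` and the meaning assigned to each part are assumptions, not a documented schema.

```python
def parse_cond(cond):
    """Split an 'old >> new || prompt' string into its three parts."""
    change, prompt = cond.split(" || ", 1)
    old, new = change.split(" >> ", 1)
    return {"old": old.strip(), "new": new.strip(), "prompt": prompt.strip()}

cond = ("Vnukovo International Airport >> Sheremetyevo Airport || "
        "What airport is Atlant-Soyuz Airlines associated with?")
parsed = parse_cond(cond)
print(parsed["old"], "->", parsed["new"])
```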

5.1.2. ZsRE (Zero-shot Relation Extraction)

  • Source: (Levy et al., 2017).

  • Characteristics: A benchmark for zero-shot relation extraction tasks, commonly used for object editing evaluations. It consists of subject-relation-object triplets presented as natural language questions and answers.

  • Purpose: Used to assess the universality and generalization capabilities of SPaEdit on traditional object editing tasks. Hard subsets of ZsRE are also used for focused evaluation.

    The following are examples of the ZsRE dataset from Figure 13:

{
  "subject": "Shelley's crimsonwing",
  "src": "What is the endangered status of Shelley's crimsonwing?",
  "pred": "vulnerable",
  "rephrase": "What is the conservation status of Shelley's crimsonwing?",
  "alt": "vulnerable",
  "answers": [ "Endangered" ],
  "loc": "nq question: where is the washington post based out of",
  "loc_ans": "Washington, D.C.",
  "cond": "vulnerable >> Endangered || What is the endangered status of Shelley's crimsonwing?"
},
{
  "subject": "Shelley's crimsonwing",
  "src": "What endangered category did the Shelley's crimsonwing finch once fall under?",
  "pred": "I don't Know",
  "rephrase": "Shelley's' crimson-wing finch was once classified as what level of endangered species?",
  "alt": "vulnerable",
  "answers": [ "vulnerable" ],
  "loc": "nq question: where is the washington post based out of",
  "loc_ans": "Washington, D.C.",
  "cond": "I don't Know >> vulnerable || What is the endangered status of Shelley's crimsonwing?"
}

In these examples, the subject is Shelley's crimsonwing, and the task involves understanding its endangered status, showing factual recall.

5.1.3. CounterFact

  • Source: (Meng et al., 2022).

  • Characteristics: Another benchmark for object editing, known for its challenging counterintuitive factual edits. These often require overriding strong pre-existing biases in the model.

  • Purpose: Used to assess SPaEdit's performance on generative tasks and its robustness against difficult edits. Hard subsets are particularly emphasized.

    The following are examples of the CounterFact dataset from Figure 14 (partial JSON):

{
  "target_new": "pitcher",
  "subject": "Charles Vanel",
  "locality_ground_truth": "French"
},
{
  "target_new": "Belgium",
  "subject": "Nenjil Or Aalayam",
  "rephrase_prompt": "Pamukkale's surroundings include",
  "locality_ground_truth": "India"
}

These snippets show a subject and a target_new object, often with locality information, aiming to edit specific facts like Charles Vanel's profession or the location associated with Nenjil Or Aalayam.

5.1.4. Validation Set Construction (Appendix A.1.2)

For iterative algorithms like SPaEdit, a dedicated validation set is used for model selection and early stopping.

  • Construction: 20% of the full training dataset is randomly held out. For each instance that edits $(s, r, o)$ to $(s, r^*, o)$, the following are defined:
    • Original Key-Value Pair: $(k_{\mathrm{org}}, v_{\mathrm{org}})$ for $(s, r)$ and $o$.
    • New Key-Value Pair: $(k_{\mathrm{new}}, v_{\mathrm{org}})$ for $(s, r^*)$ and $o$.
    • Paraphrased Key: $k_{\mathrm{re}}$ (rephrasing of $k_{\mathrm{new}}$) for generalization testing.
    • Forget Target: $v_{\mathrm{forget}} = v_{\mathrm{org}} + \gamma (v_{\mathrm{IDK}} - v_{\mathrm{org}})$ using the interpolation factor $\gamma$.
  • Iterative Evaluation with Weighted Loss Function: At each iteration $t$, the perturbation $\Delta_t$ and the edited model weights $\mathbf{W}_t = \mathbf{W} + \Delta_t \mathbf{P}$ are obtained. Three distinct losses are calculated:
    1. Forgetting Loss ($\mathcal{L}_{\mathrm{forget}}$): Measures how successfully the model unlearns the original fact by moving $k_{\mathrm{org}}$ towards $v_{\mathrm{forget}}$.
       $$ \mathcal{L}_{\mathrm{forget}}(t) = \mathbb{E}_{(k_{\mathrm{org}}, v_{\mathrm{forget}})} \left[ \left\| \mathbf{W}_t k_{\mathrm{org}} - v_{\mathrm{forget}} \right\|_2^2 \right] $$
       Where $\mathbb{E}[\cdot]$ denotes the average loss over all samples in the validation set.
    2. Efficacy Loss ($\mathcal{L}_{\mathrm{efficacy}}$): Assesses direct acquisition of new knowledge, i.e., the error between $\mathbf{W}_t k_{\mathrm{new}}$ and $v_{\mathrm{org}}$.
       $$ \mathcal{L}_{\mathrm{efficacy}}(t) = \mathbb{E}_{(k_{\mathrm{new}}, v_{\mathrm{org}})} \left[ \left\| \mathbf{W}_t k_{\mathrm{new}} - v_{\mathrm{org}} \right\|_2^2 \right] $$
    3. Generalization Loss ($\mathcal{L}_{\mathrm{gen}}$): Evaluates applying new knowledge to paraphrased prompts ($k_{\mathrm{re}}$).
       $$ \mathcal{L}_{\mathrm{gen}}(t) = \mathbb{E}_{(k_{\mathrm{re}}, v_{\mathrm{org}})} \left[ \left\| \mathbf{W}_t k_{\mathrm{re}} - v_{\mathrm{org}} \right\|_2^2 \right] $$
  • Final Model Selection: The total validation loss $\mathcal{L}_{\mathrm{val}}(t)$ is a weighted sum:
    $$ \mathcal{L}_{\mathrm{val}}(t) = w_{\mathrm{forget}} \cdot \mathcal{L}_{\mathrm{forget}}(t) + w_{\mathrm{efficacy}} \cdot \mathcal{L}_{\mathrm{efficacy}}(t) + w_{\mathrm{gen}} \cdot \mathcal{L}_{\mathrm{gen}}(t) $$
    Early stopping with a patience of 3 iterations (no improvement within threshold $\epsilon$) is used. The final model is the checkpoint with the best observed $\mathcal{L}_{\mathrm{val}}^*$. Weights are set as hyperparameters (e.g., $w_{\mathrm{forget}} = 0.4$, $w_{\mathrm{efficacy}} = 0.4$, $w_{\mathrm{gen}} = 0.2$).
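
The weighted sum and the patience rule can be illustrated with a short sketch. The function names, the loss values, and the improvement threshold `eps=1e-4` are hypothetical; the paper's actual value of $\epsilon$ is not stated in this summary.

```python
def total_val_loss(l_forget, l_efficacy, l_gen, w=(0.4, 0.4, 0.2)):
    """Weighted validation loss with the example weights (0.4, 0.4, 0.2)."""
    return w[0] * l_forget + w[1] * l_efficacy + w[2] * l_gen

def select_checkpoint(val_losses, patience=3, eps=1e-4):
    """Return (best_index, stop_index) under patience-based early stopping."""
    best, best_idx, waited = float("inf"), -1, 0
    for t, loss in enumerate(val_losses):
        if loss < best - eps:              # improvement beyond the threshold
            best, best_idx, waited = loss, t, 0
        else:
            waited += 1
            if waited >= patience:         # no improvement for `patience` iterations
                return best_idx, t
    return best_idx, len(val_losses) - 1

# Made-up (forget, efficacy, generalization) losses per iteration.
per_iter = [(1.0, 1.0, 1.0), (0.5, 0.6, 0.8), (0.4, 0.5, 0.7), (0.45, 0.5, 0.7),
            (0.4, 0.55, 0.7), (0.41, 0.5, 0.72), (0.1, 0.1, 0.1)]
losses = [total_val_loss(*x) for x in per_iter]
result = select_checkpoint(losses)
print(result)  # (2, 5)
```

In this toy trace the best checkpoint is iteration index 2, and the loop stops at index 5, so the later (better) value at index 6 is never evaluated. This is the usual trade-off of patience-based stopping.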

5.2. Evaluation Metrics

5.2.1. ReEditBench Metrics (Relation Editing)

These metrics are designed to holistically evaluate relation editing by assessing both forgetting and learning. For an original fact $(s, r, o)$ and a new fact $(s, r^*, o)$:

  • Success (↑): A joint metric verifying if a knowledge edit was successful, requiring two conditions to be met simultaneously: (i) the model must no longer predict the original object $o$ for the original query $(s, r)$, and (ii) it must correctly predict the object $o$ for the updated query $(s, r^*)$.
    $$ \mathbb{E}_{x \sim \mathcal{D}} \Big[ \mathbf{1} \Big\{ o_i = \arg\max_{-o} \mathbb{P}_{f_\theta}(o \mid (s, r)) \Big\}, \mathbf{1} \Big\{ o_i = \arg\max_{o} \mathbb{P}_{f_\theta}(o \mid (s, r^*)) \Big\} \Big] $$
    Where:

    • $\mathbb{E}_{x \sim \mathcal{D}}[\cdot]$: Expected value over all samples in the dataset $\mathcal{D}$.
    • $\mathbf{1}\{\cdot\}$: Indicator function (equals 1 if the condition is true, 0 otherwise).
    • $f_\theta$: The LLM with parameters $\theta$.
    • $\mathbb{P}_{f_\theta}(o \mid (s, r))$: The probability of predicting object $o$ given subject-relation $(s, r)$.
    • $\arg\max_{-o} \mathbb{P}_{f_\theta}(o \mid (s, r))$: The top prediction for $(s, r)$ is not the original $o$. This means forgetting the old fact.
    • $\arg\max_{o} \mathbb{P}_{f_\theta}(o \mid (s, r^*))$: The top prediction for $(s, r^*)$ is the correct $o$. This means learning the new fact.
  • Retention (↓): Evaluates whether the model still retains the outdated knowledge after the edit. It measures the probability that the original object $o_i$ is still the top prediction for the original prompt $(s, r)$. Note: the formula given in the paper (Eqn. 21) appears to describe efficacy evaluated on the old relation; consistent with the "Retention (forgetting)" context, it is read here in the standard knowledge-editing sense, i.e., the old fact is still predicted.
    $$ \mathbb{E}_i \left\{ o_i = \arg\max_{o} \mathbb{P}_{f_\theta}(o \mid (s, r)) \right\} $$
    Where:

    • $\mathbb{E}_i[\cdot]$: Expected value over all samples.
    • $o_i$: The original object for sample $i$.
    • $(s, r)$: The original subject-relation prompt.
    • This metric is effectively the accuracy on the old fact, and lower values indicate better forgetting.
  • Efficacy (↑): Measures the model's direct acquisition of the new fact. It is defined as the probability that the target object $o_i$ is the top prediction for the new prompt $(s, r^*)$. A high score signifies successful instantiation of the new knowledge.
    $$ \mathbb{E}_i \left\{ o_i = \arg\max_{o} \mathbb{P}_{f_\theta}(o \mid (s, r^*)) \right\} $$
    Where:

    • $o_i$: The target object for sample $i$.
    • $(s, r^*)$: The new subject-relation prompt.
  • Generalization (↑): Evaluates if the model can apply the new knowledge beyond the specific prompt it was edited on. It measures the model's ability to predict the correct object $o$ when presented with a set of paraphrased or semantically equivalent prompts $N((s, r^*))$.
    $$ \mathbb{E}_i \left\{ o = \arg\max_{o'} \mathbb{P}_{f_\theta}(o' \mid N((s, r^*))) \right\} $$
    Where:

    • $o$: The correct object.
    • $N((s, r^*))$: A set of paraphrased prompts for the new subject-relation.
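
The four metrics can be computed from top-1 predictions as in the following sketch. The record layout (`obj`, `pred_old`, `pred_new`, `pred_para`), the function name, and the use of a single paraphrase per sample are simplifying assumptions made for illustration; the toy records are invented.

```python
def relation_edit_metrics(records):
    """Each record holds the target object and top-1 predictions for three prompts."""
    n = len(records)
    retained = [r["pred_old"] == r["obj"] for r in records]    # old fact still predicted
    learned = [r["pred_new"] == r["obj"] for r in records]     # new fact predicted
    general = [r["pred_para"] == r["obj"] for r in records]    # paraphrase answered correctly
    success = [(not a) and b for a, b in zip(retained, learned)]
    mean = lambda xs: sum(xs) / n
    return {"success": mean(success), "retention": mean(retained),
            "efficacy": mean(learned), "generalization": mean(general)}

records = [
    {"obj": "A", "pred_old": "X", "pred_new": "A", "pred_para": "A"},  # forgotten and learned
    {"obj": "B", "pred_old": "B", "pred_new": "B", "pred_para": "B"},  # learned, old fact retained
    {"obj": "C", "pred_old": "Y", "pred_new": "Y", "pred_para": "Y"},  # edit failed
    {"obj": "D", "pred_old": "D", "pred_new": "D", "pred_para": "Z"},  # learned, old fact retained
]
metrics = relation_edit_metrics(records)
print(metrics)
# {'success': 0.25, 'retention': 0.5, 'efficacy': 0.75, 'generalization': 0.5}
```

The toy numbers mirror the failure mode discussed in the paper: efficacy can be high while success stays low, because retained old facts disqualify otherwise correct edits.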

5.2.2. ZsRE Metrics (Object Editing)

For ZsRE, Efficacy, Generalization, and Specificity are used.

  • Efficacy (↑): Same as Efficacy for ReEditBench, but for object editing, where the relation is fixed and the object changes. Measures top-1 accuracy on edit samples $(s_i, r_i)$ predicting $o_i$.
    $$ \mathbb{E}_i \left\{ o_i = \arg\max_{o} \mathbb{P}_{f_\theta}(o \mid (s_i, r_i)) \right\} $$
    Where $o_i$ is the new target object for the $i$-th edit.
  • Generalization (↑): Same as Generalization for ReEditBench; measures top-1 accuracy on equivalent prompts $N((s_i, r_i))$ (rephrased statements) for the new knowledge.
    $$ \mathbb{E}_i \left\{ o_i = \arg\max_{o} \mathbb{P}_{f_\theta}(o \mid N((s_i, r_i))) \right\} $$
    Where $N((s_i, r_i))$ are paraphrased versions of the original subject-relation pair for which the model should now predict $o_i$.
  • Specificity (↑): Ensures editing does not affect unrelated samples $O((s_i, r_i))$ (other facts). Evaluated by the top-1 accuracy of predictions that remain unchanged.
    $$ \mathbb{E}_i \left\{ o_i^c = \arg\max_{o} \mathbb{P}_{f_\theta}(o \mid O((s_i, r_i))) \right\} $$
    Where $o_i^c$ is the correct original object for unrelated queries $O((s_i, r_i))$. A high score means minimal side effects.

5.2.3. CounterFact Metrics (Object Editing)

For CounterFact, Efficacy, Generalization, and Specificity are used (same as ZsRE), plus Fluency and Consistency.

  • Fluency (↑, generation entropy): Checks for excessive repetition in model outputs. Calculated using the entropy of n-gram distributions (2-gram and 3-gram).
    $$ -\frac{2}{3} \sum_k g_2(k) \log_2 g_2(k) + \frac{4}{3} \sum_k g_3(k) \log_2 g_3(k) $$
    Where:
    • $g_2(k)$: The probability of bigram $k$.
    • $g_3(k)$: The probability of trigram $k$.

    Higher entropy (higher score) indicates more diverse and fluent generation.
  • Consistency (↑, reference score): Evaluates the consistency of the model's outputs by computing the cosine similarity between the TF-IDF vectors of the model-generated text and a reference Wikipedia text. Higher cosine similarity indicates better consistency with factual references.
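
A weighted n-gram entropy of this kind can be sketched in plain Python. The sketch assumes that both terms are meant as Shannon entropies of the bigram and trigram distributions, weighted by 2/3 and 4/3; whitespace tokenization and the function names are further simplifying assumptions, and the two sentences are invented.

```python
import math
from collections import Counter

def ngram_entropy(tokens, n):
    """Shannon entropy (base 2) of the empirical n-gram distribution."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())

def fluency(tokens):
    """Weighted sum of bigram and trigram entropies (weights 2/3 and 4/3)."""
    return (2 / 3) * ngram_entropy(tokens, 2) + (4 / 3) * ngram_entropy(tokens, 3)

repetitive = "the cat the cat the cat the cat".split()
varied = "the quick brown fox jumps over the lazy dog".split()
print(fluency(repetitive) < fluency(varied))  # True
```

Repetitive text concentrates probability mass on a few n-grams and therefore scores lower, which is the degenerate behavior this metric is meant to detect.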

5.3. Base LLMs & Baseline Methods

5.3.1. Base LLMs

Experiments are conducted on three representative LLMs:

  • LLaMA3 (8B) (Meta, 2024)
  • GPT-J (6B) (Wang & Komatsuzaki, 2021)
  • GPT2-XL (1.5B) (Radford et al., 2019)

5.3.2. Baseline Methods (Appendix A.2)

Seven parametric editing methods are compared:

  • ROME (Meng et al., 2022): (Rank-One Model Editing) A locate-then-edit method that applies a rank-one perturbation to a FFN layer, ensuring local rewrite, global preservation.
  • RECT (Gu et al., 2024): (Regularization to the Rescue) Reframes model editing as a low-rank, layer-wise correction problem on k contiguous MLP layers, minimizing a consistency loss.
  • NSE (Jiang et al., 2024): (Neuron-level Sequential Editing) Uses neuron-level intervention via fact-specific scaling vectors and additive bias terms on sparse neuron subsets.
  • Fine-Tuning (FT) (Zhu et al., 2020): Formalizes knowledge editing as constrained fine-tuning of a minimal parameter subset (up- and down-projection matrices of a single MLP layer) with L2 proximity regularization.
  • MEMIT (Meng et al., 2023): (Mass-Editing Memory in a Transformer) Scales causal model editing to thousands of facts by applying rank-one updates to multiple MLP layers simultaneously.
  • PRUNE (Ma et al., 2025): (Perturbation-Restrained Sequential Model Editing) Treats model editing as parameter-efficient subspace pruning within MLP blocks, training a low-rank adapter on the pruned subspace.
  • AlphaEdit (Fang et al., 2025): (Null-space constrained model editing) Augments the locate-then-edit pipeline with a null-space projection to prevent updates from disturbing previously stored knowledge, guaranteeing non-interference.

5.4. Experimental Details (Appendix A.4)

5.4.1. Model Configuration Parameters

The following are the results from Table 4 of the original paper:

| Parameter | Value | Description |
|---|---|---|
| model_name | EleutherAI_gpt-j-6B, gpt2-xl, Llama3-8B | Specifies the pretrained language model. |
| layers | [3-8], [13-17], [4-8] | The target Transformer layers for editing. |
| v_num_grad_steps | 25 or 20 | Number of gradient steps for value vector computation. |
| vlr | 5e-1 or 1e-1 | Learning rate used during value vector computation. |
| v_loss_layer | 27, 47, 31 | The specific model layer used to compute the edit loss. |
| kl_factor | 0.0625 | Weight of the KL-divergence regularization term. |
| mom2_dataset | wikipedia | Dataset for computing second-moment statistics. |
| rewrite_module_tmp | Varies by model | Template for the path to the module being rewritten. |

5.4.2. Key Hyperparameters for the SPaEdit and FE Strategies

  • Forgetting Interpolation Factor ($\gamma$): For the FE strategy (Eqn. 5). A higher $\gamma$ enforces more thorough forgetting.
    • Set to 0.4 for GPT-J-6B.
    • Set to 0.6 for both LLaMA3-8B and GPT2-XL.
  • Update Regularization Coefficients ($\alpha$ and $\beta$): For the SPaEdit objective function (Eqn. 7).
    • $\alpha$: Constrains the overall magnitude of the update. Set to 10.
    • $\beta$: Minimizes the impact on preserved knowledge keys $\mathbf{K}_p$. Set to 1.
  • Self-Paced Learning Curriculum Parameters ($\lambda_0$, $\mu$, $T$): For SPaEdit (Algorithm 1).
    • $\lambda_0$ (Initial Pace Parameter): Initial difficulty threshold. Set to 10.
    • $\mu$ (Pace Growth Factor): Multiplicative factor for the $\lambda$ increase per iteration. Set to 1.1.
    • $T$ (Max Iterations): Total number of iterations. Set to 20.
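
To get a feel for these curriculum settings, the threshold schedule can be tabulated directly. The sketch assumes, following Algorithm 1, that iteration $t$ selects samples with the threshold $\lambda_0 \mu^{t-1}$ and multiplies it by $\mu$ afterwards; the variable names are illustrative.

```python
lam0, mu, T = 10.0, 1.1, 20

# Threshold applied in the selection step of iterations 1..T.
thresholds = [lam0 * mu ** t for t in range(T)]
print(round(thresholds[0], 2), round(thresholds[-1], 2))  # 10.0 61.16
```

With these values the admissible per-sample loss grows roughly sixfold over the full 20 iterations, so samples whose squared residual exceeds about 61 would never be selected unless earlier updates reduce their loss.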

6. Results & Analysis

6.1. Core Results Analysis

6.1.1. Results Directly With Object Editing

Initial evaluation of existing object editing methods on the ReEditBench dataset revealed two critical issues, as shown in Figure 1.

Figure 1: Analysis of key challenges in relation editing. (a) The bar chart compares editing efficacy (blue) with Retention of the original fact (pink), showing that old knowledge persists. (b) The scatter plot shows a strong negative correlation between sample difficulty and Efficacy rate, indicating performance decay on challenging samples.

  • Persistent Retention (Figure 1a): Object editing methods achieve high success rates in acquiring new knowledge (blue bars) but concurrently retain the original, conflicting knowledge at exceptionally high rates (pink bars). For instance, AlphaEdit on GPT-J shows a success rate of ~99% with a retention rate of ~98%. This indicates that these methods perform an additive operation rather than a corrective overwrite, leading to the problematic coexistence of new and old knowledge.

  • Failure on Hard Samples (Figure 1b): A strong negative correlation exists between the editing success rate and sample difficulty, measured by the magnitude of the initial residual $\| \pmb{v}_i - \mathbf{W} \mathbf{\tilde{k}}_i' \|_2^2$. "Easy samples" (blue) cluster in a high-success region, while "hard samples" (pink) fall into a low-success region. This highlights a consistent failure on high-difficulty editing samples.

    These findings underscore that existing object editing methods are ill-suited for relation editing because they fail to erase outdated information and lack efficacy for challenging edits.
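
The difficulty measure behind Figure 1b is the squared norm of the initial residual between a sample's target value and the current weights applied to its key. A minimal sketch of that computation follows, using plain Python lists and toy numbers. The function name `residual_difficulty` and all values are hypothetical and only illustrate the formula.

```python
def residual_difficulty(W: list[list[float]], k: list[float], v: list[float]) -> float:
    """Squared L2 norm of v - W k for one sample."""
    predicted = [sum(w_ij * k_j for w_ij, k_j in zip(row, k)) for row in W]
    return sum((v_i - p_i) ** 2 for v_i, p_i in zip(v, predicted))


# Toy 2x2 weight matrix (identity) and two hypothetical target values for the same key.
W = [[1.0, 0.0], [0.0, 1.0]]
easy = residual_difficulty(W, k=[1.0, 2.0], v=[1.1, 2.0])   # target close to W k
hard = residual_difficulty(W, k=[1.0, 2.0], v=[4.0, -2.0])  # target far from W k
print(easy, hard)
```

A sample whose target already lies near the model's current output has a small residual and counts as easy; a distant target yields a large residual and counts as hard.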

6.1.2. Efficacy of the Forgetting-and-Editing Strategy on Relation Editing

The Forgetting-and-Editing (FE) strategy aims to first forget the old tuple and then incorporate new knowledge. Experiments were conducted using a sequential editing setting with 2000 samples, edited in batches of 100.

The following are the results from Table 2 of the original paper:

| LLM | Method | Success↑ (Original) | Success↑ (+FE) | Retention↓ (Original) | Retention↓ (+FE) | Efficacy↑ (Original) | Efficacy↑ (+FE) | Generalization↑ (Original) | Generalization↑ (+FE) |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA3 | MEMIT | 33.77 | 68.26 (+34.49) | 51.70 | 58.82 (-7.12) | 48.43 | 70.93 (+22.50) | 49.09 | 67.00 (+17.91) |
| LLaMA3 | RECT | 59.41 | 66.83 (+7.42) | 72.78 | 59.45 (+13.33) | 66.78 | 69.70 (+2.92) | 54.63 | 58.96 (+4.33) |
| LLaMA3 | NSE | 43.20 | 54.30 (+11.10) | 53.73 | 52.24 (+1.49) | 45.00 | 58.53 (+13.53) | 59.26 | 58.55 (-0.71) |
| LLaMA3 | ROME | 31.39 | 44.91 (+13.52) | 60.47 | 56.36 (+4.11) | 50.91 | 56.64 (+5.73) | 50.93 | 56.80 (+5.87) |
| LLaMA3 | FT | 48.88 | 63.45 (+14.57) | 64.49 | 63.57 (+0.92) | 49.96 | 71.01 (+21.05) | 69.16 | 67.31 (-1.85) |
| LLaMA3 | PRUNE | 29.40 | 29.81 (+0.41) | 44.68 | 30.46 (+14.22) | 44.04 | 34.25 (-9.79) | 43.86 | 42.97 (-0.89) |
| LLaMA3 | AlphaEdit | 52.18 | 78.46 (+26.28) | 78.34 | 67.12 (+11.22) | 79.17 | 83.24 (+4.07) | 76.62 | 80.03 (+3.41) |
| LLaMA3 | SPaEdit (Ours) | 54.45 | 81.71 (+27.26) | 68.56 | 62.77 (+5.79) | 83.23 | 87.37 (+4.14) | 75.88 | 81.14 (+5.26) |
| GPT2-XL | MEMIT | 56.31 | 57.79 (+1.48) | 80.26 | 57.21 (+23.05) | 85.23 | 84.67 (-0.56) | 80.68 | 85.21 (+4.51) |
| GPT2-XL | RECT | 54.60 | 54.72 (+0.12) | 78.10 | 61.62 (+16.48) | 82.35 | 84.08 (+1.73) | 78.37 | 77.12 (-1.25) |
| GPT2-XL | NSE | 45.00 | 45.45 (+0.45) | 58.53 | 58.24 (+0.29) | 59.26 | 59.99 (+0.73) | 58.55 | 59.43 (+0.88) |
| GPT2-XL | ROME | 45.74 | 45.82 (+0.08) | 61.71 | 61.49 (+0.22) | 61.70 | 61.39 (-0.31) | 61.19 | 61.78 (+0.59) |
| GPT2-XL | FT | 49.96 | 51.32 (+1.36) | 71.01 | 67.25 (+3.76) | 69.16 | 69.93 (+0.77) | 67.31 | 67.58 (+0.27) |
| GPT2-XL | PRUNE | 37.88 | 38.04 (+0.16) | 52.62 | 39.14 (+13.48) | 54.49 | 55.71 (+1.22) | 52.99 | 52.60 (-0.39) |
| GPT2-XL | AlphaEdit | 65.31 | 75.93 (+10.62) | 91.31 | 50.46 (+40.85) | 86.83 | 87.36 (+0.53) | 84.51 | 85.50 (+0.99) |
| GPT2-XL | SPaEdit (Ours) | 62.00 | 83.93 (+21.93) | 68.55 | 48.78 (+19.77) | 85.93 | 88.46 (+2.53) | 87.36 | 87.50 (+0.14) |
| GPT-J | MEMIT | 72.55 | 82.36 (+9.81) | 92.98 | 77.63 (+5.09) | 82.12 | 82.42 (+0.30) | 84.69 | 82.10 (+0.20) |
| GPT-J | RECT | 72.54 | 77.63 (+5.09) | 91.67 | 74.54 (+17.13) | 81.90 | 82.42 (+0.30) | 84.89 | 82.10 (+0.20) |
| GPT-J | NSE | 45.65 | 45.95 (+0.30) | 62.13 | 61.12 (+1.01) | 62.03 | 60.94 (-1.09) | 61.52 | 61.63 (+0.11) |
| GPT-J | ROME | 46.38 | 47.79 (+1.41) | 63.34 | 29.27 (+34.07) | 63.32 | 61.49 (-1.83) | 63.24 | 63.78 (+0.54) |
| GPT-J | FT | 51.19 | 61.10 (+9.91) | 66.24 | 43.50 (+22.74) | 70.79 | 78.72 (+7.97) | 67.31 | 68.67 (+1.34) |
| GPT-J | PRUNE | 55.71 | 63.05 (+7.34) | 79.12 | 59.87 (+19.25) | 77.25 | 77.00 (-0.25) | 75.41 | 76.62 (-1.21) |
| GPT-J | AlphaEdit | 65.99 | 89.98 (+23.99) | 98.20 | 63.84 (+34.36) | 85.53 | 85.64 (+0.11) | 86.87 | 87.80 (+0.93) |
| GPT-J | SPaEdit (Ours) | 78.46 | 91.02 (+12.56) | 88.24 | 59.84 (+28.40) | 75.93 | 88.08 (+12.15) | 87.36 | 88.58 (+1.22) |

Analysis:

  • Significant Improvement with FE: The FE strategy raises Success for every method on every LLM and lowers Retention in all but one case (MEMIT on LLaMA3, where Retention rises by 7.12 points). The largest gains are a 34.49-point increase in Success (MEMIT on LLaMA3) and a 40.85-point reduction in Retention (AlphaEdit on GPT2-XL). Efficacy and Generalization mostly improve as well, with a few small regressions. This indicates that FE genuinely helps in replacing old knowledge, not just adding new.
  • SPaEdit's Superiority: When combined with FE, SPaEdit yields the highest Success on every model: SPaEdit+FE achieves 81.71% on LLaMA3, 83.93% on GPT2-XL, and 91.02% on GPT-J, often with lower Retention rates than other FE-enhanced methods.
  • Addressing Misleading Baselines: The high Retention of some baselines without FE (e.g., 98.20% for AlphaEdit on GPT-J) is misleading, as it stems from low editing success that doesn't challenge the original knowledge. FE achieves high editing success while effectively forgetting outdated facts.
  • Residual Retention: Despite improvements, Retention remains non-trivial (often around 50% in difficult settings), suggesting that completely clean forgetting is an ongoing challenge.
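
The parenthesized deltas in Table 2 appear to be signed so that a positive value always means an improvement: for the up-arrow metrics this is the +FE value minus the original, while for Retention it is the original minus the +FE value. This reading is inferred from the reported numbers and is not stated explicitly in the section. The helper names below are hypothetical.

```python
def gain_higher_is_better(original: float, with_fe: float) -> float:
    """Improvement for metrics marked with an up arrow (Success, Efficacy, Generalization)."""
    return round(with_fe - original, 2)


def gain_lower_is_better(original: float, with_fe: float) -> float:
    """Improvement for Retention, where a drop counts as a gain."""
    return round(original - with_fe, 2)


# (Original, +FE) pairs copied from Table 2.
success_gain = gain_higher_is_better(33.77, 68.26)      # MEMIT on LLaMA3
retention_gain = gain_lower_is_better(91.31, 50.46)     # AlphaEdit on GPT2-XL
retention_regress = gain_lower_is_better(51.70, 58.82)  # MEMIT on LLaMA3
print(success_gain, retention_gain, retention_regress)
```

The three results reproduce the table's (+34.49), (+40.85), and (-7.12) entries, which supports the sign convention assumed here.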

6.1.3. Analysis of the Forgetting Strategy

Figure 4 presents an empirical comparison of four unlearning strategies.

Figure 4: The figure shows two rows of charts. The top row compares the success rate (Suc) and retention rate (Ret) of different forgetting strategies (No-Forget, Forget-IDK, Forget-RND, Ours) across three LLMs (LLaMA3, GPT2-XL, GPT-J). The bottom row displays sensitivity analysis for the interpolation factor lambda (λ), showing how success and retention rates change as λ varies, for AlphaEdit on GPT-J.

Analysis of Unlearning Strategies (Top Row of Figure 4):

  • The results clearly validate the theoretical analysis from Section 3.1.
  • Conventional unlearning strategies (Forget-IDK and Forget-RND), which set targets to "I don't know" or to random values, are largely ineffective at reducing knowledge retention. For instance, on GPT-J, these approaches result in retention rates as high as 77.2% and 77.9%, respectively. This is consistent with the claim that their inherent systematic biases impede effective forgetting.
  • The proposed FE strategy (labeled Ours), which interpolates the value vector of the outdated fact towards a neutral state, performs exceptionally well. It achieves the best trade-off between success and retention rates across all tested LLMs (LLaMA3, GPT2-XL, and GPT-J). It consistently achieves the lowest Retention rate while maintaining high Success rates.

Sensitivity Analysis on Hyperparameter $\gamma$ (Interpolation Factor) (Bottom Row of Figure 4):

  • The sensitivity analysis for the interpolation factor $\gamma$ (denoted as lambda in the figure) reveals a clear trade-off between forgetting and learning.
  • A larger $\gamma$ leads to more effective forgetting (monotonic decrease in Retention rate).
  • However, the Success rate follows a concave trajectory, increasing initially and then decreasing as $\gamma$ grows further.
  • An optimal window of $\gamma \in [0.3, 0.7]$ is identified, where the Success rate is maximized without significant compromise in forgetting. This wide effective range highlights the robustness of the FE strategy to hyperparameter tuning.
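
To make the role of the interpolation factor concrete, here is a minimal sketch of an interpolated forgetting target. Eqn. 5 is not reproduced in this section, so the linear form used below, the notion of a "neutral" vector, and the function name `forgetting_target` are all assumptions made for illustration, not the paper's exact formulation.

```python
def forgetting_target(v_old: list[float], v_neutral: list[float], gamma: float) -> list[float]:
    """Move the outdated value vector toward a neutral vector by a fraction gamma.

    Assumed form: (1 - gamma) * v_old + gamma * v_neutral.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [(1.0 - gamma) * o + gamma * n for o, n in zip(v_old, v_neutral)]


# Toy vectors; gamma = 0.4 and 0.6 are the settings reported for GPT-J-6B and LLaMA3-8B/GPT2-XL.
v_old, v_neutral = [2.0, -1.0], [0.0, 0.0]
print(forgetting_target(v_old, v_neutral, 0.4))
print(forgetting_target(v_old, v_neutral, 0.6))
```

With this form, a factor of 0 leaves the old value untouched and a factor of 1 replaces it entirely with the neutral state, which matches the qualitative trade-off described above.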

6.1.4. Generalization and Performance on Object Editing Benchmarks

To assess SPaEdit's universality and generalization, it was applied to object-editing benchmarks, focusing on hard subsets of ZsRE and CounterFact (100 examples each).

The following are the results from Table 3 of the original paper:

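| LLM | Method | Efficacy↑ | Generalization↑ | Specificity↑ |
|---|---|---|---|---|
| LLaMA3 | ROME | 31.87 | 32.4 | 32.26 |
| LLaMA3 | MEMIT | 86.07 | 82.39 | 33.33 |
| LLaMA3 | AlphaEdit | 81.87 | 78.11 | 33.03 |
| LLaMA3 | SPaEdit | 92.32 | 82.6 | 32.11 |
| GPT2-XL | ROME | 15.87 | 16.98 | 7.74 |
| GPT2-XL | MEMIT | 71.47 | 63.14 | 7.37 |
| GPT2-XL | AlphaEdit | 92.17 | 82.68 | 7.72 |
| GPT2-XL | SPaEdit | 98.96 | 89.89 | 7.23 |
| GPT-J | ROME | 23.69 | 27.9 | 24.12 |
| GPT-J | MEMIT | 94.86 | 90.02 | 28.22 |
| GPT-J | AlphaEdit | 96.26 | 90.46 | 28.15 |
| GPT-J | SPaEdit | 99.97 | 91.3 | 28.61 |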

Results on ZsRE (Table 3):

  • SPaEdit achieves the best Efficacy and Generalization on the ZsRE hard subset for all three models.
  • Its lead in Efficacy is particularly notable: 92.32% on LLaMA3 (versus 81.87% for AlphaEdit and 86.07% for MEMIT) and a near-perfect 99.97% on GPT-J.
  • Its largest Generalization margin is on GPT2-XL (89.89% versus 82.68% for AlphaEdit). Specificity is mixed: SPaEdit leads on GPT-J (28.61%) but trails the best baseline slightly on LLaMA3 and GPT2-XL.
  • The hard sample subset poses a considerable challenge, causing performance degradation for strong methods like AlphaEdit. SPaEdit maintains superior performance thanks to its staged learning process, which avoids the optimization pitfalls of resolving all high-residual errors simultaneously.

Results on CounterFact Hard Subset (Appendix C.2, Table 6): The following are the results from Table 6 of the original paper:

| LLM | Method | Efficacy↑ | Generalization↑ | Specificity↑ | Fluency↑ | Consistency↑ |
|---|---|---|---|---|---|---|
| LLaMA3 | ROME | 32.02 | 33.41 | 34.31 | 425.55 | 13.01 |
| LLaMA3 | MEMIT | 69.22 | 65.61 | 30.54 | 629.68 | 53.15 |
| LLaMA3 | AlphaEdit | 79.21 | 73.54 | 30.92 | 629.91 | 56.67 |
| LLaMA3 | SPaEdit (Ours) | 92.80 | 95.21 | 42.51 | 631.11 | 56.78 |
| GPT2-XL | ROME | 39.42 | 30.01 | 5.82 | 592.64 | 65.09 |
| GPT2-XL | MEMIT | 70.45 | 72.98 | 7.93 | 465.78 | 53.58 |
| GPT2-XL | AlphaEdit | 83.22 | 83.91 | 8.54 | 621.76 | 55.62 |
| GPT2-XL | SPaEdit (Ours) | 92.66 | 94.82 | 9.62 | 629.26 | 54.52 |
| GPT-J | ROME | 32.05 | 37.01 | 25.76 | 514.82 | 15.64 |
| GPT-J | MEMIT | 79.22 | 78.27 | 27.58 | 618.93 | 57.84 |
| GPT-J | AlphaEdit | 87.52 | 86.13 | 28.76 | 621.80 | 59.28 |
| GPT-J | SPaEdit (Ours) | 92.77 | 93.12 | 38.73 | 622.52 | 59.66 |

  • SPaEdit achieves Efficacy above 92% on all three models on the CounterFact hard subset, ahead of AlphaEdit by roughly 5 to 14 points.
  • It records the highest Fluency on every model (e.g., 631.11 on LLaMA3), indicating higher-quality, more natural language generation post-edit.
  • It also leads in Generalization and Specificity on every model. Consistency is the one exception: SPaEdit is best on LLaMA3 and GPT-J but not on GPT2-XL, where ROME scores higher (65.09 versus 54.52).

Analysis of Sample Difficulty Distribution (Appendix C.2, Figure 7):

Figure 7: The image presents two difficulty distribution charts for selected hard subsets. (a) The ZsRE hard subset has a varied difficulty distribution. (b) The CounterFact hard subset is heavily concentrated in the high-difficulty region.

  • The evaluation focuses on curated hard subsets because full benchmarks are often dominated by simple samples.
  • ZsRE hard subset (Figure 7a) shows a mixed difficulty distribution.
  • CounterFact hard subset (Figure 7b) is more extreme, with nearly all samples concentrated in the high-difficulty range, serving as a stress test.
  • This challenge-focused evaluation highlights SPaEdit's advantage: its self-paced, easy-to-hard curriculum can intelligently identify easier samples even within a difficult set to start optimization, leading to robust updates for challenging edits where one-shot methods fail.

Full Benchmark Performance and Saturation Analysis (Appendix C.3, Table 7): The following are the results from Table 7 of the original paper:

| LLM | Method | CounterFact Eff.↑ | CounterFact Gen.↑ | CounterFact Spe.↑ | CounterFact Flu.↑ | CounterFact Consis.↑ | ZsRE Eff.↑ | ZsRE Gen.↑ | ZsRE Spe.↑ |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA3 | ROME | 64.40 | 61.42 | 49.44 | 449.06 | 3.31 | 2.01 | 1.80 | 0.69 |
| LLaMA3 | MEMIT | 65.65 | 64.65 | 51.56 | 437.43 | 6.58 | 34.62 | 31.28 | 18.49 |
| LLaMA3 | AlphaEdit | 98.90 | 94.22 | 67.88 | 622.49 | 32.40 | 94.47 | 91.13 | 32.55 |
| LLaMA3 | SPaEdit (Ours) | 99.24 | 94.62 | 69.37 | 624.69 | 33.73 | 95.72 | 93.07 | 33.25 |
| GPT2-XL | ROME | 54.60 | 51.18 | 52.68 | 366.13 | 0.72 | 47.50 | 43.56 | 14.27 |
| GPT2-XL | MEMIT | 94.70 | 85.82 | 60.50 | 477.26 | 22.72 | 79.17 | 71.44 | 26.42 |
| GPT2-XL | AlphaEdit | 99.50 | 93.95 | 66.39 | 597.88 | 39.38 | 94.81 | 86.11 | 25.88 |
| GPT2-XL | SPaEdit (Ours) | 99.65 | 94.78 | 67.83 | 599.52 | 40.23 | 95.92 | 87.63 | 27.25 |
| GPT-J | ROME | 57.50 | 54.20 | 52.05 | 589.42 | 3.22 | 56.42 | 54.65 | 9.86 |
| GPT-J | MEMIT | 98.55 | 95.50 | 63.64 | 546.28 | 34.89 | 94.91 | 90.22 | 30.39 |
| GPT-J | AlphaEdit | 99.75 | 96.38 | 75.48 | 618.50 | 42.08 | 99.79 | 96.00 | 28.29 |
| GPT-J | SPaEdit (Ours) | 99.82 | 96.82 | 76.23 | 620.35 | 44.33 | 99.83 | 97.12 | 30.47 |

  • Existing state-of-the-art methods have achieved near-saturation performance on the "easy" and "medium" portions of the CounterFact and ZsRE datasets. The primary failure mode for current technology is in the "hard" tail.
  • SPaEdit not only dominates on the hard subsets but also consistently achieves the best performance across the full benchmarks, demonstrating robustness where it matters most.

6.2. Mechanistic Insight Into SPaEdit

Figure 5 provides insight into SPaEdit's internal curriculum dynamics and cost-benefit profile.

Figure 5: (a) shows easy-to-hard self-paced curriculum dynamics. (b) shows the cost-benefit tradeoff: modest extra time yields large efficacy gains on hard samples.

  • Curriculum Dynamics (Figure 5a): The plot traces how the sample-difficulty distribution evolves under self-paced learning.
    • At the start ($t=1$), most of the mass sits in the hard (right) region.
    • As training progresses ($t=4$, $t=7$), proficiency increases, and the mass shifts from the hard (right) region to the easy (left) region.
    • By later iterations ($t=13$), the mass is concentrated in the easy (left) region, meaning most samples have become easy. This progression demonstrates the effectiveness of the parameter updates.
  • Cost-Benefit Analysis (Figure 5b):
    • On tasks with a low proportion of hard samples, SPaEdit incurs negligible overhead, matching baselines while achieving superior efficacy.
    • As task difficulty increases, SPaEdit strategically invests modest additional computation time, yielding a substantial gain in editing success compared to baselines whose performance degrades sharply. This favorable trade-off demonstrates SPaEdit's efficient resource allocation and robustness.
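
The easy-to-hard dynamics described above can be mimicked with a toy loop: at each iteration only samples whose residual falls below the current pace threshold are edited, and the threshold then grows. This is a sketch under strong assumptions. The real SPaEdit update is a closed-form, projection-constrained solve that is not reproduced here, so the `shrink` factor is a stand-in for it, and the function name and residual values are hypothetical.

```python
def self_paced_rounds(residuals: list[float], lambda_0: float, mu: float,
                      max_iters: int, shrink: float = 0.5) -> tuple[list[int], list[float]]:
    """Toy curriculum: select samples with residual <= threshold, then raise the threshold."""
    residuals = list(residuals)
    lam = lambda_0
    selected_counts = []
    for _ in range(max_iters):
        selected = [i for i, r in enumerate(residuals) if r <= lam]
        # Stand-in for the real parameter update: selected samples become easier.
        for i in selected:
            residuals[i] *= shrink
        selected_counts.append(len(selected))
        lam *= mu  # the pace parameter grows multiplicatively
    return selected_counts, residuals


# Hypothetical initial difficulties; lambda_0, mu and T follow the reported settings.
counts, final = self_paced_rounds([2.0, 8.0, 15.0, 40.0, 55.0], 10.0, 1.1, 20)
print(counts)
```

In this toy run the number of selected samples never decreases, and the hardest samples only join the update in the last few iterations.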

6.3. Ablation Studies / Parameter Analysis

6.3.1. Comprehensive Ablation Study on Forgetting Strategies (Appendix C.1)

The following are the results from Table 5 of the original paper:

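| LLM | Method | No-Forgetting Retention↓ | No-Forgetting Efficacy↑ | + FE (IDK) Retention↓ | + FE (IDK) Efficacy↑ | + FE (Random) Retention↓ | + FE (Random) Efficacy↑ | + FE (Ours) Retention↓ | + FE (Ours) Efficacy↑ |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA3 | AlphaEdit | 88.34 | 89.17 | 76.11 | 75.23 | 76.90 | 78.19 | 74.50 | 83.24 |
| LLaMA3 | SPaEdit | 88.56 | 83.23 | 75.92 | 83.48 | 70.41 | 82.17 | 68.56 | 87.37 |
| GPT2-XL | AlphaEdit | 91.31 | 88.83 | 60.25 | 83.45 | 65.81 | 84.90 | 50.46 | 87.36 |
| GPT2-XL | SPaEdit | 68.55 | 85.93 | 55.18 | 80.15 | 61.33 | 81.82 | 48.78 | 88.46 |
| GPT-J | AlphaEdit | 98.20 | 99.53 | 81.67 | 89.12 | 85.43 | 81.30 | 77.84 | 85.64 |
| GPT-J | SPaEdit | 88.24 | 85.93 | 65.40 | 88.31 | 72.88 | 89.04 | 59.84 | 88.08 |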

Analysis:

  • Naive Strategies (IDK, Random) Unfavorable Trade-off: These strategies lower Retention compared to No-Forgetting but often at a cost of Efficacy degradation. For SPaEdit on GPT2-XL, Efficacy drops from 85.93% to 80.15% with IDK. This shows a difficult trade-off between forgetting old facts and learning new ones.
  • Our FE Strategy is Most Effective at Unlearning: The proposed FE strategy (+ FE (Ours)) consistently achieves the lowest Retention rate across every model and for both AlphaEdit and SPaEdit. For example, SPaEdit's Retention on GPT2-XL is reduced to 48.78%, proving its state-of-the-art capability in erasing outdated knowledge.
  • Synergistic Effect: Our FE strategy creates a synergistic effect, achieving the best unlearning (lowest Retention) while simultaneously maintaining or significantly improving Efficacy. For SPaEdit, the "Ours" strategy boosted Efficacy (e.g., 83.23% to 87.37% on LLaMA3) while achieving the lowest Retention scores. This confirms that the carefully designed forgetting targets facilitate a cleaner, more effective integration of new knowledge.

6.3.2. Impact of Semantic Similarity on Relation Editing (Appendix C.5)

Figure 9 visually investigates the influence of semantic properties on relation editing.

Figure 9: Analysis of Semantic Similarity. (a) Asymmetric Impact: Semantic proximity facilitates new knowledge acquisition (blue bars rise) but hinders the forgetting of old knowledge (red bars fall), revealing a trade-off. (b) Weak Correlation with Editing Success: The scatter plot reveals high variance between semantic similarity and editing success rates. The weak correlation (Pearson $|r| \approx 0.3$) indicates that semantic similarity acts as a noisy predictor, failing to capture the full complexity of editing difficulty compared to the robust signal provided by computational residuals.

  • Asymmetric Impact of Semantic Similarity (Figure 9a):
    • Editing Success (Blue Bars): Shows a strong positive correlation with semantic similarity. As relations become semantically closer (e.g., CEO to CTO), the editing success rate rises sharply from 45.2% to 95.1%. This suggests models leverage existing semantic structures.
    • Forgetting Success (Red Bars): Exhibits a clear negative trend. Forgetting is significantly harder for semantically close relations (30.7%) compared to distant ones (65.8%). High semantic proximity causes strong interference, making it difficult to disentangle old from new knowledge.
  • Justification for Computational Residual (Figure 9b):
    • The scatter plot reveals semantic similarity is a noisy predictor of performance, with high dispersion and a weak correlation (Pearson $|r| \approx 0.3$) to editing success.
    • This contrasts with the computational residual (Figure 1b), which shows a strong, distinct negative correlation with success.
    • The computational residual acts as a holistic proxy aggregating all latent influencing factors (semantics, knowledge frequency, structural complexity), providing a direct, quantifiable signal of the actual optimization barrier. Thus, it is a more robust and computationally efficient standard for the curriculum learning than semantic metrics alone.
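
For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation coefficient from scratch. The data points are made-up toy values chosen only to show a tight relationship next to a noisy one. They are not the paper's measurements, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: success falls steadily with residual, but varies erratically with similarity.
residual = [1.0, 2.0, 3.0, 4.0, 5.0]
success_vs_residual = [0.95, 0.85, 0.70, 0.50, 0.35]
similarity = [0.1, 0.3, 0.5, 0.7, 0.9]
success_vs_similarity = [0.6, 0.4, 0.9, 0.3, 0.7]
print(pearson_r(residual, success_vs_residual))
print(pearson_r(similarity, success_vs_similarity))
```

A predictor is useful for ordering a curriculum when the magnitude of this coefficient is close to 1, which is the argument made above in favor of the residual.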

6.4. General Capability Tests (Appendix C.4)

To evaluate the long-term impact on general capabilities, a sequential editing experiment was conducted on LLaMA3-8B, evaluating performance on six downstream tasks (SST, MRPC, CoLA, RTE, MMLU, NLI) after each batch of edits.

The following are the results from Figure 8 of the original paper:

Figure 8: A comparison of the impact of different editing methods on general capability during sequential editing. Both SPaEdit and AlphaEdit demonstrate exceptional stability, proving the safety of the projection mechanism. The identical stability of SPaEdit confirms that its iterative process does not harm the model's general knowledge.

Analysis:

  • Catastrophic Forgetting in Unconstrained Methods: MEMIT, RECT, and PRUNE show severe performance collapse, confirming that unconstrained, cumulative edits damage general abilities.
  • Stability of Single-Step Projection (AlphaEdit): AlphaEdit's performance curve remains almost perfectly flat, demonstrating that constraining edits to a specific subspace is highly effective at preserving general capabilities.
  • SPaEdit's Stability: SPaEdit's performance curve is virtually identical to AlphaEdit's, providing strong evidence that its iterative optimization process does not degrade general capabilities. Each step within SPaEdit's self-paced curriculum remains within the constrained subspace, finding a more precise solution for the target knowledge without harmful side effects.

6.5. Robustness Analysis Against Superficial Editing Attacks (Appendix C.6)

To assess SPaEdit's robustness beyond standard metrics, it was evaluated against superficial editing attacks (Xie et al., 2025), which use contextual triggers (Wiki, Rep, Que) to elicit original (pre-edit) knowledge.

The following are the results from Table 8 of the original paper:

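| Method | Wiki Attack OM↓ | Wiki Attack OP↓ | Rep Attack OM↓ | Rep Attack OP↓ | Que Attack OM↓ | Que Attack OP↓ |
|---|---|---|---|---|---|---|
| ROME | 54.95 | 58.24 | 61.74 | 64.02 | 38.37 | 38.37 |
| MEMIT | 52.75 | 54.95 | 40.15 | 42.42 | 37.21 | 37.21 |
| PMET | 70.33 | 72.43 | 66.67 | 71.97 | 39.29 | 41.67 |
| r-ROME | 54.95 | 57.14 | 64.39 | 68.18 | 40.48 | 40.48 |
| AlphaEdit | 72.53 | 73.62 | 68.18 | 71.97 | 34.52 | 35.71 |
| SPaEdit+FE (Ours) | 50.81 | 27.23 | 38.52 | 33.84 | 33.19 | 35.11 |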

Metrics:

  • Original Match (OM) (↓): Percentage of times the model's output matches the original (pre-edit) answer. Lower is better.
  • Original Probability (OP) (↓): Percentage of times the model assigns a higher probability to the original answer than to the new answer. Lower is better.
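
The two metrics can be sketched as simple percentages over a set of attacked prompts. The sketch assumes exact string match for OM and a strict probability comparison for OP. The actual matching rule in the benchmark may differ, and the function names and toy records below are hypothetical.

```python
def original_match(outputs: list[str], original_answers: list[str]) -> float:
    """OM: percentage of outputs equal to the original (pre-edit) answer."""
    hits = sum(out.strip() == orig for out, orig in zip(outputs, original_answers))
    return 100.0 * hits / len(outputs)


def original_probability(p_original: list[float], p_new: list[float]) -> float:
    """OP: percentage of cases where the original answer is more probable than the new one."""
    wins = sum(po > pn for po, pn in zip(p_original, p_new))
    return 100.0 * wins / len(p_original)


# Four toy attacked prompts for one edit (old answer "TSR", new answer "Bandai").
om = original_match(["TSR", "Bandai", "TSR", "Bandai"], ["TSR"] * 4)
op = original_probability([0.6, 0.1, 0.3, 0.2], [0.3, 0.8, 0.4, 0.5])
print(om, op)
```

Both numbers should be low for an edit that has truly displaced the old fact, since either one rising means the pre-edit knowledge can still be elicited.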

Analysis:

  • Superficial editing is a significant challenge, with high-performing editors like AlphaEdit showing considerable vulnerability (over 70% OM on Wiki attack).
  • SPaEdit+FE (Ours) achieves the lowest OM and OP scores on all three attack types. For example, on the Wiki attack, SPaEdit+FE reduces OM to 50.81%, a substantial improvement over AlphaEdit (72.53%) and a modest one over MEMIT (52.75%), and it cuts OP to 27.23%, about half of the next-best value (54.95% for MEMIT).
  • This enhanced robustness is attributed to the synergistic interplay of the FE strategy (actively unlearning outdated tuples) and the self-paced curriculum (encouraging deeper integration of new knowledge).

6.6. Stability Analysis (Appendix C.7)

A stability analysis was conducted by repeatedly (100 times) sampling 100 instances from ZsRE and applying SPaEdit and baselines.

The following are the results from Figure 10 of the original paper:

Figure 10: Edit stability analysis on the ZsRE benchmark. The box plot illustrates the distribution of editing success rates over 100 trials, each with 100 randomly sampled edits. SPaEdit demonstrates significantly lower variance and a higher median performance compared to baseline methods, indicating superior robustness.

Analysis:

  • SPaEdit consistently achieves high performance (85% to 95%) with remarkably low variance, indicating high reliability and independence from specific edit samples.
  • AlphaEdit shows wider variance (75% to 90%).
  • MEMIT is more varied (60% to 95%).
  • ROME demonstrates the least stability (10% to 40%), highly sensitive to chosen instances.
  • This confirms SPaEdit's robustness and predictable, consistently high-quality results.
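
The resampling protocol behind this analysis can be sketched as follows. In the real experiment each trial applies an editor to the sampled instances. Here a precomputed list of per-instance success flags stands in for that step, so the pool, its 90% success rate, and the function name `stability_trials` are all hypothetical.

```python
import random
import statistics


def stability_trials(per_instance_success: list[bool], n_trials: int = 100,
                     sample_size: int = 100, seed: int = 0) -> list[float]:
    """Success rate of each trial, where a trial draws sample_size instances without replacement."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        batch = rng.sample(per_instance_success, sample_size)
        rates.append(sum(batch) / sample_size)
    return rates


# Hypothetical pool of 1000 instances, 90% of which would be edited successfully.
pool = [True] * 900 + [False] * 100
rates = stability_trials(pool)
print(f"median={statistics.median(rates):.2f}, stdev={statistics.stdev(rates):.3f}")
```

The spread of `rates` across trials is what the box plot in Figure 10 visualizes: a narrow spread means the outcome depends little on which instances happen to be sampled.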

6.7. Iterative Runtime of SPaEdit (Appendix C.8)

The computational cost per iteration of SPaEdit is dictated by the closed-form update for the perturbation matrix $\Delta \mathbf{P}$, particularly the matrix inversion step (Eqn. 10). Because the selection matrix $\mathbf{Z}$ determines which samples enter the update, the cost depends on how sparse $\mathbf{Z}$ is.

The following are the results from Figure 11 of the original paper:

Figure 11: SPaEdit iteration time analysis. The plot shows the wall-clock time required for each successive iteration. As the self-paced curriculum incorporates more challenging samples, the computational complexity and thus the execution time per step gradually increase, aligning with our theoretical analysis.

Analysis:

  • In the initial iterations, $\lambda$ is small and $\mathbf{Z}$ is sparse (only a few "easy" samples are selected), leading to low computational cost.
  • As training progresses, $\lambda$ increases, $\mathbf{Z}$ becomes denser (more challenging samples are included), and the execution time per iteration gradually increases.
  • Figure 11 empirically confirms this: the wall-clock time per iteration gradually increases as the self-paced curriculum incorporates more challenging samples. This is a deliberate design that allocates more resources only as needed.

6.8. Qualitative Analysis (Appendix C.9)

Qualitative case studies (from Appendix C.9.1, C.9.2, C.9.3) demonstrate the ability to completely and cleanly replace existing factual knowledge.

Case 1: GPT-J editing "Tonel is affiliated with S.C. Beira-Mar" to "Tonel is affiliated with SV Neulengbach".

  • No-Forget (OE): Generates the outdated "S.C. Beira-Mar".
  • Forget-IDK: Outputs "I don't know. Tonel is often linked to S.C. Beira-Mar.", still referencing the old fact.
  • Forget-RND: Generates "Tonel is affiliated with River... and S.C. Beira-Mar... Madrid.", showing corrupted output with old fact.
  • Ours: Confidently generates "Tonel is affiliated with the Brazilian football club SV Neulengbach.", showing clean replacement.

Case 2: LLaMA3 editing "Publisher of Player's Guide to the Dragonlance Campaign was TSR" to "was Bandai".

  • No-Forget (OE): Outputs the outdated "TSR".
  • Forget-IDK: Generates "was I don't TSR.", corrupted with the old fact.
  • Forget-RND: Outputs "was TSR. x y z.", corrupted with the old fact.
  • Ours: Generates "The publisher... was Bandai, Bandai, Bandai, Bandai.", showing clean replacement (with some repetition, common for generative models).

Case 3: GPT-J editing "Krassimira Stoyanova has a soprano voice" to "a mezzo-soprano voice".

  • No-Forget (OE): Outputs the outdated "soprano voice".
  • Forget-IDK: Generates "voice type is I don't know a soprano.", corrupted with the old fact.
  • Forget-RND: Outputs "a soprano. Krassimira Stoyanova's voice type is soprano", failing to forget.
  • Ours: Generates "mezzo-soprano a mezzo-soprano. Krassimira Stoyanova has a mezzo-soprano voice.", showing clean replacement.

    Conclusion: No-Forget methods consistently fail, and conventional unlearning strategies also fail by generating corrupted outputs that still reference the forbidden original answer. Our method is the only one that completely replaces the outdated information with the new target knowledge, achieving genuine and effective knowledge update.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper formally defines Relation Editing as a novel and critical task for updating Large Language Models (LLMs), distinguishing it from the traditional Object Editing paradigm. Through extensive benchmarking, the authors identify two major weaknesses of existing knowledge editing methods when applied to relation editing: the persistent retention of outdated information and poor performance on difficult editing samples.

To address these challenges, the paper introduces two key contributions:

  1. Forgetting-and-Editing (FE) Framework: This novel framework incorporates a theoretically grounded unlearning strategy that moves away from conventional fixed or random targets. Instead, it uses an interpolation-based target assignment to effectively suppress systematic bias, improve edit success, and reduce retention of old knowledge.

  2. Self-paced AlphaEdit (SPaEdit) Algorithm: This algorithm integrates self-paced learning (an easy-to-hard curriculum) into the AlphaEdit framework. SPaEdit systematically learns from easier samples first, gradually incorporating more challenging ones, leading to robust optimization for difficult edits.

    Extensive experiments on the newly compiled ReEditBench dataset confirm that the FE strategy significantly enhances relation editing performance by enabling effective forgetting. Furthermore, SPaEdit not only excels on relation editing tasks but also establishes new state-of-the-art performance on object-editing benchmarks like ZsRE and CounterFact, particularly for hard samples. The research also demonstrates that SPaEdit maintains general capabilities and exhibits superior robustness against superficial editing attacks.

7.2. Limitations & Future Work

The authors acknowledge that despite the significant gains achieved, Retention remains non-trivial in absolute terms, often around 50% in difficult settings. This indicates that fully clean and permanent forgetting of obsolete relations is still an unsolved problem. Therefore, future work is explicitly suggested to develop more effective unlearning mechanisms specifically tailored for relation editing.

7.3. Personal Insights & Critique

This paper makes a highly valuable contribution by formalizing Relation Editing, a practically relevant but previously overlooked aspect of knowledge editing. The distinction between object and relation editing is crucial, as the paper convincingly demonstrates that existing methods designed for the former fail spectacularly on the latter, primarily due to the inability to properly forget old information.

The FE framework with its interpolation-based target smoothing is an elegant solution to the unlearning problem. The theoretical analysis of why IDK or random targets lead to systematic bias is rigorous and provides a strong foundation for their proposed method. This insight into unlearning within linear regression-based editing is particularly impactful and could be highly transferable to other unlearning contexts where parametric updates are used.

The integration of self-paced learning into AlphaEdit (SPaEdit) is also a smart move. The empirical evidence clearly shows its superiority, especially on hard samples, which are the real test of any editing algorithm. The cost-benefit analysis demonstrating modest overhead for significant efficacy gains is compelling. The stability analysis and robustness against superficial editing attacks further solidify SPaEdit's practical value, as these are critical concerns for LLM deployment.

Potential Issues/Areas for Improvement:

  • Absolute Retention Rates: While FE significantly reduces retention, the fact that it can still be around 50% in some hard settings (as noted by the authors) suggests that complete unlearning remains elusive. Future research needs to explore even more aggressive or fundamental ways to disentangle conflicting knowledge.
  • Generalization of $\gamma$: The interpolation factor $\gamma$ is a hyperparameter. Although the sensitivity analysis shows a robust window, its optimal value might vary across different LLMs or relation types. Automating this selection or making it less sensitive could be beneficial.
  • Computational Cost of Iteration: While SPaEdit's self-paced nature makes its cost allocation efficient, the increasing runtime per iteration (Figure 11) could become a bottleneck for extremely large models or massive sequential edits. Further work could explore more computationally efficient iterative solvers or approximations.
  • Complexity of Relation Editing: The paper categorizes relation editing into new relation and conditional relation. More complex relation changes, such as those involving multiple subjects or objects or temporal shifts that necessitate a re-evaluation of multiple related facts, might pose new challenges not fully covered here.

Transferability and Application: The FE framework's unlearning mechanism could be highly transferable to any knowledge editing task where explicit forgetting of old facts is crucial, not just for relations. The self-paced learning approach is also a generalizable technique for improving the robustness of knowledge editing or even other fine-tuning tasks that suffer from difficult samples. This paper sets a new standard for how knowledge editing should be evaluated and performed, pushing the field towards more intelligent and responsible LLM maintenance.
