NeuronMerge: Merging Models via Functional Neuron Groups
TL;DR Summary
NeuronMerge tackles the linearity problem in task arithmetic merging by classifying MLP neurons based on their function. It theoretically proves that functional grouping enhances model linearity, then applies a neuron-based task arithmetic method. This approach consistently improves performance across various tasks and model scales and is complementary to existing merging methods.
Abstract
Model merging techniques like task arithmetic, which combines model parameters through weighted averaging, have proven effective. However, the success of task arithmetic relies on the linearity between model weight differences and output feature changes, which is often lacking in conventional fine-tuned models. In this work, we employ neuron description methods to analyze and classify neurons based on their functionalities. We theoretically demonstrate that grouping Multi-Layer Perceptron (MLP) neurons by functionality enhances model linearity. Building on this, we propose a neuron-based task arithmetic merging method that consistently improves performance across various tasks and model scales. Our approach is complementary to existing merging methods.
In-depth Reading
1. Bibliographic Information
- Title: NeuronMerge: Merging Models via Functional Neuron Groups
- Authors: Wangyun Gu, Qianghua Gao, Lixin Zhang, Xu Shen, Jieping Ye
- Affiliations: Zhejiang University, Alibaba Cloud, Zhejiang Gongshang University
- Journal/Conference: Published in the Findings of the Association for Computational Linguistics (ACL) 2025. ACL is a premier, top-tier conference in the field of Natural Language Processing (NLP), indicating the work is of high quality and significance.
- Publication Year: 2025
- Abstract: The paper addresses a key limitation in model merging techniques like task arithmetic, which perform best when there is a linear relationship between changes in model weights and changes in output features. Since this linearity is often absent in standard fine-tuned models, the authors propose a novel approach. They use neuron description methods to classify Multi-Layer Perceptron (MLP) neurons by their function (e.g., Math, Code). They theoretically prove that grouping neurons this way enhances model linearity. Based on this insight, they introduce NeuronMerge, a merging method that applies task arithmetic to these functional neuron groups independently. The method shows consistent performance improvements across various tasks and models, and is complementary to existing techniques.
- Original Source Link: https://aclanthology.org/2025.findings-acl.471/ (PDF: https://aclanthology.org/2025.findings-acl.471.pdf). The paper is formally published in conference proceedings.
2. Executive Summary
Background & Motivation (Why)
- Core Problem: Combining multiple specialized Large Language Models (LLMs) into a single, multi-talented model without expensive retraining is a major challenge. A popular and efficient technique called task arithmetic achieves this by averaging the parameter differences ("task vectors") between fine-tuned models and a base model.
- The Gap: The success of task arithmetic hinges on a property called linearity—the assumption that changes in a model's weights correspond linearly to changes in its output. However, models fine-tuned with standard methods often do not satisfy this assumption, which limits the effectiveness of merging.
- Prior Art & Remaining Challenge: Recent work like SubModule Linearity improved merging by applying task arithmetic at the level of sub-modules (e.g., entire attention or MLP layers), finding that these smaller components are more linear than the full model. However, it identified the MLP layers as a primary source of residual non-linearity, and simply breaking them down further did not work well. The non-linearity within MLPs remained an open problem.
- Fresh Angle: This paper introduces a fresh perspective by connecting model merging with the field of model interpretability. Inspired by methods that can automatically describe what individual neurons do, the authors propose to go to a finer level of granularity than sub-modules: functional groups of neurons. The core idea is that neurons specialized for a specific function (e.g., math) behave differently from general-purpose neurons, and treating them separately during merging could resolve the non-linearity issue.
Main Contributions / Findings (What)
- Neuron Functionality Analysis: The authors conduct an extensive analysis of neurons in state-of-the-art LLMs (Qwen2.5, Llama-3.1), demonstrating that neurons exhibit a high degree of functional specialization (e.g., General, Math, Code, Translation). Crucially, they find that a neuron's function remains highly consistent even after fine-tuning on a specific task.
- Theoretical Justification: They provide a theoretical proof that grouping MLP neurons by their functionality and merging them separately enhances the overall linearity of the model merging process, which directly explains why their method should be effective.
- NeuronMerge Method: They propose a novel merging framework, NeuronMerge, which first classifies neurons into functional groups. It then applies task arithmetic selectively:
  - General-purpose neurons are merged to combine shared knowledge.
  - Task-specific neurons have their fine-tuning updates discarded, reverting them to their state in the base model to prevent interference between specialized skills.
- Superior Performance: The proposed method consistently outperforms existing state-of-the-art merging techniques (Task Arithmetic, DARE, SubModule Linearity) across multiple model families, sizes, and task combinations. Importantly, NeuronMerge is complementary to these methods, as it can be combined with them to achieve even better results.
3. Prerequisite Knowledge & Related Work
Foundational Concepts
- Large Language Models (LLMs): These are massive neural networks, typically based on the Transformer architecture, trained on vast amounts of text data. A key component of the Transformer is the Multi-Layer Perceptron (MLP), which consists of layers of interconnected computational units called neurons. These neurons introduce non-linearity, allowing the model to learn complex patterns.
- Model Fine-Tuning: The process of taking a pre-trained base LLM and further training it on a smaller, task-specific dataset (e.g., a dataset of math problems) to specialize its abilities.
- Model Merging: A set of techniques for combining the parameters of two or more fine-tuned models to create a single model that possesses the skills of all its "parents," without needing to be trained from scratch on all tasks.
- Task Arithmetic: A simple yet effective model merging technique. It defines a task vector for each fine-tuned model as the difference between its weights and the base model's weights: $\tau_t = \theta_t - \theta_{\text{base}}$. The merged model is created by adding a weighted sum of these task vectors to the base model's weights: $\theta_{\text{merged}} = \theta_{\text{base}} + \sum_t \lambda_t \tau_t$. A minimal code sketch of this rule appears after this list.
- Linearity: In the context of model merging, linearity refers to the property where the change in a model's output is directly proportional to the change in its weights. If a model were perfectly linear, adding 50% of a task vector ($0.5\,\tau_t$) should result in exactly 50% of the performance gain from that task. Real models are non-linear, which complicates this simple arithmetic.
- Neuron Interpretability: A field of research that aims to understand the internal workings of neural networks. Neuron description methods attempt to explain the specific concept or function a single neuron is responsible for by analyzing what kind of inputs cause it to activate strongly.
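As a concrete reference for the task arithmetic definitions above, here is a minimal sketch of the merging rule. The function name and the use of PyTorch state dicts are illustrative assumptions, not the paper's implementation.

```python
import torch

def task_arithmetic_merge(base_state, finetuned_states, coeffs):
    """Combine fine-tuned models via task arithmetic (illustrative sketch).

    base_state: parameter dict of the base model.
    finetuned_states: list of parameter dicts, one per fine-tuned model.
    coeffs: scalar merging weights lambda_t, one per task vector.
    """
    merged = {}
    for name, base_param in base_state.items():
        # Task vector for model t: tau_t = theta_t - theta_base
        update = sum(lam * (ft[name] - base_param)
                     for lam, ft in zip(coeffs, finetuned_states))
        merged[name] = base_param + update
    return merged

# Example usage with hypothetical models:
# merged = task_arithmetic_merge(base.state_dict(),
#                                [math_model.state_dict(), code_model.state_dict()],
#                                coeffs=[0.5, 0.5])
```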
Previous Works
- Task Arithmetic (Ilharco et al., 2023): This foundational work introduced the simple and efficient task vector addition/subtraction method for model editing and merging. NeuronMerge is built directly on this paradigm.
- Linearity in Merging (Ortiz-Jiménez et al., 2023): This work formally established the connection between the linearity of fine-tuned models and the success of task arithmetic. They showed that models that are more linear merge more effectively.
- SubModule Linearity (Dai et al., 2025): This paper was a key predecessor. It found that while a whole model might be non-linear, its individual components (like attention blocks or MLP layers) are more linear. By calculating separate merging weights for each submodule, they achieved state-of-the-art results. However, they noted that MLPs were still a major bottleneck.
- Neuron Description (Bills et al., 2023; Choi et al., 2024): These works developed automated methods to generate natural language explanations for what individual neurons do (e.g., "this neuron activates on Python import statements"). NeuronMerge adapts these ideas not for explanation, but for a functional classification to guide the merging process.
Differentiation
NeuronMerge distinguishes itself from prior work by operating at a novel, finer level of granularity.
- While Task Arithmetic works on the whole model and SubModule Linearity works on entire layers, NeuronMerge works on groups of individual neurons within each MLP layer.
- It is the first work to successfully bridge the gap between neuron-level interpretability and the practical, performance-driven task of model merging. It uses insights about what neurons do to decide how they should be merged.
4. Methodology (Core Technology & Implementation)
The NeuronMerge framework is a multi-step process, combining neuron analysis with a targeted merging strategy. The overall framework is depicted in Figure 5.
Figure 5: The framework of neuron grouping and merging in LLMs. It consists of three main steps: 1) Neuron Grouping Based on Functionality, which classifies neurons into different categories; 2) Merging for MLP Layers, where General neurons are merged by SubModule Linearity while the others are discarded in favor of the base model's parameters; and 3) Merging for Attention and Embedding Layers, where different methods can be used as complementary approaches, including DARE or SubModule Linearity in the main experiments.
Part 1: Neuron Functionality Classification
The first step is to categorize each neuron in the MLP layers based on the type of input that makes it activate most strongly.
- Preliminaries: In modern LLMs, a neuron $i$ in an MLP layer computes an output (activation) based on an input vector $x$. The activation function is typically SwiGLU, and its output can be expressed as:
  $a_i(x) = \mathrm{SiLU}(w_{\text{gate},i}^{\top} x) \cdot (w_{\text{up},i}^{\top} x)$
  Where:
  - $x$ is the input from the residual stream.
  - $w_{\text{gate},i}$ and $w_{\text{up},i}$ are the weight vectors associated with neuron $i$.
  - $\mathrm{SiLU}$ is the Sigmoid Linear Unit activation function, $\mathrm{SiLU}(z) = z \cdot \sigma(z)$.
- Classification Procedure:
  - Define Categories & Corpus: Establish a set of functional categories (e.g., General, Math, Code, Translation). For each category, collect a large number of representative text examples (exemplars).
  - Measure Activation: Pass each exemplar through the base LLM. For each neuron $i$ and each token $x_t$ in the exemplar, record the activation value $a_i(x_t)$.
  - Summarize Sentence-Level Activation: To get a single activation score for the entire exemplar $s$, take the maximum absolute activation value across all its tokens: $A_i(s) = \max_{x_t \in s} |a_i(x_t)|$.
  - Identify Top Exemplars: For each neuron $i$, find the top-$k$ exemplars from the entire corpus that produce the highest activation scores $A_i(s)$. This set is denoted $\mathcal{E}_i$, with a fixed $k$ used in the paper.
  - Assign Category: The functionality of neuron $i$ is determined by the most frequent category among its top-$k$ exemplars. This process is repeated for all neurons in all MLP layers of the base model (a minimal code sketch follows below).
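The classification procedure above can be condensed into a few lines of code. The following is a minimal sketch, assuming the per-token SwiGLU activations for one MLP layer have already been collected from the base model (e.g., with forward hooks); the array shapes, names, and default `k` are illustrative assumptions, not the paper's code.

```python
import numpy as np
from collections import Counter

def classify_neurons(activations, exemplar_labels, valid_mask, k=20):
    """Assign a functional category to each MLP neuron in one layer.

    activations: array [num_exemplars, max_tokens, num_neurons] of neuron
        outputs a_i(x_t) collected from the base model.
    exemplar_labels: list of category strings, one per exemplar
        (e.g. "General", "Math", "Code", "Translation").
    valid_mask: bool array [num_exemplars, max_tokens] marking real tokens
        (padding excluded).
    k: number of top-activating exemplars used for the majority vote.
    """
    # Sentence-level score: max absolute activation over the exemplar's tokens.
    masked = np.where(valid_mask[:, :, None], np.abs(activations), 0.0)
    scores = masked.max(axis=1)                  # [num_exemplars, num_neurons]

    labels = np.asarray(exemplar_labels)
    categories = []
    for i in range(scores.shape[1]):             # loop over neurons
        top_idx = np.argsort(scores[:, i])[-k:]  # top-k exemplars for neuron i
        majority, _ = Counter(labels[top_idx]).most_common(1)[0]
        categories.append(majority)
    return categories                            # one category per neuron
```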
Part 2: Theoretical Link to Linearity
The paper provides a theoretical argument in Appendix B for why this functional grouping improves linearity.
- Source of Non-Linearity: Using a simplified MLP with a ReLU activation function, the authors show that non-linearity primarily arises from neurons changing their activation state (i.e., flipping from off to on, or vice versa) after fine-tuning.
- How Grouping Helps: Neurons that activate very strongly for a specific type of data (e.g., a math neuron on math problems) have large, positive activation values on that data. When a small fine-tuning update is added, the new activation value is very likely to remain positive. This keeps the neuron from changing its activation state, removing a major source of non-linearity and making task arithmetic more effective for that group of neurons. The numeric sketch below illustrates the effect.
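To make the activation-flip argument concrete, here is a tiny numeric illustration (my own example, not from the paper): a strongly active ReLU neuron responds linearly to a scaled weight update, while a neuron near its activation threshold does not, because the update flips its pre-activation sign.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = np.array([1.0, 2.0])                      # fixed input

# Neuron A: strongly active on this input (analogous to a Math neuron on math text).
w_a, dw_a = np.array([3.0, 1.0]), np.array([-0.2, 0.2])
# Neuron B: barely active, so the fine-tuning update flips its pre-activation sign.
w_b, dw_b = np.array([0.3, -0.1]), np.array([-0.5, 0.0])

for name, w, dw in [("strongly active", w_a, dw_a), ("near threshold", w_b, dw_b)]:
    base, full = relu(w @ x), relu((w + dw) @ x)
    half = relu((w + 0.5 * dw) @ x)           # actual output with half the update
    linear = base + 0.5 * (full - base)       # what perfect linearity would predict
    print(f"{name}: actual={half:.3f}  linear prediction={linear:.3f}")
# Strongly active: actual matches the prediction; near threshold: they disagree.
```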
Part 3: The NeuronMerge Algorithm
Based on the neuron classification, the final merging algorithm proceeds as follows (visualized in Figure 1).
Figure 1: Functional neuron groups: grouping neurons according to neuron functionalities such as General, Math, Code and Translation.
- Neuron Grouping: Using the method from Part 1, classify all neurons of the base model in each layer $l$ into groups $\{G^{(l)}_0, G^{(l)}_1, \dots, G^{(l)}_T\}$, where $G^{(l)}_0$ is General and $G^{(l)}_1, \dots, G^{(l)}_T$ are the task-specific categories. These groupings are then applied to all fine-tuned models.
- Merging MLP Layers: The core innovation happens here. For each neuron $i$ in layer $l$, the merged weight is calculated as $w^{\text{merged}}_i = w^{\text{base}}_i + \sum_t \lambda^{(t)}_i \, \tau_{i,t}$, where $\tau_{i,t} = w_{i,t} - w^{\text{base}}_i$ is the task vector for that neuron from fine-tuned model $t$. The merging coefficient $\lambda^{(t)}_i$ is determined by the neuron's group (a code sketch follows this list):
  - For General neurons: Merge them using a standard method like SubModule Linearity (SubLin), which calculates optimal merging weights for the group.
  - For task-specific neurons (Math, Code, etc.): Set their merging coefficient to 0. This effectively discards their task vectors and reverts their weights to those of the base model. This prevents specialized, and potentially conflicting, skills from interfering with each other.
- Merging Attention and Embedding Layers: For all other model parameters (self-attention, embeddings), a standard merging method can be used. The authors experiment with two options:
  - NeuronMerge1: Uses DARE (task arithmetic with dropout) for these layers.
  - NeuronMerge2: Uses SubModule Linearity for these layers.
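The per-neuron merging rule for an MLP weight matrix could look roughly like the sketch below. It assumes the General-group coefficients have already been obtained (e.g., from a SubModule-Linearity-style search) and that rows of the matrix correspond to neurons; the function and argument names are illustrative, not the authors' code.

```python
import torch

def merge_mlp_layer(base_w, finetuned_ws, neuron_categories, general_coeffs):
    """Merge one MLP weight matrix neuron-by-neuron (NeuronMerge-style sketch).

    base_w: [num_neurons, d] weight matrix of the base model (rows = neurons).
    finetuned_ws: list of [num_neurons, d] matrices, one per fine-tuned model.
    neuron_categories: category string per neuron ("General", "Math", ...).
    general_coeffs: merging weights lambda_t applied to the General group.
    """
    merged = base_w.clone()
    is_general = torch.tensor([c == "General" for c in neuron_categories])

    for lam, ft_w in zip(general_coeffs, finetuned_ws):
        tau = ft_w - base_w                         # per-neuron task vectors
        # General neurons: add the weighted task vectors.
        merged[is_general] += lam * tau[is_general]
        # Task-specific neurons: coefficient 0 -> keep base weights (drop updates).
    return merged
```

In a real SwiGLU MLP, the gate and up projections index neurons by row while the down projection indexes them by column, so each matrix would be sliced along the appropriate axis.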
5. Experimental Setup
- Datasets:
- Backbone Models: Qwen2.5-7B/14B, Llama-3.1-8B, and Llama-2-13B.
- Fine-tuning Tasks:
- Math: GSM8K dataset.
- Coding: Code Alpaca dataset.
- Translation: A Chinese-English dataset from Xu et al. (2024a).
- Neuron Classification Corpus: A corpus of 1,200 exemplars was created with 300 examples for each of the four categories, sourced from external datasets (RedPajama, Proof-Pile-2, StarCoder, etc.) to ensure generalization.
- Evaluation Metrics:
- Performance was measured on the test sets of the respective tasks.
- Math (GSM8K): Accuracy. This measures the percentage of math word problems for which the model produces the correct final numerical answer.
- Coding (HumanEval): Pass@1. This measures the percentage of programming problems for which the model generates functionally correct code that passes all unit tests on the first attempt (a small estimator sketch follows this list).
- Translation: BLEU score (implied, common for this task). Measures the n-gram overlap between the model's translation and a set of reference translations.
- The paper reports the average score across all tasks involved in a merge to evaluate multi-task performance.
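For reference, HumanEval Pass@1 numbers are typically computed with the unbiased pass@k estimator from Chen et al. (2021); the snippet below is the standard formulation, not code from this paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per problem, c: samples that pass all unit tests,
    k: evaluation budget. With n = k = 1 this reduces to plain accuracy.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score is the average of pass_at_k over all problems.
```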
- Baselines:
- Fine-tuned Model: The performance of the individual single-task models (upper bound for that one task).
- Task Arithmetic: The standard method of averaging task vectors.
- DARE: An enhancement of Task Arithmetic that randomly drops out parameters in the task vectors and rescales the rest to reduce interference. NeuronMerge1 is compared against this. A minimal sketch of the DARE update follows this list.
- SubModule Linearity: The state-of-the-art method that calculates merging weights for each submodule. NeuronMerge2 is compared against this.
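As referenced above, here is a minimal sketch of a DARE-style drop-and-rescale update on a single task vector; the function name and default drop rate are illustrative assumptions.

```python
import torch

def dare_task_vector(base_param, finetuned_param, drop_rate=0.9):
    """DARE-style sparsified task vector: randomly drop entries, rescale the rest."""
    tau = finetuned_param - base_param
    keep = torch.rand_like(tau) >= drop_rate          # Bernoulli keep-mask
    return torch.where(keep, tau / (1.0 - drop_rate), torch.zeros_like(tau))

# The sparsified task vectors are then combined exactly as in task arithmetic:
# merged = base + sum(lambda_t * dare_task_vector(base, theta_t) for each task t)
```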
6. Results & Analysis
Core Results
The main results, presented in Tables 1 and 2, show that NeuronMerge consistently outperforms the baselines.
This is a manual transcription of Table 1 from the paper. Table 1: Results of Qwen2.5-7B
| Methods | Math & Coding | Math & Translate | Coding & Translate | Math & Coding & Translate |
| --- | --- | --- | --- | --- |
| Fine-tuned Model | 71.48 | 81.43 | 77.02 | 76.64 |
| Task Arithmetic | 69.73 | 81.71 | 74.81 | 75.36 |
| DARE | 69.84 | 82.31 | 75.11 | 76.10 |
| NeuronMerge1 | 71.12 ±0.36 | 82.62 ±0.16 | 75.44 ±0.21 | 76.82 ±0.11 |
| SubModule Linearity | 69.18 | 82.19 | 74.77 | 75.42 |
| NeuronMerge2 | 70.80 ±0.33 | 82.70 ±0.20 | 74.90 ±0.16 | 76.21 ±0.23 |
This is a manual transcription of Table 2 from the paper. Table 2: Results of Llama3.1-8B
| Methods | Math & Coding | Math & Translate | Coding & Translate | Math & Coding & Translate |
| --- | --- | --- | --- | --- |
| Fine-tuned Model | 47.71 | 71.55 | 62.25 | 60.50 |
| Task Arithmetic | 47.41 | 70.45 | 61.93 | 59.04 |
| DARE | 46.81 | 70.27 | 61.96 | 58.26 |
| NeuronMerge1 | 47.24 ±0.14 | 70.39 ±0.21 | 62.89 ±0.26 | 59.94 ±0.20 |
| Submodule Linearity | 47.13 | 70.43 | 62.65 | 59.37 |
| NeuronMerge2 | 47.73 ±0.28 | 70.69 ±0.29 | 63.00 ±0.16 | 59.88 ±0.22 |
- Consistent Improvement: In nearly all settings, NeuronMerge1 surpasses DARE, and NeuronMerge2 surpasses SubModule Linearity. This holds for both Qwen and Llama models, demonstrating the robustness of the approach. For example, in the Qwen2.5-7B Math & Coding merge, NeuronMerge2 achieves a score of 70.80, a significant improvement over SubModule Linearity's 69.18.
- Complementarity: The fact that the proposed neuron-level strategy improves upon two different merging methods for the attention/embedding layers confirms that NeuronMerge is an orthogonal and complementary technique.
- Surpassing Single Models: In some cases (e.g., Llama3.1-8B Math & Coding), the NeuronMerge2 result (47.73) even slightly surpasses the average of the individual fine-tuned models (47.71), indicating a highly successful merge with minimal interference.
Neuron Functionality Analysis
The paper provides strong evidence to support its core premise that neurons are functionally specialized and that this classification is meaningful.
Figure 3: Neuron classification results in layer 0 for Qwen2.5-7B and three fine-tuned models, shown as bar charts for different categories of neurons. Base model is Qwen2.5-7B, gsm8k_sft, code_sft, translate_sft are Qwen2.5-7B fine-tuned variants on Math, Code, Translation dataset respectively. Purple bar is the number of intersection neurons for each functionality category.
- Functional Stability (Figure 3): This figure shows that the distribution of neuron types (General, Math, etc.) remains remarkably stable between the base model and its fine-tuned variants. The large purple bars indicate a high overlap, meaning most neurons retain their original function after fine-tuning. This stability is the key property that allows the classification on the base model to be used for all models in the merge.
Figure 4: Knocking out code neurons in Qwen2.5-7B (layer 13) leads to a greater decline in coding ability than knocking out random neurons. The horizontal axis indicates the knock-out proportion. Performance is tested on HumanEval (Chen et al., 2021).
- Functional Relevance (Figure 4): This experiment validates that the classified neurons are indeed critical for their respective tasks. Deactivating ("knocking out") neurons classified as Code causes a much sharper drop in coding performance on HumanEval than deactivating a random set of neurons of the same size. This proves the classification is not arbitrary but captures true functional roles. A possible implementation of this knock-out is sketched below.
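One plausible way to implement the knock-out experiment is to zero the selected neurons' input projections in a SwiGLU MLP so that their gated outputs vanish; this is a hedged sketch of the idea, not necessarily the authors' exact procedure.

```python
import torch

def knock_out_neurons(gate_proj, up_proj, neuron_ids):
    """Disable selected MLP neurons in one layer by zeroing their projections.

    gate_proj, up_proj: [num_neurons, d_model] weight matrices (rows = neurons).
    neuron_ids: indices of neurons to knock out, e.g. those classified as Code,
        or a random subset of the same size for the control condition.
    """
    with torch.no_grad():
        gate_proj[neuron_ids] = 0.0   # SiLU(0) = 0, so the gate closes completely
        up_proj[neuron_ids] = 0.0     # the neuron contributes nothing downstream
    return gate_proj, up_proj
```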
Ablation Studies
The ablation studies explore key design choices in the NeuronMerge method.
- Neuron Merging Strategy (Table 3): This is a crucial ablation. For task-specific neurons (e.g., Math and Code), the authors tested three strategies: (1) Choose Model (take the neurons from the corresponding fine-tuned model), (2) Merge (average them using SubModule Linearity), and (3) Drop (revert to the base model's neurons).
This is a manual transcription of Table 3 from the paper. Table 3: Comparison of merging options for Qwen2.5-7B
| Math neuron \ Code neuron | Choose Code Model | Merge | Drop |
| --- | --- | --- | --- |
| Choose Math Model | 70.42 | 69.95 | 69.94 |
| Merge | 70.21 | 69.89 | 70.63 |
| Drop | 70.52 | 70.64 | 70.80 |
The best performance (70.80) is achieved when the task vectors for both Math and Code neurons are dropped. This counter-intuitive result suggests that the specialized adaptations made during fine-tuning are so specific that they create conflicts. It is better to rely on the General neurons for multi-task integration and keep the specialized neurons in their "neutral," pre-fine-tuned state to avoid interference.
Random Neuron Grouping (Table 4): When neurons were randomly assigned to groups of the same size as the functional groups, the performance of the merged model was significantly worse (68.36 avg) than with the proposed method (70.80 avg). This confirms that the performance gain comes from the meaningful, functionality-based grouping, not just the act of partitioning neurons.
7. Conclusion & Reflections
Conclusion Summary
The paper successfully introduces NeuronMerge, a novel model merging technique that operates at the fine-grained level of functional neuron groups. By classifying neurons based on their activation patterns and applying a targeted merging strategy—merging general neurons while resetting task-specific ones—the method effectively enhances model linearity and mitigates task interference. The approach is theoretically grounded, empirically validated on multiple SOTA models, and proves to be a powerful, complementary addition to the existing suite of model merging tools.
Limitations & Future Work
The authors acknowledge two primary limitations:
- Classification Precision: The method of classifying neurons based on the dominant category in their top-k activating examples is effective but might not capture more nuanced or polysemantic neurons (neurons that serve multiple functions).
- Exclusion of Attention Mechanisms: The analysis and merging strategy focus exclusively on MLP neurons. The functional roles of attention heads and their interaction with MLP neurons were not investigated, representing a significant area for future research.
Personal Insights & Critique
- Bridge Between Interpretability and Application: This work is an excellent example of how insights from LLM interpretability can directly lead to practical performance improvements. It transforms the abstract question "What does this neuron do?" into an actionable engineering decision: "How should this neuron be merged?"
- The "Drop" Strategy is Key: The most fascinating finding is that for specialized neurons, the best action is to undo the fine-tuning. This supports the hypothesis that fine-tuning creates highly specialized, often orthogonal, circuits that don't mix well. The general-purpose knowledge resides in the
Generalneurons, which act as the backbone for multi-task capability. This has broader implications for understanding catastrophic forgetting and task interference in continual learning. - Practicality and Scalability: The neuron classification step adds a computational overhead, but the paper shows it is manageable (around 1 minute on an A100 GPU for a 7B model). The storage cost is also minimal. The main practical consideration is curating the exemplar datasets for each new task, although the authors suggest using LLMs to synthesize data if real data is unavailable.
- Future Directions: An exciting extension would be to apply this concept dynamically. Instead of a hard classification, one could imagine a "soft" merging where the merge coefficient for a neuron depends on how relevant its function is to the target input at inference time. Furthermore, extending this functional analysis to attention heads is a logical and promising next step.