Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning
TL;DR Summary
This study systematizes poisoning attacks on production federated learning, focusing on untargeted attacks. Findings reveal strong FL robustness with simple defenses despite advanced poisoning methods, correcting prior misconceptions about attack effectiveness.
Abstract
To appear in the IEEE Symposium on Security & Privacy, 2022.

Abstract — While recent works have indicated that federated learning (FL) may be vulnerable to poisoning attacks by compromised clients, their real impact on production FL systems is not fully understood. In this work, we aim to develop a comprehensive systemization for poisoning attacks on FL by enumerating all possible threat models, variations of poisoning, and adversary capabilities. We specifically put our focus on untargeted poisoning attacks, as we argue that they are significantly relevant to production FL deployments. We present a critical analysis of untargeted poisoning attacks under practical, production FL environments by carefully characterizing the set of realistic threat models and adversarial capabilities. Our findings are rather surprising: contrary to the established belief, we show that FL is highly robust in practice, even when using simple, low-cost defenses.
Mind Map
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning
1.2. Authors
Virat Shejwalkar* and Amir Houmansadr* (University of Massachusetts Amherst); Peter Kairouz† and Daniel Ramage† (Google Research)
1.3. Journal/Conference
This paper was presented at the IEEE Symposium on Security and Privacy (S&P), a highly reputable, top-tier conference in the field of computer security. Its influence is significant, making it a key venue for publishing rigorous and impactful research in this domain.
1.4. Publication Year
2022
1.5. Abstract
This work critically evaluates the real-world impact of poisoning attacks on production Federated Learning (FL) systems. Despite previous research suggesting FL's vulnerability, the authors argue that the extent of this vulnerability in practical deployments is not fully understood. The paper provides a comprehensive systemization of poisoning attacks on FL, detailing various threat models, poisoning variations, and adversary capabilities, with a specific focus on untargeted poisoning attacks due to their relevance in production environments. Through a careful characterization of realistic threat models and adversarial capabilities, the study presents surprising findings: FL is highly robust in practice, even when utilizing simple and low-cost defenses. The authors introduce novel, state-of-the-art data and model poisoning attacks and demonstrate their (in)effectiveness through extensive experiments across three benchmark datasets in the presence of basic defense mechanisms. The paper aims to correct existing misconceptions and offers concrete guidelines for conducting more accurate and realistic research on FL robustness.
1.6. Original Source Link
/files/papers/690890e81ccaadf40a43450c/paper.pdf (Note: this link points to a local file path; an external URL would normally be provided for public access.) Publication Status: Published in the proceedings of the IEEE Symposium on Security & Privacy (S&P) 2022.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is the misunderstanding and mischaracterization of the real-world impact of poisoning attacks on production Federated Learning (FL) systems. Federated Learning is an emerging machine learning paradigm where multiple clients collaboratively train a shared global model without sharing their raw private data with a central server. This distributed nature makes FL particularly attractive for applications involving sensitive data (e.g., healthcare, financial, mobile devices), as it enhances privacy and reduces communication costs.
However, a key feature of FL—training models with mutually untrusted clients—also makes it susceptible to poisoning attacks. In such attacks, a small fraction of compromised clients, controlled by an adversary, intentionally submit malicious updates to corrupt the global model. Previous works have indicated that FL might be highly vulnerable to these attacks, often showing that even a single compromised client can severely disrupt the learning process.
The paper argues that this perception is largely based on unrealistic assumptions about adversary capabilities and FL system parameters, which do not hold in real-world production deployments (e.g., Google's Gboard, Apple's Siri). These prior studies often focused on worst-case theoretical scenarios, neglecting the practical costs and difficulties associated with large-scale compromises in real systems. This creates a significant gap between academic findings and practical relevance, potentially leading to over-engineered defenses or misallocated research efforts.
The paper's entry point is a critical re-evaluation of poisoning attacks under practical, production FL environments. It specifically focuses on untargeted poisoning attacks, which aim to degrade the overall performance of the global model across all tasks and inputs. The authors argue that untargeted attacks are highly relevant for production systems because they can impact a broad user base and are difficult to detect, as the accuracy degradation might be subtle and without clear indicators of malicious intent. The innovative idea is to systematically define realistic threat models and adversarial capabilities for production FL and then rigorously test the efficacy of both existing and novel poisoning attacks against these practical settings, including simple, low-cost defenses.
2.2. Main Contributions / Findings
The paper makes several key contributions:
- Comprehensive Systemization of FL Poisoning Threat Models: It introduces a detailed framework for understanding FL poisoning by enumerating all possible threat models along three dimensions: the adversary's objective, knowledge, and capability. Crucially, it identifies only two threat models (nobox offline data poisoning and whitebox online model poisoning) as practically relevant for production FL, challenging the broad applicability of many theoretical models.
- Novel, State-of-the-Art Poisoning Attacks: The authors propose improved data poisoning attacks (DPAs) and model poisoning attacks (MPAs).
  - Improved DPAs: The first systematic data poisoning attacks tailored for FL, building on classic label flipping but systematically adjusting the amount of label-flipped data to circumvent robust aggregation rules (AGRs).
  - Improved MPAs: These attacks use projected gradient ascent (PGA) to fine-tune the global model by increasing its loss on benign data, then carefully adjust the L2 norm of the poisoned update to bypass robust AGRs, outperforming existing state-of-the-art methods.
- Critical Analysis of FL Robustness in Practice: Through extensive experiments across three benchmark datasets (FEMNIST, CIFAR10, Purchase) and various FL parameters, the paper reaches several surprising conclusions that contradict established beliefs:
  - High Robustness of Basic FL: Contrary to claims that the Average AGR (a non-robust aggregation rule) cannot converge with even a single compromised client, the paper shows that production cross-device FL with Average AGR is highly robust and converges with high accuracy, even with practical percentages of compromised clients.
  - Effectiveness of Simple Defenses: Poisoning attacks have minimal impact on existing robust FL algorithms (e.g., Norm-bounding, Multi-krum, Trimmed-mean) when the percentage of compromised clients, M, is practically low. Furthermore, simple, low-cost defenses like Norm-bounding provide protection equivalent to more sophisticated and computationally expensive robust AGRs.
  - Data Size Limits as Defense: Limiting the size of the dataset contributed by each client is shown to be a highly effective, simple defense against data poisoning attacks, negating the need for complex robust aggregation algorithms.
  - Cross-silo FL Robustness: For production cross-silo FL (fewer, larger clients), data poisoning attacks are entirely ineffective, and model poisoning attacks are argued to be impractical due to the high cost of compromising well-protected corporate clients.

These findings correct previous misconceptions and provide concrete guidelines, urging the research community to focus on more realistic threat models and practical scenarios when investigating FL robustness.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully grasp the paper's content, a reader should understand the following foundational concepts:
- Machine Learning (ML): A field of artificial intelligence that enables systems to learn from data without explicit programming. The goal is often to train a model to make predictions or decisions.
- Model: In machine learning, a model is a mathematical representation of a real-world process. After training, it can make predictions or classifications based on new input data.
- Stochastic Gradient Descent (SGD): An iterative optimization algorithm used to minimize an objective function (or loss function) by taking steps proportional to the negative of the gradient of the function at the current point. In deep learning, it is widely used to update model parameters (weights and biases). A gradient indicates the direction of the steepest ascent of a function, so moving in the negative gradient direction decreases the function's value.
- Loss Function (or Objective Function): A function that quantifies how well a machine learning model performs on a given task. The goal of training is to minimize this loss. For classification tasks, cross-entropy loss is common, measuring the difference between predicted probabilities and true labels.
- Federated Learning (FL): A distributed machine learning paradigm where multiple decentralized clients (e.g., mobile devices, organizations) collaboratively train a shared global model orchestrated by a central server. Crucially, client data remains local and is not shared with the server or other clients. Instead, clients train local models on their data and send model updates (e.g., gradients or updated weights) to the server.
  - Global Model ($\theta^g$): The overarching model that the server maintains and updates based on aggregated client contributions.
  - Client: An individual participant in the FL process, owning local private data and performing local model training.
  - FL Round: A cycle of communication and computation in FL. In each round, the server selects a subset of clients, sends them the current global model, collects their updates, aggregates them, and updates the global model.
  - Local Update ($\nabla_k$): The change in model parameters computed by client $k$ in a given round based on its local data and the global model. Mathematically, $\nabla_k = \theta_k - \theta^g$, where $\theta_k$ is the client's locally trained model.
  - Aggregation Rule (AGR): A function used by the central server to combine the local updates received from multiple clients into a single aggregated update, which is then used to update the global model. The simplest AGR is Average, where updates are simply averaged.
  - Non-IID Data: In FL, data is often non-independent and identically distributed: the data on one client may have a different distribution or characteristics than the data on another client, reflecting real-world heterogeneity (e.g., different users typing different words).
- Poisoning Attacks: A type of adversarial attack where an adversary injects malicious data into the training set (data poisoning) or manipulates model updates during training (model poisoning) to compromise the integrity or availability of the trained model.
- Compromised Clients: Clients controlled by an adversary, used to launch poisoning attacks.
- Untargeted Poisoning Attack: An attack whose objective is to reduce the overall accuracy or performance of the global model across all tasks and inputs, rather than causing specific misclassifications.
- Targeted Poisoning Attack: An attack whose objective is to cause the model to misclassify specific inputs or classes as desired by the attacker.
- Backdoor Attack: A type of targeted attack where the model is trained to behave normally on most inputs but misclassifies any input containing a specific, hidden trigger pattern.
- Robust Aggregation Rules (Robust AGRs): Specialized aggregation rules designed to mitigate the impact of malicious or outlier client updates, thereby making the FL process more resilient to poisoning attacks. Examples include Median, Trimmed-mean, Krum, Multi-krum, and Norm-bounding (a minimal sketch of several of these is given below).
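To make these aggregation rules concrete, here is a minimal NumPy sketch (not the paper's implementation) of Average, Norm-bounding, Trimmed-mean, and a simplified Multi-krum over a stack of flattened client updates. The function names, the neighbor count used in the Multi-krum score, and the toy parameters are illustrative assumptions.

```python
import numpy as np

def average(updates):
    # Non-robust baseline: plain mean of all client updates.
    return updates.mean(axis=0)

def norm_bounding(updates, tau):
    # Scale down any update whose L2 norm exceeds tau, then average.
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    return (updates * scale).mean(axis=0)

def trimmed_mean(updates, beta):
    # Per dimension, drop the beta largest and beta smallest values, average the rest.
    sorted_vals = np.sort(updates, axis=0)
    return sorted_vals[beta:updates.shape[0] - beta].mean(axis=0)

def multi_krum(updates, num_malicious, num_select):
    # Score each update by the summed squared distance to its closest neighbors,
    # keep the num_select lowest-scoring updates, and average them.
    n = updates.shape[0]
    sq_dists = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=2) ** 2
    num_neighbors = n - num_malicious - 2
    scores = np.sort(sq_dists, axis=1)[:, 1:1 + num_neighbors].sum(axis=1)
    return updates[np.argsort(scores)[:num_select]].mean(axis=0)

# Toy usage: 20 clients, 1000-dimensional updates.
rng = np.random.default_rng(0)
client_updates = rng.normal(size=(20, 1000))
aggregate = norm_bounding(client_updates, tau=5.0)
```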
3.2. Previous Works
The paper contextualizes its contributions against existing literature on FL poisoning attacks and defenses, highlighting their limitations.
- Existing Poisoning Attacks:
  - Data Poisoning Attacks (DPAs): Primarily studied for centralized ML, where the adversary manipulates the training data directly. In FL, this translates to compromising client data. The paper mentions label flipping attacks [45], [64], [65] as a classic example, where labels of data points are intentionally changed. Fang et al. [23] explored applying simple label flipping to FL.
  - Model Poisoning Attacks (MPAs): In FL, adversaries directly manipulate the model updates sent by compromised clients.
    - Little Is Enough (LIE) attack [5]: Adds small, specifically crafted noise to benign updates to evade detection and poison the global model.
    - Static Optimization (STAT-OPT) attack [23]: Computes a static malicious direction based on benign updates and scales it to circumvent the target AGR.
    - Dynamic Optimization (DYN-OPT) attack [55]: Perturbs benign updates in a dynamic, data-dependent malicious direction, finding the largest perturbation that evades the AGR.
  - Targeted and Backdoor Attacks: While the paper focuses on untargeted attacks, it briefly reviews these:
    - Targeted attacks [7], [58], [59] aim to misclassify specific samples. Bhagoji et al. [7] showed a single attacker could misclassify a single sample.
    - Backdoor attacks [3], [61], [67] inject a hidden trigger. Bagdasaryan et al. [3] demonstrated injecting semantic backdoors in next-word prediction. Wang et al. [61] proposed data/model poisoning for backdoor injection.
- Existing Defenses (Robust AGRs):
  - Dimension-wise filtering: Filters malicious values for each dimension of the updates independently. Examples: Median [70], Trimmed-mean [70], signSGD with majority voting [6].
  - Vector-wise filtering: Aims to remove entire malicious client updates. Examples: RFA [50], RSA [36], Krum [10], Multi-krum [10], Bulyan [41], Divide-and-conquer (DnC) [55].
  - Vector-wise scaling: Reduces the impact of poisoned updates by bounding their norms. Example: Norm-bounding [58].
  - Certified defenses [14], [66]: Provide provable accuracy guarantees under certain conditions.
  - Knowledge transfer based defenses [15], [38]: Reduce the dimensionality of client updates to improve theoretical robustness.
  - Personalization techniques [37], [71]: Fine-tune the global model on each client's private data to improve local performance, assuming a mostly benign global model.

Critique of Prior Work's Assumptions: The paper critically highlights that much of the existing literature, on both attacks and defenses, makes unrealistic assumptions that do not hold in real-world FL deployments. For instance, state-of-the-art attacks often assume adversaries can compromise 25% (or even 50%) of FL clients. For a system like Gboard with 1 billion users, this would mean controlling 250-500 million devices, which is practically infeasible due to the immense cost and difficulty of at-scale compromises (see the quick calculation below). Such assumptions, while interesting for theoretical worst-case analysis, do not reflect common real-world adversarial scenarios.
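To put these percentages in absolute terms, a quick back-of-the-envelope calculation using the 1-billion-user Gboard example above and the practical bounds argued later in Table III (the script and its labels are purely illustrative):

```python
total_clients = 1_000_000_000  # Gboard-scale deployment (figure from the text)

for label, fraction in [("prior-work assumption (25%)", 0.25),
                        ("practical DPA bound (0.1%)", 0.001),
                        ("practical MPA bound (0.01%)", 0.0001)]:
    print(f"{label}: {int(total_clients * fraction):,} compromised devices")
# 250,000,000 vs. 1,000,000 vs. 100,000 devices -- vastly different attack costs.
```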
3.3. Technological Evolution
The evolution of machine learning has moved from centralized training (where all data is collected in one place) to distributed paradigms like Federated Learning to address concerns of data privacy, regulatory compliance, and computational efficiency at the edge. Initially, distributed training focused on speeding up computation, but FL introduced the crucial aspect of privacy-preserving collaboration.
With the rise of distributed training, new security challenges emerged. Adversarial machine learning (AML), which initially focused on adversarial examples for inference-time attacks, expanded to poisoning attacks during training. In the FL context, this means that the distributed nature, while beneficial for privacy, also introduces new attack vectors through compromised clients. Early research often treated FL as a generalization of distributed systems with Byzantine faults, leading to the development of robust aggregation rules.
This paper's work fits into this timeline by shifting the focus from theoretical, worst-case AML in FL to practical, production-level AML in FL. It argues that the previous generation of FL robustness research, while valuable theoretically, did not fully account for the unique constraints and scale of real-world FL deployments, thus necessitating a "back to the drawing board" approach.
3.4. Differentiation Analysis
Compared to the main methods in related work, this paper's core differences and innovations lie in its rigorous adherence to practical, production FL constraints and its comprehensive systemization of threat models under these constraints.
- Realistic Threat Models: While previous work explored a wide range of theoretical threat models, this paper systematically prunes them down to only two (nobox offline data poisoning and whitebox online model poisoning) that are truly practical in production settings, justifying why the others are less relevant. This is a significant departure from worst-case analyses.
- Practical FL Parameters: The paper explicitly defines and uses practical ranges for FL parameters (e.g., the percentage of compromised clients M, the total number of clients N, and local dataset sizes) that are orders of magnitude more realistic than those used in prior studies (e.g., M ≤ 0.1% versus 20-50%).
- Improved Attacks for Practical Settings: Instead of just reusing existing attacks, the authors develop novel data poisoning and model poisoning attacks specifically designed to be effective under these practical constraints, tailored to existing robust AGRs and even dataset characteristics (e.g., Projected Gradient Ascent for MPAs, and Dynamic/Static Label Flipping with careful adjustment of the poisoned data size for DPAs).
- Re-evaluation of Robustness: The paper's most significant differentiation is its contradiction of established beliefs. It demonstrates that:
  - The simple Average AGR is far more robust in production cross-device FL than previously thought, due to client sampling.
  - Simple, low-cost defenses like Norm-bounding provide comparable protection to complex, expensive robust AGRs.
  - Data poisoning attacks are largely ineffective in cross-silo FL and can be mitigated by simple data size limits in cross-device FL.

In essence, while previous work often asked "how can FL be broken in the worst case?", this paper asks "how vulnerable is FL in practice?" and finds FL to be significantly more robust than widely believed, shifting the focus towards simpler, more efficient defenses and realistic threat modeling.
4. Methodology
4.1. Principles
The core idea behind the methodology is to conduct a rigorous and realistic re-evaluation of poisoning attacks in Federated Learning (FL). Instead of focusing on theoretical worst-case scenarios, the authors aim to understand the actual impact of such attacks on production FL systems. This is achieved by:
- Systematically Characterizing Threat Models: Defining a comprehensive framework for FL poisoning by considering an adversary's objective, knowledge, and capability. Crucially, this systemization then identifies which combinations of these dimensions are truly practical for production deployments, rejecting unrealistic assumptions common in prior literature.
- Developing Advanced Attacks: Designing novel data poisoning attacks (DPAs) and model poisoning attacks (MPAs) that are optimized to be effective within these practical threat models, thereby representing the strongest attacks an adversary could mount under realistic constraints.
- Extensive Empirical Evaluation: Testing these advanced attacks against both non-robust and state-of-the-art robust Aggregation Rules (AGRs) across diverse datasets and various FL parameters, ensuring that the experimental setup mirrors production FL environments (e.g., realistic percentages of compromised clients and client sampling).
- Challenging Established Beliefs: Through this rigorous empirical process, the paper provides evidence that either confirms or contradicts common assumptions about FL's vulnerability and the necessity of complex defenses.
The theoretical basis is grounded in adversarial machine learning, particularly the concepts of data and model manipulation to achieve specific adversarial objectives (in this case, untargeted accuracy degradation). The intuition is that real-world constraints (like the cost of compromising many devices, limited data processing on client devices, or sparse client participation) inherently limit an adversary's power, making FL potentially more robust than worst-case analyses suggest.
4.2. Core Methodology In-depth (Layer by Layer)
The paper's methodology can be broken down into three main layers: systemization of threat models, development of improved attacks, and comprehensive experimental analysis.
4.2.1. Systemization of FL Poisoning Threat Models
The authors first establish a comprehensive systemization for poisoning attacks on FL (Section III). This involves defining key dimensions of the threat model and then identifying which combinations are practical for production FL.
4.2.1.1. Dimensions of Poisoning Threat to FL
The threat model is broken down into three key dimensions:
- Adversary's Objective: This defines what the adversary wants to achieve.
  - Security violation:
    - Integrity violation: Evade detection without disrupting service (e.g., targeted attacks).
    - Availability violation: Disrupt service for legitimate users (e.g., untargeted attacks).
  - Attack specificity:
    - Discriminate: Misclassify a specific set or class of samples.
    - Indiscriminate: Misclassify all or most inputs.
  - Error specificity:
    - Specific: Misclassify to a particular target class.
    - Generic: Misclassify to any wrong class.

  The paper focuses on untargeted attacks, defined as indiscriminate availability attacks with generic error specificity. The goal is to reduce the overall accuracy for all users on all data, without caring about specific misclassification targets. The rationale is that such attacks pose a significant threat to production FL, can affect a large population, and are difficult to detect if the accuracy drop is subtle.

- Adversary's Knowledge: This describes what information the adversary has access to.
  - Knowledge of the global model:
    - Nobox: The adversary does not know the model architecture, parameters, or predictions. This is considered the most practical setting in cross-device FL, especially for data poisoning.
    - Whitebox: The adversary knows the global model parameters and predictions. This requires deep control over compromised devices and is typically assumed for model poisoning.
  - Knowledge of the data from the benign distribution:
    - Full: The adversary can access all local data (of benign and compromised clients).
    - Partial: The adversary can only access the benign local data of compromised clients. The paper considers only the partial knowledge case, as full knowledge is impractical in production FL.

- Adversary's Capability (Attack mode): This describes what actions the adversary can take.
  - Capability in terms of access to client devices:
    - Model Poisoning (MP): The adversary breaks into compromised devices and directly manipulates model updates. This allows for highly effective poisoning but requires significant access and resources.
    - Data Poisoning (DP): The adversary can only manipulate the local dataset of compromised clients; the clients then compute updates based on this poisoned data. This is less direct but requires less intrusive access to devices.
  - Capability in terms of frequency of the attack:
    - Offline: The adversary poisons clients only once, at the beginning of FL.
    - Online: The adversary repeatedly and adaptively poisons clients during FL.
4.2.1.2. Practical Considerations and Threat Models in Practice
The paper then applies practical considerations based on real-world FL deployments (cross-device and cross-silo FL, as described in Section II-B3 of the paper) to filter the possible threat models.
The following table, Table III, from the original paper, demonstrates the stark differences between parameter ranges used in the untargeted poisoning literature and their practical ranges for production FL:
The following are the results from Table III of the original paper:
| Parameters/Settings | What we argue to be practical | Used in previous untargeted works |
| --- | --- | --- |
| FL type + attack type | Cross-silo + DPAs; Cross-device + {MPAs, DPAs} | Cross-silo + MPAs |
| Total number of FL clients, N | Order of [10^3, 10^10] for cross-device; [2, 100] for cross-silo | [50, 100] |
| Number of clients chosen per round, n | Small fraction of N for cross-device; all for cross-silo | All |
| % of compromised clients, M | M ≤ 0.1% for DPAs; M ≤ 0.01% for MPAs | [20, 50]% |
| Average size of benign clients' data, \|D\|avg | [50, 1000] for cross-device; not applicable to cross-silo | Not studied for cross-device; [50, 1000] for cross-silo |
| Maximum size of local poisoning data | Up to 100 × \|D\|avg for DPAs; not applicable to MPAs | ~ \|D\|avg |
Based on these practical considerations and the dimensions outlined above, the paper identifies only two threat models as practically relevant for untargeted poisoning with partial knowledge of benign data:
1. Nobox Offline Data Poisoning (T4):
   - Adversary's Knowledge: Nobox (no knowledge of the global model architecture, parameters, or outputs). The adversary knows the server's AGR, but knowledge of the model architecture may vary.
   - Adversary's Capability: Data poisoning (manipulates local data) and Offline (poisons once at the start).
   - Practicality: This model requires limited access to client devices (only manipulating local data), making it feasible to compromise a relatively large percentage of FL clients (e.g., up to 0.1%). However, its impact on model updates is indirect and thus generally limited.

2. Whitebox Online Model Poisoning (T5):
   - Adversary's Knowledge: Whitebox (knows the global model parameters and predictions whenever a compromised client is selected).
   - Adversary's Capability: Model poisoning (directly manipulates updates) and Online (adaptively poisons repeatedly).
   - Practicality: This model assumes deep compromise of client devices (e.g., breaking OS security protocols), which is extremely costly. Therefore, the adversary can only compromise a very small percentage of FL clients (e.g., up to 0.01%). However, direct manipulation of updates allows for potentially highly poisonous updates.
4.2.1.3. Defenses Evaluated
The paper selects a representative set of AGRs (Aggregation Rules) for evaluation, prioritizing those that offer practical performance and low overheads for production FL:
- Average [40]: The basic, non-robust aggregation rule, which simply averages client updates. Widely used in practice due to its efficiency.
- Norm-bounding [58]: A simple vector-wise scaling defense. It bounds the norm of every submitted client update to a fixed threshold τ: if the norm of an update exceeds τ, the update is scaled down so that its norm equals τ. The intuition is that highly poisoned updates tend to have unusually high norms.
- Multi-krum [10]: A vector-wise filtering defense. It selects a subset of client updates that are "closest" to each other (i.e., least likely to be outliers or malicious) and then averages them, repeating the selection until a desired number of updates are chosen. The goal is to remove entire malicious updates.
- Trimmed-mean [68], [70]: A dimension-wise filtering defense. For each dimension of the client updates, it sorts the values, removes a certain number of the largest and smallest values (the "trimmed" portions), and then averages the remaining values. This aims to remove extreme values caused by poisoning.

The following are the results from Table I of the original paper:

| Type of AGR | Example AGR | Accuracy in non-IID FL | Computation at server | Memory cost to client | Theoretical robustness based on |
| --- | --- | --- | --- | --- | --- |
| Non-robust | Average [40] | 86.6 | O(d) | O(d) | None |
| Dimension-wise filtering | Median [70] | 84.2 | O(dn log n) | O(d) | Convergence |
| | Trimmed-mean [70] | 86.6 | O(dn log n) | O(d) | Convergence |
| | Sign-SGD + majority voting [6] | 35.1 | O(d) | O(d) | Convergence |
| Vector-wise scaling | Norm-bound [58] | 86.6 | O(d) | O(d) | Not established |
| Vector-wise filtering | Krum [10] | 46.9 | O(dn²) | O(d) | Convergence |
| | Multi-krum [10] | 86.2 | O(dn²) | O(d) | Convergence |
| | Bulyan [41] | 81.1 | O(dn) | O(d) | Convergence |
| | RFA [50] | 84.6 | | O(d) | Convergence |
| | RSA [36] | 35.6 | O(d) | O(d) | Convergence |
| | DnC [55] | 86.1 | | O(d) | Filtering |
| Certification | Ensemble [14] | 74.2 | O(d) | O(Md) | Certification |
| | CRFL [66] | 64.1 | | | Certification |
| Knowledge transfer | Cronus [15] | Needs public data | O(d) | O(d) | Filtering |
| Personalization | Ditto [37], EWC [71] | 86.6 | O(d) | O(d) | None (depends on server's AGR) |
4.2.2. Formulating FL Poisoning as an Optimization Problem
The paper models FL poisoning as an optimization problem, building on prior work [55]. The adversary's goal is to maximize the distance between the benign aggregate (what the global model would receive without attack) and the poisoned aggregate (what it receives with compromised clients). This is expressed as:

$$\underset{\nabla' \in \mathbb{R}^d}{\operatorname{argmax}}\; \left\| \nabla^b - f_{\mathrm{agr}}\big(\nabla'_{\{i \in [m]\}} \cup \nabla_{\{i \in [n']\}}\big) \right\|_2, \qquad \nabla^b = f_{\mathrm{avg}}\big(\nabla_{\{i \in [n']\}}\big) \tag{1}$$

Here:
- $\nabla'$: The poisoned update crafted by the adversary (a vector in $d$-dimensional space).
- $\mathbb{R}^d$: The $d$-dimensional real vector space of model updates.
- $\|\cdot\|$: Typically the Euclidean ($L_2$) norm, measuring the magnitude of a vector; the objective is to maximize the difference between the two aggregates.
- $\nabla^b$: The benign aggregate, i.e., the average of the updates from benign (non-compromised) clients.
- $f_{\mathrm{avg}}$: The Average aggregation rule, used to compute the benign aggregate.
- $\nabla_{\{i \in [n']\}}$: The set of benign updates available to the adversary (e.g., updates computed using the benign data of compromised clients, or a subset of known benign updates); $n'$ is the number of such benign updates.
- $f_{\mathrm{agr}}(\nabla'_{\{i \in [m]\}} \cup \nabla_{\{i \in [n']\}})$: The poisoned aggregate, i.e., the result of the target aggregation rule applied to a mix of poisoned and benign updates.
- $f_{\mathrm{agr}}$: The target aggregation rule (e.g., Average, Norm-bounding, Multi-krum, Trimmed-mean) that the adversary tries to circumvent.
- $\nabla'_{\{i \in [m]\}}$: The $m$ replicas of the poisoned update submitted by the compromised clients; all compromised clients are assumed to submit the identical poisoned update $\nabla'$.
4.2.3. Our Data Poisoning Attacks (DPAs)
The paper formulates a general DPA optimization problem based on (1), specialized to data poisoning, where the adversary manipulates the local datasets of compromised clients rather than their updates:

$$\underset{D_p \subset \mathbb{D}}{\operatorname{argmax}}\; \left\| \nabla^b - f_{\mathrm{agr}}\big(\nabla'_{\{i \in [m]\}} \cup \nabla_{\{i \in [n']\}}\big) \right\|_2,$$

where the poisoned update $\nabla'$ is the update a compromised client computes by training on the poisoned data $D_p$. Here:

- $D_p$: The poisoning data used by compromised clients to compute their local updates. The adversary seeks to find the optimal $D_p$.
- $\mathbb{D}$: The entire input space from which $D_p$ can be sampled or crafted.

The goal is to find $D_p$ such that, when the global model is fine-tuned using $D_p$, the resulting model has a high cross-entropy loss on benign data; the corresponding update should also circumvent the target AGR. This way, the global model updated with the poisoned update will perform poorly on benign data. The authors propose two label flipping (LF) strategies to generate $D_p$ (a minimal code sketch of both follows below):

- Static LF (SLF): For a data sample, the adversary flips the true label to a predetermined false label according to a fixed mapping (e.g., one mapping for even class indices and another for odd class indices). This is a static, non-adaptive flipping strategy.
- Dynamic LF (DLF): The adversary first trains a surrogate model (an estimate of the global model) using the available benign data. Then, for each sample, the label is flipped to the least probable label predicted by the surrogate model. This is an adaptive strategy that uses knowledge of the current model state.

The core observation for DPAs is that increasing the amount of label-flipped data, $|D_p|$, generally increases the loss and norm of the resulting updates, which can effectively reduce the global model's accuracy. However, an excessively large $|D_p|$ may produce updates that are easily detected and discarded by robust AGRs. Thus, the attacks adjust $|D_p|$ to circumvent the target AGR.
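The sketch below illustrates the two label-flipping strategies under stated assumptions: a scikit-learn-style surrogate classifier exposing `predict_proba`, and an illustrative static mapping for SLF (the paper's exact mapping may differ).

```python
import numpy as np

def static_label_flip(labels, num_classes):
    # SLF: map each true label to a fixed different label; here y -> num_classes - 1 - y
    # (an illustrative static mapping), with a fallback so no label maps to itself.
    labels = np.asarray(labels)
    flipped = num_classes - 1 - labels
    clash = flipped == labels
    flipped[clash] = (labels[clash] + 1) % num_classes
    return flipped

def dynamic_label_flip(features, surrogate_model):
    # DLF: flip each label to the class the surrogate model considers least probable.
    probs = surrogate_model.predict_proba(features)   # shape: (num_samples, num_classes)
    return probs.argmin(axis=1)
```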
As depicted in Figure 2 from the original paper, varying the size of the poisoned data, $|D_p|$, affects the objectives of DPAs on various aggregation rules (AGRs). Figures 2(a) and 2(b) show that increasing $|D_p|$ monotonically increases the update's loss and norm, respectively; larger poisoned datasets lead to more "deviant" updates. Figures 2(c) and 2(d) illustrate how this affects Trimmed-mean and Multi-krum. For Trimmed-mean, higher $|D_p|$ also increases its objective value, similar to Average AGR. For Multi-krum, the AGR selects most of the poisoned updates only when $|D_p|$ is small (around 10), indicating that very large $|D_p|$ is easily detected.
The following figure (Figure 2 from the original paper) shows the effect of varying the sizes of poisoned data:
(The figure plots, on FEMNIST, how the poisoned data size affects the DPA objectives under SLF and DLF: the update loss, the update norm, the Trimmed-mean objective, and the Multi-krum objective.)
Alt text: Figure 2: Effect of varying the sizes of poisoned data, $D_p$, on the objectives of DPAs (Section IV-B2) on various AGRs. We compute $D_p$ by flipping the labels of benign data.
- DPA for Average AGR: To attack Average AGR, the strategy is to produce updates with arbitrarily large loss and norm using very large amounts of label-flipped data, because Average AGR has no detection mechanism for outlier norms or losses.
- DPA for Norm-bounding: To attack Norm-bounding AGR, a large $|D_p|$ is still used to generate poisoned updates that induce high losses. Even if their norms are bounded by the defense, these updates remain sufficiently "far" from benign updates to cause significant poisoning impact, especially at higher $M$.
- DPA for Multi-krum (Appendix B1): The objective is to maximize the number of poisoned updates selected by Multi-krum. The key observation is that Multi-krum tends to discard updates derived from very large $|D_p|$. Therefore, the attack samples $D_p$ with sizes slightly higher than the average benign dataset size, $|D|_{avg}$, but not excessively large, so that the poisoned updates blend in with benign updates and are selected by Multi-krum.
- DPA for Trimmed-mean (Appendix B1): Similar to Average and Norm-bounding, the strategy is to use a large $|D_p|$ with the DLF/SLF strategies to create updates with high loss and norm, which are then passed to the Trimmed-mean AGR. The goal is to push the values of certain dimensions to the extremes, such that even after trimming, the remaining average is still skewed.
4.2.4. Our Model Poisoning Attacks (MPAs)
Model poisoning adversaries can directly manipulate the compromised clients' updates. The paper proposes Projected Gradient Ascent (PGA) as a novel MPA. The goal is to craft a poisoned model $\theta'$ that has a high loss on benign data, while ensuring that its corresponding update $\nabla'$ circumvents the target AGR.
The process involves two main steps:
- Stochastic Gradient Ascent (SGA): Unlike standard SGD, which minimizes loss, SGA maximizes loss. The attack uses SGA to fine-tune the global model on some poisoning data so as to increase its loss on benign data. This produces a malicious model $\theta'$ and, subsequently, a raw poisoned update $\nabla'$. To perform SGA, the algorithm simply follows the opposite of the benign gradient direction.
- Projection ($f_{\mathrm{project}}$): After obtaining $\nabla'$, the attack uses a projection function to adjust $\nabla'$ so that it bypasses the robustness criteria of the target AGR. This typically involves scaling the update to have a norm within certain bounds (e.g., within a radius $\tau$ of the origin).

The algorithms for PGA are provided in Appendix B of the paper (a short Python-flavored sketch follows the listing of Algorithm 1 below):
The following is Algorithm 1 from the original paper:
Algorithm 1 Our PGA model poisoning attack algorithm
1: Input: {∇_{i ∈ [n']}}, θ^g, f_{agr}, D_p
2: τ = (1/n') Σ_{i ∈ [n']} ||∇_i|| ▷ Compute norm threshold
▷ τ is given for norm-bounding AGR
3: θ' ← A_{SGA}(θ^g, D_p) ▷ Update using stochastic gradient ascent
4: ∇' = θ' - θ^g ▷ Compute poisoned update
5: ∇' = f_{project}(f_{agr}, ∇', τ, ∇_{{i ∈ [n']}}) ▷ Scale ∇' appropriately
6: Output ∇'
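Before the line-by-line walkthrough, here is a PyTorch-flavored sketch of Algorithm 1 under a few assumptions: `benign_updates` is a list of flattened update tensors known to the adversary, `d_p_loader` iterates over the adversary's data $D_p$, and `project` is any routine playing the role of Algorithm 2 ($f_{project}$). It is an illustrative reading of the pseudocode, not the authors' code.

```python
import copy
import torch

def pga_poisoned_update(global_model, benign_updates, d_p_loader, f_agr, project,
                        ascent_steps=5, lr=0.01):
    """Sketch of Algorithm 1: craft a poisoned update via stochastic gradient ascent,
    then scale it with a projection routine in the spirit of Algorithm 2."""
    # Line 2: norm threshold tau = average norm of the benign updates known to the adversary.
    tau = torch.stack([u.norm() for u in benign_updates]).mean()

    # Line 3: stochastic gradient ascent -- fine-tune a copy of the model to *increase* loss.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(ascent_steps):
        for x, y in d_p_loader:
            opt.zero_grad()
            (-loss_fn(model(x), y)).backward()   # negating the loss turns SGD into SGA
            opt.step()

    # Line 4: poisoned update = poisoned model parameters - global model parameters.
    theta_g = torch.cat([p.detach().flatten() for p in global_model.parameters()])
    theta_p = torch.cat([p.detach().flatten() for p in model.parameters()])
    raw_update = theta_p - theta_g

    # Line 5: scale the raw update so that it circumvents the target AGR.
    return project(f_agr, raw_update, tau, benign_updates)
```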
- Input:
  - $\nabla_{\{i \in [n']\}}$: The set of benign updates available to the adversary.
  - $\theta^g$: The current global model.
  - $f_{\mathrm{agr}}$: The target aggregation rule.
  - $D_p$: The data used by the adversary for stochastic gradient ascent.
- Line 2: Computes $\tau$, the average of the norms of the known benign updates, which serves as a reference for scaling. If the target AGR is Norm-bounding, $\tau$ is instead the known norm-bounding threshold.
- Line 3: The adversary uses stochastic gradient ascent ($\mathcal{A}_{SGA}$) to update the global model using $D_p$, resulting in a poisoned model $\theta'$.
- Line 4: The poisoned update $\nabla'$ is computed as the difference between the poisoned model $\theta'$ and the global model $\theta^g$.
- Line 5: The projection function $f_{\mathrm{project}}$ is called to scale $\nabla'$ appropriately so that it circumvents the target AGR.
- Line 6: The final, scaled poisoned update is output.
The following is Algorithm 2 from the original paper:
Algorithm 2 f_{project}
1: Input: f_{agr}, ∇', τ, {∇_{i ∈ [n']}}
2: d* = 0 ▷ Initialize maximum deviation
3: γ* = 1 ▷ Optimal scaling factor that maximizes deviation in (1)
4: ∇'' = ∇' * (τ / ||∇'||) ▷ Scale ∇' to have norm τ
5: ∇^b = f_{avg}(∇_{{i ∈ [n']}}) ▷ Compute reference benign update
6: for γ ∈ [1, Γ] do
7: ∇''' = γ ⋅ ∇''
8: d = ||f_{agr}(∇'''_{{i ∈ [m]}} ∪ ∇_{{i ∈ [n']}}) - ∇^b|| ▷ Deviation of the poisoned aggregate
9: if d > d*: d* = d, γ* = γ ▷ Keep the best scaling factor found so far
10: γ = γ + δ ▷ Update γ
11:end for
12: Output γ* ⋅ ∇''
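A NumPy sketch of this projection routine, following the pseudocode line for line. The grid bound `gamma_max` and step `delta` stand in for Γ and δ and are assumed values, and `f_agr` is any aggregator that accepts a stacked array of updates (such as the ones sketched in Section 3.1).

```python
import numpy as np

def f_project(f_agr, raw_update, tau, benign_updates, m=1, gamma_max=50.0, delta=0.5):
    """Sketch of Algorithm 2: scale the raw poisoned update to norm tau, then grid-search
    the scaling factor gamma that maximizes deviation from the benign aggregate."""
    benign_updates = np.asarray(benign_updates)
    scaled = raw_update * (tau / np.linalg.norm(raw_update))   # line 4: ||scaled|| == tau
    benign_aggregate = benign_updates.mean(axis=0)             # line 5: f_avg reference

    best_gamma, best_dev = 1.0, 0.0
    gamma = 1.0
    while gamma <= gamma_max:                                  # lines 6-11
        candidate = gamma * scaled
        replicas = np.tile(candidate, (m, 1))                  # m compromised clients
        poisoned_aggregate = f_agr(np.vstack([replicas, benign_updates]))
        dev = np.linalg.norm(poisoned_aggregate - benign_aggregate)
        if dev > best_dev:
            best_dev, best_gamma = dev, gamma
        gamma += delta
    return best_gamma * scaled                                 # line 12
```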
Here:
- Input:
  - $f_{\mathrm{agr}}$: The target aggregation rule.
  - $\nabla'$: The raw poisoned update from Algorithm 1.
  - $\tau$: The norm threshold.
  - $\nabla_{\{i \in [n']\}}$: The benign updates.
- Lines 2-3: Initialize $d^*$ (the maximum deviation found so far) and $\gamma^*$ (the scaling factor that achieved $d^*$).
- Line 4: Scales the raw poisoned update $\nabla'$ to have a norm of exactly $\tau$, resulting in $\nabla''$. This initial scaling keeps the update within the expected bounds.
- Line 5: Computes the benign aggregate $\nabla^b$ as a reference.
- Lines 6-11 (optimization loop): The algorithm searches for an optimal scaling factor $\gamma$ within a predefined range $[1, \Gamma]$ (where $\Gamma$ is a large real number).
  - Line 7: A scaled poisoned update $\nabla'''$ is generated by multiplying $\nabla''$ by the current $\gamma$.
  - Line 8: The deviation $d$ is calculated: the L2 norm of the difference between the poisoned aggregate (computed from $m$ replicas of $\nabla'''$ together with the benign updates) and the benign aggregate $\nabla^b$. This directly corresponds to the objective in optimization problem (1).
  - Line 9: If the current deviation $d$ is greater than the maximum deviation found so far, $d^*$ is updated to $d$ and $\gamma^*$ is updated to the current $\gamma$.
  - Line 10: $\gamma$ is incremented by a step size $\delta$.
- Line 12: The final scaled poisoned update, $\gamma^* \cdot \nabla''$, is output.
The following figure (Figure 3 from the original paper) depicts the idea of the algorithm:
Alt text: Figure 3: Schematic of our PGA attack: PGA first computes a poisoned update using stochastic gradient ascent (SGA). Then, $f_{\mathrm{project}}$ finds the scaling factor $\gamma$ that maximizes the deviation between the benign aggregate and the poisoned aggregate. Robust AGRs easily discard scaled poisoned updates with very high $\gamma$, while those with very small $\gamma$ have no impact.

The figure visually explains how different scaling factors affect the poisoned aggregate. The attack tries to find a $\gamma$ that maximizes deviation from $\nabla^b$ while staying "undetected" by $f_{\mathrm{agr}}$. Robust aggregators are shown to discard updates with very high $\gamma$, while updates with very small $\gamma$ have little effect.
- MPA for Average AGR: Since Average AGR has no robustness constraints, $f_{\mathrm{project}}$ simply scales the poisoned update by an arbitrarily large constant. If a compromised client is selected, this massively scaled update completely corrupts the global model.
- MPA for Norm-bounding: Assuming the adversary knows the norm-bounding threshold $\tau$, $f_{\mathrm{project}}$ scales $\nabla'$ such that its final norm is exactly $\tau$, i.e., $\|\nabla'\| = \tau$. This ensures the update passes the norm check while maximizing its poisoning impact within the allowed norm.
- MPA for Multi-krum (Appendix B2): Similar to the DPA, the objective is to maximize the number of poisoned updates selected by Multi-krum. $f_{\mathrm{project}}$ searches for a $\gamma$ such that all scaled poisoned updates are selected; the algorithm iteratively tries different $\gamma$ values and checks whether Multi-krum selects all compromised updates.
- MPA for Trimmed-mean (Appendix B2): The Trimmed-mean algorithm is plugged directly into Algorithm 2 (specifically, as $f_{\mathrm{agr}}$ in line 8). The PGA attack uses SGA to tailor the adversarial direction to the entire FL context (model, data, optimizer) and then uses the $f_{\mathrm{project}}$ mechanism to find the optimal scaling that maximizes the deviation under the Trimmed-mean AGR.
5. Experimental Setup
The experimental setup is designed to rigorously evaluate poisoning attacks under practical production FL environments, contrasting with the theoretical worst-case analyses often found in previous literature.
5.1. Datasets
The authors use three benchmark datasets, representative of typical FL tasks, to ensure broad applicability of their findings. The data distributions for CIFAR10 and Purchase are made non-IID using Dirichlet distribution to simulate real-world client heterogeneity, a common characteristic of production FL.
-
FEMNIST [13], [18]:
- Description: A character recognition classification task. It contains handwritten digits and letters.
- Characteristics: 3,400 clients, 62 classes (52 letters, 10 digits), 671,585 grayscale images. Each client originally has data of their own handwritten characters.
- Data Partitioning (Non-IID): To simulate a larger number of clients in cross-device FL and more severe non-IID conditions, each original client's data is further divided into non-IID parts using a Dirichlet distribution (a minimal partitioning sketch appears at the end of this subsection). Unless otherwise specified, each client's data is split into 10 parts, resulting in 34,000 total clients.
- Model Architecture:
LeNet [35] (a classic convolutional neural network for image recognition).
- Why Chosen: Represents a cross-device FL scenario with a very large number of clients and highly non-IID data, typical for mobile-device tasks like keyboard prediction.
-
CIFAR10 [34]:
- Description: An object recognition dataset.
- Characteristics: 10-class classification task, 60,000 RGB images (32×32 pixels), with 50,000 for training and 10,000 for testing.
- Data Partitioning (Non-IID): 1,000 total FL clients, with the 50,000 training images distributed among them using a Dirichlet distribution.
- Model Architecture:
VGG9 architecture with batch normalization [56] (a deeper convolutional neural network).
- Why Chosen: A standard benchmark for image classification, allowing comparison with previous works, while being adapted to FL settings with non-IID data.
-
Purchase [51]:
- Description: A classification task related to predicting user purchasing behavior.
- Characteristics: 100 classes, 197,324 binary feature vectors, each of length 600.
- Data Partitioning (Non-IID): 187,324 data points used for training, distributed among 5,000 clients using a Dirichlet distribution. Validation and test sets each have 5,000 samples.
- Model Architecture: A fully connected network.
- Why Chosen: Represents a categorical/tabular dataset, common in financial or recommender systems, providing a different data modality compared to image datasets.
Data Sample Example: For FEMNIST, a data sample is a 28×28 grayscale image of a handwritten character (e.g., 'a', 'B', '3'). For CIFAR10, a data sample is a 32×32 RGB image of an object (e.g., a car, a bird, a frog). For Purchase, a data sample is a binary vector of length 600, where each element indicates the presence or absence of a certain feature (e.g., previous purchase history, demographics).
These datasets were chosen because they are widely recognized benchmarks in machine learning and can be adapted to realistic FL settings with non-IID data and a large number of clients, effectively validating the method's performance under practical conditions.
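For the Dirichlet-based non-IID partitioning described above, the following sketch shows one common way to split sample indices across clients; the helper name and the default concentration parameter `alpha` are illustrative, not the paper's exact settings.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=1.0, seed=0):
    """Split sample indices across clients so that each class is spread according to a
    Dirichlet(alpha) draw; smaller alpha gives more skewed (more non-IID) clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(labels.max() + 1):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))   # share of class c per client
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in zip(client_indices, np.split(idx, cut_points)):
            client.extend(part.tolist())
    return client_indices
```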
5.2. Evaluation Metrics
The primary metric used to evaluate the impact of poisoning attacks is the Attack Impact, denoted $I_\theta$.

- Conceptual Definition: $I_\theta$ quantifies the reduction in the global model's accuracy due to a poisoning attack. It directly measures how much the adversary succeeds in degrading the model's performance; a higher $I_\theta$ indicates a more effective attack.
- Mathematical Formula: $I_\theta = A_\theta - A_\theta^*$
- Symbol Explanation:
  - $I_\theta$: The attack impact, representing the reduction in accuracy.
  - $A_\theta$: The maximum accuracy that the global model achieves over all FL training rounds without any attack (i.e., in a benign setting). This serves as the baseline performance.
  - $A_\theta^*$: The maximum accuracy of the global model under the given attack. This reflects the model's performance when subjected to poisoning.

In addition to $I_\theta$, the paper also reports global model accuracy to show the baseline performance and the performance under attack. For example, when the paper states that an attack reduces the FEMNIST global model accuracy from its benign maximum to a lower value, those two accuracies correspond to $A_\theta$ and $A_\theta^*$, and their difference gives $I_\theta$.
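In code, the metric is a one-line helper (included only to pin down the direction of the subtraction):

```python
def attack_impact(benign_accuracies, attacked_accuracies):
    # I_theta = (max accuracy over rounds without attack) - (max accuracy under attack).
    return max(benign_accuracies) - max(attacked_accuracies)
```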
5.3. Baselines
The paper's method (novel DPAs and MPAs) is compared against several state-of-the-art existing poisoning attacks and evaluated across different aggregation rules (AGRs), both non-robust and robust.
- Existing Poisoning Attacks (as baselines for comparison):
  - Data Poisoning Attacks (DPAs):
    - Simple label flipping attacks [23]: Each compromised client flips labels in a static manner.
  - Model Poisoning Attacks (MPAs):
    - Little Is Enough (LIE) attack [5]: Adds small amounts of noise to benign updates.
    - Static Optimization (STAT-OPT) attack [23]: Computes a static malicious direction and scales it to bypass AGRs.
    - Dynamic Optimization (DYN-OPT) attack [55]: Perturbs benign updates in a dynamic, data-dependent malicious direction.
- Aggregation Rules (AGRs) (as defense baselines):
  - Non-robust:
    - Average [40]: The standard, naive aggregation rule, representing an unprotected FL system.
  - Robust:
    - Norm-bounding [58]: A simple, low-cost defense that scales down updates whose L2 norm exceeds a threshold.
    - Multi-krum [10]: A vector-wise filtering defense that selects the updates closest to each other.
    - Trimmed-mean [70]: A dimension-wise filtering defense that removes extreme values from each dimension.

These baselines are representative because they include both simple and sophisticated attacks, as well as the most common and theoretically robust defense mechanisms in the FL literature. Comparing against them allows the authors to show the relative effectiveness of their new attacks and to assess the true robustness of existing defenses under realistic conditions.
6. Results & Analysis
6.1. Core Results Analysis
The paper's results fundamentally challenge established beliefs about FL's vulnerability to poisoning attacks, especially under practical production settings. The core finding is that FL is significantly more robust than previous theoretical analyses suggested, even with simple defenses.
6.1.1. Evaluating Non-robust FL (Cross-device)
The paper first examines the Average AGR (Aggregation Rule), which is non-robust but widely used in practice. Previous works often claimed that even a single compromised client could prevent convergence with Average AGR.
However, the results show that for cross-device FL with realistic percentages of compromised clients, the attack impacts $I_\theta$ are very low and the global model still converges with high accuracy. For instance, on FEMNIST, MPAs at practical $M$ caused only a small accuracy drop, and DPAs likewise caused only a small drop; CIFAR10 and Purchase showed similarly low impacts.
This surprising robustness is attributed to the client sampling procedure in cross-device FL: only a very small fraction of all clients are selected in each round. Consequently, in many rounds, no compromised clients are chosen, limiting the overall attack impact.
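As a back-of-the-envelope illustration of this effect (with assumed, illustrative values: N = 1,000,000 total clients, n = 50 clients sampled per round, and M = 0.01%, i.e., 100 compromised devices), the probability that a given round contains any compromised client at all is tiny:

```python
from scipy.stats import hypergeom

N = 1_000_000        # total clients (assumed illustrative value)
n = 50               # clients sampled per round (assumed)
compromised = 100    # M = 0.01% of N

# Probability that a round samples at least one compromised client,
# under uniform sampling without replacement (hypergeometric distribution).
p_any = 1 - hypergeom.pmf(0, N, compromised, n)
print(f"P(round contains >= 1 compromised client) ~= {p_any:.4f}")   # about 0.005
```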
The following figure (Figure 4a from the original paper) illustrates the low attack impacts on Average AGR:
(The figure plots, for FEMNIST, CIFAR10, and Purchase, the attack impact (%) of several data and model poisoning attacks as the percentage of compromised clients M increases, comparing the Average, Norm-bound, Multi-krum, and Trimmed-mean aggregation rules.)

Alt text: Figure 4: (4a) Attack impacts of state-of-the-art data (DPA-DLF/SLF) and model (MPA) poisoning attacks on cross-device FL with Average AGR. Impacts are significantly low for practical percentages of compromised clients. (4b) Attack impacts of various poisoning attacks (Section IV) on robust AGRs. These AGRs are highly robust for practical $M$ values.
Takeaway V-A: Contrary to common belief, production cross-device FL with (the naive) Average AGR converges with high accuracy even in the presence of untargeted poisoning attacks, especially when the percentage of compromised clients is within practical ranges.
6.1.2. Evaluating Robust FL (Cross-device)
The paper further evaluates the robust AGRs (Norm-bounding, Multi-krum, Trimmed-mean) under practical values of $M$ for cross-device FL.
-
Takeaway V-B1: Highly Robust in Practice: For practical percentages of compromised clients ($M \leq 0.1\%$ for DPAs and $M \leq 0.01\%$ for MPAs), the attack impacts $I_\theta$ on robust AGRs are negligible. This suggests that these robust AGRs are more than sufficient to protect production cross-device FL. Even over 5,000 FL rounds, MPAs had minimal impact: Multi-krum and Trimmed-mean remained unaffected, while Norm-bounding showed less than a 5% accuracy reduction.

  The following figure (Figure 6 from the original paper) shows the long-term robustness of AGRs:

  Alt text: Figure 6: Even with a very large number of FL rounds (5,000), the state-of-the-art model poisoning attacks cannot break the robust AGRs (Section V-B).

- Takeaway V-B2: Simple Defenses are Sufficient: Norm-bounding, a simple and computationally efficient AGR (server cost $O(d)$), provides protection comparable to more sophisticated and computationally expensive AGRs such as Multi-krum ($O(dn^2)$) and Trimmed-mean ($O(dn\log n)$). This questions the necessity of complex, high-overhead defenses for production FL.

- Takeaway V-B3: Thorough Empirical Assessment is Necessary: The paper observes that even at the percentages of compromised clients for which robustness is theoretically claimed, robust AGRs do not always outperform simpler ones. For example, on FEMNIST at such high $M$, Trimmed-mean showed a higher $I_\theta$ than Norm-bounding. This highlights that theoretical guarantees alone may be insufficient, and that thorough empirical assessment with realistic attacks is crucial. Figure 4b (re-referencing Figure 4 above) also shows the attack impacts on robust AGRs, reinforcing these points.
6.1.3. Effect of FL Parameters on Poisoning (Cross-device)
The paper investigates how various FL parameters influence the effectiveness of poisoning attacks.
-
Takeaway V-C1: Limit on Local Poisoning Data Size ($|D_p|$): The impact $I_\theta$ of DPAs slightly increases with $|D_p|$. However, for all robust AGRs, DPAs have negligible impact when $M$ and $|D_p|$ are in practical ranges. This implies that enforcing a reasonable upper bound on $|D_p|$ (e.g., up to $100 \times |D|_{avg}$, per Table III) can act as a highly effective and simple defense against untargeted DPAs, without needing sophisticated robust AGRs.
The following figure (Figure 5 from the original paper) shows the effect of varying local poisoned dataset sizes:
(Multi-panel chart: attack impact (%) on CIFAR10 and FEMNIST under different defenses; impacts remain small under practical defenses and are dataset dependent.)

Alt text: Figure 5: Effect of varying sizes of the local poisoned dataset $D_p$ on the impacts of the best DPAs. When $M$ and $|D_p|$ are in practical ranges, attack impacts are negligible for robust AGRs and are dataset dependent for the non-robust Average AGR.
Takeaway V-C2: Effect of Average Dataset Size of Benign FL Clients ($|D|_{avg}$): Increasing $|D|_{avg}$ (e.g., by having fewer clients with more data each) generally increases the accuracy of the global models. For robust AGRs, cross-device FL remains highly robust to DPAs and MPAs even with moderately high $|D|_{avg}$. For Average AGR, MPAs can still be very effective if $|D|_{avg}$ is large and the task is difficult (e.g., CIFAR10). This suggests that lower-bounding client data sizes can improve robustness.

The following figures (Figures 9 and 11 from the original paper) illustrate the effect of varying $|D|_{avg}$:
(The figure shows, for FEMNIST and CIFAR10 under the Average and Norm-bound AGRs, the attack impact (%) of several attacks as the average local data size varies; the impact drops markedly under Norm-bound, while MPA remains strong under Average.)

Alt text: Figure 9: With compromised clients present, increasing $|D|_{avg}$ shows no clear pattern of effect on attack impacts, but it increases the global model accuracy, as shown in Figure 11. Figure 12 shows the corresponding plots of attack impacts and global model accuracy for the Multi-krum and Trimmed-mean AGRs.

(Four-panel chart: global model accuracy on FEMNIST and CIFAR10 with 1% compromised clients, for different defenses and average local data sizes.)

Alt text: Figure 11: Effect of the average local dataset size, $|D|_{avg}$, of the benign clients on the accuracy of the global models, with compromised clients present. As discussed in Section V-C2, increasing $|D|_{avg}$ increases the accuracy of the global models.
The following figure (Figure 12 from the original paper) shows similar observations for Multi-krum and Trimmed-mean AGRs:
(The figure shows, for FEMNIST and CIFAR10 under Multi-krum and Trimmed-mean, how the average local data size affects attack impacts (left y-axis) and global model accuracy (right y-axis), both ranging from 0 to 100.)

Alt text: Figure 12: We make observations similar to the Average and Norm-bound AGRs (Figures 9 and 11, Section V-C2) for Multi-krum and Trimmed-mean regarding the effect of $|D|_{avg}$ on the attack impacts (left y-axes, solid lines) and on the global model accuracy (right y-axes, dotted lines). All y-axes range from 0 to 100.
-
Takeaway V-C3: Number of Clients Selected Per Round ($n$): Varying $n$, the number of clients selected in each FL round, has no noticeable effect on attack impacts, except for MPAs on Average AGR. Increasing $n$ raises the chance of selecting compromised clients, which amplifies the MPA's effect on Average AGR because learning halts completely once a compromised client is selected; for robust AGRs, this effect is mitigated.

The following figure (Figure 10 from the original paper) shows the impact of varying the number of selected clients:

Alt text: Figure 10: As discussed in Section V-C3, the number of clients, $n$, chosen in each FL round has no noticeable effect on the attack impacts, with the exception of model poisoning on Average AGR. A fixed practical percentage of compromised clients is used.
Takeaway V-C4: Effect of Unknown Global Model Architecture on DPAs: DPAs (like DPA-DLF) that rely on training a surrogate model are less effective if the adversary does not know the true global model architecture and has to use a substitute. This limits the adversary's capability in nobox settings.

The following are the results from Table V of the original paper:

| Layer name | Layer size |
| --- | --- |
| Convolution + ReLU | 5 × 5 × 32 |
| Max pool | 2 × 2 |
| Convolution + ReLU | 5 × 5 × 64 |
| Max pool | 2 × 2 |
| Fully connected + ReLU | 1024 |
| Softmax | 62 |
The following figure (Figure 7 from the original paper) shows the impact of unknown architecture on DPA-DLF:
Alt text: Figure 7: As discussed in Section V-C4, impacts of the DPA-DLF attack from Section IV-B2 are reduced if the architectures of the surrogate and the global model are different.
6.1.4. Evaluating Robustness of Cross-silo FL
For cross-silo FL (fewer, larger clients like corporations), the paper argues that model poisoning attacks are impractical due to the high cost and risk of compromising professionally maintained corporate systems. Therefore, only data poisoning attacks are considered.
-
Takeaway V-D: Data Poisoning Ineffective: The results show that cross-silo FL is highly robust to state-of-the-art DPAs, even against the non-robust
Average AGR. This is because in cross-silo FL, each silo typically contains a very large amount of data from many users. Even if some users contribute poisoned data, the sheer volume of benign data within each silo, combined with the aggregation across many benign silos, mitigates the poisoning impact.

The following figure (Figure 8 from the original paper) shows the negligible impact of data poisoning attacks on cross-silo FL:
Alt text: Figure 8: All data poisoning attacks have negligible impacts on cross-silo FL, when compromised clients are concentrated in a few silos or distributed uniformly across silos (Section V-D).
6.2. Data Presentation (Tables)
The paper includes several tables to present its findings and context. Table I, Table III, and Table V have been transcribed and presented in the Methodology and Experimental Setup sections respectively, as per the guidelines.
6.3. Ablation Studies / Parameter Analysis
While not explicitly termed "ablation studies," the paper conducts extensive parameter analysis by varying key FL and attack parameters to understand their impact on robustness. This effectively serves the purpose of an ablation study, as it isolates the effects of different components and conditions.
- Percentage of Compromised Clients (M): This is the most crucial parameter varied. The results consistently show that for practical values of M (considered separately for DPAs and MPAs), attack impacts are minimal, validating the paper's core thesis.
- Size of Local Poisoning Data: This parameter is varied for DPAs to find the poisoning dataset size that maximizes attack impact and to evaluate Takeaway V-C1 regarding data size limits as a defense.
- Average Size of Benign Clients' Data: This is varied to understand its relationship with robustness, especially across different AGRs and tasks, contributing to Takeaway V-C2.
- Number of Clients Selected Per Round: Its effect on attack impact is analyzed, leading to Takeaway V-C3.
- Global Model Architecture Knowledge: The impact of an adversary using a substitute architecture for DPA-DLF is analyzed, demonstrating reduced effectiveness when this knowledge is imperfect (Takeaway V-C4).
- FL Deployment Type (Cross-device vs. Cross-silo): A distinction is made between these two types, with separate analyses for their respective practical threat models and robustness.
- Attack Type (DPA vs. MPA): Both data and model poisoning attacks are developed and evaluated, with DPAs generally being less impactful than MPAs at the same percentage of compromised clients.
- Aggregation Rule (AGR): The effectiveness of attacks is systematically tested against the Average, Norm-bounding, Multi-krum, and Trimmed-mean AGRs, allowing for direct comparison of defense efficacy; a minimal sketch of three of these rules follows this list.

This comprehensive parameter sweep allows the authors to draw strong conclusions about the conditions under which FL is robust and which defenses are most effective in practice.
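As a concrete illustration of the aggregation rules being compared, below is a minimal NumPy sketch (an illustrative approximation, not the paper's implementation) of plain averaging, Norm-bounding (each client update is scaled so its L2 norm does not exceed a threshold before averaging), and coordinate-wise Trimmed-mean (the k largest and k smallest values per coordinate are discarded before averaging). The toy threshold, trim parameter, and update values are assumptions chosen purely for demonstration.

```python
import numpy as np

def average(updates: np.ndarray) -> np.ndarray:
    """Plain (non-robust) Average AGR: coordinate-wise mean of client updates."""
    return updates.mean(axis=0)

def norm_bounding(updates: np.ndarray, threshold: float) -> np.ndarray:
    """Norm-bounding AGR sketch: scale each update so its L2 norm <= threshold, then average."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    scale = np.minimum(1.0, threshold / np.maximum(norms, 1e-12))
    return (updates * scale).mean(axis=0)

def trimmed_mean(updates: np.ndarray, k: int) -> np.ndarray:
    """Coordinate-wise Trimmed-mean AGR sketch: drop the k smallest and k largest values per coordinate."""
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[k: updates.shape[0] - k].mean(axis=0)

# Toy example: 9 benign updates near zero plus 1 large malicious update.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 0.1, size=(9, 5))
malicious = np.full((1, 5), 10.0)
updates = np.vstack([benign, malicious])

print("Average      :", average(updates))
print("Norm-bounding:", norm_bounding(updates, threshold=1.0))
print("Trimmed-mean :", trimmed_mean(updates, k=1))
```

In this toy example the single large malicious update dominates the plain average but is heavily attenuated by both Norm-bounding and Trimmed-mean, which matches the intuition behind the paper's finding that even simple clipping-style defenses blunt untargeted poisoning.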
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper, "Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning," delivers a pivotal re-evaluation of FL robustness. Its primary contribution is a comprehensive systemization of FL poisoning threat models, rigorously filtering them to identify only two truly practical scenarios for production FL: nobox offline data poisoning and whitebox online model poisoning. Through this lens, the authors developed novel, state-of-the-art data and model poisoning attacks and conducted extensive experiments under realistic production FL conditions.
The key findings are surprising and contradict many established beliefs in the literature:
- High Intrinsic Robustness: Production cross-device FL, even with the naive Average AGR, demonstrates high robustness and converges with good accuracy, primarily due to the client sampling process.
- Simple Defenses Suffice: Basic, low-cost defenses like Norm-bounding provide protection comparable to more complex and computationally expensive robust AGRs, challenging the need for overly sophisticated solutions.
- Data Limits as Effective Defense: Enforcing reasonable limits on the size of local poisoning data is a simple yet highly effective defense against data poisoning.
- Cross-silo Robustness: Production cross-silo FL is largely immune to data poisoning attacks, and model poisoning attacks are deemed impractical due to the high costs of compromise.
In essence, the paper concludes that FL is significantly more robust in real-world deployments than previously thought, urging the research community to base future work on more realistic threat models and practical parameters.
7.2. Limitations & Future Work
The authors themselves point to a specific area for future work:
- Theoretical Robustness Guarantees for Production FL: While their empirical analysis shows high robustness in practice, the paper suggests that obtaining concrete theoretical robustness guarantees for existing defenses in production FL settings (especially where only a very small fraction of clients is randomly selected per round) remains an open problem. This would bridge the gap between empirical observations and formal understanding.

Potential unstated limitations might include:
- Scope of Attacks: The study exclusively focuses on untargeted poisoning attacks. Its conclusions do not directly extend to targeted or backdoor attacks, which have different objectives and may require different defenses.
- Definition of "Practical": While the paper makes a strong case for its definition of "practical" percentages of compromised clients and other parameters based on industry discussions, these definitions might evolve or be context-dependent.
- Homogeneity of Compromised Clients: The model poisoning attacks assume that all compromised clients submit identical poisoned updates. In reality, coordinating identical updates across a botnet might be challenging or detectable.
- Adversary Resources: The paper implicitly assumes that adversaries, even with whitebox capabilities, are still bound by computational and communication costs, which might not always be true for nation-state actors or highly resourced organizations.
7.3. Personal Insights & Critique
This paper offers several valuable insights and prompts critical reflection:
- Paradigm Shift in FL Robustness Research: The most impactful aspect is the call to re-evaluate the research agenda for FL robustness. By rigorously demonstrating that many current assumptions about FL's vulnerability are based on unrealistic adversarial models, it pushes the community towards more practically relevant problems. This "back to the drawing board" approach is crucial for translating academic insights into real-world secure systems.
- Value of Simplicity in Defenses: The finding that simple, low-cost defenses like Norm-bounding can be as effective as complex, resource-intensive ones for untargeted attacks is highly valuable. In resource-constrained environments like mobile devices, simpler solutions are often preferable due to lower computational overhead, energy consumption, and easier deployment. This insight could guide practitioners toward more efficient defense strategies.
- Importance of Systemization: The detailed systemization of threat models (objective, knowledge, capability) is excellent. It provides a clear, structured way to think about adversarial scenarios, which can be applied beyond just poisoning attacks to other adversarial FL problems.
- Empirical Rigor: The extensive experimental validation across diverse datasets and parameters, explicitly contrasting with theoretical claims, strengthens the paper's arguments. This level of empirical detail is essential for grounding theoretical discussions in practical realities.
Critique/Areas for Improvement:
- The "Practicality" Thresholds: While the paper provides strong arguments for its "practical" percentages of compromised clients for DPAs and MPAs, these are based on discussions with FL experts (presumably from Google) and cost estimates for creating botnets. It would be beneficial to see further independent validation or a more nuanced discussion of how these thresholds might vary across different application domains or with evolving adversarial tactics.
- Generalization to Other Attack Types: The strong conclusions about FL's robustness are specific to untargeted poisoning. It's important for readers to remember that these findings do not necessarily extend to targeted attacks or backdoor attacks, which might still pose significant threats requiring sophisticated defenses. Future work, as the authors suggest, could systematically apply this rigorous framework to those attack types as well.
- Defense Mechanism Details: While the paper references existing defense mechanisms, a deeper dive into why simple defenses like Norm-bounding are so effective under these specific practical conditions (e.g., how client sampling perfectly complements norm-bounding) would further strengthen the theoretical intuition behind the empirical findings. The call for theoretical guarantees in future work hints at this.

Overall, this paper is a highly influential piece of work that encourages a more pragmatic and realistic approach to securing Federated Learning, offering concrete guidelines that can steer future research and deployment practices.