
A generalized e-value feature detection method with FDR control at multiple resolutions

Published: 09/25/2024

TL;DR Summary

This paper introduces the Stabilized Flexible e-Filter Procedure (SFEFP) for detecting significant features and groups across multiple resolutions while controlling the false discovery rate (FDR). SFEFP outperforms existing methods by flexibly integrating diverse base detection procedures and stabilizing their results across replications, improving power while preserving multilayer FDR control.

Abstract

Multiple resolutions arise across a range of explanatory features due to domain-specific structures, leading to the formation of feature groups. Simultaneously detecting significant features and groups associated with a specific response under false discovery rate (FDR) control is therefore a crucial problem, arising for example in spatial genome-wide association studies. Nevertheless, existing methods such as the multilayer knockoff filter (MKF) generally require a uniform detection approach across resolutions to achieve multilayer FDR control, which can be underpowered or even inapplicable in several settings. To fix this issue effectively, this article develops a novel method, the stabilized flexible e-filter procedure (SFEFP), by constructing unified generalized e-values, developing a generalized e-filter, and adopting a stabilization treatment. This method flexibly incorporates a wide variety of base detection procedures that operate effectively across different resolutions to provide stable and consistent results, while controlling the FDR at multiple resolutions simultaneously. Furthermore, we investigate the statistical theory of SFEFP, encompassing multilayer FDR control and a stability guarantee. We develop several examples of SFEFP, such as the eDS-filter and the eDS+gKF-filter. Simulation studies demonstrate that the eDS-filter effectively controls FDR at multiple resolutions while matching or exceeding the power of MKF. The superiority of the eDS-filter is also demonstrated through the analysis of HIV mutation data.

In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

A generalized e-value feature detection method with FDR control at multiple resolutions

1.2. Authors

  • Chengyao Yu $^{1,3,4}$
  • Ruixing Ming $^{1,\mathsf{b}}$
  • Min Xiao $^{1,\mathrm{c}}$
  • Zhanfeng Wang $^{2,\mathrm{d}}$
  • Bingyi Jing $^{3,\mathrm{e}}$

Affiliations:

  • $^1$ School of Statistics and Mathematics, Zhejiang Gongshang University
  • $^2$ School of Management, University of Science and Technology of China
  • $^3$ Department of Statistics and Data Science, Southern University of Science and Technology
  • $^4$ Emails provided indicate affiliations with Zhejiang Gongshang University and Southern University of Science and Technology.

1.3. Journal/Conference

This paper is published as a preprint on arXiv. Venue Reputation: arXiv is a popular open-access repository for preprints of scientific papers in various fields, including mathematics, physics, computer science, and quantitative biology. It allows researchers to disseminate their work quickly before formal peer review and publication, making it a highly influential platform for early research sharing.

1.4. Publication Year

Published at (UTC): 2024-09-25T15:46:46.000Z, implying a publication year of 2024.

1.5. Abstract

The paper addresses the crucial issue of simultaneously detecting significant features and groups (feature groups formed due to domain-specific multi-resolution structures) with false discovery rate (FDR) control. Existing methods, such as the multilayer knockoff filter (MKF), often require a uniform detection approach across resolutions for multilayer FDR control, which can be less powerful or inapplicable in certain scenarios. To overcome this, the authors propose a novel method called the stabilized flexible e-filter procedure (SFEFP). SFEFP involves constructing unified generalized e-values, developing a generalized e-filter, and applying a stabilization treatment. This method offers flexibility by incorporating diverse base detection procedures that operate effectively at different resolutions, aiming to provide stable, consistent, and powerful results while simultaneously controlling the FDR at multiple resolutions. The paper also provides theoretical guarantees for SFEFP, including multilayer FDR control and stability. Practical examples, such as the eDS-filter and eDS+gKF-filter, are developed. Simulation studies demonstrate that the eDS-filter effectively controls FDR at multiple resolutions while matching or exceeding the power of MKF. The efficacy of the eDS-filter is further validated through analysis of HIV mutation data.

2. Executive Summary

2.1. Background & Motivation

Core Problem

The paper addresses the challenge of identifying relevant features and groups of features that influence a specific response variable, particularly when these features exhibit multi-resolution structures (i.e., they can be grouped in different meaningful ways). The core problem is to achieve this detection while rigorously controlling the false discovery rate (FDR) simultaneously across all these resolutions (individual features and different levels of feature groups).

Importance and Challenges

  • Domain-Specific Structures: Many scientific domains naturally present data with multi-resolution structures. For example, in genome-wide association studies (GWAS), one might be interested in individual SNPs (single nucleotide polymorphisms) and the genes that harbor them.
  • Scientific Significance: Discoveries across different resolutions are often of substantial interest to domain scientists, as identifying an important feature also implies the significance of the group it belongs to, and vice-versa.
  • Reproducibility and False Discoveries: Ensuring reproducibility of scientific findings necessitates rigorous control of false discoveries. FDR is a suitable statistical measure for this in large-scale multiple testing.
  • Limitations of Existing Methods:
    • Uniform Approach: Current methods like the p-filter or multilayer knockoff filter (MKF) often demand a uniform detection strategy across all resolutions. This uniformity can lead to sub-optimal performance: the shared strategy may be underpowered (failing to detect true signals) or simply inapplicable in some settings.
    • P-value Challenges: p-value based methods (like p-filter) struggle with constructing valid p-values in high-dimensional settings and can lose power when handling dependencies by reshaping p-values.
    • Knockoff Limitations: MKF, while offering multilayer FDR control, can be overly conservative due to decoupling dependencies among cross-layer knockoff statistics. When features are highly correlated, knockoff-based procedures can suffer severe power loss. Additionally, model-X knockoff methods often require estimating the joint distribution of features, which is computationally intractable in many scenarios.
    • "One-bit" Problem: Existing e-filter methods, specifically e-MKF (which leverages one-bit knockoff e-values), can suffer from a "zero-power dilemma" where, if there are conflicts in discoveries across layers, the method might declare all features unimportant, leading to no selections.

Paper's Innovative Idea

The paper's entry point is to fix the limitations of existing multi-resolution FDR control methods by introducing a flexible and stabilized framework that doesn't rely on a single, uniform detection approach. It proposes to leverage e-values (an alternative to p-values that is simpler to construct and integrate) and allow for the integration of various state-of-the-art detection techniques at different resolutions. The key innovation is to introduce "generalized e-values" that can be derived from any FDR-controlled procedure, combine them through a "generalized e-filter," and apply a "stabilization treatment" to overcome the "one-bit" problem and enhance power.

2.2. Main Contributions / Findings

The paper makes several significant contributions:

  • Generalized E-Filter and Unified Generalized E-Values:
    • Develops a novel generalized e-filter procedure.
    • Proposes a unified construction of generalized e-values (including relaxed e-values, asymptotic e-values, and asymptotic relaxed e-values). This construction is more generic than prior e-value methods, allowing integration of a broad class of FDR-controlled procedures (beyond just p-values or knockoffs) as base detection methods for each layer. This flexibility enables users to choose the most suitable technique for each resolution.
  • Stabilized Flexible E-Filter Procedure (SFEFP):
    • Introduces SFEFP which merges these generalized e-values with a stabilization treatment. SFEFP overcomes the "zero-power dilemma" of previous e-filter methods (like e-MKF with one-bit e-values) by producing non-binary generalized e-values that better reflect the ranking of feature/group importance. This significantly improves detection power and stability.
  • Theoretical Guarantees:
    • Investigates the statistical theories of SFEFP, establishing rigorous multilayer FDR control guarantees.
    • Provides a stability guarantee for SFEFP under finite samples, demonstrating that as the number of replications increases, the selection set converges almost surely to a fixed set.
  • Practical Examples and Enhanced Performance:
    • Develops concrete examples of SFEFP instantiations:

      • eDS-filter: Extends the DS (Data Splitting) method to multiple resolutions, offering a powerful alternative to MKF and e-MKF in highly correlated settings.
      • eDS+gKF-filter: A hybrid method combining DS for individual features and group knockoff for feature groups, suitable for scenarios with varying correlation structures.
    • Simulation studies demonstrate that the eDS-filter effectively controls FDR at multiple resolutions while maintaining or enhancing power compared to MKF.

    Analysis of HIV mutation data further confirms the superiority of the eDS-filter over MKF+ and e-MKF in real-world applications, achieving higher power with robust FDR control.

      In summary, the paper offers a flexible, robust, and powerful framework for feature detection with FDR control at multiple resolutions, addressing critical shortcomings of existing approaches and opening avenues for incorporating diverse, state-of-the-art detection techniques.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a reader needs familiarity with concepts in multiple hypothesis testing, particularly regarding false discovery rate (FDR) control, and the idea of e-values as an alternative to p-values.

Multiple Hypothesis Testing

In many scientific studies, researchers test numerous hypotheses simultaneously. For example, a genome-wide association study (GWAS) might test millions of single nucleotide polymorphisms (SNPs) for association with a disease. When performing many tests, the probability of obtaining false positives (incorrectly rejecting a true null hypothesis) increases dramatically.

Null Hypothesis ($H_0$) and Alternative Hypothesis ($H_1$)

  • Null Hypothesis ($H_0$): A statement that there is no effect or no relationship between variables. For example, $H_j: Y \perp\!\!\!\perp X_j \mid X_{-j}$ means that feature $X_j$ is irrelevant to the response $Y$ given all other features.
  • Alternative Hypothesis ($H_1$): A statement that contradicts the null hypothesis, suggesting an effect or relationship exists. If $H_j$ is not true, then $X_j$ is considered relevant.

Type I Error and False Discovery Rate (FDR)

  • Type I Error (False Positive): Rejecting a true null hypothesis. In multiple testing, controlling the family-wise error rate (FWER) (the probability of making at least one Type I error) can be too conservative, leading to low power (failing to detect true effects).
  • False Discovery Rate (FDR) [7]: A less stringent error rate that is more powerful for large-scale multiple testing. FDR is defined as the expected proportion of false discoveries among all discoveries. If $V$ is the number of false discoveries (true nulls rejected) and $R$ is the total number of rejections, then $\mathrm{FDR} = \mathbb{E}[V/R]$ (where $V/R$ is taken as 0 if $R = 0$). Controlling FDR at a level $\alpha$ means that, on average, no more than an $\alpha$ proportion of rejections will be false positives. This paper focuses on FDR control.

e-values

An e-value is a non-negative random variable $E$ that serves as an alternative to p-values for hypothesis testing.

  • Definition: An e-value $e$ for a null hypothesis $H_0$ is a test statistic such that its expectation under the null hypothesis is at most 1, i.e., $\mathbb{E}_{H_0}[E] \leq 1$.
  • Rejection Rule: For a given significance level $\alpha \in (0, 1)$, $H_0$ is rejected if $e \geq 1/\alpha$.
  • Type I Error Control: This rejection rule controls the Type I error rate: by Markov's inequality, $\mathbb{P}_{H_0}(E \geq 1/\alpha) \leq \alpha\, \mathbb{E}_{H_0}[E] \leq \alpha$.
  • Advantages over p-values: e-values are often simpler to combine and interpret than p-values, especially when merging evidence from different sources or procedures. They can directly quantify evidence for the alternative hypothesis.
  • e-BH Procedure [44]: An analog of the classic Benjamini-Hochberg (BH) procedure for e-values. Given a set of e-values $\{e_j\}$ for hypotheses $\{H_j\}$, the e-BH procedure sorts them in descending order and rejects hypotheses based on a dynamic threshold to control FDR.

Knockoff Filter [2, 10, 13]

The knockoff filter is a method for FDR-controlled variable selection, particularly useful in high-dimensional settings (where the number of features $N$ exceeds the number of samples $n$).

  • Core Idea: It creates synthetic "knockoff" features $\tilde{X}_j$ for each original feature $X_j$. These knockoff features are designed to mimic the correlation structure of the original features but are conditionally independent of the response variable given the original features.
  • Symmetry Property: For each feature $X_j$, a test statistic $W_j$ is computed based on both $X_j$ and $\tilde{X}_j$. The key property is that for null features (those truly unrelated to the response), the distribution of $W_j$ is symmetric around zero. For relevant features, $W_j$ tends to be large and positive.
  • FDR Control: By comparing the $W_j$ statistics of original features with their knockoff counterparts, the knockoff filter can estimate the number of false positives and control the FDR without assuming a specific model or distribution for the true effects.
  • Model-X Knockoffs [10]: An extension where knockoff features are generated conditional on $X$, making the procedure valid for any conditional distribution of $Y \mid X$.
  • Group Knockoffs [13]: Extends the knockoff idea to groups of features, allowing for FDR control at the group level.

Multi-resolution Structures

This refers to the scenario where features can be organized into different hierarchical or overlapping groups based on domain knowledge. For example:

  • Individual Features: The most granular level, e.g., individual SNPs.
  • Groups/Layers: Higher levels of organization, e.g., SNPs grouped into genes, or genes grouped into pathways. A feature $X_j$ might belong to a group $\mathcal{A}_g^{(m)}$ at layer $m$. The paper aims to simultaneously control FDR at both the individual feature level and across all specified group levels.
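To make the layer structure concrete, here is a minimal sketch (in Python, with a hypothetical two-layer partition; all names are ours) of how the partitions $\{\mathcal{A}_g^{(m)}\}$ and the group-membership map $h(m, j)$ might be represented:

```python
# Hypothetical multi-resolution structure: 6 features (e.g., SNPs),
# layer 1 = individual features, layer 2 = coarser groups (e.g., genes).
partitions = {
    1: [[0], [1], [2], [3], [4], [5]],  # layer 1: singleton groups
    2: [[0, 1, 2], [3, 4], [5]],        # layer 2: features grouped into "genes"
}

def h(m, j):
    """Return the index of the group containing feature j at layer m."""
    for g, group in enumerate(partitions[m]):
        if j in group:
            return g
    raise ValueError(f"feature {j} belongs to no group at layer {m}")

print(h(2, 4))  # -> 1: feature 4 lies in the second layer-2 group
```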

Data Splitting

Data splitting is a statistical technique where the available dataset is randomly partitioned into two or more subsets.

  • Purpose in FDR control: In methods like DS (Data Splitting) [15], data splitting can be used to generate independent estimates of feature importance (e.g., regression coefficients $\beta$) or test statistics. By splitting the data, two independent sets of coefficients $(\widehat{\boldsymbol{\beta}}^{(1)}, \widehat{\boldsymbol{\beta}}^{(2)})$ can be obtained. This independence is crucial for constructing test statistics with desired symmetry properties under the null hypothesis, which in turn facilitates FDR control.
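A minimal sketch of the splitting step (plain OLS on each half for simplicity; the paper's DS method uses Lasso+OLS, which this toy version omits):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 10
X = rng.standard_normal((n, N))
y = 2.0 * X[:, 0] + rng.standard_normal(n)  # only feature 0 is relevant

# Randomly split the rows into two disjoint halves.
perm = rng.permutation(n)
idx1, idx2 = perm[: n // 2], perm[n // 2:]

# Independent coefficient estimates from the two halves.
beta1, *_ = np.linalg.lstsq(X[idx1], y[idx1], rcond=None)
beta2, *_ = np.linalg.lstsq(X[idx2], y[idx2], rcond=None)
```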

3.2. Previous Works

The paper builds upon and differentiates itself from several established methods for FDR control, especially in multi-resolution settings.

  • P-filter [5, 30]:

    • Concept: A method for multilayer FDR control based on p-values. It extends BH procedures to grouped hypotheses, allowing for FDR control at both individual feature and group levels. It can also incorporate prior knowledge [30].
    • Limitation (addressed by this paper): Constructing valid p-values in high-dimensional settings can be challenging. Handling dependencies by reshaping p-values can lead to reduced power. The p-filter is designed for p-values, whereas this paper leverages e-values.
  • Multilayer Knockoff Filter (MKF) [23]:

    • Concept: Integrates the knockoff framework for FDR control at multiple resolutions. It uses knockoff statistics to identify important features and groups while controlling FDR across all layers.
    • Limitation (addressed by this paper): MKF can be conservative due to decoupling dependencies among cross-layer knockoff statistics. When features are highly correlated, knockoff-based procedures often suffer severe power loss. Also, model-X knockoff (a common knockoff variant) requires estimating complex joint distributions, which is often intractable. The paper's SFEFP aims to be more powerful and applicable in such correlated and high-dimensional settings.
  • e-BH Procedure [44]:

    • Concept: An analog of the Benjamini-Hochberg procedure adapted for e-values. It provides FDR control for multiple hypothesis testing using e-values.
    • Relation to this paper: This paper extends the e-BH principle to a generalized e-filter for multi-resolution settings, building on the foundation of e-values.
  • Derandomizing Knockoff Procedure [33] and e-MKF [18]:

    • Derandomizing Knockoffs [33]: Proposed to address the randomness inherent in knockoff procedures (e.g., from knockoff generation or data splitting). It involves running the knockoff procedure multiple times and averaging the results, often using one-bit knockoff e-values.
    • e-MKF [18]: Extends MKF by using e-values (specifically, one-bit knockoff e-values) in an e-filter framework to enhance power and guarantee multilayer FDR control.
    • Limitation (addressed by this paper): The paper reveals that e-MKF (and generally e-filters with one-bit e-values) can suffer from a "zero-power dilemma." If conflicting signals arise across layers, the method might select nothing. The stabilization treatment in SFEFP explicitly addresses this by generating non-binary generalized e-values that offer a more nuanced ranking of importance, preventing all-or-nothing outcomes. The paper states that derandomization for single resolution (as in [33]) often comes with a power cost, while SFEFP aims for enhanced power in multi-resolution settings.
  • DS (Data Splitting) and MDS (Multiple Data Splitting) [15]:

    • Concept: Feature detection methods designed for high-dimensional regression models, particularly powerful when features are highly correlated. They use data splitting to generate independent estimates and control FDR asymptotically.
    • Relation to this paper: The paper extends the DS method to group detection and integrates it into SFEFP as the eDS-filter, demonstrating the flexibility of SFEFP to incorporate powerful base methods. The eDS-filter is also presented as an alternative to MDS.
  • Gaussian Mirror (GM) method [46] and Symmetry-based Adaptive Selection (SAS) framework [45]:

    • Concept: GM uses "Gaussian mirror" statistics for FDR control. SAS uses two-dimensional statistics and their symmetry properties to define rejection regions. Both are powerful alternatives to knockoff methods.
    • Relation to this paper: The paper explicitly shows that these methods satisfy Definition 1 (framework for controlled detection procedures) and thus can be integrated into SFEFP by constructing corresponding generalized e-values.

3.3. Technological Evolution

The field of multiple hypothesis testing has evolved from strictly controlling Type I error (like FWER control methods) to controlling FDR (starting with the Benjamini-Hochberg procedure [7]), which offers a better balance between error control and statistical power in large-scale settings. This evolution continued with the development of more sophisticated FDR control methods that account for dependencies among hypotheses, prior knowledge, or specific data structures:

  1. Early FDR Control (1990s-early 2000s): Initial BH procedure, extensions for dependent p-values [8], and concepts like local FDR [17].
  2. Model-Based and Structure-Aware Methods (2010s):
    • Knockoff filters [2, 10] emerged to provide FDR control for variable selection in complex, high-dimensional models without strong distributional assumptions, especially for Model-X settings.
    • P-filters [5, 30] introduced FDR control for grouped and hierarchical hypotheses using p-values, and integrated prior knowledge.
    • Multilayer Knockoff Filter (MKF) [23] combined knockoffs with multi-resolution structures.
  3. Alternative Test Statistics & Flexibility (late 2010s-present):
    • The concept of e-values gained traction as a robust and easily aggregatable alternative to p-values, leading to e-BH [44] and e-filter methods [18].

    • Methods like DS [15], GM [46], and SAS [45] developed powerful FDR control procedures for specific challenges like high correlations or two-dimensional statistics.

    • Derandomization techniques [33, 31] were introduced to address the instability of stochastic FDR procedures.

      This paper's work fits into the most recent phase of this evolution, aiming to unify these diverse advancements. Instead of proposing yet another specific FDR control method, it offers a meta-framework that can flexibly combine existing state-of-the-art methods tailored to different resolutions or data characteristics.

3.4. Differentiation Analysis

The SFEFP method significantly differentiates itself from previous work through its emphasis on flexibility, stabilization, and enhanced power in multi-resolution FDR control:

  • Flexibility in Base Procedures:

    • Previous: MKF and e-MKF are tied to the knockoff framework. P-filter is tied to p-values. These methods generally require a uniform approach across resolutions.
    • SFEFP: The core innovation is the generalized e-value and generalized e-filter. This allows SFEFP to directly integrate any FDR-controlled detection procedure (e.g., knockoffs, DS, GM, SAS, or p-value based methods) as a base procedure for any given layer. This means researchers can select the optimal method for each specific resolution based on its characteristics (e.g., high correlation, sparsity, prior knowledge).
  • Addressing the "One-bit" Dilemma and Enhanced Power:

    • Previous: e-MKF (and generally e-filters with one-bit e-values) can suffer from a "zero-power dilemma." If conflicting signals lead to certain all-or-nothing e-values at different layers, the final filter might select no features. Also, derandomization for single resolutions (e.g., Ren et al. [33]) often trades power for stability.
    • SFEFP: The stabilization treatment is crucial. By averaging generalized e-values from multiple runs (replications), SFEFP generates non-binary generalized e-values. These averaged e-values provide a more nuanced ranking of feature/group importance, better reconciling discrepancies across layers. This leads to significantly enhanced detection power and stability, avoiding the "zero-power" issue and improving upon the power trade-offs of prior derandomization schemes.
  • Generality of E-value Construction:

    • Previous: Some e-value constructions (e.g., [1, 22, 25, 31]) are either less powerful, rely on specific assumptions (like mutual independence of null p-values), or are tied to the knockoff framework.
    • SFEFP: The proposed construction of generalized e-values (Equation 3) is more generic and powerful. It can be derived from any detection procedure that satisfies Definition 1 (i.e., controls FDR under finite or asymptotic settings). This includes methods like DS, GM, and SAS, which go beyond the scope of prior e-value constructions.
  • Multilayer FDR Control with Flexible Procedures:

    • Previous: While MKF and e-MKF offer multilayer FDR control, their conservatism and limitations with high correlations or intractable joint distributions make them less ideal in some scenarios.

    • SFEFP: Provides robust multilayer FDR control while allowing for the incorporation of highly powerful, specialized base procedures at each layer. This means SFEFP can achieve FDR control without sacrificing power in complex settings. For instance, eDS-filter demonstrates superior performance in highly correlated feature settings compared to MKF-based methods.

      In essence, SFEFP shifts the paradigm from finding a single "best" FDR control method for all resolutions to providing a flexible framework that effectively integrates the strengths of various specialized methods, thereby achieving superior and more stable detection performance across diverse multi-resolution data.

4. Methodology

4.1. Principles

The core idea behind the Stabilized Flexible E-Filter Procedure (SFEFP) is to provide a flexible and powerful framework for feature detection with simultaneous FDR control across multiple resolutions. The foundational principles are:

  1. Multi-Resolution Problem Formulation: Explicitly defining individual features and feature groups at multiple "layers" as hypotheses to be tested, with the goal of controlling FDR at each layer.
  2. Generalized E-Values for Any Detection Procedure: Instead of relying on p-values or specific knockoff statistics, the method introduces a unified way to construct "generalized e-values" from any base detection procedure that controls FDR. This allows for flexibility in choosing state-of-the-art methods tailored to specific data characteristics or resolutions.
  3. Generalized E-Filter for Multi-Layer Selection: A novel e-filter that operates on these generalized e-values to identify a coherent set of significant features and groups across all specified layers, while ensuring FDR control at each.
  4. Stabilization to Enhance Power and Resolve "One-bit" Dilemma: A crucial step involves running the base detection procedures multiple times (replications) and averaging the resulting generalized e-values. This "stabilization" transforms potentially "one-bit" (binary) e-values into continuous, nuanced scores, providing better ranking information. This helps overcome the "zero-power dilemma" (where conflicting binary signals across layers lead to no discoveries) and generally enhances detection power and stability.
  5. Theoretical Guarantees: Rigorous statistical theory underpins SFEFP, demonstrating its ability to control FDR at multiple resolutions and guaranteeing stability of its selection set.

4.2. Core Methodology In-depth (Layer by Layer)

The methodology is developed in stages, starting with problem setup, introducing the Flexible E-Filter Procedure (FEFP), and then extending it to the Stabilized Flexible E-Filter Procedure (SFEFP).

4.2.1. Problem Setup

The paper considers a response variable $Y$ and a set of $N$ features $(X_1, \ldots, X_N)$. For $n$ i.i.d. samples, we have data $(\boldsymbol{y}, \boldsymbol{X})$, where $\boldsymbol{y} \in \mathbb{R}^n$ is the response vector and $\boldsymbol{X} \in \mathbb{R}^{n \times N}$ is the design matrix.

Individual Feature Hypotheses: The feature detection problem is formalized as $N$ multiple hypothesis tests: $H_j: Y \perp\!\!\!\perp X_j \mid X_{-j}$, $j \in [N]$, where $X_{-j} = \{X_1, \ldots, X_N\} \setminus \{X_j\}$ and $[N] = \{1, \ldots, N\}$. The null hypothesis $H_j$ states that $X_j$ provides no additional information about $Y$ given all other features. If $H_j$ is false, $X_j$ is a relevant feature.

  • $\mathcal{H}_1 = \{j : H_j \text{ is not true}\}$ is the set of relevant features.

  • $\mathcal{H}_0 = [N] \setminus \mathcal{H}_1$ is the set of irrelevant features.

    Group Hypotheses at Multiple Resolutions: The features are interpreted at $M$ different resolutions. For each layer $m \in [M]$, the features are partitioned into $G^{(m)}$ groups, denoted by $\{\mathcal{A}_g^{(m)}\}_{g \in [G^{(m)}]}$.

  • $h(m, j)$ denotes the group that feature $X_j$ belongs to at the $m$-th layer.

  • Group detection at layer $m$ is defined as $H_g^{(m)}: Y \perp\!\!\!\perp X_{\mathcal{A}_g^{(m)}} \mid X_{-\mathcal{A}_g^{(m)}}$, $g \in [G^{(m)}]$, where $X_{-\mathcal{A}_g^{(m)}} = \{X_j : j \in [N] \setminus \mathcal{A}_g^{(m)}\}$. This null hypothesis states that the entire group of features $\mathcal{A}_g^{(m)}$ is irrelevant given all other features outside this group.

  • The relationship between individual and group hypotheses is assumed to be $H_g^{(m)} = \bigwedge_{j \in \mathcal{A}_g^{(m)}} H_j$, $g \in [G^{(m)}]$. This means a group null hypothesis is true if and only if all individual feature null hypotheses within that group are true.

  • The set of null groups at layer $m$ is $\mathcal{H}_0^{(m)} = \{g \in [G^{(m)}] : \mathcal{A}_g^{(m)} \subset \mathcal{H}_0\}$. A group is a null group if all features within it are irrelevant.

False Discovery Rate (FDR) at Multiple Resolutions: Given a selected feature set $\mathcal{S} \subset [N]$, the set of selected groups in layer $m$ is $\mathcal{S}^{(m)} = \{g \in [G^{(m)}] : \mathcal{A}_g^{(m)} \cap \mathcal{S} \neq \emptyset\}$. The FDR at the $m$-th layer is defined as: $ \mathrm{FDR}^{(m)} = \mathbb{E}[\mathrm{FDP}^{(m)}], \quad \mathrm{FDP}^{(m)} = \frac{\big| \mathcal{S}^{(m)} \cap \mathcal{H}_0^{(m)} \big|}{\big| \mathcal{S}^{(m)} \big| \vee 1}, $ where $|\cdot|$ measures the size of a set. The goal is to find the largest set $\mathcal{S}$ such that $\mathrm{FDR}^{(m)}$ remains below a predefined level $\alpha^{(m)}$ for all $m$ simultaneously.
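Under these definitions, the layer-$m$ FDP of a candidate feature set can be computed directly; a small sketch (function and variable names are ours):

```python
def fdp_layer(selected_features, partition, null_features):
    """FDP^(m): fraction of selected layer-m groups that are null groups."""
    selected_groups = [g for g, group in enumerate(partition)
                       if set(group) & set(selected_features)]
    null_groups = {g for g, group in enumerate(partition)
                   if set(group) <= set(null_features)}
    n_false = sum(1 for g in selected_groups if g in null_groups)
    return n_false / max(len(selected_groups), 1)
```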

4.2.2. Recap: Multiple Testing with e-values

An e-variable $E$ is a non-negative random variable with $\mathbb{E}_{\mathrm{null}}[E] \leq 1$. An e-value $e$ is a realization of an e-variable. For a significance level $\alpha \in (0, 1)$, a null hypothesis is rejected if $e \geq 1/\alpha$.

  • e-BH procedure [44]: For multiple tests with e-values, the e-BH procedure ranks e-values $e_{(1)} \geq \dots \geq e_{(N)}$ and rejects hypotheses with corresponding e-values exceeding $N / (\alpha \widehat{k})$, where $\widehat{k} = \max\{k \in [N] : e_{(k)} \geq N / (\alpha k)\}$.
  • Relaxed e-values: e-values are called relaxed e-values if $\sum_{j \in \mathcal{H}_0} \mathbb{E}[e_j] \leq N$, which is sufficient for FDR control.
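A minimal implementation of the e-BH rule as stated above (reject the hypotheses whose e-values reach $N/(\alpha \widehat{k})$):

```python
import numpy as np

def e_bh(e_values, alpha):
    """e-BH: reject hypotheses whose e-values reach N / (alpha * k_hat)."""
    e = np.asarray(e_values, dtype=float)
    N = len(e)
    order = np.argsort(-e)                 # hypothesis indices, e-values descending
    k_grid = np.arange(1, N + 1)
    ok = e[order] >= N / (alpha * k_grid)  # check e_(k) >= N / (alpha * k)
    if not ok.any():
        return np.array([], dtype=int)
    k_hat = k_grid[ok].max()
    return np.sort(order[:k_hat])          # indices of rejected hypotheses

print(e_bh([8.0, 0.5, 12.0, 1.0], alpha=0.5))  # -> [0 2]
```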

4.2.3. FEFP: Flexible E-Filter Procedure

Framework for Controlled Detection Procedures (Definition 1): A feature (or group) detection procedure $\mathcal{G}: \mathbb{R}^{n \times (N+1)} \to 2^{[N]}$ determines a rejection threshold $t_{\alpha}$ by: $ t_{\alpha} = \operatorname*{arg\,max}_t R(t), \quad \mathrm{subject~to~} \widehat{\mathrm{FDP}}(t) = \frac{\widehat{V}(t) \vee \alpha}{R(t) \vee 1} \leq \alpha, $ where:

  • $R(t)$ is the number of rejections (selected features/groups) under threshold $t$.

  • $\widehat{V}(t)$ is the estimated number of false rejections (false positives) under threshold $t$.

  • $\widehat{\mathrm{FDP}}(t)$ is the estimated false discovery proportion.

  • $\vee$ denotes the maximum operation (e.g., $A \vee B = \max(A, B)$). The $\vee\, \alpha$ in the numerator is a technical detail to avoid division by zero or very small numbers, ensuring conservative FDR control.

    A procedure $\mathcal{G}$ belongs to:

  • $\mathcal{K}_{\mathrm{finite}}$ if it controls FDR under finite samples, ensuring $\mathbb{E}[V(t_{\alpha}) / \max\{\widehat{V}(t_{\alpha}), \alpha\}] \leq 1$.

  • $\mathcal{K}_{\mathrm{asy}}$ if it controls FDR asymptotically as $(n, N) \to \infty$ at a proper rate, ensuring $\mathbb{E}[V(t_{\alpha}) / \max\{\widehat{V}(t_{\alpha}), \alpha\}] \leq 1$ in the limit.

    The flexibility of FEFP comes from selecting $\mathcal{G}^{(m)}$ for each layer $m$ from either $\mathcal{K}_{\mathrm{finite}}$ or $\mathcal{K}_{\mathrm{asy}}$.

Proposition 1: States that the rejection set of $\mathcal{G}$ remains unchanged if the FDP constraint is slightly modified to $\widehat{\mathrm{FDP}}(t) = \frac{\widehat{V}(t)}{R(t) \vee 1} \leq \alpha$. This means the term $\vee\, \alpha$ in the numerator of Definition 1 is primarily for theoretical FDR control and does not change the set of rejected hypotheses in practice.

Construction of Generalized e-values (Definition 2):

  • Relaxed e-values: For $N$ hypotheses $H_1, \ldots, H_N$ with non-negative test statistics $e_1, \ldots, e_N$, they are relaxed e-values if $\sum_{j \in \mathcal{H}_0} \mathbb{E}[e_j] \leq N$.
  • Asymptotic e-value: For $j \in \mathcal{H}_0$, if $\limsup_{(n, N) \to \infty} \mathbb{E}[e_j] \leq 1$.
  • Asymptotic relaxed e-values: If $\limsup_{(n, N) \to \infty} \sum_{j \in \mathcal{H}_0} \mathbb{E}[e_j] / N \leq 1$. Asymptotic e-values, relaxed e-values, and asymptotic relaxed e-values are collectively called generalized e-values.

Based on a detection procedure $\mathcal{G}^{(m)} \in \mathcal{K}_{\mathrm{finite}} \cup \mathcal{K}_{\mathrm{asy}}$ for layer $m$, the generalized e-values for hypotheses $H_g^{(m)}$ are constructed as follows:

  1. For each layer $m \in [M]$, execute the detection procedure $\mathcal{G}^{(m)}$ with an original FDR level $\alpha_0^{(m)}$. This yields a selected set $\mathcal{G}^{(m)}(\alpha_0^{(m)})$ and an estimated number of false discoveries $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})}$.
  2. Transform these results into generalized e-values $\{e_g^{(m)}\}_{g \in [G^{(m)}]}$: $ e_g^{(m)} = G^{(m)} \cdot \frac{\mathbb{I}\left\{g \in \mathcal{G}^{(m)}(\alpha_0^{(m)})\right\}}{\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})} \vee \alpha_0^{(m)}}. \quad (3) $
    • $e_g^{(m)}$: The generalized e-value for group $g$ at layer $m$.
    • $G^{(m)}$: The total number of groups at layer $m$.
    • $\mathbb{I}\{\cdot\}$: The indicator function, which is 1 if the condition is true (group $g$ is selected by $\mathcal{G}^{(m)}$ at level $\alpha_0^{(m)}$), and 0 otherwise.
    • $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})}$: The estimated number of false discoveries by the base procedure $\mathcal{G}^{(m)}$ at original FDR level $\alpha_0^{(m)}$.
    • $\alpha_0^{(m)}$: The original FDR level used by the base detection procedure $\mathcal{G}^{(m)}$.
    • $\vee\, \alpha_0^{(m)}$: Ensures the denominator is never less than $\alpha_0^{(m)}$, preventing e-values from becoming excessively large. This formula essentially assigns a positive e-value to selected groups and 0 to non-selected groups, with magnitude inversely proportional to the estimated number of false discoveries.
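Equation (3) means any base procedure that reports a selected set and an estimate $\widehat{V}$ can be converted into generalized e-values; a minimal sketch (names are ours):

```python
import numpy as np

def generalized_e_values(selected, v_hat, G, alpha0):
    """Equation (3): one-bit generalized e-values from a base FDR procedure.

    selected : iterable of group indices chosen by the base procedure at level alpha0
    v_hat    : the procedure's estimated number of false discoveries
    G        : total number of groups at this layer
    """
    denom = max(v_hat, alpha0)      # the "∨ alpha0" guard on the denominator
    e = np.zeros(G)
    e[list(selected)] = G / denom   # positive for selected groups, 0 otherwise
    return e
```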

Theorem 1: This theorem states that the selection set $\mathcal{G}^{(m)}(\alpha_0^{(m)})$ obtained from a detection procedure $\mathcal{G}^{(m)}$ is precisely the set that would be selected by the generalized e-BH procedure if its input were the generalized e-values constructed by Equation (3) at the same level $\alpha_0^{(m)}$. This establishes the equivalence and validity of the generalized e-value construction.

Remark 1 (Comparison with existing constructions): The paper highlights that its generalized e-values (Equation 3) are generally more powerful than compound e-values [1, 22] (which use $|\mathcal{G}^{(m)}(\alpha_0^{(m)})| \vee 1$ in the denominator instead of $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})} \vee \alpha_0^{(m)}$). Also, this construction is more generic than knockoff e-values [31] and e-values relying on independence assumptions [25], as it can be derived from any FDR-controlled procedure satisfying Definition 1.

Leveraging the Generalized e-filter: The generalized e-filter uses the constructed generalized e-values $\{e_g^{(m)}\}$ to identify the final selected features.

  • Candidate Selection Set: To ensure consistency across layers (a feature is selected only if all groups containing it are rejected), the candidate selection set $S(t^{(1)}, \ldots, t^{(M)})$ is defined for given thresholds $(t^{(1)}, \ldots, t^{(M)})$ as: $ S(t^{(1)}, \ldots, t^{(M)}) = \{j : \mathrm{for~all~} m \in [M],\; e_{h(m,j)}^{(m)} \geq t^{(m)}\}. \quad (4) $
    • This means a feature $j$ is considered for selection only if its e-value for group $h(m, j)$ at layer $m$ is above the threshold $t^{(m)}$ for all layers $m$.
  • Estimated False Discovery Proportion ($\widehat{\mathrm{FDP}}^{(m)}$): For the purpose of setting thresholds, the estimated FDP at layer $m$ is approximated as: $ \widehat{\mathrm{FDP}}^{(m)}(t^{(1)}, \ldots, t^{(M)}) = \frac{G^{(m)} / t^{(m)}}{\left|S^{(m)}(t^{(1)}, \ldots, t^{(M)})\right|}. $
    • The numerator $G^{(m)}/t^{(m)}$ upper-bounds the expected number of false group selections, since $V(t^{(m)}) = \sum_{g \in \mathcal{H}_0^{(m)}} \mathbb{P}(e_g^{(m)} \geq t^{(m)}) \leq |\mathcal{H}_0^{(m)}|/t^{(m)} \leq G^{(m)}/t^{(m)}$ by Markov's inequality and the e-value property.
    • The denominator $|S^{(m)}(t^{(1)}, \ldots, t^{(M)})|$ is the number of selected groups at layer $m$ induced by the consistent feature selection set $S(t^{(1)}, \ldots, t^{(M)})$.
  • Admissible Thresholds: The set of admissible thresholds $\mathcal{T}(\alpha^{(1)}, \ldots, \alpha^{(M)})$ contains all threshold vectors $(t^{(1)}, \ldots, t^{(M)})$ such that $\widehat{\mathrm{FDP}}^{(m)}(t^{(1)}, \ldots, t^{(M)}) \leq \alpha^{(m)}$ for all layers $m$.
  • Final Thresholds: For each layer $m \in [M]$, the final threshold $\widehat{t}^{(m)}$ is chosen to maximize detections while staying within the admissible set: $ \widehat{t}^{(m)} = \min\{t^{(m)} : (t^{(1)}, \ldots, t^{(M)}) \in \mathcal{T}(\alpha^{(1)}, \ldots, \alpha^{(M)})\}. \quad (5) $ This is an iterative process, usually starting with high thresholds and gradually lowering them to maximize discoveries while maintaining FDR control.

Algorithm 1: FEFP: a flexible e-filter procedure for feature detection The algorithm formalizes the steps:

  1. Input: Data $(\boldsymbol{X}, \boldsymbol{y})$, target FDR levels $(\alpha^{(1)}, \ldots, \alpha^{(M)})$, original FDR levels $(\alpha_0^{(1)}, \ldots, \alpha_0^{(M)})$, and partitions $\{\mathcal{A}_g^{(m)}\}_{g \in [G^{(m)}]}$.
  2. For each layer $m = 1, \ldots, M$:
    • Compute the generalized e-values $\{e_g^{(m)}\}_{g \in [G^{(m)}]}$ using Equation (3). This involves running the base detection procedure $\mathcal{G}^{(m)}$ at level $\alpha_0^{(m)}$ to get selected groups and estimated false discoveries.
  3. Iterative Threshold Determination (Generalized e-filter):
    • Initialize thresholds $t^{(m)} = 1$ for all $m$.
    • Repeat:
      • For each layer $m = 1, \ldots, M$:
        • Update $t^{(m)}$ by finding the minimum $t$ such that $\frac{G^{(m)}/t}{1 \vee |\mathcal{S}^{(m)}(t^{(1)}, \ldots, t^{(m-1)}, t, t^{(m+1)}, \ldots, t^{(M)})|} \leq \alpha^{(m)}$ is met. This is an implicit update, as $\mathcal{S}^{(m)}$ depends on all thresholds.
      • Until all $t^{(m)}$ values are unchanged.
  4. Final Selection:
    • Compute the final selection set $S$ using the converged thresholds $(\widehat{t}^{(1)}, \ldots, \widehat{t}^{(M)})$ and Equation (4).

    • If $S$ is non-empty, output $S$. Otherwise, output an empty set.

Proposition 2: Confirms that the output threshold vector $(\widehat{t}^{(1)}, \ldots, \widehat{t}^{(M)})$ from Algorithm 2 (the generalized e-filter) matches the one defined by Equation (5). This ensures the algorithmic implementation correctly finds the desired FDR-controlled thresholds.
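The threshold search of Algorithm 1 can be sketched compactly; this is a simplified illustration with helper names of our own, not the authors' reference implementation:

```python
def e_filter(e_vals, partitions, alphas, group_of, max_iter=100):
    """Coordinate-wise search for the minimal admissible thresholds (Eq. 5).

    e_vals[m][g]  : generalized e-value of group g at layer m (0-indexed layers)
    partitions[m] : list of groups (lists of feature indices) at layer m
    alphas[m]     : target FDR level at layer m
    group_of(m,j) : group index of feature j at layer m
    """
    M = len(e_vals)
    N = sum(len(grp) for grp in partitions[0])
    G = [len(layer) for layer in e_vals]

    def selected_features(t):
        return [j for j in range(N)
                if all(e_vals[m][group_of(m, j)] >= t[m] for m in range(M))]

    def n_selected_groups(m, t):
        S = set(selected_features(t))
        return sum(1 for grp in partitions[m] if S & set(grp))

    # Candidate thresholds per layer: the distinct positive e-values.
    cands = [sorted({v for v in e_vals[m] if v > 0}) for m in range(M)]
    t = [c[-1] if c else float("inf") for c in cands]  # start high, lower gradually
    for _ in range(max_iter):
        changed = False
        for m in range(M):
            for cand in cands[m]:  # smallest admissible threshold at this layer
                trial = t[:m] + [cand] + t[m + 1:]
                fdp_hat = (G[m] / cand) / max(n_selected_groups(m, trial), 1)
                if fdp_hat <= alphas[m] and cand < t[m]:
                    t[m], changed = cand, True
                    break
        if not changed:
            break
    # If some layer's constraint is still violated, report the empty selection.
    for m in range(M):
        if t[m] < float("inf") and (G[m] / t[m]) / max(n_selected_groups(m, t), 1) > alphas[m]:
            return [], t
    return selected_features(t), t
```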

Multilayer FDR Control and One-bit Property of FEFP:

  • Lemma 1: Provides theoretical bounds for $\mathrm{FDR}^{(m)}$. If $\{e_g^{(m)}\}$ are e-values, $\mathrm{FDR}^{(m)} \leq \pi_0^{(m)} \alpha^{(m)}$, where $\pi_0^{(m)}$ is the null proportion at layer $m$. If they are relaxed e-values, $\mathrm{FDR}^{(m)} \leq \alpha^{(m)}$. Similar asymptotic bounds are given for asymptotic e-values and asymptotic relaxed e-values. This establishes the fundamental FDR control property of the generalized e-filter.
  • Theorem 2: Guarantees that FEFP controls $\mathrm{FDR}^{(m)} \leq \alpha^{(m)}$ simultaneously for all layers $m$. This holds in finite-sample settings if the base procedure $\mathcal{G}^{(m)} \in \mathcal{K}_{\mathrm{finite}}$, and asymptotically if $\mathcal{G}^{(m)} \in \mathcal{K}_{\mathrm{asy}}$ and group sizes are uniformly bounded. This is a crucial theoretical result validating FEFP's FDR control.
  • Theorem 3 (One-bit Property): This theorem reveals a critical limitation of FEFP. The generalized e-values from Equation (3) are "one-bit" or binary: a group either gets a positive e-value (if selected by $\mathcal{G}^{(m)}$) or zero (if not selected). Define $ S_{\mathrm{init}}^{(m)} = \left\{g \in [G^{(m)}] : \mathcal{A}_g^{(m)} \cap \left[\bigcap_{l=1}^{M} \left\{j : e_{h(l,j)}^{(l)} > 0\right\}\right] \neq \emptyset\right\}. $ FEFP selects a specific set of features if and only if a certain condition involving the estimated number of false discoveries $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})}$ and the size of $S_{\mathrm{init}}^{(m)}$ is met for all layers. If this condition is not met (i.e., if there is sufficient conflict or incompatibility between the initial layer-specific discoveries), FEFP selects no features, yielding zero power. This "one-bit" nature treats all selected groups as equally important at each layer, making it difficult to reconcile conflicts between layers and potentially leading to this zero-power dilemma.

4.2.4. SFEFP: Stabilized Flexible E-Filter Procedure

To address the zero-power dilemma and instability of FEFP, SFEFP introduces a stabilization treatment. It aims to generate non-one-bit generalized e-values that provide better ranking information.

Algorithm 3: SFEFP: a stabilized flexible e-filter procedure for feature detection

  1. Input: Same as FEFP, plus the number of replications $R$.
  2. For $r = 1, \ldots, R$ (replications):
    • Compute the generalized e-value $e_{gr}^{(m)}$ for each group $g$ at layer $m$ for this specific run $r$. This is done using Equation (6), which is Equation (3) subscripted with $r$ to denote the replication: $ e_{gr}^{(m)} = G^{(m)} \cdot \frac{\mathbb{I}\left\{g \in \mathcal{G}_r^{(m)}(\alpha_0^{(m)})\right\}}{\widehat{V}_{\mathcal{G}_r^{(m)}(\alpha_0^{(m)})} \vee \alpha_0^{(m)}}. \quad (6) $
      • Here, $\mathcal{G}_r^{(m)}$ denotes the base detection procedure (which may have inherent randomness, or may change if different deterministic procedures are fused) for replication $r$.
  3. Compute Averaged Generalized e-values: Aggregate the generalized e-values from all $R$ replications by calculating a weighted average: $ \overline{e}_g^{(m)} = \sum_{r=1}^{R} \omega_r^{(m)} e_{gr}^{(m)}, \quad \sum_{r=1}^{R} \omega_r^{(m)} = 1. \quad (7) $
    • $\overline{e}_g^{(m)}$: The averaged generalized e-value for group $g$ at layer $m$.
    • $\omega_r^{(m)}$: Weight for replication $r$ at layer $m$. The paper suggests $\omega_r^{(m)} = 1/R$ for simplicity. This averaging step is crucial: it transforms the "one-bit" (binary) $e_{gr}^{(m)}$ into continuous $\overline{e}_g^{(m)}$ values, which reflect how consistently a group was selected across replications, providing a more refined measure of importance.
  4. Apply Generalized e-filter: Use the averaged generalized e-values $\{\overline{e}_g^{(m)}\}$ as input to the generalized e-filter (Algorithm 2), along with the target FDR levels $(\alpha^{(1)}, \ldots, \alpha^{(M)})$, to obtain the final set of selected features, $S_{\mathrm{derand}}$.
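The stabilization step itself is just a weighted average over replications; a minimal sketch, assuming a user-supplied run_base_procedure callback (our notation):

```python
import numpy as np

def sfefp_e_values(run_base_procedure, R, G, alpha0):
    """Average one-bit generalized e-values over R replications (Equation 7).

    run_base_procedure(r) -> (selected_groups, v_hat) for replication r.
    """
    e_bar = np.zeros(G)
    for r in range(R):
        selected, v_hat = run_base_procedure(r)
        e_r = np.zeros(G)
        e_r[list(selected)] = G / max(v_hat, alpha0)  # Equation (6)
        e_bar += e_r / R                              # uniform weights w_r = 1/R
    return e_bar  # non-binary: reflects selection frequency across runs
```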

Two settings for SFEFP:

  • Setting 1 (Inherent randomness): If $\mathcal{G}^{(m)}$ has inherent randomness (e.g., model-X knockoff), different runs will produce different e-values. Averaging these stabilizes the result.
  • Setting 2 (No randomness): If $\mathcal{G}^{(m)}$ is deterministic, multiple $\mathcal{G}$ procedures (e.g., using different random seeds for data generation, or different types of base procedures) are run, and their one-bit generalized e-values are averaged. This is termed a "fusion decision."

Multilayer FDR Control and Stability Guarantee:

  • Theorem 4: SFEFP simultaneously controls $\mathrm{FDR}^{(m)} \leq \alpha^{(m)}$ for all layers $m$. This holds in finite samples if $\mathcal{G}^{(m)} \in \mathcal{K}_{\mathrm{finite}}$ and asymptotically if $\mathcal{G}^{(m)} \in \mathcal{K}_{\mathrm{asy}}$ (and group sizes are uniformly bounded). This extends the FDR control guarantee of FEFP to the stabilized version.
  • Theorem 5 (Stability): This theorem guarantees that as the number of replications $R \to \infty$, the selected set $S_{\mathrm{derand}}$ obtained by SFEFP almost surely converges to a fixed set $S_{\infty}$ (the selection set obtained using the conditional expected e-values $\overline{\overline{e}}_g^{(m)} = \mathbb{E}[e_{g1}^{(m)} \mid \boldsymbol{X}, \boldsymbol{y}]$). The probability of $S_{\mathrm{derand}} = S_{\infty}$ is bounded below by an exponential term involving $R$ and a "gap" $\Delta^{(m)} = \min_{g \in [G^{(m)}]} |\overline{\overline{e}}_g^{(m)} - \widehat{t}_{\infty}^{(m)}|$. This ensures that with enough replications, the output of SFEFP becomes stable and consistent.

Choices of Parameters:

  • Original FDR level $\{\alpha_0^{(m)}\}$: This parameter influences the magnitude and number of non-zero generalized e-values.
    • For $M = 1$ (single resolution): $\alpha_0^{(1)} \leq \alpha^{(1)} / (1 + \alpha^{(1)})$ is suggested. For $R = 1$, $\alpha_0^{(1)} = \alpha^{(1)}$ is optimal.
    • For $M \geq 2$ (multiple resolutions): $\alpha_0^{(m)} = \alpha^{(m)}$ is generally not optimal when $R = 1$ because of the "one-bit" nature. Instead, SFEFP benefits from choosing $\alpha_0^{(m)}$ to maximize the number and magnitude of non-zero e-values at each layer, facilitating coordination. A common choice in practice is $\alpha_0^{(m)} = \alpha^{(m)} / 2$.
  • A smaller $\alpha_0^{(m)}$ can lead to fewer non-zero generalized e-values but with larger magnitudes, while a larger $\alpha_0^{(m)}$ might result in more non-zero e-values but with smaller magnitudes due to inflated false discovery estimates. The optimal choice is a balance.

4.2.5. Examples for SFEFP

The paper provides concrete instantiations of SFEFP by specifying the base detection procedures.

eDS-filter: multilayer FDR control by data splitting (Section 4.1) The eDS-filter uses the Data Splitting (DS) procedure [15] as its base detection method $\mathcal{G}^{(m)}$, extended for group detection. DS is particularly powerful for high-dimensional regression with highly correlated features.

  • Review of DS method [15]:
    • Uses Lasso+OLS to estimate coefficients $\widehat{\boldsymbol{\beta}} = (\widehat{\beta}_1, \ldots, \widehat{\beta}_N)^\top$.
    • Splits data into two subsets to get independent coefficient estimates $\widehat{\boldsymbol{\beta}}^{(1)}$ and $\widehat{\boldsymbol{\beta}}^{(2)}$.
    • Assumption 1 (Symmetry): For $j \in \mathcal{H}_0$, the sampling distribution of $\widehat{\beta}_j^{(1)}$ or $\widehat{\beta}_j^{(2)}$ is symmetric about zero.
    • Test statistic $W_j$ is constructed as: $ W_j = \mathrm{sign}(\widehat{\beta}_j^{(1)} \widehat{\beta}_j^{(2)}) f(|\widehat{\beta}_j^{(1)}|, |\widehat{\beta}_j^{(2)}|), $ where $f(u, v)$ is non-negative, exchangeable ($f(u,v) = f(v,u)$), and monotonically increasing (e.g., $f(u,v) = u + v$). A larger positive $W_j$ indicates a more relevant feature.
    • FDR control is achieved by comparing positive $W_j$ to negative $W_j$ via the threshold $ t_{\alpha} = \min\left\{t > 0 : \widehat{\mathrm{FDP}}(t) = \frac{\#\{j : W_j < -t\}}{\#\{j : W_j > t\} \vee 1} \leq \alpha\right\}. $
  • Data splitting for group detection:
    • For a group $g$ at layer $m$, a group-level test statistic $T_g^{(m)}$ is constructed by averaging the individual feature $W_j$ statistics within that group: $ T_g^{(m)} = \frac{1}{|\mathcal{A}_g^{(m)}|} \sum_{j \in \mathcal{A}_g^{(m)}} W_j, \quad g \in [G^{(m)}]. $
    • Lemma 2: Under Assumption 1, for null groups, $\#\{g \in \mathcal{H}_0^{\mathrm{grp}} : T_g^{(m)} \geq t\} \stackrel{d}{=} \#\{g \in \mathcal{H}_0^{\mathrm{grp}} : T_g^{(m)} \leq -t\}$. This symmetry allows for FDR estimation.
    • The estimated FDP for groups is: $ \widehat{\mathrm{FDP}}^{(m)}(t) = \frac{\#\{g : T_g^{(m)} < -t\}}{\#\{g : T_g^{(m)} > t\} \vee 1}. $
    • Assumption 2 (Weak dependence): A technical condition on the covariance of indicator functions for null group test statistics, ensuring asymptotic FDR control.
    • Theorem 6: Under Assumptions 1 and 2, the group DS procedure controls FDR asymptotically.
  • eDS-filter construction (a compact sketch follows this list):
    • At each layer $m$, the DS procedure is run for $R$ replications.
    • For each run $r$, the group-level test statistics $T_{gr}^{(m)}$ are computed.
    • The threshold $t_{\alpha_0^{(m)}}^r$ is computed for each run $r$ and layer $m$: $ t_{\alpha_0^{(m)}}^r = \inf\left\{t > 0 : \frac{\#\{g : T_{gr}^{(m)} < -t\}}{\#\{g : T_{gr}^{(m)} > t\} \vee 1} \leq \alpha_0^{(m)}\right\}. $
    • The DS generalized e-values $e_{gr}^{(m)}$ are then constructed: $ e_{gr}^{(m)} = G^{(m)} \cdot \frac{\mathbb{I}\left\{T_{gr}^{(m)} \geq t_{\alpha_0^{(m)}}^r\right\}}{\#\{g : T_{gr}^{(m)} \leq -t_{\alpha_0^{(m)}}^r\} \vee \alpha_0^{(m)}}, \quad g \in [G^{(m)}]. $
    • These $e_{gr}^{(m)}$ are then averaged (Equation 7) and fed into the generalized e-filter.
  • Theorem 7: Guarantees that the eDS-filter controls $\mathrm{FDR}^{(m)} \leq \alpha^{(m)}$ asymptotically under certain conditions, confirming that its generalized e-values are asymptotic relaxed e-values.
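The following sketch puts one eDS replication together for a single layer: the signed DS statistics $W_j$, the group averages $T_g^{(m)}$, the data-driven threshold, and the resulting generalized e-values (a simplified illustration under Assumption 1, not the authors' code):

```python
import numpy as np

def ds_group_e_values(beta1, beta2, groups, alpha0):
    """One eDS replication for one layer: group DS statistics and e-values.

    beta1, beta2 : independent coefficient estimates from the two data halves
    groups       : list of feature-index lists defining this layer's partition
    """
    # Signed DS statistics with f(u, v) = u + v.
    W = np.sign(beta1 * beta2) * (np.abs(beta1) + np.abs(beta2))
    T = np.array([W[g].mean() for g in groups])   # group statistics T_g
    G = len(groups)

    # Smallest t > 0 with #{T < -t} / (#{T > t} v 1) <= alpha0.
    t_hat = np.inf
    for t in np.sort(np.abs(T[T != 0])):
        if (T < -t).sum() / max((T > t).sum(), 1) <= alpha0:
            t_hat = t
            break

    denom = max((T <= -t_hat).sum(), alpha0)      # the "v alpha0" guard
    return np.where(T >= t_hat, G / denom, 0.0)   # DS generalized e-values
```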

eDS+gKF-filter (Section 4.2) This is a hybrid SFEFP variant designed for settings where features within a group are highly correlated (benefiting DS), but signals within groups might be sparse (making group DS less powerful) and group knockoffs might be preferable.

  • Layer 1 (individual features): Uses DS (specifically Lasso+OLS based DS) to generate generalized e-values for individual features.
  • Subsequent Layers (groups $m = 2, \ldots, M$): Uses the group knockoff filter [13] to generate generalized e-values for groups.
    • Group knockoff statistics $T_{gr}^{(m)}$ are constructed (e.g., using fixed-design group knockoffs if $n \geq N$, or other model-X variants).
    • Group knockoffs satisfy a martingale property: $ \mathbb{E}\left[\frac{\#\left\{g \in \mathcal{H}_0^{(m)} : T_{gr}^{(m)} \geq t_{\alpha_0^{(m)}}^r\right\}}{1 + \#\left\{g \in [G^{(m)}] : T_{gr}^{(m)} \leq -t_{\alpha_0^{(m)}}^r\right\}}\right] \leq 1, $ where $t_{\alpha_0^{(m)}}^r$ is determined by $ t_{\alpha_0^{(m)}}^r = \inf\left\{t > 0 : \frac{1 + \#\{g : T_{gr}^{(m)} < -t\}}{\#\{g : T_{gr}^{(m)} > t\} \vee 1} \leq \alpha_0^{(m)}\right\}. $
    • Based on this, group knockoff generalized e-values $e_{gr}^{(m)}$ are constructed: $ e_{gr}^{(m)} = G^{(m)} \cdot \frac{\mathbb{I}\left\{T_{gr}^{(m)} \geq t_{\alpha_0^{(m)}}^r\right\}}{1 + \#\left\{g : T_{gr}^{(m)} \leq -t_{\alpha_0^{(m)}}^r\right\}}. $
  • Finally, these generalized e-values (from DS for layer 1 and group knockoffs for the other layers) are averaged and passed to the generalized e-filter; a sketch of the knockoff-side e-value construction follows.
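The knockoff-side construction differs from the DS one only in the "knockoff+" style offset (the $1 + \#\{\cdot\}$ terms above); a minimal sketch given precomputed group knockoff statistics (the statistics themselves are assumed, not constructed here):

```python
import numpy as np

def knockoff_group_e_values(T, alpha0):
    """Knockoff-style generalized e-values from group statistics T."""
    G = len(T)
    # Smallest t > 0 with (1 + #{T < -t}) / (#{T > t} v 1) <= alpha0.
    t_hat = np.inf
    for t in np.sort(np.abs(T[T != 0])):
        if (1 + (T < -t).sum()) / max((T > t).sum(), 1) <= alpha0:
            t_hat = t
            break
    return np.where(T >= t_hat, G / (1 + (T <= -t_hat).sum()), 0.0)
```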

Other applications and possible extensions of SFEFP (Section 4.3)

  • e-MKF (stable and powerful version): The paper suggests that applying SFEFP to knockoff procedures (where e-MKF is an instantiation of FEFP with knockoff e-values) creates a stable and more powerful e-MKF, addressing the zero-power dilemma of the original e-MKF.
  • GM e-values and SAS e-values: The supplementary material demonstrates how GM [46] and SAS [45] methods can also be used as base procedures to generate asymptotic relaxed e-values for SFEFP.
  • Extensions to time series data: Mentions TSKI (Time Series Knockoffs Inference) [11] and suggests SFEFP could be modified for time series by adapting to subsampling settings and constructing robust e-values for group knockoff filters.

4.2.6. Unified E-Filter (Appendix E)

The paper also presents a "unified e-filter" (Algorithm 4) as an extension to incorporate prior knowledge (penalties and priors) and handle overlapping groups and null-proportion adaptivity, similar to the p-filter [30]. This provides greater flexibility for domain experts.

  • Overlapping groups: Allows a feature XjX_j to belong to multiple groups at a given layer mm. g(m)(i)={g[G(m)]:iAg(m)}g^{(m)}(i) = \{g \in [G^{(m)}] : i \in A_g^{(m)}\} is the index set of groups containing HiH_i.
  • Leftover features: L(m)=[N]gAg(m)L^{(m)} = [N] \setminus \bigcup_g A_g^{(m)} are features not belonging to any group in partition mm.
  • Penalties and priors: {ug(m)}\{u_g^{(m)}\} are penalties (e.g., cost of false discovery for Hg(m)H_g^{(m)}), and {vg(m)}\{v_g^{(m)}\} are priors (e.g., prior probability of Hg(m)H_g^{(m)} being true). These are normalized such that g[G(m)]ug(m)vg(m)=G(m)\sum_{g \in [G^{(m)}]} u_g^{(m)} v_g^{(m)} = G^{(m)}.
  • Null-proportion adaptivity: A weighted null proportion estimator π^(m)\widehat{\pi}^{(m)} (Equation 15) can be used to enhance power when e-values are independent at a layer. $ \widehat{\pi}^{(m)} := \frac{|\boldsymbol{u}^{(m)} \cdot \boldsymbol{v}^{(m)}|_\infty + \sum_g u_g^{(m)} v_g^{(m)} \mathbf{1}\Big{e_g^{(m)} < 1/\lambda^{(m)}\Big}}{G^{(m)} \big(1 - \lambda^{(m)}\big)}. $
    • u(m)v(m)|\boldsymbol{u}^{(m)} \cdot \boldsymbol{v}^{(m)}|_\infty: The maximum element-wise product of penalty and prior vectors.
    • λ(m)(0,1)\lambda^{(m)} \in (0,1): A user-defined constant for adaptivity.
    • 1{}\mathbf{1}\{\cdot\}: Indicator function. This estimator refines the FDP calculation by adapting to the estimated proportion of true null hypotheses based on the observed e-values.
  • Penalty-weighted FDR control: The goal is to control FDRu(m)α(m)\mathrm{FDR}_u^{(m)} \leq \alpha^{(m)}, where FDRu(m)=E[FDPu(m)]\mathrm{FDR}_u^{(m)} = \mathbb{E}[\mathrm{FDP}_u^{(m)}] and $ \mathrm{FDP}u^{(m)} = \frac{\sum{g \in \mathcal{H}0^{(m)}} u_g^{(m)} \mathbf{1}\big{g \in \mathcal{S}^{(m)}\big}}{\sum{g \in [G^{(m)}]} u_g^{(m)} \mathbf{1}\big{g \in \mathcal{S}^{(m)}\big}}. $ This FDP definition incorporates the penalties ug(m)u_g^{(m)}.
  • Candidate selection set for individuals (Equation 16): $ \begin{array}{rcl} \mathcal{S}(\vec{k}) & = & \Big{i : \forall m, \mathrm{either~} i \in L^{(m)}, \mathrm{or~} \exists g \in g^{(m)}(i), \ & & \qquad e_g^{(m)} \geq \operatorname*{max}\Big{\frac{1{\widehat{k}^{(m)} \neq 0} \widehat{\pi}^{(m)} G^{(m)}}{v_g^{(m)} \alpha^{(m)} \widehat{k}^{(m)}}, \ 1{\widehat{k}^{(m)} = 0} \cdot \infty, \frac{1}{\lambda^{(m)}}\Big}\Big} \end{array} $
    • k=(k(1),,k(M))\vec{k} = (k^{(1)}, \dots, k^{(M)}) are rejection count thresholds.
    • This ensures internal consistency: an individual feature ii is selected only if, for every layer mm, either it's a leftover feature, or at least one of its containing groups is selected by the e-filter with its respective e-value exceeding a dynamically calculated threshold.
  • Candidate selection set for groups (Equation 17): $ \mathcal{S}^{(m)}(\vec{k}) = \Big\{g \in [G^{(m)}] : A_g^{(m)} \cap \mathcal{S}(\vec{k}) \neq \emptyset \text{ and } e_g^{(m)} \geq \max\Big\{\frac{\mathbf{1}\{\widehat{k}^{(m)} \neq 0\}\,\widehat{\pi}^{(m)} G^{(m)}}{v_g^{(m)} \alpha^{(m)} \widehat{k}^{(m)}}, \ \mathbf{1}\{\widehat{k}^{(m)} = 0\} \cdot \infty, \ \frac{1}{\lambda^{(m)}}\Big\}\Big\}. $
    • This defines the groups selected at layer $m$ based on the selected individual features, with each group's e-value required to exceed a threshold that incorporates the penalties, priors, and null-proportion estimate.
  • Algorithm 4: The unified e-filter (see the sketch after this list)
    • Initializes $k^{(m)} = G^{(m)}$ and $\widehat{\pi}^{(m)}$ (using Equation 15).
    • Iterates: for each layer $m$, update $k^{(m)}$ to the maximum $k$ such that the total penalty of the groups selected at that layer is at least $k$; the selection $\mathcal{S}^{(m)}$ depends on the current threshold vector (and thus on the other layers' decisions).
    • Outputs the final threshold vector $(\widehat{k}^{(1)}, \dots, \widehat{k}^{(M)})$ and the corresponding rejected set $\mathcal{S}(\widehat{k}^{(1)}, \dots, \widehat{k}^{(M)})$.
  • Theorem 10: This theorem provides FDR control guarantees for the unified e-filter, both with and without null-proportion adaptivity, for e-values and relaxed e-values, in finite-sample and asymptotic settings.
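
To make the iteration concrete, the following is a minimal Python sketch of the multilayer e-filter fixed point under strong simplifications: non-overlapping groups that cover all features, uniform penalties and priors ($u_g^{(m)} = v_g^{(m)} = 1$), and no null-proportion adaptivity, so a group at layer $m$ passes whenever $e_g^{(m)} \geq G^{(m)}/(\alpha^{(m)} k^{(m)})$. All names and the interface are ours, not the paper's.

```python
import numpy as np

def multilayer_e_filter(e_values, groups, alphas, n_features, max_iter=1000):
    """Minimal sketch of the multilayer e-filter fixed-point iteration.

    Simplifying assumptions (ours): non-overlapping groups covering all
    features, uniform penalties/priors, no null-proportion adaptivity.
    e_values: list of M numpy arrays; e_values[m][g] is group g's e-value at layer m.
    groups:   list of M lists; groups[m][g] is the list of feature indices in group g.
    alphas:   list of M target FDR levels.
    """
    M = len(e_values)
    G = [len(e) for e in e_values]
    k = list(G)  # start from the most liberal thresholds

    def select(k_vec):
        # group g passes layer m if e_g >= G_m / (alpha_m * k_m); k_m = 0 blocks all
        passes = [e_values[m] >= G[m] / (alphas[m] * k_vec[m]) if k_vec[m] > 0
                  else np.zeros(G[m], dtype=bool) for m in range(M)]
        # a feature survives only if one of its groups passes at *every* layer
        feat_ok = np.ones(n_features, dtype=bool)
        for m in range(M):
            layer_ok = np.zeros(n_features, dtype=bool)
            for g, members in enumerate(groups[m]):
                if passes[m][g]:
                    layer_ok[list(members)] = True
            feat_ok &= layer_ok
        # a group is kept if it passes and still contains a surviving feature
        sel = [[g for g, members in enumerate(groups[m])
                if passes[m][g] and feat_ok[list(members)].any()]
               for m in range(M)]
        return feat_ok, sel

    for _ in range(max_iter):  # k can only shrink, so the loop terminates
        feat_ok, sel = select(k)
        new_k = [len(sel[m]) for m in range(M)]
        if new_k == k:
            break
        k = new_k
    return np.flatnonzero(feat_ok), sel, k
```

The full Algorithm 4 replaces the group counts by penalty sums and the uniform threshold by the penalty-, prior-, and $\widehat{\pi}^{(m)}$-adjusted thresholds of Equations 16 and 17.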

4.2.7. The Generalized E-value Formula (Equation 3)

For example, the generalized e-values in FEFP are constructed as (Equation 3): $ e_g^{(m)} = G^{(m)} \cdot \frac{\mathbb{I}\left\{g \in \mathcal{G}^{(m)}(\alpha_0^{(m)})\right\}}{\widehat{V}_{\mathcal{G}^{(m)}\left(\alpha_0^{(m)}\right)} \vee \alpha_0^{(m)}}. $

  • $e_g^{(m)}$: The computed generalized e-value for group $g$ at resolution layer $m$.
  • $G^{(m)}$: The total number of groups at resolution layer $m$.
  • $\mathbb{I}\{\cdot\}$: An indicator function, equal to 1 if group $g$ is included in the selection set $\mathcal{G}^{(m)}(\alpha_0^{(m)})$ produced by the base detection procedure $\mathcal{G}^{(m)}$ at original FDR level $\alpha_0^{(m)}$, and 0 otherwise.
  • $\mathcal{G}^{(m)}(\alpha_0^{(m)})$: The set of groups selected at layer $m$ by the base detection procedure $\mathcal{G}^{(m)}$ when controlling FDR at its original level $\alpha_0^{(m)}$.
  • $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})}$: The estimated number of false discoveries (false positives) in the output of the base detection procedure at original FDR level $\alpha_0^{(m)}$.
  • $\vee\, \alpha_0^{(m)}$: The maximum of $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})}$ and $\alpha_0^{(m)}$. This term keeps the denominator from becoming too small (when $\widehat{V}_{\mathcal{G}^{(m)}(\alpha_0^{(m)})}$ is 0 or very small), which would otherwise produce an excessively large e-value; the conservative lower bound contributes to robust FDR control.
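
The construction is easy to implement given any base procedure's output; below is a short sketch (the function name and interface are ours, not the paper's).

```python
import numpy as np

def generalized_e_values(selected, v_hat, alpha0):
    """Generalized e-values in the style of Equation 3.

    selected: boolean array of length G_m, True for groups selected by the
              base detection procedure run at its original FDR level alpha0.
    v_hat:    the base procedure's estimate of its number of false discoveries.
    """
    G_m = len(selected)
    denom = max(v_hat, alpha0)  # the "∨ alpha0" guard against a tiny denominator
    return G_m * np.asarray(selected, dtype=float) / denom
```

Note that a single run yields only two possible values, 0 or $G^{(m)}/(\widehat{V} \vee \alpha_0^{(m)})$; this is precisely the "one-bit" structure that the stabilization step later smooths into continuous scores.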

5. Experimental Setup

The experimental setup focuses on two main types of evaluations: simulation studies to rigorously test the theoretical properties and performance under controlled conditions, and real-world data analysis (HIV mutation data) to demonstrate practical applicability and superiority.

5.1. Datasets

5.1.1. Simulated Data

The simulation studies are based on a linear model to generate synthetic data: $ \pmb{y} = \pmb{X} \beta + \pmb{\epsilon}, $ where:

  • $\pmb{y}$: The response vector.
  • $\pmb{X}$: The design matrix.
  • $\beta$: The true coefficient vector, indicating feature relevance.
  • $\pmb{\epsilon}$: Random noise, sampled from a normal distribution $\pmb{\epsilon} \sim N(\mathbf{0}, \mathcal{I}_n)$, where $\mathcal{I}_n$ is the $n \times n$ identity matrix.

Data Characteristics:

  • Multi-resolution Structure: Two layers are considered:
    1. Individual features ($N$ features).
    2. Groups ($G$ groups), with each group containing $N/G$ features.
  • Design Matrix ($\boldsymbol{X}$): Each row of $\boldsymbol{X}$ is independently sampled from $N(\mathbf{0}, \Sigma_\rho)$.
    • $\Sigma_\rho$: A block-diagonal matrix composed of $G$ Toeplitz submatrices.
      • Toeplitz Matrix: A matrix in which each descending diagonal from left to right is constant. This structure models autocorrelation, where elements closer together are more correlated.
      • Block-Diagonal Structure: The overall covariance matrix is composed of independent blocks, so features within a group are highly correlated (via the Toeplitz structure) while features in different groups have near-zero correlation. This is a realistic setup for many biological or genomic datasets.
      • The specific Toeplitz submatrix is $ \begin{bmatrix} 1 & \frac{(G'-2)\rho}{G'-1} & \frac{(G'-3)\rho}{G'-1} & \cdots & \frac{\rho}{G'-1} & 0 \\ \frac{(G'-2)\rho}{G'-1} & 1 & \frac{(G'-2)\rho}{G'-1} & \cdots & \frac{2\rho}{G'-1} & \frac{\rho}{G'-1} \\ \vdots & & \ddots & & & \vdots \\ 0 & \frac{\rho}{G'-1} & \frac{2\rho}{G'-1} & \cdots & \frac{(G'-2)\rho}{G'-1} & 1 \end{bmatrix}, $ where $G' = N/G$ is the group size and $\rho$ is the correlation parameter controlling the strength of within-group correlation.
  • Relevant Features ($\mathcal{H}_1$):
    1. $K$ groups are randomly selected as "signal groups".
    2. $|\mathcal{H}_1|$ relevant features are randomly selected from within these $K$ signal groups.
    3. On average, a relevant group contains $|\mathcal{H}_1|/K$ relevant features.
  • Signal Strength ($\delta$): For $j \in \mathcal{H}_1$, $\beta_j$ is sampled from $N(0, \delta\sqrt{\log N / n})$; $\delta$ controls how strong the signal of the relevant features is.

Simulation Parameters:

  • Low-dimensional settings: $n = 1600$, $N = 800$, $G = 80$ (so $G' = 10$ features per group), $|\mathcal{H}_1| = 60$, $K = 20$ (on average, 3 relevant features per signal group).
    • Correlation $\rho$: varied in $\{0, 0.2, 0.4, 0.6, 0.8\}$ (with $\delta = 3$ fixed).
    • Signal strength $\delta$: varied in $\{3, 4, 5, 6, 7\}$ (with $\rho = 0.6$ fixed).
  • High-dimensional settings: $n = 600$, $N = 800$, $G = 80$ (so $G' = 10$ features per group), $|\mathcal{H}_1| = 60$, $K = 20$.
    • Correlation $\rho$: varied in $\{0, 0.2, 0.4, 0.6, 0.8\}$ (with $\delta = 3$ fixed).
    • Signal strength $\delta$: varied in $\{3, 4, 5, 6, 7\}$ (with $\rho = 0.6$ fixed).
  • All simulation results are averaged over 50 independent trials; a data-generation sketch follows this list.
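
For concreteness, here is a minimal Python sketch of this data-generating process. All names are ours; we read $N(0, \delta\sqrt{\log N/n})$ as specifying the standard deviation of the signal coefficients, and we assume each Toeplitz block is positive definite so that its Cholesky factor exists.

```python
import numpy as np

def simulate_data(n=1600, N=800, G=80, K=20, n_signals=60, rho=0.6, delta=3.0, seed=0):
    """Sketch of the simulation design described above (parameter names are ours)."""
    rng = np.random.default_rng(seed)
    Gp = N // G  # group size G'
    # Toeplitz block: lag-d entry is (G'-1-d) * rho / (G'-1) for d >= 1, with 1 on
    # the diagonal and 0 at lag G'-1 (matching the row [1, (G'-2)rho/(G'-1), ..., 0])
    lag = np.abs(np.subtract.outer(np.arange(Gp), np.arange(Gp)))
    block = np.where(lag == 0, 1.0, np.maximum(Gp - 1 - lag, 0) * rho / (Gp - 1))
    L = np.linalg.cholesky(block)  # assumes the block is positive definite
    # block-diagonal covariance: correlated within groups, independent across groups
    X = np.hstack([rng.standard_normal((n, Gp)) @ L.T for _ in range(G)])
    # K random signal groups; |H_1| relevant features drawn from inside them
    signal_groups = rng.choice(G, size=K, replace=False)
    candidates = np.concatenate([np.arange(g * Gp, (g + 1) * Gp) for g in signal_groups])
    H1 = rng.choice(candidates, size=n_signals, replace=False)
    beta = np.zeros(N)
    beta[H1] = rng.normal(0.0, delta * np.sqrt(np.log(N) / n), size=n_signals)
    y = X @ beta + rng.standard_normal(n)  # epsilon ~ N(0, I_n)
    return X, y, beta, np.sort(H1)
```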

5.1.2. HIV Mutation Data

This dataset was previously analyzed in various studies [2, 15, 27, 35, 40] and contains information about HIV-1 mutations associated with drug resistance.

  • Domain: HIV-1 drug resistance.

  • Response Variable ($Y$): Log-fold increase of lab-tested drug resistance.

  • Features ($X_j$): Binary variables indicating the presence or absence of mutation $j$. Different mutations at the same location are treated as distinct features.

  • Multi-resolution Structure:

    1. Individual Mutations: The primary features.
    2. Mutation Positions: Mutations are naturally grouped based on their known genomic locations. These positions form the group layer.
  • Goal: Identify mutations and their clusters (positions) that influence drug resistance, controlling individual-FDR and group-FDR simultaneously.

  • Drug Classes:

    • Protease Inhibitors (PIs): APV, ATV, IDV, LPV, NFV, RTV, SQV.
    • Nucleoside Reverse Transcriptase Inhibitors (NRTIs): ABC, AZT, D4T, DDI.
  • Preprocessing: For each drug, rows lacking drug-resistance information are removed, and mutations appearing fewer than 3 times are excluded (see the sketch after this list).

  • Linear Model Assumption: Consistent with prior work, a linear model between response and features (no interaction terms) is assumed.

  • Ground Truth: The treatment-selected mutation (TSM) panels [34] are used as a reference standard for evaluating performance (approximation of ground truth).
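
The preprocessing is simple enough to state as code; the sketch below is ours, with hypothetical column names (one column per drug for the resistance measurement and binary `mut_*` columns for mutations).

```python
import pandas as pd

def preprocess(data: pd.DataFrame, drug: str, min_count: int = 3) -> pd.DataFrame:
    """Sketch of the stated preprocessing (column names are hypothetical).

    Keeps samples that have a resistance measurement for `drug`, then drops
    mutation indicators observed fewer than `min_count` times.
    """
    kept = data.dropna(subset=[drug])  # rows lacking resistance info are removed
    mutation_cols = [c for c in kept.columns if c.startswith("mut_")]
    rare = [c for c in mutation_cols if kept[c].sum() < min_count]
    return kept.drop(columns=rare)
```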

    The following are the results from [Table 1] of the original paper:

TABLE 1. Sample information for the seven PI-type drugs and the three NRTI-type drugs.

| Drug type | Drug | Sample size | # mutations | # positions genotyped |
| --- | --- | --- | --- | --- |
| PI | APV | 767 | 201 | 65 |
| PI | ATV | 328 | 147 | 60 |
| PI | IDV | 825 | 206 | 66 |
| PI | LPV | 515 | 184 | 65 |
| PI | NFV | 842 | 207 | 66 |
| PI | RTV | 793 | 205 | 65 |
| PI | SQV | 824 | 206 | 65 |
| NRTI | ABC | 623 | 283 | 105 |
| NRTI | AZT | 626 | 283 | 105 |
| NRTI | D4T | 625 | 281 | 104 |
| NRTI | DDI | 628 | 283 | 105 |
  • Sample size: Number of HIV-1 samples available for each drug.
  • # mutations: Total number of unique mutations considered as features after preprocessing.
  • # positions genotyped: Number of distinct genomic locations (groups) where mutations were observed.

5.2. Evaluation Metrics

The primary evaluation metrics used are False Discovery Proportion (FDP) and Power.

5.2.1. False Discovery Proportion (FDP)

  • Conceptual Definition: FDP is the proportion of false discoveries among all discoveries made in a given experiment. It directly measures the empirical performance of an FDR control procedure: if a method claims to control FDR at level $\alpha$, the observed FDP should ideally be close to or below $\alpha$.
  • Mathematical Formula: $ \mathrm{FDP} = \frac{\mathrm{V}}{\mathrm{R} \vee 1} $
  • Symbol Explanation:
    • $\mathrm{V}$: The number of false positives (false discoveries), i.e., true null hypotheses that were incorrectly rejected.

    • $\mathrm{R}$: The total number of rejections (discoveries) made by the method.

    • $\vee\, 1$: Denotes the maximum of $\mathrm{R}$ and 1. This ensures that if no discoveries are made ($R = 0$), the denominator is 1 and the FDP is correctly calculated as 0, preventing division by zero.

      In the context of multi-resolution testing, FDP is calculated separately at each layer $m$, as $\mathrm{FDP}^{(m)}$: for individual features (FDP (ind)) and for groups (FDP (grp)).

5.2.2. Power

  • Conceptual Definition: Power (or True Positive Rate) is the probability that a statistical test correctly rejects a false null hypothesis. In the context of feature selection, it refers to the proportion of truly relevant features (or groups) that are successfully detected by the method. Higher power indicates a more sensitive and effective method.
  • Mathematical Formula: $ \mathrm{Power} = \frac{\mathrm{TP}}{\mathrm{P}} $
  • Symbol Explanation:
    • $\mathrm{TP}$: The number of true positives, i.e., truly relevant features (or groups) that were correctly detected.

    • $\mathrm{P}$: The total number of truly relevant features (or groups) in the dataset (i.e., the size of $\mathcal{H}_1$ or $\mathcal{H}_1^{(m)}$).

      In the simulation studies, "Power" is often shown as the raw count of true positives, or implicitly as a measure of how many truly relevant items are discovered. For the HIV data, "True (ind)" and "True (grp)" are the numbers of true positives identified for individual mutations and mutation groups, respectively. A sketch computing both metrics appears below.
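
Both metrics are straightforward to compute from a selection set and the set of true signals; the helper below is our own illustration.

```python
def fdp_and_power(selected, relevant):
    """Empirical FDP and power for one trial.

    selected: iterable of indices rejected by the method (discoveries).
    relevant: iterable of truly relevant indices (the set H_1).
    """
    selected, relevant = set(selected), set(relevant)
    V = len(selected - relevant)   # false discoveries
    R = len(selected)              # total discoveries
    TP = len(selected & relevant)  # true positives
    fdp = V / max(R, 1)            # the "R ∨ 1" guard avoids division by zero
    power = TP / len(relevant) if relevant else 0.0
    return fdp, power
```

Applied per layer, this yields FDP (ind) and Power (ind) from the individual-level selections and FDP (grp) and Power (grp) from the group-level selections.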

5.3. Baselines

The proposed SFEFP methods (e.g., eDS-filter, eDS+gKF-filter) are compared against several established FDR control procedures, particularly those designed for multi-resolution settings or known to work well in specific contexts.

  • MKF+ [23] (Multilayer Knockoff Filter):

    • Description: An extension of the knockoff filter that controls FDR simultaneously at multiple resolutions (layers). It uses a default constant $c = 1$ (the variant referred to as MKF(1)+ or MKF+) to balance conservatism and power.
    • Why representative: It's a direct competitor for multilayer FDR control.
  • e-MKF [18]:

    • Description: An e-filter based method that leverages one-bit knockoff e-values for multilayer FDR control. It is an extension of MKF using the e-value framework.
    • Why representative: It is the direct e-value-based counterpart to MKF+ and a predecessor of SFEFP in using e-values for multilayer control, highlighting the one-bit issue that SFEFP addresses.
  • KF+ (Knockoff Filter+) [2]:

    • Description: The original knockoff filter for single-resolution FDR control of individual features. The + denotes the knockoff+ variant, which adds 1 to the numerator of the data-dependent threshold and thereby guarantees exact FDR control (the plain knockoff filter controls only a modified FDR).
    • Why representative: Included to illustrate the necessity of multilayer FDR control. It is expected to control individual FDR but not necessarily group FDR, or FDR across both simultaneously.
  • SFEFP variants (proposed methods):

    • eDS-filter: An instantiation of SFEFP where the base detection procedure for all layers is the DS (Data Splitting) method, extended for group detection.
    • eDS+gKF-filter: An instantiation of SFEFP where DS is used for the individual feature layer, and group knockoff is used for the group layers.
    • KF+gDS: An instantiation of SFEFP where the knockoff filter is used for the individual feature layer and DS is used for the group layer. This combination is not explicitly detailed but is implied as symmetric to eDS+gKF.
  • FEFP (single replication) variants (denoted with * prefix):

    • *e-MKF, *eDS-filter, *eDS+gKF, *KF+gDS: These denote the FEFP versions (i.e., SFEFP with $R = 1$ replication). They are crucial for demonstrating the power enhancement brought by stabilization (derandomization) in SFEFP compared to FEFP, which suffers from the one-bit dilemma. For example, *e-MKF is exactly the e-MKF method proposed by Gablenz et al. [18].

Specifics for Base Procedures:

  • Fixed-design (group) knockoffs: Used for the low-dimensional simulations, typically with a signed-max function as the test statistic.
  • Model-X (group) knockoffs: Used for the high-dimensional simulations, implemented using the knockoffs R package [12].
  • DS procedure (individual selection): Implemented using the Lasso+OLS procedure, as described in [15].
  • DS procedure (group selection): The test statistic $T_g^{(m)}$ (Equation 10) is computed by averaging the $W_j$ statistics obtained from the Lasso+OLS procedure, as sketched below.
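
The group-level statistic is just a within-group average of the per-feature mirror statistics; a minimal illustration (our naming):

```python
import numpy as np

def group_statistics(W, group_of_feature, num_groups):
    """Group DS statistic in the spirit of Equation 10: average the per-feature
    statistics W_j over the features j belonging to each group g.

    W:                array of length N with per-feature (mirror) statistics.
    group_of_feature: array of length N mapping each feature to its group id.
    """
    W = np.asarray(W, dtype=float)
    group_of_feature = np.asarray(group_of_feature)
    return np.array([W[group_of_feature == g].mean() for g in range(num_groups)])
```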

FDR Levels:

  • Target FDR levels: Set to $\alpha^{(1)} = \alpha^{(2)} = 0.2$ for all methods across all simulations.
  • Original FDR levels (for SFEFP methods): Set to $\alpha_0^{(m)} = \alpha^{(m)}/2 = 0.1$ for $m \in [M]$.
  • Number of replications (for SFEFP): $R = 50$ for the simulations and $R = 100$ for the HIV data; a sketch of how these replications are aggregated follows.
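
The stabilization step averages the generalized e-values obtained from $R$ independent replications of the base procedures, using the uniform weights $\omega_r^{(m)} = 1/R$ (Equation 7), before they enter the e-filter. A minimal sketch under our own interface:

```python
import numpy as np

def stabilized_e_values(run_base_procedure, R, alpha0):
    """Average generalized e-values over R replications (uniform weights 1/R).

    run_base_procedure: callable returning (selected, v_hat) for one randomized
                        run of the base procedure at original FDR level alpha0,
                        where `selected` is a boolean array over the G_m groups.
    """
    replicates = []
    for _ in range(R):
        selected, v_hat = run_base_procedure()
        G_m = len(selected)
        e = G_m * np.asarray(selected, dtype=float) / max(v_hat, alpha0)
        replicates.append(e)
    # continuous, information-rich scores instead of one-bit values
    return np.mean(replicates, axis=0)
```

Since an average of (relaxed) e-values is again a (relaxed) e-value, the e-filter's FDR guarantee is preserved while the averaged scores carry far richer ranking information.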

6. Results & Analysis

The experimental results are presented through simulation studies and a real-world HIV mutation data analysis. The key aspects evaluated are FDR control (measured by FDP) and Power.

6.1. Core Results Analysis

6.1.1. Simulation Results (Low-dimensional settings)

Settings: $n = 1600$, $N = 800$, $G = 80$ (10 features per group); $|\mathcal{H}_1| = 60$, $K = 20$. Target FDR $\alpha^{(m)} = 0.2$; original FDR $\alpha_0^{(m)} = 0.1$. Averaged over 50 trials.

The following figure (Figure 1 from the original paper) shows the simulation results for the methods under different correlations, with the signal strength $\delta$ fixed at 3:

FIG 1. Simulation results for the methods MKF+, e-MKF, eDS-filter, eDS+gKF, KF+gDS, *e-MKF, *eDS-filter, *eDS+gKF, and *KF+gDS under different correlations, with the signal strength δ fixed at 3. The figure has four panels plotted against the correlation: individual-level power (top left), individual-level FDR (top right), group-level power (bottom left), and group-level FDR (bottom right).

  • Analysis of Figure 1 (varying correlation $\rho$, fixed $\delta = 3$):

    • FDR Control (Individual and Group): All methods generally keep the FDP below the target level of 0.2, except for KF+gDS, which shows a slightly inflated FDP for individual features at higher correlations.

    • Power under High Correlations:

      • eDS-filter and eDS+gKF (both leveraging the DS method) consistently demonstrate higher power (more true discoveries) at both the individual and group layers, especially as the correlation $\rho$ increases. This highlights the effectiveness of DS in handling highly correlated features.
      • MKF+, e-MKF, and KF+gDS (which rely on knockoffs) show a noticeable drop in power as $\rho$ increases, confirming the known limitation of knockoff procedures in highly correlated settings.
    • Impact of Stabilization (SFEFP vs. FEFP, i.e., solid vs. dashed lines): SFEFP methods (solid lines) consistently show higher power than their FEFP counterparts (dashed lines) across almost all correlation levels. This clearly demonstrates that the stabilization step (averaging e-values over $R = 50$ replications) is effective in enhancing detection power and overcoming the one-bit limitation of FEFP.

      The following figure (Figure 2 from the original paper) shows the simulation results for the nine methods under different signal strengths, with the correlation $\rho$ fixed at 0.6:

      FIG 2. Simulation results for nine methods under different signal strengths, with the correlation fixed at 0.6. The four panels show individual-level power (top left), individual-level FDR (top right), group-level power (bottom left), and group-level FDR (bottom right), each plotted against the signal strength δ.

  • Analysis of Figure 2 (varying signal strength $\delta$, fixed $\rho = 0.6$):

    • FDR Control: All methods maintain FDR control well below 0.2 at both the individual and group levels across all signal strengths.
    • Power Enhancement with $\delta$: As expected, the power of every method increases with the signal strength $\delta$.
    • Superiority of eDS-filter and eDS+gKF: eDS-filter and eDS+gKF continue to show superior power compared to MKF+, e-MKF, and KF+gDS, confirming that methods incorporating DS are more effective even when signals are strong, especially in correlated environments.
    • Stabilization Benefit: The solid lines (SFEFP) again outperform their dashed counterparts (FEFP with $R = 1$), indicating that stabilization consistently improves power regardless of signal strength.

6.1.2. Simulation Results (High-dimensional settings)

Settings: $n = 600$, $N = 800$, $G = 80$. Target FDR $\alpha^{(m)} = 0.2$; original FDR $\alpha_0^{(m)} = 0.1$. Averaged over 50 trials.

The following figure (Figure 3 from the original paper) shows the simulation results for the high-dimensional setting under different correlations with $\delta = 3$:

FIG 3. Simulation results for the high-dimensional setting under different correlations, with δ = 3. The top panels show individual-level power and FDR; the bottom panels show group-level power and FDR, comparing the same set of methods.

  • Analysis of Figure 3 (varying correlation $\rho$, fixed $\delta = 3$, high dimensions):

    • The patterns observed in the low-dimensional setting (Figure 1) are largely replicated here.

    • eDS-filter and eDS+gKF maintain their power advantage in the high-dimensional setting, particularly at higher correlations.

    • The benefit of stabilization (solid vs. dashed lines) is likewise consistent, again showing a power enhancement.

      The following figure (Figure 4 from the original paper) shows the simulation results for the high-dimensional setting under different signal strengths with $\rho = 0.6$:

      FIG 4. Simulation results for the high-dimensional setting under different signal strengths, with ρ = 0.6. The four panels show individual-level power (top left), individual-level FDR (top right), group-level power (bottom left), and group-level FDR (bottom right).

  • Analysis of Figure 4 (varying signal strength $\delta$, fixed $\rho = 0.6$, high dimensions):

    • As in the low-dimensional setting (Figure 2), eDS-filter and eDS+gKF show superior power across all signal strengths.

    • Stabilization continues to provide a clear power boost.

      Overall Simulation Conclusion: The simulation studies strongly validate the two main advantages of SFEFP:

  1. Flexibility for Enhanced Power: By allowing different base detection procedures (DS vs. knockoffs) at different resolutions, SFEFP methods such as eDS-filter and eDS+gKF can leverage the strengths of each procedure, leading to significantly higher power, especially in settings with high feature correlation.
  2. Stabilization for Enhanced Power: The stabilization step (averaging e-values over multiple replications) consistently improves detection power across all settings (correlation, signal strength, dimensionality) by resolving the one-bit dilemma and providing more informative e-values.

6.1.3. HIV Mutation Data Analysis

Goal: Identify important individual mutations and their genomic positions (clusters) associated with drug resistance, controlling FDR at the individual and group levels simultaneously. Methods compared: KF+, MKF+, e-MKF, and eDS-filter. Reference standard: treatment-selected mutation (TSM) panels [34] serve as an approximate ground truth. Target FDR levels: $\alpha^{(1)} = \alpha^{(2)} = 0.3$ (slightly relaxed for real-data analysis, following the practical recommendation of [23]). Replications for the eDS-filter: $R = 100$. Original FDR levels for the eDS-filter: $\alpha_0^{(m)} = \alpha^{(m)}/2 = 0.15$ for $m = 1, 2$.

The following are the results from [Table 2] of the original paper:

TABLE 2. Results for the PI drugs. "True" denotes the number of true positives, i.e., discoveries identified in the TSM panel for the PI class of treatments; "False" denotes the number of false positives. The FDP is calculated as the ratio of the number of false positives to the total number of positives. The target FDR levels are $\alpha^{(1)} = \alpha^{(2)} = 0.3$; for the eDS-filter, $R = 100$ and $\alpha_0^{(m)} = \alpha^{(m)}/2$ for $m = 1, 2$. The best-performing method is highlighted in bold in the original.

| Drug | Method | True (ind) | False (ind) | FDP (ind) | True (grp) | False (grp) | FDP (grp) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| APV | KF+ | 27 | 9 | 0.250 | 18 | 7 | 0.280 |
| APV | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| APV | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| APV | eDS-filter | 27 | 4 | 0.129 | 18 | 2 | 0.100 |
| ATV | KF+ | 19 | 6 | 0.240 | 19 | 1 | 0.050 |
| ATV | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| ATV | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| ATV | eDS-filter | 18 | 1 | 0.053 | 18 | 0 | 0 |
| IDV | KF+ | 34 | 33 | 0.493 | 24 | 15 | 0.385 |
| IDV | MKF+ | 26 | 3 | 0.103 | 17 | 0 | 0 |
| IDV | e-MKF | 26 | 4 | 0.133 | 18 | 0 | 0 |
| IDV | eDS-filter | 27 | 3 | 0.100 | 18 | 0 | 0 |
| LPV | KF+ | 27 | 8 | 0.229 | 20 | 3 | 0.130 |
| LPV | MKF+ | 19 | 3 | 0.136 | 13 | 1 | 0.071 |
| LPV | e-MKF | 19 | 3 | 0.136 | 13 | 1 | 0.071 |
| LPV | eDS-filter | 23 | 3 | 0.115 | 15 | 0 | 0 |
| NFV | KF+ | 33 | 22 | 0.400 | 24 | 8 | 0.250 |
| NFV | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| NFV | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| NFV | eDS-filter | 32 | 8 | 0.200 | 20 | 2 | 0.091 |
| RTV | KF+ | 19 | 5 | 0.208 | 12 | 2 | 0.143 |
| RTV | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| RTV | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| RTV | eDS-filter | 25 | 7 | 0.219 | 17 | 2 | 0.105 |
| SQV | KF+ | 22 | 6 | 0.214 | 16 | 2 | 0.111 |
| SQV | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| SQV | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| SQV | eDS-filter | 22 | 4 | 0.154 | 15 | 0 | 0 |

The following are the results from [Table 3] of the original paper:

TABLE 3. Results for the NRTI drugs. The target FDR levels are $\alpha^{(1)} = \alpha^{(2)} = 0.3$; for the eDS-filter, we set $R = 100$. The best-performing method is highlighted in bold in the original.

| Drug | Method | True (ind) | False (ind) | FDP (ind) | True (grp) | False (grp) | FDP (grp) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ABC | KF+ | 14 | 0 | 0.176 | 14 | 2 | 0.125 |
| ABC | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| ABC | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| ABC | eDS-filter | 13 | 0 | 0.133 | 12 | 2 | 0.143 |
| AZT | KF+ | 17 | 0 | 0.320 | 16 | 5 | 0.238 |
| AZT | MKF+ | 11 | 0 | 0 | 10 | 0 | 0 |
| AZT | e-MKF | 11 | 0 | 0 | 10 | 0 | 0 |
| AZT | eDS-filter | 15 | 1 | 0.063 | 14 | 0 | 0 |
| D4T | KF+ | 10 | 1 | 0 | 9 | 1 | 0.100 |
| D4T | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| D4T | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| D4T | eDS-filter | 18 | 2 | 0.100 | 16 | 1 | 0.118 |
| DDI | KF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| DDI | MKF+ | 0 | 0 | 0 | 0 | 0 | 0 |
| DDI | e-MKF | 0 | 0 | 0 | 0 | 0 | 0 |
| DDI | eDS-filter | 18 | 4 | 0.182 | 17 | 0 | 0.150 |
  • Analysis of Tables 2 and 3 (HIV Mutation Data):
    • KF+ Performance: The KF+ method (single-resolution knockoff filter) often fails to control FDP simultaneously at both resolutions. For IDV, its FDP (ind) of 0.493 and FDP (grp) of 0.385 both exceed the target level of 0.3, and for NFV and AZT its FDP (ind) values of 0.400 and 0.320 also exceed the target. This highlights the necessity of multi-resolution FDR control.
    • MKF+ and e-MKF Performance: These methods frequently make zero discoveries (True (ind)=0, True (grp)=0) for several drugs (APV, ATV, NFV, RTV, SQV for PI; ABC, D4T, DDI for NRTI). This confirms the problem of conservatism or the zero-power dilemma that SFEFP aims to solve. When they do make discoveries (e.g., IDV, LPV), their power is often lower than eDS-filter.
    • eDS-filter (Proposed Method) Performance:
      • Consistent FDR Control: The eDS-filter consistently controls FDP at multiple resolutions (individual and group) simultaneously, with most FDP values at or below the target 0.3.

      • Superior Power: The eDS-filter generally achieves substantially higher power (more true positives) than MKF+ and e-MKF, especially for drugs where those methods found nothing. For APV, the eDS-filter finds 27 individual mutations and 18 groups while MKF+ and e-MKF find 0; for D4T, it finds 18 individual mutations and 16 groups while MKF+ and e-MKF again find 0.

      • Comparable or Higher Power than KF+ with Better FDP: Compared to KF+, the eDS-filter achieves similar or higher power (e.g., for RTV) with much better FDP control at both resolutions. For IDV, KF+ has an FDP (ind) of 0.493 and an FDP (grp) of 0.385, while the eDS-filter attains 0.100 and 0, respectively.

      • Comparison with DeepPINK: The paper also mentions (in text) that eDS-filter achieved higher power than DeepPINK [27] (a method for individual FDR control) for all drugs except ATV, and lower FDP for several drugs.

        Overall Conclusion from Real Data: The analysis of HIV mutation data strongly supports the advantages of eDS-filter as an instantiation of SFEFP. It demonstrates superior power and robust FDR control at multiple resolutions compared to existing multilayer knockoff and e-filter methods, highlighting the practical benefits of the proposed framework. The flexibility of SFEFP to incorporate powerful base procedures like DS for specific data characteristics (e.g., correlated mutations) is crucial for achieving these results.

6.2. Ablation Studies / Parameter Analysis

While the paper does not present explicit "ablation studies" in the traditional sense (removing components of SFEFP one by one), the comparison between SFEFP (solid lines) and FEFP (dashed lines, equivalent to SFEFP with $R = 1$ replication) in the simulation results serves as a direct analysis of the effect of the stabilization step.

  • Impact of Stabilization (R > 1 vs. R = 1):

    • Observation: Across all simulation figures (Figures 1-4), the SFEFP methods (e.g., eDS-filter, eDS+gKF, e-MKF, and KF+gDS with $R = 50$) consistently show higher power than their FEFP counterparts (prefixed with *, i.e., $R = 1$).
    • Interpretation: This directly demonstrates the effectiveness of the stabilization treatment. The averaging of generalized e-values over multiple runs (derandomization) transforms the "one-bit" e-values into continuous, more informative scores. This effectively addresses the zero-power dilemma and enhances the ranking information, leading to a more powerful detection of true signals without compromising FDR control. This finding is particularly significant because derandomization in single-resolution settings is sometimes associated with a power loss, whereas in multi-resolution settings, SFEFP shows a power gain.
  • Impact of Base Detection Procedure (e.g., DS vs. Knockoffs):

    • Observation: Comparing the eDS-filter (DS-based) with e-MKF (knockoff-based) or KF+gDS (knockoffs on layer 1) clearly shows that DS-based approaches yield significantly higher power when features are highly correlated (Figures 1 and 3).
    • Interpretation: This validates the flexibility principle of SFEFP. By allowing practitioners to choose the most suitable base detection procedure for each layer (e.g., DS for highly correlated features, knockoffs for other settings), SFEFP can achieve superior performance tailored to the data's characteristics. This is a form of "component analysis" showing the benefit of SFEFP's modularity.
  • Impact of the Original FDR Level ($\alpha_0^{(m)}$):

    • The paper notes that the choice of $\alpha_0^{(m)}$ affects the number and magnitude of the non-zero generalized e-values. In multi-resolution settings, $\alpha_0^{(m)} = \alpha^{(m)}$ may be suboptimal for FEFP (because of the one-bit dilemma). While no dedicated ablation on $\alpha_0^{(m)}$ is presented, the discussion implies that $\alpha_0^{(m)} = \alpha^{(m)}/2$ (used in the simulations) is a reasonable practical choice for SFEFP.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper introduces SFEFP (Stabilized Flexible E-Filter Procedure), a novel and robust framework for simultaneously detecting significant features and feature groups while rigorously controlling the False Discovery Rate (FDR) at multiple resolutions. SFEFP addresses critical limitations of existing methods, such as the conservatism of the multilayer knockoff filter (MKF) and the zero-power dilemma of e-filter procedures that rely on one-bit e-values.

The core innovation lies in:

  1. Generalized E-values: A unified construction that allows SFEFP to incorporate a wide variety of base detection procedures (e.g., knockoffs, data splitting (DS), Gaussian Mirror (GM), Symmetry-based Adaptive Selection (SAS)) at different resolutions, enabling practitioners to select the most effective method for each specific context.

  2. Generalized E-filter: A principled procedure that leverages these generalized e-values to make coherent selections across multiple layers while guaranteeing FDR control.

  3. Stabilization Treatment: This crucial step involves averaging generalized e-values obtained from multiple replications. It transforms binary ("one-bit") e-values into continuous scores, thereby providing richer ranking information. This stabilization effectively circumvents the zero-power dilemma and consistently enhances detection power and stability.

    Theoretical results underpin SFEFP with guarantees for multilayer FDR control and stability (convergence of the selection set with increasing replications). Practical instantiations, such as the eDS-filter and eDS+gKF-filter, demonstrate the framework's versatility. Simulation studies show that eDS-filter effectively controls FDR while maintaining or surpassing the power of MKF and e-MKF, especially in settings with high feature correlation. This superiority is further confirmed through the analysis of HIV mutation data.

In essence, SFEFP provides a powerful, flexible, and stable meta-methodology that can adapt to diverse data structures and leverages the strengths of various state-of-the-art FDR control techniques, thereby improving discovery potential in complex multi-resolution analyses.

7.2. Limitations & Future Work

The authors acknowledge several areas for further investigation and improvement:

  1. Impact of the Original FDR Level ($\alpha_0^{(m)}$) on Power: While the paper discusses the practical choice of $\alpha_0^{(m)}$, a deeper theoretical understanding of its optimal setting and its impact on the power of FEFP and SFEFP is needed.
  2. Sharper FDR Bounds: Simulation results often show that SFEFP achieves empirically lower FDR than the preset nominal level. This suggests that the current theoretical FDR bounds might be conservative. Exploring milder conditions to derive sharper FDR bounds could potentially lead to even more powerful variants of SFEFP.
  3. Adaptive Weights for Replications: The current SFEFP uses uniform weights ($\omega_r^{(m)} = 1/R$) when averaging generalized e-values across replications. Developing data-driven or adaptive weighting schemes could potentially improve the overall reliability and performance of the results.
  4. Integration of Enhanced E-values: Techniques from other works focused on enhancing e-values (e.g., [9, 24]) could be incorporated into SFEFP to further boost its power.

7.3. Personal Insights & Critique

Inspirations

  • The Power of Flexibility: The most significant inspiration from this paper is its emphasis on flexibility. Instead of developing yet another specific FDR control method, it provides a unifying framework (SFEFP) that allows researchers to plug in the best available method for each specific data layer or characteristic. This modularity is extremely valuable, as no single FDR control method is universally optimal. This approach fosters innovation by allowing the framework to benefit from future advancements in base detection procedures.
  • Addressing the "One-bit" Dilemma: The clear identification and effective solution to the "one-bit" problem in e-filter procedures are insightful. The stabilization treatment, by generating continuous e-values from multiple runs, is a clever way to introduce nuance and resolve conflicts across layers, transforming a potential "zero-power disaster" into a consistent power gain. This highlights the importance of carefully considering the implications of binary decision rules in complex inference tasks.
  • Bridging Theory and Practice: The paper successfully bridges theoretical FDR control with practical considerations. By integrating powerful methods like DS (known for handling high correlations) into SFEFP, it demonstrates how theoretical guarantees can be combined with domain-specific effectiveness. The discussion on relaxing target FDR levels in real-world scenarios also shows a pragmatic understanding of applied statistics.

Potential Issues, Unverified Assumptions, or Areas for Improvement

  • Computational Cost of Stabilization: While SFEFP offers significant advantages, running the base detection procedures $R$ times can be computationally intensive, especially for complex base methods or large datasets. The paper could elaborate more on the practical computational overhead for different choices of $R$ and on how to reduce it (e.g., parallelization or early stopping of replications).

  • Choice of $\alpha_0^{(m)}$: The paper admits that the choice of the original FDR level $\alpha_0^{(m)}$ is important and that its optimal setting requires further theoretical investigation. While a default of $\alpha_0^{(m)} = \alpha^{(m)}/2$ is suggested, this parameter might be critical for maximizing power in specific scenarios; a more data-driven or theoretically justified method for choosing $\alpha_0^{(m)}$ would be highly beneficial.

  • Interpretation of Averaged E-values: While averaging e-values (Equation 7) intuitively makes sense for stabilization, a deeper theoretical exploration of the properties of these averaged generalized e-values (e.g., how "tight" they remain, their precise distributional characteristics) could provide more insights into why SFEFP gains power rather than losing it, as sometimes seen in single-resolution derandomization.

  • Scalability to Many Layers: The framework is designed for $M$ layers. While $M = 2$ or 3 is common, the complexity of iteratively updating the thresholds (Algorithms 2 and 3) may grow with a very large $M$; exploring the convergence properties and computational efficiency for a large number of layers would be valuable.

  • Generalizability of Assumptions for Base Procedures: The paper demonstrates that several methods (DS, GM, SAS) satisfy Definition 1 and their e-values are relaxed or asymptotic relaxed e-values. Ensuring that other novel FDR procedures (especially those with complex dependence structures or non-standard nulls) also fit these definitions might require careful verification by practitioners.

    Overall, SFEFP represents a significant step forward in multi-resolution FDR control, offering a highly adaptable and powerful framework. The future work suggested by the authors, particularly regarding optimal parameter choices and deeper theoretical understanding of e-value aggregation, will further strengthen its utility and impact.
