Specificity, length and luck drive gene rankings in association studies
TL;DR Summary
This study analyzes association studies of 209 quantitative traits, revealing systematic differences in gene prioritization between GWAS and rare variant burden tests. It proposes prioritization criteria based on trait importance and trait specificity, and shows that the two methods capture distinct aspects of trait biology.
Abstract
Standard genome-wide association studies (GWAS) and rare variant burden tests are essential tools for identifying trait-relevant genes. By analyzing association studies of 209 quantitative traits in the UK Biobank, we show that they systematically prioritize different genes. We propose prioritization criteria based on trait importance and trait specificity and find that GWAS prioritize genes near trait-specific variants, while burden tests prioritize trait-specific genes, revealing differences in trait biology and implications for interpretation and usage.
1. Bibliographic Information
1.1. Title
Specificity, length and luck drive gene rankings in association studies
1.2. Authors
Jeffrey P. Spence, Hakhamanesh Mostafavi, Mineto Ota, Nikhil Milind, Tamara Gjorgjieva, Courtney J. Smith, Yuval B. Simons, Guy Sella & Jonathan K. Pritchard. The corresponding authors are Jeffrey P. Spence, Hakhamanesh Mostafavi, Mineto Ota, and Jonathan K. Pritchard.
1.3. Journal/Conference
Nature. Nature is one of the world's most prestigious and highly-cited multidisciplinary scientific journals, known for publishing groundbreaking research across all fields of science and technology. Its reputation ensures rigorous peer review and high impact in the scientific community.
1.4. Publication Year
2025 (Published online: 05 November 2025).
1.5. Abstract
Standard genome-wide association studies (GWAS) and rare variant burden tests are fundamental tools for identifying genes relevant to specific traits. By analyzing association studies of 209 quantitative traits in the UK Biobank, the authors demonstrate that these two methods systematically prioritize different genes. To address this, they propose prioritization criteria based on trait importance (how much a gene quantitatively affects a trait) and trait specificity (the importance of a gene for the studied trait relative to its importance across all traits). Their findings indicate that GWAS prioritize genes near trait-specific variants, while burden tests prioritize trait-specific genes. This distinction reveals differences in the underlying trait biology and carries significant implications for the interpretation and practical application of association studies.
1.6. Original Source Link
https://doi.org/10.1038/s41586-025-09703-7 The paper is officially published online in Nature.
2. Executive Summary
2.1. Background & Motivation
The central goal of human genetics is to identify genes that influence traits and disease risk and to understand the extent of their effects. This knowledge is crucial for deciphering biological processes underlying trait variation, identifying critical genes and pathways, and discovering potential therapeutic targets.
The core problem the paper addresses is the observed discrepancy in gene prioritization between two essential tools in human genetics: Genome-Wide Association Studies (GWAS) and rare variant burden tests. While conceptually similar, previous anecdotal evidence and a systematic analysis by Weiner et al. (2023) suggested that these methods often identify distinct sets of genes, even with some overlap. This raises critical questions:
- How do these methods prioritize genes?
- What underlying biological principles drive these differences?
- Which method is more relevant for understanding trait biology or for downstream applications like drug discovery?
Existing challenges in interpreting these studies include:
- GWAS do not directly pinpoint causal genes, as most associated variants are non-coding.
- A large fraction of the genome contributes to heritability, and trait-associated variants often cannot be mapped to genes with clear phenotypic relevance.
- Rare protein-coding variants, crucial for direct gene study, are often excluded or underpowered in standard GWAS but are the focus of burden tests.

The paper's innovative idea is to propose two distinct criteria for ideal gene prioritization, trait importance and trait specificity, and then to use population genetics models and empirical data from the UK Biobank to understand how GWAS and LoF burden tests align with these criteria, and what non-biological factors might also influence their rankings.
2.2. Main Contributions / Findings
The paper makes several primary contributions and key findings:
- Systematic Quantification of Differences: The study systematically confirms that GWAS and LoF burden tests prioritize different genes for the same traits, even after conservatively accounting for power differences and issues in linking variants to genes.
- Proposed Prioritization Criteria: It introduces two conceptually distinct criteria for ideal gene prioritization:
  - Trait Importance: How much a gene quantitatively affects a trait.
  - Trait Specificity: The importance of a gene for the trait under study relative to its importance across all traits.
- Mechanisms of Prioritization:
  - Burden Tests Prioritize Trait-Specific Genes: LoF burden tests tend to prioritize genes by their trait specificity ($\Psi_G$) rather than their trait importance. This is because the strength of selection against Loss-of-Function (LoF) variants, which determines their aggregate frequency, is proportional to the total effect across all fitness-relevant traits.
  - GWAS Prioritize Trait-Specific Variants: GWAS prioritize trait-specific variants ($\Psi_V$). Variants can achieve specificity in two ways: by acting through a trait-specific gene or by having context-specific effects on a pleiotropic gene (e.g., regulating expression only in trait-relevant cell types).
- Role of Non-Coding Variants: The difference between LoF burden tests and GWAS is largely driven by GWAS including non-coding variants, which can have context-specific effects and thus prioritize pleiotropic genes in a trait-specific manner, a capability burden tests generally lack.
- Impact of Trait-Irrelevant Factors:
  - Gene Length (for Burden Tests): LoF burden tests systematically prioritize longer genes, as more potential LoF positions lead to greater power, irrespective of the gene's true trait importance.
  - Genetic Drift (for GWAS): Random genetic drift causes minor allele frequencies (MAFs) to vary widely around their expected values. This stochasticity significantly influences GWAS rankings and makes GWAS hits appear more pleiotropic than they truly are, because variants that drifted to higher frequency have increased power for multiple traits.
- Method for Estimating Trait Importance: The paper suggests that non-standard GWAS approaches that aggregate signals across different types of variants (e.g., using AMM) can better estimate trait importance than standard P-value rankings, overcoming the flattening effect where highly important genes are harder to detect due to stronger purifying selection.
- Implications: The findings underscore that LoF burden tests and GWAS reveal distinct but complementary aspects of trait biology. Understanding these differences is crucial for accurate interpretation, target discovery (e.g., trait-specific genes as drug targets to minimize side effects), and improving future association studies.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
- Genome-Wide Association Studies (GWAS): A research approach that scans markers across the complete sets of DNA (genomes) of many people, looking for genetic variations associated with a particular disease or trait.
  - How it works: Researchers collect DNA from many individuals (often thousands to hundreds of thousands). They then analyze single nucleotide polymorphisms (SNPs), which are common genetic variations where a single nucleotide in the genome differs between individuals. For each SNP, they compare allele frequencies (the proportion of chromosomes in a group that carry a specific allele) between groups (e.g., people with a disease vs. healthy controls) or correlate allele dosage with quantitative traits.
  - Output: GWAS typically generate P-values for millions of SNPs, indicating the statistical significance of their association with the trait. Effect sizes (e.g., beta coefficients) describe the magnitude and direction of the association. Genome-wide significant hits are SNPs with very small P-values (typically $P < 5 \times 10^{-8}$), suggesting a strong association.
- Rare Variant Burden Tests: A statistical method used to identify genes associated with complex traits or diseases by aggregating the effects of multiple rare variants within a specific gene.
  - How it works: Instead of testing individual SNPs as GWAS do (an approach that is underpowered for rare variants), burden tests group rare variants (typically those with minor allele frequency (MAF) less than 1%) within a gene. These variants are often Loss-of-Function (LoF) or damaging missense variants. The aggregated presence of these rare variants in an individual creates a burden genotype, which is then tested for association with the phenotype. By aggregating rare variants rather than testing each one separately, the method boosts statistical power.
  - Output: Gene-level P-values and effect sizes for the aggregated rare variants within each gene.
- UK Biobank: A large-scale biomedical database and research resource containing in-depth genetic and health information from half a million UK participants. It is a critical resource for GWAS and rare variant burden test analyses due to its large sample size and extensive phenotypic data.
- Quantitative Traits: Traits that show continuous variation (e.g., height, blood pressure, body mass index) rather than discrete categories. Their variation is typically influenced by multiple genes and environmental factors.
- Single Nucleotide Polymorphism (SNP): A variation in a single nucleotide that occurs at a specific position in the genome, where the nucleotide (A, C, G, or T) at that position can differ between individuals. SNPs are the most common type of genetic variation among people.
- Loss-of-Function (LoF) Variants: Genetic variants (mutations) that are predicted to cause a complete or partial loss of function of the gene product (e.g., protein). These can include nonsense mutations (introducing a premature stop codon), frameshift mutations (altering the reading frame), or splice site mutations (affecting RNA splicing).
- P-value: In hypothesis testing, the P-value is the probability of observing a test statistic at least as extreme as the one observed if the null hypothesis were true. A small P-value (typically less than 0.05, or less than $5 \times 10^{-8}$ for GWAS) suggests that the observed data are unlikely under the null hypothesis, leading to its rejection and supporting the alternative hypothesis (e.g., an association exists).
- Heritability: In genetics, heritability refers to the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals. It estimates how much of the differences between people for a trait are due to genes, as opposed to environmental factors.
- Genetic Drift: The change in the frequency of an existing allele in a population due to random sampling of organisms. It is a random process, not driven by selection, and can cause alleles to become more or less common over generations, especially in small populations.
- Pleiotropy: The phenomenon where a single gene affects two or more seemingly unrelated phenotypic traits. A pleiotropic gene might have broad effects across multiple biological systems.
- Linkage Disequilibrium (LD): The non-random association of alleles at different loci (genomic positions). Alleles are in LD when they occur together more or less often than would be expected if the loci were independent. LD blocks are regions where SNPs are highly correlated.
- Trait Importance (proposed by paper): The quantitative effect a gene (or variant) has on the trait under study. Formally, for a variant, it is its squared effect on the trait of interest ($\alpha_1^2$); for a gene, it is the squared LoF burden effect size ($\gamma_1^2$).
- Trait Specificity (proposed by paper): The importance of a gene (or variant) for the trait under study relative to its importance across all fitness-relevant traits. Formally, for a variant, it is $\Psi_V = \alpha_1^2 / \sum_t \alpha_t^2$; for a gene, it is $\Psi_G = \gamma_1^2 / \sum_t \gamma_t^2$, where trait 1 is the trait under study.
- Minor Allele Frequency (MAF): The frequency at which the less common allele occurs in a given population. For rare variants, MAF is typically very low (e.g., <1%).
- Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq): A molecular biology technique used to assess chromatin accessibility across the genome. Accessible chromatin regions are often regulatory elements (e.g., enhancers, promoters) and indicate active gene regulation. ATAC peaks denote regions of open chromatin.
- S-LDSC (Stratified Linkage Disequilibrium Score Regression): A method used to partition the heritability of complex traits across different genomic annotations (e.g., gene bodies, enhancers, tissue-specific regulatory regions). It quantifies how much a given annotation contributes to heritability beyond what is expected by chance. The output represents the change in the proportion of heritability explained by a single variant when that variant is in a given annotation.
- MAGMA (Multi-marker Analysis of GenoMic Annotation): A gene-set analysis tool that uses GWAS summary statistics to calculate gene-level P-values and then tests for enrichment of these gene-level P-values in predefined gene sets. It aggregates SNP-level P-values within genes to obtain a gene-level score.
- PoPS (Polygenic Priority Score): A method that leverages polygenic enrichments of gene features (e.g., expression patterns, protein-protein interactions) to predict gene-level scores (like those from MAGMA) and prioritize genes underlying complex traits and diseases.
- AMM (Abstract Mediation Model): A statistical method designed to partition gene-mediated disease heritability from GWAS data without requiring eQTLs (expression quantitative trait loci). It estimates the total heritability contributed by variants acting via a given set of genes.
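To make the genetic drift and MAF entries above more concrete, the following is a minimal sketch of neutral drift under a Wright-Fisher model. It is not taken from the paper: the function name `wright_fisher`, the population size, the starting frequency, and the number of generations are all illustrative assumptions, chosen only to show that identical variants can end up at very different frequencies by chance alone.

```python
import random

def wright_fisher(p0, pop_size, generations, rng):
    """Return the allele frequency after neutral Wright-Fisher drift.

    Each generation, 2 * pop_size allele copies are drawn independently,
    each carrying the allele with probability equal to the current frequency.
    All parameter values used below are hypothetical.
    """
    p = p0
    n_copies = 2 * pop_size
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        if p == 0.0 or p == 1.0:  # allele lost or fixed: frequency no longer changes
            break
    return p

rng = random.Random(0)  # fixed seed so the run is repeatable
# 200 replicate variants that all start at the same frequency
final_freqs = [wright_fisher(0.1, 100, 50, rng) for _ in range(200)]
print(min(final_freqs), max(final_freqs))
```

Every replicate starts at the same frequency and experiences no selection, so any spread in `final_freqs` is due to random sampling alone.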
3.2. Previous Works
The paper builds upon and references several key prior studies:
- Weiner et al. (2023) - "Polygenic architecture of rare coding variation across 394,783 exomes.": This study systematically analyzed rare coding variants and found that burden heritability is explained by fewer genes compared to SNP heritability, and that burden tests tend to prioritize genes more closely related to trait biology. This observation of distinct gene sets identified by GWAS and burden tests forms a primary motivation for the current paper, which directly aims to explain the "why" behind the Weiner et al. findings.
- Simons et al. (2018) - "A population genetic interpretation of GWAS findings for human quantitative traits.": This work, and subsequent extensions (ref 3, 33, 82), developed population genetics models of complex traits, often assuming stabilizing selection. The current paper explicitly utilizes these models (Supplementary Appendix B) to derive theoretical predictions about how natural selection influences the power of GWAS and burden tests to prioritize variants and genes based on their effect sizes and frequencies. Specifically, the concept of flattening (where selection makes it harder to detect very large effect variants) is central to the work of Simons et al. and is directly incorporated here to explain the decoupling of z-scores from trait importance in burden tests.
- Backman et al. (ref 4): This is the source of the LoF burden test summary statistics used in the current study, highlighting the reliance on existing large-scale datasets.
- Finucane et al. (2015) - "Partitioning heritability by functional annotation using genome-wide association summary statistics." (ref 9): This paper introduced S-LDSC, a foundational method for partitioning heritability across genomic annotations. The current paper uses S-LDSC extensively to quantify the contribution of trait-specific variants (coding and non-coding) to heritability in GWAS, thus leveraging a widely accepted methodology for interpreting GWAS signals.
- Morgenthaler & Thilly (2007) - "A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST)." (ref 17): This is one of the earliest conceptual papers outlining the idea of burden tests for rare variants, demonstrating the historical foundation of the methods being analyzed.
3.3. Technological Evolution
The field of human genetics has evolved significantly:
- Early Linkage Studies: Focused on large families to map disease-causing genes, primarily for Mendelian diseases (single gene disorders).
- Candidate Gene Studies: Hypothesized specific genes and tested their association with traits, often with limited success for complex traits.
- Rise of GWAS (mid-2000s): Revolutionized complex trait genetics by enabling unbiased, genome-wide scans for common variants. This was driven by advancements in SNP genotyping arrays and large cohorts. GWAS revealed the polygenic nature of most complex traits (many genes, each with small effects) and highlighted the importance of non-coding regulatory regions.
- Post-GWAS Interpretation (late 2000s-present): The challenge shifted from finding GWAS hits to interpreting them. Methods like LDSC and S-LDSC emerged to link GWAS signals to functional annotations and tissues. Fine-mapping techniques aimed to pinpoint causal variants within LD regions.
- Whole-Exome/Whole-Genome Sequencing (early 2010s-present): The advent of affordable sequencing technologies enabled the study of rare variants, which were missed by GWAS. This led to the development and widespread application of rare variant burden tests, initially for severe Mendelian disorders and now increasingly for complex traits. Large biobanks like the UK Biobank provide the necessary sample sizes for these studies.
- Integration and Causal Inference (present): Current research, including this paper, focuses on integrating information from GWAS and burden tests, understanding their complementary nature, and moving towards causal inference and functional interpretation of genetic signals. Methods like AMM, MAGMA, and PoPS represent efforts to extract gene-level insights from SNP-level GWAS data.

This paper fits into the current era of integrating different genetic association approaches and providing a theoretical framework to understand their strengths and weaknesses in prioritizing genes for complex traits.
3.4. Differentiation Analysis
Compared to previous studies that anecdotally or systematically observed differences between GWAS and burden tests, this paper provides a novel theoretical and empirical framework to explain why these differences exist and how they relate to distinct biological properties of genes and variants.
- Novel Prioritization Criteria: The introduction of trait importance and trait specificity as explicit, formal criteria for ideal gene prioritization is a core innovation. Previous work might have implicitly considered these, but this paper defines them rigorously and uses them as a lens to analyze existing methods.
- Population Genetics Framework: The paper rigorously applies population genetics models (building on Simons et al.) to derive theoretical predictions for how natural selection shapes the observed P-values and z-scores in both GWAS and burden tests. This moves beyond purely statistical comparisons to a deeper biological explanation.
- Explanation for Prioritization Mechanisms:
  - It explicitly distinguishes that burden tests prioritize trait-specific genes (because LoF frequency is inversely related to the strength of selection, which itself sums across all trait effects), while GWAS prioritize trait-specific variants (which can include context-specific effects on pleiotropic genes). This distinction, especially the role of non-coding variants in allowing GWAS to capture pleiotropic genes in a trait-specific manner, is a key insight.
  - The paper uncovers and quantifies the impact of trait-irrelevant factors like gene length (for burden tests) and random genetic drift (for GWAS), which were previously less systematically understood as drivers of observed rankings.
- Proposing Solutions for Trait Importance: While previous work noted the difficulty in identifying trait-important genes from P-value rankings, this paper proposes and empirically tests methods (e.g., aggregating signals with AMM) that can better estimate trait importance by overcoming the flattening effect.

In essence, while others observed what was different, this paper provides a theoretical and empirical explanation for why these differences arise and how to potentially leverage or mitigate them.
4. Methodology
4.1. Principles
The core idea of the methodology is to understand the drivers behind gene prioritization in two major types of genetic association studies: Genome-Wide Association Studies (GWAS) and Loss-of-Function (LoF) burden tests. The theoretical basis hinges on integrating population genetics models of complex traits with statistical genetics, particularly how natural selection influences the allele frequencies and effect sizes of variants, and consequently, the power of association tests.
The paper posits two ideal criteria for prioritizing genes: trait importance (the magnitude of a gene's effect on the trait) and trait specificity (how unique that effect is to the trait under study compared to other traits). The methodology then involves:
- Theoretical Derivation: Using population genetics models to predict how the strength of association (z-score) in GWAS and burden tests is expected to relate to trait importance and trait specificity. This involves modeling the interplay between mutation rates, selection, and allele frequencies.
- Empirical Validation: Analyzing real GWAS and LoF burden test summary statistics from the UK Biobank for hundreds of quantitative traits to test these theoretical predictions. This includes comparing rankings, examining heritability enrichment in tissue-specific annotations, and investigating the influence of gene length and minor allele frequency (MAF).
- Simulation Studies: Using simulated data to further explore the effects of genetic drift on GWAS variant rankings and apparent pleiotropy.
- Proposing Improved Estimation: Investigating whether alternative approaches, such as aggregating signals across variants (e.g., using AMM), can better estimate trait importance compared to P-value based rankings.

The intuition is that if a gene or variant has a large effect on a trait (high trait importance), it might also affect many other traits (high pleiotropy). Natural selection tends to remove variants with large, negative effects across many traits, leading to lower frequencies. This interplay, coupled with technical aspects of each assay, dictates which genes rise to the top of association study rankings.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Data Acquisition and Preprocessing
- GWAS Summary Statistics:
  - Downloaded from the Neale Lab (http://www.nealelab.is/uk-biobank/; v3) for 305 continuous traits.
  - These regressions were performed on inverse rank normal-transformed phenotypes in approximately 360,000 UK Biobank individuals.
  - Covariates included age, age$^2$, inferred sex, age $\times$ inferred sex, age$^2$ $\times$ inferred sex, and principal components 1-20.
  - The genome-wide significance threshold was set at $P < 5 \times 10^{-8}$.
- LoF Burden Test Summary Statistics:
  - Downloaded from Backman et al. (ref 4) for 292 LoF burden tests.
  - 209 traits overlapped with the GWAS data (Supplementary Table 1).
  - Burden genotypes were calculated by categorizing individuals as homozygous non-LoF (at all sites), homozygous LoF (at any site), or heterozygotes.
  - Burden tests were run using REGENIE (ref 59) on inverse rank normal-transformed phenotypes.
  - Mask M1: Used for primary analyses; includes only LoF variants with stringent filtering criteria and an allele frequency upper bound of 1%.
  - Mask M3: Used for analyses that include missense variants; it also includes likely damaging missense variants, with an allele frequency upper bound of 1%.
  - The per-trait genome-wide significance threshold was approximately $2.7 \times 10^{-6}$, derived from Bonferroni correction for testing approximately 18,000 genes.
- Subset of Genetically Uncorrelated Traits:
  - For specific analyses (e.g., Figs. 3b-d, 4b,c, Extended Data Figs. 1a-c, 3a-c), a subset of 27 genetically uncorrelated traits was selected.
  - This subset was formed by intersecting the 209 overlapping traits with those analyzed by Mostafavi et al. (ref 45), ensuring pairwise genetic correlations (from Neale Lab) were below 0.5 and prioritizing traits with higher heritability. Biomarkers were excluded. This reduces the risk that results are driven by highly correlated phenotypes.
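The burden genotype categorization and the Bonferroni threshold described above can be sketched as follows. This is an illustration rather than the pipeline of Backman et al.: the numeric coding 0/1/2 for the three categories, the function names, and the family-wise error rate of 0.05 are assumptions introduced here.

```python
def burden_genotype(lof_allele_counts):
    """Collapse per-site LoF allele counts (each 0, 1 or 2) into one burden genotype.

    Coding (an assumption of this sketch):
      0 = homozygous non-LoF at all sites
      2 = homozygous LoF at any site
      1 = everything else (heterozygote)
    """
    if any(count == 2 for count in lof_allele_counts):
        return 2
    if all(count == 0 for count in lof_allele_counts):
        return 0
    return 1

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Hypothetical individuals, each with LoF allele counts at four sites in one gene
print(burden_genotype([0, 0, 0, 0]))  # no LoF alleles
print(burden_genotype([0, 1, 0, 0]))  # one heterozygous LoF site
print(burden_genotype([0, 1, 2, 0]))  # homozygous LoF at one site
print(bonferroni_threshold(0.05, 18_000))
```

With roughly 18,000 genes tested per trait and an assumed error rate of 0.05, the threshold comes out on the order of $10^{-6}$, which is why gene-level burden tests use a far less stringent cutoff than variant-level GWAS.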
4.2.2. Defining GWAS Loci and Ranking
To systematically compare GWAS and burden test discoveries and minimize technical artifacts from unknown causal genes or variant-to-gene mapping errors, a conservative approach for defining GWAS loci was used:
- Locus Definition: For traits with at least one burden test hit and one GWAS hit (151 traits), LD-clumped hits were used.
  - Starting with the most significant GWAS hit, a 1-Mb window was taken around it.
  - All independent hits with larger P-values (lower significance) within this 1-Mb window were included.
  - The locus was then expanded until no other hit was within 1 Mb of any variant already in the locus.
  - Overlapping loci were merged.
  - This process was repeated for the next most significant hit not yet assigned.
- Gene Assignment: Overlapping protein-coding genes (18,524 genes in LoF burden tests) were assigned to each locus.
- Ranking: GWAS loci were ranked by the minimum GWAS P-value within each locus. Burden test genes were ranked by their burden P-value.
- Overlap Quantification (Fig. 1c): For genome-wide significant burden test hits, their rank was compared to the rank of the GWAS locus containing them. Top GWAS loci were defined by selecting a number of GWAS loci equal to the number of significant burden hits for a given trait (e.g., 82 for height).
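The locus definition and ranking steps above can be approximated with a short sketch. It deliberately swaps the greedy, significance-ordered procedure for single-linkage chaining along each chromosome: hits are merged into one locus whenever the gap to the neighboring hit is at most 1 Mb. This assumes that "within 1 Mb" refers to the distance between hits, and the `Hit` record, the function name `define_loci`, and the example coordinates are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one LD-clumped GWAS hit
Hit = namedtuple("Hit", ["chrom", "pos", "pval"])

WINDOW = 1_000_000  # 1 Mb, interpreted here as the maximum gap between neighboring hits

def define_loci(hits, window=WINDOW):
    """Chain hits closer than `window` into loci, then rank loci by minimum P-value."""
    loci = []
    for hit in sorted(hits, key=lambda h: (h.chrom, h.pos)):
        last = loci[-1] if loci else None
        same_locus = (
            last is not None
            and last["chrom"] == hit.chrom
            and hit.pos - last["end"] <= window
        )
        if same_locus:
            # Extend the current locus and track its most significant hit
            last["end"] = hit.pos
            last["min_p"] = min(last["min_p"], hit.pval)
            last["hits"].append(hit)
        else:
            loci.append({"chrom": hit.chrom, "start": hit.pos, "end": hit.pos,
                         "min_p": hit.pval, "hits": [hit]})
    # Most significant locus first
    return sorted(loci, key=lambda locus: locus["min_p"])

hits = [
    Hit("1", 1_000_000, 1e-20),
    Hit("1", 1_600_000, 1e-9),   # within 1 Mb of the hit above: same locus
    Hit("1", 5_000_000, 1e-30),
    Hit("2", 1_200_000, 1e-12),
]
loci = define_loci(hits)
for rank, locus in enumerate(loci, start=1):
    print(rank, locus["chrom"], locus["start"], locus["end"], locus["min_p"])
```

To mirror the overlap quantification step, one would keep only the first `n` entries of the ranked list, where `n` is the number of significant burden hits for the trait, and then check which burden genes fall inside those loci.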
4.2.3. Prioritization Criteria: Trait Importance and Trait Specificity
The paper formally defines these concepts (illustrated in Figure 2):
The following figure (Figure 2 from the original paper) illustrates how genes should ideally be prioritized:
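The image is a chart showing the relationship between gene prioritization and specificity in burden tests. It shows a negative correlation between selection strength and LoF frequency, with a LOESS-fitted trend line, and emphasizes the association between specific genes and their phenotypes. Panel e is a quantile-quantile plot showing the distribution of P-values across different trait-tissue pairs.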
- Trait Importance:
  - For a variant: its squared effect on the trait of interest. If $\alpha_t$ is the effect size of a variant on trait $t$, trait importance for trait 1 is $\alpha_1^2$.
  - For a gene: the trait importance of LoF variants in that gene. If $\gamma_t$ is the LoF burden effect size of a gene on trait $t$, trait importance for trait 1 is $\gamma_1^2$.
  - The paper considers high-impact variants important regardless of their direction of effect.
- Trait Specificity:
  - Defined as the importance for the trait of interest relative to the importance across all fitness-relevant traits (measured in appropriate units).
  - For a variant: $\Psi_V = \alpha_1^2 / \sum_t \alpha_t^2$.
  - For a gene: $\Psi_G = \gamma_1^2 / \sum_t \gamma_t^2$.
  - Here, trait 1 is always the trait under study.
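As a small worked illustration of the two criteria, the sketch below computes trait importance and trait specificity from a list of per-trait effect sizes. The effect values and function names are hypothetical, the list index 0 stands for "trait 1", and the effects are assumed to already be expressed in comparable units across traits.

```python
def trait_importance(effects, trait=0):
    """Squared effect on the trait of interest (index 0 corresponds to 'trait 1')."""
    return effects[trait] ** 2

def trait_specificity(effects, trait=0):
    """Importance for one trait divided by total importance across all traits."""
    total = sum(effect ** 2 for effect in effects)
    if total == 0:
        return float("nan")  # undefined when there is no effect on any trait
    return effects[trait] ** 2 / total

# Hypothetical genes with effects on four fitness-relevant traits
pleiotropic_gene = [0.5, 0.5, 0.5, 0.5]  # large effect on every trait
specific_gene = [0.2, 0.0, 0.0, 0.0]     # smaller effect, on trait 1 only

for name, effects in [("pleiotropic", pleiotropic_gene), ("specific", specific_gene)]:
    print(name, trait_importance(effects), trait_specificity(effects))
```

In this toy example the pleiotropic gene is the more important one for trait 1, while the specific gene has the higher specificity, which shows that the two criteria can rank the same pair of genes in opposite order.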
4.2.4. Theoretical Model for LoF Burden Test Prioritization
The paper analyzes population genetics models developed by Simons et al. (ref 3) to understand how LoF burden tests prioritize genes.
- Expected Strength of Association ($E[z^2]$): For a gene, the strength of association in burden tests is proportional to its trait importance ($\gamma_1^2$) and to the aggregate frequency of LoFs, through the term $p_{\mathrm{LoF}}(1-p_{\mathrm{LoF}})$.
  $ E[z^2] \propto \gamma_1^2 p_{\mathrm{LoF}}(1-p_{\mathrm{LoF}}) $
  Where:
  - $E[z^2]$ is the expected squared z-score (strength of association).
  - $\gamma_1^2$ is the trait importance of the gene for trait 1.
  - $p_{\mathrm{LoF}}$ is the aggregate frequency of LoFs within the gene.
- Relationship between $p_{\mathrm{LoF}}$ and Selection ($s_{\mathrm{het}}$): Under stabilizing selection (where intermediate trait values are favored), the aggregate frequency of LoFs is inversely related to the strength of selection against heterozygous LoF carriers ($s_{\mathrm{het}}$) and positively related to the mutation rate ($\mu$) and gene length ($L$, the number of sites where an LoF can occur).
  $ p_{\mathrm{LoF}}(1-p_{\mathrm{LoF}}) \propto \frac{\mu L}{s_{\mathrm{het}}} $
  Where:
  - $\mu$ is the per-base mutation rate.
  - $L$ is the number of potential LoF positions within the gene (a proxy for gene length).
  - $s_{\mathrm{het}}$ is the strength of selection against heterozygous LoF carriers.
- Relationship between $s_{\mathrm{het}}$ and Total Trait Effects: For genes affecting complex traits under stabilizing selection, the strength of selection ($s_{\mathrm{het}}$) is approximately proportional to the sum of trait importances across all fitness-relevant traits:
  $ s_{\mathrm{het}} \approx \sum_t \gamma_t^2 $
  Where:
  - $\sum_t \gamma_t^2$ is the sum of trait importances across all fitness-relevant traits.
- Combining these, the expected strength of association for LoF burden tests becomes:
  $
  E[z^2] \propto \gamma_1^2 \frac{\mu L}{\sum_t \gamma_t^2} = (\mu L) \frac{\gamma_1^2}{\sum_t \gamma_t^2} = (\mu L) \Psi_G
  $
  This shows that LoF burden tests prioritize genes by their trait specificity ($\Psi_G$) and gene length ($\mu L$). They do not directly prioritize by trait importance ($\gamma_1^2$).
- "Flattening" Effect: For genes with sufficiently large effects (high trait importance), selection causes their LoF frequencies to be very low, leading to larger standard errors in effect size estimates. This flattening effect decouples the strength of association ($z^2$) from the true trait importance, making rankings by significance independent of trait importance for the most important genes.
- Empirical Testing for Burden Tests:
  - Correlation between estimated $s_{\mathrm{het}}$ (from Zeng et al. (ref 35)) and aggregate LoF frequencies ($p_{\mathrm{LoF}}$) (Fig. 3b).
  - Correlation between estimated $s_{\mathrm{het}}$ and unbiased estimates of average trait importance across 27 genetically uncorrelated traits (Fig. 3c).
  - Plotting mean squared z-scores ($z^2$) against mean importance to show decoupling (Fig. 3d).
  - Using gene expression specificity as a proxy for trait specificity ($\Psi_G$). Genes were binned by expression specificity in nine trait-tissue pairs, and quantile-quantile plots of LoF burden test P-values were generated to see if more specific genes had stronger signals (Fig. 3e).
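The chain of proportionalities in this subsection can be checked numerically with a toy calculation. The sketch below sets every proportionality constant to 1 and uses made-up values for the mutation rate, gene length, and effect sizes, so only ratios between genes are meaningful; the function name `expected_burden_z2` is introduced here for illustration.

```python
def expected_burden_z2(gammas, mu, length, trait=0):
    """Relative expected z^2 of an LoF burden test, up to a constant.

    Follows the three steps in the text with all constants set to 1:
      s_het    ~ sum_t gamma_t^2
      p(1 - p) ~ mu * L / s_het
      E[z^2]   ~ gamma_1^2 * p(1 - p)
    """
    s_het = sum(gamma ** 2 for gamma in gammas)
    lof_heterozygosity = mu * length / s_het
    return gammas[trait] ** 2 * lof_heterozygosity

MU = 1e-8      # hypothetical per-base mutation rate
LENGTH = 1000  # hypothetical number of potential LoF positions

important_pleiotropic = [1.0, 1.0, 1.0, 1.0]  # importance 1.0, specificity 0.25
modest_specific = [0.1, 0.0, 0.0, 0.0]        # importance 0.01, specificity 1.0

z2_pleiotropic = expected_burden_z2(important_pleiotropic, MU, LENGTH)
z2_specific = expected_burden_z2(modest_specific, MU, LENGTH)
z2_specific_long = expected_burden_z2(modest_specific, MU, 2 * LENGTH)
z2_scaled = expected_burden_z2([2.0, 2.0, 2.0, 2.0], MU, LENGTH)

print(z2_specific / z2_pleiotropic)    # ratio of specificities
print(z2_specific_long / z2_specific)  # effect of doubling gene length
print(z2_scaled / z2_pleiotropic)      # effect of doubling every effect size
```

Under these assumptions the specific gene outranks the pleiotropic one despite a hundredfold smaller trait importance, doubling the gene length doubles the expected signal, and scaling all effects of a gene by the same factor leaves the expected signal unchanged, which is the flattening behavior described above.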
### 4.2.5. Theoretical Model for GWAS Prioritization
A similar argument applies to GWAS at the variant level:
- The expected `strength of association` for a `variant` is proportional to its `trait importance` ($\alpha_1^2$) relative to its `total trait importance` across all fitness-relevant traits ($\sum_t \alpha_t^2$):
  $$ E[z^2] \propto \frac{\alpha_1^2}{\sum_t \alpha_t^2} = \Psi_V $$
  This means `GWAS` prioritize `trait-specific variants` (high $\Psi_V$).
- **Variant Specificity Types (Fig. 4a):**
    - **Trait-specific gene:** A `variant` affects a gene that primarily impacts the studied trait.
    - **Context-specific effects:** A `variant` (often `non-coding`) has effects only in specific cellular contexts or developmental stages relevant to the trait, even if the underlying gene is `pleiotropic`.

The following figure (Figure 4 from the original paper) illustrates how GWAS prioritizes trait-specific variants:

*The image is a chart showing estimated contributions of different variant types to heritability. Panel (a) shows the relationship between expected heritability contribution and variant specificity; panels (b) and (d) show the correlation between the heritability enrichment of genes and their specificity; panel (c) is a schematic of how variants contribute to heritability.*

- **Empirical Testing for GWAS:**
    - **S-LDSC (Stratified Linkage Disequilibrium Score Regression):** Used to quantify `heritability enrichment` (a proxy for how highly variants are prioritized on average) along axes of `trait specificity`.
        - **Gene Trait Specificity (for coding variants):** Restricted analysis to `coding variants` and used the `expression specificity` of the gene they act on as a proxy for $\Psi_G$. `S-LDSC` was run for nine `trait-tissue pairs` (Fig. 4b).
        - **Context Specificity (for non-coding variants):** Used `non-coding variants` and the `tissue specificity of ATAC-seq peaks` as a proxy for `context specificity`. `S-LDSC` was run while controlling for `ATAC peak strength` (Fig. 4c).
    - $\tau/h^2$ was reported, representing the change in the proportion of `heritability` explained by a single `variant` due to an `annotation`.
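As a hedged illustration of the specificity ratio $\Psi_V = \alpha_1^2 / \sum_t \alpha_t^2$, the sketch below computes it from a vector of squared per-trait effects. The function name `variant_specificity` and the example effect values are hypothetical and are not taken from the paper, which cannot observe these effects directly.

```python
def variant_specificity(alpha_sq, focal=0):
    """Return Psi_V = alpha_focal^2 / sum_t alpha_t^2 for one variant.

    alpha_sq: squared effects of the variant on each fitness-relevant trait
              (hypothetical inputs for illustration only).
    focal:    index of the studied trait.
    """
    total = sum(alpha_sq)
    if total == 0:
        raise ValueError("variant has no effect on any trait")
    return alpha_sq[focal] / total


# Same focal-trait effect, different spread across the other traits.
specific = variant_specificity([0.04, 0.005, 0.005])       # mostly focal trait
pleiotropic = variant_specificity([0.04, 0.04, 0.04, 0.04])  # evenly spread
```

Under the model described above, the first variant would be expected to rank higher in a GWAS of the focal trait even though both variants have the same effect on it.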
### 4.2.6. Impact of Trait-Irrelevant Factors
#### Gene Length on LoF Burden Tests
- **Theoretical Prediction:** The expected `strength of association` in `LoF burden tests` is proportional to $\mu L \Psi_G$. Longer genes (larger $L$) should have higher power, all else being equal, because they have more potential `LoF` positions and thus a higher aggregate `LoF frequency`.
- **Empirical Testing:**
    - Correlated `gene length` (proxied by the `expected number of segregating LoFs` from `gnomAD (v2)`) with `unbiased estimates of squared trait importance` ($\gamma^2$), `squared standard errors`, and squared `z-scores` ($z^2$) across 27 `genetically uncorrelated traits` (Extended Data Fig. 1).

The following figure (Extended Data Fig. 1 from the original paper) illustrates how coding sequence length drives prioritization in `LoF burden tests`:

*The image is a plot of frequency trajectories for identical mutations. The horizontal axis shows the number of generations since the mutation arose and the vertical axis shows frequency. The trajectories follow different trends as generations pass, highlighting how the mutations are distributed across generations.*
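To make the power argument concrete, here is a minimal sketch that is not the paper's analysis. It assumes the burden genotype is a 0/1 carrier indicator with aggregate frequency `carrier_freq`, so that the squared standard error of the regression slope is roughly `resid_var / (n * f * (1 - f))`, and it uses the textbook result that a normally distributed estimate with standard error $s$ has $E[z^2] = 1 + \gamma^2/s^2$. The function name and every numeric value are hypothetical.

```python
def expected_burden_z2(gamma, n, carrier_freq, resid_var=1.0):
    """Approximate E[z^2] for a burden test on a 0/1 carrier indicator.

    se^2 is taken as resid_var / (n * f * (1 - f)), the usual large-sample
    variance of a simple-regression slope (an assumption of this sketch).
    """
    se_sq = resid_var / (n * carrier_freq * (1.0 - carrier_freq))
    return 1.0 + gamma ** 2 / se_sq


# Two genes with the same effect on the trait; the longer gene has four
# times the aggregate LoF carrier frequency (hypothetical numbers).
short_gene = expected_burden_z2(gamma=0.5, n=100_000, carrier_freq=1e-4)
long_gene = expected_burden_z2(gamma=0.5, n=100_000, carrier_freq=4e-4)
```

With identical $\gamma$, the longer gene has the larger expected $z^2$ purely because its standard error is smaller, which is the trait-irrelevant advantage this subsection describes.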
#### Random Genetic Drift on GWAS
- **Theoretical Prediction:** While the expected `strength of association` is proportional to `trait specificity` ($\Psi_V$), `random genetic drift` causes `variant allele frequencies` to deviate widely from their expected values (Extended Data Fig. 2a). `GWAS` considers variants individually, so this stochasticity in `MAF` can disproportionately affect rankings.
- **Realized Heritability:** The `realized heritability` of a `variant` is $2\alpha_1^2 p(1-p)$, where $p$ is the `variant allele frequency`. `Genetic drift` makes $p$ highly variable.
- **Simulations:** Simulated `GWAS` to show that for sufficiently `trait-important variants`, the ranking by `realized heritability` is largely random with respect to `trait importance` and is driven by `MAF` differences (Extended Data Fig. 2b).
- **Counterintuitive Pleiotropy:** This `MAF` randomness leads to a counterintuitive result: `variants` that are the strongest `GWAS hits` for one trait are more likely to be `hits` for other traits, even if they are, on average, more `trait specific`. This is because `high-frequency variants` (due to drift) have increased power across all traits (Extended Data Fig. 3).

The following figure (Extended Data Fig. 2 from the original paper) illustrates how GWAS variant rankings are driven largely by `genetic drift`:

*The image is a scatter plot showing the relationship between the standardized squared effect of simulated SNPs and their relative realized heritability. The color bar indicates minor allele frequency (MAF), ranging from 0.1 to 0.4, reflecting the distribution of the different SNPs.*

The following figure (Extended Data Fig. 3 from the original paper) illustrates how genetic drift makes GWAS hits appear more pleiotropic:

*The image is a table presenting reporting information on socially relevant groupings such as sex and race, including population characteristics, recruitment, and ethics oversight. Entries marked N/A indicate that the information is not applicable to, or not reported in, this study.*
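The role of frequency in realized heritability, $2\alpha_1^2 p(1-p)$, can be sketched as below. This is a simplified illustration under stated assumptions: the drift model is a neutral Wright-Fisher resampling with a deliberately small, hypothetical population (`n_chrom=200`), whereas the paper computes frequency distributions under selection with `fastDTWF`. All names and numbers are illustrative.

```python
import random


def realized_heritability(alpha_sq, p):
    """Realized heritability 2 * alpha^2 * p * (1 - p) of a biallelic variant."""
    return 2.0 * alpha_sq * p * (1.0 - p)


def drift_trajectory(p0, n_chrom, generations, rng):
    """Neutral Wright-Fisher drift: resample n_chrom allele copies per generation."""
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = count / n_chrom
    return p


rng = random.Random(0)
# Identical mutations starting at the same frequency end up at very
# different frequencies, and therefore at different realized heritabilities.
final_freqs = [drift_trajectory(0.1, 200, 50, rng) for _ in range(20)]
h2_values = [realized_heritability(1.0, p) for p in final_freqs]

# A variant with a quarter of the squared effect can still outrank a more
# important but rarer variant.
rare_important = realized_heritability(1.0, 0.01)
common_modest = realized_heritability(0.25, 0.30)
```

The final comparison shows why ranking by realized heritability can be dominated by frequency rather than by $\alpha_1^2$.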
### 4.2.7. Estimating Trait Importance
The paper explores how to overcome the flattening effect and estimate trait importance more effectively.
- **Simplified Model:** A `variant` has an effect ($\beta$) on a gene, which in turn has an effect ($\gamma$) on the trait, such that the overall variant effect ($\alpha$) is $\alpha = \beta\gamma$.
- **Flattening and Plateaus (Fig. 5a):** The expected contribution to `heritability` first increases with the total effect ($\alpha^2 = (\beta\gamma)^2$), but then plateaus, decoupling from effect size for very large effects because of strong `selection`.
- **Aggregation Strategy (Fig. 5c):** While individual `variants` experience `flattening`, `genes` with higher `trait importance` (large $\gamma$) will have more `variants` (even those with small $\beta$) that cross the `heritability contribution threshold` ($\tau$) and contribute to `heritability`. Thus, the total `heritability` contributed by `variants` acting on a given `gene` should correlate with its `trait importance`.
- **Empirical Testing with AMM:**
    - Used `AMM (Abstract Mediation Model, ref 47)` to estimate the total `heritability` of `variants` acting via a given set of `genes` using `GWAS data`.
    - `Genes` were binned by `s_het` (a proxy for `trait importance`).
    - Compared how `AMM`-estimated total heritability (Fig. 5d) tracks `s_het` versus `LoF burden heritability` (Fig. 5b).

The following figure (Figure 5 from the original paper) illustrates how `trait importance` is estimated by combining different `variant types`:

*The image is a chart showing the relationship between long genes and trait effects. Panel A shows that long genes do not have larger effects on traits, panel B shows that long genes have smaller standard errors, and panel C shows that LoF burden tests prioritize long genes; all panels are plotted against the average expected number of LoFs.*
### 4.2.8. Unbiased Estimates of Trait Importance
To obtain reliable estimates of trait importance ($\alpha^2$ for variants, $\gamma^2$ for genes), the paper used an unbiased estimator to correct for the upward bias that arises from simply squaring the observed effect size estimates ($\hat{\gamma}^2$).
- Assuming `effect size estimates` ($\hat{\gamma}$) are approximately `normally distributed` about their true values ($\gamma$) with noise determined by their `standard errors` ($s$): $\hat{\gamma} \sim \mathcal{N}(\gamma, s^2)$.
- Under this assumption $E[\hat{\gamma}^2] = \gamma^2 + s^2$, so an `unbiased estimator` for $\gamma^2$ is:
  $$ \hat{\gamma}^2 - s^2 $$
  Where:
    - $\hat{\gamma}^2$ is the `squared estimated effect size`.
    - $s^2$ is the `squared standard error` of the estimate.
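A quick numerical check of this correction is sketched below. It assumes the normal model $\hat{\gamma} \sim \mathcal{N}(\gamma, s^2)$ stated above, for which $E[\hat{\gamma}^2] = \gamma^2 + s^2$; the true effect, standard error, seed, and function name are hypothetical choices for illustration.

```python
import random
import statistics


def unbiased_sq_effect(gamma_hat, se):
    """Unbiased estimate of gamma^2: gamma_hat^2 - se^2 (may be negative)."""
    return gamma_hat ** 2 - se ** 2


rng = random.Random(1)
gamma_true, se = 0.5, 1.0  # hypothetical true effect and standard error

# Repeated noisy estimates of the same true effect.
draws = [rng.gauss(gamma_true, se) for _ in range(200_000)]

naive = statistics.fmean(g ** 2 for g in draws)                       # near gamma^2 + se^2
corrected = statistics.fmean(unbiased_sq_effect(g, se) for g in draws)  # near gamma^2
```

Individual corrected values can be negative, which is expected: the estimator is unbiased on average, so it is most useful when averaged over many genes or bins, as in the binned analyses described here.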
### 4.2.9. Specific Methodological Details
* **LoF Burden Summary Statistics Binned by $s_{\mathrm{het}}$:** `s_het` values were from `Zeng et al. (ref 35)`. Genes were binned into 100 bins by `s_het`, and `summary statistics` (e.g., unbiased $\hat{\gamma}^2$ and $z^2$) were averaged within bins across 27 `uncorrelated traits`. `Heritability enrichment` was computed as the bin average relative to the overall average, then inverse-variance weighted across traits.
* **ATAC Peak Specificity:** `ATAC-seq files` from `ChIP-Atlas (ref 71)` were grouped into 19 `tissue/cell-type categories`. A `peak` was 'present' if >5% of samples in a tissue contained it. `Peak specificity` was measured by the `number of shared tissues` a peak was present in (for peaks relevant to a trait-tissue pair) and by `peak intensity` (the fraction of samples within the focal tissue containing the peak).
* **Gene Expression Specificity:** `Average gene expression` (TPM) was taken from the `Human Protein Atlas (ref 73)` and `Gene Expression Omnibus (ref 74)` for 17 `tissues/cell types`. A gene was 'expressed' if >10 TPM. The `expression specificity score` is the expression in the `trait-relevant tissue` divided by the sum of expression across all 17 tissues. Genes were binned into quintiles based on this score.
* **Linking Traits to Tissues:** `S-LDSC` was used to partition `heritability` for traits using the 19 `ATAC-seq annotations`. A trait was assigned to a tissue if that tissue's annotation had a `z-score` > 4.5 and >40% of heritability was explained by `ATAC-seq peaks` in that tissue. Genetically uncorrelated traits were kept, resulting in nine `trait-tissue pairs`.
* **Regression of Burden on Expression Specificity:** Linear regression of burden $z^2$ on `expression specificity quintiles` for genes expressed in the `top tissue`, controlling for `unbiased estimates of trait importance`.
* **S-LDSC Analysis using ATAC-seq peaks:** `ATAC-seq peaks` were binned by `number of shared tissues` (5 bins) and `peak intensity` (5 bins). These annotations, along with `LDSC baseline v1.1 covariates`, were used in `S-LDSC v.1.0.1` on `HapMap3 SNPs`.
* **S-LDSC Analysis using Coding Variants:** `Coding variants` were defined by `Ensembl Variant Effect Predictor (v85)` consequences. `S-LDSC` was run with `expression specificity bins` (5 bins), `gene expression level bins` (5 bins), and `baseline v1.1 covariates` on `HapMap3 SNPs`.
* **LoF Burden Summary Statistics Binned by $\mu L$:** Used the `expected number of segregating LoFs` from `gnomAD (v2)` as a proxy for $\mu L$. Binned genes into 100 bins and averaged `summary statistics` (e.g., unbiased $\hat{\gamma}^2$, $s^2$, $z^2$) across 27 `uncorrelated traits`.
* **Computing Frequency Spectra given $s_{\mathrm{het}}$:** Simulated `allele frequency distributions` under a `stabilizing selection model` (parameterized through heterozygote fitness) using `fastDTWF (ref 81)` for a population of 20,000 diploids.
* **Simulating Realized Heritability:** Simulated 50,000 unlinked `variants`. For each of 1,000 `s_het` values, 50 `variants` were simulated by drawing `allele frequencies` from the computed distributions. `GWAS sample allele counts` were drawn from a `Binomial` distribution. `Realized heritability` was then computed from each variant's `GWAS sample allele frequency`.
* **Computing Pleiotropy of GWAS Hits:** Considered 18 `uncorrelated traits` with at least 100 `GWAS hits`. Hits were grouped into `P-value quartiles`. For each hit, the number of traits for which it was a `hit` was counted and averaged within quartiles.
* **Simulating Pleiotropy of GWAS Hits:** Simulated `GWAS summary statistics` for 18 traits and 10 million positions. `Squared effect sizes (`\vec{\alpha^2_j}`)` for variant were drawn as:
Where:
* The `exponentiation` is element-wise.
* and are parameters related to the overall effect magnitude and trait specificity distribution.
* The `strength of selection` was assumed to be .
* `MAF` was drawn from the `frequency distribution` with the closest `s_het`.
* `Observed association statistic` for trait and variant was simulated as:
Where:
* is a scaling factor for `environmental noise` and `sample size`.
* These were converted to `P-values` ( as a `chi-squared` distributed `z-score squared`).
* A `variant` was a `hit` if its `P-value` was less than threshold .
* Default parameters: .
* **AMM Analysis:** `AMM` was run to estimate `heritability enrichments` for `gene sets`. `Genes` were binned into 100 `s_het` bins. `AMM` models the probability that a `SNP` acts via the closest gene, the second-closest gene, and so on, using probabilities from `ref 47`. `LDSC baseline covariates v2.3` and `HapMap3 variants` were used.
* **Correlation of GWAS hit probability and $s_{\mathrm{het}}$:** Logistic regression was performed to differentiate `GWAS hits` from randomly sampled `SNPs`, using `s_het` of the nearest gene as a predictor, along with covariates (MAF, LD score, gene density, distance to `TSS`).
* **Correlation of $\hat{\gamma}^2$ and number of GWAS hits:** `LD-clumped GWAS hits` were assigned to the closest gene. The number of `GWAS hits` per gene was correlated with the `unbiased estimate of trait importance` ($\hat{\gamma}^2$) from `LoF burden tests`.
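The expression specificity score described in this list (expression in the trait-relevant tissue divided by summed expression across tissues, restricted to genes above 10 TPM, then binned into quintiles) can be sketched as follows. The gene names, tissue names, and TPM values are invented for illustration, the toy table has three tissues rather than 17, and the tie-breaking rule in `quintile_bins` is an assumption of this sketch.

```python
def expression_specificity(tpm_by_tissue, focal_tissue):
    """Focal-tissue expression divided by total expression over all tissues."""
    total = sum(tpm_by_tissue.values())
    return tpm_by_tissue[focal_tissue] / total if total > 0 else 0.0


def quintile_bins(scores):
    """Map each gene to a bin 0..4 by score rank (ties broken by gene name)."""
    ordered = sorted(scores, key=lambda gene: (scores[gene], gene))
    n = len(ordered)
    return {gene: (rank * 5) // n for rank, gene in enumerate(ordered)}


# Hypothetical TPM table.
tpm = {
    "GENE_A": {"liver": 90.0, "bone": 5.0, "T cell": 5.0},
    "GENE_B": {"liver": 20.0, "bone": 20.0, "T cell": 20.0},
    "GENE_C": {"liver": 40.0, "bone": 10.0, "T cell": 30.0},
    "GENE_D": {"liver": 15.0, "bone": 60.0, "T cell": 75.0},
    "GENE_E": {"liver": 60.0, "bone": 20.0, "T cell": 20.0},
    "GENE_F": {"liver": 2.0, "bone": 50.0, "T cell": 1.0},
}
focal = "liver"

# Keep only genes expressed (>10 TPM) in the focal tissue, then score and bin.
scores = {
    gene: expression_specificity(values, focal)
    for gene, values in tpm.items()
    if values[focal] > 10.0
}
bins = quintile_bins(scores)
```

Bin 4 holds the genes whose expression is most concentrated in the focal tissue; these are the genes that the specificity analyses would expect to carry the strongest signals.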
# 5. Experimental Setup
## 5.1. Datasets
The study extensively uses data from the `UK Biobank` and public `genomic annotation` resources.
* **UK Biobank GWAS Summary Statistics:**
* **Source:** Neale Lab (http://www.nealelab.is/uk-biobank/; v3).
* **Scale & Characteristics:** Summary statistics for 305 continuous traits. The underlying `GWAS` were performed on approximately 360,000 individuals from the `UK Biobank`. Phenotypes were `inverse rank normal-transformed`.
* **Domain:** A wide range of quantitative traits, including anthropometric (e.g., height), blood biomarkers, and others.
* **UK Biobank LoF Burden Test Summary Statistics:**
* **Source:** `Backman et al. (ref 4)`.
* **Scale & Characteristics:** Summary statistics for 292 `LoF burden tests`. 209 of these traits overlapped with the `GWAS data`. `Burden genotypes` were constructed by aggregating `rare Loss-of-Function (LoF) variants` within genes.
* **Domain:** Covers similar quantitative traits as the `GWAS` data.
* **Subset of Genetically Uncorrelated Traits:**
* For analyses requiring independence (e.g., in Figures 3b-d, 4b,c, and Extended Data Figures 1a-c, 3a-c), a subset of 27 `genetically uncorrelated traits` was used.
* **Source:** Derived from the overlapping 209 traits, ensuring pairwise `genetic correlations` were below 0.5 (from Neale Lab) and prioritizing higher `heritability` traits.
* **Domain:** Diverse quantitative traits spanning blood cell measurements, anthropometric traits, and biomarkers. Examples include mean corpuscular volume, reticulocyte percentage, eosinophil percentage, lymphocyte count, standing height, heel bone mineral density, glucose, creatinine, and alanine aminotransferase.
* **ATAC-seq Data:**
* **Source:** `ChIP-Atlas (ref 71)`.
* **Scale & Characteristics:** All `ATAC-seq files` with >5,000,000 mapped reads and >5,000 identified `peaks`. Overlapping `peaks` were merged, yielding 2,131,526 unique `peaks`. Samples were grouped into 19 `tissue/cell-type categories` (e.g., adipocyte, bone, breast, T cell, erythroid). A `peak` was considered present in a `tissue` if >5% of samples showed it.
* **Domain:** `Chromatin accessibility` data for various human tissues and cell types, used to infer `regulatory regions` and their tissue specificity.
* **Gene Expression Data:**
* **Source:** `Human Protein Atlas (ref 73)` (rna_tissue_hpa.tsv.zip, rna_single_cell_type.tsv.zip) and `Gene Expression Omnibus (GEO)` accession GSE106292 (`refs 75,76`) for human bone samples.
* **Scale & Characteristics:** Estimates of `gene expression` (transcripts per million, `TPM`) in 17 `tissue/cell types`. Genes with >10 `TPM` were considered expressed.
* **Domain:** `Gene expression levels`, used to infer `gene tissue specificity`.
* **Gene Constraint Estimates ($s_{\mathrm{het}}$):**
* **Source:** `Zeng et al. (ref 35)`, downloaded from `Zenodo (ref 70)`.
* **Characteristics:** Bayesian estimates of `gene constraint` (`s_het`), reflecting the `strength of purifying selection` against `LoF variants` in a gene.
* **Domain:** Measures of evolutionary constraint for human genes.
* **Expected Number of Segregating LoFs:**
* **Source:** Calculated in `gnomAD (v2, ref 79)`, downloaded from `Zenodo (ref 70)`.
* **Characteristics:** Represents a proxy for `gene length` ($L$) and `mutation rate` ($\mu$) for `LoF variants`.
* **Domain:** `LoF variant` counts and genomic characteristics.
These datasets are effective for validating the method's performance because they provide:
1. **Large Sample Sizes:** The `UK Biobank` data allows for well-powered `GWAS` and `burden tests`, enabling robust statistical inferences.
2. **Diverse Phenotypes:** The wide range of quantitative traits allows for generalizable conclusions about gene prioritization across different biological systems.
3. **Multimodal Genetic Information:** Combining `common variant GWAS` with `rare variant burden tests` provides a comprehensive view of genetic architecture.
4. **Rich Functional Annotations:** `ATAC-seq` and `gene expression data` allow for empirical testing of `trait specificity` at both the `variant` and `gene levels` in a tissue-specific manner.
5. **Population Genetics Parameters:** `s_het` and `LoF counts` provide critical inputs for testing the theoretical population genetics models.
## 5.2. Evaluation Metrics
The paper employs a range of statistical and biological metrics to evaluate its hypotheses and findings.
* **P-value:**
    * **Conceptual Definition:** The `P-value` is a statistical measure used in `hypothesis testing` to quantify the evidence against a null hypothesis. It represents the probability of observing test results at least as extreme as the results actually observed, assuming that the `null hypothesis` is true. A small `P-value` suggests that the observed data are inconsistent with the `null hypothesis`, providing evidence for the alternative hypothesis.
    * **Mathematical Formula:** There is no single universal formula for the `P-value`, because it depends on the statistical test and its null distribution. `GWAS` and `burden tests` typically produce `z-scores` or `chi-squared statistics`, and the `P-value` is the tail probability of the corresponding distribution. For a two-sided test using a `z-score` ($z$):
      $$ P = \Pr(|Z| \ge |z|) = 2\,\Pr(Z \ge |z|) $$
      Where:
        * $P$ is the `P-value`.
        * $Z$ is a random variable following the standard `normal distribution`.
        * $z$ is the observed `z-score` from the association test.
    * **Symbol Explanation:**
        * $\Pr(|Z| \ge |z|)$: The probability of observing a `z-score` absolute value greater than or equal to the observed absolute `z-score` under the `null hypothesis`.
* **z-score squared ($z^2$):**
    * **Conceptual Definition:** The `z-score` measures how many `standard deviations` an estimate lies from its null value. In `association studies`, the `z-score` for an effect size estimate ($\hat{\gamma}$) is often $z = \hat{\gamma}/s$. The squared `z-score` ($z^2$) is a common measure of the `strength of association` and is approximately `chi-squared distributed` with 1 `degree of freedom` under the null hypothesis. It is directly related to statistical power.
    * **Mathematical Formula:**
      $$ z^2 = \left(\frac{\hat{\gamma}}{s}\right)^2 $$
      Where:
        * $z^2$ is the `squared z-score` (strength of association).
        * $\hat{\gamma}$ is the `estimated effect size` (e.g., `LoF burden effect size`, `variant effect size`).
        * $s$ is the `standard error` of the `estimated effect size`.
* **Spearman's Rank Correlation Coefficient ($\rho$):**
    * **Conceptual Definition:** A `non-parametric measure` of the strength and direction of association between two ranked variables. It assesses how well the relationship between two variables can be described using a monotonic function. It is particularly useful for comparing rankings, as done for `GWAS` and `burden test P-values`.
    * **Mathematical Formula:**
      $$ \rho = 1 - \frac{6 \sum_{i} d_i^2}{n(n^2 - 1)} $$
      Where:
        * $\rho$ is `Spearman's rank correlation coefficient`.
        * $d_i = R(X_i) - R(Y_i)$ is the difference between the ranks of the $i$-th observations for two variables $X$ and $Y$.
        * $n$ is the number of observations.
    * **Symbol Explanation:**
        * $R(X_i)$: The rank of the $i$-th value of variable $X$.
        * $R(Y_i)$: The rank of the $i$-th value of variable $Y$.
* **Pearson's Correlation Coefficient ($r$):**
    * **Conceptual Definition:** A measure of the `linear correlation` between two sets of data. It is the ratio between the `covariance` of the two variables and the product of their `standard deviations`. It indicates the strength and direction of a linear relationship.
    * **Mathematical Formula:**
      $$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} $$
      Where:
        * $r$ is `Pearson's correlation coefficient`.
        * $x, y$ are the individual data points for variables $X$ and $Y$.
        * $n$ is the number of observations.
    * **Symbol Explanation:**
        * $\sum xy$: Sum of the products of each pair of $x$ and $y$ values.
        * $\sum x$: Sum of all $x$ values.
        * $\sum y$: Sum of all $y$ values.
        * $\sum x^2$: Sum of the squared $x$ values.
        * $\sum y^2$: Sum of the squared $y$ values.
* **Heritability Enrichment ($\tau/h^2$ from S-LDSC):**
    * **Conceptual Definition:** In `S-LDSC`, `heritability enrichment` for an `annotation` (e.g., `tissue-specific ATAC peaks`, `coding variants`) quantifies whether `SNPs` within that `annotation` explain more `heritability` than expected from the proportion of `SNPs` they represent. The parameter $\tau$ in `S-LDSC` represents the change in per-`SNP` `heritability` associated with toggling an `annotation` from 0 to 1. When normalized by `total heritability` ($h^2$), $\tau/h^2$ can be interpreted as the increase in the *proportion* of `heritability` explained by a single `variant` when it falls within that `annotation`, conditional on other `annotations`. It is a key metric for understanding the functional architecture of `heritability`.
    * **Mathematical Formula:** The `S-LDSC` model is complex, but the parameter $\tau_c$ for `annotation` $c$ is estimated from the relationship between `LD score` and `chi-squared statistics`. The enrichment is then derived from $\tau_c$, the per-SNP `heritability contribution` of `annotation` $c$, and the number of `SNPs` in `annotation` $c$. The paper specifically reports $\tau/h^2$, with $\tau$ directly estimated by `S-LDSC`.
    * **Symbol Explanation:**
        * $\tau$: The `S-LDSC` parameter representing the per-`SNP` contribution to `heritability` from a given `annotation`.
        * $h^2$: The `total SNP heritability` of the trait.
        * $\tau/h^2$: The change in the proportion of `heritability` explained by a single `variant` due to being in the `annotation`.
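The two-sided `P-value`, Spearman's $\rho$, and Pearson's $r$ used throughout this section can be sketched with the standard library alone. This is an illustrative sketch rather than the paper's code: the function names are hypothetical, `ranks` assumes there are no tied values (the $d_i$ shortcut formula is exact only in that case), and the normal tail is computed through the identity $\Pr(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})$.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a standard-normal z-score: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def ranks(values):
    """1-based ranks of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_sq = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_sq / (n * (n * n - 1))


def pearson_r(x, y):
    """Pearson's r via the computational (sums-of-products) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
```

For example, `spearman_rho` applied to the `-log10` `P-values` of matched loci from two methods gives the kind of rank concordance reported for height, while `two_sided_p(z)` converts an association `z-score` into its `P-value`.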
## 5.3. Baselines
The paper primarily compares the performance and prioritization mechanisms of `GWAS` and `LoF burden tests` against **each other** and against **theoretical predictions from population genetics models**, rather than against a specific set of alternative `gene prioritization models`.
* **Standard GWAS:** The `P-value` ranking from conventional `GWAS` is treated as one of the primary methods under investigation.
* **Standard LoF Burden Tests:** Similarly, the `P-value` ranking from conventional `LoF burden tests` is the other primary method being analyzed.
* **Theoretical Predictions:** The paper's own `population genetics models` serve as a theoretical baseline against which the empirical observations from `GWAS` and `burden tests` are compared (e.g., predictions about how `z-scores` should relate to `trait importance` and `specificity`, and the `flattening effect`).
* **Alternative Prioritization Approaches:** When the paper discusses `estimating trait importance`, methods such as `AMM (Abstract Mediation Model)` are introduced as an alternative to `P-value`-based rankings, effectively serving as a benchmark for how `trait importance` *could* be estimated.
The focus is less on outperforming existing `gene prioritization algorithms` and more on understanding the fundamental properties and biases of the two most common `genetic association study` designs.
# 6. Results & Analysis
## 6.1. Core Results Analysis
### 6.1.1. Burden Test and GWAS Gene Ranks Differ
The study begins by systematically quantifying the discrepancy in gene prioritization between `GWAS` and `LoF burden tests`.
The following figure (Figure 1 from the original paper) illustrates that `GWAS` and `LoF burden tests` prioritize different `loci`:

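*The image is a chart showing the different priorities of GWAS and LoF burden tests. Panels a and b are schematics showing how genetic variants affect the phenotype; in panel c, each cell represents genes ranked by importance according to LoF burden tests. Panel d compares P-values from GWAS and LoF burden tests, and panels e and f show GWAS results in different genomic regions.*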
* **Overlap, but Discordant Ranking:** Across 151 traits with at least one `burden hit` and one `GWAS hit`, 74.6% (1,382 out of 1,852) of `genome-wide significant burden test hits` fall within a `GWAS locus` (Fig. 1c). This indicates a substantial overlap in terms of physical location. However, the *ranking* of these genes/loci is very different. Only 26% (480 out of 1,852) of genes with `burden support` fall in the `top GWAS loci` (Supplementary Fig. 1), where `top GWAS loci` are defined as a number of `GWAS loci` matching the number of significant `burden hits`.
* **Example: Height Trait (Fig. 1d):** For height, with 382 `genome-wide significant GWAS loci`, the rankings show some concordance (`Spearman's` $\rho = 0.46$), but there is little overlap among the top hits. Many significant GWAS loci do not contain a single significant burden gene.
* **Illustrative Loci (Fig. 1e,f):**
    * **NPR2 locus (Fig. 1e):** `NPR2` is the second most significant gene in `LoF burden tests` for height but lies in only the 243rd most significant `GWAS locus`. Mutations in `NPR2` are known to cause short stature, making it a biologically plausible hit for both.
    * **HHIP locus (Fig. 1f):** `HHIP` is in the third most significant `GWAS locus` for height, with very small `P-values`. `HHIP` is biologically relevant to height through its role in `osteogenesis` and interaction with `Hedgehog proteins`. However, there is essentially no burden signal for `HHIP` or other genes in this `locus`.
* **Interpretation:** These examples show that while both methods identify biologically relevant genes, their prioritization criteria lead to fundamentally different top-ranked discoveries. The results are robust to various analytical choices (Supplementary Appendix A, Figs. 4-31).
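The two overlap summaries in this subsection (the share of burden hits falling in any `GWAS locus`, and the share falling in the `top GWAS loci`, where the number of top loci equals the number of burden hits) can be sketched as below. The data layout, the function name `burden_gwas_overlap`, and the toy gene labels are assumptions made for illustration.

```python
def burden_gwas_overlap(gwas_loci, burden_hits):
    """Fractions of burden hits inside any GWAS locus and inside the top-K loci.

    gwas_loci:   list of gene sets, ordered from most to least significant locus.
    burden_hits: list of genes significant in the burden test.
    K is set to the number of burden hits, matching the definition of
    "top GWAS loci" given in the text.
    """
    k = len(burden_hits)
    if k == 0:
        return 0.0, 0.0
    genes_any = set().union(*gwas_loci)
    genes_top = set().union(*gwas_loci[:k])
    frac_any = sum(gene in genes_any for gene in burden_hits) / k
    frac_top = sum(gene in genes_top for gene in burden_hits) / k
    return frac_any, frac_top


# Toy example: four ranked loci and two burden hits (hypothetical labels).
loci = [{"G1"}, {"G2", "G3"}, {"G4"}, {"G5"}]
hits = ["G5", "G2"]
frac_any, frac_top = burden_gwas_overlap(loci, hits)
```

In the toy example both burden hits lie in some locus, but only one lies in the top two loci, which mirrors the pattern of high physical overlap with discordant ranking.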
### 6.1.2. Burden Tests Favour Trait-Specific Genes
The theoretical model predicts that `LoF burden tests` prioritize genes by their `trait specificity` ($\Psi_G$) and `gene length` ($\mu L$), not primarily by `trait importance`.
The following figure (Figure 3 from the original paper) illustrates that `burden tests` prioritize `trait-specific genes`, not large-effect genes:

*The image is Figure 4, showing how GWAS prioritize specific variants. It includes a schematic explaining that variant specificity is determined by two components, the specificity of the gene and the relative specificity of the variant for the gene; it also shows the roles of coding and non-coding variants in different cellular contexts. Panels b and c show heritability enrichment results for coding and non-coding variants, respectively, in specific tissues.*

* **Inverse Relationship between $s_{\mathrm{het}}$ and LoF Frequency (Fig. 3b):** Genes with higher estimated $s_{\mathrm{het}}$ (stronger purifying selection, implying larger overall fitness effects) have lower `aggregate LoF frequencies`. This negative relationship is strong and significant (`Spearman's` $\rho = -0.547$, $P < 10^{-15}$). This confirms that highly constrained genes have fewer `LoF variants` in the population.
* **$s_{\mathrm{het}}$ Proportional to Total Trait Importance (Fig. 3c):** The average `trait importance` across traits shows a significant positive relationship with `s_het` (`Pearson's` $r = 0.078$, $P < 10^{-15}$). This supports the model's assumption that `s_het` captures the total effect of a gene across all fitness-relevant traits.
* **Decoupling of $z^2$ from Trait Importance (Fig. 3d):** For genes with sufficiently large effects (high `trait importance`), the `strength of association` ($z^2$) in `LoF burden tests` is largely decoupled from their `trait importance`. The `Pearson's` $r$ between mean importance and mean $z^2$ for the 25 highest `s_het` bins is low and not significant ($r = 0.188$, $P = 0.368$). This is due to the `flattening effect`: highly constrained genes have very `rare LoFs`, leading to larger standard errors and thus weaker statistical signals despite their true importance.
* **Prioritization by Expression Specificity (Fig. 3e):** Using `gene expression specificity` as a proxy for `trait specificity`, `LoF burden tests` show significantly stronger signals (lower `P-values`) in genes with higher expression specificity to the `trait-relevant tissue`. This holds regardless of `s_het` (Supplementary Fig. 34) and when using different `burden masks` (Supplementary Figs. 33, 35).
* **Interpretation:** These results support the prediction that `LoF burden tests` prioritize genes based on their `trait specificity` ($\Psi_G$) and `gene length`, effectively selecting genes whose `LoFs` have relatively specific effects on the studied trait, rather than genes with the largest overall impact (`trait importance`).
### 6.1.3. GWAS Prioritize Trait-Specific Variants
The theoretical model predicts that `GWAS` prioritize `trait-specific variants` (high $\Psi_V$). This specificity can arise from `variants` affecting `trait-specific genes` or having `context-specific effects` on `pleiotropic genes`.
* **GWAS Prioritization of Coding Variants by Gene Specificity (Fig. 4b):** Analyzing `coding variants` (which act through specific genes), `heritability enrichment` ($\tau/h^2$) increases significantly in genes with higher `expression specificity` to the `trait-relevant tissue`. This indicates that `GWAS` prioritize `variants` acting on `trait-specific genes`.
* **GWAS Prioritization of Non-coding Variants by Context Specificity (Fig. 4c):** For `non-coding variants` within `ATAC peaks`, `heritability enrichment` shows a significant trend of increasing contribution in more `tissue-specific ATAC peaks`. This holds even when conditioning on `s_het` (Supplementary Figs. 41, 42).
* **Interpretation:** `GWAS` can prioritize `variants` that are `trait-specific` through two mechanisms: either they are in `trait-specific genes` (captured by `coding variants` and `gene expression specificity`) or they have `context-specific regulatory effects` in specific tissues (captured by `non-coding variants` in `tissue-specific ATAC peaks`). This means `GWAS` can highlight `pleiotropic genes` if their `non-coding regulatory variants` exhibit `context-specific effects`. This contrasts with `LoF burden tests` which primarily prioritize `trait-specific genes`.
### 6.1.4. LoF Burden Tests Prioritize Long Genes
The theoretical model indicated that `LoF burden tests` prioritize genes with more potential `LoF` positions (`gene length`, represented by $\mu L$), as this increases the `aggregate frequency of LoFs` and thus power.
The following figure (Extended Data Fig. 1 from the original paper) shows how `coding sequence length` drives prioritization in `LoF burden tests`:

*The image is a plot of frequency trajectories for identical mutations. The horizontal axis shows the number of generations since the mutation arose and the vertical axis shows frequency. The trajectories follow different trends as generations pass, highlighting how the mutations are distributed across generations.*

* **No Correlation with Trait Importance (Extended Data Fig. 1a):** There is no substantial positive correlation between `gene length` (proxied by `expected number of unique LoFs`) and `unbiased estimates of squared trait importance` ($\gamma^2$). This means longer genes do not inherently have larger trait effects.
* **Smaller Standard Errors for Longer Genes (Extended Data Fig. 1b):** Longer genes have considerably smaller `LoF burden test standard errors` (`Spearman's` $\rho = -0.255$, $P < 10^{-15}$). This is expected because more `LoF` sites lead to more observed `LoF variants`, reducing the statistical uncertainty.
* **Significant Effect on Burden Signal (Extended Data Fig. 1c):** Consequently, `gene length` has a significant positive effect on the `LoF burden test` $z^2$.
* **Interpretation:** `LoF burden tests` systematically favor longer genes, even if those genes are not more `trait important` or `trait specific` in a biological sense. This is a technical artifact of the aggregation strategy, meaning `gene length` is a `trait-irrelevant factor` driving rankings. It also makes longer genes appear more `pleiotropic` in `burden tests`, simply because higher power means they are detected more often across traits.
### 6.1.5. Random Genetic Drift Affects GWAS
The paper demonstrates that `random genetic drift` significantly influences `GWAS` rankings, introducing a layer of "luck" beyond biological `trait importance` or `specificity`.
The following figure (Extended Data Fig. 2 from the original paper) shows how `GWAS` variant rankings are driven largely by `genetic drift`:

*The image is a scatter plot showing the relationship between the standardized squared effect of simulated SNPs and their relative realized heritability. The color bar indicates minor allele frequency (MAF), ranging from 0.1 to 0.4, reflecting the distribution of the different SNPs.*

* **MAF Stochasticity (Extended Data Fig. 2a):** `Genetic drift` causes `variant frequencies` (even for identical `mutations` under the same `selection pressure`) to spread widely around their expected values over time.
* **GWAS Ranking by Frequency, not Importance (Extended Data Fig. 2b):** In simulated `GWAS`, for sufficiently `trait-important variants`, the ranking by `realized heritability` ($2\alpha_1^2 p(1-p)$) is largely random with respect to their true `trait importance`. This randomness is driven by differences in `minor allele frequency (MAF)` due to `genetic drift`. `LoF burden tests` largely ameliorate this by aggregating `variants`, which averages out `MAF` stochasticity.
* **Apparent Pleiotropy (Extended Data Fig. 3):**
    * **Real Data (Extended Data Fig. 3b):** Stronger `GWAS hits` (lower `P-value rank`) tend to have higher `mean MAF`.
    * **Simulations (Extended Data Fig. 3d):** `Variants` that are stronger `GWAS hits` (lower `P-value rank`) also tend to be `hits` for a greater number of traits. This is because a `variant` that, by chance, drifts to a higher `MAF` will have increased power to be detected across all traits it affects.
* **Interpretation:** `Genetic drift` introduces substantial noise into `GWAS` rankings. Strong `GWAS hits` are not necessarily the most `trait important`; they are often `variants` that have, by chance, drifted to a higher `MAF`. This statistical artifact also explains why `GWAS hits` often appear surprisingly `pleiotropic`: `high-frequency variants` are more easily detected for any trait they influence, making them seem to affect more traits than their underlying biology might suggest.
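The quartile summary behind the apparent-pleiotropy comparison (group one trait's `GWAS hits` into `P-value` quartiles, then average the number of traits for which each hit is also a hit) can be sketched as follows. The input layout, the function name, and the toy values are hypothetical; the sketch also assumes at least four hits so that no quartile is empty.

```python
import statistics


def mean_pleiotropy_by_quartile(hits):
    """Average number of traits hit, within P-value quartiles of one trait's hits.

    hits: list of (p_value, n_traits_hit) pairs, one per GWAS hit.
    Returns four means, ordered from the strongest quartile (smallest
    P-values) to the weakest. Assumes len(hits) >= 4.
    """
    ordered = sorted(hits, key=lambda hit: hit[0])
    n = len(ordered)
    means = []
    for q in range(4):
        chunk = ordered[q * n // 4:(q + 1) * n // 4]
        means.append(statistics.fmean(n_traits for _, n_traits in chunk))
    return means


# Toy hits: stronger hits (smaller P-values) are shared across more traits.
toy_hits = [
    (1e-50, 5), (1e-40, 3), (1e-30, 3), (1e-20, 1),
    (1e-15, 2), (1e-12, 2), (1e-10, 1), (1e-9, 1),
]
quartile_means = mean_pleiotropy_by_quartile(toy_hits)
```

A decreasing sequence of means, as in the toy input, is the pattern described above: the strongest hits for one trait are more often hits for other traits as well.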
### 6.1.6. Estimating Trait Importance
Given that neither `GWAS` nor `LoF burden tests` directly rank genes by `trait importance` based on `P-values`, the paper investigated whether aggregating signals could provide better estimates.
The following figure (Figure 5 from the original paper) illustrates how `trait importance` is estimated by combining different `variant types`:

*The image is a chart showing the relationship between long genes and trait effects. Panel A shows that long genes do not have larger effects on traits, panel B shows that long genes have smaller standard errors, and panel C shows that LoF burden tests prioritize long genes; all panels are plotted against the average expected number of LoFs.*

* **Flattening in LoF Burden Test Heritability (Fig. 5b):** `LoF burden test heritability enrichment` does not correlate well with `s_het` (a proxy for `trait importance`), especially for highly constrained genes (the 25 highest `s_het` bins). This again shows the `flattening effect`, where highly important genes are hard to detect by `burden tests`.
* **AMM Better Tracks Trait Importance (Fig. 5d):** `AMM (Abstract Mediation Model)` `heritability enrichment` (which aggregates `GWAS signals` across `variants` for a gene) shows a strong positive correlation with `s_het` across the 25 highest `s_het` bins.
* **Interpretation:** While individual `variants` or `LoF aggregates` experience `flattening` for highly important genes, aggregating `GWAS signals` across multiple variants for a given gene (some with smaller individual effects that nonetheless contribute collectively) can overcome this. This implies that methods like `AMM` are more effective at prioritizing `trait-important genes` by leveraging the collective signal, even if individual `variants` are subject to `flattening` (Fig. 5c). This approach is robust across different aggregation methods (Supplementary Figs. 42, 49, 50).
## 6.2. Data Presentation (Tables)
The paper does not contain any tables within its main article body that are presented in a format (like markdown or HTML tables) suitable for direct transcription. All quantitative results are presented within the main text or integrated into figures. For instance, statistical values such as `P-values` and `Pearson's` $r$ are reported directly in the text or figure captions alongside their corresponding figures (e.g., for Fig. 1c the text states that "74.6% (1,382 out of 1,852) of genome-wide significant burden test hits fall within a GWAS locus").
6.3. Ablation Studies / Parameter Analysis
The paper primarily validates its theoretical models and empirical findings through robust sensitivity analyses and simulations rather than traditional ablation studies on a single proposed model.
- Robustness of GWAS and Burden Test Comparisons:
  - Definition of GWAS Loci: The study tests different approaches for defining GWAS loci (e.g., using LD-clumping vs. COJO (ref 61) for conditionally independent SNPs). The main results regarding GWAS and burden test discrepancies remain robust (Supplementary Figs. 8-10).
  - MAF Thresholds for GWAS: Comparisons are made by restricting GWAS to SNPs below various MAF thresholds (0.01, 0.1, 0.5). The discrepancy persists even when GWAS is restricted to lower frequency variants (Supplementary Figs. 26-28), suggesting the difference is not merely due to considering different allele frequency spectra.
  - Ranking by Effect Size vs. P-value: The analysis confirms that ranking loci by largest effect size instead of P-value does not fundamentally change the qualitative differences in prioritization (Supplementary Figs. 29-31).
  - Burden Test Masks: The findings for burden tests (e.g., the relationship between specificity and power) are consistent when including likely damaging missense variants (mask M3) in addition to LoF variants (mask M1) (Supplementary Figs. 33, 35).
- Simulations of Genetic Drift Effects:
  - The paper conducts extensive simulations to illustrate how genetic drift affects GWAS (Extended Data Fig. 2, 3).
  - Parameter Sensitivity: The sensitivity of these simulated pleiotropy results to various simulation parameters (effective population size, P-value threshold, trait specificity distribution, and overall effect magnitude) is explored in Supplementary Figs. 45-48. These analyses demonstrate that the qualitative conclusions about genetic drift making GWAS hits appear more pleiotropic are not sensitive to the specific choice of these parameters.
- Controlling for Covariates:
  - When regressing on expression specificity, unbiased estimates of trait importance were included as a covariate to ensure that the observed effect of specificity was not driven by inherent differences in gene importance across specificity bins (Supplementary Fig. 36).
  - In S-LDSC analyses for ATAC peaks, the strength of ATAC peaks was controlled for to isolate the effect of tissue specificity. Similarly, for coding variants, gene expression level bins were included as covariates.
  - S-LDSC analyses for ATAC peaks were also conditioned on s_het to show that the effect of tissue specificity is independent of overall gene constraint (Supplementary Figs. 41, 42).
- Comparison of Trait Importance Estimation Methods:
  - The paper compares LoF burden heritability enrichment (Fig. 5b) with AMM-estimated heritability enrichment (Fig. 5d) against s_het (as a proxy for trait importance). This comparison acts as a form of ablation or comparison study to show that AMM's aggregation strategy is more effective at tracking trait importance than standard P-value based burden test signals. Additional analyses using GWAS hit probability and the correlation with the number of GWAS hits further support these findings (Supplementary Figs. 49, 50).

These extensive analyses demonstrate the robustness of the paper's core findings and provide strong evidence for the proposed mechanisms driving gene prioritization in GWAS and LoF burden tests.
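To make the role of drift in the simulations above concrete, here is a minimal sketch of a neutral Wright-Fisher process. It is a deliberate simplification and not the paper's simulation setup: it ignores selection and pleiotropy entirely, and the function names, starting frequency, population size, and effect size are all hypothetical values chosen for illustration.

```python
import random


def simulate_final_frequency(p0: float, n_chrom: int, generations: int,
                             rng: random.Random) -> float:
    """Neutral Wright-Fisher drift: resample n_chrom allele copies each generation."""
    p = p0
    for _ in range(generations):
        copies = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = copies / n_chrom
        if p in (0.0, 1.0):  # allele lost or fixed; frequency can no longer change
            break
    return p


def variance_explained(p: float, beta: float) -> float:
    """Additive variance 2p(1-p)*beta^2 contributed by a biallelic variant."""
    return 2 * p * (1 - p) * beta ** 2


# 200 replicate variants with identical effect size and starting frequency.
rng = random.Random(0)
finals = [simulate_final_frequency(0.05, n_chrom=200, generations=30, rng=rng)
          for _ in range(200)]
explained = [variance_explained(p, beta=0.1) for p in finals]
print(f"final frequencies range from {min(finals):.3f} to {max(finals):.3f}")
```

Every replicate has the same effect size, yet the variance explained differs widely because drift alone moves the replicates to different frequencies. A variant that happens to drift to a higher frequency explains more variance and is therefore easier to detect, which is the sense in which luck enters GWAS rankings.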
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper rigorously demonstrates that standard Genome-Wide Association Studies (GWAS) and rare variant Loss-of-Function (LoF) burden tests, while both crucial for identifying trait-relevant genes, systematically prioritize different sets of genes due to fundamental differences in their underlying mechanisms and what aspects of trait biology they are sensitive to.
The core findings are:
- Divergent Prioritization: LoF burden tests primarily prioritize long, trait-specific genes (genes whose LoFs have relatively specific effects on the studied trait), while GWAS prioritize genes near trait-specific variants.
- Role of Non-coding Variants: A key distinction is that GWAS can capture trait-relevant, pleiotropic genes if non-coding variants acting on these genes have context-specific effects (e.g., tissue-specific regulation). LoF burden tests, which focus on coding variants, generally cannot achieve this.
- Impact of Trait-Irrelevant Factors: Both methods are influenced by factors unrelated to a gene's true trait importance or specificity:
  - LoF burden tests are biased towards longer genes due to increased statistical power from more potential LoF sites.
  - GWAS rankings are significantly affected by random genetic drift, which causes variants to drift to unexpectedly high minor allele frequencies (MAFs). These high-MAF variants then appear as strong GWAS hits and seem more pleiotropic than they truly are.
- Estimating Trait Importance: Standard P-value rankings in neither method effectively capture trait importance due to the flattening effect (where strong purifying selection on highly important genes makes their variants rare and hard to detect). However, non-standard GWAS approaches that aggregate signals across multiple variants (like AMM) can more accurately estimate trait importance.

In essence, LoF burden tests and GWAS are complementary tools, each revealing distinct facets of trait biology. Understanding their specific biases and strengths is critical for accurate interpretation and application in human genetics.
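The gene length bias summarized above follows from a simple power argument, sketched below. This is an illustrative approximation and not the burden test used in the paper: it assumes the aggregate LoF frequency of a gene is the number of potential LoF sites times a common per-site frequency, and that the standard error of the burden effect shrinks with the square root of sample size times burden genotype variance. The function name and all parameter values are hypothetical.

```python
import math


def burden_z_score(effect: float, n_lof_sites: int, per_site_freq: float,
                   n_samples: int, trait_sd: float = 1.0) -> float:
    """Approximate expected z-score of a burden test for one gene."""
    # Aggregate LoF allele frequency (rare-variant approximation: frequencies add up).
    burden_freq = n_lof_sites * per_site_freq
    # Variance of the burden genotype under Hardy-Weinberg proportions.
    geno_var = 2 * burden_freq * (1 - burden_freq)
    std_err = trait_sd / math.sqrt(n_samples * geno_var)
    return effect / std_err


# Two genes with the same LoF effect on the trait but a tenfold difference in length.
short_gene = burden_z_score(effect=0.5, n_lof_sites=50, per_site_freq=1e-6, n_samples=400_000)
long_gene = burden_z_score(effect=0.5, n_lof_sites=500, per_site_freq=1e-6, n_samples=400_000)
print(f"short gene z = {short_gene:.2f}, long gene z = {long_gene:.2f}")
```

With identical effect sizes, the longer gene's z-score is larger by roughly the square root of the length ratio, simply because it has more expected LoF carriers and therefore a smaller standard error. Length raises the ranking without saying anything about trait importance.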
7.2. Limitations & Future Work
The authors themselves highlight several limitations and suggest future research directions:
- Improving Burden Tests: While larger sample sizes will reduce noise, the authors anticipate that Bayesian frameworks incorporating priors based on gene features (e.g., refs 3, 5) could be particularly effective at improving the accuracy and interpretation of burden tests, potentially mitigating the gene length bias.
- Enhancing GWAS for Trait Importance: The paper suggests that non-standard GWAS approaches that aggregate signals across variants (refs 47, 48, 56, 57) are promising for prioritizing genes by trait importance, and further development and refinement of such methods are needed.
- Context-Specific Targeting of Pleiotropic Genes: The paper notes that while trait-specific genes might be ideal drug targets due to reduced side effects, highly pleiotropic genes could still be impactful if they can be targeted in a context-specific way. This points to the ongoing challenge and research area of understanding context-specific gene function and druggability.
- Differences in Experimental Systems: The paper acknowledges that the effects of pleiotropic genes observed in knockout experimental systems might differ fundamentally from the phenotypic consequences of regulatory variants identified in GWAS, suggesting a need for integrating insights across different study designs.
7.3. Personal Insights & Critique
This paper offers profoundly valuable insights for anyone interpreting genetic association studies.
- Complementary Tools, Not Competing: The most striking takeaway is the clear articulation that GWAS and burden tests are not competing but rather complementary. They are designed to detect different biological signals and are affected by distinct non-biological factors. This fundamentally shifts the perspective from asking "which method is better?" to "what specific biological question can each method best answer?".
- Beyond P-values: The rigorous demonstration of P-value decoupling from true trait importance due to selection and genetic drift is a crucial message. It underscores that blindly ranking by P-value can be misleading for identifying truly important genes. This highlights the need for more sophisticated methods (like AMM) that aggregate information and account for evolutionary pressures.
- Implications for Drug Discovery: The distinction between trait importance and trait specificity has direct practical implications for drug target discovery. Trait-specific genes (prioritized by burden tests) might indeed make better drug targets due to fewer off-target effects, aligning with observations that LoF burden evidence is more predictive of drug trial success. However, if context-specific targeting is possible, more trait-important (but pleiotropic) genes (potentially identified by GWAS through context-specific variants) could yield greater clinical impact. This provides a clear framework for evaluating targets.
- Reframing Pleiotropy: The explanation that GWAS hits appear pleiotropic partly as a statistical artifact of genetic drift pushing variants to higher MAFs is a fascinating and important point. This challenges the naive interpretation of GWAS pleiotropy and suggests that some observed shared genetic influences across traits might be due to statistical power rather than deep biological interconnectedness at the variant level.
- Value of Non-coding Genome: The paper reinforces the critical role of the non-coding genome in complex traits. Context-specific non-coding variants allow GWAS to pinpoint trait-specific effects of broadly pleiotropic genes, a capability burden tests lack. This emphasizes the continued need for sophisticated functional genomic annotation to interpret GWAS signals.
- Areas for Improvement/Further Research:
  - Unified Framework for Prioritization: While the paper provides criteria, a practical, unified framework or score that intelligently combines trait importance and trait specificity (and perhaps accounts for gene length and drift biases) across both GWAS and burden tests would be a valuable next step.
  - Quantifying "Context Specificity": The proxies used for context specificity (ATAC-seq peak tissue specificity, gene expression specificity) are good, but a more direct and granular measure of variant-level context specificity (e.g., across cell types, developmental stages, environmental conditions) could refine these analyses further.
  - Beyond Quantitative Traits: The study focuses on quantitative traits. Extending this framework to binary disease traits could reveal additional nuances, especially concerning the effects of selection and drift in case-control studies.
  - Dynamic Nature of Selection: The models assume stabilizing selection. While a common assumption, investigating the implications of other forms of selection or time-varying selection on gene prioritization could add complexity and realism.

Overall, this paper is a landmark study that significantly advances our theoretical and practical understanding of how genetic association studies reveal trait biology. It provides a robust, evidence-based roadmap for interpreting current findings and designing future research.