Convergent genome evolution shaped the emergence of terrestrial animals
TL;DR Summary
This study analyzes 154 genomes from 21 animal phyla to uncover the convergence and contingency in terrestrialization events, revealing unique gene patterns yet recurrent adaptive functions, crucial for life on land, while establishing a timeline for these transitions.
Abstract
The challenges associated with the transition of life from water to land are profound; yet they have been met in many distinct animal lineages. This constitutes a series of independent evolutionary experiments from which we can decipher the role of contingency versus convergence in the adaptation of animal genomes. Here we compare 154 genomes from 21 animal phyla and their outgroups to reconstruct the protein-coding content of the ancestral genomes linked to 11 animal terrestrialization events, and to produce a timescale of terrestrialization. We uncover distinct patterns of gene gain and loss underlying each transition to land, but similar biological functions emerged recurrently, pointing to specific adaptations as key to life on land.
1. Bibliographic Information
1.1. Title
Convergent genome evolution shaped the emergence of terrestrial animals
1.2. Authors
Jialin Wei, Davide Pisani, Philip C. J. Donoghue, Marta Álvarez-Presas, Jordi Paps
1.3. Journal/Conference
Nature. The journal Nature is one of the most prestigious and highly influential scientific journals globally, publishing original research across all fields of science and technology. Its reputation ensures rigorous peer review and widespread recognition within the scientific community.
1.4. Publication Year
2025
1.5. Abstract
The transition of life from water to land presents significant evolutionary challenges, which have been overcome independently by multiple animal lineages. These events serve as natural experiments to understand the roles of contingency versus convergence in genomic adaptation. This study compares 154 genomes from 21 animal phyla and their outgroups to reconstruct the protein-coding content of ancestral genomes associated with 11 independent animal terrestrialization events, and to establish a timeline for these transitions. The research uncovers distinct patterns of gene gain and loss for each transition, yet recurrent emergence of similar biological functions suggests specific adaptations are crucial for life on land. Semi-terrestrial species are found to have evolved convergent functional patterns, in contrast to fully terrestrial lineages which followed more divergent paths. The timeline proposed identifies three major temporal windows for animal land colonization over the last 487 million years, each linked to specific ecological contexts. The study concludes that while each lineage exhibits unique adaptations, strong evidence of convergent genome evolution across the animal kingdom implies a largely predictable adaptive response to terrestrial life, connecting genes to ecosystems.
1.6. Original Source Link
/files/papers/6919aa68110b75dcc59ae248/paper.pdf
Publication Status: Officially published (published online 12 November 2025).
2. Executive Summary
2.1. Background & Motivation
The transition of life from water to land is one of the most profound evolutionary events, fundamentally shaping Earth's ecosystems and biodiversity. This terrestrialization has occurred multiple times independently across diverse animal lineages (e.g., arthropods, vertebrates, molluscs, nematodes). Each such event represents a unique "evolutionary experiment" in overcoming universal challenges like desiccation, temperature fluctuations, new modes of locomotion, respiration, and reproduction outside of water.
The core problem the paper aims to solve is to decipher the genomic underpinnings of these independent transitions. Prior research has noted widespread phenotypic convergence (e.g., water-retentive skin, adapted immune systems, changes in skeletal design) across terrestrial lineages, suggesting predictable responses to similar environmental pressures. At the genomic level, studies have linked gene innovation, duplication, and loss to major evolutionary transitions and identified specific genes (e.g., aquaporin-coding genes) or genomic changes associated with terrestrialization in individual lineages. However, compared to land plants, the comprehensive genomic basis of terrestrialization across multiple animal lineages remains largely uncharacterized.
The specific challenge is to move beyond single-lineage studies and conduct a broad, comparative genomic analysis across diverse animal phyla to determine whether terrestrialization primarily leads to lineage-specific, contingent genomic adaptations (unique solutions due to chance or historical context) or convergent (predictable, parallel solutions due to similar environmental pressures) changes. The paper's entry point is to leverage the independent nature of these terrestrialization events as a natural laboratory to explore this fundamental question of evolutionary biology: the role of contingency versus convergence in shaping animal genomes during adaptation to land.
2.2. Main Contributions / Findings
The paper makes several primary contributions:
- Development of the InterEvo Framework: The study introduces and applies an intersection framework for convergent evolution (InterEvo) (Extended Data Fig. 1), a comparative genomics pipeline used to analyze 154 genomes from 21 animal phyla and their outgroups. This framework systematically identifies the intersection of biological functions between different sets of genes that were independently gained or reduced across 11 distinct animal terrestrialization events.
- Identification of Convergent Functional Adaptations: Despite distinct patterns of gene gain and loss in each lineage, similar biological functions emerged recurrently across independent transitions. These convergent functions, driven by gene gains (novel, novel core, and expanded HGs), primarily involve osmoregulation (water transport), metabolism (especially fatty acids, linked to diet), reproduction, detoxification, sensory reception, and reaction to stimuli. Gene reductions also show convergent patterns, notably the loss of Dbl-homology domain and pleckstrin-homology domain gene families (related to Rho GTPases and regeneration) and contraction of chloride channel protein genes (osmoregulation).
- Differentiation Between Semi- and Fully Terrestrial Lineages: The study reveals that semi-terrestrial species (e.g., rotifers, nematodes) evolved convergent functional patterns, characterized by an "expansive and versatile toolkit" for environmental flexibility (e.g., cuticle remodelling, visual development, stress response). In contrast, fully terrestrial lineages (e.g., land gastropods, arachnids, hexapods, tetrapods) followed more diverse genomic paths, displaying a "small and streamlined set" centered on neuronal development and ion membrane homeostasis, with limited functional convergence among themselves outside of arthropods.
- Establishment of a Temporal Framework for Terrestrialization: The paper reconstructs a molecular evolutionary timescale (Fig. 1) that supports three major temporal windows of animal land colonization during the last 487 million years:
- First Window (Middle Cambrian - Middle Ordovician): Associated with early land plants; the lineages involved are nematodes, myriapods, hexapods, and arachnids.
- Second Window (Late Devonian - Early Carboniferous): Linked to episodic flooding and deepening soils, involving clitellate annelids and tetrapods.
- Third Window (Cretaceous): Characterized by greenhouse landscapes, leading to the terrestrialization of bdelloid rotifers and land gastropods.
- Interplay of Contingency and Predictability: The study concludes that adaptation to life on land involves both predictable (convergent) molecular responses to common challenges and lineage-specific (contingent) adaptations shaped by unique evolutionary histories, genomic backgrounds, and ecological contexts. This highlights the repeatability and uniqueness of evolutionary innovation.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
- Comparative Genomics: A field of biological research in which the genomic features of different organisms are compared. It reveals similarities and differences in DNA, RNA, and protein sequences, as well as gene order, regulation, and other genomic structural features. This comparison helps understand evolutionary relationships, identify functionally important genes, and uncover adaptations.
- Homology Groups (HGs): Groups of genes (or proteins) that share a common evolutionary ancestor. These can include orthologs (genes in different species that evolved from a common ancestral gene by speciation) and paralogs (genes within the same species that arose from a common ancestral gene by gene duplication). The paper uses OrthoFinder to cluster protein sequences into HGs.
- Gene Ontology (GO): A collaborative bioinformatics initiative that provides a controlled vocabulary (ontology) of terms for describing gene products in any organism. It covers three domains: molecular function (what a gene product does at the molecular level), cellular component (where a gene product is active), and biological process (the larger processes or pathways to which a gene product contributes). GO terms allow for standardized functional annotation and comparison across species.
- Pfam Protein Domains: A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam domains are common, conserved parts of proteins that can function independently or in combination with other domains. Identifying Pfam domains in novel proteins can provide clues about their function.
- Phylogenetic Tree (Phylogeny): A branching diagram or "tree" showing the inferred evolutionary relationships (the phylogeny) among various biological species or other entities, based upon similarities and differences in their physical or genetic characteristics. It helps to visualize common ancestry and divergence over evolutionary time.
- Gene Gain and Loss: Fundamental evolutionary processes where new genes emerge (gene gain, often through gene duplication or de novo gene birth) or existing genes are eliminated from a genome (gene loss). These processes are crucial drivers of genomic evolution and adaptation.
- Gene Expansion and Contraction: Refers to changes in the number of copies of genes within a homology group in a particular lineage. Gene expansion means an increase in copy number, often suggesting a gene family has become more important or adapted to new functions. Gene contraction means a decrease in copy number.
- Principal Component Analysis (PCA): A statistical procedure that transforms a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components. It is used for dimensionality reduction and visualizing high-dimensional data, revealing underlying patterns and groupings.
- Principal Coordinates Analysis (PCoA): Similar to PCA, PCoA is a method used to visualize the similarity or dissimilarity of data points. Unlike PCA, which operates on a raw data matrix, PCoA operates on a dissimilarity matrix (e.g., Jaccard distance), allowing it to handle non-Euclidean distances. It finds a low-dimensional ordination that best represents the distances between objects.
- Permutational Multivariate Analysis of Variance (PERMANOVA): A non-parametric multivariate statistical test used to compare groups of objects based on a distance matrix. It tests whether the centroids of two or more groups differ, without assuming multivariate normality. Because the test is also sensitive to differences in within-group dispersion, it is usually paired with a separate test of dispersion homogeneity.
- Molecular Clock: A technique in molecular evolution that uses the rate of accumulation of molecular changes (e.g., mutations in DNA or protein sequences) to estimate the time when two species diverged from a common ancestor. It assumes a relatively constant rate of molecular evolution over time, allowing scientists to date evolutionary events.
3.2. Previous Works
The paper contextualizes its research by referencing several key prior studies:
- Phenotypic Adaptations:
  - Studies noting widespread convergent phenotypic adaptations in terrestrial animals, such as water-retentive skin or cuticle, adapted immune systems, changes in skeletal design, elevated metabolic rates, developmental adaptations (e.g., encapsulated larvae), and vision adaptation in aerial environments. These observations form the basis for hypothesizing genomic convergence.
- Genomic Changes in Metazoan Evolution:
  - Research demonstrating that genomic changes, including gene innovation (e.g., Paps & Holland, 2018), duplication (e.g., Fernandez & Gabaldon, 2020), and loss (e.g., Guijarro-Clarke et al., 2020), were crucial to major metazoan evolutionary transitions. This sets the stage for investigating these genomic dynamics during terrestrialization.
- Specific Genes and Lineage-Specific Terrestrialization:
  - Work linking specific genes, like aquaporin-coding genes, to terrestrialization in several clades (e.g., Martinez-Redondo et al., 2023).
  - Studies associating genomic changes with terrestrialization in individual lineages, such as molluscs (e.g., Aristide & Fernández, 2023), arthropods (e.g., Thomas et al., 2020), beetles (e.g., Balart-Garcia et al., 2023), and annelids (e.g., Vargas-Chavez et al., 2025). These studies often highlighted roles for genes related to metabolism, stress response, osmoregulation, and immunity.
- Land Plants Terrestrialization:
  - Research on the genomic basis of terrestrialization in land plants (e.g., Bowles et al., 2020), which identified two bursts of genomic novelty linked to this transition. This serves as a comparative benchmark, noting that the genomic basis in animals is largely uncharacterized in comparison.
3.3. Technological Evolution
The field of evolutionary genomics, particularly concerning large-scale comparative analyses, has seen significant advancements driven by:
- Next-Generation Sequencing (NGS): The exponential decrease in sequencing costs and increase in throughput has made it possible to sequence hundreds of genomes across diverse taxa, providing the raw data for comparative studies.
- Bioinformatics Tools for Orthology Inference: Algorithms like OrthoFinder have been developed to accurately identify homology groups and orthologs across many species, overcoming challenges of gene duplication and loss.
- Ancestral State Reconstruction: Methods to infer the gene content or other characteristics of ancestral genomes have become more sophisticated, allowing researchers to track gene evolution over deep time.
- Phylogenomic Methods: Advancements in constructing highly resolved phylogenetic trees using genomic-scale datasets (e.g., IQ-TREE with complex models) provide a robust evolutionary framework for interpreting genomic changes.
- Gene Family Evolution Models: Tools like CAFE (Computational Analysis of Gene Family Evolution) allow for statistical inference of gene family expansions and contractions along a phylogeny, accounting for varying evolutionary rates.
- Functional Annotation Resources: Comprehensive databases like Gene Ontology (GO), Pfam, eggNOG, and PANTHER provide standardized classifications of gene functions, enabling large-scale functional enrichment analyses.

This paper's work fits into the current state of technological evolution by integrating these advanced tools and large genomic datasets to perform a comprehensive, genome-wide comparative analysis across a broad range of animal phyla, a scale previously challenging to achieve for such a specific evolutionary transition.
3.4. Differentiation Analysis
Compared to previous works, the core differences and innovations of this paper's approach are:
- Scope and Breadth of Comparison: While prior studies focused on genomic changes in specific lineages undergoing terrestrialization, this paper performs a genome-wide comparative analysis across 11 independent animal terrestrialization events spanning 21 animal phyla using 154 genomes. This unprecedented breadth allows for a systematic and robust investigation into widespread patterns of convergence versus contingency.
- InterEvo Framework: The paper introduces a novel Intersection Framework for Convergent Evolution (InterEvo). This approach specifically looks for the intersection of biological functions across independently evolved gene sets (gained or reduced), rather than just identifying changes within individual lineages. This design directly addresses the question of functional convergence.
- Combined Analysis of Gene Turnover: The study comprehensively analyzes gene gains (novel, novel core, expanded) and gene reductions (contracted, lost) simultaneously across all terrestrialization events. This integrated view provides a more complete picture of genome plasticity than focusing solely on one type of genomic change.
- Distinction Between Semi- and Fully Terrestrial: A novel aspect is the explicit comparison of genomic adaptations between semi-terrestrial and fully terrestrial lineages, revealing distinct patterns of functional convergence and divergence between these two categories.
- Molecular Timescale Integration: The study integrates a molecular clock analysis to establish a temporal framework for animal terrestrialization, linking genomic changes to specific paleoecological contexts (e.g., early land plants, seasonal wetlands, Cretaceous greenhouse).

In essence, the paper moves beyond identifying "what changed" in individual lineages to systematically address "what converged functionally" across many independent events, and "when" these events occurred within Earth's history, offering a deeper understanding of the predictability of evolutionary adaptation.
4. Methodology
4.1. Principles
The core idea of the method used in this paper, termed the intersection framework for convergent evolution (InterEvo), is to systematically identify patterns of convergent evolution at the genomic level across multiple independent terrestrialization events in animals. The theoretical basis is that if similar environmental pressures drive adaptation, then independent lineages facing these pressures might evolve similar biological functions, even if the specific genes or genomic changes (gains, losses, expansions, contractions) are distinct. The intuition is that by comparing the protein-coding content of ancestral genomes before and after terrestrialization in many lineages, and then analyzing the functional annotations of the genes that changed, one can uncover recurrent adaptive strategies critical for life on land. The InterEvo workflow, depicted in Extended Data Fig. 1, integrates comparative genomics, homology inference, and functional annotation to detect these convergent evolution patterns.
The following figure (Extended Data Fig. 1 from the original paper) shows the InterEvo (Intersection Framework for Convergent Evolution) workflow:
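The image is a schematic showing how homology groups in animal genomes are classified into novel, expanded, and lost/contracted homology groups. It relates Gene Ontology terms to functions, with the aim of reconstructing the ancestral genomes at the transition nodes.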
4.2. Core Methodology In-depth (Layer by Layer)
The methodology involves several sequential steps, from genome acquisition to functional analysis and timescale reconstruction.
4.2.1. Taxon Sampling and Homology Groups (HGs) Inference
- Genome Acquisition: The researchers compiled 154 genomes from public databases including UniProt, NCBI, and Ensembl. This dataset comprises 151 metazoan (animal) genomes and 3 unicellular holozoan genomes, the latter serving as outgroups. Together these genomes contain 3,934,362 predicted proteins. The sampling focused on species that flank the 11 key terrestrialization events identified across 21 animal phyla (Fig. 1, Extended Data Fig. 2).
- Protein Processing:
  - A helper script from OrthoFinder (primary_transcript.py) and Cd-hit v.4.8.1 were used to extract canonical proteins (the longest representative transcript for each gene) from the raw data.
  - Cd-hit was run with a similarity threshold of 1.00 to cluster identical protein sequences, ensuring that only unique canonical proteins are used.
- Genome Quality Assessment: The quality of the canonical proteins from all 154 genomes was assessed using BUSCO v.5.4.7 (Benchmarking Universal Single-Copy Orthologs). The preference was for genomes with completeness greater than 85% and fragmentation less than 15%. However, some genomes not fully meeting these criteria were included because of their habitat and phylogenetic importance.
- Homology Group (HG) Inference: HGs were inferred using OrthoFinder v.2.5.5. OrthoFinder is a widely used bioinformatics tool that identifies orthologs and paralogs (i.e., homology groups) across multiple species. It relies on MAFFT v.7.505 for multiple sequence alignment and DIAMOND v.2.1.8 for sequence similarity searches. The output HGs are groups of proteins that have distinctly diverged from other groups, comprising orthologues and/or paralogues.
4.2.2. Guide Tree Construction
- Conserved Gene Identification: The researchers started by identifying conserved single-copy genes within the Metazoa_odb10 dataset from BUSCO v.5.4.7. For this, Homo sapiens was used as a reference, identifying 943 such genes.
- Alignment and Trimming: The identified conserved protein sequences were aligned using MAFFT v.7.505 and then trimmed using trimAl v.1.4.rev.15 to remove poorly aligned regions that could introduce noise into phylogenetic inference.
- Supermatrix Concatenation: The trimmed alignments were concatenated into a single supermatrix using FASconCAT-G v.1.05.1. A supermatrix combines multiple gene alignments into a single, larger alignment, which is then used to infer a species tree.
- Phylogenetic Tree Building: The concatenated supermatrix was used to build the phylogeny (guide tree) with IQ-TREE v.2.2.2.6. The C60+G+I model was used for phylogenetic inference, and 1,000 bootstrap replicates were performed to assess tree robustness. The guide tree was used as a constraint, incorporating species positions inferred from previous literature to ensure a phylogenetically sound backbone. The resulting phylogeny, with branch lengths representing genetic changes, served as the input for CAFE5 and the molecular clock analysis.

The following figure (Extended Data Fig. 2 from the original paper) shows the Species tree of the 154 sampled taxa:
The image is a phylogenetic tree showing the animal phyla and their relationships. It displays the evolutionary relationships among the 154 genomes and the changes associated with the 11 terrestrialization events, so that patterns of gene gain and loss and the related biological adaptations can be observed.
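The supermatrix step described above can be illustrated with a minimal Python sketch. This is not FASconCAT-G itself; the function name `concatenate_alignments`, the toy taxa, and the choice to pad missing taxa with gap characters are assumptions made for illustration only.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Concatenate per-gene alignments into one supermatrix.

    Each alignment maps taxon name -> aligned sequence. Sequences inside one
    alignment must share a length. A taxon missing from a gene is padded with
    gap characters (an assumption of this sketch) so all rows stay equal in length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must share one length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy data: two trimmed gene alignments; taxon "C" is missing from the second.
gene1 = {"A": "MKT-", "B": "MKTL", "C": "MRTL"}
gene2 = {"A": "GGA", "B": "G-A"}
matrix = concatenate_alignments([gene1, gene2])
```

In a real analysis the concatenated rows would be written to a FASTA or PHYLIP file and passed to the tree-building program.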
4.2.3. Gene Content Analysis
The HG content for key nodes in the tree (including the 11 terrestrialization events and their ancestors) was reconstructed using a previously described approach (Paps and Holland, 2018; Guijarro-Clarke et al., 2020; Bowles et al., 2020). HGs were classified based on their mode of evolution:
- Novel HGs: HGs that are present in at least one species descending from the LCA (last common ancestor) of a lineage (referred to as a "node"), while being completely absent in all species of the outgroup (sister groups and other more distantly related aquatic relatives).
- Novel Core HGs: A more stringent category of novel HGs. These are HGs that are present in all species within a node (allowing for one absence if the node contains more than three species), while being absent in all species of the outgroup. For nodes with only two species, novel HGs are equivalent to novel core HGs.
- Lost HGs: HGs that are absent in all species within a specific node, but present in its sister groups and other species in the outgroup.
- Expanded HGs: HGs that show a statistically significant increase in the number of gene copies within a lineage, often due to gene duplication events.
- Contracted HGs: HGs that show a statistically significant reduction in the number of gene copies within a lineage.
- Ancestral HGs: All HGs inferred to be present in a given ancestral node.

The inference of novel, novel core, and lost HGs was performed using the Phylogenetically Aware Parsing Script pipeline developed by Paps and Holland.
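The presence/absence rules above can be written as a small Python sketch. It is not the Phylogenetically Aware Parsing Script; `classify_hg` and the toy species names are hypothetical. Two simplifications are assumed: "lost" only requires presence in at least one outgroup species, and the special case for two-species nodes is not modelled.

```python
from typing import Set


def classify_hg(present_in: Set[str], ingroup: Set[str], outgroup: Set[str]) -> str:
    """Label one homology group from the set of species in which it occurs.

    Returns the most specific label: "novel core" is a subset of "novel".
    """
    in_hits = present_in & ingroup
    out_hits = present_in & outgroup
    if in_hits and not out_hits:
        # One absence is tolerated only when the node has more than three species.
        allowed_missing = 1 if len(ingroup) > 3 else 0
        if len(ingroup) - len(in_hits) <= allowed_missing:
            return "novel core"
        return "novel"
    if out_hits and not in_hits:
        # Simplification: any outgroup presence counts as evidence of loss.
        return "lost"
    return "other"


# Toy node with four ingroup species and two outgroup species.
ingroup = {"sp1", "sp2", "sp3", "sp4"}
outgroup = {"out1", "out2"}
labels = {
    "HG_a": classify_hg({"sp1", "sp2", "sp3"}, ingroup, outgroup),
    "HG_b": classify_hg({"sp1"}, ingroup, outgroup),
    "HG_c": classify_hg({"out1"}, ingroup, outgroup),
    "HG_d": classify_hg({"sp1", "out2"}, ingroup, outgroup),
}
```

Expanded and contracted HGs are not covered here because they depend on a statistical model of copy-number change rather than on presence/absence alone.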
For expanded and contracted HGs, the CAFE5 software (v.5.22) was used:
- An ultrametric phylogenetic tree (a tree where all tips are equidistant from the root, implying a constant evolutionary rate along all paths from the root to the tips) was generated from the IQ-TREE phylogeny using the ape, TreeTools, and phytools packages in R.
- CAFE5 was run with a Poisson distribution and an error model.
- Due to the large dataset, the phylogeny was split into three smaller trees: Lophotrochozoa, Ecdysozoa, and Deuterostomia.
- For each smaller tree, CAFE5 was run with two-lambda and three-lambda models (models that allow for different rates of gene family evolution across different parts of the tree) ten times each to test for convergence of the Model Base Final Likelihood (-lnL).
- The model with the highest -lnL was selected, and a likelihood ratio test with a chi-squared distribution was used (via the lmtest package in R) to compare the two-lambda and three-lambda models. This indicated that three-lambda models were generally a better fit for Lophotrochozoa and Ecdysozoa. However, simulation tests within CAFE5 revealed that the three-lambda model for Deuterostomia fluctuated, making the two-lambda model more stable, so it was chosen as the better fit for this specific phylogeny.
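A likelihood ratio test between two nested models can be sketched in a few lines of Python. The study used the lmtest package in R; the function below is only an illustration, the log-likelihood values are invented, and one degree of freedom is assumed because a three-lambda model has one more rate parameter than a two-lambda model. Inputs are log-likelihoods (lnL); if a program reports -lnL, negate the value first.

```python
import math
from typing import Tuple


def likelihood_ratio_test_df1(lnl_simple: float, lnl_complex: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For one degree of freedom the chi-squared
    survival function has the closed form erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (lnl_complex - lnl_simple), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented log-likelihoods for a two-lambda and a three-lambda run.
stat, p = likelihood_ratio_test_df1(lnl_simple=-1000.0, lnl_complex=-995.0)
```

A small p-value favours the richer model, but as the Deuterostomia case shows, stability across repeated runs can still argue for the simpler one.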
4.2.4. Novel Core HG Validation
To ensure the robustness of the identified novel core HGs, a validation step was performed using BLASTp v.2.14.0+.
- Novel core HGs were searched against the NCBI RefSeq database (downloaded 23 August 2023), which contains a broad range of high-quality molecular sequences.
- The crucial step was to exclude protein sequences from the in-groups (the terrestrial nodes themselves) from the RefSeq search using the "-negative_taxidlist" option. Any significant hit would therefore come from a species outside the target lineage, so the absence of such hits supports the novelty of the HGs to the specific terrestrial node.
- The results showed very weak hits for the vast majority of sequences, confirming the true novelty of these HGs to the terrestrial lineages.
4.2.5. Permutation Test Analysis
Two permutation tests were conducted to statistically validate key findings:
- Novel HGs Gain Rate:
  - Objective: To determine if the rate of novel gene emergence per million years (Myr) is significantly higher in terrestrial nodes compared to aquatic nodes.
  - Method:
    - The rate of novel HGs (total novel HGs / total divergence time) was calculated for the 11 terrestrial nodes.
    - 11 aquatic nodes (e.g., Actinopterygii, Bivalvia) were randomly selected as a comparison group.
    - The observed total evolutionary rate for terrestrial nodes was recorded.
    - A permutation test with 10,000 bootstrap draws was performed. In each draw, 11 aquatic nodes were sampled (with replacement), their evolutionary rate was recalculated, and the value recorded. This generated a null distribution of novel gene rates for aquatic nodes.
    - The empirical one-tailed P-value was the proportion of bootstraps in which the resampled aquatic rate reached or exceeded the observed terrestrial rate.
  - Result: The observed novel gene rates found in terrestrial lineages were significantly higher than in aquatic nodes (Extended Data Fig. 3a).
- Functional Repertoire:
  - Objective: To assess if the GO term composition of terrestrial lineages (derived from novel genes) significantly differs from that of aquatic lineages.
  - Method:
    - Randomly chosen aquatic lineages with the largest taxon sampling (e.g., Actinopterygii, Bivalvia) were included.
    - The GO matrix (the profile of GO terms derived from novel genes) for each lineage was converted into a binary presence/absence matrix.
    - The dissimilarity between terrestrial and aquatic GO term profiles was quantified using Jaccard distance (proportion of non-shared terms).
    - A permutation test was run 10,000 times. In each permutation, the aquatic/terrestrial labels were randomly reshuffled across lineages, two group profiles were rebuilt, and the Jaccard distance between them was recalculated. This generated a null distribution.
    - The empirical P-value was the proportion of permutations where the distance was at least as large as the observed distance.
  - Result: The biological functions in terrestrial nodes were significantly different from those in other nodes (observed Jaccard distance = 0.583) (Extended Data Fig. 3b).

Both analyses were conducted in R using the vegan, car, and ggplot2 packages.
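The label-reshuffling test on GO term profiles can be sketched in Python as follows. The original analysis was run in R; this sketch is only illustrative. The helper names, the toy term sets, the rule that a group profile is the union of its lineages' terms, and the use of "greater than or equal to" when counting extreme permutations are all assumptions.

```python
import random
from typing import Dict, Iterable, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Proportion of terms in the union that are not shared."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def group_profile(lineages: Iterable[str], terms: Dict[str, Set[str]]) -> Set[str]:
    """Assumption: a group's profile is the union of its lineages' terms."""
    profile: Set[str] = set()
    for name in lineages:
        profile |= terms[name]
    return profile


def label_permutation_test(terms: Dict[str, Set[str]], terrestrial: Set[str],
                           n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed distance, empirical P-value) under label reshuffling."""
    names = sorted(terms)
    k = len(terrestrial)
    aquatic = [n for n in names if n not in terrestrial]
    observed = jaccard_distance(group_profile(terrestrial, terms),
                                group_profile(aquatic, terms))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = names[:]
        rng.shuffle(shuffled)
        # The first k shuffled lineages play the role of the terrestrial group.
        d = jaccard_distance(group_profile(shuffled[:k], terms),
                             group_profile(shuffled[k:], terms))
        if d >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy profiles: two "terrestrial" and two "aquatic" lineages with no shared terms.
toy = {"T1": {"a", "b"}, "T2": {"a", "c"}, "A1": {"d", "e"}, "A2": {"d", "f"}}
obs, p_value = label_permutation_test(toy, {"T1", "T2"}, n_perm=2000)
```

With only four lineages there are just six possible splits, so the P-value cannot fall below about one third; a real data set with many lineages gives a much finer null distribution.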
4.2.6. Functional Annotation and Enrichment Analysis
- Representative Species Selection: For each of the 11 terrestrial events, a single representative species was chosen for detailed functional annotation (e.g., Homo sapiens for Tetrapoda, Drosophila melanogaster for Hexapoda).
- Pfam and GO Annotation: eggNOG-mapper v.2 was applied online with default parameters to annotate Pfam domains and GO terms for the HGs of interest. UniProt was used to confirm gene names and PANTHER 19.0 to classify genes by protein class.
- GO Enrichment Analysis:
  - Objective: To find overrepresented GO terms in novel and expanded HGs of terrestrial events.
  - Background: The GO terms hitting all HGs present in the LCA of Bilateria were used as the background. This ensures a consistent and broad evolutionary context for comparison.
  - Method: A Fisher's exact test was performed to compare the number of HGs hitting each GO term between the terrestrial events and the bilaterian background.
  - Correction: P-values for multiple comparisons were corrected using the Benjamini-Hochberg method.
  - Significance: GO terms whose adjusted P-value passed the significance threshold were considered significantly enriched.
- Differential Functional Term Presence (Semi vs. Fully Terrestrial):
  - Objective: To identify biological functions that significantly differentiate semi-terrestrial and fully terrestrial groups after the PCoA analysis.
  - Method: Using binary presence/absence matrices of GO terms or Pfams, a two-tailed Fisher's Exact Test was conducted in R for every feature (term). Functional terms lacking variability (present in all or none) were discarded.
  - Background: The marginal totals across the entire pool of species served as the background for the test.
  - Correction: P-values were corrected using the Benjamini-Hochberg method.
  - Significance: Adjusted P-values passing the threshold indicated significant enrichment, and the terms meeting a stricter cutoff were specifically reported. Terms present in a set fraction of both groups were excluded for biological relevance.
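The enrichment procedure (Fisher's exact test per term followed by Benjamini-Hochberg correction) can be sketched with the Python standard library. The function names and the counts are invented for illustration, and the one-sided "greater" alternative is an assumption chosen because the stated goal is to find overrepresented terms; the text does not specify the sidedness for this step.

```python
from math import comb
from typing import List


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    2x2 table: a = foreground HGs with the term, b = foreground HGs without,
    c = background HGs with the term, d = background HGs without.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts for one GO term: 3 of 4 foreground HGs vs 1 of 4 background HGs.
p_term = fisher_exact_greater(3, 1, 1, 3)
# Invented raw P-values for four terms.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Terms would then be kept or discarded by comparing each adjusted value against the chosen significance threshold.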
4.2.7. PCoA and PCA
- Principal Component Analysis (PCA):
  - Objective: To compare the distribution of GO terms linked to novel and ancestral HGs among semi-terrestrial and fully terrestrial lineages.
  - Method: PCA was conducted using the prcomp function in R. Species GO terms were plotted using the first two principal components (PC1 and PC2).
  - Statistical Analysis: ANOVA and Tukey's honest significant difference (HSD) test were performed on principal component scores to evaluate differences. MANOVA examined the combined effect on PC1 and PC2.
  - Visualization: Ellipses were generated (using normal distribution-based ellipse fitting) for semi-terrestrial and fully terrestrial groups to visualize clustering.
- Principal Coordinates Analysis (PCoA):
  - Objective: To quantify compositional differences in GO term and Pfam presence/absence profiles between semi-terrestrial and fully terrestrial species, especially considering that shared absences might bias Euclidean-based PCA.
  - Method: Pairwise dissimilarities among species were computed using Jaccard distance (which focuses on shared presences) in the vegan R package. PCoA was performed on the Jaccard distance matrix. The axes explain percentages of Jaccard distance variation.
  - Statistical Analysis: A PERMANOVA (adonis2 function) was performed on the Jaccard distances with 10,000 permutations to test for overall group differences. Homogeneity of multivariate dispersion was tested using the betadisper function to ensure PERMANOVA results were not driven by unequal within-group spread.
  - Visualization: Plots were generated using ggplot2, with group ellipses representing 95% concentration regions.
  - Result: PERMANOVA showed significant differences between semi- and fully terrestrial groups for GO terms and Pfam domains, while group dispersions did not differ.

The following figure (Fig. 5 from the original paper) shows the PCoA of GO terms and Pfam domains associated with novel genes in semi- and fully terrestrial species:
The image is the statistical plot from Fig. 5, showing PCoA results based on GO terms (left panel) and Pfam domains (right panel). Each point represents one of the 61 sampled terrestrial species, and colours indicate taxonomic groups. Ellipses show the clustering of semi-terrestrial (orange) and fully terrestrial (green) species; the first and second principal coordinates explain 19.9% and 15.6% of the variation, respectively.
4.2.8. Molecular Clock
Molecular clock analysis was performed using a two-step approach in MCMCTree (part of the PAML package).
- Step 1: Branch Length Estimation (`CODEML`)
  - The previously described concatenated alignment of 943 conserved orthologous genes (generated from BUSCO genes using MAFFT, trimAl, and FASconCAT-G) was used.
  - `CODEML` was used to estimate branch lengths by maximum likelihood. This calculates the gradient and Hessian of the likelihood function at the maximum likelihood estimates.
  - The Empirical+F model (`model = 3`) and an independent rates clock model (`clock = 2`) were applied. The Empirical+F model is a protein substitution model that combines an empirical amino acid exchange matrix with amino acid frequencies taken from the data. An independent rates clock model allows for variable evolutionary rates across different branches of the phylogeny.
- Step 2: Divergence Time Estimation (`MCMCTree`)
  - `MCMCTree` was executed to estimate divergence times.
  - The same independent rates clock model was used.
  - A discrete gamma distribution with 4 categories and a shape parameter α was used to model rate variation among sites.
  - The prior for the substitution rate was determined based on the approximate root age (591.255 Ma), resulting in a gamma distribution with shape parameter α and rate parameter β.
  - The Markov chain Monte Carlo (MCMC) was run for approximately 20 million generations, with the first 100,000 generations discarded as burn-in. Samples were collected every 1,000 generations to obtain 20,000 samples.
  - Six independent MCMC runs were performed to ensure convergence and reliability.
  - Tracer v.1.7.2 was used to assess convergence, with effective sample sizes (ESS) exceeding 200 for all parameters across all runs.
  - The fourth run was selected for final divergence time estimates based on consistency.

The results of the molecular clock analysis are presented in Figure 1, illustrating the temporal windows of terrestrialization.
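The chain length quoted above follows from the burn-in, sampling interval, and sample count. The short sketch below reproduces that arithmetic, assuming the burn-in generations precede the sampled ones (as with MCMCTree's `burnin`, `sampfreq`, and `nsample` settings); the function name is hypothetical.

```python
def mcmc_total_generations(burnin, sample_every, n_samples):
    """Total generations when the burn-in is followed by evenly spaced samples."""
    return burnin + sample_every * n_samples

# Values reported in the text: 100,000 burn-in, one sample per 1,000 generations,
# 20,000 retained samples.
total = mcmc_total_generations(burnin=100_000, sample_every=1_000, n_samples=20_000)
print(f"{total:,} generations")
```

Under that assumption the chain runs for 20.1 million generations, consistent with the "approximately 20 million" stated in the method.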
5. Experimental Setup
5.1. Datasets
The study utilized a comprehensive dataset of 154 sampled genomes (Supplementary Table 1 in the original paper).
- Source: These genomes were compiled from publicly available databases including UniProt, NCBI, Ensembl, and other resources.
- Scale and Characteristics: The dataset included 151 metazoan genomes and 3 unicellular holozoan genomes (serving as outgroups). These represent 21 animal phyla, covering a broad diversity of animals, with a specific focus on species that flank the 11 key terrestrialization events identified in the study.
- Quality Control: The quality of the protein sequences derived from these genomes was assessed using BUSCO v.5.4.7. Genomes with completeness greater than 85% and fragmentation less than 15% were preferred, though some genomes not perfectly meeting these criteria were included if deemed important for the phylogenetic and habitat context.
- Why these datasets were chosen: This extensive and diverse sampling allowed the researchers to:
  - Reconstruct ancestral genomes with high confidence.
  - Identify homology groups across a wide range of evolutionary distances.
  - Cover multiple independent terrestrialization events, providing the necessary comparisons for discerning convergent versus contingent adaptations.
  - Establish a robust molecular timescale for these events.

The paper does not provide a concrete example of a data sample, such as a specific gene sequence or annotation entry, but the data consist of protein sequences from various animal genomes.
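The quality-control preference described above (completeness above 85%, fragmentation below 15%) amounts to a simple filter over BUSCO summary percentages. The sketch below is illustrative only: the record type, species labels, and percentages are hypothetical, and it does not model the paper's manual inclusion of important genomes that miss the thresholds.

```python
from dataclasses import dataclass

@dataclass
class BuscoSummary:
    species: str           # hypothetical label, not a genome from the paper
    complete_pct: float    # percentage of complete BUSCO genes
    fragmented_pct: float  # percentage of fragmented BUSCO genes

def passes_quality(summary, min_complete=85.0, max_fragmented=15.0):
    """True when a genome meets the preferred completeness and fragmentation bounds."""
    return summary.complete_pct > min_complete and summary.fragmented_pct < max_fragmented

genomes = [
    BuscoSummary("species_A", 96.2, 1.8),
    BuscoSummary("species_B", 81.0, 9.5),   # fails on completeness
    BuscoSummary("species_C", 88.4, 16.1),  # fails on fragmentation
]
preferred = [g.species for g in genomes if passes_quality(g)]
```

Genomes that fail the filter would still need case-by-case review, since the study retained some of them for phylogenetic or habitat coverage.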
5.2. Evaluation Metrics
The paper employs a range of statistical and biological metrics to evaluate its findings:
- P-value ($P$):
  - Conceptual Definition: The probability of observing a test statistic as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true. A small P-value (typically < 0.05) indicates that the observed result is unlikely under the null hypothesis, leading to its rejection.
  - Mathematical Formula: There is no single universal formula for the P-value, as it depends on the statistical test being performed. For a test statistic $T$ and observed value $t_{obs}$, the one-tailed form is
    $$P = \Pr(T \geq t_{obs} \mid H_0)$$
    with an analogous expression for two-tailed tests.
  - Symbol Explanation:
    - $\Pr(\cdot)$: Probability.
    - $T$: The test statistic (e.g., F-statistic, t-statistic, chi-squared value, Jaccard distance).
    - $t_{obs}$: The observed value of the test statistic from the data.
    - $H_0$: The null hypothesis.
  - Usage in Paper:
    - Permutation test for novel gene rates: terrestrial rates were significantly higher.
    - Permutation test for functional repertoire: terrestrial GO term composition was significantly different.
    - CAFE5 model selection: three-lambda models fit better than two-lambda models.
    - GO enrichment analysis: adjusted P-values identified significantly enriched GO terms after Benjamini-Hochberg correction.
    - PERMANOVA: significant differences between semi- and fully terrestrial groups.
    - Differential functional term presence: adjusted P-values identified GO/Pfam terms that differ significantly between semi- and fully terrestrial species.
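To make the definition of a P-value concrete, the sketch below estimates a one-sided permutation P-value for a difference in group means. It is a generic label-shuffling test rather than the paper's node-sampling procedure, and the rates are invented for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided permutation P-value for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_diff(pooled) >= observed - 1e-12:
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical gain rates (novel HGs per million years), for illustration only.
terrestrial_rates = [5.1, 6.0, 5.5, 6.2, 5.8]
aquatic_rates = [1.0, 1.2, 0.9, 1.1, 1.3]
p_value = permutation_p_value(terrestrial_rates, aquatic_rates)
```

The returned value is the share of shuffled datasets whose difference is at least as large as the observed one, which is exactly $\Pr(T \geq t_{obs} \mid H_0)$ estimated by simulation.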
- R-squared ($R^2$):
  - Conceptual Definition: In the context of PERMANOVA, $R^2$ represents the proportion of the total variance in the distance matrix that is explained by the grouping variable (e.g., habitat type). A higher $R^2$ indicates that the grouping variable accounts for a larger proportion of the observed differences between samples.
  - Mathematical Formula: For PERMANOVA, it is typically calculated as:
    $$R^2 = \frac{SS_{between}}{SS_{total}}$$
  - Symbol Explanation:
    - $R^2$: The proportion of variance explained.
    - $SS_{between}$: Sum of squares between groups (variation explained by the grouping factor).
    - $SS_{total}$: Total sum of squares (total variation in the distance matrix).
  - Usage in Paper:
    - PERMANOVA for GO terms: $R^2 \approx 0.0995$ (9.95% of GO term profile variance explained by habitat).
    - PERMANOVA for Pfam domains: $R^2 \approx 0.0992$ (9.92% of Pfam profile variance explained by habitat).
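The ratio of between-group to total sum of squares can be computed directly from a pairwise distance matrix. The sketch below follows the standard distance-based partitioning used by PERMANOVA; it is not the `adonis2` implementation, it omits the permutation step that yields the P-value, and the matrix and labels are hypothetical.

```python
def permanova_r2(dist, groups):
    """R^2 = SS_between / SS_total from a square distance matrix and group labels."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Total sum of squares: squared pairwise distances divided by sample count.
    ss_total = sum(dist[i][j] ** 2 for i, j in pairs) / n
    ss_within = 0.0
    for label in set(groups):
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(
            dist[i][j] ** 2 for i in members for j in members if i < j
        ) / len(members)
    ss_between = ss_total - ss_within
    return ss_between / ss_total

# Hypothetical Jaccard distances among four species, two per habitat group.
dist = [
    [0.0, 0.2, 0.8, 0.8],
    [0.2, 0.0, 0.8, 0.8],
    [0.8, 0.8, 0.0, 0.2],
    [0.8, 0.8, 0.2, 0.0],
]
groups = ["semi", "semi", "fully", "fully"]
r2 = permanova_r2(dist, groups)
```

In this toy case the groups are tightly clustered, so $R^2$ is close to 1; the roughly 0.10 values reported in the paper indicate far more overlap between habitat groups.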
- Log-likelihood ($-\ln L$):
  - Conceptual Definition: A measure used in maximum likelihood estimation to assess how well a statistical model fits the observed data. A higher log-likelihood (or a smaller negative log-likelihood, $-\ln L$) indicates a better fit of the model to the data. It is used for model comparison, often through likelihood ratio tests.
  - Mathematical Formula: The likelihood function $L(\theta \mid D)$ represents the probability of observing the data $D$ given the model parameters $\theta$. The log-likelihood is $\ln L(\theta \mid D)$, and $-\ln L$ is simply the negative of this value.
  - Symbol Explanation:
    - $L$: Likelihood function.
    - $\theta$: Model parameters.
    - $D$: Observed data.
  - Usage in Paper: Used in CAFE5 to compare the fit of two-lambda and three-lambda models.
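A likelihood ratio test between two nested models can be sketched from their $-\ln L$ values. The example assumes the models differ by a single parameter (as a three-lambda model does from a two-lambda one) and that the chi-squared approximation with one degree of freedom applies. Neither assumption is confirmed as the paper's exact procedure, and the $-\ln L$ values are hypothetical.

```python
from math import erfc, sqrt

def likelihood_ratio_test_df1(neg_lnl_null, neg_lnl_alt):
    """LRT statistic and chi-squared (1 d.f.) P-value from two -lnL values."""
    # 2 * (lnL_alt - lnL_null), written in terms of negative log-likelihoods.
    stat = max(2.0 * (neg_lnl_null - neg_lnl_alt), 0.0)
    # For one degree of freedom, the chi-squared tail equals erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical fits: the richer model lowers -lnL by 5 units.
stat, p_value = likelihood_ratio_test_df1(neg_lnl_null=1005.0, neg_lnl_alt=1000.0)
```

A small P-value here would favour the richer model. Comparisons involving more than one extra parameter would need the general chi-squared tail instead of the one-degree-of-freedom shortcut.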
- Jaccard Distance:
  - Conceptual Definition: A measure of dissimilarity between two sets. It is calculated as one minus the Jaccard similarity coefficient, which is the size of the intersection divided by the size of the union of the sets. A Jaccard distance of 0 indicates identical sets, while 1 indicates completely disjoint sets. It is particularly suitable for binary presence/absence data.
  - Mathematical Formula: For two sets $A$ and $B$, the Jaccard similarity coefficient $J(A, B)$ is:
    $$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
    The Jaccard distance is then:
    $$d_J(A, B) = 1 - J(A, B)$$
  - Symbol Explanation:
    - $A$, $B$: The two sets being compared (e.g., GO term profiles of two lineages).
    - $|\cdot|$: Cardinality (number of elements) of a set.
    - $\cap$: Intersection of sets (elements common to both).
    - $\cup$: Union of sets (all unique elements in either set).
  - Usage in Paper: Used to quantify dissimilarity between GO term and Pfam presence/absence profiles of terrestrial and aquatic species, and between semi-terrestrial and fully terrestrial species for PCoA.
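The Jaccard distance is straightforward to compute on presence/absence profiles represented as sets. The sketch below uses invented term labels, and treating two empty profiles as identical is a convention chosen here, not something the paper specifies.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty profiles are identical
    return 1.0 - len(a & b) / len(union)

# Hypothetical functional profiles for two lineages.
lineage_1 = {"ion_transport", "signal_transduction", "oxidative_stress"}
lineage_2 = {"ion_transport", "signal_transduction", "light_response"}
distance = jaccard_distance(lineage_1, lineage_2)
```

Because only terms present in at least one profile enter the union, terms absent from both lineages do not affect the result, which is the property that motivated using this distance for PCoA.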
- Effective Sample Size (ESS):
  - Conceptual Definition: A measure used to assess the convergence and quality of samples generated by Markov Chain Monte Carlo (MCMC) algorithms. It estimates the number of independent samples equivalent to the correlated samples generated by the MCMC chain. An ESS value generally greater than 200 (as used in the paper) indicates good mixing and adequate sampling for reliable parameter estimation.
  - Mathematical Formula: While the exact formula is complex and involves autocorrelation functions, conceptually it is:
    $$ESS = \frac{N}{1 + 2 \sum_{k=1}^{\infty} \rho_k}$$
  - Symbol Explanation:
    - $N$: Total number of samples in the MCMC chain.
    - $\rho_k$: Autocorrelation at lag $k$.
  - Usage in Paper: Used to assess the convergence of MCMCTree runs for molecular clock analysis, ensuring ESS values exceeded 200 for all parameters.
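The conceptual ESS formula can be turned into a rough estimator by truncating the autocorrelation sum. The sketch below stops at the first non-positive autocorrelation, which is one common practical choice and is not claimed to match Tracer's estimator; the chains are simulated purely for illustration.

```python
import random

def effective_sample_size(samples):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations), truncated early."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * variance)
        if rho <= 0.0:  # stop once the autocorrelation is no longer positive
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# An alternating chain has negative lag-1 autocorrelation, so the sum is empty.
ess_alternating = effective_sample_size([1.0, -1.0] * 50)

# A strongly autocorrelated chain (simulated AR(1) process) has a much lower ESS.
rng = random.Random(1)
chain = [0.0]
for _ in range(499):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))
ess_correlated = effective_sample_size(chain)
```

The contrast between the two chains shows why a long run can still yield few effectively independent samples, and why thresholds such as ESS > 200 are checked per parameter.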
5.3. Baselines
The paper's methodology involves comparisons against different "baselines" depending on the analysis context, rather than a single baseline model for the entire framework:
- Ancestral States: For analyzing gene gains and losses (novel, novel core, lost HGs), the implicit baseline is the gene content of the nodes immediately ancestral to the terrestrialization events. This allows the researchers to identify genes that emerged or disappeared during the transition.
- Aquatic Nodes: For the permutation tests, randomly selected aquatic nodes (e.g., Actinopterygii, Bivalvia, Cnidaria) served as the null distribution baseline to assess whether the observed novel gene gain rates and functional repertoire of terrestrial lineages were significantly different from what would be expected in non-terrestrial (aquatic) evolution.
- Bilaterian Ancestral Genes: For GO enrichment analysis, the GO terms hitting all HGs present in the LCA of Bilateria were used as the background. This provides a broad, evolutionarily ancient baseline against which to identify overrepresented functions in the terrestrial lineages.
- Two-lambda Model: In CAFE5 analysis, the two-lambda model was used as a baseline to compare against the three-lambda model using likelihood ratio tests, determining which model better fit the gene family evolution data.

These various baselines are representative because they allow the researchers to isolate the genomic and functional changes specifically associated with terrestrialization by comparing them against relevant ancestral or non-terrestrial states, or against alternative statistical models.
6. Results & Analysis
6.1. Core Results Analysis
The study's results comprehensively detail the genomic changes and functional adaptations associated with animal terrestrialization, highlighting patterns of convergence and contingency, and establishing a temporal framework.
- High Gene Turnover Characterizes Terrestrialization:
  - All 11 terrestrial lineages showed significant gene turnover, characterized by both gene gains (novel genes and expansions) and gene reductions (losses and contractions) (Fig. 2). This indicates genome plasticity as animals adapted to new environmental challenges.
  - Novelty (new genes) was particularly high in bdelloid rotifers, nematodes, tetrapods, and land gastropods (the latter primarily in gene expansions).
  - Gene reduction was pervasive, with Nematoda, Tardigrada, and Onychophora exhibiting the largest gene losses.
  - A permutation test confirmed that the rate of novel gene emergence per million years in terrestrial nodes was significantly higher than in aquatic nodes (Extended Data Fig. 3a), validating the importance of gene gains in this transition.
  - Conversely, arachnids and hexapods showed lower levels of plasticity, suggesting gene co-option (repurposing existing genes) might have played a more dominant role in their adaptation.
- Convergent Functions via Gene Gains:
  - Despite distinct patterns of gene gain at the HG level, functional annotation revealed strong convergence in the types of biological functions that emerged.
  - Novel HGs shared by at least 10 terrestrial nodes (118 GO terms and 26 novel core HGs) were involved in osmosis (water transport), metabolism (fatty acids, linked to diet), reproduction, detoxification, sensory reception, and reaction to stimuli (Fig. 3a, b, c).
  - The 55 most specific GO functions included locomotion, membrane ion transport, transporter activity, response to stimulus, neuronal functions, and various metabolic, reproductive, and developmental processes (Fig. 3b).
  - Pfam domains echoed these findings, recovering functions related to osmoregulation (e.g., neurotransmitter-gated ion channel), stimulus/neuronal functions (transmembrane receptor), and detoxification (cytochrome P450) (Fig. 3c).
  - Expanded HGs (genes predating the transition but increasing in copy number) also showed convergence in functions related to detoxification (e.g., cytochrome P450, flavin-containing monooxygenases, glutathione S-transferase), oxidative stress, metabolism, and reception of stimuli (e.g., G-protein-coupled receptor family) (Fig. 4a).
- Gene Reduction Marks Land Adaptation:
  - Gene loss was numerically high in most terrestrial nodes (Fig. 2).
  - Notably, the Dbl-homology domain gene family was lost in 8 out of 11 terrestrial events, and the pleckstrin-homology domain gene family in 7 out of 11. Both are components of Rho GTPase signalling, which is involved in regeneration and wound healing, suggesting a convergent reduction in regenerative capacity as an adaptation to land.
  - Loss of the chlorophyllase protein family indicated dietary shifts, and loss of the Shugoshin C-terminal domain pointed to changes in reproduction.
  - Convergent contractions in HGs included chloride channel protein members (critical for osmoregulation), carbohydrate sulfotransferases (extracellular communication), and melatonin-related receptors (circadian rhythms) (Fig. 4b).
- Semi Versus Fully Terrestrial Lineages:
  - PCoA based on GO terms and Pfam domains showed significant separation between semi-terrestrial and fully terrestrial groups (Fig. 5), indicating distinct functional adaptations.
  - Semi-terrestrial species (e.g., bdelloid rotifers, nematodes, tardigrades) evolved an "expansive and versatile toolkit" emphasizing cuticle remodelling, visual development, and stress response. They showed broad functional convergence in areas like circulatory system development, osmoregulation, nutrient processing, muscle function, energy metabolism, detoxification, and sensory response.
  - Fully terrestrial species (e.g., land gastropods, arachnids, hexapods, tetrapods) displayed a "small and streamlined set" of novel gene functions centered on neuronal development and ion membrane homeostasis. They showed limited convergence among themselves, with most shared adaptations within arthropods (Myriapoda, Armadillidium, Hexapoda, Arachnida) likely stemming from similar ancestral toolkits.
- Unique Adaptations in Terrestrial Events:
  - Beyond convergence, each lineage also exhibited unique adaptations. Examples include stress-response genes in bdelloid rotifers, nervous system and muscle adaptations in clitellates, shell formation and estivation genes in land snails, and cuticle-related genes in nematodes.
  - For arthropods, examples included exoskeleton wax layer synthesis genes and retinol-binding protein genes for vision adaptation. Hexapods showed enrichment in moulting and vision genes.
  - Tetrapods exhibited enriched novel and expanded genes related to immunity functions (e.g., T cell co-stimulation, innate immunity, neutrophil degranulation, mucins), supporting the evolution of specialized skin barriers against terrestrial pathogens.
- Temporal Windows of Terrestrialization:
  - The molecular clock timescale (Fig. 1) identified three major temporal windows for animal land conquest:
    - Middle Cambrian - Middle Ordovician (~534-403 Ma): Coinciding with early land plants. Included nematodes, myriapods, hexapods, and arachnids. Adaptations focused on cuticle formation, exoskeleton maintenance, lipid metabolism, and tolerance of drought, light, and oxidative stress.
    - Late Devonian - Early Carboniferous (~465-263 Ma for clitellates, ~351-338 Ma for tetrapods): A period of episodic flooding and seasonal wetlands. Clitellate annelids and tetrapods adapted. Adaptations included limbs, lungs, and skin barriers (tetrapods), and nervous/muscular system enhancements (clitellates).
    - Cretaceous Period (~181-39 Ma): A greenhouse landscape. Bdelloid rotifers and land gastropods diversified. Adaptations included extreme stress tolerance (rotifers) and shell formation, mucus secretion, and estivation (snails). Gene expansions in ammonium transporters, NADP-dependent oxidoreductases, and G-protein-coupled receptors were convergent in this window.
6.2. Data Presentation (Tables)
The following are the results from Extended Data Table 1 of the original paper:
Novel genes associated with terrestrialisation-linked GOs in human:

| Gene Symbol | Protein Name | Protein Class | Biological Functions |
|---|---|---|---|
| APOA2 | Apolipoprotein A-II | transfer/carrier protein (PC00219) | lipid metabolism |
| IL27 | Interleukin-27 subunit alpha | | immunity and response to stimuli |
| OSM | Oncostatin-M | | |
| XCL1 | Lymphotactin | intercellular signal molecule (PC00207) | |
| XCL2 | Cytokine SCM-1 beta | | |
| CXCL16 | C-X-C motif chemokine 16 | | |
| TNFSF18 | Tumor necrosis factor ligand superfamily member 18 | | |
| FLT3LG | Fms-related tyrosine kinase 3 ligand | | |
| CD1A, CD1B, CD1C, CD1E | T-cell surface glycoprotein | | |
| CD1D | Antigen-presenting glycoprotein CD1d | defense/immunity protein (PC00090) | |
| TMIGD2 | Transmembrane and immunoglobulin domain-containing protein 2 | cell adhesion molecule (PC00069) | |
| PLAUR | Urokinase plasminogen activator surface receptor | transmembrane signal receptor (PC00197) | |
| LYPD3 | Ly6/PLAUR domain-containing protein 3 | | |
| MPIG6B | Megakaryocyte and platelet inhibitory receptor G6b | | blood cell function regulation |
| SPP1 | Osteopontin | intercellular signal molecule (PC00207) | bone regeneration |
| ENAM | Enamelin | structural protein (PC00211) | teeth development |
| GPR152 | Probable G-protein coupled receptor 152 | transmembrane signal receptor (PC00197) | retinal cell-to-cell communication |
| AKAP3, AKAP, KAP5 | A-kinase anchor protein | scaffold/adaptor protein (PC00226) | reproductive strategies |
| DKKL1 | Dickkopf-like protein 1 | membrane traffic protein (PC00150) | |
| ZNF239 | Zinc finger protein 239 | gene-specific transcriptional regulator (PC00264) | |
| TBC1D21 | TBC1 domain family member 21 | protein-binding activity modulator (PC00095) | |
| PPP1R3F | Protein phosphatase 1 regulatory subunit 3F | | neurodevelopment |
| HR | Lysine-specific demethylase | chromatin/chromatin-binding, or- | hair-cycle regulation (suggesting skin barrier) |

Novel genes associated with terrestrialisation-linked GOs in fruit fly:

| Gene Symbol | Protein Name | Protein Class | Biological Functions |
|---|---|---|---|
| Pof | Protein painting of fourth | RNA metabolism protein (PC00031) | reproductive strategies |
| MESR4 | Misexpression suppressor of ras 4, isoform A | gene-specific transcriptional regulator (PC00264) | |
| Ir64a, Ir75d, Ir31a, Ir84a | Ionotropic receptor | transmembrane signal receptor (PC00197) | sensory activity (response to stimuli) |
| Gr39b | Putative gustatory receptor 39b | | |
6.3. Ablation Studies / Parameter Analysis
While the paper does not present explicit ablation studies in the traditional sense (removing a component of a model to assess its impact), it does include several analyses that function as parameter choices or validation of methodologies:
- CAFE5 Model Comparison: The choice between two-lambda and three-lambda models in CAFE5 for inferring gene expansions and contractions can be considered a parameter analysis. The researchers ran both models ten times, compared their fit using likelihood ratio tests, and performed simulation tests to check model stability (Supplementary Table 20). This rigorous process ensured that the most appropriate model (e.g., three-lambda for Lophotrochozoa and Ecdysozoa, two-lambda for Deuterostomia due to stability) was chosen for each phylogenetic group, thereby validating the inference of expanded and contracted HGs.
- Permutation Tests: The permutation tests (Extended Data Fig. 3) for novel HG gain rates and functional repertoire differences serve to statistically validate the significance of the observed patterns against random chance. By showing that terrestrial lineages have significantly higher novel gene rates and distinct functional profiles compared to aquatic ones, these tests confirm that the observed genomic changes are not merely stochastic but linked to the terrestrialization process.
- BLASTp Validation of Novel Core HGs: The validation of novel core HGs using BLASTp against RefSeq (excluding ingroup taxa) acts as a crucial check on the methodology for identifying truly novel genes. The results (very weak hits with high e-values and low identity) confirm that the identified novel core HGs are indeed specific to the terrestrial lineages, reinforcing the reliability of these findings.
- PCoA and PERMANOVA for Habitat Types: The PCoA and subsequent PERMANOVA analyses to differentiate semi-terrestrial and fully terrestrial groups (Fig. 5) rigorously test the hypothesis that habitat dependency correlates with distinct genomic adaptation patterns. The significant P-values and R-squared values support the distinction in functional profiles, validating this categorization.

These analyses, though not explicitly termed "ablation studies," systematically evaluate the robustness of key methodological choices and the statistical significance of the core findings.
7. Conclusion & Reflections
7.1. Conclusion Summary
This study provides a comprehensive comparative genomic analysis of 11 independent animal terrestrialization events, leveraging an InterEvo framework to explore convergent and contingent evolutionary paths. The key findings demonstrate that terrestrialization is universally characterized by extensive gene turnover, involving both gene gains (novel genes, expansions) and gene reductions (contractions, losses). Crucially, despite lineage-specific gene changes, convergent functional adaptations emerged recurrently, particularly in processes related to osmoregulation, stress response, immunity, sensory reception, metabolism, and reproduction.
The paper highlights a distinction between semi-terrestrial and fully terrestrial lineages: semi-terrestrial animals exhibit broader functional convergence with a versatile toolkit for environmental flexibility, whereas fully terrestrial animals show more contingent adaptations with a streamlined set of functions, indicating diverse genomic solutions for permanent land colonization.
Furthermore, the molecular timescale analysis identifies three major temporal windows for animal terrestrialization (Middle Cambrian-Ordovician, Late Devonian-Early Carboniferous, and Cretaceous), each coinciding with distinct paleoecological contexts. The study concludes that genomic adaptations to terrestrial life are a complex interplay of predictable (convergent) molecular responses to common environmental challenges and unique (contingent) solutions shaped by individual evolutionary histories.
7.2. Limitations & Future Work
The authors acknowledge several limitations:
- Habitat Classification Ambiguity: The classification of terrestriality (e.g., semi-terrestrial vs. fully terrestrial) lacks a universal consensus, and various definitions exist (e.g., cryptic forms, poikilohydric, homoiohydric). This could influence interpretations, and more comparisons using alternative classifications are needed.
- Challenges in Annotating Lost and Contracted Genes: Many lost and contracted HGs are difficult to functionally annotate because they are absent or poorly characterized in common model organisms. Functional analysis often relies on distant homologs from humans or fruit flies, which may not accurately reflect functions due to sequence divergence, and becomes virtually impossible if HGs are lost in traditional models.
- Inference of Gene Duplication Events: CAFE5 infers gene expansions based on copy number changes, not explicit gene trees. This means it does not pinpoint whether duplications occurred precisely at the terrestrial nodes or independently within descendant lineages. While observed expansions remain robust, more precise methods are needed.
- Phylogenetic Position Incongruence: Certain phylogenetic relationships, such as the position of Chelicerata (specifically Xiphosura relative to Arachnida), remain debated. Such incongruences can complicate interpretations of terrestrial transitions and ancestral states.
- Limited Taxon Sampling: For some lineages (e.g., tardigrades, onychophorans, woodlice), only one or two genomes were available. This limited sampling may lead to HG numbers that are not fully representative of the clade's entire gene content.

Based on these limitations, the authors suggest future research directions:
- Increased Taxon Sampling: Inclusion of more genomes for underrepresented lineages to improve representativeness.
- Advanced Annotation Tools: Developing sophisticated annotation tools for lost and contracted genes, potentially using machine learning approaches (e.g., language models), to overcome challenges posed by sequence divergence and limited homologs.
- Gene Tree-Based Expansion Inference: Improving gene family expansion inference by integrating gene tree-based approaches to precisely pinpoint duplication events.
7.3. Personal Insights & Critique
This paper offers a compelling and comprehensive analysis that significantly advances our understanding of animal terrestrialization. The InterEvo framework is a powerful conceptual and methodological innovation for studying convergent evolution on a genomic scale, moving beyond simple gene counts to shared biological functions. The sheer breadth of 154 genomes across 21 phyla makes the findings highly robust and generalizable across the animal kingdom.
The distinction between semi-terrestrial and fully terrestrial adaptations, while acknowledged by the authors as having classification ambiguities, provides a valuable layer of nuance to the convergence-contingency debate. It suggests that the "predictability" of evolutionary adaptation might vary depending on the degree of environmental independence achieved. Semi-terrestrial forms, constantly buffering between aquatic and terrestrial challenges, may be under stronger, more uniform selective pressures leading to broader functional convergence. Fully terrestrial forms, having overcome initial hurdles, might then explore more diverse and contingent genomic pathways as they specialize within drier niches.
The integration of molecular clock dates with paleoecological contexts is particularly insightful, transforming the genomic data into a narrative of co-evolution with Earth's changing environments and the rise of land plants. This highlights the profound interconnectedness of biological and geological history.
One area for potential critique, as the authors touched upon, lies in the functional annotation of novel and lost genes. The reliance on existing databases, often biased towards well-studied model organisms, means that truly novel or highly divergent functions might still be mis- or under-annotated. AI-driven methods, as suggested, could be transformative here. Additionally, while the paper meticulously analyzes gene turnover, a deeper dive into the regulatory changes (e.g., cis-regulatory elements, microRNAs) associated with these transitions could offer another layer of insight into how gene co-option and expression changes contribute to terrestrial adaptation, especially in lineages showing lower gene plasticity.
The paper's conclusions, particularly the statement that adaptation to life on land is predictable in large part, have significant implications. They suggest that fundamental biophysical and physiological constraints of terrestrial environments impose strong selective pressures that funnel evolution towards similar molecular solutions across vastly different lineages. This principle could potentially be applied to understanding other major evolutionary transitions (e.g., flight, deep-sea adaptation, endothermy) to discern generalizable molecular principles.