
Analysis of effects to scientific impact indicators based on the coevolution of coauthorship and citation networks

Published: 2024-04-19

TL;DR Summary

This study establishes a model for coauthorship and citation networks to explore their effects on scientific impact indicators. It finds that increasing the number of references or reducing paper lifespan boosts the journal impact factor and h-index, highlighting the dynamic nature of these indicators.

Abstract

While computer modeling and simulation are crucial for understanding scientometrics, their practical use in literature remains somewhat limited. In this study, we establish a joint coauthorship and citation network using preferential attachment. As papers get published, we update the coauthorship network based on each paper's author list, representing the collaborative team behind it. This team is formed considering the number of collaborations each author has, and we introduce new authors at a fixed probability, expanding the coauthorship network. Simultaneously, as each paper cites a specific number of references, we add an equivalent number of citations to the citation network upon publication. The likelihood of a paper being cited depends on its existing citations, fitness value, and age. Then we calculate the journal impact factor and h-index, using them as examples of scientific impact indicators. After thorough validation, we conduct case studies to analyze the impact of different parameters on the journal impact factor and h-index. The findings reveal that increasing the reference number N or reducing the paper's lifetime θ significantly boosts the journal impact factor and average h-index. On the other hand, enlarging the team size m without introducing new authors or decreasing the probability of newcomers p notably increases the average h-index. In conclusion, it is evident that various parameters influence scientific impact indicators, and their interpretation can be manipulated by authors. Thus, exploring the impact of these parameters and continually refining scientific impact indicators are essential. The modeling and simulation method serves as a powerful tool in this ongoing process, and the model can be easily extended to include other scientific impact indicators and scenarios.

In-depth Reading

1. Bibliographic Information

1.1. Title

The title of the paper is "Analysis of effects to scientific impact indicators based on the coevolution of coauthorship and citation networks". It centers on investigating how scientific impact indicators are affected by the simultaneous development and interaction of coauthorship and citation patterns among researchers and papers.

1.2. Authors

The sole author of this paper is Haobai Xue, affiliated with the Shenzhen Science & Technology Library/University Town Library of Shenzhen, China. This indicates a background potentially focused on library science, information science, or scientometrics, which aligns with the paper's subject matter.

1.3. Journal/Conference

The paper was published on arXiv, a preprint server, under the identifier 2404.12765, on 19 April 2024. As a preprint, it is publicly available but has not necessarily undergone formal peer review by a journal or conference at the time of this analysis. arXiv is a reputable platform for rapidly disseminating research, especially in fields like physics, mathematics, computer science, and quantitative finance.

1.4. Publication Year

The paper was published in 2024.

1.5. Abstract

The paper addresses the limited practical use of computer modeling and simulation in scientometrics despite their crucial role. It proposes a joint model for coauthorship and citation networks using preferential attachment. As papers are published, the coauthorship network is updated based on author lists, forming collaborative teams. Team formation considers existing collaborations, and new authors are introduced with a fixed probability. Simultaneously, the citation network is updated as papers cite references, with citation likelihood depending on existing citations, a fitness value, and age. The study then calculates journal impact factor (JIF) and h-index as examples of scientific impact indicators. After validation against empirical data, case studies analyze how various parameters affect these indicators. Key findings show that increasing the number of references ($N$) or reducing a paper's lifetime ($\theta$) significantly boosts JIF and average h-index. Conversely, enlarging team size ($m$) without introducing new authors or decreasing the probability of newcomers ($p$) notably increases the average h-index. The paper concludes that various parameters influence scientific impact indicators, suggesting potential for manipulation and the need for continuous refinement. It highlights modeling and simulation as powerful tools for this ongoing process, with the model being easily extensible.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the limited practical application of mathematical modeling and computer simulations in bibliometrics (also known as scientometrics), despite their significant potential. Bibliometrics is the quantitative analysis of scientific literature to understand research trends, impact, and collaboration patterns. While extensive empirical research has uncovered numerous "laws" governing scientific publication and citation behaviors, the underlying mechanisms are often not fully elucidated, and the field still grapples with challenges like biases in real-world data and the difficulty of exploring extreme scenarios.

This problem is important because understanding the dynamics of scientific production and its impact is crucial for informed decision-making in science policy, funding allocation, and evaluating research output. Traditional empirical studies, while valuable, can be constrained by data limitations and the inability to isolate specific factors. Modeling and simulation offer a controlled environment to overcome these issues, allowing researchers to:

  1. Elucidate underlying mechanisms: By replicating microscopic behaviors of researchers and literature, these methods can reveal the causes of macroscopic phenomena.

  2. Circumvent biases and errors: Simulated environments can be free from the inherent biases and errors present in real-world databases.

  3. Conduct "thought experiments": They enable the exploration of extreme or hypothetical scenarios that are difficult or impossible to study with real data, providing insights for policy design.

  4. Generate simulated data: This allows for direct comparison with empirical results, fostering a deeper understanding.

    The paper's entry point is to establish a comprehensive coevolution model that simultaneously simulates coauthorship and citation networks. This joint modeling approach is innovative because prior research often treated these networks separately, or existing coevolution models had limitations (e.g., fixed productivity per author, or too many complex new concepts). By focusing on the interplay between author collaboration and paper citation, the paper seeks to provide a more holistic understanding of how scientific impact indicators are shaped.

2.2. Main Contributions / Findings

The paper makes several primary contributions and reaches key conclusions:

  • Establishment of a Joint Coevolution Model: The paper successfully develops a mathematical model that simulates the coevolution of coauthorship and citation networks. This model incorporates key mechanisms such as preferential attachment, paper fitness, paper aging, and realistic team assembly processes, validated against the APS dataset. This provides a robust framework for studying scientometric phenomena.
  • Validation of Model Reliability: The model's ability to replicate empirical characteristics, including the growth of papers and authors, paper team size distribution, researcher productivity (Lotka's law), collaborator number distribution, citation number distribution, and the temporal dynamics of Journal Impact Factor (JIF) and h-index, demonstrates that modeling and simulation are reliable tools for this field.
  • Parametric Analysis of Scientific Impact Indicators: The study conducts detailed case studies to analyze the impact of various parameters on JIF and h-index. These parameters include:
    • Paper lifetime ($\theta$)
    • Reference number ($N$)
    • Team size ($m$)
    • Probability of newcomers ($p$)
  • Key Findings on Parameter Influence:
    • Boosting Impact: Increasing the reference number ($N$) or decreasing the paper's lifetime ($\theta$) significantly boosts both the journal impact factor and the average h-index. A shorter paper lifetime means citations are concentrated in recent papers, which are the basis for JIF. More references lead to more citations overall.
    • Inflating h-index: Enlarging team size ($m$) while keeping the number of new authors per paper ($k$) constant (implying a decrease in the probability of newcomers $p$) notably increases the average h-index. This is because incumbents collaborate more frequently, inflating their productivity and h-index.
    • Decreasing h-index: Enlarging team size ($m$) while keeping the probability of newcomers ($p$) constant, or increasing $p$ while keeping $m$ constant, tends to decrease the average h-index. This is due to more researchers being generated, diluting the impact per author or favoring newer, lower h-index researchers.
  • Implication for Indicator Interpretation: The findings highlight that scientific impact indicators (like JIF and h-index) can be significantly influenced by various underlying parameters. This implies inherent weaknesses or potential for manipulation by authors and journals, suggesting that these indicators may not always reliably assess the "true quality" of a paper or author.
  • Extensibility of the Model: The paper concludes by emphasizing the model's versatility, stating it can be easily extended to include other scientific impact indicators and scenarios, making it a powerful tool for developing improved indicators and predicting future scientific landscapes.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

  • Bibliometrics / Scientometrics: These terms are often used interchangeably. Bibliometrics refers to the statistical analysis of written publications, such as books or articles. Scientometrics is a broader term that encompasses the quantitative study of science, technology, and innovation, including bibliometrics, but also extends to patents, funding, and other aspects. Both fields use quantitative methods to analyze scientific literature and identify trends, patterns, and "laws" governing research impact.

  • Coauthorship Network: In a coauthorship network, individual researchers or authors are represented as nodes (or vertices), and a link (or edge) exists between two authors if they have coauthored at least one paper together. These links are typically undirected, meaning if A coauthors with B, B also coauthors with A. The strength of a link can sometimes represent the number of coauthored papers. This network illustrates collaborative relationships within the scientific community.

  • Citation Network: In a citation network, scientific papers or articles are represented as nodes. A directed link (or edge) goes from paper A to paper B if paper A cites paper B. This network shows the flow of intellectual influence and knowledge transfer, indicating how ideas build upon previous work.

  • Coevolution: This term, borrowed from biology, refers to the reciprocal evolutionary changes in two or more interacting species or, in this context, two or more interacting systems or networks. In this paper, coevolution refers to the simultaneous and interdependent development of the coauthorship network (how authors collaborate) and the citation network (how papers are cited and influence each other). Changes in one network can affect the other, and vice versa.

  • Preferential Attachment / Matthew Effect / Cumulative Advantage: This is a fundamental mechanism in network growth. It posits that "the rich get richer" or "success breeds more success." In the context of networks:

    • Preferential Attachment: When a new node (e.g., a new paper or author) joins a network, it is more likely to connect to existing nodes that already have a high number of connections (i.e., high degree).
    • Matthew Effect: A sociological term, often used in scientometrics, referring to the phenomenon where prominent or highly cited researchers/papers tend to receive even more recognition and citations than lesser-known ones.
    • Cumulative Advantage: This describes the process where initial success or advantage accumulates over time, leading to disproportionate growth. These three terms describe essentially the same phenomenon: existing popularity or connectivity attracts further connections.
  • Fitness Models (in Citation Networks): While preferential attachment explains much of citation dynamics, it often assumes uniform quality. Fitness models introduce a hidden intrinsic quality or "fitness" parameter for each paper (or author). This parameter quantifies a paper's inherent ability to attract citations, independent of its age or existing citations. A high-fitness paper, even if new, can rapidly accumulate citations and surpass older, more established papers. This adds realism by accounting for varying quality.

  • Paper Obsolescence / Aging: This refers to the phenomenon where the relevance or novelty of a scientific paper diminishes over time. Older papers, unless they are foundational works, tend to be cited less frequently than more recent ones. This is often modeled as a decay term, where the probability of a paper being cited decreases as its age increases.

  • Journal Impact Factor (JIF): A metric used to evaluate the relative importance of a scientific journal. It is calculated by dividing the number of citations received by articles published in that journal during a specified period (typically the previous two years) by the total number of "citable items" published in the journal during the same two-year period. A higher JIF is generally perceived to indicate a more influential journal.

  • h-index: A metric used to quantify the research output and impact of an individual scholar, department, or journal. An author has an h-index of $h$ if $h$ of their papers have at least $h$ citations each, and the remaining papers have fewer than $h$ citations. For example, an h-index of 10 means the author has 10 papers with at least 10 citations each. It balances productivity (number of papers) with impact (number of citations per paper).

3.2. Previous Works

The paper provides a comprehensive literature review that touches upon several foundational and related works:

  • Early Citation Models (de Solla Price, 1976): Price's model was one of the first to formally express preferential attachment in citation networks, positing that the probability of citing a paper is proportional to its existing citations. It successfully replicated the fat-tailed distribution of citations (meaning a few papers receive many citations, while most receive few).
  • Network Growth and Preferential Attachment (Barabási & Albert, 1999; Barabási et al., 2002): Barabási's work formalized the mechanisms of growth (new nodes are added) and preferential attachment in general complex networks. He later applied these concepts to coauthorship networks, proposing models that captured their time evolution and empirical measurements, showing how new authors and internal links are incorporated.
  • Fitness Models (Bianconi & Barabási, 2001): To address the limitations of Price's model (uniform quality, inability for new papers to surpass old ones), fitness models were introduced. These models assign an intrinsic "fitness" to each paper, allowing new, high-quality papers to gain citations rapidly, aligning more with real-world observations.
  • Paper Aging/Obsolescence: Research on aging scientific literature dates back to 1943 (Gosnell). Later models (Medo et al., Eom & Fortunato, Wang et al.) incorporated a negative exponential or log-normal decay term to represent the diminishing novelty and citation probability of older papers.
  • Minimal Citation Model (Wang et al., 2013): This model combines growth, preferential attachment, fitness, and aging to capture the time evolution of citations for a paper, forming the basis for the citation dynamics used in the current study.
  • Coauthorship Network Structure (Newman, 2001, 2004): Newman conducted extensive empirical studies on the structures and statistical properties of coauthorship networks using real bibliographic databases. His work primarily focused on static network properties. Tomassini (2007) explored the formation and temporal evolution of these networks.
  • Team Assembly Mechanisms (Guimera et al., 2005): Guimera et al. proposed a model for the self-assembly of creative teams based on parameters like team size ($m$), fraction of newcomers ($p$), and tendency of incumbents to repeat collaborations ($q$). This work is crucial as the current paper replaces Barabási's abstract parameters with more explicit ones derived from team assembly.
  • Coevolution Models (Börner, 2004; Xie et al., 2017):
    • TARL Model (Börner): The TARL (topics, aging, and recursive linking) model was an early attempt at coevolution. It modeled authors citing randomly selected papers, incorporating Matthew effect for citations. However, it assumed fixed paper production per author, failing to reproduce fat-tailed coauthor distributions.
    • Xie et al. Model: This model introduced concepts like concentric circles, leaders, and influential zones to model coevolution, successfully reproducing fat-tailed distributions for citations and coauthors. However, it was noted for introducing a large number of new concepts and parameters, making it less common.
  • Impact Factor Models (Garfield, 1972; Zhou et al., 2019, 2020):
    • Garfield: First proposed the Journal Impact Factor (JIF) as average citations per published item.
    • Zhou et al.: Developed citation models to investigate the impact of factors like review cycles, reference numbers, and reference age distribution on JIF. Later expanded to include submission models to simulate JIF dynamics across multiple journals.
  • h-index Models (Hirsch, 2005; Guns & Rousseau, 2009; Ionescu & Chopard, 2013; Medo & Cimini, 2016):
    • Hirsch: Introduced the h-index as a composite metric of productivity and impact.
    • Guns & Rousseau: Simulated h-index growth, finding it mostly linear over time.
    • Ionescu & Chopard: Developed agent-based models for individual and group h-index, incorporating Lotka's law for productivity.
    • Medo & Cimini: Compared scientific impact indicators using a citation model, confirming h-index captures combined ability and productivity.

3.3. Technological Evolution

The field has evolved from studying static networks to dynamic, growing networks, then to incorporating intrinsic properties (fitness, aging) and finally to attempting coevolutionary models.

  • Static Network Analysis (e.g., Newman's early work): Focused on structural properties of existing networks.
  • Dynamic Network Growth Models (e.g., Barabási & Albert): Introduced growth and preferential attachment to explain how networks evolve over time.
  • Enriched Node Properties (e.g., Bianconi & Barabási's fitness, aging models): Added more realistic attributes to nodes (papers, authors) to better explain empirical distributions.
  • Coevolutionary Models (e.g., Börner, Xie et al.): Attempted to capture the interdependencies between different types of networks (e.g., coauthorship and citation).

3.4. Differentiation Analysis

Compared to previous coevolution models, this paper's core innovations and differences are:

  • More Transparent Coauthorship Mechanism: Instead of abstract parameters (like $a$ and $b$ in Barabási's coauthorship model), this study adopts parameters from team assembly mechanisms (inspired by Guimera et al.), specifically team size ($m$) and probability of newcomers ($p$). This makes the connections between paper publication, author collaboration, and network evolution more explicit and interpretable.
  • Integrated Q-factor and Paper Quality: The model directly links individual author abilities (the Q-factor from Sinatra et al.) to paper quality ($\eta_i$), which then feeds into the citation model as paper fitness. This provides a clear, mechanistic link between author talent and paper impact.
  • Validation and Parametric Studies: While previous models aimed to reproduce empirical distributions, this paper rigorously validates its integrated model against the APS dataset and then uses it specifically to conduct systematic parametric studies on Journal Impact Factor and h-index. This focus on understanding how specific parameters influence these scientific impact indicators is a direct practical application of the simulation.
  • Simplicity and Extensibility: Unlike Xie et al.'s model which introduced many new concepts, this model combines established mechanisms (preferential attachment, fitness, aging) with a more explicit team assembly process. This makes it relatively simpler and, as the authors claim, easily extensible to other indicators and scenarios.
  • Joint Consideration of Manipulation: The paper explicitly frames its findings around the idea that scientific impact indicators can be influenced or "manipulated" by certain choices (e.g., reference numbers, team sizes), offering a critical perspective on their interpretation.

4. Methodology

The paper proposes a coevolutionary model for coauthorship and citation networks, built upon established mechanisms like preferential attachment, fitness, and aging. The model is formulated to simulate the publication process, author collaboration, and paper citation dynamics over time, using the American Physical Society (APS) dataset for validation.

4.1. Principles

The core idea is to simulate the microscopic behaviors of researchers and papers to understand the macroscopic properties of scientometric indicators. The model operates on a principle of preferential attachment, where success (more citations, more collaborations) leads to more success. It also incorporates heterogeneity among papers (via fitness derived from author abilities) and temporal dynamics (via paper aging). The simulation proceeds step-by-step, mimicking the publication of papers, the formation of author teams, and the subsequent citation of these papers.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. APS Database

The model relies on and is validated against the American Physical Society (APS) dataset. This dataset includes:

  • Citing article pairs: Information on which paper cites which, used for constructing citation networks.

  • Article metadata: Fundamental details such as doi (Digital Object Identifier), authors, and publication dates for all APS papers, used for constructing coauthorship networks.

    For consistency, the study only considers citation pairs where both citing and cited papers are within the article metadata subset. The dataset covers 129 years (1893 to 2021). For the simulation, a time length of $T = 13$ years is chosen, with each simulated year corresponding to approximately 10 years of empirical data. The entire APS dataset is treated as a single "unified virtual journal" with 12 issues per year, simplifying the simulation to one journal.

4.2.2. Growth of Papers and Authors

The simulation models the growth of papers and authors based on empirical trends from the APS dataset.

  • Paper Growth: The empirical cumulative paper number exhibits exponential growth. An exponential growth model $P_t = \alpha \exp(\beta t)$ is fitted to the data, yielding an estimated annual growth rate $\beta = 6.36\%$. In the simulation, the first year starts with $N_1 = 10$ papers per issue. Each subsequent year, $N_t$ increases by 1 paper per issue.

    • This means Year 1: 10 papers/issue * 12 issues = 120 papers.
    • Year 2: 11 papers/issue * 12 issues = 132 papers.
    • Year 13: 22 papers/issue * 12 issues = 264 papers.
    • This setup results in an annual paper growth rate of 6.68%, which closely matches the empirical $\beta$.
    • The total number of papers modeled in the simulation is $P = 2496$.
  • Author Growth: The cumulative author number also shows exponential growth and is linearly correlated with the cumulative paper number.

    • Image 8 (from the original paper) shows this linear relationship, with a fitting equation $y = 0.679x$. This implies that, on average, for each new paper published, approximately $k = 0.679$ new authors are added to the existing author pool.
    • The paper then links this empirical observation to the paper team assembly mechanism. Each paper involves $m$ authors. For each author slot in a team, there is a probability $p$ that the author is a newcomer (not previously in the system) and a probability $1 - p$ that they are an incumbent (already in the system).
    • The expected number of newcomers for each paper is given by: $k = mp$ Where:
      • $k$: The average number of new authors added per paper (empirically found to be 0.679).
      • $m$: The average team size for papers (empirically found to be 3.54 for APS datasets).
      • $p$: The probability of selecting a newcomer.
    • Using the empirical values, $p$ can be calculated as $p = k/m = 0.679/3.54 \approx 0.192$. This means roughly 19.2% of author slots are filled by new authors (a small sketch of this setup follows this list).
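
To make the bookkeeping above concrete, here is a minimal Python sketch (not from the paper; variable names are illustrative) that reproduces the paper counts and the newcomer probability quoted in this subsection.

```python
# Illustrative sketch of the growth setup described above (not the author's code).

ISSUES_PER_YEAR = 12
T = 13                                   # simulated years
k = 0.679                                # average new authors per paper (empirical slope)
m = 3.54                                 # average team size for APS papers
p = k / m                                # probability that an author slot is a newcomer
print(f"newcomer probability p = {p:.3f}")            # ~0.192

# Papers per issue start at 10 in year 1 and grow by 1 per issue each year.
papers_per_year = [(10 + t) * ISSUES_PER_YEAR for t in range(T)]
print(papers_per_year[0], papers_per_year[-1])        # 120 papers in year 1, 264 in year 13
print(sum(papers_per_year))                           # 2496 papers in total
```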

4.2.3. Paper Team Assembly

A paper team is the group of researchers who coauthor a paper. The model simulates the formation of these teams.

  • Team Size Distribution: Empirical data shows that average paper team size increases over time and its distribution is fat-tailed. The simulation generates team size distributions for each of the 13 simulated years by using the empirical distributions from corresponding 10-year intervals of the APS dataset.

    • Image 9 (from the original paper) compares the simulated and empirical annual average team size increase and the overall team size distribution, showing a close alignment. The slight discrepancy in the overall distribution (more small teams in simulation) is attributed to the different paper growth rates between simulation and empirical data (empirical data is more influenced by later intervals with larger team sizes).
  • Author Selection Mechanism: Once a team size ($m$) is determined for a new paper, the $m$ authors are selected. For each of the $m$ author slots:

    • With probability $p$ (the probability of selecting newcomers, calculated as 0.192 in Section 2.2), a new author is introduced into the system. This new author is added to the list of incumbents (existing authors) for future papers.
    • With probability $1 - p$, an incumbent author (an author already in the system) is selected. The selection of an incumbent follows a preferential attachment rule: authors with more previous collaborations are more likely to be selected.
    • The probability $\pi(k)$ of selecting an incumbent with connectivity $k$ is given by: $\pi(k) = (1 - p) \frac{k}{\sum_{i \in A_t} k_i}$ Where:
      • $\pi(k)$: The probability of selecting an incumbent with connectivity $k$.
      • $p$: The probability of selecting newcomers.
      • $k$: The connectivity of a specific incumbent author, representing their accumulated number of collaborations.
      • $A_t$: The list of all incumbent authors (authors already in the system) at time $t$.
      • $\sum_{i \in A_t} k_i$: The sum of connectivity (accumulated collaborations) of all incumbent authors at time $t$.
    • Initial Connectivity: For authors with no prior collaborations, an initial connectivity $k_0 = 1$ is assigned. This ensures that even new incumbents have a finite, non-zero probability of being selected for their first collaboration, allowing them to enter the preferential attachment process.
    • Repeated Collaborations: The connectivity $k$ here refers to the accumulated number of collaborations, not just the number of distinct collaborators. This implicitly accounts for the tendency of authors to repeat previous collaborations, a factor ($q$) mentioned in related work. A sketch of this selection step follows this list.
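
Below is a hedged Python sketch of this selection step, assuming a simple dictionary that maps author ids to accumulated collaboration counts; the function name and the decision not to filter duplicate picks within one team are illustrative choices, not details from the paper.

```python
# Hedged sketch of the team-assembly step: newcomers enter with probability p,
# incumbents are drawn preferentially by accumulated collaborations (with k0 = 1
# for authors who have none yet). Duplicate picks within a team are not filtered
# here for brevity.

import random

K0 = 1.0  # initial connectivity assigned to authors with no prior collaborations

def assemble_team(m, p, collab_counts, next_author_id):
    """Return (team, next_author_id); collab_counts maps author id -> collaborations."""
    team = []
    for _ in range(m):
        if not collab_counts or random.random() < p:
            team.append(next_author_id)          # newcomer joins the system
            next_author_id += 1
        else:
            authors = list(collab_counts)
            weights = [max(collab_counts[a], K0) for a in authors]
            team.append(random.choices(authors, weights=weights, k=1)[0])
    return team, next_author_id
```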

4.2.4. Author Ability and Paper Quality

The model incorporates author ability and paper quality to add heterogeneity to the system, influencing how papers attract citations.

  • Author Q-factor: Each new author entering the system is assigned a Q-factor. This Q-factor is a hidden intrinsic parameter that represents an author's ability to produce impactful work, independent of their productivity.

    • The Q-factor is assumed to follow a log-normal distribution with parameters $\mu = 0.93$ and $\sigma = 0.46$, consistent with previous research based on APS datasets.
    • Image 10 (from the original paper) shows the distribution of author ability (Q-factor).
  • Paper Quality $\eta$: The quality (or fitness) of a paper $i$, denoted as $\eta_i$, is determined by the Q-factors of its paper team members.

    • The paper's quality is primarily driven by the most talented author in the team, with some randomness.
    • The equation for paper quality $\eta_i$ is: $\eta_i = \delta \left( \max_{j \in a_i} Q_j \right)$ Where:
      • $\eta_i$: The quality or fitness value of paper $i$.
      • $\max_{j \in a_i} Q_j$: The maximum Q-factor among all authors $j$ in the paper team $a_i$. This implies that the paper's potential impact is often capped by its most skilled contributor.
      • $\delta$: A multiplicative noise term uniformly distributed in $[1 - \delta^*, 1 + \delta^*]$. This introduces additional randomness to the paper creation process, acknowledging that even with highly talented authors, not every paper will be equally successful. The value of $\delta^*$ is not explicitly given in the text but is implied to be a constant (a sketch of this step follows this list).
    • Image 10 (from the original paper) also shows the distribution of paper quality, which is fitted with a log-normal distribution.
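
A small sketch of this step is given below. The log-normal parameters follow the text, while the noise half-width $\delta^*$ is an assumed placeholder, since the paper does not report its value.

```python
# Hedged sketch of author ability and paper quality. The log-normal parameters
# follow the text (mu = 0.93, sigma = 0.46); DELTA_STAR is an assumed value,
# since the paper does not state it.

import random

MU, SIGMA = 0.93, 0.46
DELTA_STAR = 0.2   # assumption: noise half-width, not given in the text

def draw_q_factor():
    """Q-factor of a newly created author."""
    return random.lognormvariate(MU, SIGMA)

def paper_quality(team_q_factors):
    """eta_i = delta * max_j Q_j, with delta uniform in [1 - DELTA_STAR, 1 + DELTA_STAR]."""
    noise = random.uniform(1 - DELTA_STAR, 1 + DELTA_STAR)
    return noise * max(team_q_factors)
```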

4.2.5. Coauthorship Network Construction

  • An adjacency matrix is used to record collaborations. For every pair of authors in a paper team, the corresponding element in the adjacency matrix is incremented by one. Thus, $A_{i,j}$ represents the number of collaborations between author $i$ and author $j$ (see the sketch after this list).
  • The coauthorship network (or collaborators' network) is then derived from this adjacency matrix by replacing non-zero elements with 1 (indicating a collaboration exists) and zero elements with 0.
  • The incumbents' list ($A_t$) tracks not only author IDs but also their productivity (total number of authored papers).
  • As papers are continuously added, both the incumbents' list and the coauthorship network evolve.
  • Validation: After all $P = 2496$ papers are incorporated, the model's productivity distribution (Lotka's law) and collaborator number distribution are validated against APS empirical data.
    • Image 11 (from the original paper) shows these distributions, demonstrating a strong match and fat-tailed patterns, validating the coauthorship network model.
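
The bookkeeping described above reduces to incrementing pairwise collaboration counts and per-author productivity after each paper. Here is an illustrative sketch that uses a dictionary keyed by author pairs instead of a full adjacency matrix; the data structures are simplifications, not the author's implementation.

```python
# Sketch of the coauthorship bookkeeping: A[i, j] counts collaborations between
# authors i and j; the coauthorship network keeps an edge wherever the count is
# non-zero. Data structures are illustrative, not from the paper.

from collections import defaultdict
from itertools import combinations

collab_matrix = defaultdict(int)   # (i, j) with i < j -> number of collaborations
productivity = defaultdict(int)    # author id -> number of authored papers

def register_paper(team):
    authors = sorted(set(team))
    for a in authors:
        productivity[a] += 1
    for i, j in combinations(authors, 2):
        collab_matrix[(i, j)] += 1           # increment A[i, j] by one

def coauthorship_edges():
    """Unweighted coauthorship network: non-zero counts become links."""
    return {pair for pair, count in collab_matrix.items() if count > 0}
```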

4.2.6. Reference Model

The model also simulates how papers select references.

  • Reference Number Growth: Similar to citation numbers, the average reference number per paper empirically increases over time. The model replicates this by dividing empirical reference number data into 13 intervals based on publication date. The distribution for each interval is used to generate the reference numbers for the corresponding simulation year.
    • Image 12 (from the original paper) compares the simulated and empirical annual average reference numbers and the overall reference number distribution, showing good alignment, with a similar discrepancy in overall distribution as noted for team size (due to empirical data being more influenced by later intervals with higher reference numbers).
  • Total References = Total Citations: A key assumption in this model, consistent with the APS dataset used, is that the total number of references always precisely matches the total number of citations at any given time. This ensures a closed system where internal citations and references balance out.

4.2.7. Citation Network

Once the reference number for a new paper is determined, the citation network is established by deciding which existing papers it will cite. The model uses a minimal citation model (from Wang et al.) where the probability of paper $i$ being cited at time $t$ depends on three independent factors: preferential attachment, fitness, and aging.

  • The probability $\Pi_i(t)$ that paper $i$ is cited at time $t$ after publication is expressed as: $\Pi_i(t) = \eta_i \, c_i^t \, P_i(t)$ Where:
    • $\Pi_i(t)$: The probability of paper $i$ being cited at time $t$.
    • $\eta_i$: The paper's fitness term, which is analogous to the paper's quality discussed in Section 2.4. It quantifies the inherent attractiveness of the work, reflecting the community's response.
    • $c_i^t$: The preferential attachment term. This term indicates that a paper's probability of being cited is proportional to its previously received citations.
      • It is not simply the number of citations, but an adjusted value. An initial attractiveness $c_0 = 1$ is assigned to a new paper with zero citations. This ensures every new paper has a finite, non-zero initial chance of being cited for the first time, preventing a "cold start" problem.
    • $P_i(t)$: The aging term, representing the long-term decay in a paper's citation likelihood as its novelty diminishes. It is modeled as a negative exponential decay: $P_i(t) = \exp\left[ -\frac{t - \tau_i}{\theta} \right]$ Where:
      • $t$: The current time (in months).
      • $\tau_i$: The publication date of paper $i$ (in months).
      • $(t - \tau_i)$: The age of paper $i$ (in months).
      • $\theta$: A parameter characterizing the lifetime of a paper, typically measured in months. A larger $\theta$ means the paper remains relevant and citable for a longer period. The paper sets $\theta = 48$ months, consistent with previous studies using APS datasets. A sketch of the full citation probability follows this list.
  • Validation: The final citation number distribution generated by the model is validated against APS empirical data.
    • Image 13 (from the original paper) shows that the simulated distribution exhibits a fat-tailed pattern and aligns well with empirical data, validating the citation network model.
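
A compact sketch of this citation step is shown below. It reads the adjusted preferential-attachment term as the paper's current citation count plus the initial attractiveness $c_0 = 1$, which is one plausible interpretation of the description above; function and field names are illustrative.

```python
# Hedged sketch of the minimal citation model: weight_i(t) ~ fitness * (citations + c0)
# * exp(-(t - tau_i) / theta), with theta = 48 months and c0 = 1 as in the text.

import math
import random

THETA = 48.0   # paper lifetime in months
C0 = 1.0       # initial attractiveness of an uncited paper

def citation_weight(fitness, citations, age_months):
    attach = citations + C0                    # adjusted preferential-attachment term
    aging = math.exp(-age_months / THETA)      # obsolescence term P_i(t)
    return fitness * attach * aging            # proportional to Pi_i(t)

def pick_cited_paper(papers, t_months):
    """papers: list of dicts with 'fitness', 'citations', 'pub_month'."""
    weights = [citation_weight(p["fitness"], p["citations"], t_months - p["pub_month"])
               for p in papers]
    return random.choices(papers, weights=weights, k=1)[0]
```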

4.2.8. Journal Impact Factor (JIF)

The Journal Impact Factor (JIF) is calculated yearly based on the generated citation network.

  • The JIF for year $k$, denoted $IF(k)$, is computed as: $IF(k) = \frac{n_{\mathrm{cites}}(k, k-1) + n_{\mathrm{cites}}(k, k-2)}{n_{\mathrm{papers}}(k-1) + n_{\mathrm{papers}}(k-2)}$ Where:
    • $IF(k)$: The Journal Impact Factor for the $k$-th year.
    • $n_{\mathrm{cites}}(k, k-1)$: The number of citations received during year $k$ by papers published in year $k-1$.
    • $n_{\mathrm{cites}}(k, k-2)$: The number of citations received during year $k$ by papers published in year $k-2$.
    • $n_{\mathrm{papers}}(k-1)$: The total number of papers published in year $k-1$.
    • $n_{\mathrm{papers}}(k-2)$: The total number of papers published in year $k-2$.
  • This formula calculates the average number of citations in year $k$ to papers published in the two preceding years ($k-1$ and $k-2$); a sketch of this calculation follows this list.
  • Validation: The simulated JIF fluctuations are compared against APS empirical data.
    • Image 13 (from the original paper) shows that the simulated JIF variations align closely with empirical results, further validating the citation network model.
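
The two-year window translates directly into code. A minimal sketch follows, assuming citation and paper counts have already been tallied per year (the dictionary layout and the numbers in the example are hypothetical).

```python
# Straightforward implementation of the two-year impact-factor formula.
# cites[(k, y)]: citations received in year k by papers published in year y.
# papers[y]: number of papers published in year y.

def impact_factor(k, cites, papers):
    numerator = cites[(k, k - 1)] + cites[(k, k - 2)]
    denominator = papers[k - 1] + papers[k - 2]
    return numerator / denominator

# Hypothetical example: 150 + 90 citations in year k to the 60 + 50 papers of the
# two preceding years give an impact factor of 240 / 110, about 2.18.
cites = {(2024, 2023): 150, (2024, 2022): 90}
papers = {2023: 60, 2022: 50}
print(round(impact_factor(2024, cites, papers), 2))   # 2.18
```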

4.2.9. h-index

The h-index is calculated for each author in the system.

  • To determine an author's h-index:
    1. All of the author's publications are sorted in descending order based on their number of citations. Let this sorted list be $\Pi = \{\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n\}$, where $c_{\alpha_i} \geq c_{\alpha_{i+1}}$ (the citations of paper $i$ are greater than or equal to the citations of paper $i+1$).
    2. The h-index is then identified as the largest number $h$ such that the $h$-th paper in the sorted list has at least $h$ citations.
  • Mathematically, it can be defined as: $h = \max_{i} \left\{ \min_{\alpha_i \in \Pi} \left[ c_{\alpha_i}, i \right] \right\}$ Where:
    • $h$: The h-index value.
    • $i$: The position of the paper in the sorted list (starting from 1).
    • $c_{\alpha_i}$: The number of citations received by the paper at position $i$ in the sorted list.
    • $\min[c_{\alpha_i}, i]$: Takes the minimum value between the citations of the $i$-th paper and its rank $i$.
    • $\max_{i}\{\dots\}$: The h-index is the maximum value of this minimum, effectively finding the point where the paper's rank $i$ is still less than or equal to its citations $c_{\alpha_i}$ (a sketch of this calculation follows this list).
  • Validation: The h-index distributions and temporal variations are compared between simulated and empirical results.
    • Image 14 (from the original paper) shows that simulated h-index distributions exhibit fat-tailed characteristics and align well with empirical data and findings from previous research. The temporal growth of the h-index for top researchers is predominantly linear, consistent with prior predictions, thereby validating the h-index outcomes.
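
The sort-and-scan definition above is straightforward to implement. A short sketch with a made-up worked example:

```python
# Compact h-index implementation: sort citation counts in descending order and
# take the largest rank h whose paper still has at least h citations.

def h_index(citation_counts):
    ranked = sorted(citation_counts, reverse=True)
    return max((min(c, rank) for rank, c in enumerate(ranked, start=1)), default=0)

# Example: papers with citations [10, 8, 5, 4, 3] give h = 4
# (four papers have at least 4 citations each, but not five with at least 5).
print(h_index([10, 8, 5, 4, 3]))   # 4
```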

5. Experimental Setup

5.1. Datasets

The entire study, including model formulation and validation, is based on the American Physical Society (APS) dataset.

  • Source: The APS Data Sets for Research (Ref. [30]).
  • Scale: The dataset comprises approximately 0.7 million papers and 0.5 million authors up to the end of 2021.
  • Characteristics: It includes citing article pairs (for citation networks) and article metadata (doi, authors, publication dates for coauthorship networks).
  • Domain: Physics and related disciplines, as published across APS journals.
  • Choice Justification: The APS dataset is widely used in scientometrics research (as cited in many references, e.g., Medo & Cimini [2, 9], Sinatra et al. [29]). Its continuous span of 129 years (1893-2021) provides a rich historical record suitable for studying long-term trends and dynamic network evolution. The choice to treat the 19 APS journals as a "unified virtual journal" simplifies the model, focusing on the overall dynamics within a large scientific corpus rather than inter-journal comparisons.

5.2. Evaluation Metrics

The paper focuses on two widely recognized scientific impact indicators: the Journal Impact Factor (JIF) and the h-index. Both are explained in detail below.

5.2.1. Journal Impact Factor (JIF)

  • Conceptual Definition: The Journal Impact Factor (JIF) is a measure of the average number of citations received per paper published in a particular journal during a specific two-year period. It is intended to reflect the relative importance or influence of a journal within its field. A higher JIF typically indicates that articles in that journal are cited more frequently shortly after publication.
  • Mathematical Formula: $IF(k) = \frac{n_{\mathrm{cites}}(k, k-1) + n_{\mathrm{cites}}(k, k-2)}{n_{\mathrm{papers}}(k-1) + n_{\mathrm{papers}}(k-2)}$
  • Symbol Explanation:
    • $IF(k)$: The Journal Impact Factor for the $k$-th year.
    • $n_{\mathrm{cites}}(k, k-1)$: The total number of citations received during year $k$ by all papers published in year $k-1$.
    • $n_{\mathrm{cites}}(k, k-2)$: The total number of citations received during year $k$ by all papers published in year $k-2$.
    • $n_{\mathrm{papers}}(k-1)$: The total number of papers published in year $k-1$.
    • $n_{\mathrm{papers}}(k-2)$: The total number of papers published in year $k-2$.
    • In essence, the numerator sums all citations in year $k$ to papers published in years $k-1$ and $k-2$. The denominator sums all "citable items" (papers) published in years $k-1$ and $k-2$.

5.2.2. h-index

  • Conceptual Definition: The h-index is a metric that attempts to measure both the productivity and citation impact of a researcher (or a journal or group). An author has an h-index of $h$ if $h$ of their published papers have each been cited at least $h$ times, while the remaining papers have no more than $h$ citations each. It aims to provide a single number that reflects the overall quality and quantity of a scholar's output, preventing researchers with many uncited papers (high productivity, low impact) or very few highly cited papers (low productivity, high impact) from having an artificially inflated score.
  • Mathematical Formula: $h = \max_{i} \left\{ \min_{\alpha_i \in \Pi} \left[ c_{\alpha_i}, i \right] \right\}$
  • Symbol Explanation:
    • $h$: The h-index value.
    • $\Pi = \{\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n\}$: The set of an author's papers, sorted in descending order of their citation counts. $\alpha_1$ is the most cited paper, $\alpha_n$ is the least cited.
    • $c_{\alpha_i}$: The number of citations received by the paper $\alpha_i$ at position $i$ in the sorted list.
    • $i$: The rank or position of the paper in the sorted list (e.g., $i = 1$ for the most cited paper, $i = 2$ for the second most cited, etc.).
    • $\min[c_{\alpha_i}, i]$: This expression takes the smaller value between the number of citations for the $i$-th paper and its rank $i$.
    • $\max_{i}\{\dots\}$: The h-index is the maximum value obtained from $\min[c_{\alpha_i}, i]$ across all papers $i$. This effectively finds the highest rank $h$ where the paper at that rank still has at least $h$ citations.

5.3. Baselines

The paper primarily validates its model against empirical data from the APS dataset rather than comparing it directly to other simulation models in the results section. The "baselines" are implicitly the observed real-world patterns and distributions of coauthorship, citations, JIF, and h-index from the APS dataset. The goal is to demonstrate that the proposed simulation model can accurately reproduce these real-world phenomena. This approach validates the model's ability to represent the underlying dynamics of scientific systems.

6. Results & Analysis

The results section first validates the model's ability to reproduce empirical data characteristics for network evolution and impact indicators, then conducts case studies by varying key parameters.

6.1. Core Results Analysis

6.1.1. Growth of Papers and Authors (Validation)

The model successfully replicates the growth of papers and authors observed in the APS dataset.

  • Image 1 (from the original paper) shows the annual growth of accumulated papers and authors. The simulated annual paper growth rate of 6.68% closely aligns with the empirical rate of 6.36%.

  • Image 8 (from the original paper) demonstrates a strong linear relationship between cumulative author number and cumulative paper number in empirical data, with a slope $k = 0.679$. The model uses this to determine the probability of newcomers ($p$), effectively integrating author growth with paper production.

    Figure 1 (original paper): cumulative paper counts (blue circles) and author counts (red squares) over time, both showing marked growth.

6.1.2. Paper Team Assembly (Validation)

The model accurately captures the dynamics of paper team size.

  • Image 9 (from the original paper) compares the model simulations with APS empirical data.
    • Figure 9(a) shows that the annual average team size increase in the simulation closely matches the empirical data.

    • Figure 9(b) displays the distribution of paper team sizes. While generally aligned, the simulation shows a slightly higher occurrence of papers with smaller team sizes compared to empirical data. This minor discrepancy is attributed to the paper growth rate difference between simulation intervals and empirical intervals, where later empirical intervals (with larger average team sizes) have a stronger influence on the overall distribution.

      Figure: model simulations vs. APS empirical data: (a) average paper team size per year; (b) distribution of paper team sizes.

6.1.3. Author Ability and Paper Quality (Model Setup)

  • Image 10 (from the original paper) illustrates the distributions of author ability (Q-factor) and paper quality.
    • Figure 10(a) shows the log-normal distribution assumed for the Q-factor (average 2.81), consistent with previous research.

    • Figure 10(b) shows the resulting paper quality distribution (average 3.62), also exhibiting a log-normal shape. These distributions are foundational for the fitness term in the citation model.

      Figure: probability distributions of author ability (Q-factor, mean 2.81) and paper quality (mean 3.62), comparing theoretical results with simulated data.

6.1.4. Coauthorship Network (Validation)

The model's coauthorship network characteristics are well-validated against APS empirical data.

  • Image 11 (from the original paper) presents the final distributions of researcher productivity (number of authored papers) and collaborator number.
    • Figure 11(a) shows the productivity distribution, which closely mirrors Lotka's law (a fat-tailed distribution where few authors are highly productive and many are less so), aligning with empirical data.

    • Figure 11(b) depicts the collaborator number distribution, also showing a strong match with empirical data and exhibiting fat tails. These validations confirm the model's ability to realistically generate author collaboration patterns.

      Figure 4. Model simulations (red squares) vs. APS empirical data (blue circles): (a) researcher productivity distribution; (b) collaborator number distribution.

6.1.5. Reference Model (Validation)

The reference model also shows good agreement with empirical data.

  • Image 12 (from the original paper) compares simulated and APS empirical data for reference numbers.
    • Figure 12(a) illustrates that the yearly average reference numbers in the simulation closely follow the empirical trend.

    • Figure 12(b) shows the overall reference number distribution. Similar to team size, the simulation has more papers with lower reference numbers compared to the empirical data, again explained by the influence of later, higher-reference empirical intervals.

      Figure: model simulations vs. APS empirical data: (a) yearly average reference number over time; (b) reference number distribution (fraction of papers vs. reference count).

6.1.6. Citation Network (Validation)

The citation network generated by the model accurately reflects real-world patterns.

  • Image 13 (from the original paper) compares the citation number distribution and Journal Impact Factor (JIF) dynamics.
    • Figure 13(a) demonstrates that the simulated citation number distribution exhibits a fat-tailed pattern and aligns remarkably well with APS empirical data, validating the core citation model.

    • Figure 13(b) shows the temporal variation of the Journal Impact Factor. The simulated JIF fluctuations closely align with the empirical results of the APS dataset, further validating the citation network model.

      Figure 6. Model simulations (red squares) vs. APS empirical data (blue circles): (a) citation number distribution; (b) temporal variation of the journal impact factor of the APS dataset.

6.1.7. h-index (Validation)

The h-index results from the simulation are also well-validated.

  • Image 14 (from the original paper) shows the h-index distribution and temporal variations for top researchers.
    • Figure 14(a) indicates that both simulated and empirical h-index distributions are fat-tailed and closely align, consistent with findings in other literature.

    • Figure 14(b) presents the temporal dynamic growth of the h-index for the top 3 researchers. Both simulated and empirical results predominantly show linear growth patterns, which is consistent with predictions from previous studies, lending credibility to the simulation.

      Figure 7. Model simulation versus APS empirical data: (a) $h$-index distribution in the final year; (b) temporal variation of the $h$-index for the top 3 researchers.

6.2. Ablation Studies / Parameter Analysis

The paper conducts several case studies to analyze the impact of different parameters on the Journal Impact Factor (JIF) and h-index.

6.2.1. Paper Lifetime ($\theta$)

Paper lifetime ($\theta$) (from Equation (3)) dictates how long a paper remains likely to be cited. A larger $\theta$ means older papers contribute more citations.

  • Impact on JIF:
    • Image 15 (from the original paper) illustrates the impact of $\theta$ on the JIF.
    • Figure 15(a) shows the temporal variation of JIF at different $\theta$ values.
    • Figure 15(b) plots JIF as a function of $\theta$. It is evident that as $\theta$ increases, the journal impact factor decreases monotonically.
    • Reasoning: The JIF calculation (Equation (4)) only considers citations received by papers published in the previous two years. If $\theta$ is larger, citations are more broadly distributed across older papers (published more than 2 years ago). Since the total number of citations remains constant, a larger proportion going to older papers means fewer citations are available for the recent 2-year window, thus decreasing the JIF.
  • Impact on h-index:
    • Image 2 (from the original paper) depicts the impact of $\theta$ on h-index distributions.

    • Figure 2(a) shows the h-index distribution at different $\theta$ values. A smaller $\theta$ leads to a higher proportion of researchers with low or moderate h-index and a smaller proportion with a large h-index. This is because a small $\theta$ concentrates citations on recently published papers, often by newcomers with lower h-index. Conversely, a large $\theta$ directs more citations to older papers, typically by established incumbents, strengthening the Matthew effect and resulting in more researchers with a large h-index.

    • Figure 2(b) shows the average h-index as a function of $\theta$. The average h-index tends to be higher for smaller $\theta$. When $\theta$ is small, citations are concentrated on recent papers, which are distributed more evenly among newer authors; these authors have lower $h$-indices, but their papers receive relatively more citations, raising the average $h$-index of a system that contains a large fraction of low $h$-index authors. When $\theta$ is large, citations are spread over all papers, allowing established authors to accumulate more citations, strengthening the 'rich get richer' effect and potentially lowering the average when the majority of authors have low $h$-indices. The paper states: "Researchers with lower or moderate $h$-index exhibit larger fractions, leading to a higher weighted average of distributions for smaller $\theta$". In other words, the overall distribution becomes heavier on the lower end, but the average increases because many more authors reach a low-to-moderate $h$-index threshold.


Figure 15. Impact of paper lifetime $\theta$ on the journal impact factor: (a) temporal variation of the journal impact factor at different $\theta$; (b) the journal impact factor as a function of $\theta$ in different years.


Figure 9. Impact of the paper lifetime $\theta$ on the $h$-index: (a) distribution of $h$-index at different $\theta$; (b) average $h$-index as a function of $\theta$ in different years.

6.2.2. Reference Number ($N$)

The reference number ($N$) is the average number of references a paper cites, which directly corresponds to the average number of citations per paper.

  • Impact on JIF:
    • Image 3 (from the original paper) shows the impact of $N$ on the JIF.
    • Figure 3(a) illustrates the temporal variation of JIF at different $N$ values.
    • Figure 3(b) plots JIF as a function of $N$. It is clear that a higher $N$ leads to a higher journal impact factor.
    • Reasoning: Since increasing $N$ means more citations are generated overall, this directly translates to higher $c_i^t$ values (the preferential attachment term) in the citation probability (Equation (3)). More citations circulating in the system, including those to papers within the 2-year JIF window, naturally increase the JIF.
  • Impact on h-index:
    • Image 4 (from the original paper) illustrates the impact of $N$ on the h-index.

    • Figure 4(a) shows the h-index distribution at different $N$ values. As $N$ increases, authors tend to have higher h-index values.

    • Figure 4(b) plots the average h-index of all authors as a function of $N$. The average h-index monotonically increases with the reference number $N$.

    • Reasoning: While $N$ does not directly influence an author's productivity (number of papers published), it increases the number of citations each paper receives. Since the h-index is a function of both productivity and citations, more citations per paper allow authors to reach higher h-index thresholds.


Figure 10. Impact of reference number $N$ on the journal impact factor: (a) temporal variation of the journal impact factor at different $N$; (b) the journal impact factor as a function of $N$ in different years.


Figure 11. Impact of the reference number $N$ on the $h$-index: (a) distribution of $h$-index at different $N$; (b) average $h$-index as a function of $N$ in different years.

6.2.3. Team Size ($m$) at Fixed Probability of Newcomers ($p$)

This case explores how changing the average team size ($m$) affects the h-index while keeping the probability of newcomers ($p$) constant. The JIF is minimally affected because paper quality ($\eta_i$) is only slightly influenced by $m$.

  • Impact on h-index:
    • Image 5 (from the original paper) shows the impact of average team size $m$ on the h-index.

    • Figure 5(a) illustrates the h-index distribution at different $m$ values. With larger $m$, more researchers are generated per paper. The distributions for small team sizes tend to be higher in the low-to-medium h-index region. This indicates that with more researchers, the total number of citations available per average researcher effectively decreases, leading to a higher fraction of researchers with lower $h$-indices. However, the top researcher might still achieve a higher h-index because with more participants there is a higher chance of someone reaching extreme values.

    • Figure 5(b) plots the average h-index as a function of $m$. The average h-index decreases with increasing team size.

    • Reasoning: When $p$ is fixed, increasing $m$ (team size) leads to $k = mp$ also increasing, meaning more new authors are added to the system per paper. While each researcher might be selected more frequently as a coauthor, the overall pool of researchers grows faster. This dilutes the total citations among more authors, reducing the average citations per author and consequently lowering the average h-index.

Figure 12. Impact of the average team size $m$ on the $h$-index: (a) distribution of the $h$-index at different $m$; (b) average $h$-index as a function of $m$ in different years.
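For reference, the h-index summarized by these distributions follows the standard definition: an author's h-index is the largest $h$ such that at least $h$ of their papers have at least $h$ citations each. A minimal sketch (not tied to the paper's implementation):

```python
def h_index(citation_counts):
    """h-index of one author, given the citation count of each of their papers."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):  # rank-th most cited paper
        if c >= rank:
            h = rank
        else:
            break
    return h

# Example: six papers cited [10, 8, 5, 4, 3, 0] times give h = 4.
print(h_index([10, 8, 5, 4, 3, 0]))  # -> 4
```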

6.2.4. Probability of Newcomers ($p$)

This case examines the impact of the probability of newcomers ($p$) on the h-index while keeping the average team size ($m$) constant. Variations in $p$ do not affect paper quality ($\eta_i$), since the $Q$ distributions are the same for newcomers and incumbents, and thus do not affect the JIF.

  • Impact on h-index:
    • Figure 13 (from the original paper) illustrates the impact of $p$ on the h-index.

    • Figure 13(a) shows the h-index distribution at different $p$ values. As $p$ increases, the distributions become increasingly dominated by fresh researchers with low h-index. The distributions for small $p$ tend to be higher than those for large $p$ at lower h-index values.

    • Figure 13(b) plots the average h-index as a function of $p$. The average h-index decreases with increasing $p$.

    • Reasoning: When $p$ increases, more new authors are generated with each paper and the probability of selecting incumbents decreases. This influx of newcomers, who typically have a low or zero h-index, shifts the overall h-index distribution towards lower values and thereby decreases the average h-index across the entire author pool (a toy version of this assembly rule is sketched after Figure 13).

Figure 13. Impact of the probability of newcomers $p$ on the $h$-index: (a) distribution of the $h$-index at different $p$; (b) average $h$-index as a function of $p$ in different years.
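The dilution mechanism can be illustrated with a toy version of the team-assembly rule (my own simplification: no citations, a "+1" weight offset so that authors with no prior collaborations can still be drawn, and arbitrary parameter values). Each of the $m$ author slots on a new paper is filled by a newcomer with probability $p$, otherwise by an incumbent chosen preferentially by past collaboration count. As $p$ grows, the author pool expands faster and the mean number of papers per author drops, which is the productivity side of the shift described above.

```python
import random
from collections import defaultdict

def assemble_teams(p, m=3, papers=2000, seed=0):
    """Toy team assembly: returns (number of authors, mean papers per author)."""
    rng = random.Random(seed)
    collabs = defaultdict(int)            # author id -> collaboration count so far
    papers_per_author = defaultdict(int)  # author id -> number of papers
    next_id = 0
    for _ in range(papers):
        team = []
        for _ in range(m):
            if not collabs or rng.random() < p:
                author = next_id          # introduce a newcomer
                next_id += 1
            else:                         # preferential choice among incumbents
                ids = list(collabs)
                weights = [collabs[a] + 1 for a in ids]
                author = rng.choices(ids, weights=weights, k=1)[0]
            team.append(author)
        members = set(team)               # an incumbent may be drawn twice; deduplicate
        for a in members:
            papers_per_author[a] += 1
            collabs[a] += len(members) - 1
    n_authors = len(papers_per_author)
    return n_authors, sum(papers_per_author.values()) / n_authors

for p in (0.2, 0.5, 0.8):
    n, avg = assemble_teams(p)
    print(f"p = {p:.1f}: {n:4d} authors, {avg:.2f} papers per author on average")
```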

6.2.5. Team Size ($m$) at Fixed Number of New Authors per Paper ($k$)

This is a crucial case study that simulates a scenario where authors intentionally enlarge their team size without increasing the overall influx of new authors into the system. This implies that as $m$ increases, the probability of newcomers ($p$) must decrease to keep $k = mp$ constant (Equation (1)).

  • Impact on h-index:
    • Figure 14 (from the original paper) illustrates the impact of the team size $m$ (at fixed $k$) on the h-index.

    • Figure 14(a) shows the h-index distribution at different $m$ values. The number of authors with medium to high h-index increases significantly with increasing team size $m$.

    • Figure 14(b) plots the average h-index as a function of $m$. The average h-index increases significantly with increasing team size $m$.

    • Reasoning: If $k$ (new authors per paper) is kept constant while $m$ (team size) increases, incumbent authors collaborate with each other more frequently (as $p$ decreases). These more frequent collaborations among incumbents inflate their productivity (number of papers) and, consequently, their h-index. This scenario highlights how strategic collaborative behavior can directly manipulate an author's h-index (a back-of-envelope derivation follows Figure 14).

Figure 14. Impact of the team size $m$ on the $h$-index: (a) distribution of the $h$-index at different $m$; (b) average $h$-index as a function of $m$ in different years.
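A back-of-envelope check (my own derivation, not given in the paper) makes the inflation mechanism explicit. If $P$ papers are published per year for $T$ years, the system creates roughly $PmT$ author slots but only about $PkT$ distinct authors, so the mean number of papers per author is approximately

$$\bar{n}_{\text{papers per author}} \approx \frac{P\,m\,T}{P\,k\,T} = \frac{m}{k}.$$

Holding $k$ fixed while enlarging $m$ therefore raises per-author productivity roughly linearly in $m$; since an author's h-index can never exceed their number of papers, this extra productivity feeds directly into the higher average h-index seen in Figure 14(b).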

7. Conclusion & Reflections

7.1. Conclusion Summary

This research successfully established a sophisticated mathematical model simulating the coevolution of coauthorship and citation networks. Through thorough validation against the APS dataset, the model proved capable of accurately replicating complex empirical phenomena, including fat-tailed distributions for productivity, collaborators, and citations, as well as the temporal dynamics of the Journal Impact Factor (JIF) and the h-index. The study's core contribution lies in its parametric analysis, revealing how key parameters influence these scientific impact indicators. Specifically, increasing the reference number ($N$) or shortening the paper's lifetime ($\theta$) significantly boosts both the JIF and the average h-index. More critically, enlarging the team size ($m$) without a proportional increase in new authors (i.e., decreasing $p$ while keeping $k$ constant) notably inflates the average h-index. These findings underscore that scientific impact indicators are sensitive to underlying publication and collaboration dynamics, suggesting they can be influenced or even manipulated by authors and publication strategies. The paper concludes by advocating for the continuous refinement of these indicators and positions modeling and simulation as invaluable tools for this ongoing process, citing the model's extensibility to other indicators and scenarios.

7.2. Limitations & Future Work

The paper explicitly states several strengths and potential extensions, which implicitly address limitations:

  • Model Simplification: The study treats the APS dataset as a "unified virtual journal," simplifying complexities of inter-journal citations or disciplinary differences.

  • Specific Parameters Studied: The parametric studies focus on paper lifetime ($\theta$), reference number ($N$), team size ($m$), and probability of newcomers ($p$). While insightful, other parameters not explored might also significantly influence the indicators.

  • Manipulation Interpretation: While the paper concludes that indicators "can be manipulated by authors," it doesn't delve into the ethical or practical implications of such manipulation beyond stating its possibility.

    As for future work, the paper suggests:

  • Extension to Other Scientific Impact Indicators: The mathematical models "can be easily extended to include other scientific impact indicators." This implies incorporating metrics beyond JIF and h-index, such as g-index, i10-index, Altmetrics, etc.

  • Exploration of Other Scenarios: The model's versatility allows for simulating "other scenarios," which could include different collaboration strategies, citation behaviors (e.g., self-citation, predatory citation), journal policies, or the impact of external factors.

  • Tool for Validation and Prediction: The methods "can serve as robust tools for validating underlying mechanisms or predicting different scenarios based on joint coauthorship and citation networks." This points to its utility in theory testing and forecasting.

7.3. Personal Insights & Critique

This paper provides a robust and insightful demonstration of how network models can illuminate the complex interplay between collaboration and citation dynamics in science. The validation against the APS dataset is a strong point, lending credibility to the model's realism.

Inspirations:

  • Mechanistic Understanding: The model offers a clear, mechanistic perspective on scientometric laws. Instead of just observing correlations, it simulates the underlying processes (e.g., how author Q-factor translates to paper fitness, or how team assembly influences h-index). This provides a deeper, causal understanding.
  • Policy Implications: The parametric studies are highly relevant for science policy. For instance, understanding that reducing paper lifetime (i.e., faster obsolescence) or increasing reference counts inflates JIF could inform discussions about how these factors might be strategically (or unethically) used by journals or authors to boost metrics. The finding that increasing team size without increasing new authors inflates h-index is particularly salient for evaluating individual researcher performance, suggesting that highly collaborative fields might inherently produce higher h-indices for established researchers, irrespective of groundbreaking individual contributions.
  • Transferability: The coevolutionary framework is highly transferable. The core mechanisms (preferential attachment, fitness, aging, team assembly) could be adapted to model other complex systems where growth, collaboration, and impact are intertwined, such as patent networks, software development communities, or even artistic collaborations and their reception.

Potential Issues/Critique:

  • Interpretation of "Manipulation": While the paper notes indicators can be "manipulated," the term carries a negative connotation. It might be more nuanced to describe these as "strategic behaviors" or "inherent sensitivities" of the metrics to certain practices. For example, extensive referencing might be a legitimate practice in some fields rather than a manipulative tactic. The model doesn't differentiate between "good" and "bad" manipulation.

  • Simplified Q-factor and Paper Quality: The paper quality $\eta_i$ is determined solely by the maximum Q-factor of a team member, with a noise term. This might oversimplify team dynamics, where synergistic effects, diverse expertise, or even the "average" Q-factor might play a larger role than just the single "star" author. More complex functions for aggregating Q-factors could be explored.

  • Static Q-factor: The Q-factor is assigned once when an author publishes their first paper and remains constant. In reality, author abilities might evolve, improve, or decline over time. Incorporating a dynamic Q-factor could add another layer of realism.

  • Homogeneous "Virtual Journal": Treating all APS journals as one virtual entity simplifies the model but loses the ability to investigate inter-journal dynamics, citation flows between journals, or disciplinary differences in citation behavior or team formation, which are significant in real scientometrics.

  • Lack of Economic/Social Factors: The model does not explicitly account for external factors like funding availability, institutional prestige, disciplinary migration, or broader societal impact, which also heavily influence collaboration and citation patterns.

  • Parameter Sensitivity of $\theta$ and $N$ vs. $m$ and $p$: The paper finds that $\theta$ and $N$ "significantly boost" both the JIF and the average h-index, while $m$ (at fixed $k$) "notably increases" the average h-index. The contrast between "significantly" and "notably" implies a difference in magnitude, but a clearer quantification of these impacts (e.g., percentage changes or effect sizes) would strengthen the claims.

    Overall, this paper makes a valuable contribution by providing a robust, extensible simulation framework to explore the intricate relationships governing scientific impact indicators, offering crucial insights for both research evaluation and science policy.
