
MGCFNN: A NEURAL MULTIGRID SOLVER WITH NOVEL FOURIER NEURAL NETWORK FOR HIGH WAVENUMBER HELMHOLTZ EQUATIONS


TL;DR Summary

This paper introduces MGCFNN, an advanced multigrid neural solver that utilizes a novel Fourier Neural Network (FNN) for high wavenumber Helmholtz equations, demonstrating superior performance and optimal convergence up to wavenumbers of 2000.

Abstract

Solving high wavenumber Helmholtz equations is notoriously challenging. Traditional solvers have yet to yield satisfactory results, and most neural network methods struggle to accurately solve cases with extremely high wavenumbers within heterogeneous media. This paper presents an advanced multigrid-hierarchical AI solver, tailored specifically for high wavenumber Helmholtz equations. We adapt the MGCNN architecture to align with the problem setting and incorporate a novel Fourier neural network (FNN) to match the characteristics of Helmholtz equations. FNN enables faster propagation of source influence during the solve phase, making it particularly suitable for handling large size, high wavenumber problems. We conduct supervised learning tests to demonstrate the superior learning capabilities of our solvers and perform scalability tests to highlight significant speedup over the most recent specialized AI solver and AI-enhanced traditional solver for high wavenumber Helmholtz equations, along with an ablation study to underscore the effectiveness of the multigrid hierarchy and the benefits of introducing FNN. Notably, our solvers exhibit optimal convergence of $O(k)$ up to $k \approx 2000$.


In-depth Reading


1. Bibliographic Information

1.1. Title

The central topic of the paper is the development of a novel neural multigrid solver, named MGCFNN, which incorporates a Fourier Neural Network (FNN) specifically designed for solving high wavenumber Helmholtz equations.

1.2. Authors

The authors are:

  • Yan Xie

  • Minrui Lv

  • Chen-song Zhang

    Their affiliations are:

  • State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

  • School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

1.3. Journal/Conference

The paper is an arXiv preprint, indicating it has not yet been formally published in a journal or conference proceedings at the time of this analysis. ArXiv is a widely recognized open-access preprint server for research papers in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. While it doesn't undergo formal peer review for publication, it's a crucial platform for disseminating early research findings and has significant influence in scientific communities.

1.4. Publication Year

The publication year is inferred to be 2024: the paper cites the arXiv preprint arXiv:2404.02493 (Cui et al., 2024) as a "latest study," and it is itself an arXiv preprint.

1.5. Abstract

Solving high wavenumber Helmholtz equations is a significant challenge due to the limitations of traditional solvers and the inability of most neural network methods to accurately handle extremely high wavenumbers in heterogeneous media. This paper introduces MGCFNN, an advanced multigrid-hierarchical AI solver specifically designed for these complex problems. The MGCFNN architecture adapts the existing MGCNN framework and integrates a novel Fourier Neural Network (FNN). The FNN is mathematically analogous to a convolutional neural network (CNN) but is tailored to the characteristics of Helmholtz equations, enabling faster propagation of source influence. This makes it particularly effective for large-scale, high wavenumber problems.

The authors demonstrate the superior learning capabilities of MGCFNN through supervised learning tests against numerous neural operator learning methods. Scalability tests, conducted using an unsupervised strategy, highlight significant speedups compared to recent specialized AI solvers and AI-enhanced traditional solvers for high wavenumber Helmholtz equations. An ablation study further confirms the effectiveness of the multigrid hierarchy and the benefits of FNN. Notably, MGCFNN achieves optimal convergence of $\mathcal{O}(k)$ (where $k$ is the wavenumber) for wavenumbers up to approximately $k \approx 2000$.

The original source link is /files/papers/69202a729c937f638e825c84/paper.pdf. This appears to be a local or internal file path. Based on the abstract and references, it is an arXiv preprint.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the efficient and accurate solution of high wavenumber Helmholtz equations, especially in heterogeneous media.

This problem is critically important in various scientific and engineering disciplines such as acoustics, electromagnetics, and seismology, where understanding wave propagation is fundamental. However, solving these equations presents significant challenges:

  • Computational Cost: High wavenumbers necessitate a very large number of grid points to capture the wave shape accurately, leading to large-scale computational problems.
  • Ill-Conditioning: The discretized Helmholtz operator results in a complex-valued, highly ill-conditioned, and indefinite system, making it difficult for traditional numerical solvers to converge.
  • Slow Decay: Wave propagation in heterogeneous media exhibits complex patterns with slow decay rates, requiring many iterations for solvers to reach satisfactory accuracy.
  • Limitations of Existing Solvers:
    • Traditional Solvers: Methods like Born series, factorization, shifted Laplacian, domain decomposition, and multigrid have not yet yielded consistently satisfactory results for high wavenumber cases, often struggling with convergence or requiring extensive problem-specific tuning.

    • Neural Network Methods: While Physics-Informed Neural Networks (PINNs) have been applied to Helmholtz equations, they typically do not address high wavenumber scenarios. Neural Operator Learning methods (e.g., FNO, DeepONet) also haven't focused on extremely high wavenumbers or the specific challenges of heterogeneous media in this context. Recent iterative AI solvers exist but are often limited in the maximum wavenumber they can handle or rely heavily on existing traditional solvers for data generation and assistance.

      The paper's entry point or innovative idea is to combine the strengths of a multigrid hierarchical approach (known for handling multi-scale problems) with a novel neural network architecture called Fourier Neural Network (FNN), specifically designed to leverage the frequency domain characteristics of wave propagation. This aims to create a solver that is both efficient and robust for extremely high wavenumbers in complex media, matching the application scope of traditional solvers but with significantly improved speed.

2.2. Main Contributions / Findings

The primary contributions of this paper are:

  • Novel Fourier Neural Network (FNN): The authors significantly modify the spectral convolution operation from the Fourier Neural Operator (FNO) to create FNN. This FNN handles full-mode information, making it suitable for high wavenumber problems and scalable for larger problem sizes. It is mathematically equivalent to a CNN with an extended kernel size, facilitating faster wave propagation.

  • MGCFNN Solver Architecture: They adapt the existing MGCNN architecture to specifically address high wavenumber Helmholtz equations and integrate the FNN into its solve phase, particularly at coarse levels. This integration forms MGCFNN, which combines the complementary features of the FNN (for efficient wave propagation) and the multigrid hierarchy (for global influence and multi-scale resolution), leading to improved convergence and shorter solve times. Unlike other specialized iterative AI solvers, MGCFNN does not require specialized domain knowledge, simplifying training and application to a broader range of wave equations.

  • Superior Learning Capabilities and Scalability:

    • Supervised Learning: The MGCFNN solver demonstrates superior learnability compared to other operator learning networks in supervised settings, achieving lower errors with shorter training times and fewer parameters.
    • Scalability & Speedup: Using an unsupervised training strategy, MGCFNN exhibits significant speedup (at least $4.8\times$) over the most recent specialized AI solvers and AI-enhanced traditional solvers for high wavenumber Helmholtz equations, as well as sparse direct solvers. It successfully scales to large problems (e.g., $4095 \times 4095$ grids) and high angular frequencies ($\omega = 640\pi$, corresponding to $k \approx 2000$).
  • Optimal Convergence Property: The MGCFNN solver demonstrates an optimal convergence property of $\mathcal{O}(k)$ (where $k$ is the wavenumber) for wavenumbers up to approximately $k \approx 2000$.

  • Ablation Study: The paper includes an ablation study that clearly underscores the effectiveness of the multigrid hierarchy and the specific benefits derived from introducing the FNN component, especially its hybrid use within the multigrid framework.

    These findings collectively present MGCFNN as a state-of-the-art AI solver for high wavenumber Helmholtz equations, significantly reducing solve time and extending the achievable wavenumber scope compared to previous methods.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a foundational grasp of several key concepts is essential, especially for readers new to the field.

  • Partial Differential Equations (PDEs):

    • Conceptual Definition: A Partial Differential Equation (PDE) is a mathematical equation that involves an unknown function of multiple independent variables and its partial derivatives with respect to those variables. PDEs are fundamental in physics, engineering, and other sciences for describing phenomena such as heat conduction, wave propagation, fluid flow, and electromagnetism.
    • Relevance: The Helmholtz equation, which is the focus of this paper, is a specific type of PDE.
  • Helmholtz Equation:

    • Conceptual Definition: The Helmholtz equation is an elliptic partial differential equation that arises in the study of wave phenomena. It is derived from the wave equation by applying the technique of separation of variables, assuming time-harmonic behavior. It describes the spatial part of waves (e.g., sound waves, electromagnetic waves) propagating in a medium at a specific frequency.
    • Relevance: This entire paper is dedicated to efficiently solving the Helmholtz equation, particularly for high wavenumbers and in heterogeneous media.
  • Wavenumber ($k$):

    • Conceptual Definition: In wave physics, the wavenumber ($k$) is a measure of spatial frequency, representing the number of waves per unit distance or, more commonly, $2\pi$ times the number of waves per unit distance. It is inversely proportional to the wavelength ($\lambda$), i.e., $k = 2\pi/\lambda$. A higher wavenumber means shorter wavelengths and more rapid oscillations in space.
    • Relevance: "High wavenumber" is the core challenge. As $k$ increases, the wave oscillates more rapidly, requiring finer spatial discretization and posing numerical stability challenges. The paper aims to solve problems up to $k \approx 2000$.
  • Heterogeneous Media:

    • Conceptual Definition: In physics, heterogeneous media refer to substances or environments where the physical properties (such as wave speed, density, or refractive index) vary from point to point in space. This is in contrast to homogeneous media, where properties are uniform.
    • Relevance: Wave propagation in heterogeneous media is significantly more complex than in homogeneous media because the wave path is continuously altered by local property changes (e.g., reflection, refraction, scattering), making accurate simulation very difficult.
  • Finite Difference Discretization:

    • Conceptual Definition: Finite difference methods (FDM) are numerical techniques for approximating solutions to differential equations by approximating derivatives with finite difference quotients. This involves replacing continuous functions with discrete values on a grid and differential operators with algebraic expressions involving these discrete values.
    • Relevance: The Helmholtz equation is discretized using a second-order finite difference scheme on a uniform grid, transforming the continuous PDE into a large system of linear algebraic equations ($A_h u_h = f_h$).
  • Multigrid Methods:

    • Conceptual Definition: Multigrid methods are a class of algorithms for solving systems of algebraic equations that arise from the discretization of differential equations. They are particularly efficient for problems where errors exhibit different behaviors at different scales (frequencies). The core idea is to use a hierarchy of grids (coarse to fine). Fine grids efficiently reduce high-frequency errors, while coarse grids efficiently reduce low-frequency errors (which appear as high-frequency errors on coarser grids when restricted). This process is typically done through cycles of smoothing on a fine grid, restricting the residual to a coarser grid, solving a coarse-grid problem, and then prolonging the coarse-grid correction back to the fine grid.
    • Relevance: The proposed MGCFNN is a multigrid-hierarchical solver, leveraging this principle to achieve efficient convergence (a minimal V-cycle sketch follows this list).
  • Neural Networks (NNs) & Deep Learning:

    • Conceptual Definition: Neural networks are computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers. Deep learning refers to neural networks with many layers (deep architectures) that can learn complex patterns from large amounts of data.
    • Relevance: The paper uses various neural network architectures (CNNs, ResNets, FNNs) as components of its solver.
  • Convolutional Neural Networks (CNNs):

    • Conceptual Definition: Convolutional Neural Networks (CNNs) are a class of deep neural networks specifically designed for processing structured grid-like data, such as images. Their key feature is the convolutional layer, which applies a sliding filter (kernel) across the input data to detect local patterns. This makes them highly effective for capturing spatial hierarchies and relationships.
    • Relevance: CNNs are a core component of the MGCNN architecture from which MGCFNN is derived. The FNN is also described as mathematically akin to a CNN with an extended kernel size.
  • Fourier Transform (FFT, IFFT):

    • Conceptual Definition: The Fourier Transform is a mathematical operation that decomposes a function (e.g., a signal or an image) into its constituent frequencies. It transforms a function from its original domain (e.g., time or space) to a representation in the frequency domain. Fast Fourier Transform (FFT) is an efficient algorithm to compute the discrete Fourier transform. Inverse Fast Fourier Transform (IFFT) converts data back from the frequency domain to the original domain.
    • Relevance: The FNN fundamentally operates in the frequency domain, leveraging the FFT and IFFT for efficient convolution and wave propagation, especially for high-frequency components. The convolution theorem states that convolution in the spatial domain is equivalent to pointwise multiplication in the frequency domain, which is a key mathematical basis for FNN.
  • Neural Operators:

    • Conceptual Definition: Neural operators are a class of neural networks designed to learn mappings between infinite-dimensional function spaces (operators), rather than finite-dimensional vector spaces. This means they can learn to map entire input functions (e.g., initial conditions, PDE coefficients) to entire output functions (e.g., solutions to PDEs), allowing them to generalize across different discretizations and input function types. Examples include Fourier Neural Operator (FNO), DeepONet, Convolutional Neural Operator (CNO), and Multigrid Neural Operator (MgNO).
    • Relevance: The paper compares MGCFNN against several neural operator learning methods, highlighting its superior learnability. FNN itself is a modification of FNO.
  • Physics-Informed Neural Networks (PINNs):

    • Conceptual Definition: PINNs are neural networks that incorporate the governing physical laws (expressed as PDEs) directly into their loss function during training. This means they not only learn from data but are also constrained to satisfy the physics of the problem, leading to solutions that are both data-consistent and physically consistent.
    • Relevance: The paper mentions PINNs as a related research area but clarifies that its focus is on inference (acting as a solver) rather than online training for solutions, and PINNs typically don't address high wavenumber Helmholtz problems.
  • Iterative Solvers & Preconditioners:

    • Conceptual Definition: Iterative solvers are numerical methods that produce a sequence of approximate solutions that converge to the true solution of a system of linear equations. Each approximation is generated by applying a fixed set of operations to the previous approximation and the right-hand-side vector. A preconditioner is an operator that transforms the original linear system into an equivalent system that is easier for an iterative solver to solve, typically by improving the condition number of the system matrix.
    • Relevance: MGCFNN functions as an iterative solver, and can also act as a preconditioner within other iterative frameworks like GMRES.
  • Generalized Minimal Residual (GMRES) Method:

    • Conceptual Definition: GMRES is a popular Krylov subspace method for solving non-symmetric systems of linear equations. It finds the approximate solution in the Krylov subspace that minimizes the residual norm. It's often used with preconditioning to speed up convergence. A restart technique is often employed to limit memory usage by restarting the GMRES process after a certain number of iterations.
    • Relevance: The paper uses GMRES to further accelerate the convergence of its MGCFNN solver, with MGCFNN acting as a preconditioner.
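As promised above, the following minimal sketch implements a textbook geometric multigrid V-cycle for a 1D Poisson model problem, illustrating the smooth / restrict / coarse-correct / prolong / smooth pattern. It is a generic illustration (weighted Jacobi smoother, full-weighting restriction, linear interpolation), not the paper's solver.

```python
import numpy as np

def apply_A(u, h):
    """Apply the 1D Poisson stencil [-1, 2, -1] / h^2 (zero Dirichlet boundaries)."""
    Au = np.zeros_like(u)
    Au[1:-1] = (-u[:-2] + 2.0 * u[1:-1] - u[2:]) / h**2
    return Au

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps: efficiently damp high-frequency error components."""
    for _ in range(sweeps):
        u = u + omega * (h**2 / 2.0) * (f - apply_A(u, h))
    return u

def restrict(r):
    """Full-weighting restriction of a residual to the next coarser grid."""
    rc = np.zeros((len(r) - 1) // 2 + 1)
    rc[1:-1] = 0.25 * (r[1:-2:2] + 2.0 * r[2:-1:2] + r[3::2])
    return rc

def prolong(ec, n_fine):
    """Linear-interpolation prolongation of a coarse-grid correction."""
    e = np.zeros(n_fine + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h):
    n = len(u) - 1
    if n <= 4:                                   # coarsest grid: just smooth heavily
        return smooth(u, f, h, sweeps=20)
    u = smooth(u, f, h)                          # pre-smoothing (high-frequency errors)
    r = f - apply_A(u, h)                        # fine-grid residual
    ec = v_cycle(np.zeros(n // 2 + 1), restrict(r), 2.0 * h)   # coarse-grid correction
    u = u + prolong(ec, n)                       # transfer correction back to fine grid
    return smooth(u, f, h)                       # post-smoothing

N = 128
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
f = np.pi**2 * np.sin(np.pi * x)                 # exact solution: u(x) = sin(pi x)
u = np.zeros(N + 1)
for _ in range(10):
    u = v_cycle(u, f, h)
print(np.max(np.abs(u - np.sin(np.pi * x))))     # small; limited by discretization error
```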

3.2. Previous Works

The paper positions its work within the context of both traditional numerical solvers and recent advancements in AI/data-driven methods for PDEs.

  • Traditional Solvers for Helmholtz Equations:

    • Overview: The paper notes that numerous efforts have been made to develop efficient traditional solvers, including the Born series method (Osnabrugge et al., 2016), factorization method (Osei-Kuffuor & Saad, 2010), shifted Laplacian method (Gander et al., 2015; Sheikh et al., 2013; Calandra et al., 2013), domain decomposition method (Chen & Xiang, 2013; Leng & Ju, 2022; 2019), and multigrid method (Brandt & Livshits, 1997).
    • Limitations: Despite these efforts, these methods have "not yet achieved satisfactory results" (Ernst & Gander, 2011) for high wavenumber Helmholtz equations, often struggling with the highly ill-conditioned nature and complex wave patterns. They typically require $O(k)$ iterations, where $k$ is the wavenumber.
  • Physics-Informed Neural Networks (PINNs):

    • Overview: PINNs (Raissi et al., 2019) use neural networks to approximate PDE solutions by embedding the physics directly into the loss function. Recent works (Song et al., 2022; Escapil-Inchauspé & Ruz, 2023) have applied PINNs to Helmholtz equations.
    • Limitations: The paper highlights that these PINN applications do not address high wavenumber cases, and their focus is on inference rather than online training to gain the solution, which differs from the MGCFNN's role as a solver.
  • Neural Operator Learning Methods:

    • Overview: This area focuses on learning the operator (mapping) between infinite-dimensional parameter and solution spaces of PDEs. Key methods include Convolutional Neural Operators (CNO) (Raissi et al., 2019), Deep Operator Network (DeepONet) (Lu et al., 2021), Fourier Neural Operator (FNO) (Li et al., 2021), U-NO (Rahman et al., 2023), and MgNO (He et al., 2024).
    • Limitations: While these methods function more like solvers during inference, MgNO (which has been tested on Helmholtz equations) and others generally "did not consider high wavenumber cases." The paper shows that for high wavenumber and heterogeneity, a single application of these networks is insufficient.
  • Iterative AI Solvers for High Wavenumber Helmholtz:

    • Overview: Acknowledging the difficulty of directly solving high wavenumber Helmholtz equations with a single network application, recent studies have developed iterative AI solvers to improve accuracy. These include works by Azulay & Treister (2022), Cui et al. (2022), Stachenfeld et al. (2021), and Stanziola et al. (2021).
    • Limitations: The wavenumbers tested in these studies "remain limited" compared to the paper's target of $k \approx 2000$.
    • Latest Specialized AI Solvers:
      • Encoder-solver (Lerer et al., 2024): A pure AI solver that showed superior performance over traditional shifted Laplacian solvers. However, it "relies on an existing traditional solver for data generation and solve phase assistance."
      • Wave-ADR-NS (Cui et al., 2024): An AI-enhanced traditional multigrid solver framework, also demonstrating superior performance. Its limitation is that it "requires extensive problem insights to construct a solver with good convergence, limiting its applicability to other wave equations."
  • MGCNN (Xie et al., 2023):

    • Overview: MGCNN laid out principles for building an efficient AI solver for discretized linear PDEs, guaranteeing quick convergence and adaptability.
    • Limitations: Despite its advantages, MGCNN "has not been applied to solve the Helmholtz equation, as its architecture is not directly suitable for this specific problem." This is the starting point for the MGCFNN modifications.

Born Series Method for Helmholtz Equation (Appendix B)

Since the paper explicitly mentions the connection between FNN and the Born series method, it's crucial to explain its core idea.

The Born series method is an iterative approach to solve the Helmholtz equation, particularly useful in heterogeneous media where the wave speed varies. It's based on treating the heterogeneity as a perturbation from a homogeneous background medium.

Consider the Helmholtz equation: $ \nabla^2 u(x) + k(x)^2 u(x) = -f(x), \quad x \in \mathbb{R}^d $ where u(x) is the wavefield, k(x) is the spatially varying wavenumber, and f(x) is the source term.

This equation can be rewritten by introducing a background wavenumber $k_0$ and a damping term $i\epsilon$ (where $\epsilon > 0$ for well-posedness, often related to absorbing boundary conditions): $ \nabla^2 u(x) + (k_0^2 - i\epsilon) u(x) = -f(x) - (k(x)^2 - k_0^2 + i\epsilon) u(x) $ Let $L_0 = \nabla^2 + (k_0^2 - i\epsilon)$ be the damped homogeneous Helmholtz operator, and let $V(x) = k(x)^2 - k_0^2 + i\epsilon$ be the scattering potential (representing the heterogeneity), so the equation reads $L_0 u(x) = -\big(f(x) + V(x) u(x)\big)$. If $g_0$ is the Green's function (the impulse response) of the damped homogeneous operator $L_0$, and $G_0$ is the associated integral operator, then $u(x) = G_0 \big(f(x) + V(x) u(x)\big)$. In operator form, this is: $ u = G_0 f + G_0 V u $ Here, $G_0$ represents the Green's function operator, and $V$ can be seen as a diagonal matrix representing the scattering potential.

The Born series method then solves this equation iteratively. If we write $u = u^{(0)} + u^{(1)} + u^{(2)} + \dots$, where $u^{(0)} = G_0 f$ is the solution in the homogeneous background, then the higher-order terms account for scattering: $ u = G_0 f + G_0 V (G_0 f) + G_0 V (G_0 V (G_0 f)) + \dots $ This can be written as a series expansion: $ u = \sum_{n=0}^{\infty} (G_0 V)^n G_0 f = (I + G_0 V + (G_0 V)^2 + \dots) G_0 f $ or, in an iterative form (which the paper states is similar to its FNN solve phase): $ u^{(p+1)} = G_0 V u^{(p)} + G_0 f $ Here, $u^{(p)}$ is the approximate solution at iteration $p$: $G_0 V u^{(p)}$ represents the wave scattered by the heterogeneity, and $G_0 f$ is the incident wave. The method essentially propagates waves through the medium iteratively, accounting for scattering at each step. This iterative process is a "pseudo-propagation" of the wave source influence.
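To make this pseudo-propagation concrete, here is a small 1D numerical sketch of the iteration $u^{(p+1)} = G_0 V u^{(p)} + G_0 f$, applying the damped homogeneous Green's operator $G_0$ via FFTs on a periodic grid. The wavenumber profile, damping $\epsilon$, and source are toy values chosen so that the plain series converges (weak contrast); the paper's FNN is a learned analogue of this propagation step, not this literal iteration.

```python
import numpy as np

N, L = 1024, 1.0
x = np.linspace(0.0, L, N, endpoint=False)
xi = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)          # Fourier frequencies

k0 = 80.0                                              # background wavenumber (toy value)
eps = 0.05 * k0**2                                     # damping for convergence
k = k0 * (1.0 + 0.01 * np.exp(-((x - 0.5) / 0.1)**2))  # weakly heterogeneous k(x)
V = k**2 - k0**2 + 1j * eps                            # scattering potential
f = np.exp(-((x - 0.3) / 0.02)**2)                     # localized source term

def G0(v):
    """Apply the damped homogeneous Green's operator via the FFT."""
    return np.fft.ifft(np.fft.fft(v) / (xi**2 - k0**2 + 1j * eps))

u = G0(f)                                              # u^(0): homogeneous-background solution
for _ in range(150):
    u = G0(V * u) + G0(f)                              # Born series / pseudo-propagation update

# Residual check of the original equation  u'' + k^2 u = -f  (computed spectrally)
residual = np.fft.ifft(-(xi**2) * np.fft.fft(u)) + k**2 * u + f
print(np.max(np.abs(residual)))                        # prints a small residual
```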

3.3. Technological Evolution

The evolution of solving Helmholtz equations has moved from:

  1. Purely Analytical Solutions: For very simple, homogeneous geometries (limited applicability).
  2. Traditional Numerical Methods:
    • Early finite difference and finite element methods struggled with large-scale problems and high wavenumbers due to computational cost and ill-conditioning.
    • Development of specialized methods like shifted Laplacian preconditioning, domain decomposition, and multigrid to improve robustness and efficiency for higher wavenumbers, but still facing limitations as wavenumber increases.
    • Parallel computing techniques applied to these methods to handle larger problem sizes.
  3. Data-Driven and AI-Enhanced Methods (Recent Decade):
    • PINNs emerged, integrating physics into neural networks, primarily for forward and inverse problems but not typically high wavenumber.

    • Neural Operators (FNO, DeepONet) aimed to learn the mapping between function spaces, showing promise for generalizing across different problem instances but initially not focusing on extreme wavenumbers.

    • Hybrid AI-traditional methods (e.g., AI-enhanced multigrid, AI preconditioners) attempted to combine the strengths of both, often requiring significant domain expertise.

    • Pure iterative AI solvers for the Helmholtz equation, but often with limitations on $k$ or reliance on traditional solvers for data.

      This paper's work fits into the latest stage of AI-enhanced methods, specifically pushing the boundaries of neural operator-inspired multigrid solvers to tackle the extremely high wavenumber regime that previously challenged even advanced traditional methods and earlier AI approaches. It builds upon MGCNN by introducing FNN to specifically address the wave propagation characteristics unique to the Helmholtz equation, marking a significant step towards fully AI-native, highly scalable PDE solvers.

3.4. Differentiation Analysis

Compared to the main methods in related work, MGCFNN offers several core differentiations and innovations:

  • Target Wavenumber: Unlike most neural network methods that "do not address high wavenumber cases" or iterative AI solvers whose "wavenumbers tested in these studies remain limited," MGCFNN is specifically designed for and demonstrates optimal convergence up to $k \approx 2000$, a scope comparable to advanced parallel traditional solvers.
  • Novel FNN for Helmholtz Characteristics: Instead of directly applying existing neural operators like FNO (which "encounters limitations when solving high wavenumber Helmholtz equations"), MGCFNN introduces a significantly modified Fourier Neural Network (FNN). This FNN is tailored to Helmholtz equation characteristics by:
    • Handling full-mode information (crucial for high frequencies).
    • Enabling scalability to larger problem sizes through a specific weight interpolation and padding strategy in the frequency domain.
    • Being mathematically akin to a CNN with an extended kernel, which efficiently propagates source influence, a critical aspect of wave equations.
  • Integrated Multigrid-Hierarchical AI Solver (MGCFNN): MGCFNN adapts the successful MGCNN framework, which provides a principled way to build efficient AI solvers for PDEs. The innovation lies in replacing CNN components at certain coarse levels with the specialized FNN. This hybrid approach combines the global influence and multi-scale efficiency of multigrid with the FNN's ability to efficiently handle high-frequency wave propagation, especially where CNNs struggle on coarse grids that cannot fully resolve high-frequency waves.
  • Reduced Domain Knowledge Requirement: In contrast to methods like Wave-ADR-NS (Cui et al., 2024), which "requires extensive problem insights to construct a solver," MGCFNN does "not necessitate specialized domain knowledge." This simplifies training and broadens its applicability to other wave equations.
  • Unsupervised Training Capability: While traditional solvers require explicit algorithm design and many AI solvers rely on pre-computed solutions (e.g., from traditional solvers) for supervised training, MGCFNN is designed for unsupervised learning, eliminating the need for expensive labeled data generation.
  • Superior Performance Metrics: MGCFNN demonstrates:
    • Lower Error and Better Generalization: In supervised tests, it outperforms other operator learning networks (like FNO2D, MgNO, U-NO, etc.) in terms of test error, indicating better generalization and robustness, especially for heterogeneous media.

    • Significant Speedup: It achieves substantial speedups (e.g., $4.8\times$ to $116.2\times$) compared to the latest specialized AI solvers (encoder-solver, Wave-ADR-NS) and sparse direct solvers (CHOLMOD) for high wavenumber problems.

    • Optimal Convergence: Exhibits an optimal $\mathcal{O}(k)$ convergence rate, which is a benchmark for high-performance solvers.

      In essence, MGCFNN is not just another neural PDE solver; it's a carefully engineered, specialized architecture that addresses the specific numerical and physical challenges of high wavenumber Helmholtz equations by strategically combining multi-scale AI (multigrid) with frequency-domain-aware neural networks (FNN), pushing the boundaries of what AI solvers can achieve in this domain.

4. Methodology

The core methodology of MGCFNN is to combine a multigrid-hierarchical AI solver framework with a novel Fourier Neural Network (FNN) component. This aims to efficiently tackle the challenges of high wavenumber Helmholtz equations by leveraging the strengths of multigrid for multi-scale problem solving and FNN for handling wave propagation characteristics in the frequency domain.

4.1. Principles

The core idea behind MGCFNN is to adapt the well-established MGCNN architecture, which uses a multigrid hierarchy with CNNs as smoothers, for the unique demands of high wavenumber Helmholtz equations. The theoretical basis and intuition are:

  1. Multigrid for Efficiency: Linear PDEs often exhibit errors at different scales (frequencies). Multigrid methods efficiently resolve these errors by iterating on a hierarchy of grids. Fine grids are good at reducing high-frequency errors, while coarse grids effectively handle low-frequency errors (which appear as high-frequency errors on coarser grids). This prevents slow convergence often seen in single-grid iterative methods.
  2. Fourier Transform for Wave Propagation: Wave phenomena are inherently well-described in the frequency domain. The Fourier Transform decomposes a wave into its constituent frequencies, and the convolution theorem states that convolution (which describes local interactions like those in PDEs) becomes simple pointwise multiplication in the frequency domain. This can accelerate the "propagation of source influence."
  3. FNN for High Wavenumber Challenges: For high wavenumber Helmholtz equations, CNNs (which have local receptive fields) struggle to capture long-range interactions efficiently, especially on coarse grids where wavelengths might be undersampled. The FNN is designed to operate in the frequency domain, effectively having a global receptive field (akin to an extended kernel CNN), making it better suited to capture the global nature of wave propagation and high-frequency modes.
  4. Hybrid Approach (MGCFNN): By strategically integrating FNN into the coarse levels of the multigrid hierarchy (where CNNs are less effective due to undersampling of high-frequency waves), MGCFNN combines the best of both worlds: the multi-scale efficiency of multigrid for overall convergence and the spectral efficiency of FNN for critical wave interactions.

4.2. Core Methodology In-depth (Layer by Layer)

The MGCFNN architecture is composed of two main phases: a setup phase network and a solve phase network. Both phases are built upon a multigrid hierarchy, and the FNN is specifically incorporated into the solve phase at coarse levels.

4.2.1. Formulation and Discretization

The paper begins by defining the target problem: the Helmholtz equation.

The Helmholtz equation is expressed as: $ -\Delta u(x) - \left( \frac{\omega}{c(x)} \right)^2 (1 - \gamma i)\, u(x) = f(x), \quad x \in [0, 1]^2 $ where:

  • u(x) represents the unknown wave field at spatial position $x$.

  • $\Delta$ is the Laplace operator (second spatial derivative, $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ in 2D).

  • c(x) is the spatially varying wave speed.

  • $\omega$ is the angular frequency (a constant).

  • f(x) is the source term (right-hand side).

  • $i = \sqrt{-1}$ is the imaginary unit.

  • $\gamma$ is the damping factor, indicating the medium's wave absorption property. The paper primarily focuses on the hardest case where $\gamma = 0$, but also includes comparisons with $\gamma = 0.01$.

  • The term $\kappa(x) = \frac{1}{c(x)}$ is defined as the slowness field.

  • The term $k = \frac{\omega}{c} = \omega\kappa$ is defined as the wavenumber.

    For boundary conditions, the paper employs absorbing boundary conditions (Engquist & Majda, 1979; Erlangga et al., 2006) to minimize wave reflection at the domain boundaries, simulating radiation in an infinite domain. This is achieved by setting $\gamma$ to increase quadratically from 0 to 1 at a certain distance from the boundary.

The continuous Helmholtz equation (1) is then discretized using a second-order finite difference scheme on a uniform grid with spacing $h$. This transforms the PDE into a linear system: $ A_h u_h = f_h $ where:

  • $A_h$ is the discretized Helmholtz operator, represented as a matrix.

  • $u_h$ is the discretized unknown wave field (a vector of values at grid points).

  • $f_h$ is the discretized source term (a vector of values at grid points).

    The discretized Helmholtz operator $A_h$ in stencil form is given as: $ A_h = \frac{1}{h^2} \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 - \omega^2 \kappa^2(x) h^2 & -1 \\ 0 & -1 & 0 \end{bmatrix} $ This stencil applies to a 2D grid, where the center element corresponds to the current grid point, and the $-1$ elements correspond to its immediate neighbors (north, south, east, west). The term $4 - \omega^2 \kappa^2(x) h^2$ incorporates the diagonal part of the Laplacian and the wave term.

To accurately capture the wave shape, the grid spacing $h$ must be sufficiently small, typically requiring at least 10 grid points per wavelength. This implies the condition $\omega \kappa h \leq \frac{2\pi}{10}$. Consequently, for increased wavenumbers $k$, the grid size ($1/h$) must also increase. The term $\frac{1}{h^2}$ is a scaling factor, and $\omega \kappa h$ along with $\gamma$ are grouped as the problem coefficient coef. This coef is scale-invariant, serving as a normalized input to the network across different scales.
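As a concrete illustration of this discretization, the sketch below assembles $A_h$ from the 5-point stencil above as a sparse matrix, and chooses the grid size from the points-per-wavelength rule. The constant slowness and toy frequency are simplifying assumptions (in the paper, $\kappa(x)$ varies in space and $\gamma$ is nonzero near the boundary).

```python
import numpy as np
import scipy.sparse as sp

omega = 40.0 * np.pi          # angular frequency (toy value)
kappa = 1.0                   # slowness 1/c(x), constant here for simplicity
ppw = 10                      # at least 10 grid points per wavelength
h = 2.0 * np.pi / (ppw * omega * kappa)   # from  omega * kappa * h <= 2*pi / 10
n = int(np.ceil(1.0 / h))     # grid points per dimension on [0, 1]
h = 1.0 / n

diag = 4.0 - (omega * kappa * h) ** 2     # center entry of the stencil (times h^2)
I = sp.identity(n)
T = sp.diags([-1.0, diag, -1.0], [-1, 0, 1], shape=(n, n))       # 1D part with wave term
S = sp.diags([-1.0, -1.0], [-1, 1], shape=(n, n))                # neighbor coupling
A_h = (sp.kron(I, T) + sp.kron(S, I)) / h**2                     # 2D 5-point operator

print(A_h.shape, "grid:", n, "x", n)
```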

4.2.2. Fourier Transform

The paper highlights the importance of the Fourier Transform in the context of wave propagation problems. The Fourier Transform is a mathematical operation that decomposes a function into its harmonic wave components (different frequencies). This makes it inherently suitable for analyzing and processing wave-like phenomena.

A key property utilized is the convolution theorem (Smith, 1997), which states that the convolution of two functions in the spatial domain corresponds to the pointwise multiplication of their Fourier Transforms in the frequency domain: $ \mathcal{F}(f \ast g) = \mathcal{F}(f)\, \mathcal{F}(g) $ where:

  • $\mathcal{F}$ denotes the Fourier Transform.

  • $f$ and $g$ are two functions (e.g., a signal and a kernel).

  • $\ast$ denotes the convolution operation.

    This theorem implies that complex convolution operations (which describe local interactions across a spatial kernel) can be performed rapidly in the frequency domain through simple element-wise multiplication. This property is crucial for FNN to achieve "rapid propagation of wave source influence during the solve phase" with an effectively extended kernel size. The paper uses FFT for the forward transform and IFFT for the inverse transform.
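A quick numerical check of this property (illustrative only): on a periodic grid, circular convolution computed directly in the spatial domain matches the inverse FFT of the pointwise product of the two FFTs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Circular convolution computed directly in the spatial domain
conv_space = np.array([np.sum(f * np.roll(g[::-1], k + 1)) for k in range(n)])

# Same convolution via pointwise multiplication in the frequency domain
conv_freq = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.max(np.abs(conv_space - conv_freq)))   # ~1e-12 (machine precision)
```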

4.2.3. Network Architecture

The overall MGCFNN architecture consists of a setup phase network and a solve phase network. Both utilize a multigrid hierarchy. A novel FNN kernel is specifically designed for handling high-frequency wave propagation in the coarser levels of the solve phase.

4.2.3.1. Fourier Neural Network (FNN)

The Fourier Neural Operator (FNO) (Li et al., 2021) showed impressive capabilities for PDEs, but the authors found it limited for high wavenumber Helmholtz equations. They introduce FNN with significant modifications to FNO's spectral convolution operation:

  • Separation of Coefficient and Source Processing: The FNN is designed to linearly operate on the source terms, with the problem coefficients processed separately by the setup phase. This ensures the network maintains a linear relationship with the source term, which is important for iterative solvers.

  • Smooth Weights in Frequency Domain & Interpolation (Case 1): The weights of the FNN in the frequency domain are assumed to form a smooth function (as shown in Figure 1a). To capture high-frequency modes with a limited parameter count, a smaller weight tensor is learned and then used to interpolate weights across the entire frequency domain. This is referred to as "case 1" in Figure 2.

    • Figure 1a (Implicit in the text description): Shows the smooth distribution of weights in the frequency domain, allowing for interpolation.
    • Figure 2 (Partial description, likely part (a)): Illustrates "Case 1: Interpolated weights are of the same size as the target T channels, and box height is proportional to the number of grid points." This suggests that for a given problem size, the FNN can interpolate its learned weights to match the required frequency resolution.
  • Scalability for Larger Problems (Case 2): To apply FNN to problems of larger size and higher wavenumber, a more elaborate strategy is used. The FNN first performs an IFFT on its learned (initial) weights to obtain an equivalent convolutional kernel in the spatial domain. This spatial kernel is then padded to the size of the larger target problem domain. Finally, an FFT is applied to this padded spatial kernel to obtain the weights in the frequency domain suitable for the larger problem. This is referred to as "case 2" in Figure 2.

    • Figure 2 (Partial description, likely part (b)): Illustrates "Case 2" which implies a process for scaling the FNN weights to larger problem sizes by transforming to space domain, padding, and transforming back.
  • Complex Tensor Handling: The input tensor to FNN is interpreted as a concatenation of its real and imaginary parts. This effectively halves the number of channels prior to the FFT operation. Conversely, the output tensor channels are doubled from a complex tensor after IFFT to produce real and imaginary parts. This design choice reduces the computational cost of the Fourier Transform while maintaining the network's predictive performance.

    Remark 3.1 (Wavenumber Invariance): For higher wavenumber problems, the authors initially considered leveraging the invariance of the function in the frequency domain (see Appendix A) and re-applying the interpolation technique. However, direct interpolation performed poorly for substantially larger wavenumbers. The mathematical equivalence (likely between convolution and multiplication in frequency domain, formula 4) prompted the more "elegant and efficient" approach of case 2 (IFFT-pad-FFT).

Remark 3.2 (Connection to Born Series): The FNN is mathematically connected to the inverse of the damped Helmholtz operator (details in Appendix A). Furthermore, its role in the solve phase is mathematically related to the traditional Born series method (Osnabrugge et al., 2016), which iteratively solves the equation via a pseudo-propagation process (details in Appendix B).

FNN Pseudocode: The FNN operates in two phases: setup phase (Algorithm 2) to initialize the weight tensor in the frequency domain, and solve phase (Algorithm 3) to perform spectral convolution.

Algorithm 2 (FNN setup phase). The purpose of this algorithm is to prepare the FNN weights (setup_weight) in the frequency domain based on the input problem size. This is a one-time cost for problems of the same size.

1: INPUT: size is the size of the input tensor of the FNN.
2: OUTPUT: setup_weight is the tensor of FNN weights in the frequency domain used for the linear mapping.
3: procedure SETUPFNNWEIGHT(size)
4:   init_weight <- loaded or initialized
5:   interp_size <- loaded or settled
6:   if size = interp_size then          // case 1
7:     setup_weight <- interpolate(init_weight)
8:   else                                // case 2
9:     space_weight <- F^{-1}(init_weight)
10:    space_weight <- pad space_weight to size
11:    setup_weight <- F(space_weight)
12:   end if
13:   return setup_weight
14: end procedure

  • init_weight: the initial, smaller set of learned FNN weights, either pre-loaded from a trained model or randomly initialized for training.
  • interp_size: the reference size at which init_weight was originally defined or optimized for interpolation.
  • Case 1 (line 7): if the current problem size matches interp_size, interpolate expands init_weight (a smaller tensor) to the size of the current problem using frequency-domain interpolation, exploiting the smoothness of the learned weights.
  • Case 2 (lines 9-11): for a different (typically larger) problem size, $\mathcal{F}^{-1}$ (the IFFT) converts init_weight into an equivalent convolutional kernel space_weight in the spatial domain; this kernel is zero-padded to the larger target size, extending its spatial influence, and $\mathcal{F}$ (the FFT) transforms it back into the frequency domain as the setup_weight suitable for the larger problem.
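A minimal PyTorch-style sketch of the case 2 branch (frequency → space → zero-pad → frequency) is shown below. The function name, tensor shapes, and the trivialized case 1 branch are illustrative assumptions, not the paper's implementation.

```python
import torch

def setup_fnn_weight(init_weight: torch.Tensor, size: int) -> torch.Tensor:
    """init_weight: complex tensor of shape (C_in, C_out, n, n) in the frequency domain."""
    n = init_weight.shape[-1]
    if size == n:                         # "case 1" (trivialized: sizes already match)
        return init_weight
    # "case 2": frequency -> space -> zero-pad -> frequency, at the larger size
    space_weight = torch.fft.ifft2(init_weight)           # equivalent spatial kernel
    padded = torch.zeros(*init_weight.shape[:-2], size, size,
                         dtype=init_weight.dtype)
    padded[..., :n, :n] = space_weight                    # pad kernel to the target size
    return torch.fft.fft2(padded)                         # weights for the larger grid

small = torch.randn(4, 4, 32, 32, dtype=torch.complex64)  # toy learned weights
large = setup_fnn_weight(small, 128)
print(large.shape)                                         # torch.Size([4, 4, 128, 128])
```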

Algorithm 3 (FNN spectral convolution). This algorithm performs the actual spectral convolution using the pre-setup FNN weights.

1: INPUT: input is the input tensor of the FNN; setup_weight is the tensor of FNN weights in the frequency domain.
2: OUTPUT: output is the output tensor of the FNN.
3: procedure FNNCONV(input, setup_weight)
4:   C <- number of channels of the input tensor
5:   input_cmplx <- input[:C/2] + input[C/2:] * i          // real to complex values
6:   pad input_cmplx
7:   input_freq <- F(input_cmplx)
8:   output_freq <- linear_map(input_freq, setup_weight)
9:   output_cmplx <- F^{-1}(output_freq)
10:  unpad output_cmplx
11:  output <- concat(output_cmplx.real, output_cmplx.imag) // complex to real values
12:  return output
13: end procedure

  • Line 5: the input tensor is presented as a concatenation of real and imaginary parts, so its $C$ channels are converted into a complex-valued tensor with $C/2$ complex channels.
  • Line 6: the complex input may be padded (e.g., with zeros) to a size suitable for efficient FFT computation.
  • Line 7: $\mathcal{F}$ (the FFT) transforms the complex input into the frequency domain.
  • Line 8: linear_map is the core spectral convolution. In the frequency domain, convolution becomes element-wise multiplication; this function applies a channel-wise linear mapping with setup_weight at every frequency coordinate. In Einstein summation convention: $ OUT_{b,o}[x, y] = IN_{b,i}[x, y]\, W_{i,o}[x, y] $ where $b$ is the batch index, $i$ and $o$ are the input and output channel indices, and (x, y) are the spatial indices in the frequency domain. The setup_weight tensor thus has size $C^2 n^2$ for an $n \times n$ grid with $C$ channels, meaning distinct weights for each channel pair at each frequency.
  • Lines 9-11: $\mathcal{F}^{-1}$ (the IFFT) transforms output_freq back into the spatial domain as a complex-valued tensor, any padding is removed, and the real and imaginary parts are concatenated to form the final real-valued output.
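The following sketch mirrors Algorithm 3 using torch.fft and an Einstein summation for the per-frequency channel mixing; the padding/unpadding steps are omitted, and the shapes and names are illustrative assumptions.

```python
import torch

def fnn_conv(inp: torch.Tensor, setup_weight: torch.Tensor) -> torch.Tensor:
    """inp: (B, C, H, W) real tensor, first C/2 channels = real part, last C/2 = imaginary.
    setup_weight: (C/2, C/2, H, W) complex tensor of frequency-domain weights."""
    C = inp.shape[1]
    x = torch.complex(inp[:, : C // 2], inp[:, C // 2 :])     # real -> complex channels
    x_freq = torch.fft.fft2(x)                                # to the frequency domain
    # per-frequency channel mixing: OUT_{b,o}[x,y] = IN_{b,i}[x,y] * W_{i,o}[x,y]
    y_freq = torch.einsum("bixy,ioxy->boxy", x_freq, setup_weight)
    y = torch.fft.ifft2(y_freq)                               # back to the spatial domain
    return torch.cat([y.real, y.imag], dim=1)                 # complex -> real channels

inp = torch.randn(2, 8, 64, 64)                               # toy input: 4 complex channels
w = torch.randn(4, 4, 64, 64, dtype=torch.complex64)          # toy frequency-domain weights
print(fnn_conv(inp, w).shape)                                 # torch.Size([2, 8, 64, 64])
```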

Effective Kernel Size: Despite FNN utilizing all Fourier transform modes (which mathematically implies an infinite effective kernel size in the spatial domain), its "effective influence length is limited," as illustrated in Figure 1b. This means that after IFFT, the spatial kernel often decays rapidly, making a fixed-length padding sufficient. However, for true global influence and multi-scale problem handling, the multigrid hierarchy is still crucial, especially when coarse grids can no longer adequately resolve high-frequency waves (as shown in Subsection 4.4).

4.2.3.2. Setup Phase

The setup phase of MGCFNN is responsible for processing the PDE coefficients, specifically the wave speed c(x) (or slowness $\kappa(x)$) and the damping factor $\gamma(x)$. This process generates the "necessary information for each level in solve phase." It is a streamlined version of the MGCNN setup phase (Xie et al., 2023).

Algorithm 1 (Setup Phase Network). This algorithm takes the problem coef (coefficient tensor) and the desired level (number of multigrid levels) as input and outputs a list of setup tensors, setup_outs, one for each level.

1: INPUT: coef is the problem coefficient tensor; level is the number of levels in the multigrid hierarchy.
2: OUTPUT: setup_outs is a list of setup tensors, one per level.
3: procedure SETUP(coef, level)
4:   setup_out_1 <- ReChannelCNN(coef)
5:   for l in 1, 2, ..., level do
6:     setup_out_l <- NonLinearResCNN(setup_out_l)      // nonlinear processing
7:     if l < level then
8:       setup_out_{l+1} <- RestrictCNN(setup_out_l)    // downsampling
9:     end if
10:  end for
11:  return setup_outs <- [setup_out_1, setup_out_2, ..., setup_out_level]
12: end procedure

  • ReChannelCNN (line 4): an initial CNN layer that processes the raw coef tensor (e.g., slowness and damping factor) and adjusts its channel dimensions, preparing it for the subsequent nonlinear processing; the subscript 1 denotes the first (finest) level.
  • NonLinearResCNN (line 6): at each level $l$, a nonlinear residual CNN block (a ResNet-like structure built from CNN layers) performs nonlinear feature extraction and transformation on the coefficient information at that grid resolution.
  • RestrictCNN (line 8): a downsampling block (e.g., pooling, strided convolution, or a specialized restriction operator in a CNN context) that generates the coefficient information for the next coarser level $l+1$; the condition on line 7 ensures downsampling is skipped at the coarsest level.

This setup phase is executed only once if the PDE coefficients remain fixed, even when the solve phase (which processes the source term) is applied multiple times in an iterative solving framework.
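A schematic sketch of this setup hierarchy is given below; the module widths, kernel sizes, and the two-channel coefficient input are assumptions for illustration rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Nonlinear residual CNN block (stand-in for NonLinearResCNN)."""
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU(),
                                 nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.net(x)                   # residual (ResNet-style) update

class SetupPhase(nn.Module):
    def __init__(self, levels=4, ch=16):
        super().__init__()
        self.rechannel = nn.Conv2d(2, ch, 3, padding=1)   # coef channels: (omega*kappa*h, gamma)
        self.blocks = nn.ModuleList(ResBlock(ch) for _ in range(levels))
        self.restricts = nn.ModuleList(nn.Conv2d(ch, ch, 3, stride=2, padding=1)
                                       for _ in range(levels - 1))
    def forward(self, coef):
        outs, x = [], self.rechannel(coef)
        for l, block in enumerate(self.blocks):
            x = block(x)                          # nonlinear processing at level l
            outs.append(x)
            if l < len(self.restricts):
                x = self.restricts[l](x)          # downsample to the next coarser level
        return outs                               # one setup tensor per level

coef = torch.randn(1, 2, 256, 256)                # toy coefficient field
print([t.shape for t in SetupPhase()(coef)])
```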

4.2.3.3. Solve Phase

The solve phase of MGCFNN follows the multigrid cycle structure, similar to MGCNN. The key difference is the replacement of CNN components with the custom-designed FNN at certain coarse levels. The solve phase network processes the source term (or right-hand-side term), which is typically the residual in an iterative solver, to propose a solution or correction.

MGCFNN Architecture (Figure 3): The diagram visually represents the multigrid cycle (V-cycle or W-cycle implicitly).

  • Levels: The network operates on multiple levels (labeled Level 1, Level 2, Level 3, Level 4). Level 1 is the finest grid, and Level 4 is the coarsest.
  • FNN Kernels: The figure explicitly shows "Its coarsest two levels use FNN kernels." This means that ResNet blocks at Level 3 and Level 4 will incorporate FNN instead of CNN.
  • Down Cycle:
    • Input rhs (right-hand side/residual) enters Level 1.
    • At each level $l$, a ResNet block (either ResCNN or ResFNN) is applied, utilizing setup_out_l (coefficient information from the setup phase) to process the current status tensor ($x_l$). This is the "smoothing" step.
    • After processing, a Restriction operator transfers the status tensor to the next coarser level ($x_{l+1} = \text{Restrict}(x_l)$).
  • Coarsest Level: At the coarsest level (Level 4 in the diagram), the ResFNN block is applied multiple times, potentially performing a "coarse-grid solve" or heavy smoothing.
  • Up Cycle:
    • From the coarsest level, Prolongation operators transfer the status tensor to the next finer level ($x_l = x_l + \text{Prolong}(x_{l+1})$).
    • At each level ll (during the up cycle), the prolonged correction from the coarser level is added to the current status tensor.
    • Then, another ResNet block (either ResCNN or ResFNN) is applied to further refine the solution.
  • Output: The final output from Level 1 is the proposed solution or correction.

Steps within each level during both down and up cycles:

  1. Status Tensor Processing: The current status tensor ($x_l$, the solution/correction estimate at level $l$) is processed by a ResNet block. This ResNet (either ResCNN or ResFNN) uses the coefficient information (setup_out_l) from the setup phase: $ x_l = \text{ResNet}(setup\_out_l, x_l) = \text{Net}(setup\_out_l \cdot x_l) + x_l $ Here:
    • $x_l$: The current status tensor (e.g., solution estimate or residual correction) at level $l$.
    • ResNet: The residual network (either ResCNN using CNN layers or ResFNN using FNN layers), which acts as a "smoother" or "solver" at this level.
    • $setup\_out_l$: The pre-processed coefficient information for level $l$, obtained from the setup phase.
    • The Net function represents the main neural network computation within the ResNet block, which likely performs transformations conditioned on $setup\_out_l$. The $+\, x_l$ term indicates a residual connection, common in ResNets.
  2. Restriction (Down Cycle): During the down cycle (moving from finer to coarser grids), after processing by ResNet, a restriction operator is applied to transfer the status tensor to the next coarser level. This operator effectively averages or samples the fine-grid data to represent it on the coarse grid.
  3. Prolongation (Up Cycle): During the up cycle (moving from coarser to finer grids), before processing, the status tensor from the coarser level is prolonged (interpolated or upsampled) and added as a correction to the current level's status tensor.
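The down/up cycle described above can be summarized in the following structural sketch, where resnet, restrict, and prolong stand for the per-level learned components (ResCNN on fine levels, ResFNN on the coarsest levels); these names and the trivial placeholder callables in the usage lines are assumptions made only so the skeleton can run.

```python
import numpy as np

def solve_cycle(rhs, setup_outs, resnet, restrict, prolong, levels):
    xs = [None] * levels
    xs[0] = rhs                                      # finest level: source / residual
    for l in range(levels - 1):                      # down cycle
        xs[l] = resnet[l](setup_outs[l], xs[l])      # smoothing at level l
        xs[l + 1] = restrict[l](xs[l])               # restrict to the next coarser level
    for _ in range(2):                               # extra sweeps on the coarsest level
        xs[-1] = resnet[-1](setup_outs[-1], xs[-1])
    for l in range(levels - 2, -1, -1):              # up cycle
        xs[l] = xs[l] + prolong[l](xs[l + 1])        # add coarse-grid correction
        xs[l] = resnet[l](setup_outs[l], xs[l])      # post-smoothing at level l
    return xs[0]                                     # proposed solution / correction

levels = 3
setup_outs = [None] * levels                          # stands in for the setup-phase outputs
resnet = [lambda s, x: x] * levels                    # placeholder "smoothers"
restrict = [lambda x: x[::2, ::2]] * (levels - 1)     # placeholder restriction (subsampling)
prolong = [lambda x: np.kron(x, np.ones((2, 2)))] * (levels - 1)  # placeholder prolongation
print(solve_cycle(np.ones((8, 8)), setup_outs, resnet, restrict, prolong, levels).shape)
```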

Architecture Naming:

  • MGCFNN: Refers to networks that use FNN in some coarse levels of the solve phase (as depicted in Figure 3, the coarsest two levels).
  • MGFNN: Refers to networks that use FNN at all levels (from finest to coarsest) of the solve phase.
  • MGCNN: Refers to networks that use CNN at all levels of the solve phase.

Multigrid Challenges for High Wavenumber:

  • Grid Size: For high wavenumbers, the grid must be sufficiently large to capture the fine-scale oscillations. This means that simply reducing grid size for coarser levels can easily lead to undersampling the waves.
  • Dissimilar Properties: The coarse grid problem, despite having fewer grid points, still faces the same high wavenumber. This means the ratio of wavelength to grid spacing changes dramatically, making the coarse-grid problem fundamentally different from the fine-grid problem in terms of numerical properties. This can degrade the efficiency of standard multigrid components.
    • Traditional methods often use a fixed, small number of multigrid levels for higher wavenumbers (Calandra et al., 2013).

Architecture Techniques to address these challenges:

  • Fixed Number of Levels: The paper uses a fixed number of multigrid levels, determined by the number of points per wavelength. This prevents overly coarse grids from completely failing to resolve waves.

  • Increased Complexity at Coarser Levels: The network's computational complexity is increased when moving to coarser levels. Specifically, the number of sweeps (layers) for all multigrid-hierarchical solvers is doubled on coarser levels. This allows more computation to be spent on the critical coarse-grid problems.

  • FNN for Global Influence: The FNN is developed specifically to "transmit wave influence more globally," which is essential when CNNs with their local receptive fields might struggle, especially on coarse grids.

    Remark 3.3 (Optimal FNN Use): The optimal use of FNN is problem-dependent. MGCFNN replaces CNN with FNN at "levels where the grid size is nearly insufficient to resolve high-frequency waves but still too large for CNN to handle." Using FNN at all levels (MGFNN) is unnecessary (and potentially inefficient) because CNNs are effective for fine grids where waves are well-resolved. Conversely, FNN is not needed for low-frequency waves, which CNNs and coarse grids can handle. This intelligent hybrid placement of FNN (MGCFNN) is key to its efficiency.

4.2.4. FNN Weights in Frequency and Space Domains (Appendix A)

The paper provides insights into the nature of FNN weights.

  • Figure 5: The weights of FNN in frequency domain. This figure shows that the learned FNN weights, when visualized in the frequency domain, are not perfectly smooth but show concentrated energy in certain regions, similar to how a wave with a specific frequency would have centralized energy in the frequency domain. There are some "serrations" which the authors attribute to simple interpolation.

  • Figure 6: The equivalent kernels of the weights of FNN in space domain. By taking the IFFT of the FNN weights from the frequency domain, the equivalent spatial kernels are obtained. These kernels exhibit a larger spatial extent ("larger kernel size than common CNNs") and behave like damped wave propagation, which aligns with the design assumptions for handling wave phenomena.

    To further explain this, the paper analyzes the inverse of a damped Helmholtz operator. The one-dimensional stencil of the damped Helmholtz operator is: $ [\,-1,\; 2 - k_{h\gamma}^2,\; -1\,] $ where $k_{h\gamma}^2 = \omega^2 \kappa^2 h^2 (1 - \gamma i)$. For a mode $m$ on a grid of size $N$, the inverse of the damped Helmholtz operator in the frequency domain is: $ \frac{1}{2\left(1 - \cos\left(\frac{2m}{N}\pi\right)\right) - k_{h\gamma}^2}\; e^{\frac{-2 m n \pi}{N} i} $ where $n$ is the spatial index.

  • Figure 7a: The inverse of the damped Helmholtz operator in frequency domain. This shows the spectral response of the inverse operator.

  • Figure 7b: Its corresponding weights in space domain. This shows the spatial kernel of the inverse operator, which resembles the damped wave propagation observed in the learned FNN kernels. The "similar patterns" observed between the learned FNN weights and the analytical inverse of the damped Helmholtz operator suggest that FNN effectively learns this inverse.
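
To make this construction concrete, here is a minimal NumPy sketch (with illustrative, not the paper's, parameter values) that evaluates the inverse of the one-dimensional damped Helmholtz operator in the frequency domain and recovers its spatial kernel via an inverse FFT, mirroring Figures 7a and 7b:

```python
import numpy as np

# Illustrative parameters (not the paper's exact settings).
N = 256                                                # 1D grid size
omega, kappa, h = 80 * np.pi, 1.0, 1.0 / 256           # angular frequency, slowness, grid spacing
gamma = 0.01                                           # damping factor
k_hg2 = (omega * kappa * h) ** 2 * (1 - 1j * gamma)    # k_{h gamma}^2

m = np.arange(N)                                       # frequency modes
symbol = 2 * (1 - np.cos(2 * np.pi * m / N)) - k_hg2   # symbol of the stencil [-1, 2 - k^2, -1]
inv_freq = 1.0 / symbol                                # inverse operator in the frequency domain (cf. Figure 7a)

kernel_space = np.fft.ifft(inv_freq)                   # equivalent spatial kernel (cf. Figure 7b)
print(np.round(kernel_space[:8], 4))                   # oscillatory, slowly decaying, damped-wave-like entries
```

The damping term keeps the symbol bounded away from zero, so the spatial kernel decays rather than persisting indefinitely, which matches the damped wave propagation behavior the learned FNN kernels appear to mimic.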

4.2.5. Connection with Born Series Method (Appendix B)

As mentioned in Remark 3.2, FNN's role in the solve phase is related to the Born series method, which iteratively solves the Helmholtz equation in a heterogeneous medium by treating the heterogeneity as a perturbation. Starting from the Helmholtz equation $\nabla^2 u(x) + k(x)^2 u(x) = -f(x)$, $x \in \mathbb{R}^d$, it can be rewritten with a damped background wavenumber $k_0$ and damping factor $\epsilon$: $\nabla^2 u(x) + (k_0^2 - i\epsilon) u(x) = -f(x) - (k(x)^2 - k_0^2 + i\epsilon) u(x)$. Let $g_0$ be the Green's function of the damped Helmholtz operator, $G = \mathcal{F}^{-1} g_0 \mathcal{F}$ the corresponding integral operator, and $V$ the diagonal matrix of the scattering potential $k(x)^2 - k_0^2 + i\epsilon$. The equation then becomes $u = GVu + Gf$, which the Born series method solves iteratively: $u^{k+1} = GVu^k + Gf$. Comparing this to the FNN solve-phase update $x_l = FNN(setup\_out_l \cdot x_l) + x_l$, the paper indicates that FNN plays the role of $G$ (learning the inverse of a damped Helmholtz operator), $setup\_out_l$ carries the information from $k(x)$, and $x_l$ is the current state (initially $f$). Thus, the FNN in the solve phase can be interpreted as learning a Born series method for solving the Helmholtz equation, iteratively refining the solution by applying a learnable propagation operator.
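
To make the analogy concrete, the following is a minimal NumPy sketch of the plain Born series iteration $u^{k+1} = GVu^k + Gf$ on a 2D periodic grid. All parameter values (grid, background wavenumber $k_0$, damping $\epsilon$) are illustrative assumptions, and convergence of the plain series is not guaranteed for strong heterogeneity; the sketch only illustrates the structure that the FNN solve-phase update mirrors:

```python
import numpy as np

n, L = 128, 1.0
x = np.linspace(0, L, n, endpoint=False)
k = 40.0 + 10.0 * np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))  # heterogeneous k(x)
k0 = k.mean()                                       # damped background wavenumber
eps = 1.05 * np.max(np.abs(k**2 - k0**2))           # damping factor (illustrative choice)

f = np.zeros((n, n)); f[n // 2, n // 2] = 1.0       # point source
V = k**2 - k0**2 + 1j * eps                         # diagonal scattering potential

xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
xi2 = xi[:, None] ** 2 + xi[None, :] ** 2
g0_hat = 1.0 / (xi2 - k0**2 + 1j * eps)             # Green's function of the damped operator (frequency domain)

def G(v):                                           # integral operator G = F^{-1} g0 F
    return np.fft.ifft2(g0_hat * np.fft.fft2(v))

u = G(f)
for _ in range(10):                                 # Born series: u <- G V u + G f
    u = G(V * u) + G(f)
```

In MGCFNN, the analytical $g_0$ and the fixed multiplication by $V$ are replaced by learnable components (FNN and $setup\_out_l$), which is what allows the solver to adapt to the medium.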

4.2.6. Iterative Framework And Mixed Precision Strategy (Appendix C)

The MGCFNN is designed to be used within an iterative framework to achieve desired accuracy.

  • Stationary Iterative Method: For a linear system $A \, sol = rhs$, a stationary iterative method updates the solution $sol$ iteratively: $sol^{k+1} = sol^k + B r^k = sol^k + B (rhs - A \, sol^k)$, where:

    • $sol^k$: The solution at the $k$-th iteration.
    • $B$: The solver, which in this case is the MGCFNN (or FNN/MGCNN) network, providing an approximate error correction based on the residual.
    • $r^k = rhs - A \, sol^k$: The residual vector, representing how far the current solution is from satisfying the equation. In the experiments, the neural network acts as $B$.
  • Krylov Subspace Method (GMRES): Krylov subspace methods (Golub & Van Loan, 2013), such as the Generalized Minimal Residual (GMRES) method, can accelerate the iterative process. Here, the solver $B$ (the neural network) acts as a preconditioner.

    • A restart technique is used for GMRES (restart length 10, or 5 when convergence is faster) to manage memory by limiting the size of the Krylov subspace.
  • Mixed Precision Strategy: To achieve high accuracy (e.g., rtol = 1E-7), a mixed precision strategy is employed (a minimal sketch of the resulting iteration is given below):

    • The computationally intensive part (the MGCFNN network or the GMRES solve process before restart) operates in float32 precision for speed.
    • However, to prevent error accumulation, the residual ($r^k$) is updated in float64 precision. This ensures that the iterative process can accurately converge to tight tolerances.
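
The following is a minimal PyTorch sketch of this mixed-precision stationary iteration (not the paper's code): apply_A and net are assumed callables for the discretized Helmholtz operator and the trained network, and details such as the channel layout for real/imaginary parts are elided:

```python
import torch

def stationary_solve(apply_A, net, rhs, rtol=1e-7, max_iters=500):
    # sol^{k+1} = sol^k + B r^k, with the network B applied in float32
    # and the residual maintained in float64 to avoid error accumulation.
    sol = torch.zeros_like(rhs, dtype=torch.float64)
    rhs64 = rhs.to(torch.float64)
    rhs_norm = rhs64.norm()
    for it in range(max_iters):
        r64 = rhs64 - apply_A(sol)                 # residual in double precision
        if r64.norm() / rhs_norm < rtol:           # relative residual stopping criterion
            break
        corr = net(r64.to(torch.float32))          # expensive network pass in single precision
        sol = sol + corr.to(torch.float64)         # accumulate the correction in double precision
    return sol, it
```

When GMRES is used instead, the same network serves as the preconditioner, and the restart (length 10, or 5) bounds the size of the Krylov subspace that must be kept in memory.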

4.2.7. A Computational Inefficiency Issue (Appendix E)

A recognized drawback of the FNN is an "unexpected inefficiency" in computing the linear_map operation (Equation 16) in the frequency domain. The theoretical computational complexity of this operation is $\mathcal{O}(C^2 n^2)$ for an $n \times n$ grid with $C$ channels, so it should ideally be as fast as a $1 \times 1$ kernel convolution. However, empirical timing tests reveal that this operation "takes even longer than the FFT2d in PyTorch." The issue is less pronounced in FNO-like methods because they typically retain significantly fewer modes. This inefficiency restricts FNN applications, although FNN is used primarily on the coarse levels of the multigrid hierarchy, where grid sizes are smaller. Future work is suggested to address this.
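
For reference, the sketch below shows what such a per-mode channel mixing looks like when written as an FNO-style einsum; the exact form of the paper's linear_map (Equation 16) is not reproduced here, so this is an assumption used only to illustrate where the $\mathcal{O}(C^2 n^2)$ cost comes from:

```python
import torch

B, C, n = 1, 32, 64                                  # small grid for illustration
x = torch.randn(B, C, n, n, dtype=torch.complex64)   # FFT2d output with all n*n modes kept
W = torch.randn(C, C, n, n, dtype=torch.complex64)   # per-mode C x C channel-mixing weights

# For each of the n^2 frequency modes, a C x C matrix-vector product:
# O(C^2 n^2) complex multiply-adds in total, the same FLOP count as a 1x1 convolution.
y = torch.einsum("bixy,ioxy->boxy", x, W)
```

Because every mode is kept, the weight tensor is as large as the feature map itself; this is a plausible (though unverified) reason the operation becomes memory-bound in practice, whereas FNO-like methods truncate to far fewer modes.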

5. Experimental Setup

5.1. Datasets

The experiments primarily use synthetic data generated based on established practices in the field, with some real-world-inspired variations.

  • Source Term (rhs): The rhs (right-hand-side) term, representing the source, is generated as a white noise tensor. This choice is made to ensure that all frequency ranges are covered, providing a comprehensive test for the solver's ability to handle various wave patterns.

  • Slowness Field ($\kappa$) Generation:

    • The spatially varying slowness field $\kappa(x)$ (which is $1/c(x)$, where $c(x)$ is the wave speed) is generated using images from the CIFAR10 (Krizhevsky et al., 2009) dataset.

    • CIFAR10 images are interpolated to the target problem size (e.g., $256 \times 256$, $511 \times 511$, etc.).

    • The pixel values are then linearly transformed to fit a specified range of slowness values, $[\kappa_{\min}, \kappa_{\max}]$. Default values are $\kappa_{\min} = 0.25$ and $\kappa_{\max} = 1.0$.

    • This method creates heterogeneous media whose complexity is derived from real-world images, making the problems more realistic than simple analytical patterns (a minimal generation sketch follows the example figure below).

    • Example of a slowness field from CIFAR-10 data:

      Figure 4a: A slowness field generated from CIFAR-10 data (top-left panel). The full figure shows, for a single source wave at ω = 80π, the slowness field, the real part of the solution, the frequency-domain result, and the convergence history.
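
The following is a minimal PyTorch sketch of the data generation described above; the interpolation mode, grayscale conversion, and real/imaginary channel layout of the white-noise source are assumptions, not the paper's exact choices:

```python
import torch
import torch.nn.functional as F

def slowness_from_image(img, target_size=511, kappa_min=0.25, kappa_max=1.0):
    # img: a (3, 32, 32) CIFAR-10 tensor with values in [0, 1]
    gray = img.float().mean(dim=0, keepdim=True).unsqueeze(0)              # (1, 1, 32, 32)
    field = F.interpolate(gray, size=(target_size, target_size),
                          mode="bilinear", align_corners=False).squeeze()  # interpolate to problem size
    field = (field - field.min()) / (field.max() - field.min() + 1e-12)    # normalize to [0, 1]
    return kappa_min + (kappa_max - kappa_min) * field                     # slowness kappa(x) = 1/c(x)

rhs = torch.randn(2, 511, 511)   # white-noise source term (real/imaginary parts as two channels)
```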

  • Other Datasets for Generalization:

    • Ultrasonic CT Scenario: Results from an ultrasonic CT scenario are also presented in Subsection 4.4, described as "more representative of real-world applications." The source is the AI4S Cup competition.

    • STL-10 Dataset: For generalization tests (Appendix G), the model trained on CIFAR-10 is applied to the STL-10 (Coates et al., 2011) dataset. STL-10 images have a higher resolution ($96 \times 96$) than CIFAR-10.

      • Example of a slowness field generated from STL-10 data:

        Figure 9a: A slowness field generated from STL-10 data (top-left panel). The figure also shows the real part of the solution, its frequency-domain counterpart, and the convergence history for a single source wave.

    • Marmousi Data: The benchmark Marmousi (Brougois et al., 1990) data is also used for generalization tests (Appendix G). This is a well-known synthetic seismic model, which is high-resolution ($362 \times 1101$) and represents complex geological structures.

      • Example of a slowness field from Marmousi data:

        Figure 9b: A slowness field from the Marmousi model (top-left panel of the figure).

The authors note that $coef_i$ (the coefficient tensor) should ideally be generated to fit specific application scenarios, but the chosen datasets provide a good basis for testing.

5.2. Evaluation Metrics

The paper uses a variety of metrics to evaluate different aspects of the solver's performance.

  • Loss Functions for Training (a minimal sketch of both losses is given after this list of metrics):

    • Supervised Learning Loss ($L_{err}$): Used when true solutions are available for comparison (e.g., during the model comparison in Subsection 4.2): $L_{err} = \frac{1}{N} \sum_{i=1}^{N} \| u_i - sol_i \|_2^2$, where:
      • $N$: The total number of data samples in the batch.
      • $u_i$: The true (ground truth) solution for the $i$-th problem instance.
      • $sol_i$: The solution predicted by the neural network for the $i$-th problem instance.
      • $\| \cdot \|_2^2$: The squared $L_2$ norm, which measures the squared Euclidean distance between the true and predicted solutions.
      • $L_{err}$: The mean squared error between predicted and true solutions.
    • Unsupervised Learning Loss ($L_{res}$): Used when true solutions are not available, or when the goal is to satisfy the PDE directly (e.g., for the scalability tests): $L_{res} = \frac{1}{N} \sum_{i=1}^{N} \| rhs_i - A_i \, sol_i \|_2^2$, where:
      • $N$: The total number of data samples in the batch.
      • $rhs_i$: The input source term for the $i$-th problem instance.
      • $A_i$: The discretized Helmholtz operator for the $i$-th problem instance, determined by the coefficient tensor $coef_i$ (slowness field $\kappa$ and damping factor $\gamma$).
      • $sol_i$: The solution predicted by the neural network for the $i$-th problem instance.
      • $\| \cdot \|_2^2$: The squared $L_2$ norm.
      • $L_{res}$: The mean squared residual, measuring how well the predicted solution satisfies the discretized Helmholtz equation. A smaller residual indicates a more accurate solution.
  • Performance Metrics for Solvers:

    • iters: The number of iterations required for the solver to converge to a specified tolerance. Optimal solvers aim for a low and stable number of iterations.
    • time(s): The total time in seconds taken by the solver to reach convergence. This is a crucial metric for evaluating real-world efficiency.
    • rtol (Relative Tolerance): A convergence criterion for iterative solvers, indicating that the relative residual norm ($\| rhs - A \, sol \|_2 / \| rhs \|_2$) has fallen below a given threshold. The paper uses rtol = 1E-7 as the target tolerance.
    • speedup: A ratio comparing the time(s) of MGCFNN to that of a baseline solver, calculated as Time_Baseline / Time_MGCFNN. A value greater than 1 indicates MGCFNN is faster.
    • Parameters(MB): The size of the neural network model in megabytes. A smaller model is generally more memory-efficient and faster to train/deploy.
    • Train Error: The $L_2$ error on the training dataset.
    • Test Error: The $L_2$ error on an unseen test dataset, indicating generalization capability.
    • Epoch: The training epoch at which the best test error was achieved.
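
As referenced above, here is a minimal sketch of the two training losses in PyTorch; the tensor shapes (batch first) and the callable apply_A for the discretized operator are assumptions used only for illustration:

```python
import torch

def supervised_loss(u_true, u_pred):
    # L_err: mean over the batch of the squared L2 error against the ground-truth solution
    return ((u_pred - u_true).flatten(1).norm(dim=1) ** 2).mean()

def unsupervised_loss(apply_A, rhs, u_pred):
    # L_res: mean over the batch of the squared residual ||rhs_i - A_i sol_i||_2^2;
    # apply_A applies each sample's discretized Helmholtz operator
    return ((rhs - apply_A(u_pred)).flatten(1).norm(dim=1) ** 2).mean()
```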

5.3. Baselines

The paper compares MGCFNN against a range of contemporary methods, including established neural operator learning methods, specialized AI solvers for Helmholtz equations, and traditional sparse direct solvers.

  • For Supervised Learning (Learnability Comparison, Subsection 4.2):

    • MgNO (He et al., 2024): Multigrid Neural Operator.
    • FNO2d (Li et al., 2021): Fourier Neural Operator in 2D.
    • U-NO (Rahman et al., 2023): U-shaped Neural Operator.
    • Dil-ResNet (Stachenfeld et al., 2021): Dilated Residual Network.
    • LSM (Wu et al., 2023): Latent Spectral Models.
    • MWT2d (Gupta et al., 2021): Multiwavelet-based Operator Learning.
    • U-Net (Ronneberger et al., 2015): A popular convolutional network architecture, often used for image segmentation but adaptable for other tasks. The authors utilize official implementations and adjust parameters for fair comparison under identical settings.
  • For Scalability and Iterative Solving (Unsupervised Strategy, Subsection 4.3):

    • Encoder-solver (Lerer et al., 2024): A pure AI solver specialized for Helmholtz, showing superior performance over traditional shifted Laplacian solvers.
    • Wave-ADR-NS (Cui et al., 2024): An AI-enhanced traditional multigrid solver framework for Helmholtz, also demonstrating superior performance.
    • CHOLMOD (Davis & Hager, 2009; Chen et al., 2008; Davis et al., 2004; Amestoy et al., 1996; 2004): A sparse direct solver (supernodal Cholesky/LDL factorization) from the SuiteSparse software package. This is a traditional direct solver, often considered a benchmark for accuracy and robust for 2D problems, though less scalable to 3D than iterative methods. CHOLMOD is run on the CPU with single precision.

5.4. Default Settings

Unless explicitly stated otherwise, the experiments adhere to these default settings:

  • Slowness Bounds: $\kappa_{\min} = 0.25$ and $\kappa_{\max} = 1.0$.
  • Test Results: Represent the median of 50 unseen data samples.
  • Training: Unsupervised learning is the recommended strategy, except for specific supervised learning comparisons.
  • Hardware: Training and inference are performed on an RTX 4090 GPU using PyTorch with CUDA 12.4.
  • Precision: Both training and inference use float32 precision.
  • Solver Precision: The iterative solver employs a mixed precision strategy (as detailed in Appendix C) to reach a tolerance of rtol = 1E-7 for the norm of the relative residual.
  • Solver Terminology: "Standalone solver" refers to the network operating within a stationary iterative framework. GMRES is used to speed up convergence.
  • Scalability Test Training: Models are trained only on a $511 \times 511$ grid and then tested on grid sizes up to $4095 \times 4095$.
  • Model Size for Scalability: Smaller models are used in scalability tests to fit within GPU memory constraints, while larger models are used for smaller problems to fully utilize GPU resources.
  • Hyperparameters: Detailed settings are provided in Appendix H and I.

5.5. Model Architecture Settings (Appendix H)

The number of points per wavelength is a critical factor for determining the model architecture, especially the number of multigrid levels.

  • Points per wavelength:

    • Problems smaller than $511 \times 511$: around 6 points per wavelength.
    • Larger problems (scalability tests): approximately 12 points per wavelength.
  • Number of Multigrid Levels: Determined such that the coarsest level has around 3 points per wavelength; additional levels are then added for optimization (a small sketch of this heuristic closes this subsection).

  • FNN Placement: For MGCFNN, the level with approximately 3 points per wavelength is the starting level for FNN use in the solve phase. The authors find that using FNN in the two coarsest levels is beneficial.

  • Channels: Larger number of channels are used for smaller problems to fully utilize GPU resources.

  • Sweeps: The number of sweeps (layers) for all multigrid-hierarchical solvers is doubled on coarser levels to increase computational complexity where needed.

    The following are the detailed model architecture settings for the solve phase:

Table 7: Solve phase settings for around 6 points in a wavelength. The following are the results from Table 7 of the original paper:

Model Settings
MGCFNN, MGFNN 32 channels, 1, 2, 4 sweeps for level 1, 2, 3
FNN 32 channels, 8 sweeps for one level

Table 8: Solve phase settings for around 12 points in a wavelength. The following are the results from Table 8 of the original paper:

Model Settings
MGCFNN 12 channels, 1, 2, 4, 8 sweeps for level 1, 2, 3, 4
MGCNN 12 channels, 1, 2, 4, 8, 16 sweeps for level 1, 2, 3, 4, 5
  • Kernel Sizes:
    • CNNs within ResNets (both solve and setup phases): kernel size of 5.
    • Restriction and Prolongation CNNs: kernel size of 3.
    • All skip add operations: kernel size of 1.
  • Setup Phase Sweeps: Two sweeps for each level.
  • Setup Phase Channels: Consistent with the solve phase to ensure network compatibility.
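
As mentioned earlier in this subsection, the number of levels follows from the points-per-wavelength budget, and the sweeps double on each coarser level. The following is a small sketch of that heuristic; the name of the extra-levels knob and the exact rounding are assumptions:

```python
import math

def num_levels(points_per_wavelength, coarsest_ppw=3.0, extra_levels=1):
    # Each coarsening halves the points per wavelength; the coarsest level
    # should keep roughly `coarsest_ppw` points, plus a few extra levels for tuning.
    levels = 1 + max(0, math.floor(math.log2(points_per_wavelength / coarsest_ppw)))
    return levels + extra_levels

def sweeps_per_level(levels, base=1):
    # Sweeps are doubled on each coarser level: 1, 2, 4, 8, ...
    return [base * 2 ** l for l in range(levels)]

print(num_levels(6), sweeps_per_level(3))    # 3 levels with sweeps [1, 2, 4], matching Table 7
print(num_levels(12), sweeps_per_level(4))   # 4 levels with sweeps [1, 2, 4, 8], matching Table 8
```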

5.6. Training Settings (Appendix I)

Table 9: Training settings. The following are the results from Table 9 of the original paper:

Parameter Value Description
epochs 120
num_data 10000 number of data used in training
lr 0.001 initial learning rate
optimizer Adam scheduler: step_size=6, gamma=0.8
batch_size 10

6. Results & Analysis

The paper presents a comprehensive evaluation of MGCFNN's performance, focusing on its learning capabilities, scalability, and the effectiveness of its architectural components.

6.1. Core Results Analysis

6.1.1. Comparison of Different Solvers (Supervised Learning)

This section demonstrates the learnability of MGCFNN and MGFNN against other prominent operator learning methods in a supervised setting. The comparison is made after a single inference pass, as most baselines are not designed for iterative use.

  • Settings: Grid size $256 \times 256$, angular frequency $\omega = 80\pi$ (corresponding to a high wavenumber $k$). The minimum slowness $\kappa_{\min}$ is varied (0.75, 0.50, 0.25) to test performance in increasingly heterogeneous (and thus harder) media. Best Test Error refers to the epoch at which the model achieved its lowest test error.

Table 1: Supervised learning on grid $256 \times 256$ with $\omega = 80\pi$ and $\kappa_{\min} = 0.75$. Information is recorded at the epoch where the best test error is reached. The following are the results from Table 1 of the original paper:

κmin = 0.75
Model Train Error Test Error (best) Train Error (at best epoch) Epoch Train Time(s/epoch) Parameters(MB)
MGFNN 0.035 0.061 0.035 120 93.2 8.9
MGCFNN 0.046 0.070 0.046 120 67.5 5.3
MGNO 0.063 0.079 0.063 120 110.3 4.6
FNO2D 0.085 0.561 0.496 4 103.6 46.1
MWT2D 0.119 0.527 0.475 4 147.0 26.0
U-NO 0.408 0.880 0.870 4 101.6 86.7
U-NET 0.534 0.803 0.758 31 89.9 31.0
DIL-RESNET 0.605 0.606 0.605 116 140.0 0.6
LSM 0.722 0.783 0.739 66 230.7 4.9

Table 2: Supervised learning on grid $256 \times 256$ with $\omega = 80\pi$ and $\kappa_{\min} = 0.50, 0.25$. Information is recorded at the epoch where the best test error is reached. The following are the results from Table 2 of the original paper:

κmin = 0.50 (left four columns), κmin = 0.25 (right four columns)
Model Train Error Test Error (best) Train Error (at best) Epoch Train Error Test Error (best) Train Error (at best) Epoch
MGFNN 0.075 0.187 0.122 25 0.126 0.431 0.253 18
MGCFNN 0.101 0.206 0.104 88 0.190 0.458 0.396 7
MGNO 0.209 0.333 0.221 81 0.361 0.633 0.614 14
FNO2D 0.122 0.749 0.718 3 0.154 0.851 0.818 3
MWT2D 0.169 0.728 0.669 4 0.186 0.845 0.821 3
U-NO 0.389 0.871 0.864 3 0.362 0.935 0.930 4
U-NET 0.634 0.774 0.716 57 0.575 0.806 0.778 26
DIL-RESNET 0.627 0.629 0.627 117 0.699 0.702 0.699 119
LSM 0.852 0.860 0.852 117 0.867 0.878 0.874 38

Analysis:

  • Superior Performance of MGCFNN/MGFNN: Across all levels of heterogeneity ($\kappa_{\min}$ from 0.75 to 0.25), MGCFNN and MGFNN consistently achieve the lowest test errors, indicating superior learning capabilities and generalization. For example, with $\kappa_{\min} = 0.75$, MGFNN achieves a test error of 0.061, significantly lower than the next best (MGNO at 0.079) and drastically better than FNO2D (0.561).
  • Robustness to Heterogeneity: As the medium becomes more heterogeneous (decreasing $\kappa_{\min}$), the test errors generally increase for all models, but MGCFNN and MGFNN maintain their relative advantage. This demonstrates their robustness in challenging environments.
  • Overfitting in Other Methods: Methods like FNO2D, MWT2D, and U-NO show low training errors but very high test errors, and often achieve their best test error very early in training (e.g., epoch 3 or 4). This indicates severe overfitting and poor generalization for high wavenumber Helmholtz problems.
  • Efficiency: MGCFNN generally has fewer parameters and shorter training times per epoch than many baselines, while still achieving better performance. MGCFNN is particularly efficient compared to MGFNN (5.3 MB vs 8.9 MB, 67.5 s/epoch vs 93.2 s/epoch for $\kappa_{\min} = 0.75$) due to its selective use of FNN.
  • Implication for Iterative Solving: The results, especially for higher heterogeneity ($\kappa_{\min} = 0.50, 0.25$), suggest that a single pass of a neural network (even the best ones) may not be sufficient to achieve the desired accuracy for high wavenumber, heterogeneous Helmholtz equations. This motivates the paper's focus on iterative solving.

6.1.2. Scalability Comparison (Unsupervised Learning)

This section focuses on the MGCFNN's ability to iteratively achieve high accuracy and scale to larger problems with higher wavenumbers, using an unsupervised training strategy. It compares MGCFNN against recent specialized AI solvers and a traditional sparse direct solver.

  • Settings: rtol = 1E-7 as the convergence criterion. Results are presented for various grid sizes (from $511 \times 511$ up to $4095 \times 4095$) and angular frequencies ($\omega = 80\pi$ up to $\omega = 640\pi$, i.e., $k \approx 2000$). The damping factor $\gamma$ varies depending on the comparison (Lerer et al. used $\gamma = 0.01$, Cui et al. used $\gamma = 0.0$).

Table 3: Scalability comparison with other solvers. The following are the results from Table 3 of the original paper:

rtol=1E-7 γ = 0.01 (RTX3090) Lerer et al. (2024) γ = 0.0 (A100 80G) Cui et al. (2024) γ = 0.0 (CPU)
time (s) & iters time (s) & iters time(s)
MGCFNN ENCODER-SOLVER speedup MGCFNN WAVE-ADR-NS speedup CHOLMOD speedup
80π 511 × 511 0.12(9) 0.65(43) 5.5 0.16(14) 15.07(28) 94.8 0.49 3.1
160π 1023 × 1023 0.19(11) 1.29(68) 6.8 0.30(22) 34.98(54) 116.2 8.88 29.5
320π 2047 × 2047 0.58(14) 3.40(85) 5.8 1.15(40) 91.63(122) 79.6 38.91 33.8
640π 4095 × 4095 2.77(18) 13.34(117) 4.8 8.55(83) 286.14(247) 33.5 183.61 21.5

Analysis:

  • Dominant Performance: MGCFNN consistently outperforms all other solvers (encoder-solver, Wave-ADR-NS, and CHOLMOD) across all tested grid sizes and wavenumbers.
  • Significant Speedup:
    • Against Encoder-solver (Lerer et al., 2024, $\gamma = 0.01$): MGCFNN achieves speedups ranging from 4.8× to 6.8×. For the largest problem ($4095 \times 4095$, $640\pi$), MGCFNN takes 2.77 s compared to 13.34 s for Encoder-solver, and requires significantly fewer iterations (18 vs 117).
    • Against Wave-ADR-NS (Cui et al., 2024, $\gamma = 0.0$): The speedups are even more dramatic, ranging from 33.5× to 116.2×. For the largest problem, MGCFNN takes 8.55 s compared to 286.14 s for Wave-ADR-NS. Iterations are also substantially lower (83 vs 247).
    • Against CHOLMOD (traditional direct solver on CPU, $\gamma = 0.0$): MGCFNN shows speedups from 3.1× to 33.8×. While CHOLMOD is quite fast for smaller problems, its solve time scales much worse for larger problems, making MGCFNN highly superior.
  • Optimal Convergence for High Wavenumber: The number of iterations for MGCFNN grows relatively slowly with increasing wavenumber/grid size, indicating an efficient and scalable iterative process. This confirms its optimal convergence property.
  • Hardware Efficiency: It's important to note that the encoder-solver results are on RTX3090, Wave-ADR-NS on A100 80G, and CHOLMOD on CPU. Despite potential hardware differences (especially A100 being more powerful than RTX4090 used for MGCFNN), MGCFNN still demonstrates superior performance, underlining its algorithmic efficiency.

6.1.3. Ablation Study

The ablation study investigates the individual contributions of the FNN and the multigrid hierarchy to the overall performance of MGCFNN.

  • Fixed Grid Size Analysis:
    • Settings: Two datasets: a $256 \times 256$ grid with $\omega = 80\pi$, and a $480 \times 480$ grid with $\omega = 150\pi$ (from the ultrasonic CT competition). rtol = 1E-7.

Table 4: Model architecture ablation study. The following are the results from Table 4 of the original paper:

rtol=1E-7 grid size 256 × 256, ω = 80π grid size 480 × 480, ω = 150π
standalone GMRES standalone GMRES
MGCFNN MGFNN FNN MGCFNN MGFNN FNN MGCFNN MGFNN FNN MGCFNN MGFNN FNN
iters 22 19 55 15 13 24 12 9 54 11 10 22
time(s) 0.136 0.135 0.270 0.111 0.116 0.151 0.093 0.101 0.724 0.105 0.124 0.365

Analysis:

  • Multigrid Hierarchy Effectiveness: Comparing FNN (single-level) with MGFNN (multigrid with FNN at all levels) and MGCFNN (multigrid with FNN at coarse levels):

    • For $256 \times 256$: FNN requires 55 iterations, while MGFNN needs only 19 and MGCFNN needs 22. This clearly shows that the multigrid hierarchy significantly accelerates convergence, reducing iterations by more than half.
    • Similar trends are observed for $480 \times 480$: FNN (54 iters) vs MGFNN (9 iters) vs MGCFNN (12 iters).
  • Hybrid (MGCFNN) vs Full FNN (MGFNN):

    • MGFNN often achieves slightly fewer iterations than MGCFNN (e.g., 19 vs 22 for $256 \times 256$, 9 vs 12 for $480 \times 480$). This suggests that using FNN at all levels can slightly improve the algorithmic convergence in terms of iteration count.
    • However, MGCFNN achieves comparable or even slightly faster solve times (e.g., 0.136 s vs 0.135 s for $256 \times 256$ standalone; 0.093 s vs 0.101 s for $480 \times 480$ standalone). This indicates that the computational overhead of using FNN at the finer levels (as in MGFNN) offsets the benefit of fewer iterations, especially considering the computational inefficiency of FNN's linear_map (Appendix E).
    • MGCFNN also uses less memory than MGFNN (5.3 MB vs 8.9 MB in Table 1), making it more suitable for large-scale problems.
  • GMRES Preconditioning: Using GMRES as a wrapper significantly reduces the iteration count for all models (FNN, MGFNN, MGCFNN), demonstrating its effectiveness as a preconditioner. MGCFNN with GMRES often achieves the fastest solve times.

    The following figure (Figure 4b from the original paper) shows a single source wave with $\omega = 150\pi$ in a slowness field generated from CIFAR-10 data. This visualizes the output of the solver and the convergence process.

    Figure 4b: The first column shows the slowness field ($\kappa(x)$) and the position of the source (cross symbol). The second column shows the real part of the computed wavefield solution. The third column shows the real part of the wavefield in the frequency domain. The fourth column displays the iterative solving history, including the residual norm over iterations and the cumulative time taken.

The following figure (Figure 4a from the original paper) shows a single source wave with $\omega = 80\pi$ in a slowness field generated from CIFAR-10 data.

Figure 4a: Similar to Figure 4b, this shows the slowness field, the real part of the wavefield solution, the real part in the frequency domain, and the iterative solving history for a different wavenumber and slowness field.

  • Scalability Comparison between MGCFNN and MGCNN:
    • Settings: Various grid sizes and angular frequencies, rtol = 1E-7.

Table 5: Scalability comparison between MGCFNN and MGCNN. The following are the results from Table 5 of the original paper:

rtol=1E-7 standalone GMRES
MGCFNN MGCNN MGCFNN MGCNN
iters time(s) iters time(s) iters time(s) iters time(s)
80π 511 × 511 14 0.17 35 0.27 12 0.17 21 0.20
160π 1023× 1023 22 0.31 61 0.55 20 0.31 36 0.37
320π 2047 × 2047 41 1.18 115 2.33 35 1.13 71 1.65
640π 4095 × 4095 83 8.18 231 18.66 72 8.48 146 14.56

Analysis:

  • MGCFNN's Superiority over MGCNN: MGCFNN consistently requires significantly fewer iterations (e.g., 83 vs 231 for standalone $4095 \times 4095$) and achieves faster solve times (e.g., 8.18 s vs 18.66 s for standalone $4095 \times 4095$) than MGCNN. This confirms that the specialized FNN component is crucial for handling high wavenumber Helmholtz equations; MGCFNN effectively learns the inverse operator.

  • Optimal $\mathcal{O}(k)$ Convergence: Both MGCFNN and MGCNN exhibit an optimal $\mathcal{O}(k)$ convergence property, meaning the number of iterations scales linearly with the wavenumber $k$. This is a desirable characteristic for efficient solvers.

  • GMRES Benefits: Similar to Table 4, GMRES significantly reduces iteration counts and solve times for both MGCFNN and MGCNN, highlighting its role as an effective preconditioner. For instance, MGCFNN with GMRES for the largest problem takes 72 iterations and 8.48 s, fewer iterations but slightly more time than standalone MGCFNN (83 iterations, 8.18 s). MGCNN, however, sees a more pronounced benefit from GMRES in reducing its high iteration count (231 to 146).

  • Despite Inefficiency: Even with the "unexpected computation inefficiency of linear map in frequency domain" (Appendix E), MGCFNN still outperforms MGCNN in solve time, underscoring the algorithmic advantage of FNN for Helmholtz equations.

    Remark 4.1 (Comparison to Traditional Solvers): The solve time for the largest problem ($4095 \times 4095$, $640\pi$) with MGCFNN is approximately 8.2 seconds. This represents a "significant improvement" over recent parallel traditional solvers such as the source transfer (Leng & Ju, 2019) and trace transfer (Leng & Ju, 2022) methods, which typically require "over fifty seconds" for similar problems.

6.1.4. Scalability Test Details (Appendix F)

This section provides additional details on the scalability tests, particularly concerning the variability in convergence behavior. The following figure (Figure 8 from the original paper) shows the scalability test over 50 unseen data.

Figure 8: This figure shows bar charts for iters (iterations) and time(s) (solve time) over 50 unseen data samples for both MGCFNN (upper row) and MGCNN (lower row) across different grid sizes and angular frequencies. The bottom row indicates how the convergence criteria and mixed precision strategy affect iters and time(s) over these data samples. It clearly shows the variability in iterations and time needed for convergence across different data for both models.

Analysis:

  • The figure illustrates that the iterations required for convergence "vary across different data." This variability is a "characteristic typical of the Helmholtz equation."
  • This is attributed to the fact that certain speed (or slowness) distributions can exacerbate the ill-conditioned nature of the Helmholtz equation, leading to phenomena like trapping (Graham et al., 2019; Ralston, 1971), which makes convergence harder for some problem instances.

6.1.5. Generalization to Other Distributions (Appendix G)

The paper evaluates the generalization capability of MGCFNN by applying a model trained on the CIFAR-10 dataset to other datasets with different characteristics.

  • Settings: Model trained on CIFAR-10. Tested on STL-10 (higher resolution, $96 \times 96$) and Marmousi (benchmark seismic data, $362 \times 1101$). rtol = 1E-7.

Table 6: Generalization results of number of iterations on STL-10 dataset and Marmousi. The model is trained on CIFAR-10 dataset. The following are the results from Table 6 of the original paper:

rtol=1E-7, iters standalone GMRES
size CIFAR-10 STL-10 Marmousi CIFAR-10 STL-10 Marmousi
80π 511 × 511 14 39 26 12 32 21
160π 1023 × 1023 22 41 35 20 33 29
320π 2047 × 2047 41 59 52 35 51 42
640π 4095 × 4095 83 109 92 72 94 79

Analysis:

  • Good Generalization Performance: MGCFNN demonstrates "good generalization performance" on unseen data distributions (STL-10 and Marmousi), which are notably different in resolution and complexity from the training CIFAR-10 data.

  • Increased Iterations: While still effective, the number of iterations required to converge on the STL-10 and Marmousi datasets is generally higher than on CIFAR-10. For example, at $640\pi$ and $4095 \times 4095$, MGCFNN needs 83 iterations for CIFAR-10, but 109 for STL-10 and 92 for Marmousi (standalone).

  • Reason for Increase: This increase in iterations is expected because "a more complex medium results in more optic rays," which in turn influences the convergence of iterative solvers (Galkowski et al., 2024). The Marmousi model, in particular, represents very complex geological structures.

  • GMRES Consistency: GMRES continues to reduce iteration counts across these varied datasets, maintaining its preconditioning benefits.

    The following figure (Figure 9a from the original paper) shows a single source wave with $\omega = 80\pi$ in a slowness field generated from STL-10 data.

    Figure 9a: This figure visualizes the slowness field from an STL-10 image, the resulting real part of the wavefield, its frequency-domain representation, and the convergence history for a single source problem.

The following figure (Figure 9b from the original paper) shows a single source wave with $\omega = 160\pi$ in a slowness field from Marmousi.

Figure 9b: This figure visualizes the slowness field from the Marmousi model, the resulting real part of the wavefield, its frequency-domain representation, and the convergence history for a single source problem.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper successfully introduces MGCFNN, a novel neural multigrid solver specifically designed to overcome the notorious challenges of solving high wavenumber Helmholtz equations. By adapting the MGCNN architecture and integrating a novel Fourier Neural Network (FNN), the authors have created a powerful solver that leverages both the multi-scale efficiency of multigrid methods and the spectral properties of wave propagation.

Key contributions and findings include:

  • The FNN effectively manages full-mode information and scales to larger sizes and higher wavenumbers, behaving like a CNN with an extended kernel to accelerate source influence propagation.

  • MGCFNN demonstrates superior learnability in supervised settings, outperforming other operator learning methods with lower errors, shorter training times, and fewer parameters.

  • In iterative, unsupervised solving scenarios, MGCFNN achieves significant speedups (up to 116.2×) over state-of-the-art specialized AI solvers and AI-enhanced traditional solvers, as well as sparse direct solvers, for high wavenumber problems.

  • The ablation studies confirm the critical roles of both the multigrid hierarchy and the FNN component, highlighting the benefits of their complementary features in the hybrid MGCFNN model.

  • MGCFNN and the modified MGCNN exhibit an optimal convergence property of $\mathcal{O}(k)$ for wavenumbers up to approximately 2000. MGCFNN excels in both iteration count and solve time.

  • The solver exhibits good generalization capabilities to different data distributions (STL-10, Marmousi) despite being trained on CIFAR-10.

    Overall, MGCFNN establishes a new state-of-the-art for AI solvers in this domain, matching the high wavenumber scope of advanced traditional solvers while drastically reducing solve times.

7.2. Limitations & Future Work

The authors acknowledge several limitations and propose areas for future exploration:

  • Simplistic FNN Interpolation: The current interpolation operation within FNN (Case 1 in Algorithm 2) is described as "simplistic." Improving this could further enhance FNN's efficiency and robustness.
  • Unexplored FNN Properties: The full properties and best practices for FNN in solving Helmholtz equations are "not yet fully explored." Further research is needed to optimize its design and application.
  • Boundary Conditions: The paper primarily focuses on absorbing boundary conditions. It has "not addressed various boundary conditions," such as reflection boundary conditions. These might conflict with the inherent periodic boundary effects of the Fourier transform or the padding technique used in FNN, requiring specific adaptations.
  • Computational Inefficiency of Linear Map: The linear_map operation within FNN (Equation 16) suffers from an "unexpected inefficiency" (Appendix E), taking longer than FFT2d. Developing a more efficient implementation of this operation is a crucial area for future work to further improve FNN's speed.

7.3. Personal Insights & Critique

This paper presents a highly impressive advancement in AI-driven PDE solvers, particularly for the notoriously difficult Helmholtz equation.

  • Innovation of Hybrid Approach: The core innovation lies in the intelligent fusion of multigrid principles with a specialized Fourier Neural Network. Multigrid is a cornerstone of numerical PDEs, and adapting it with data-driven components in a principled way (MGCNN -> MGCFNN) is a strong research direction. The FNN itself is a clever modification of FNO, specifically designed to handle the characteristics of wave propagation (full modes, extended spatial influence) in a computationally efficient manner (despite the current identified inefficiency). This hybrid approach capitalizes on the strengths of both worlds: the global communication of multigrid and the frequency-domain efficiency of FNN.
  • Scalability and High Wavenumber Achievement: The ability to achieve optimal $\mathcal{O}(k)$ convergence up to $k \approx 2000$ and demonstrate significant speedups against leading AI and traditional solvers on large grids is a remarkable feat. This pushes the boundaries of what AI solvers can realistically accomplish in scientific computing, making them competitive with, or superior to, highly optimized traditional methods.
  • Unsupervised Training: The emphasis on unsupervised training is a critical advantage. Generating ground truth solutions for complex, high wavenumber Helmholtz equations is computationally expensive, often requiring powerful traditional solvers. An unsupervised approach bypasses this bottleneck, making the method more practical and self-contained.
  • Problem-Specific Design: The paper excels in demonstrating how general neural network architectures need to be specifically tailored to the underlying physics and numerical challenges of a problem. The modifications from FNO to FNN and the strategic placement of FNN within the multigrid hierarchy are excellent examples of this.

Potential Issues & Areas for Improvement:

  • Efficiency of linear_map: The acknowledged computational inefficiency of the linear_map operation in FNN is a significant concern. While the current implementation still provides speedups due to algorithmic superiority, optimizing this core component could unlock even greater performance. This suggests that the current speedup is somewhat "held back" by an implementation detail, rather than a fundamental algorithmic limit.

  • Generalization to 3D: The current work focuses on 2D problems. Helmholtz equations in 3D are vastly more complex computationally. While the principles might extend, the practical challenges of memory, computational cost, and potential for FNN's linear_map inefficiency to become critical in 3D need thorough investigation.

  • Boundary Conditions: The current focus on absorbing boundary conditions is practical for many wave propagation problems. However, extending MGCFNN to handle reflective or periodic boundary conditions without compromising the benefits of FNN (which implicitly assumes periodicity via FFT) will be a non-trivial challenge.

  • Interpretability and Design Principles of FNN: While the paper provides a connection to the Born series, a deeper theoretical understanding of why certain FNN architectures and interpolation strategies work best (or fail) would be valuable for future adaptations and optimizations. The "serration" in frequency domain weights, for instance, hints at areas for further refinement.

    Overall, this paper provides a robust and highly performant solution to a long-standing challenge in numerical PDEs. Its methodology, especially the FNN and its integration, offers a promising direction for developing specialized neural network components tailored to the spectral characteristics of various PDEs within a multi-scale solution framework.
