MscaleFNO: Multi-scale Fourier Neural Operator Learning for Oscillatory Function Spaces
TL;DR Summary
This paper introduces MscaleFNO, a multi-scale Fourier neural operator that reduces spectral bias in learning mappings between highly oscillatory functions. It shows significant performance improvements in high-frequency wave scattering problems by employing parallel scaled FNOs.
Abstract
In this paper, a multi-scale Fourier neural operator (MscaleFNO) is proposed to reduce the spectral bias of the FNO in learning the mapping between highly oscillatory functions, with application to the nonlinear mapping between the coefficient of the Helmholtz equation and its solution. The MscaleFNO consists of a series of parallel normal FNOs with scaled input of the function and the spatial variable, and their outputs are shown to be able to capture various high-frequency components of the mapping's image. Numerical methods demonstrate the substantial improvement of the MscaleFNO for the problem of wave scattering in the high-frequency regime over the normal FNO with a similar number of network parameters.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
MscaleFNO: Multi-scale Fourier Neural Operator Learning for Oscillatory Function Spaces
1.2. Authors
- Zhilin You
- Zhenli Xu
- Wei Cai

Affiliations: School of Mathematical Sciences, MOE-LSC and CMA-Shanghai, Shanghai Jiao Tong University, Shanghai, China; Department of Mathematics, Southern Methodist University, Dallas, TX, USA.
1.3. Journal/Conference
This paper is published on arXiv, a preprint server. As such, it has not yet undergone formal peer review for publication in a journal or conference. arXiv is a widely recognized platform for disseminating early research in physics, mathematics, computer science, and related fields.
1.4. Publication Year
December 31, 2024 (as indicated in the paper's footer), with the arXiv v1 submission dated December 28, 2024.
1.5. Abstract
This paper introduces the multi-scale Fourier neural operator (MscaleFNO), an approach designed to mitigate the spectral bias inherent in standard Fourier Neural Operators (FNOs). This bias often hinders FNOs from effectively learning mappings between highly oscillatory functions. The MscaleFNO achieves this by employing a series of parallel, standard FNOs, each receiving scaled inputs of both the function and its spatial variable. The outputs from these parallel FNOs are shown to effectively capture various high-frequency components of the mapping's target image. The paper demonstrates the utility of MscaleFNO through its application to the nonlinear mapping between the coefficient of the Helmholtz equation and its solution, particularly in the high-frequency wave scattering regime. Numerical experiments reveal a substantial improvement in performance compared to a standard FNO with a similar number of network parameters.
1.6. Original Source Link
- Original Source Link: https://arxiv.org/abs/2412.20183v1
- PDF Link: https://arxiv.org/pdf/2412.20183v1.pdf

The paper is currently available as a preprint on arXiv.
2. Executive Summary
2.1. Background & Motivation
The core problem addressed by this paper is the spectral bias of Deep Neural Networks (DNNs), specifically as it applies to operator learning. DNNs have a well-documented tendency to prioritize learning low-frequency components of functions during training, often struggling to accurately represent or capture high-frequency content. This becomes a significant challenge in operator learning tasks where the input and output functions are highly oscillatory, meaning they exhibit rapid changes and intricate patterns.
This problem is particularly critical in various scientific and engineering domains, such as wave scattering phenomena, where the relationship between physical properties (e.g., material coefficients) and system responses (e.g., wave fields) involves complex, high-frequency oscillations. Traditional numerical methods for solving partial differential equations (PDEs) like the Helmholtz equation are computationally intensive, requiring repeated simulations for each new parameter configuration. Operator learning offers a promising alternative by learning a universal mapping, but the spectral bias of standard models like the Fourier Neural Operator (FNO) limits their effectiveness in these high-frequency regimes.
The paper's entry point is to adapt the successful multi-scale approach, previously applied to DNNs for function approximation (MscaleDNN), to the more complex domain of operator learning with FNOs. The innovative idea is to address the spectral bias in operator learning by simultaneously scaling both the spatial coordinates and the input function itself across multiple parallel FNO branches.
2.2. Main Contributions / Findings
The primary contributions and key findings of this paper are:
- Proposal of MscaleFNO Architecture: The paper introduces the Multi-scale Fourier Neural Operator (MscaleFNO). This novel architecture extends the multi-scale DNN (MscaleDNN) concept to operator learning, specifically tailored for FNOs. It comprises multiple parallel FNO sub-networks, each processing inputs that are scaled differently in both spatial coordinates and function values.
- Mitigation of Spectral Bias in Operator Learning: The MscaleFNO is explicitly designed to reduce the spectral bias of FNOs, enabling them to effectively learn mappings between highly oscillatory functions. This is achieved by allowing different sub-networks to specialize in capturing frequency components at their respective scales.
- Enhanced High-Frequency Representation: The MscaleFNO demonstrates a superior capability in capturing high-frequency components of the mapping's image. Numerical results, particularly Discrete Fourier Transform (DFT) analyses, confirm that MscaleFNO accurately reconstructs the full spectrum of oscillatory solutions, unlike standard FNOs, which tend to smooth out high-frequency details.
- Application to the Helmholtz Equation: The proposed MscaleFNO is successfully applied to the challenging problem of mapping the coefficient of the Helmholtz equation to its solution, a crucial application in wave scattering, especially in high-frequency regimes.
- Substantial Performance Improvement: Numerical experiments show that MscaleFNO significantly outperforms standard FNOs in accuracy for highly oscillatory problems, often by an order of magnitude or more, even when both models have a similar number of parameters. This improvement is consistent across single-frequency, multi-frequency, and varying-domain-length test cases, and it generalizes robustly to unseen data distributions.
- Hierarchical Frequency Decomposition: Analysis of individual sub-network contributions within MscaleFNO reveals a systematic hierarchical frequency decomposition: sub-networks with smaller scaling factors capture low-frequency patterns, while those with larger scaling factors effectively extract high-frequency details, collectively achieving a comprehensive spectral representation.

In essence, the paper provides a practical and effective solution to a critical limitation of neural operators when dealing with oscillatory phenomena, thereby broadening their applicability in scientific computing.
3. Prerequisite Knowledge & Related Work
This section provides the necessary background for understanding the MscaleFNO paper.
3.1. Foundational Concepts
3.1.1. Operator Learning
Operator learning is a subfield of machine learning that focuses on learning mappings between infinite-dimensional function spaces. Unlike traditional machine learning which learns mappings between finite-dimensional vectors (e.g., classifying an image, predicting a stock price), operator learning aims to learn a functional relationship that maps an entire input function to an entire output function. For example, in physics, this could involve learning the mapping from a material's conductivity field (an input function) to the resulting temperature distribution (an output function) without needing to solve the underlying Partial Differential Equation (PDE) every time. This is particularly powerful because it allows for generalization to new functions, not just new discrete data points.
3.1.2. Neural Networks (DNNs)
Deep Neural Networks (DNNs) are computational models inspired by the structure and function of the human brain. They consist of multiple layers of interconnected nodes (neurons), organized into an input layer, several hidden layers, and an output layer. Each connection between neurons has an associated weight, and each neuron has a bias. During training, the network learns to adjust these weights and biases to transform input data into desired output data. DNNs are highly versatile and have achieved state-of-the-art performance in various tasks like image recognition, natural language processing, and pattern recognition. The Fourier Neural Operator and MscaleFNO are specialized types of DNNs.
3.1.3. Spectral Bias
Spectral bias, also known as the frequency principle, refers to the empirical observation that DNNs tend to learn low-frequency components of a function faster and more efficiently than high-frequency components during training. In other words, a DNN will approximate the smooth, overall shape of a target function much more quickly than its rapid oscillations or fine details. This is a fundamental limitation when dealing with highly oscillatory functions, where high-frequency content carries crucial information. This bias is attributed to the internal mechanisms of DNNs, such as optimization algorithms (e.g., gradient descent) and architecture choices.
3.1.4. Fourier Transform and Discrete Fourier Transform (DFT) / Fast Fourier Transform (FFT)
The Fourier Transform is a mathematical operation that decomposes a function into its constituent frequencies. It transforms a function from its original domain (e.g., time or space) to the frequency domain, representing it as a sum of sine and cosine waves of different amplitudes and frequencies.
- For a continuous function $f(x)$, its Fourier Transform is $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx$.
- The Inverse Fourier Transform reconstructs the original function: $f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi$.

The Discrete Fourier Transform (DFT) is the discrete version of the Fourier Transform, applied to a sequence of sampled data points. The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT. In neural operators like FNO, the FFT is crucial for efficiently moving between the spatial and frequency domains, enabling computations directly in the frequency domain.
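As a concrete illustration, here is a small NumPy sketch that samples an oscillatory signal and inspects its DFT via the FFT; the grid size and frequencies are arbitrary choices for illustration, not values from the paper:

```python
import numpy as np

# Sample an oscillatory signal containing a low- and a high-frequency component
n = 1001                                 # number of grid points
x = np.linspace(-1.0, 1.0, n)            # spatial grid on [-1, 1]
f = np.sin(4 * np.pi * x) + 0.3 * np.sin(120 * np.pi * x)

# Discrete Fourier Transform via the FFT; rfft keeps the non-negative-frequency modes
f_hat = np.fft.rfft(f)
modes = np.arange(f_hat.size)

# Amplitude spectrum: two peaks, one at a low mode and one at a high mode
amplitude = np.abs(f_hat) / n
print("dominant modes:", modes[np.argsort(amplitude)[-2:]])
```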
3.1.5. Helmholtz Equation
The Helmholtz equation is a linear elliptic partial differential equation that arises in the study of physical phenomena involving wave propagation in space. It is often derived from the wave equation by applying the Fourier Transform in time, reducing a time-dependent problem to a time-independent one.
In 1D, the Helmholtz equation is typically written as:
$
\Delta u + k^2 u = f
$
where $\Delta$ is the Laplacian operator (e.g., $d^2/dx^2$ in 1D), $u$ is the wave field, $k$ is the wave number (related to frequency), and $f$ is a source term. The paper uses $a^2(\pmb{x})$ for the wave number squared, which can be spatially varying:
$
\Delta u + a^2(\pmb{x}) u = f(\pmb{x})
$
This equation describes wave scattering (e.g., acoustic, electromagnetic), where $a^2(\pmb{x})$ represents material properties (like conductivity) that influence wave propagation. Solutions to the Helmholtz equation can be highly oscillatory, especially for large wave numbers (high-frequency regimes), making it a challenging problem for standard DNNs with spectral bias.
3.2. Previous Works
3.2.1. Fourier Neural Operator (FNO)
The Fourier Neural Operator (FNO) [8] is a prominent neural operator architecture. Unlike other neural operators that learn directly in the spatial domain or use kernel integration, FNO leverages the Fourier Transform to learn mappings in the frequency domain. This is motivated by the idea that many physical systems exhibit simpler dynamics in the spectral space.
The core idea of FNO is to replace the kernel integration in a general neural operator with a convolution, which can be efficiently computed using FFT.
Specifically, a general neural operator iterates using:
$
v_t(\pmb{x}) = \sigma \Big( W_t v_{t-1}(\pmb{x}) + \big( K(a ; \phi_t) v_{t-1} \big)(\pmb{x}) \Big)
$
where $\mathcal{K}(a; \phi_t)$ is an integral operator:
$
(\mathcal{K}(a ; \phi_t) v_{t-1})(\pmb{x}) = \int_D k(\pmb{x}, \pmb{y}, a(\pmb{x}), a(\pmb{y}) ; \phi_t) v_{t-1}(\pmb{y}) d\pmb{y}
$
The FNO simplifies this by imposing translation invariance on the kernel and removing its dependence on the input function $a$, making it a convolution:
$
(\mathcal{K}(a ; \phi_t) v_{t-1})(\pmb{x}) = \int_D k_{\phi_t}(\pmb{x} - \pmb{y}) v_{t-1}(\pmb{y}) d\pmb{y}
$
By the Convolution Theorem, this can be computed in the Fourier domain as:
$
(\mathcal{K}(a ; \phi_t) v_{t-1})(\pmb{x}) = \mathcal{F}^{-1} \big( R_t \cdot \mathcal{F}(v_{t-1}) \big)(\pmb{x})
$
where $R_t = \mathcal{F}(k_{\phi_t})$ is the Fourier transform of the kernel. FNO then learns $R_t$ directly in the Fourier domain and applies a truncation mechanism that retains only a limited number of low-frequency modes ($k_{\max}$ of them), which is a source of its spectral bias.
3.2.2. DeepONet
DeepONet [11, 5] is another popular neural operator architecture. It is based on the universal approximation theorem for operators, which states that any continuous nonlinear operator can be approximated by a neural network. DeepONet consists of two sub-networks: a branch network that encodes the input function at various sensor locations, and a trunk network that encodes the coordinates of the output. The outputs of these two networks are then combined (typically by a dot product) to produce the approximation of the operator's output.
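A minimal PyTorch sketch of this branch/trunk structure (layer widths, sensor count, and names are illustrative assumptions, not the configuration of any specific DeepONet paper):

```python
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    """Branch net encodes the input function at m sensor points;
    trunk net encodes the output coordinate; outputs are combined by a dot product."""
    def __init__(self, m_sensors: int = 100, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m_sensors, 128), nn.ReLU(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, p))

    def forward(self, a_sensors: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # a_sensors: (batch, m_sensors) sampled values of the input function
        # x:         (batch, n_points, 1) query coordinates for the output function
        b = self.branch(a_sensors)                 # (batch, p)
        t = self.trunk(x)                          # (batch, n_points, p)
        return torch.einsum("bp,bnp->bn", b, t)    # (batch, n_points)

# Usage: predict u(x) at 50 query points from a function sampled at 100 sensors
model = TinyDeepONet()
u = model(torch.randn(8, 100), torch.rand(8, 50, 1))
print(u.shape)  # torch.Size([8, 50])
```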
3.2.3. U-Net
U-Net [14] is a convolutional neural network (CNN) architecture originally developed for biomedical image segmentation. It has a U-shaped structure, consisting of a downsampling path (encoder) to capture context and an upsampling path (decoder) to enable precise localization. Skip connections between the encoder and decoder paths help retain fine-grained details lost during downsampling. While not strictly an operator learning model in the same sense as FNO or DeepONet, U-Net-like architectures are often used for learning mappings between functions (e.g., image-to-image translation, solving PDEs on grids) and can be considered a form of neural operator in practice.
3.2.4. Multi-scale Deep Neural Network (MscaleDNN)
The MscaleDNN [10, 17] is a method proposed to address the spectral bias of DNNs in function approximation. The core idea is to decompose a target function into multiple frequency bands and use separate DNNs to learn each band. Each sub-network operates on a scaled version of the input variable, effectively shifting the high-frequency components of the original function to lower frequencies that the DNN can learn more easily.
The process involves:
- Frequency Partitioning: dividing the frequency domain of the target function $f(\pmb{x})$ into $M$ non-overlapping bands $A_i$. This conceptually decomposes $f$ into a sum of functions $f_i$, where each $f_i$ contains frequencies only within $A_i$: $ f(\pmb{x}) = \sum_{i=1}^M f_i(\pmb{x}) $
- Radial Scaling: for each $f_i$, its Fourier transform is scaled radially, $\hat{f}_i^{(\mathrm{scale})}(\pmb{k}) = \hat{f}_i(\alpha_i \pmb{k})$ with $\alpha_i > 1$, so the frequencies in $f_i$ are compressed, effectively transforming a high-frequency component into a lower-frequency one that is easier for a standard DNN to learn.
- Sub-network Learning: each scaled function is approximated by a separate DNN $f_{\theta_i}$ (evaluated at $\alpha_i \pmb{x}$ in terms of the original spatial variable).
- Weighted Summation: the final approximation of $f(\pmb{x})$ is a weighted sum of the outputs from these sub-networks:
$
f(\pmb{x}) \sim \sum_{i=1}^M \alpha_i^n f_{\theta_i}(\alpha_i \pmb{x})
$
The MscaleDNN has shown significant improvements in approximating oscillatory functions.
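A minimal PyTorch sketch of this multi-scale combination for 1-D function approximation; the scale values, widths, and activation are illustrative choices, not the exact MscaleDNN configuration:

```python
import torch
import torch.nn as nn

class MscaleDNN1d(nn.Module):
    """Parallel sub-networks, each fed a scaled copy of x; outputs are summed with
    trainable weights, approximating f(x) ~ sum_i w_i * f_theta_i(alpha_i * x)."""
    def __init__(self, scales=(1.0, 2.0, 4.0, 8.0), width: int = 64):
        super().__init__()
        self.register_buffer("scales", torch.tensor(scales))
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, width), nn.Tanh(),
                          nn.Linear(width, width), nn.Tanh(),
                          nn.Linear(width, 1))
            for _ in scales
        )
        self.weights = nn.Parameter(torch.ones(len(scales)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1); sub-network i sees alpha_i * x, shifting high frequencies down
        outs = [w * net(s * x) for w, s, net in zip(self.weights, self.scales, self.subnets)]
        return torch.stack(outs, dim=0).sum(dim=0)

# Usage
y = MscaleDNN1d()(torch.linspace(-1, 1, 256).unsqueeze(-1))
print(y.shape)  # torch.Size([256, 1])
```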
3.2.5. Other Spectral Bias Mitigation Techniques
Other works have also sought to reduce spectral bias:
- Phase Shift DNN [4]: introduces phase shifts to DNN layers to improve their ability to learn high-frequency oscillations.
- Hierarchical Attention Neural Operator [9]: uses an attention mechanism across different scales to address the spectral bias in operator learning.
- Diffusion Models with Neural Operators [12]: integrates diffusion models to enhance the spectral representation capabilities of neural operators for complex phenomena like turbulent flows.
3.3. Technological Evolution
The evolution of PDE solvers has progressed from traditional numerical methods (finite element method, finite difference method) that are computationally expensive for parameterized problems, to data-driven operator learning methods. Early neural operator models like DeepONet provided a general framework for learning function-to-function mappings. FNO then introduced the powerful concept of learning directly in the frequency domain, leveraging the efficiency of FFT. However, FNO inherited the spectral bias common to DNNs, limiting its performance on highly oscillatory problems. The MscaleFNO represents a step forward by addressing this spectral bias specifically within the FNO framework, allowing neural operators to tackle more challenging high-frequency wave phenomena which are ubiquitous in scientific computing. It builds upon the success of multi-scale approaches developed for function approximation (like MscaleDNN) and adapts them to the more complex domain of operator learning.
3.4. Differentiation Analysis
The MscaleFNO differentiates itself from previous work by:
- Extending MscaleDNN to Operator Learning: While MscaleDNN successfully mitigates spectral bias for function approximation (mapping a point $\pmb{x}$ to a value $f(\pmb{x})$), MscaleFNO extends this concept to operator learning (mapping an input function $a(\pmb{x})$ to an output function $u(\pmb{x})$). This is a more complex task, as it involves learning mappings between entire function spaces.
- Addressing Dual High-Frequency Variations: MscaleFNO specifically targets high-frequency variations in two critical aspects:
  - Spatial coordinates ($\pmb{x}$): similar to MscaleDNN, it scales the spatial input to capture fine spatial details.
  - Input function ($a(\pmb{x})$): crucially, it also scales the values of the input function $a(\pmb{x})$, acknowledging that the operator's response may itself be highly oscillatory with respect to changes in the input function's amplitude, not just its spatial distribution. This is essential for problems like the Helmholtz equation, where the wave number (related to $a$) dictates the oscillation frequency of the solution.
- Integration with FNO: by integrating the multi-scale approach directly into the FNO architecture, MscaleFNO leverages the spectral-domain processing capabilities of FNO while simultaneously overcoming its spectral bias. This allows for efficient learning of Fourier modes across different frequency bands.
- Improved Performance in High-Frequency Regimes: compared to a normal FNO (which has inherent spectral bias), MscaleFNO significantly improves accuracy in high-frequency wave scattering problems, as demonstrated by the Helmholtz equation examples. Other neural operators or DNNs may struggle in these regimes without explicit multi-scale or spectral-bias mitigation strategies.

In summary, MscaleFNO fills a critical gap by providing a multi-scale solution to the spectral bias problem for Fourier Neural Operators, making them more effective for complex, highly oscillatory function mappings in scientific computing.
4. Methodology
The MscaleFNO combines the principles of Fourier Neural Operators (FNOs) with a multi-scale approach, inspired by MscaleDNN, to improve the learning of mappings involving highly oscillatory functions. This section details the architecture and underlying concepts.
4.1. Principles
The core principle of MscaleFNO is to overcome the spectral bias of standard FNOs by explicitly decomposing the learning problem into multiple frequency scales. Instead of a single FNO trying to learn all frequency components simultaneously (and preferentially learning low ones), MscaleFNO employs a parallel ensemble of FNOs. Each sub-FNO processes a scaled version of the input, allowing it to specialize in a particular frequency band. By scaling both the spatial variable $\pmb{x}$ and the input function $a(\pmb{x})$, the MscaleFNO ensures that both the spatial oscillations and the amplitude-dependent oscillations of the operator's output are captured across different scales. The outputs of these specialized sub-FNOs are then combined via a weighted sum to produce the final, comprehensive solution. This parallel architecture ensures that high-frequency components, which are typically challenging for DNNs to learn, are effectively "downshifted" into a learnable frequency range for at least one of the FNO sub-networks.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. General Operator Learning Framework
The paper considers learning a (nonlinear) mapping between infinite-dimensional function spaces. Let $D \subset \mathbb{R}^d$ be a bounded and open domain. The input function is $a \in \mathcal{A}(D; \mathbb{R}^{d_a})$ and the output function is $u \in \mathcal{U}(D; \mathbb{R}^{d_u})$, where $\mathcal{A}$ and $\mathcal{U}$ are spaces of vector functions. The goal is to learn an approximation $\mathcal{G}_\theta$ of the true operator such that $\mathcal{G}_\theta(a) \approx u$.
Given observations $\{(a_j, u_j)\}_{j=1}^{N}$, the objective is to solve an optimization problem of the form
$
\min_{\theta} \frac{1}{N} \sum_{j=1}^{N} \mathcal{L}\big( \mathcal{G}_\theta(a_j), u_j \big),
$
where $\theta$ represents the finite-dimensional parameters of the approximation $\mathcal{G}_\theta$. The loss function $\mathcal{L}$ is defined as the relative $L_2$ loss:
$
\mathcal{L}\big( \mathcal{G}_\theta(a), u \big) = \frac{\| \mathcal{G}_\theta(a) - u \|_{L_2}}{\| u \|_{L_2}}
$
Here, $\| \cdot \|_{L_2}$ denotes the $L_2$ norm for functions. For numerical computation, functions are discretized at $s$ points $\{\pmb{x}_1, \dots, \pmb{x}_s\} \subset D$. The discrete loss function then becomes
$
\mathcal{L} = \frac{\sqrt{\sum_{i=1}^{s} \big| \mathcal{G}_\theta(a)(\pmb{x}_i) - u(\pmb{x}_i) \big|^2}}{\sqrt{\sum_{i=1}^{s} \big| u(\pmb{x}_i) \big|^2}}
$
This loss function measures the relative difference between the predicted output and the true output, normalized by the magnitude of the true output.
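A small NumPy sketch of this discrete relative $L_2$ error (the function name is ours, not from the paper's code):

```python
import numpy as np

def relative_l2(pred: np.ndarray, true: np.ndarray) -> float:
    """Relative L2 error: ||pred - true||_2 / ||true||_2 over the discretization points."""
    return float(np.linalg.norm(pred - true) / np.linalg.norm(true))

# Example: a prediction that misses the high-frequency part of the target
x = np.linspace(-1, 1, 1001)
true = np.sin(2 * np.pi * x) + 0.2 * np.sin(100 * np.pi * x)
pred = np.sin(2 * np.pi * x)                  # smooth approximation only
print(relative_l2(pred, true))                # roughly the relative size of the missing term
```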
The general neural operator framework (as presented in [2, 7]) is an iterative architecture:
$
v_t(\pmb{x}) = \sigma \Big( W_t v_{t-1}(\pmb{x}) + \big( \mathcal{K}(a ; \phi_t) v_{t-1} \big)(\pmb{x}) \Big), \quad t = 1, \dots, T
$
Here:
- $P$: a linear lifting operator (typically a fully connected neural network) that maps the input function $a(\pmb{x})$ to a higher-dimensional feature space $\mathbb{R}^{d_v}$, so that $v_0(\pmb{x}) = P(a(\pmb{x}))$ is the initial lifted representation.
- $v_t(\pmb{x})$: the feature representation at iteration $t$.
- $W_t$: a linear local transform mapping $\mathbb{R}^{d_v} \to \mathbb{R}^{d_v}$; it represents local operations applied to the feature vector at each point $\pmb{x}$.
- $\mathcal{K}(a ; \phi_t)$: an integral operator acting on $v_{t-1}$; it captures global interactions and dependencies across the domain $D$ and is defined as $(\mathcal{K}(a ; \phi_t) v_{t-1})(\pmb{x}) = \int_D k(\pmb{x}, \pmb{y}, a(\pmb{x}), a(\pmb{y}) ; \phi_t)\, v_{t-1}(\pmb{y})\, d\pmb{y}$, where $k$ is a kernel function parameterized by $\phi_t$.
- $\sigma$: a nonlinear activation function (e.g., GELU, ReLU).
- $Q$: a projection operator (typically a neural network) that maps the final high-dimensional feature $v_T(\pmb{x})$ back to the desired output function $u(\pmb{x})$.
4.2.2. Fourier Neural Operator (FNO)
The FNO enhances the general neural operator framework by specializing the integral operator $\mathcal{K}$. It enforces translation invariance on the kernel and removes its dependence on the input function $a$. This simplifies the kernel to $k_{\phi_t}(\pmb{x} - \pmb{y})$, making the integral operation a convolution:
$
(\mathcal{K}(\phi_t) v_{t-1})(\pmb{x}) = \int_D k_{\phi_t}(\pmb{x} - \pmb{y})\, v_{t-1}(\pmb{y})\, d\pmb{y}
$
According to the Convolution Theorem, convolutions in the spatial domain can be efficiently computed as element-wise products in the Fourier domain. Let $\mathcal{F}$ denote the Fourier transform and $\mathcal{F}^{-1}$ its inverse. If $R_t = \mathcal{F}(k_{\phi_t})$ is the Fourier transform of the kernel, then:
$
(\mathcal{K}(\phi_t) v_{t-1})(\pmb{x}) = \mathcal{F}^{-1} \big( R_t \cdot \mathcal{F}(v_{t-1}) \big)(\pmb{x})
$
In practical FNO implementations, the spatial coordinates $\pmb{x}$ themselves are often included as an input feature to capture position-dependent characteristics; this is represented by appending the identity map $\pmb{x} \mapsto \pmb{x}$ as an extra input channel. Thus, the FNO learns a mapping from the pair $(\pmb{x}, a(\pmb{x}))$ to $u(\pmb{x})$.
The comprehensive FNO framework is then given by the iteration
$
v_t(\pmb{x}) = \sigma \Big( W_t v_{t-1}(\pmb{x}) + \mathcal{F}^{-1} \big( R_t \cdot \mathcal{F}(v_{t-1}) \big)(\pmb{x}) \Big), \quad v_0(\pmb{x}) = P\big( \pmb{x}, a(\pmb{x}) \big)
$
Here, $P$ is an enhanced linear lifting operator that takes both the spatial coordinate $\pmb{x}$ and the input function $a(\pmb{x})$ as input; $W_t$ and $\sigma$ remain as defined before. This entire process can be compactly written as a composition of the lifting layer, the $T$ Fourier layers, and the projection layer $Q$.
The architecture is illustrated in Figure 1.
The FNO architecture consists of an initial lifting layer, several Fourier layers (each containing a linear transformation and a Fourier integral operation), and a final projection layer.

该图像是一个示意图,展示了多尺度傅里叶神经算子(MscaleFNO)的架构。图中包含输入 a(x) 和 ,经过处理模块 ,然后通过多个傅里叶层处理,最终生成输出 u(x)。该框架能够捕捉高频成分,适用于波散射问题。
Figure 1: The FNO architecture
4.2.2.1. Truncation Mechanism of $R_t$
For computational efficiency, FNO truncates the Fourier spectrum. When the Fast Fourier Transform (FFT) is applied to $v_{t-1}$, it produces $s$ Fourier modes. Instead of retaining all modes, only the $k_{\max}$ lowest-frequency modes are kept, where $k_{\max}$ is typically much smaller than the total number of discretization points $s$. The parameter tensor $R_t \in \mathbb{C}^{k_{\max} \times d_v \times d_v}$ acts as a learned filter in the Fourier domain, preserving these modes and effectively nullifying higher frequencies. Mathematically, this operation is expressed as
$
\big( R_t \cdot \mathcal{F}(v_{t-1}) \big)_{k, l} = \sum_{j=1}^{d_v} (R_t)_{k, l, j}\, \big( \mathcal{F}(v_{t-1}) \big)_{k, j}, \quad k = 1, \dots, k_{\max}, \ l = 1, \dots, d_v,
$
where $k$ indexes the selected Fourier modes and $l$ indexes the channel dimension. This truncation is a primary source of the spectral bias in FNO, as it inherently limits the model's ability to represent high-frequency components.
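A minimal PyTorch sketch of this truncated Fourier-domain multiplication for the 1-D case, in the spirit of public FNO implementations but simplified; class and variable names are ours:

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Fourier integral operator: FFT -> multiply the lowest k_max modes by a learned
    complex tensor R -> zero the remaining modes -> inverse FFT."""
    def __init__(self, channels: int, k_max: int):
        super().__init__()
        self.k_max = k_max
        scale = 1.0 / (channels * channels)
        self.R = nn.Parameter(scale * torch.randn(channels, channels, k_max, dtype=torch.cfloat))

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, channels, n_points)
        v_hat = torch.fft.rfft(v)                              # (batch, channels, n//2 + 1)
        out_hat = torch.zeros_like(v_hat)
        k = min(self.k_max, v_hat.size(-1))
        # Per-mode channel mixing on the retained low-frequency modes only
        out_hat[..., :k] = torch.einsum("bik,iok->bok", v_hat[..., :k], self.R[..., :k])
        return torch.fft.irfft(out_hat, n=v.size(-1))          # back to physical space

# Usage: 1001 grid points, 16 channels, keep 500 modes (as in the paper's 1-D setups)
layer = SpectralConv1d(channels=16, k_max=500)
print(layer(torch.randn(4, 16, 1001)).shape)   # torch.Size([4, 16, 1001])
```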
4.2.3. FNO Network Parameters
The total number of parameters in a standard FNO is meticulously calculated to facilitate comparison.
- Lifting Layer ($P$): implemented as a linear transformation of the concatenated input $(\pmb{x}, a(\pmb{x}))$ into $\mathbb{R}^{d_v}$, with a weight matrix $W_P \in \mathbb{R}^{d_v \times (d + d_a)}$ and bias $b_P \in \mathbb{R}^{d_v}$, contributing $d_v (d + d_a) + d_v$ parameters.
- Fourier Layer (one layer, repeated $T$ times):
  - Local Linear Transform ($W_t$): a matrix $W_t \in \mathbb{R}^{d_v \times d_v}$ with bias $b_t \in \mathbb{R}^{d_v}$, contributing $d_v^2 + d_v$ parameters.
  - Fourier Integral Operation ($R_t$): the parameter tensor $R_t \in \mathbb{C}^{k_{\max} \times d_v \times d_v}$. Since complex numbers have real and imaginary parts, the actual number of real-valued parameters could be counted as double; the paper, however, specifies $k_{\max} d_v^2$ parameters for $R_t$, implying a count of complex numbers as single parameters or a specific implementation detail. We follow the paper's formula directly.
- Projection Layer ($Q$): implemented as a two-layer fully connected network with GELU activation and a hidden channel width $d_q$, contributing $d_q (d_v + 1) + d_u (d_q + 1)$ parameters.
- MLP after Fourier Integral (optional): some FNO implementations include an MLP (Multi-Layer Perceptron) after the Fourier integral operation, applied before the local transform is added and the activation is taken. The MLP is a two-layer fully connected neural network with GELU activation and hidden dimension $d_v$, contributing $2(d_v^2 + d_v)$ parameters per Fourier layer.

The total number of parameters for an FNO with an MLP after the Fourier integral operations is the sum of these contributions over the lifting layer, the $T$ Fourier layers, and the projection layer (Equation (24) in the paper). Here:
- $d_v$: channel dimension (number of internal feature channels).
- $k_{\max}$: number of retained Fourier modes.
- $T$: number of Fourier layers.
- $d$: spatial dimension of the domain (e.g., $d = 1$ for 1D problems).
- $d_a$: dimension of the input function $a(\pmb{x})$.
- $d_u$: dimension of the output function $u(\pmb{x})$.
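In practice, such counts can be checked directly on an instantiated model with a generic PyTorch helper (not specific to the paper's code):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a module (complex entries counted once)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g., applying it to the SpectralConv1d sketch above with channels=48, k_max=500
# counts the dominant k_max * d_v^2 Fourier-weight term.
```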
4.2.4. Multi-scale Fourier Neural Operator (MscaleFNO)
The MscaleFNO extends the multi-scale concept to operator learning using the FNO framework. It is specifically designed to address high-frequency variations in both the spatial coordinate $\pmb{x}$ and the input function $a(\pmb{x})$.
The MscaleFNO architecture consists of $N$ parallel sub-networks, each of which is a complete FNO as described above. Each sub-network receives a scaled version of the input. The final output is a weighted sum of the outputs from these sub-networks.
The architecture is illustrated in Figure 2.

该图像是MscaleFNO架构示意图,展示了多个并行的Fourier神经算子如何处理输入函数及其空间变量。不同的输入经过Fourier层后输出,并通过加法整合形成最终结果 u(x)。
Figure 2: The MscaleFNO architecture
The mathematical expression for the MscaleFNO model is
$
u(\pmb{x}) \approx \sum_{i=1}^{N} \gamma_i \, \mathrm{FNO}_{\theta_i}\big( c_i \pmb{x}, \ c_i\, a(\pmb{x}) \big)
$
Here:
- $N$: the total number of parallel sub-networks.
- $\mathrm{FNO}_{\theta_i}$: an individual FNO sub-network with parameters $\theta_i$.
- $c_i$: a scaling factor applied to both the spatial coordinate $\pmb{x}$ and the input function $a(\pmb{x})$ for the $i$-th sub-network. These values are trainable parameters; larger $c_i$ values enable the sub-network to capture higher-frequency components.
- $\gamma_i$: a weight for combining the output of the $i$-th sub-network. These values are also trainable parameters.
- The linear transformations $P$ and $Q$, and the Fourier layers within each FNO sub-network, maintain their original definitions from the normal FNO framework.
- A specific detail mentioned is the use of the sine activation function throughout all Fourier layers. This choice is often beneficial for learning oscillatory patterns.

The total number of parameters for the MscaleFNO is calculated as
$
N_{\mathrm{MscaleFNO}} = N \cdot N_{\mathrm{FNO}} + 2N,
$
where $N_{\mathrm{FNO}}$ is the parameter count of a single FNO sub-network (e.g., from Equation (24)), and $2N$ accounts for the $N$ trainable scaling factors $c_i$ and $N$ trainable combination weights $\gamma_i$. This equation assumes that each of the $N$ sub-networks has the same internal architecture (channel dimension $d_v$ and Fourier modes $k_{\max}$).
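A schematic PyTorch wrapper for this parallel, weighted-sum structure; the sub-network factory, initial scales, and the stand-in module are illustrative assumptions (a real implementation would plug in full FNO sub-networks, e.g. built from Fourier layers like the SpectralConv1d sketch above):

```python
import torch
import torch.nn as nn

class MscaleFNO1d(nn.Module):
    """N parallel sub-networks; branch i sees (c_i * x, c_i * a(x)) and the outputs
    are combined as u(x) ~ sum_i gamma_i * FNO_i(c_i x, c_i a(x))."""
    def __init__(self, make_fno, init_scales=(1, 2, 4, 8, 16, 32, 64, 128)):
        super().__init__()
        self.subnets = nn.ModuleList(make_fno() for _ in init_scales)
        self.c = nn.Parameter(torch.tensor(init_scales, dtype=torch.float32))   # trainable scales
        self.gamma = nn.Parameter(torch.ones(len(init_scales)))                 # trainable weights

    def forward(self, x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # x, a: (batch, n_points); both are scaled by the same factor c_i per branch
        u = 0.0
        for i, net in enumerate(self.subnets):
            u = u + self.gamma[i] * net(self.c[i] * x, self.c[i] * a)
        return u

# Usage with a trivial stand-in sub-network (a real FNO would be used in practice)
class Dummy(nn.Module):
    def forward(self, x, a):
        return torch.sin(a) * x

model = MscaleFNO1d(make_fno=Dummy)
print(model(torch.rand(4, 1001), torch.rand(4, 1001)).shape)  # torch.Size([4, 1001])
```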
4.2.5. Mapping between Conductivity and Solution in Helmholtz Equation
The paper applies MscaleFNO to the Helmholtz equation, which models wave scattering. The specific form considered is
$
\Delta u(\pmb{x}) + a^2(\pmb{x})\, u(\pmb{x}) = f(\pmb{x}), \quad \pmb{x} \in \Omega,
$
with a boundary condition
$
u(\pmb{x}) = g(\pmb{x}), \quad \pmb{x} \in \partial\Omega.
$
Here:
- $u(\pmb{x})$: the scattering field (the solution).
- $a^2(\pmb{x})$: the square of the wave number, representing the conductivity or material properties of the scatterer, which is compactly supported (non-zero only in a finite region) inside the domain $\Omega$.
- $f(\pmb{x})$: the incident wave or forcing term.
- $\Omega$: the computational domain.
- $g(\pmb{x})$: the boundary condition.

For a homogeneous Dirichlet boundary condition ($g = 0$), the solution can be expressed using a Green's function $G(\pmb{x}, \pmb{y})$:
$
u(\pmb{x}) = \int_\Omega G(\pmb{x}, \pmb{y})\, f(\pmb{y})\, d\pmb{y}
$
The Green's function satisfies
$
\Delta_{\pmb{x}} G(\pmb{x}, \pmb{y}) + a^2(\pmb{x})\, G(\pmb{x}, \pmb{y}) = \delta(\pmb{x} - \pmb{y}),
$
where $\delta$ is the Dirac delta function. The Green's function can be decomposed into a free-space Green's function plus a smooth correction; in 2D the free-space Green's function involves $H_0^{(2)}$, the Hankel function of the second kind of order zero, and the smooth correction satisfies a homogeneous Helmholtz equation with a non-homogeneous Dirichlet boundary condition. The learning problem for MscaleFNO is the mapping from the spatially varying wave number perturbation (or conductivity) to the scattering field $u(\pmb{x})$.
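For intuition about how (coefficient, solution) training pairs for such a 1-D problem can be produced, here is a minimal second-order finite-difference solver for $u'' + a^2(x)\,u = f(x)$ with homogeneous Dirichlet boundary conditions; this is a generic sketch under our own parameter choices, not the paper's data-generation code (which uses a much finer 8001-point grid and its own coefficients):

```python
import numpy as np

def solve_helmholtz_1d(a2, f, L=1.0):
    """Solve u'' + a2(x) u = f(x) on [-L/2, L/2], u(+-L/2) = 0, by central differences.
    a2, f: coefficient and source values sampled on a uniform grid of n points."""
    n = len(a2)
    h = L / (n - 1)
    # Interior unknowns: (u_{i-1} - 2 u_i + u_{i+1}) / h^2 + a2_i u_i = f_i (tridiagonal)
    main = -2.0 / h**2 + a2[1:-1]
    off = np.ones(n - 3) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    u = np.zeros(n)
    u[1:-1] = np.linalg.solve(A, f[1:-1])
    return u

# Example: a constant (non-resonant) wave number with a small oscillatory perturbation
x = np.linspace(-0.5, 0.5, 1001)
a2 = (39.5 * np.pi) ** 2 + 50.0 * np.sin(6 * np.pi * x)
f = np.sin(2 * np.pi * x)
u = solve_helmholtz_1d(a2, f, L=1.0)
print(u.shape)  # (1001,) -- a highly oscillatory scattering field
```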
5. Experimental Setup
The paper presents a series of numerical experiments to compare the performance of the proposed MscaleFNO with a normal FNO (standard FNO). The primary goal is to demonstrate MscaleFNO's superior ability to learn mappings involving highly oscillatory functions, particularly in high-frequency regimes.
5.1. Datasets
All experiments use 1-D functions, i.e., the spatial dimension is $d = 1$. The datasets are synthetically generated based on specific functional forms or Helmholtz equation solutions.
5.1.1. Example 4.1: Single-frequency nonlinear mapping
- Target Mapping: learning a nonlinear operator $a(x) \mapsto u(x)$ whose output oscillates rapidly as a function of $a(x)$ (a single high-frequency term); the mapping exhibits high-frequency dependence on the variable $a(x)$.
- Input Function $a(x)$ Generation: generated as a sum of sine functions with randomly sampled coefficients, normalized so that $a(x)$ stays within a fixed range (see the sketch after Figure 3 for a schematic example).
- Dataset Size: 2,000 samples in total, split into 1,000 for training, 500 for validation, and 500 for testing.
- Grid Resolution: exact solutions are computed on a 1001-point grid.
- Example Data Profile:


Figure 3: The profile of the input function a(x) (left) and its DFT (right). The left panel shows the spatial variation of a(x), while the right panel illustrates its Discrete Fourier Transform (DFT), indicating the frequency components present.
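A schematic of how such random input functions can be drawn (illustrative NumPy only; the number of terms, coefficient distribution, and normalization are our assumptions, not the paper's exact formula):

```python
import numpy as np

def random_a(x, n_terms=20, seed=0):
    """Random smooth input: a weighted sum of sine modes, normalized in amplitude."""
    rng = np.random.default_rng(seed)
    coeffs = rng.uniform(-1.0, 1.0, size=n_terms)
    a = sum(c * np.sin((m + 1) * np.pi * x) for m, c in enumerate(coeffs))
    return a / np.max(np.abs(a))   # normalize the amplitude

x = np.linspace(-1.0, 1.0, 1001)
a = random_a(x)
a_hat = np.abs(np.fft.rfft(a))     # spectrum analogous to the right panel of Figure 3
print(a.shape, a_hat.argmax())
```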
5.1.2. Example 4.2: Multiple-frequency nonlinear mapping
- Target Mapping: learning the operator $a(x) \mapsto u(x)$, where u(x) is a mixture of sine and cosine terms at multiple frequencies with fixed, randomly generated coefficients, exhibiting multi-frequency dependence on a(x). The parameter M (the number of frequency terms) is varied over 10, 20, 40, 80, 100, and 200.
- Input Function a(x) Generation: generated similarly to Example 4.1, but as a sum of sine and cosine terms with randomly sampled coefficients.
- Dataset Size & Grid: same as Example 4.1 (2,000 samples, 1001-point grid).
- Example Data Profile:


Figure 7: The profile of the input function a(x) (left) and the DFT of a(x) (right). This figure presents a representative input function a(x) and its frequency components.


Figure 8: DFT of representative exact solutions u(x) for different values of M. This figure shows how the spectral content of the solution u(x) becomes increasingly complex and extends to higher frequencies as M increases.
5.1.3. Example 4.3: Helmholtz equation
- Target Mapping: learning the operator from a variable wave number perturbation to the solution u(x) of the 1-D Helmholtz equation with homogeneous Dirichlet boundary conditions and fixed wave number and amplitude parameters.
- Forcing Term f(x): a fixed oscillatory source term shared by all samples.
- Perturbation Generation: random perturbations built from trigonometric modes with randomly sampled coefficients.
- Dataset Size & Grid: 1,000 samples (800 train, 100 validation, 100 test). High-resolution numerical solutions are computed on an 8001-point grid and then downsampled to a 1001-point grid.
- Example Data Profile:


Figure 12: The profile of the input perturbation function (left) and its DFT (right).


Figure 13: The profile of the exact solution u(x) corresponding to the input in Fig. 12 (left) and the DFT of u(x) (right). These figures show the spatial and spectral characteristics of the input perturbation and the resulting solution for the Helmholtz equation.
5.1.4. Example 4.4: Helmholtz equation (varying L)
- Target Mapping: the same Helmholtz equation as in Example 4.3, but with varying domain lengths L (2, 4, 8, and 10), corresponding to increasingly challenging high-frequency scattering problems.
- Perturbation Generation: a simpler perturbation form than in Example 4.3 is used.
- Dataset Size & Grid: dataset sizes (1,000 samples) and splits (800 train, 100 validation, 100 test) are the same. High-resolution solutions are computed and then downsampled to maintain a consistent mesh size, resulting in a number of grid points proportional to L.
- Example Data Profile:


Figure 15: Characteristic solutions of the Helmholtz equation in physical space for different domain lengths L (2, 4, 8, and 10). This figure illustrates how the solutions become more complex and oscillatory as the domain length increases, indicating higher frequency content.
5.1.5. Example 4.5: Helmholtz equation (generalization test)
- Target Mapping: the same Helmholtz equation as in Example 4.4, specifically at L = 10.
- Test Samples Generation (unseen distribution): to evaluate generalization capability, the test perturbations are drawn from a functional form distinct from that of the training data, with their own randomly sampled coefficients. The forcing term is the same as in Equation (47).
- Dataset Size & Grid: training and validation data are from the distribution in Example 4.4. The test data is newly generated using the unseen functional form.
5.2. Evaluation Metrics
The primary evaluation metric used in the paper is the relative $L_2$ loss (or relative $L_2$ error) on the test set. This metric quantifies the normalized difference between the predicted solution and the true solution.
5.2.1. Relative $L_2$ Loss
- Conceptual Definition: The relative $L_2$ loss (or error) measures the $L_2$ norm of the difference between the predicted function and the true function, normalized by the $L_2$ norm of the true function. It indicates how well the model's prediction matches the true output relative to the overall magnitude of the true output. A lower value indicates better accuracy.
- Mathematical Formula: For discrete functions (as used in the numerical computations), the formula is
$
\mathrm{Error} = \frac{\sqrt{\sum_{i=1}^{s} \big| \mathcal{G}_\theta(a)(\pmb{x}_i) - u(\pmb{x}_i) \big|^2}}{\sqrt{\sum_{i=1}^{s} \big| u(\pmb{x}_i) \big|^2}}
$
- Symbol Explanation:
  - $\mathcal{G}_\theta(a)(\pmb{x}_i)$: the value of the output function predicted by the model at the $i$-th discrete point $\pmb{x}_i$, given input $a$.
  - $u(\pmb{x}_i)$: the true value of the output function at the $i$-th discrete point.
  - $s$: the total number of discrete points in the computational domain.
  - $\sum_{i=1}^{s} |\cdot|^2$: the sum of squared differences or values over all discrete points.
  - $\sqrt{\cdot}$: the square root, forming the $L_2$ norm (Euclidean norm for vectors).
5.3. Baselines
The paper compares the MscaleFNO primarily against a normal FNO (standard FNO) configuration. To ensure a fair comparison, the normal FNO is designed to have a similar number of network parameters as the MscaleFNO.
5.3.1. General Model Configurations
- Optimizer: Adam optimizer with a learning rate of 0.001.
- Batch Size: 20 for all training processes.
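A generic training-loop skeleton matching these settings (Adam, learning rate 0.001, batch size 20, relative $L_2$ loss); the model and data objects are placeholders, not the paper's code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(model, x, a_train, u_train, epochs=100):
    """model maps (x, a) -> u; tensors a_train, u_train have shape (n_samples, n_points)."""
    loader = DataLoader(TensorDataset(a_train, u_train), batch_size=20, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        for a, u in loader:
            pred = model(x.expand_as(a), a)                 # broadcast the grid to the batch
            loss = torch.norm(pred - u) / torch.norm(u)     # relative L2 loss over the batch
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```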
5.3.2. Specific Model Architectures for 1-D Problems
- Normal FNO:
  - Channel dimension ($d_v$): 48
  - Number of Fourier modes ($k_{\max}$): 500
  - Number of Fourier layers ($T$): 1 (for Examples 4.1 & 4.2), 4 (for Examples 4.3, 4.4 & 4.5)
- MscaleFNO:
  - Number of parallel sub-networks ($N$): 8
  - Each sub-network:
    - Channel dimension ($d_v$): 16 (note: $d_v = 16$ per sub-network for MscaleFNO versus $d_v = 48$ for the normal FNO, indicating a different capacity structure even though the total parameter count is matched)
    - Number of Fourier modes ($k_{\max}$): 500
    - Number of Fourier layers ($T$): 1 (for Examples 4.1 & 4.2), 4 (for Examples 4.3, 4.4 & 4.5)
  - Initial scaling factors ($c_i$) and combination weights ($\gamma_i$) are trainable. Specific initial values are mentioned for each example.

Parameter Count Comparison:
- For Examples 4.1 & 4.2: MscaleFNO (1,035,544 parameters) vs. normal FNO (1,164,001 parameters). MscaleFNO has slightly fewer parameters.
- For Examples 4.3, 4.4 & 4.5: MscaleFNO (4,127,128 parameters) vs. normal FNO (4,641,169 parameters). MscaleFNO again has fewer parameters.

The paper consistently ensures that MscaleFNO uses a similar or even slightly smaller parameter count than the normal FNO, so that performance gains can be attributed to the architectural innovation rather than merely increased model capacity.
6. Results & Analysis
This section details the experimental results, comparing the MscaleFNO against the normal FNO across various highly oscillatory function learning tasks and Helmholtz equation problems.
6.1. Core Results Analysis
6.1.1. Example 4.1: Learning a Single-Frequency Nonlinear Mapping
This example focuses on learning the single-frequency nonlinear mapping defined in Example 4.1. The initial scaling factors $c_i$ for the MscaleFNO sub-networks were set to a spread of increasing values.
- Error Curves:


Figure 4: Error curves of different models during the training process. The MscaleFNO (red curve) quickly converges to a small relative testing error within 100 epochs. In stark contrast, the normal FNO (blue curve) stagnates at a much higher error level, failing to effectively learn the mapping. This demonstrates a significant, order-of-magnitude improvement by MscaleFNO.
- Predicted Solutions (Visual Comparison):

![Figure 5: Predicted solution by the normal FNO (left) and MscaleFNO (right) with zoomed-in inset for x in [-0.18, -0.12]](/files/papers/69230f9867343f5ebcdd6426/images/5.jpg)

Figure 5: Predicted solution by the normal FNO (left) and MscaleFNO (right) with zoomed-in inset for x in [-0.18, -0.12]. Visually, the normal FNO produces a smoothed-out approximation, completely failing to capture the high-frequency oscillations present in the true solution. The MscaleFNO, however, accurately reproduces the fine wave patterns, matching the exact solution almost perfectly.
- DFT Analysis (Spectral Comparison):

![Figure 6: DFT of predicted solution by normal FNO (left) and MscaleFNO (right) with zoomed-in inset for modes in [0, 20]](/files/papers/69230f9867343f5ebcdd6426/images/6.jpg)

Figure 6: DFT of the predicted solution by the normal FNO (left) and MscaleFNO (right) with zoomed-in inset for modes in [0, 20]. The DFT analysis quantitatively confirms the visual observations. The spectrum of the MscaleFNO prediction closely matches the high-frequency components of the true solution, preserving the energy at higher modes. The normal FNO's DFT shows a significant decay in the high-frequency region, indicating its inability to learn these components. These results collectively highlight MscaleFNO's superior capability in capturing high-frequency components.
6.1.2. Example 4.2: Learning a Multiple-Frequency Nonlinear Mapping
This example explores the performance with increasingly complex multi-frequency solutions by varying M from 10 to 200.
- Error Curves under Varying M:


Figure 9: Error curves of different models during the training process under different values of M (Epoch = 900). For the normal FNO (blue curves), as M increases (representing higher-frequency regimes), the relative testing error grows significantly, reaching approximately 0.2 at the largest M. This confirms the spectral bias. In contrast, MscaleFNO (orange curves) consistently outperforms the normal FNO and maintains high accuracy even at M = 200. The initial scaling factors for MscaleFNO were adjusted for larger M to include higher values, demonstrating adaptability to the problem's frequency content.
- Predicted Solutions for M = 200:

![Figure 10: M = 200: Predicted solution by normal FNO (left) and MscaleFNO (right) with zoomed-in inset for x in [-0.18, -0.12]](/files/papers/69230f9867343f5ebcdd6426/images/10.jpg)

Figure 10: M = 200: Predicted solution by the normal FNO (left) and MscaleFNO (right) with zoomed-in inset for x in [-0.18, -0.12]. Similar to the single-frequency case, the normal FNO struggles with the highly oscillatory patterns of the solution, producing a blurred output. MscaleFNO, however, accurately captures the intricate fine details and oscillations.
- Spectral Contributions of MscaleFNO Sub-networks:


Figure 11: M = 200: Spectral contributions of MscaleFNO sub-networks corresponding to different initial scales. This figure provides a crucial insight into MscaleFNO's mechanism: it shows the DFT of the outputs from individual sub-networks. Sub-networks with smaller scaling factors capture low-frequency patterns, while those with larger scaling factors contribute significantly to reconstructing the high-frequency components. This demonstrates the hierarchical frequency decomposition, where each sub-network specializes in a different part of the frequency spectrum and their combined output reconstructs the full range of frequencies.
6.1.3. Example 4.3: Mapping for the Helmholtz Equation
This example applies MscaleFNO to a real-world physics problem, the 1-D Helmholtz equation of Example 4.3.
- Error Curves:


Figure 14: Error curves of different models during the training process. Both models show rapid initial convergence. However, the normal FNO plateaus at a comparatively high relative error with limited further improvement, while the MscaleFNO continues to reduce its error throughout training, reaching a level roughly an order of magnitude lower. This represents an order-of-magnitude improvement in accuracy for the Helmholtz equation problem.
6.1.4. Example 4.4: Mapping for the Helmholtz Equation (Varying L)
This section tests MscaleFNO's robustness as the domain length increases, which corresponds to increasingly high-frequency scattering problems.
- Error Curves under Varying L:


Figure 16: Error curves under different values of L (Epoch = 100). As L increases from 2 to 10, the normal FNO (blue curves) exhibits progressively deteriorating performance, with the relative testing error soaring to approximately 0.7 for the largest L. This vividly illustrates the normal FNO's struggle in higher frequency regimes. In contrast, MscaleFNO (orange curves) demonstrates robust performance across all L values, consistently outperforming the normal FNO and maintaining substantially lower relative errors.
- Predicted Solutions for L = 10:

![Figure 17: L = 10: Predicted solution by normal FNO (left) and MscaleFNO (right) with zoomed-in inset for x in [-0.2, 0.2]](/files/papers/69230f9867343f5ebcdd6426/images/17.jpg)

Figure 17: L = 10: Predicted solution by the normal FNO (left) and MscaleFNO (right) with zoomed-in inset for x in [-0.2, 0.2]. For L = 10, the solution is highly oscillatory. The normal FNO captures only the general wave patterns but misses the high-frequency details. MscaleFNO, however, accurately reproduces the full solution structure, including the fine oscillations.
- DFT Analysis for L = 10:

![Figure 18: L = 10: DFT of predicted solution by normal FNO (left) and MscaleFNO (right) with zoomed-in inset for modes in [1000, 1100]](/files/papers/69230f9867343f5ebcdd6426/images/18.jpg)

Figure 18: L = 10: DFT of the predicted solution by the normal FNO (left) and MscaleFNO (right) with zoomed-in inset for modes in [1000, 1100]. The Fourier analysis corroborates the spatial observations. The normal FNO preserves low-frequency components but shows significant distortion and decay at high frequencies. MscaleFNO accurately reconstructs the entire spectrum, as evidenced by the excellent match with the exact solution's DFT, especially in the zoomed-in high-frequency modes.
6.1.5. Example 4.5: Generalization Capability (Unseen Data Distribution)
This test evaluates how well MscaleFNO performs on test samples generated from a different function form (an unseen distribution) than the training data, for L = 10.
- Normal FNO on Unseen Data:

![Figure 19: L = 10: (a) Predicted solution of normal FNO against exact solution with zoomed-in inset for x in [-0.2, 0.2] and (b) the DFT of u(x) with zoomed-in inset for modes in [1000, 1100]](/files/papers/69230f9867343f5ebcdd6426/images/19.jpg)

Figure 19: L = 10: (a) Predicted solution of the normal FNO against the exact solution with zoomed-in inset for x in [-0.2, 0.2], and (b) the DFT of u(x) with zoomed-in inset for modes in [1000, 1100]. The normal FNO fails dramatically on the unseen test functions. Its predictions show significant errors in both solution amplitude and oscillation patterns in the spatial domain (a), and it fails to reconstruct the correct spectral content in the Fourier domain (b), particularly at high frequencies.
- MscaleFNO on Unseen Data:

![Figure 20: L = 10: (a) Predicted solution of MscaleFNO against exact solution with zoomed-in inset for x in [-0.2, 0.2] and (b) the DFT of u(x) with zoomed-in inset for modes in [1000, 1100]](/files/papers/69230f9867343f5ebcdd6426/images/20.jpg)

Figure 20: L = 10: (a) Predicted solution of MscaleFNO against the exact solution with zoomed-in inset for x in [-0.2, 0.2], and (b) the DFT of u(x) with zoomed-in inset for modes in [1000, 1100]. In contrast, MscaleFNO demonstrates robust prediction capability even with a change in the test function form. It accurately captures high-frequency oscillations across the interval and precisely predicts both low- and high-frequency modes in the Fourier spectrum. This highlights MscaleFNO's excellent generalization ability beyond the training data distribution.
6.2. Data Presentation (Tables)
The paper does not include explicit result tables, but rather presents all comparative data through error curves and visual comparisons of predicted solutions and their Discrete Fourier Transforms (DFTs). The analysis above is derived from interpreting these figures.
6.3. Ablation Studies / Parameter Analysis
The paper implicitly conducts a form of ablation study by comparing MscaleFNO to the normal FNO with similar parameter counts, isolating the effect of the multi-scale architecture. The analysis of Figure 11 for MscaleFNO's sub-network contributions serves as a parameter analysis, showing how different scaling factors $c_i$ lead to specialization in different frequency ranges, justifying the multi-scale design. The adjustment of the initial scaling factors for different M values in Example 4.2 also shows some empirical parameter tuning for optimal performance.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper successfully proposes the Multi-scale Fourier Neural Operator (MscaleFNO), a novel architecture designed to mitigate the inherent spectral bias of Fourier Neural Operators (FNOs) when learning mappings between highly oscillatory functions. By employing a parallel ensemble of FNO sub-networks, each operating on scaled inputs of both the spatial variable and the input function, MscaleFNO effectively decomposes the complex learning problem into different frequency scales. Numerical results on various nonlinear oscillatory mappings and, more critically, on the Helmholtz equation in high-frequency regimes, consistently demonstrate that MscaleFNO significantly outperforms the normal FNO. It achieves higher accuracy, captures fine oscillatory details, accurately reconstructs the full Fourier spectrum of solutions, and exhibits robust generalization capabilities to unseen data distributions, all while maintaining a comparable number of model parameters. The MscaleFNO thus presents a substantial advancement for operator learning in scientific computing problems involving high-frequency phenomena.
7.2. Limitations & Future Work
The authors explicitly state that future work will include:
- Applying MscaleFNO to higher dimensional Helmholtz equations: the current numerical demonstrations are primarily for 1-D problems. Extending MscaleFNO to 2-D and 3-D Helmholtz equations will be crucial to prove its broader applicability and scalability; this typically involves increased computational complexity and data requirements.
- Solving inverse medium problems in high-frequency wave scattering: inverse problems, where the goal is to infer properties of a medium from observed wave measurements, are notoriously challenging, especially in high-frequency regimes, due to their ill-posed nature and sensitivity to high-frequency components. Applying MscaleFNO to such problems would be a significant step.

The paper does not explicitly list limitations of the MscaleFNO itself, but the nature of the future work suggests potential challenges in scaling to higher dimensions (e.g., computational cost, data size, complexity of Fourier transforms) and in the difficulty of inverse problems.
7.3. Personal Insights & Critique
- Innovative Extension: The MscaleFNO is an elegant and effective extension of the MscaleDNN concept to operator learning. The key insight of simultaneously scaling both the spatial coordinate and the input function for operator learning is powerful. It allows the model to address spectral bias arising from both the spatial structure and the amplitude-dependent oscillations of the operator's response, a clear improvement over the traditional FNO for the specific problem domain of oscillatory functions.
- Computational Cost vs. Parameter Count: While the paper emphasizes that MscaleFNO has a similar (or even slightly smaller) number of parameters compared to the normal FNO, it is important to consider the computational cost. Running N parallel FNO sub-networks implies a higher computational cost during both training and inference per epoch/sample, effectively N times the forward/backward pass of a single FNO (though mitigated by the smaller per-network channel dimension and by parallelization). The balance between increased training time/inference latency and accuracy gain is a practical consideration for real-world applications.
- Learned Scales and Weights: The decision to make the scaling factors $c_i$ and combination weights $\gamma_i$ trainable is a strong point. It allows the model to adaptively find the optimal frequency decomposition and combination, rather than relying on heuristic choices. However, the sensitivity of the training process to the initialization of these scales, and whether they can sometimes converge to degenerate solutions (e.g., all scales becoming similar), could be an area for further investigation. The sine activation function choice is also sensible and likely contributes to learning oscillations.
- Generalization Performance: The results from Example 4.5, demonstrating MscaleFNO's robust generalization to an unseen distribution of test functions, are particularly impressive. They suggest that the multi-scale architecture helps the model learn more fundamental, scale-invariant patterns of the underlying physical system, rather than simply memorizing the training data. This is a crucial attribute for neural operators intended for scientific discovery.
- Potential Improvements/Future Directions not mentioned:
  - Adaptive Scaling: instead of a fixed number of parallel FNOs with initially set scales, one could explore adaptive methods where the number of scales or the scale values are dynamically adjusted during training, perhaps guided by spectral analysis of the residual errors.
  - Efficiency for Higher Dimensions: for 2D/3D problems, the computational cost of the FFT and the memory footprint can become substantial. Exploring more efficient spectral representations or hierarchical grids within the MscaleFNO framework might be necessary.
  - Theoretical Guarantees: while the empirical results are strong, theoretical analysis of MscaleFNO's spectral representation capabilities and convergence properties would further solidify its foundation.

Overall, MscaleFNO represents a significant and well-motivated step towards making neural operators more robust and accurate for oscillatory phenomena, opening doors for broader application in scientific machine learning, especially in areas like computational electromagnetics, acoustics, and seismology.