
Physics-Informed Neural Operator for Learning Partial Differential Equations

Published: 11/06/2021
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This paper introduces the Physics-Informed Neural Operator (PINO), which learns solution operators for parametric PDE families by integrating training data with physics constraints, thereby addressing optimization challenges, reducing data requirements, and outperforming previous ML methods.

Abstract

This paper proposes physics-informed neural operators (PINO) that integrate training data and physics constraints to learn the solution operator of parametric PDE families. The method addresses optimization challenges in existing models like PINNs and reduces data requirements in approaches such as FNO. Experiments demonstrate that the resulting PINO model accurately approximates ground-truth solution operators for various PDE families, outperforming previous ML methods while effectively solving complex flows.


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

The paper is titled "Physics-Informed Neural Operator for Learning Partial Differential Equations." Its central topic is the Physics-Informed Neural Operator (PINO) framework, designed for learning the solution operator of parametric partial differential equation (PDE) families.

1.2. Authors

The authors are:

  • Zongyi Li, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • Hongkai Zheng, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • Nikola Kovachki, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • David Jin, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • Haoxuan Chen, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • Burigede Liu, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • Kamyar Azizzadenesheli, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

  • Anima Anandkumar, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA

    Zongyi Li and Hongkai Zheng are noted to have contributed equally to this research. Their affiliations indicate a strong background in computational science, machine learning, and applied mathematics, primarily from Caltech.

1.3. Journal/Conference

The paper is published in "ACM/IMS J. Data Sci. 1, 3, Article 9 (May 2024)". This is the ACM/IMS Journal of Data Science, which is a peer-reviewed journal focusing on fundamental data science research and its applications. Its association with ACM (Association for Computing Machinery) and IMS (Institute of Mathematical Statistics) suggests a rigorous academic venue covering both theoretical and applied aspects of data science, machine learning, and statistical methods.

1.4. Publication Year

The paper was first made available on 2021-11-06 (UTC), as indicated in the abstract metadata. However, the ACM Reference Format section lists the publication date as "May 2024". This discrepancy suggests that the initial version appeared in late 2021, while formal publication in the ACM/IMS Journal of Data Science occurred in May 2024. For the purpose of this analysis, we consider the formal publication year to be 2024, based on the ACM reference.

1.5. Abstract

This paper introduces physics-informed neural operators (PINO), a novel machine learning method designed to learn the solution operator of parametric Partial Differential Equation (PDE) families. PINO achieves this by integrating both traditional training data and physics constraints (i.e., the governing PDE equations). The key objective is to overcome the known optimization difficulties encountered in existing Physics-Informed Neural Networks (PINNs) and to reduce the substantial data requirements often associated with Fourier Neural Operators (FNOs). Through various experiments, the authors demonstrate that PINO can accurately approximate ground-truth solution operators for a range of popular PDE families. The model is shown to outperform previous machine learning approaches, exhibit robustness in zero-shot super-resolution (predicting beyond training data resolution), and effectively solve complex flow problems, including long temporal transient and Kolmogorov flows.


2. Executive Summary

2.1. Background & Motivation

The core problem the paper addresses is the efficient and accurate solution of Partial Differential Equations (PDEs) using machine learning (ML) methods. PDEs are fundamental to modeling countless phenomena in science and engineering, from fluid dynamics and weather forecasting to material science. Traditionally, these are solved using numerical methods like Finite Element Methods (FEM) or Finite Difference Methods (FDM), which can be computationally intensive and slow, especially for complex, high-dimensional, or long-time dynamic systems.

Recent advancements in ML have shown promise in solving PDEs, broadly categorized into two approaches:

  1. Approximating the solution function: Methods like Physics-Informed Neural Networks (PINNs) use neural networks to directly approximate the solution for a single instance of a PDE by minimizing a loss function derived from the PDE itself.

  2. Learning the solution operator: Methods like Fourier Neural Operators (FNOs) train neural networks to learn the mapping (operator) from an input function (e.g., initial conditions, coefficients) to an output solution function for a family of PDEs.

    However, both existing paradigms have significant shortcomings:

  • PINNs: While not requiring extensive labeled data, PINNs suffer from challenging optimization landscapes, making them prone to failure, particularly for multi-scale or dynamic systems. They also need re-optimization for every new PDE instance.

  • FNOs: FNOs are faster and can generalize across a family of PDEs, but they are purely data-driven. This means they require large datasets of input-solution pairs, which can be prohibitively expensive or even impossible to obtain, especially for high-fidelity or complex real-world scenarios. Their generalization to unseen conditions or higher resolutions can also be limited if only coarse-resolution data is available.

    The paper's entry point is to bridge this gap, proposing a new learning paradigm that aims to overcome the optimization challenges of PINNs and relieve the data requirements of FNOs. This is done by integrating the strengths of both physics-informed learning and operator learning.

2.2. Main Contributions / Findings

The primary contributions and key findings of the PINO paper are:

  1. Physics-Informed Neural Operator (PINO) Framework: The paper proposes PINO, a novel hybrid framework that combines the strengths of data-driven operator learning (like FNO) with physics-informed optimization (like PINN). This allows PINO to leverage available training data while enforcing physical consistency through PDE constraints.
  2. Two-Phase Learning Scheme: PINO introduces a two-phase learning process:
    • Operator Learning Phase: A neural operator is trained over multiple instances of a parametric PDE family using a hybrid loss function that incorporates both data loss (when data is available, even if low-resolution) and physics constraints (PDE loss, which can be applied at higher resolutions).
    • Instance-Wise Fine-Tuning Phase: For specific new PDE instances, the pre-trained operator can be further optimized using only physics constraints and an optional anchor loss, leading to improved accuracy for that specific instance.
  3. High-Fidelity Reconstruction and Zero-Shot Super-Resolution: PINO demonstrates the ability to approximate solution operators with high fidelity. A significant finding is its capacity for zero-shot super-resolution, meaning it can accurately predict solutions at resolutions significantly higher than the training data, a critical limitation for purely data-driven methods like FNO when provided with coarse data. This is achieved by imposing PDE constraints at higher resolutions during training.
  4. Reduced Data Requirements: PINO shows that it can learn complex PDE solution operators with "few to no data" by heavily relying on the physics constraints, a major advantage over FNO. This makes it applicable to scenarios where generating extensive high-fidelity data is difficult or impossible.
  5. Enhanced Performance on Challenging PDEs: Experiments show PINO accurately solves various PDE families, including Burgers', Darcy, and Navier-Stokes equations, outperforming previous ML methods (e.g., PINN, FNO, DeepONet variants) in terms of accuracy and generalization. It effectively handles complex scenarios like long temporal transient flows and chaotic Kolmogorov flows, where other baseline methods often fail or perform poorly.
  6. Efficient Derivative Computation: The paper outlines efficient methods for computing derivatives of neural operators, crucial for the PDE loss. These include numerical differentiation, pointwise differentiation with autograd, and a novel function-wise differentiation method specifically leveraging the Fourier domain for FNO-based architectures.
  7. Application to Inverse Problems: PINO is successfully applied to inverse problems (e.g., recovering a diffusion coefficient in Darcy flow from observed solutions), offering two approaches: learning a forward operator model and learning an inverse operator model. The PDE loss is crucial here for guaranteeing physically valid inverse solutions and achieving significant speed-ups (e.g., 3000x faster than MCMC).
  8. Transferability: PINO demonstrates the ability to transfer learned dynamics across different Reynolds numbers in Navier-Stokes equations using instance-wise fine-tuning, suggesting broader applicability to varying parameters or conditions.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand PINO, a basic grasp of Partial Differential Equations (PDEs), neural networks, and core machine learning paradigms is essential.

3.1.1. Partial Differential Equations (PDEs)

  • Conceptual Definition: PDEs are mathematical equations that involve an unknown function of multiple independent variables (e.g., space and time) and its partial derivatives with respect to those variables. They are fundamental tools for modeling diverse physical phenomena, such as heat conduction, fluid flow, wave propagation, and electromagnetism.
  • Importance: PDEs are at the heart of scientific and engineering simulations. Solving them allows scientists to predict system behavior, design new materials, optimize processes, and understand complex natural phenomena.
  • Examples: The paper specifically deals with:
    • Burgers' Equation: A non-linear PDE that describes the propagation of shock waves, often used as a simplified model for fluid flow and turbulence.
    • Darcy Flow: A linear elliptic PDE describing the flow of fluid through porous media, crucial in hydrogeology and petroleum engineering.
    • Navier-Stokes Equation: A non-linear PDE that describes the motion of viscous fluid substances, forming the basis for fluid dynamics and turbulence modeling.

3.1.2. Solution Function vs. Solution Operator

When using ML to solve PDEs, there are two primary targets:

  • Solution Function: This approach aims to learn the specific solution u(x, t) for a single given instance of a PDE (e.g., with fixed initial conditions, boundary conditions, and parameters). A neural network typically acts as an ansatz (an educated guess or trial function) for u(x, t). Physics-Informed Neural Networks (PINNs) fall into this category.
  • Solution Operator: This approach aims to learn the mapping (the operator) from an entire input function space (e.g., all possible initial conditions, all possible diffusion coefficients) to an entire output solution function space. This operator, denoted \mathcal{G}^\dagger, can then generate solutions for any new instance within a family of PDEs without retraining. Neural Operators (like FNO and PINO) fall into this category. The advantage is generalization over a family of problems, not just a single instance.

3.1.3. Neural Networks (NNs)

  • Conceptual Definition: Neural networks are computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers (input, hidden, output). Each connection has a weight, and each neuron has an activation function.
  • Functionality: NNs learn complex patterns and relationships from data by adjusting these weights and biases during a training process (e.g., backpropagation and gradient descent). They are universal approximators, meaning a sufficiently large network can approximate any continuous function.
  • Activation Function (\sigma): A non-linear function applied to the output of each neuron. Common examples include ReLU, sigmoid, and GeLU (Gaussian Error Linear Unit), which the paper mentions using. Non-linearities are crucial for neural networks to learn complex, non-linear mappings.

3.1.4. Automatic Differentiation (Autograd)

  • Conceptual Definition: Automatic Differentiation (often shortened to autograd) is a technique for algorithmically computing the derivative of a function. Unlike symbolic differentiation (which computes an algebraic expression for the derivative) or numerical differentiation (which approximates derivatives using finite differences), autograd computes exact derivatives by systematically applying the chain rule to the elementary operations that compose the function.
  • Importance in ML: In machine learning, autograd is fundamental for training neural networks. It efficiently calculates the gradients of the loss function with respect to all network parameters, which are then used by optimization algorithms (like gradient descent) to update the parameters. PINNs heavily rely on autograd to compute the derivatives of the neural network's output with respect to the input variables (spatial coordinates, time) for enforcing PDE constraints.

3.1.5. Fourier Transform (FFT)

  • Conceptual Definition: The Fourier Transform is a mathematical operation that decomposes a function (e.g., a signal or an image) into its constituent frequencies. It transforms a function from its original domain (e.g., time or space) to the frequency domain. The Fast Fourier Transform (FFT) is an efficient algorithm for computing the discrete Fourier transform.
  • Importance: In numerical methods for PDEs, spectral methods leverage the Fourier transform to efficiently compute spatial derivatives and solve certain types of equations, especially for problems with periodic boundary conditions. In the context of neural operators, FNOs use the FFT to perform convolutions in the frequency domain, which are often computationally faster and more expressive than traditional convolutions in the spatial domain.
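To make this concrete, the following minimal sketch (in PyTorch, with an illustrative grid size and test function) differentiates a periodic function by multiplying its Fourier coefficients by i 2\pi k and transforming back, which is the mechanism spectral methods and FNO-style differentiation rely on:

```python
import torch

# Spectral differentiation on a periodic grid: differentiate u(x) = sin(2*pi*x)
# by multiplying its Fourier coefficients by i*2*pi*k and transforming back.
n = 128
x = torch.arange(n) / n                              # uniform grid on [0, 1)
u = torch.sin(2 * torch.pi * x)

k = torch.fft.fftfreq(n, d=1.0 / n)                  # integer wavenumbers 0, 1, ..., -1
du_dx = torch.fft.ifft(1j * 2 * torch.pi * k * torch.fft.fft(u)).real

# Compare against the analytic derivative 2*pi*cos(2*pi*x); the error is tiny.
print(torch.max(torch.abs(du_dx - 2 * torch.pi * torch.cos(2 * torch.pi * x))))
```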

3.2. Previous Works

The paper primarily contrasts PINO with two main categories of prior work: Physics-Informed Neural Networks (PINNs) and Fourier Neural Operators (FNOs).

3.2.1. Physics-Informed Neural Networks (PINNs)

  • Concept: Introduced by Raissi et al. (2019) [14], PINNs use neural networks to approximate the solution function u(x, t) of a single PDE instance. They learn by minimizing a physics-informed loss function that includes two main components:
    1. PDE residual loss: This term measures how well the neural network's output satisfies the governing PDE equation. It's computed by taking derivatives of the neural network's output (with respect to space and time) using autograd and plugging them into the PDE.
    2. Boundary/Initial condition loss: This term ensures that the neural network's output satisfies the specified boundary and initial conditions of the PDE.
  • Illustrative Loss (Stationary Case): Given a stationary PDE \mathcal{P}(u, a) = 0 in domain D with boundary condition u = g on \partial D, and a neural network u_\theta approximating u, the PINN loss for a single instance a is: \mathcal{L}_{\mathrm{pde}}(a, u_{\theta}) = \Big\| \mathcal{P}(a, u_{\theta}) \Big\|_{L^2(D)}^2 + \alpha \Big\| u_{\theta}|_{\partial D} - g \Big\|_{L^2(\partial D)}^2 Here:
    • \mathcal{P}(a, u_{\theta}) is the residual of the PDE when u_\theta is substituted, and \|\cdot\|_{L^2(D)}^2 measures its squared L^2 norm over the domain D. This term forces the neural network to satisfy the PDE.
    • u_{\theta}|_{\partial D} is the value of the neural network's output on the boundary \partial D, and g is the prescribed boundary condition. \|\cdot\|_{L^2(\partial D)}^2 measures the squared L^2 norm of their difference. This term forces the neural network to satisfy the boundary conditions.
    • \alpha is a hyperparameter balancing the importance of the boundary condition loss relative to the PDE residual loss. A minimal code sketch of this loss appears after this list.
  • Limitations (as highlighted by the paper):
    • Challenging Optimization: PINNs are notorious for difficult optimization landscapes, especially for multi-scale or dynamic systems. They often struggle to converge or get stuck in local minima.
    • Information Propagation: They have difficulty propagating information from initial/boundary conditions to unseen interior points or future times.
    • Single Instance Learning: PINNs learn the solution for one specific PDE instance. To solve a new instance (e.g., with different initial conditions), the entire network needs to be re-optimized from scratch, making them inefficient for parametric PDE families.
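As a concrete illustration of the PINN loss described above, the sketch below assembles a physics-informed objective for a simple 1-D Poisson problem -u'' = f with zero boundary values; the network size, forcing term, collocation counts, and weight \alpha are illustrative assumptions, not the paper's exact setup:

```python
import torch

# Minimal PINN-style loss sketch for -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.
# The small MLP plays the role of the ansatz u_theta.
u_theta = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def f(x):                                         # assumed forcing term for illustration
    return (torch.pi ** 2) * torch.sin(torch.pi * x)

def pinn_loss(alpha=100.0):
    # Interior collocation points: PDE residual via autograd derivatives.
    x = torch.rand(256, 1, requires_grad=True)
    u = u_theta(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    pde_term = ((-d2u - f(x)) ** 2).mean()        # P(u_theta) = -u'' - f

    # Boundary points: enforce u = g (here g = 0).
    xb = torch.tensor([[0.0], [1.0]])
    bc_term = (u_theta(xb) ** 2).mean()
    return pde_term + alpha * bc_term

optimizer = torch.optim.Adam(u_theta.parameters(), lr=1e-3)
for _ in range(5):                                # a few illustrative steps
    optimizer.zero_grad()
    loss = pinn_loss()
    loss.backward()
    optimizer.step()
```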

3.2.2. Fourier Neural Operators (FNOs)

  • Concept: Introduced by Li et al. (2020, 2021) [2], FNOs are a class of neural operators designed to learn mappings between infinite-dimensional function spaces. Instead of learning a function of fixed dimensions (like traditional neural networks), FNOs learn operators that map an input function to an output function. Their key innovation is to leverage the Fast Fourier Transform (FFT) to perform convolutions efficiently in the frequency domain.
  • Architecture: The core of an FNO layer involves:
    1. Transforming the input to the Fourier domain using FFT.
    2. Applying a linear transformation (e.g., multiplication by learnable weights) to a subset of the Fourier modes (truncating high frequencies).
    3. Transforming back to the spatial domain using inverse FFT.
    4. Applying a pointwise non-linearity (e.g., GeLU). Multiple such layers are stacked.
  • Illustrative Loss: FNOs are typically trained in a supervised manner using data loss. Given a dataset of input-output function pairs \{ (a_j, u_j) \}_{j=1}^N where u_j = \mathcal{G}^\dagger(a_j), the FNO minimizes the empirical average of the data loss: \mathcal{J}_{\mathrm{data}}(\mathcal{G}_{\theta}) = \mathbb{E}_{a \sim \mu} [ \mathcal{L}_{\mathrm{data}}(a, \theta) ] \approx \frac{1}{N} \sum_{j=1}^N \int_D | u_j(x) - \mathcal{G}_{\theta}(a_j)(x) |^2 \, \mathrm{d}x Here \mathcal{G}_{\theta} is the neural operator parameterized by \theta, \mathcal{L}_{\mathrm{data}} is a per-instance data loss (e.g., the squared L^2 norm), and \mu is the distribution of input functions.
  • Limitations (as highlighted by the paper):
    • Data Intensive: FNOs are purely data-driven, requiring large amounts of high-fidelity data, which can be expensive to generate from numerical solvers or physical experiments.
    • Limited Generalization without Data: They struggle to perfectly approximate the ground-truth operator when only coarse-resolution training data is available, and their generalization to unseen scenarios (e.g., different Reynolds numbers, geometries) beyond the training distribution is challenging.
    • No Physics Information: As purely data-driven models, FNOs do not inherently incorporate the underlying physics equations during training, which can lead to physically inconsistent predictions or limit accuracy when data is scarce.

3.2.3. DeepONet

Another notable operator learning model, DeepONet [4], is also mentioned. DeepONet approximates operators by decomposing them into a branch network (encoding the input function) and a trunk network (encoding the query locations). While powerful, the paper notes that standard implementations can be limited to a fixed grid and that DeepONet is a linear method of approximation [36]. PINO instead builds on the FNO architecture for its scalability. A rough sketch of the branch/trunk structure follows.
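The sketch below illustrates a DeepONet-style forward pass; all layer sizes, sensor counts, and the inner-product readout are illustrative assumptions rather than the reference implementation:

```python
import torch

class TinyDeepONet(torch.nn.Module):
    """Rough DeepONet sketch: the branch net encodes the input function sampled at
    m fixed sensor locations, the trunk net encodes a query coordinate, and the
    prediction is their inner product (all sizes are illustrative assumptions)."""
    def __init__(self, m_sensors=100, p=64):
        super().__init__()
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(m_sensors, 128), torch.nn.ReLU(), torch.nn.Linear(128, p))
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(1, 128), torch.nn.ReLU(), torch.nn.Linear(128, p))

    def forward(self, a_sensors, x_query):
        b = self.branch(a_sensors)           # (batch, p) coefficients from the input function
        t = self.trunk(x_query)              # (n_points, p) basis values at query points
        return b @ t.T                       # (batch, n_points): predicted u(x)

model = TinyDeepONet()
a = torch.randn(8, 100)                      # 8 input functions sampled at 100 sensors
x = torch.linspace(0, 1, 50).unsqueeze(-1)   # 50 query locations
u_pred = model(a, x)                         # shape (8, 50)
```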

3.3. Technological Evolution

The evolution of ML for PDEs has progressed through several stages:

  1. Traditional Numerical Solvers: For decades, PDEs have been solved using numerical methods (FDM, FEM, spectral methods). These are accurate and well-understood but are often computationally expensive, require significant expertise for discretization and meshing, and can be slow for real-time applications or large-scale simulations.
  2. ML for Fixed-Dimensional Regression: Early ML applications might use NNs to approximate specific aspects of solutions or to accelerate parts of numerical solvers.
  3. Physics-Informed Neural Networks (PINNs): This marked a significant shift by using NNs directly as solvers, embedding the PDE physics into the loss function. This eliminated the need for explicit meshing and offered a data-free approach for single instances.
  4. Data-Driven Neural Operators (FNO, DeepONet): This paradigm aimed to overcome the single-instance limitation of PINNs by learning the entire solution operator for a family of PDEs. This offered significant speedups (inference is very fast) and generalization within the learned family, but introduced a heavy reliance on large, high-quality datasets.
  5. Hybrid Physics-Informed Neural Operators (PINO): The current paper's work represents the next evolutionary step. It seeks to combine the data efficiency and physics-awareness of PINNs with the generalization and speed of neural operators, specifically FNOs. This hybrid approach aims to mitigate the major drawbacks of both previous paradigms, offering a more robust and flexible tool for a wider range of PDE problems.

3.4. Differentiation Analysis

Compared to the main methods in related work, PINO presents several core differences and innovations:

  • Against PINNs:

    • Operator Learning vs. Function Learning: PINN learns a specific solution function u(x, t) for a single PDE instance; PINO learns a general solution operator \mathcal{G}: a \mapsto u for a family of PDEs. This provides inherent generalization to new initial/boundary conditions or parameters within that family without re-training the base model.
    • Optimization Landscape: PINO leverages the operator learning phase to pre-train a robust operator ansatz, which provides a much better starting point and potentially a smoother optimization landscape during instance-wise fine-tuning compared to PINN's optimization from scratch. PINO claims to perform "optimization in the space of functions" rather than "point-wise optimization," making it more stable for multi-scale systems.
    • Efficiency: PINO inference (after the operator learning phase) is significantly faster than re-optimizing a PINN for each new instance.
  • Against FNOs:

    • Physics Constraints Integration: FNOs are purely data-driven. PINO explicitly integrates PDE loss functions into its training process. This is the fundamental difference, allowing PINO to achieve higher accuracy, better generalization (especially zero-shot super-resolution), and physical consistency, even with limited or no data.
    • Data Efficiency: PINO can operate with "few to no data" by relying on the PDE constraints to generate "virtual instances" for training, thus overcoming FNO's high data requirements.
    • Multi-Resolution Hybrid Loss: PINO uniquely incorporates training data at coarse resolutions and PDE constraints at higher resolutions. This allows it to learn high-fidelity operators that extrapolate to unseen higher frequencies, a capability lacking in purely data-driven FNOs.
    • Instance-Wise Fine-Tuning: PINO adds an optional fine-tuning step for specific test instances, allowing it to achieve even higher accuracy by further optimizing the pre-trained operator using only physics constraints, which is not a standard feature of FNOs.
  • Against other hybrid methods (e.g., Physics-informed DeepONet):

    • PINO's base architecture, FNO, is generally shown to be more scalable and efficient for large problems.
    • PINO's ability to incorporate data and PDE loss at different resolutions and its focus on extrapolation to higher resolutions is highlighted as a unique feature not typically found in other hybrid approaches.

4. Methodology

The Physics-Informed Neural Operator (PINO) framework combines data-driven operator learning with physics-informed optimization. It is structured into two main phases: operator learning and instance-wise fine-tuning. At its core, PINO leverages the neural operator framework (specifically, the Fourier Neural Operator or FNO backbone) and enhances it by incorporating PDE loss functions alongside traditional data loss.

4.1. Principles

The core idea behind PINO is to develop a robust and generalizable method for solving parametric PDEs by addressing the limitations of existing approaches.

  1. Hybrid Learning: Integrate available training data (even if scarce or low-resolution) with the exact mathematical description of the underlying physics (PDE constraints). This provides stronger supervision than data alone and a more stable optimization landscape than physics constraints alone.

  2. Operator Generalization: Learn a solution operator mapping between function spaces, allowing the model to generalize across entire families of PDEs rather than solving individual instances.

  3. Multi-Resolution Information: Utilize physics constraints at a higher resolution than the available training data. This enables the learned operator to perform zero-shot super-resolution, accurately predicting solutions at resolutions never seen during data-driven training.

  4. Fine-Tuning for Precision: Allow for an optional instance-wise fine-tuning step where a pre-trained operator can be further optimized for a specific query instance using only physics constraints, achieving very high accuracy.

    The theoretical basis relies on the universal approximation theorem for operators [1], which states that neural operators can approximate any continuous operator. PINO builds upon this by ensuring discretization convergence [1], meaning the learned operator converges to a continuum operator as the discretization resolution is refined, allowing for generalization across resolutions.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Problem Settings

The paper considers two fundamental classes of PDEs that PINO aims to solve:

4.2.1.1. Stationary System

This describes systems where the solution does not change with time. \begin{array}{rl} \mathcal{P}(u, a) = 0, & \qquad \mathrm{in}\ D \subset \mathbb{R}^d \\ u = g, & \qquad \mathrm{in}\ \partial D \end{array} Here:

  • D: A bounded domain in \mathbb{R}^d (e.g., a 1D interval or a 2D square).

  • a \in \mathcal{A} \subseteq \mathcal{V}: A PDE coefficient or parameter (e.g., a diffusion coefficient), which is itself a function from a Banach space \mathcal{V}.

  • u \in \mathcal{U}: The unknown solution function, residing in a Banach space \mathcal{U}.

  • \mathcal{P}: \mathcal{U} \times \mathcal{A} \to \mathcal{F}: A (possibly non-linear) partial differential operator, mapping to a Banach space \mathcal{F}.

  • g: The boundary condition, typically fixed.

    This formulation defines a solution operator \mathcal{G}^\dagger: \mathcal{A} \to \mathcal{U} that maps an input parameter function a to its unique solution function u. A common example is an elliptic equation like \mathcal{P}(u, a) = -\nabla \cdot (a \nabla u) + f = 0.

4.2.1.2. Dynamical System

This describes systems where the solution evolves over time. \begin{array}{rl} \displaystyle \frac{d u}{d t} = \mathcal{R}(u), & \quad \mathrm{in}\ D \times (0, \infty) \\ u = g, & \quad \mathrm{in}\ \partial D \times (0, \infty) \\ u = a, & \quad \mathrm{in}\ \bar{D} \times \{0\}, \end{array} Here:

  • a = u(0) \in \mathcal{A} \subseteq \mathcal{V}: The initial condition, a function in space.

  • u(t) \in \mathcal{U}: The unknown solution function at time t, also a function in space.

  • \mathcal{R}: A (possibly non-linear) partial differential operator.

  • g: A known boundary condition.

    This formulation defines a solution operator \mathcal{G}^\dagger: \mathcal{A} \to C((0, T]; \mathcal{U}) that maps an initial condition function a to the time-evolving solution function u over an interval (0, T]. Examples include Burgers' equation and the Navier-Stokes equation.

4.2.2. Solving Equations using Physics-Informed Neural Networks (PINNs) - (Background for PINO)

PINNs approximate the specific solution function u^\dagger = \mathcal{G}^\dagger(a) for a given instance a. They use a neural network u_\theta (with parameters \theta) as an ansatz for u^\dagger. The parameters \theta are found by minimizing a physics-informed loss, using automatic differentiation (autograd) to compute exact derivatives.

4.2.2.1. PINN Loss for Stationary Systems

For stationary systems, the loss minimizes the residual of the PDE and enforces boundary conditions: \begin{array}{l} \displaystyle \mathcal{L}_{\mathrm{pde}}(a, u_{\theta}) = \Big\| \mathcal{P}(a, u_{\theta}) \Big\|_{L^2(D)}^2 + \alpha \Big\| u_{\theta}|_{\partial D} - g \Big\|_{L^2(\partial D)}^2 \\ \displaystyle \qquad = \int_D | \mathcal{P}(u_{\theta}(x), a(x)) |^2 \, \mathrm{d}x + \alpha \int_{\partial D} | u_{\theta}(x) - g(x) |^2 \, \mathrm{d}x . \end{array} Here:

  • \mathcal{P}(a, u_{\theta}): The residual of the PDE (the left-hand side of Equation (1)) when the neural network's output u_\theta is plugged in.
  • \| \cdot \|_{L^2(D)}^2: The squared L^2 norm over the domain D, indicating the average squared error of the PDE residual.
  • u_{\theta}|_{\partial D}: The value of the neural network's solution on the boundary \partial D.
  • g: The prescribed boundary condition.
  • \| \cdot \|_{L^2(\partial D)}^2: The squared L^2 norm over the boundary \partial D, indicating the average squared error of the boundary condition.
  • \alpha > 0: A hyperparameter weighting the importance of the boundary condition loss.

4.2.2.2. PINN Loss for Dynamical Systems

For dynamical systems, the loss includes the time-derivative residual, boundary, and initial conditions: \begin{array}{l} \displaystyle \mathcal{L}_{\mathrm{pde}}(a, u_{\theta}) = \left\| \frac{d u_{\theta}}{d t} - \mathcal{R}(u_{\theta}) \right\|_{L^2(T;D)}^2 + \alpha \Big\| u_{\theta}|_{\partial D} - g \Big\|_{L^2(T;\partial D)}^2 + \beta \Big\| u_{\theta}|_{t=0} - a \Big\|_{L^2(D)}^2 \\ \displaystyle \quad = \int_0^T \int_D \Big| \frac{d u_{\theta}}{d t}(t, x) - \mathcal{R}(u_{\theta})(t, x) \Big|^2 \, \mathrm{d}x \, \mathrm{d}t + \alpha \int_0^T \int_{\partial D} | u_{\theta}(t, x) - g(t, x) |^2 \, \mathrm{d}x \, \mathrm{d}t + \beta \int_D | u_{\theta}(0, x) - a(x) |^2 \, \mathrm{d}x . \end{array} Here:

  • \frac{d u_{\theta}}{d t} - \mathcal{R}(u_{\theta}): The residual of the time-dependent PDE (Equation (2)).
  • \| \cdot \|_{L^2(T;D)}^2: The squared L^2 norm over the spatio-temporal domain D \times (0, T].
  • u_{\theta}|_{\partial D}: The solution on the boundary, compared to g.
  • u_{\theta}|_{t=0}: The solution at initial time t = 0, compared to the initial condition a.
  • \alpha, \beta > 0: Hyperparameters weighting the boundary and initial condition losses, respectively.

4.2.3. Learning the Solution Operator via Neural Operator (Background for PINO)

Instead of approximating a single solution, neural operators (like FNO) aim to learn the solution operator \mathcal{G}^\dagger itself. They are typically trained with supervised learning on a dataset of input-output function pairs \{ a_j, u_j \}_{j=1}^N, where u_j = \mathcal{G}^\dagger(a_j).

4.2.3.1. Operator Data Loss

The empirical data loss for an operator \mathcal{G}_{\theta} is defined as the average L^2 error across all available data instances: \mathcal{J}_{\mathrm{data}}(\mathcal{G}_{\theta}) = \Vert \mathcal{G}^\dagger - \mathcal{G}_{\theta} \Vert_{L_\mu^2(\mathcal{A}; \mathcal{U})}^2 = \mathbb{E}_{a \sim \mu} [ \mathcal{L}_{\mathrm{data}}(a, \theta) ] \approx \frac{1}{N} \sum_{j=1}^N \int_D | u_j(x) - \mathcal{G}_{\theta}(a_j)(x) |^2 \, \mathrm{d}x . Here:

  • \mathcal{L}_{\mathrm{data}}(u, \mathcal{G}_{\theta}(a)) = \| u - \mathcal{G}_{\theta}(a) \|_{\mathcal{U}}^2: The per-instance data loss, measuring the squared L^2 difference between the ground-truth solution u and the operator's prediction \mathcal{G}_{\theta}(a).
  • \mathbb{E}_{a \sim \mu}[\cdot]: Expectation over input functions a sampled from a distribution \mu.
  • N: The number of training data pairs.
  • u_j, a_j: The j-th ground-truth output and input functions.

4.2.3.2. Operator PDE Loss

For PINO, a corresponding operator PDE loss is defined as the expected value of the physics-informed loss over the distribution of input functions: \mathcal{T}_{\mathrm{pde}}(\mathcal{G}_{\theta}) = \mathbb{E}_{a \sim \mu} [ \mathcal{L}_{\mathrm{pde}}(a, \mathcal{G}_{\theta}(a)) ] . This loss term forces the learned operator \mathcal{G}_{\theta} to produce physically consistent solutions on average for inputs sampled from \mu.

4.2.4. Neural Operator Architecture

The paper focuses on the general Neural Operator model, which extends standard deep neural networks to learn mappings between function spaces.

4.2.4.1. Definition of Neural Operator \mathcal{G}_{\theta}

A neural operator \mathcal{G}_{\theta} is constructed by composing linear integral operators with pointwise non-linear activation functions: \mathcal{G}_{\theta} := \mathcal{Q} \circ (\mathcal{W}_L + \mathcal{K}_L) \circ \dots \circ \sigma(\mathcal{W}_1 + \mathcal{K}_1) \circ \mathcal{P} . Here:

  • \mathcal{P}: A pointwise lifting operator (parameterized by a neural network P: \mathbb{R}^{d_a} \to \mathbb{R}^{d_1}) that maps the input function a from a lower-dimensional space to a higher-dimensional feature space (co-dimension d_1).
  • \mathcal{Q}: A pointwise projection operator (parameterized by a neural network Q: \mathbb{R}^{d_L} \to \mathbb{R}^{d_u}) that maps the final feature function back to the output function u (with co-dimension d_u).
  • (\mathcal{W}_l + \mathcal{K}_l): A layer that combines a pointwise linear operator \mathcal{W}_l and an integral kernel operator \mathcal{K}_l.
    • \mathcal{W}_l: A pointwise linear operator, parameterized as a matrix W_l \in \mathbb{R}^{d_{l+1} \times d_l}, acting on each point of the function.
    • \mathcal{K}_l: An integral kernel operator (explained below) that performs global information exchange.
  • \sigma: A fixed non-linear activation function (e.g., GeLU), applied pointwise.
  • L: The number of stacked layers.
  • \theta: All the learnable parameters in \mathcal{P}, \mathcal{Q}, \mathcal{W}_l, \mathcal{K}_l.

4.2.4.2. Kernel Integral Operators

The integral kernel operator \mathcal{K} is defined as: (\mathcal{K} v_l)(x) = \int_D \kappa^{(l)}(x, y)\, v_l(y)\, \mathrm{d}\nu(y) \qquad \forall x \in D . Here:

  • v_l(y): The input function to the l-th kernel operator.

  • \kappa^{(l)} \in C(D \times D; \mathbb{R}^{d_{l+1} \times d_l}): A learnable kernel function that defines the interaction between points x and y in the domain D.

  • \nu: A Borel measure on D.

    This integral operator can be discretized, for example, using a sum over a neighborhood B(x): (\mathcal{K} v_l)(x) = \sum_{B(x)} \kappa^{(l)}(x, y)\, v_l(y) \qquad \forall x \in D . The kernel function \kappa^{(l)} can also be non-linear, taking the form \kappa^{(l)}(x, y, v_l(y)). A discretized sketch of this operator is given below.
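A minimal sketch of this discretized kernel integral on a 1-D grid follows; the kernel MLP, the uniform quadrature weight 1/n, and the grid size are illustrative assumptions:

```python
import torch

# Discretized kernel integral operator on a 1-D grid:
# (K v)(x_i) ≈ (1/n) * sum_j kappa(x_i, x_j) v(x_j), with kappa given by a small MLP.
n, d_in, d_out = 32, 3, 3
grid = torch.linspace(0, 1, n).unsqueeze(-1)          # (n, 1) grid points
kappa = torch.nn.Sequential(                          # kappa(x, y) -> (d_out * d_in) matrix
    torch.nn.Linear(2, 64), torch.nn.GELU(), torch.nn.Linear(64, d_out * d_in))

def kernel_integral(v):                               # v: (n, d_in) function values
    xy = torch.cat([grid.repeat_interleave(n, 0),     # all pairs (x_i, x_j)
                    grid.repeat(n, 1)], dim=-1)       # (n*n, 2)
    K = kappa(xy).view(n, n, d_out, d_in)             # kernel matrix for each point pair
    # Weighted sum over y with uniform quadrature weight 1/n.
    return torch.einsum("xyoi,yi->xo", K, v) / n      # (n, d_out)

v0 = torch.randn(n, d_in)
print(kernel_integral(v0).shape)                      # torch.Size([32, 3])
```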

4.2.4.3. Fourier Convolution Operator (FNO Specific)

A specific and highly efficient form of the kernel integral operator is the Fourier convolution operator, used in Fourier Neural Operators (FNO): (\mathcal{K} v_l)(x) = \mathcal{F}^{-1} \Big( R \cdot (\mathcal{F} v_l) \Big)(x) \qquad \forall x \in D . Here:

  • \mathcal{F}: The Fast Fourier Transform (FFT), which transforms the function v_l into the frequency domain.

  • \mathcal{F}^{-1}: The inverse FFT.

  • R: A learnable parameter (a diagonal matrix in the Fourier domain) that acts as a filter, truncating or scaling specific Fourier modes. Multiplication in the Fourier domain corresponds to a convolution in the spatial domain.

    This Fourier convolution operator forms the backbone of the PINO model used in the experiments due to its speed and performance.
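The sketch below shows one such Fourier layer in 1-D (a spectral convolution plus a pointwise linear path, followed by GeLU); the channel counts, mode truncation, and initialization are illustrative assumptions, not the paper's exact configuration:

```python
import torch

class SpectralConv1d(torch.nn.Module):
    """Sketch of a 1-D Fourier layer: FFT, multiply a truncated set of modes by
    learnable complex weights R, inverse FFT (sizes are illustrative)."""
    def __init__(self, channels=16, modes=12):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.R = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, v):                               # v: (batch, channels, n)
        v_hat = torch.fft.rfft(v)                       # to the Fourier domain
        out_hat = torch.zeros_like(v_hat)
        out_hat[..., :self.modes] = torch.einsum(       # mix channels mode by mode
            "bik,iok->bok", v_hat[..., :self.modes], self.R)
        return torch.fft.irfft(out_hat, n=v.size(-1))   # back to physical space

class FourierLayer(torch.nn.Module):
    """One FNO-style block: spectral convolution K plus pointwise linear W, then GeLU."""
    def __init__(self, channels=16, modes=12):
        super().__init__()
        self.K = SpectralConv1d(channels, modes)
        self.W = torch.nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, v):
        return torch.nn.functional.gelu(self.K(v) + self.W(v))

v = torch.randn(4, 16, 64)                              # 4 feature functions on a 64-point grid
print(FourierLayer()(v).shape)                          # torch.Size([4, 16, 64])
```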

4.2.5. PINO Framework: Two Phases

PINO integrates operator learning with physics constraints through two distinct phases:

4.2.5.1. Phase 1: Physics-Informed Operator Learning

In this phase, PINO trains a neural operator \mathcal{G}_{\theta} to approximate the target solution operator \mathcal{G}^\dagger. The training leverages:

  • Data Supervision (\mathcal{J}_{\mathrm{data}}): When available, training data (input-output function pairs) provides strong supervision. A key feature of PINO is that this data can be coarse-resolution.

  • Physics Constraints (\mathcal{T}_{\mathrm{pde}}): The PDE loss is imposed to ensure physical validity. Crucially, PINO can impose these constraints at a higher resolution than the training data, allowing for high-fidelity reconstruction and zero-shot super-resolution.

  • Semi-Supervised Learning: This approach effectively turns the problem into a semi-supervised one. Even with limited labeled data, PINO can generate an unlimited number of virtual PDE instances by sampling new initial conditions or coefficients a_j \sim \mu. These virtual instances only require the PDE loss calculation.

    The overall loss function for the operator learning phase combines these components, for example: \mathcal{L}_{\mathrm{operator\_learning}} = \mathcal{J}_{\mathrm{data}}(\mathcal{G}_{\theta}) + \lambda \mathcal{T}_{\mathrm{pde}}(\mathcal{G}_{\theta}) where \lambda is a hyperparameter that balances the data and PDE losses.
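A minimal sketch of how such a hybrid objective might be assembled is shown below; the toy operator, the placeholder residual function, and the tensor shapes are stand-ins, since a real PDE residual would apply the spectral or finite-difference derivatives described in Section 4.2.6:

```python
import torch

# Hybrid operator-learning objective: data loss on labeled (possibly coarse) pairs
# plus a weighted PDE loss on freshly sampled, unlabeled "virtual" instances.
operator = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.GELU(),
                               torch.nn.Linear(128, 64))        # stands in for G_theta

def pde_residual(a, u):
    # Placeholder residual P(u, a); a real implementation would differentiate u here.
    return u - a

def hybrid_loss(a_labeled, u_labeled, a_virtual, lam=1.0):
    data_loss = ((operator(a_labeled) - u_labeled) ** 2).mean()          # J_data
    pde_loss = (pde_residual(a_virtual, operator(a_virtual)) ** 2).mean()  # T_pde
    return data_loss + lam * pde_loss

a_lab, u_lab = torch.randn(16, 64), torch.randn(16, 64)          # coarse labeled pairs
a_virt = torch.randn(64, 64)                                     # sampled inputs, no labels
loss = hybrid_loss(a_lab, u_lab, a_virt)
loss.backward()
```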

4.2.5.2. Phase 2: Instance-Wise Fine-Tuning of Trained Operator Ansatz

After the operator learning phase, the pre-trained operator \mathcal{G}_{\theta_0} (where \theta_0 denotes the initial trained parameters) can be used as an ansatz to solve for a specific query instance a. This is akin to how PINNs operate, but with a crucial difference: the network is now a pre-trained neural operator, not a randomly initialized network.

For a new instance a, PINO further optimizes the parameters \theta of the operator \mathcal{G}_{\theta} by minimizing a loss function specific to this instance: \mathcal{L}_{\mathrm{fine\_tuning}} = \mathcal{L}_{\mathrm{pde}}(a, \mathcal{G}_{\theta}(a)) + \alpha \mathcal{L}_{\mathrm{op}}\left( \mathcal{G}_{\theta}(a), \mathcal{G}_{\theta_0}(a) \right) Here:

  • \mathcal{L}_{\mathrm{pde}}(a, \mathcal{G}_{\theta}(a)): The standard physics-informed loss (as defined for PINNs) applied to the current instance a and the operator's output \mathcal{G}_{\theta}(a). This forces the fine-tuned solution to satisfy the PDE.
  • \mathcal{L}_{\mathrm{op}}\left( \mathcal{G}_{\theta_i}(a), \mathcal{G}_{\theta_0}(a) \right) = \| \mathcal{G}_{\theta_i}(a) - \mathcal{G}_{\theta_0}(a) \|_{\mathcal{U}}^2: An optional anchor loss (operator loss), where \mathcal{G}_{\theta_i}(a) is the model at the i-th fine-tuning epoch and \mathcal{G}_{\theta_0}(a) is the prediction of the pre-trained operator. This term serves as a regularizer, keeping the fine-tuned model close to the initial, well-generalized operator.
  • \alpha: A hyperparameter weighting the anchor loss.

Advantages of this phase:

  • Easier Optimization: The pre-trained operator provides a much better starting point, making optimization faster and more stable compared to PINNs. The anchor loss further regularizes the optimization.
  • Function-Wise Optimization: PINO optimizes a function (parameterized by the operator) rather than just points, potentially leading to a better solution landscape.
  • Reduced Information Propagation Issues: Since the operator already "knows" the general dynamics, propagating information from initial/boundary conditions is less problematic.
  • Resolution Adaptability: This phase can be performed at higher resolutions to achieve very high accuracy for specific problems.
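The sketch below illustrates the fine-tuning loop described above under simplified assumptions (a toy operator, a placeholder PDE residual, and illustrative hyperparameters); it is meant only to show how the PDE loss and the anchor loss combine for a single query instance:

```python
import copy
import torch

# Instance-wise fine-tuning: start from the pre-trained operator, minimize the PDE
# loss for one query input `a`, plus an anchor loss toward the pre-trained output.
pretrained = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.GELU(),
                                 torch.nn.Linear(128, 64))
model = copy.deepcopy(pretrained)                 # G_theta, initialized at theta_0

def pde_loss(a, u):                               # placeholder for L_pde(a, u)
    return ((u - a) ** 2).mean()

a = torch.randn(1, 64)                            # the specific query instance
with torch.no_grad():
    u_anchor = pretrained(a)                      # G_{theta_0}(a), kept fixed

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
alpha = 0.1
for _ in range(100):
    optimizer.zero_grad()
    u = model(a)
    loss = pde_loss(a, u) + alpha * ((u - u_anchor) ** 2).mean()
    loss.backward()
    optimizer.step()
```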

4.2.6. Derivatives of Neural Operators

A critical component for calculating the PDE loss is the efficient and accurate computation of derivatives (e.g., \frac{\partial u}{\partial x}, \frac{\partial u}{\partial t}) of the neural operator's output. PINO explores three methods:

4.2.6.1. Numerical Differentiation

This is the simplest approach, using conventional numerical methods:

  • Finite Difference Methods: Approximate derivatives using differences between function values at discrete points (e.g., u'(x) \approx \frac{u(x+h) - u(x-h)}{2h}). Requires O(n) computation for an n-point grid.
  • Fourier Differentiation: For functions on uniform, periodic grids, derivatives can be computed very accurately and efficiently in the Fourier domain (multiplication of the k-th mode by i k). Requires O(n \log n) computation.
  • Pros: Fast, memory-efficient, agnostic to neural network architecture.
  • Cons: Introduces numerical errors, which can be amplified. Finite difference requires fine, uniform grids; spectral methods require smoothness and periodicity.
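For illustration, the sketch below computes first and second spatial derivatives of a grid function with central differences on a periodic grid via torch.roll; the grid size and test function are assumptions chosen to make the discretization error visible:

```python
import torch

# Central finite differences on a periodic grid: a cheap, architecture-agnostic way
# to obtain du/dx and d2u/dx2 for a PDE residual (a sketch with a known test function).
n = 256
h = 1.0 / n
x = torch.arange(n) * h
u = torch.sin(2 * torch.pi * x)                    # stand-in for an operator output on the grid

du_dx = (torch.roll(u, -1) - torch.roll(u, 1)) / (2 * h)          # O(h^2) central difference
d2u_dx2 = (torch.roll(u, -1) - 2 * u + torch.roll(u, 1)) / h ** 2

# Error against the analytic derivative 2*pi*cos(2*pi*x) is on the order of 1e-3.
print(torch.max(torch.abs(du_dx - 2 * torch.pi * torch.cos(2 * torch.pi * x))))
```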

4.2.6.2. Pointwise Differentiation with Autograd

Similar to PINNs, autograd can compute exact derivatives. For a neural operator \mathcal{G}_{\theta}, which typically outputs values on a grid, applying autograd directly to u = \mathcal{G}_{\theta}(a) can be complex due to the FFT operations in FNO. To use autograd, a query function u(x) that takes a continuous point x and outputs u(x) is constructed.

The output function u(x) can be written as u(x) = Q(v_L(x)), where v_L is the output of the final integral operator layer.

  • For Kernel Integral Operator: If the kernel function \kappa^{(l)} can directly take query points as input, the query function is u(x) = Q \left( \sum_{B(x)} \kappa^{(l)}(x, y, v_{L-1}(y)) \right). The derivative u'(x) can then be computed via autograd: u'(x) = Q' \big( v_L(x) \big) \cdot \sum_{B(x)} \kappa^{(l)\prime}(x, y, v_{L-1}(y)) . Here, Q' is the derivative of the pointwise projection network Q, and \kappa^{(l)\prime} is the derivative of the kernel function with respect to x.
  • For Fourier Convolution Operator: The output function u(x) can be written as a Fourier series composed with Q: u(x) = Q \circ \mathcal{F}^{-1} \Big( R \cdot (\mathcal{F} v_{L-1}) \Big)(x) = Q \left( \frac{1}{k_{max}} \sum_{k=0}^{k_{max}} \left( R_k (\mathcal{F} v_{L-1})_k \right) \exp \frac{i 2\pi k}{D}(x) \right) . The derivative u'(x) is: u'(x) = Q' \big( v_L(x) \big) \cdot \frac{1}{k_{max}} \sum_{k=0}^{k_{max}} \big( R_k (\mathcal{F} v_{L-1})_k \big) \exp' \frac{i 2\pi k}{D}(x) . where \exp' \frac{i 2\pi k}{D}(x) = \frac{i 2\pi k}{D} \exp \frac{i 2\pi k}{D}(x) is the derivative of the exponential term with respect to x. If x forms a uniform grid, this can be computed efficiently with the FFT.
  • Pros: Exact derivatives.
  • Cons: Can be slower and more memory-consuming than numerical methods, especially for large networks, as it computes derivatives for each query point.

4.2.6.3. Function-Wise Differentiation

This method provides an efficient and exact computation of the full gradient field, specifically for FNOs, by explicitly writing out derivatives in Fourier space and applying the chain rule. This avoids pointwise autograd computation. For an FNO, the derivative of the output u' can be computed directly in the Fourier domain: u' = Q'(v_L) \cdot \mathcal{F}^{-1} \left( \frac{i 2\pi}{D} K \cdot (\mathcal{F} v_L) \right) . Here:

  • Q'(v_L): The derivative of the pointwise projection Q with respect to its input v_L.
  • \mathcal{F}^{-1} \left( \frac{i 2\pi}{D} K \cdot (\mathcal{F} v_L) \right): This term represents the Fourier differentiation of v_L, where multiplication by \frac{i 2\pi k}{D} in Fourier space corresponds to differentiation in physical space for each mode k; K here denotes the diagonal matrix of wavenumbers k. Higher-order derivatives can be computed by repeatedly applying the chain rule (e.g., u'' = (Q \circ v_L)'' = (v_L')^2 \, Q''(v_L) + Q'(v_L) \, v_L'').
  • Pros: Efficient and exact computation of the entire derivative field, especially for uniform grids, leveraging FFT.

4.2.6.4. Fourier Continuation

To apply Fourier differentiation accurately to non-periodic or non-smooth problems, PINO uses Fourier continuation. This technique embeds the problem domain into a larger, periodic space. This can be done by simply padding zeros to the input function. The loss is computed only in the original domain, and the FNO automatically learns to generate a smooth extension into the padded domain. This makes Fourier differentiation robust even for non-periodic boundary conditions.
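The sketch below shows only the mechanics of this idea under illustrative assumptions: zero-pad the grid values into a longer periodic domain, differentiate spectrally there, and restrict back to the original domain before computing any loss (in PINO itself the FNO learns a smooth extension into the padded region, which this toy example does not capture):

```python
import torch

# Fourier continuation by zero padding: extend a non-periodic signal into a larger
# periodic domain, differentiate spectrally, and use only the original-domain values.
n, pad = 128, 32
u = torch.randn(1, 1, n)                           # stand-in for an operator output

u_ext = torch.nn.functional.pad(u, (0, pad))       # extension by zero padding
k = torch.fft.fftfreq(n + pad, d=1.0 / (n + pad))  # integer wavenumbers on the extended domain
du_ext = torch.fft.ifft(1j * 2 * torch.pi * k * torch.fft.fft(u_ext)).real

du = du_ext[..., :n]                               # restrict back to the original domain
# Any PDE loss would be computed on `du` (original domain) only.
```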

4.2.7. Inverse Problem

PINO can also be applied to inverse problems, where the goal is to recover the input function a (e.g., a PDE coefficient) given an observed output solution function u^\dagger. The PDE loss is crucial here to ensure the recovered a is physically valid. PINO proposes two formulations:

4.2.7.1. Forward Operator Model

In this approach, PINO learns the forward operator \mathcal{G}_{\theta}: a \mapsto u from data. To solve the inverse problem for a given u^\dagger:

  1. An initial guess \hat{a} for the unknown input a^\dagger is made.
  2. \hat{a} is then iteratively optimized by minimizing the following loss function: \mathcal{T}_{\mathrm{forward}} := \mathcal{L}_{\mathrm{pde}}(\hat{a}, u^\dagger) + \mathcal{L}_{\mathrm{data}}(\mathcal{G}_{\theta}(\hat{a}), u^\dagger) + R(\hat{a}) . Here:
    • \mathcal{L}_{\mathrm{pde}}(\hat{a}, u^\dagger): The PDE loss, ensuring that the observed solution u^\dagger is consistent with the PDE if \hat{a} were the true input. This term constrains \hat{a} to a physically valid manifold.
    • \mathcal{L}_{\mathrm{data}}(\mathcal{G}_{\theta}(\hat{a}), u^\dagger): Measures the difference between the output of the learned forward operator \mathcal{G}_{\theta}(\hat{a}) and the observed output u^\dagger. This term ensures the recovered \hat{a} generates the observed u^\dagger through the learned forward dynamics.
    • R(\hat{a}): A regularization term on \hat{a} (e.g., total variation, smoothness prior) to guide the optimization and prevent ill-posedness.
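A minimal sketch of this optimization loop is given below; the frozen toy operator, the placeholder PDE residual, the total-variation weight, and the step counts are illustrative assumptions:

```python
import torch

# Forward-model approach to the inverse problem: freeze a learned forward operator,
# treat the unknown input a_hat as the optimization variable, and minimize
# PDE + data-misfit + regularization terms.
forward_op = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.GELU(),
                                 torch.nn.Linear(128, 64))
for p in forward_op.parameters():
    p.requires_grad_(False)                        # the operator stays fixed

u_obs = torch.randn(1, 64)                         # the observed solution u^dagger
a_hat = torch.zeros(1, 64, requires_grad=True)     # initial guess for the unknown input

def pde_loss(a, u):                                # placeholder for L_pde(a, u)
    return ((u - a) ** 2).mean()

optimizer = torch.optim.Adam([a_hat], lr=1e-2)
for _ in range(200):
    optimizer.zero_grad()
    data_misfit = ((forward_op(a_hat) - u_obs) ** 2).mean()
    tv_reg = (a_hat[:, 1:] - a_hat[:, :-1]).abs().mean()   # simple total-variation prior
    loss = pde_loss(a_hat, u_obs) + data_misfit + 0.01 * tv_reg
    loss.backward()
    optimizer.step()
```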

4.2.7.2. Inverse Operator Model

This approach directly learns an inverse operator \mathcal{F}_{\theta}: u \mapsto a from data. To solve the inverse problem for a given u^\dagger:

  1. The inverse operator \mathcal{F}_{\theta} is applied to the observed solution u^\dagger to get an initial approximation of a^\dagger, denoted \mathcal{F}_{\theta_0}(u^\dagger).
  2. The parameters \theta of \mathcal{F}_{\theta} are optimized using the following loss: \mathcal{T}_{\mathrm{backward}} := \mathcal{L}_{\mathrm{pde}}(\mathcal{F}_{\theta}(u^\dagger), u^\dagger) + \mathcal{L}_{\mathrm{op}}(\mathcal{F}_{\theta}(u^\dagger), \mathcal{F}_{\theta_0}(u^\dagger)) + R(\mathcal{F}_{\theta}(u^\dagger)) . Here:
    • \mathcal{L}_{\mathrm{pde}}(\mathcal{F}_{\theta}(u^\dagger), u^\dagger): The PDE loss, ensuring that the recovered input \mathcal{F}_{\theta}(u^\dagger) produces the observed output u^\dagger in a physically consistent manner.

    • \mathcal{L}_{\mathrm{op}}(\mathcal{F}_{\theta}(u^\dagger), \mathcal{F}_{\theta_0}(u^\dagger)): An anchor loss that keeps the fine-tuned inverse operator's output close to the initially learned inverse operator's output. This acts as regularization, leveraging the pre-trained inverse operator's knowledge.

    • R(\mathcal{F}_{\theta}(u^\dagger)): A regularization term on the recovered input a.

      The paper finds the inverse operator model to be more accurate for recovering the coefficient function in Darcy flow. This is because the inverse operator provides a better ansatz and regularization for the coefficient function.

5. Experimental Setup

The experiments conducted in the paper aim to evaluate the efficacy of PINO across various PDE families, focusing on its ability to generalize to higher resolutions, learn with limited data, solve complex dynamic systems, and tackle inverse problems.

5.1. Datasets

PINO is evaluated on three popular PDE families: Burgers' Equation, Darcy Flow, and Navier-Stokes Equation. Each represents different challenges (non-linearity, dimensionality, type of equation).

5.1.1. Burgers' Equation

  • Description: A 1-D non-linear PDE, often used as a simplified model for fluid flow and turbulence, with periodic boundary conditions. \begin{array}{rl} \partial_t u(x, t) + \partial_x (u^2(x, t) / 2) = \nu \partial_{xx} u(x, t), & \qquad x \in (0, 1), t \in (0, 1] \\ u(x, 0) = u_0(x), & \qquad x \in (0, 1) . \end{array} Here:
    • u(x, t): The unknown scalar field (e.g., fluid velocity).
    • x: The spatial dimension.
    • t: The time dimension.
    • \partial_t u: The time derivative of u.
    • \partial_x (u^2/2): The non-linear convective term.
    • \nu \partial_{xx} u: The viscous diffusion term.
    • u_0(x): The initial condition, a function in L_{\mathrm{per}}^2((0, 1); \mathbb{R}), meaning square-integrable and periodic on the interval (0, 1).
    • \nu = 0.01: The viscosity coefficient.
  • Task: Learn the solution operator \mathcal{G}^\dagger: u_0 \mapsto u|_{[0, 1]}, mapping the initial condition u_0 to the solution u over the time interval [0, 1].
  • Data Generation: 1,000 initial conditions u_0 \sim \mathcal{N}(0, 625(-\Delta + 25I)^{-2}) were used for training. These initial conditions are random functions drawn from a Gaussian process, typically leading to diverse and complex flow patterns.
  • Resolution: Training data was at 32 \times 25 (spatio-temporal) resolution. The PDE loss was imposed at 128 \times 100 resolution. A sketch of the corresponding PDE residual appears below.
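As an illustration of what the PDE loss looks like for this equation, the sketch below evaluates the Burgers residual \partial_t u + \partial_x(u^2/2) - \nu \partial_{xx} u on a space-time grid, using spectral derivatives in the periodic x direction and finite differences in t; the grid sizes and the random stand-in field are assumptions:

```python
import torch

# Burgers-equation residual on an (nt x nx) space-time grid.
# `u` is a placeholder for an operator's predicted solution u(t_i, x_j).
nu, nx, nt, T = 0.01, 128, 100, 1.0
u = torch.randn(nt, nx)

k = torch.fft.fftfreq(nx, d=1.0 / nx)                     # integer wavenumbers on (0, 1)
ikx = 1j * 2 * torch.pi * k

u_hat = torch.fft.fft(u, dim=-1)
u_x = torch.fft.ifft(ikx * torch.fft.fft(0.5 * u ** 2, dim=-1), dim=-1).real  # d/dx (u^2/2)
u_xx = torch.fft.ifft((ikx ** 2) * u_hat, dim=-1).real                        # d2u/dx2

dt = T / (nt - 1)
u_t = torch.gradient(u, spacing=dt, dim=0)[0]             # finite-difference time derivative

residual = u_t + u_x - nu * u_xx                          # vanishes for a true solution
pde_loss = (residual ** 2).mean()
```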

5.1.2. Darcy Flow

  • Description: A 2-D steady-state linear elliptic PDE describing fluid flow through porous media, defined on a unit square with Dirichlet boundary conditions (u(x) = 0 on the boundary). \begin{array}{rl} - \nabla \cdot (a(x) \nabla u(x)) = f(x) \quad & x \in (0, 1)^2 \\ u(x) = 0 \quad & x \in \partial (0, 1)^2 . \end{array} Here:
    • u(x): The unknown scalar field (e.g., pressure).
    • x: The spatial coordinates (x_1, x_2) in 2D.
    • \nabla \cdot: The divergence operator.
    • \nabla: The gradient operator.
    • a(x) \in L^\infty((0, 1)^2; \mathbb{R}_+): A piecewise constant diffusion coefficient (e.g., representing different material properties), which is the input function for the operator.
    • f = 1: A fixed forcing function.
  • Task: Learn the solution operator \mathcal{G}^\dagger: a \mapsto u, mapping the diffusion coefficient function a to the solution u. The operator is non-linear despite the PDE being linear in u.
  • Data Generation: 1,000 coefficient functions a \sim \mu, where \mu = \psi_{\#}\mathcal{N}(0, (-\Delta + 9I)^{-2}). The function \psi converts the Gaussian field into a piecewise constant coefficient: \psi(a(x)) = 12 if a(x) \geq 0; \psi(a(x)) = 3 if a(x) < 0. This setup generates inputs representing heterogeneous porous media.
  • Resolution: Training data was at 11 \times 11 spatial resolution. The PDE loss was imposed at 61 \times 61 resolution.

5.1.3. Navier-Stokes Equation

  • Description: A 2-D non-linear PDE describing the motion of a viscous, incompressible fluid, formulated in vorticity form on the unit torus (periodic boundary conditions). \begin{array}{rl} \partial_t w(x, t) + u(x, t) \cdot \nabla w(x, t) = \nu \Delta w(x, t) + f(x), & \quad x \in (0, l)^2, t \in (0, T] \\ \nabla \cdot u(x, t) = 0, & \quad x \in (0, l)^2, t \in [0, T] \\ w(x, 0) = w_0(x), & \quad x \in (0, l)^2 . \end{array} Here:
    • w(x, t): The unknown vorticity field.
    • u(x, t): The velocity field, related to the vorticity by w = \nabla \times u and \nabla \cdot u = 0.
    • \partial_t w: The time derivative of the vorticity.
    • u \cdot \nabla w: The non-linear convective term.
    • \nu \Delta w: The viscous diffusion term (\Delta is the Laplacian operator).
    • f(x): A fixed forcing function.
    • w_0(x): The initial vorticity condition, a function in L_{\mathrm{per}}^2((0, l)^2; \mathbb{R}).
    • \nu: The viscosity coefficient, related to the Reynolds number (Re).
  • Task: Learn the solution operator \mathcal{G}^\dagger: w_0 \mapsto w|_{[0, T]}, mapping the initial vorticity w_0 to the time-evolving vorticity field w.
  • Specific Problem Settings:
    • Long Temporal Transient Flow: Simulating the flow build-up from a near-zero initial velocity to an ergodic (steady-state) condition over a long time interval.
      • Parameters: T = 50, l = 1, Re = 20.
      • Data: w_0 \sim \mathcal{N}(0, 7^{3/2}(-\Delta + 49I)^{-2.5}). Forcing f(x) = 0.1(\sin(2\pi(x_1+x_2)) + \cos(2\pi(x_1+x_2))). 4,800 training instances.
    • Chaotic Kolmogorov Flow: Simulating turbulent-like flow in an attractor state.
      • Parameters: T = 0.125, 0.5, or 1, l = 2\pi, Re = 500.
      • Data: Initial conditions from a Gaussian random field \mathcal{N}(0, 7^{3/2}(-\Delta + 49I)^{-5/2}).
      • Resolution: Training data at 64 \times 64 \times 33 (spatio-temporal). PDE loss imposed at 256 \times 256 \times 65.
    • Lid Cavity Flow: Simulating fluid flow in a confined square domain with a moving top lid, imposing no-slip boundary conditions. This is a challenging non-periodic boundary condition problem.
      • Parameters: T = [5, 10], l = 1, Re = 500.
      • Boundary Conditions: u = (0, 0) at the left, bottom, and right walls; u = (1, 0) on the top.
      • Approach: PINO is used with the velocity-pressure formulation and Fourier numerical gradients with Fourier continuation, at resolution 65 \times 65 \times 50.

5.2. Evaluation Metrics

The paper primarily uses two quantitative metrics to assess model performance: the relative L_2 error for solution approximation and classification accuracy for inverse problems involving piecewise constant coefficients.

5.2.1. Relative L_2 Error

  1. Conceptual Definition: The relativeL_2error is a standard metric in numerical analysis and machine learning for PDEs that quantifies the normalized difference between a predicted solution and the true (ground-truth) solution. It measures the overall average difference between two functions, normalized by the magnitude of the true solution, making it a scale-independent measure of accuracy. A lower value indicates a more accurate prediction.
  2. Mathematical Formula: For a ground-truth function $u$ and its predicted approximation $\hat{u}$ (both in a function space where the $L_2$ norm is defined), the relative $L_2$ error is calculated as
$$
\text{Relative } L_2 \text{ Error} = \frac{\|u - \hat{u}\|_{L_2}}{\|u\|_{L_2}} .
$$
  3. Symbol Explanation:
    • $u$: The true, ground-truth solution function (e.g., from a high-fidelity numerical solver or real-world observation).
    • $\hat{u}$: The predicted solution function generated by the PINO model or a baseline.
    • $\|\cdot\|_{L_2}$: The $L_2$ norm of a function. For a function $f$ defined on a domain $D$, the $L_2$ norm is $\sqrt{\int_D |f(x)|^2 \, \mathrm{d}x}$. In discretized settings, this becomes $\sqrt{\sum_i |f(x_i)|^2 \, \Delta x}$, or simply $\sqrt{\sum_i |f_i|^2}$ up to normalization (see the discretized sketch after this list).
    • $u - \hat{u}$: The pointwise difference between the true and predicted solutions.
    • $\|u - \hat{u}\|_{L_2}$: The $L_2$ norm of the error function, quantifying the magnitude of the prediction error.
    • $\|u\|_{L_2}$: The $L_2$ norm of the true solution, used for normalization.
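
As a minimal illustration (an assumption, not taken from the paper's code), the discretized metric on a uniform grid reduces to a ratio of plain vector norms, since the grid spacing cancels between numerator and denominator:

```python
import numpy as np

def relative_l2_error(u_true, u_pred):
    """Relative L2 error between two fields sampled on the same uniform grid."""
    return np.linalg.norm(u_true - u_pred) / np.linalg.norm(u_true)
```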

5.2.2. Classification Accuracy

  1. Conceptual Definition: In the context of inverse problems, particularly for scenarios like Darcy flow where the coefficient function a(x) is piecewise constant (representing distinct material types), the problem of recovering a(x) can be framed as a classification task for each spatial point. Classification accuracy then measures the proportion of correctly identified material types across the domain.
  2. Mathematical Formula:
$$
\text{Accuracy} = \frac{\text{Number of Correctly Predicted Points}}{\text{Total Number of Points in Domain}}
$$
  3. Symbol Explanation:
    • Number of Correctly Predicted Points: The count of spatial grid points where the predicted coefficient $\hat{a}(x_i)$ matches the true coefficient $a^\dagger(x_i)$.
    • Total Number of Points in Domain: The total number of spatial grid points for which a coefficient prediction is made.
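
A minimal sketch of this metric for a two-phase coefficient field, assuming the (possibly continuous) predicted coefficient is assigned to a class by thresholding at the midpoint between the two material values; the threshold choice is an assumption made for illustration:

```python
import numpy as np

def classification_accuracy(a_true, a_pred, threshold=None):
    """Fraction of grid points whose material class is recovered correctly."""
    if threshold is None:
        threshold = 0.5 * (a_true.min() + a_true.max())   # midpoint between the two phases
    return np.mean((a_true > threshold) == (a_pred > threshold))
```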

5.3. Baselines

The effectiveness of PINO is benchmarked against several established and state-of-the-art methods in the field of ML for PDEs:

  • Fourier Neural Operator (FNO): This is the primary data-driven neural operator baseline [2, 15]. It serves as a direct comparison for PINO's operator learning capabilities, highlighting the impact of adding physics constraints and zero-shot super-resolution.
  • DeepONet: Another prominent neural operator framework [4]. It's used to show PINO's competitive or superior performance in learning solution operators. The paper mentions a grid search for its hyperparameters for Darcy flow.
  • Physics-Informed Neural Networks (PINNs): The foundational physics-informed method [14]. PINO is compared against PINNs to demonstrate its advantages in terms of optimization stability, speed, and ability to handle more complex, multi-scale, or long-time dynamic systems.
  • PINN Variants: The paper also compares against improved versions of PINNs:
    • LAAF-PINN (Locally Adaptive Activation Functions for PINN) [54]: This variant uses learnable parameters before activation functions to improve PINN optimization.
    • SA-PINN (Self-Adaptive PINN) [55]: This variant adds weight parameters for each collocation point to improve PINN optimization. These comparisons show that PINO's architectural advantages provide more significant improvements than localized optimization enhancements for PINNs.
  • UNet + Trilinear Interpolation: For evaluating zero-shot super-resolution, a standard UNet architecture (a convolutional neural network often used for image-to-image tasks) is trained and then combined with trilinear interpolation to upscale its predictions to higher resolutions. This baseline highlights that traditional image super-resolution techniques are insufficient for complex PDE dynamics compared to operator learning with physics constraints.
  • GPU-based Pseudo-spectral Solver: This is a high-fidelity numerical solver (e.g., [16] for Navier-Stokes) used as a ground-truth generator and a benchmark for computational speed (acceleration factor).
  • Accelerated Markov Chain Monte Carlo (MCMC): For inverse problems, conventional Bayesian methods like MCMC [17] are used as a reference. This comparison emphasizes PINO's significant speed advantage in recovering unknown parameters.

6. Results & Analysis

The experimental results demonstrate PINO's superior performance across various PDE families, particularly in zero-shot super-resolution, data efficiency, and solving complex flows, while maintaining computational speed-ups.

6.1. Core Results Analysis

6.1.1. Operator Learning with Physics Constraints (Super-resolution and Data Efficiency)

The first set of experiments highlights how integrating PDE loss during operator training (operator learning phase) enables zero-shot super-resolution and reduces data requirements.
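
As a rough sketch of this training setup (an assumption about the implementation, not the authors' code), the operator-learning objective combines a data term evaluated at the resolution of the available pairs with an equation term evaluated on an upsampled input; `model`, `pde_residual`, `upsample_input`, and the weights `alpha` and `beta` are placeholders:

```python
import torch

def hybrid_loss(model, a_lo, u_lo, pde_residual, upsample_input, alpha=1.0, beta=1.0):
    # Data term at the (low) resolution of the available training pairs.
    data_loss = torch.mean((model(a_lo) - u_lo) ** 2)
    # Physics term at a higher resolution: query the same operator on an upsampled input
    # and penalize its equation residual; no labeled solution is needed for this term.
    a_hi = upsample_input(a_lo)
    equation_loss = torch.mean(pde_residual(a_hi, model(a_hi)) ** 2)
    return alpha * data_loss + beta * equation_loss
```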

The following are the results from Table 1 of the original paper:

PDE               Training setting    Error at low data resolution   Error at 2x data resolution   Error at 4x data resolution
Burgers           Data                0.32±0.01%                     3.32±0.02%                    3.76±0.02%
Burgers           Data and PDE loss   0.17±0.01%                     0.28±0.01%                    0.38±0.01%
Darcy             Data                5.41±0.12%                     9.01±0.07%                    9.46±0.07%
Darcy             Data and PDE loss   5.23±0.12%                     1.56±0.05%                    1.58±0.06%
Kolmogorov flow   Data                8.28±0.15%                     8.27±0.15%                    8.30±0.15%
Kolmogorov flow   Data and PDE loss   6.04±0.12%                     6.02±0.12%                    6.01±0.12%

Analysis of Table 1:

  • Burgers' Equation: When trained only with data (Data Burgers), the model shows good accuracy at the training resolution (0.32%) but significantly degrades at higher resolutions (3.32% at 2x, 3.76% at 4x). However, by adding PDE loss (Data and PDE loss), PINO maintains high accuracy across all resolutions (0.17% at 1x, 0.28% at 2x, 0.38% at 4x). This clearly demonstrates PINO's ability for zero-shot super-resolution.

  • Darcy Flow: Similar trends are observed. The data-only model struggles at higher resolutions (9.01% at 2x, 9.46% at 4x), while PINO with PDE loss drastically improves performance at 2x and 4x resolutions (1.56% and 1.58%). This is particularly notable as Darcy flow was unresolved at the low training resolution, and the higher-resolution PDE loss helped the operator learn the correct physics.

  • Kolmogorov Flow: PINO with PDE loss consistently lowers the error across all resolutions (e.g., 6.01% at 4x) compared to the data-only approach (8.30% at 4x), indicating improved generalization.

    The following are the results from Table 2 of the original paper:

    Method Solution error
    DeepONet with data [4] 6.97 ± 0.09%
    PINO with data 1.22 ± 0.03%
    PINO w/o data 1.50 ± 0.03%

Analysis of Table 2 (Darcy Flow):

  • PINO with data significantly outperforms DeepONet (1.22% vs. 6.97%), demonstrating the advantage of the FNO backbone combined with physics constraints.

  • Remarkably, PINO w/o data (trained purely on PDE loss) still achieves excellent performance (1.50%), only slightly worse than PINO with data. This highlights PINO's ability to learn complex operators effectively even without any labeled solution data, solving the data scarcity issue of FNOs.

    The following are the results from Table 3 of the original paper:

    # data samples # PDE instances Solution error
    0 2,200 6.22%±0.11%
    800 2,200 6.01%±0.12%
    2,200 2,200 5.04%±0.11%

Analysis of Table 3 (Kolmogorov Flow, $T = 0.125$):

  • This table demonstrates PINO's flexibility and effectiveness in combining physics constraints with varying amounts of available data. Even with 0 data samples, PINO achieves a respectable 6.22% error.
  • As more low-resolution data is added (800 and 2,200 samples, alongside 2,200 PDE instances), the solution error consistently decreases (6.01% to 5.04%). This indicates that data still provides stronger supervision and an easier optimization landscape, but the physics constraints provide a strong baseline and allow for effective learning when data is scarce.

6.1.2. Solving Equation Using Operator Ansatz (Accuracy and Speedup)

6.1.2.1. Chaotic Kolmogorov Flow

The paper compares PINO's instance-wise fine-tuning performance against PINN and its improved variants.

The following figure (Figure 4 from the original paper) shows the test relative $L_2$ error versus runtime step for the Kolmogorov flow:

Analysis of Figure 4:

  • The plot shows that PINO (blue and green lines) converges much faster and to significantly lower errors than PINN, LAAF-PINN, and SA-PINN. PINO achieves very low errors within hundreds of runtime steps, whereas PINN variants struggle to reach similar accuracy even after thousands of steps.

  • This demonstrates the advantage of using a pre-trained neural operator ansatz for instance-wise fine-tuning. The pre-trained operator provides a much better starting point for optimization, leading to faster convergence and higher accuracy compared to PINN-type methods that start from scratch or rely on localized optimization improvements.

    The following are the results from Table 4 of the original paper:

    Method # data samples # PDE instances Solution error (w) Time cost
    PINNs - - 18.7% 4,577 s
    PINO 0 0 0.9% 608 s
    PINO 0.4 k 0 0.9% 536 s
    PINO 0.4 k 160 k 0.9% 473 s

Analysis of Table 4 (Kolmogorov Flow, $Re = 500$, $T = 0.5$):

  • PINO consistently achieves a significantly lower solution error (0.9%) compared to PINNs (18.7%). This is a 20x reduction in error.
  • PINO also exhibits a substantial speedup, with time costs ranging from 473s to 608s, compared to 4,577s for PINNs. This represents approximately a 7x to 9x speedup.
  • The table shows that even PINO trained with 0 data and 0 PDE instances for this specific test case (implying it relies purely on the pre-trained operator and then fine-tunes on the instance's PDE loss) performs dramatically better than PINNs. Adding some data and more PDE instances during the initial operator learning phase further reduces the fine-tuning time. This strongly supports the hypothesis that using a learned operator as an ansatz improves fine-tuning convergence.
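
A schematic of the instance-wise fine-tuning loop these comparisons refer to (a sketch under assumptions, not the authors' implementation): the pre-trained operator's weights are updated on a single new instance using only a physics-based loss, with an optional anchor term toward the pre-trained prediction; `instance_loss`, the step count, and the learning rate are placeholders:

```python
import copy
import torch

def finetune_on_instance(model, w0, instance_loss, steps=500, lr=1e-3, lam_anchor=0.0):
    """Fine-tune a pre-trained operator `model` on a single PDE instance with input w0.
    `instance_loss(w0, w_pred)` bundles the equation residual and initial/boundary terms."""
    anchor = copy.deepcopy(model)                 # frozen copy of the pre-trained operator
    for p in anchor.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        w_pred = model(w0)                        # predicted solution for this single instance
        loss = instance_loss(w0, w_pred)          # physics/constraint loss; no labels needed
        if lam_anchor > 0:
            # Optional anchor term keeping the solution close to the pre-trained prediction.
            loss = loss + lam_anchor * torch.mean((w_pred - anchor(w0)) ** 2)
        loss.backward()
        opt.step()
    return model
```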

6.1.2.2. Zero-Shot Super-Resolution

The figure below (Figure 1 from the original paper) shows the spectral energy distribution of Kolmogorov flows:

The plot shows the spectral energy distribution of Kolmogorov flows: NN + interpolation (red), FNO (blue), and PINO combining data and PDE loss (green), together with the ground truth (dashed line); arrows mark the training and testing regions.

Analysis of Figure 1:

  • This figure visually confirms PINO's zero-shot super-resolution capability. The graph shows the energy spectrum, where higher frequencies correspond to finer details in the flow.
  • NN + Interpolation (red line) shows severe distortions at higher frequencies (beyond the training resolution, indicated by the arrow), indicating poor extrapolation.
  • FNO (blue line) follows the general trend of the ground truth but cannot perfectly match the spectrum in the super-resolution regime, indicating its data-driven limitation.
  • PINO (green line), especially with test-time optimization, perfectly extrapolates to unseen higher frequencies, and its spectrum almost perfectly overlaps with the Ground Truth (dotted line). This is a strong validation of PINO's ability to learn high-fidelity operators by imposing PDE constraints at higher resolutions.
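
For reference, a radially binned energy spectrum of the kind plotted in Figure 1 can be computed from a single 2-D snapshot as sketched below (a standard construction, assumed rather than taken from the paper's code):

```python
import numpy as np

def energy_spectrum(u):
    """u: (s, s) real field on a periodic grid; returns E(k) for k = 0..s//2."""
    s = u.shape[0]
    u_hat = np.fft.fft2(u) / (s * s)                 # normalized Fourier coefficients
    energy = 0.5 * np.abs(u_hat) ** 2                # energy per Fourier mode
    k = np.fft.fftfreq(s, d=1.0 / s)                 # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k_mag = np.rint(np.sqrt(kx ** 2 + ky ** 2)).astype(int)
    spectrum = np.zeros(s // 2 + 1)
    for kk in range(s // 2 + 1):                     # sum energy over shells |k| = kk
        spectrum[kk] = energy[k_mag == kk].sum()
    return spectrum
```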

6.1.2.3. Transfer Reynolds Numbers

The following figure (Figure 8 from the original paper) shows the plot of relative $L_2$ error versus update step for the Kolmogorov flow with Reynolds number 500, $T = 1$:

Analysis of Figure 8:

  • This plot demonstrates PINO's transfer learning capability across different Reynolds numbers. The lines represent fine-tuning starting from operator ansatzes trained on various Reynolds numbers (e.g., 100, 200, etc.) compared to from scratch (no pre-training).

  • All pre-trained operator ansatzes (colored lines) lead to faster convergence and generally lower errors during instance-wise fine-tuning compared to starting from scratch (orange line). This indicates that PINO learns general dynamics shared across different Reynolds numbers, allowing for efficient transfer.

    The following are the results from Table 8 of the original paper:

    Testing Re From scratch 100 200 250 300 350 400 500
    500 0.0493 0.0383 0.0393 0.0315 0.0477 0.0446 0.0434 0.0436
    400 0.0296 0.0243 0.0245 0.0244 0.0300 0.0271 0.0273 0.0240
    350 0.0192 0.0210 0.0211 0.0213 0.0233 0.0222 0.0222 0.0212
    300 0.0168 0.0161 0.0164 0.0151 0.0177 0.0173 0.0170 0.0160
    250 0.0151 0.0150 0.0153 0.0151 0.016 0.0156 0.0160 0.0151
    200 0.00921 0.00913 0.00921 0.00915 0.00985 0.00945 0.00923 0.00892
    100 0.00234 0.00235 0.00236 0.00235 0.00239 0.00239 0.00237 0.00237

Analysis of Table 8:

  • This table quantitatively supports the transfer learning capability. Each row represents testing on a specific Reynolds number (Re), while columns show the error when starting fine-tuning from an operator pre-trained on a different Re (or from scratch).
  • For most Testing Re values, starting fine-tuning from a pre-trained ansatz (any column other than From scratch) yields lower relative $L_2$ errors compared to starting From scratch. For example, for Testing $Re = 500$, starting From scratch gives 0.0493 error, while starting from $Re = 250$ gives 0.0315 error.
  • This indicates that the learned operator captures underlying fluid dynamics principles that are transferable across a range of Reynolds numbers, rather than just memorizing solutions for a specific Re. This is a crucial property for practical applications where exact training conditions might not always match deployment conditions.

6.1.2.4. Lid Cavity Flow

The following figure (Figure 5 from the original paper) shows PINO on Kolmogorov flow (left) and Lid-cavity flow (right):

Analysis of Figure 5 (right):

  • The Lid-cavity flow is a challenging problem due to its non-periodic boundary conditions and the need to solve for velocity and pressure fields. PINO demonstrates its capability to handle this by accurately predicting the ground truth velocity field.
  • It achieves a relative error of 14.52% in 2 minutes, using a velocity-pressure formulation and Fourier numerical gradient with Fourier continuation. This indicates PINO's flexibility to address problems with complex boundary conditions and multiple output fields, even without a prior operator-learning phase (directly instance-wise fine-tuning).

6.1.2.5. Convergence of Accuracy with Respect to Resolution

The following are the results from Table 5 of the original paper:

dt \ dx   2^-6     2^-7     2^-8     2^-9     2^-10
2^-4      0.4081   0.3150   0.3149   0.3179   0.3196
2^-5      0.1819   0.1817   0.1780   0.1773   0.1757
2^-6      0.0730   0.0436   0.0398   0.0386   0.0382
2^-7      0.0582   0.0234   0.0122   0.0066   0.0034

Analysis of Table 5:

  • This table shows the relative $L_2$ error of PINO (in instance-wise optimization mode without data) on Kolmogorov flow as the spatial resolution ($dx = 2^{-6}$ to $2^{-10}$) and the temporal resolution ($dt = 2^{-4}$ to $2^{-7}$) are varied.
  • The results demonstrate that PINO inherits the convergence rate of its underlying differentiation methods (Fourier method in space, finite difference in time) without significant limitations from the optimization process.
  • As $dx$ decreases (finer spatial resolution), the error decreases rapidly, consistent with the spectral accuracy of the Fourier differentiation (e.g., for $dt = 2^{-7}$, the error drops from 0.0582 to 0.0034).
  • As $dt$ decreases (finer temporal resolution), the error decreases at the rate of the finite-difference time discretization (e.g., for $dx = 2^{-10}$, the error drops from 0.3196 to 0.0034).
  • This indicates that PINO can achieve high accuracy by simply increasing the resolution, confirming its discretization convergence and that the PDE constraint can yield results comparable to those from a solver, especially when applied to an unlimited amount of virtual instances.

6.1.3. Inverse Problem

The following figure (Figure 6 from the original paper) shows the inversion process for Darcy Flow:

Analysis of Figure 6:

  • This figure clearly illustrates the importance of PDE constraints in inverse problems.

  • (6(a)) Ground Truth Input: $a^\dagger$ is the target coefficient function, and (6(d)) Output: $u^\dagger$ is the observed solution.

  • (6(b)) Inversion with Data Constraint Only: The recovered input $\hat{a}$ is visually inaccurate, even though the generated output (6(e)) is close to $u^\dagger$. This means a data-only model can find an $\hat{a}$ that looks like it produces the correct $u^\dagger$ but is physically invalid (i.e., it does not satisfy the PDE). This highlights the domain-shift problem in optimization-based inverse problems.

  • (6(c)) Inversion with Data and PDE Constraints: The recovered input $\hat{a}$ is visually much closer to the ground truth $a^\dagger$, and the generated output (6(f)) is also close to $u^\dagger$. The PDE constraints restrict the search space to physically valid inputs, leading to a more accurate and meaningful inverse solution (a schematic inversion loop of this kind is sketched after this list).

    The following figure (Figure 7 from the original paper) shows the comparison of solutions generated by different methods:

Analysis of Figure 7:

  • This figure provides a qualitative comparison of the recovered coefficient functions for the Darcy inverse problem.

  • Ground Truth (top left) is the target.

  • PINO Forward Model and PINO Inverse Model both produce visually plausible reconstructions.

  • Solver + MCMC also produces a reasonable reconstruction.

  • The visual comparison suggests that PINO Inverse Model (which achieved 97.10% classification accuracy) is the most accurate.

Quantitative Comparison of Inverse Models:

  • PINO Inverse Model: Achieves 2.29% relative $L_2$ error on the output $u$ and 97.10% classification accuracy on the input $a$.
  • PINO Forward Model: Achieves 6.43% error on $u$ and 95.38% accuracy on $a$.
  • Conventional Solvers (Accelerated MCMC): Achieves 4.52% error on $u$ and 90.30% accuracy on $a$.

Key Findings:

  • The inverse operator model (learning $\mathcal{F}_\theta: u \mapsto a$) performs best, outperforming the forward operator model and MCMC. The pre-trained inverse operator acts as an effective ansatz and regularization.
  • PINO methods are drastically faster than MCMC (approx. 3000x speedup), with offline training taking around 1 hour on a single GPU. PINN, in contrast, failed to converge in this case.
  • This demonstrates that PINO offers a highly efficient and accurate solution for inverse problems, providing physically valid results due to the integrated PDE loss.
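
The constrained inversion referred to above can be sketched as follows (a hypothetical loop, not the authors' code): the unknown coefficient field is optimized directly through a frozen pre-trained forward operator, with the PDE residual keeping the recovered input physically valid; `forward_op`, `darcy_residual`, and the weights are placeholders:

```python
import torch

def invert(u_obs, forward_op, darcy_residual, steps=500, lr=1e-2, lam_pde=1.0):
    a = torch.randn_like(u_obs, requires_grad=True)        # unknown coefficient field, optimized directly
    opt = torch.optim.Adam([a], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        u_pred = forward_op(a)                              # frozen operator: coefficient -> solution
        data_loss = torch.mean((u_pred - u_obs) ** 2)       # match the observed solution
        pde_loss = torch.mean(darcy_residual(a, u_pred) ** 2)  # keep (a, u) physically consistent
        loss = data_loss + lam_pde * pde_loss
        loss.backward()
        opt.step()
    return a.detach()
```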

6.2. Data Presentation (Tables)

The following are the results from Table 6 of the original paper:

Test resolution   FNO          PINO
64 × 64 × 33      9.73±0.15%   6.30±0.11%
128 × 128 × 33    9.74±0.16%   6.28±0.11%
256 × 256 × 65    9.84±0.16%   6.22±0.11%

Analysis of Table 6 (Kolmogorov Flow, training on $32 \times 32 \times 17$):

  • This table further emphasizes PINO's super-resolution capability, showing performance when both FNO and PINO are trained on very low-resolution data ($32 \times 32 \times 17$).

  • FNO shows consistent but high errors across all test resolutions (around 9.7-9.8%), indicating its struggle to generalize to higher resolutions when trained on coarse data.

  • PINO, however, maintains significantly lower errors (around 6.2-6.3%) across all test resolutions, including resolutions much higher than its training data ($256 \times 256 \times 65$). This reinforces the conclusion that integrating PDE loss at higher resolutions enables PINO to learn the physics and extrapolate to unseen frequencies effectively.

    The following are the results from Table 7 of the original paper:

    # data samples   # additional PDE instances   Resolution        Solution error   Equation error
    400              0                            128 × 128 × 65    33.32%           1.8779
    400              0                            64 × 64 × 65      33.31%           1.8830
    400              0                            32 × 32 × 33      30.61%           1.8421
    400              40 k                         128 × 128 × 65    31.74%           1.8179
    400              40 k                         64 × 64 × 65      31.72%           1.8227
    400              40 k                         32 × 32 × 33      29.60%           1.8296
    400              160 k                        128 × 128 × 65    31.32%           1.7840
    400              160 k                        64 × 64 × 65      31.29%           1.7864
    400              160 k                        32 × 32 × 33      29.28%           1.8524
    4 k              0                            128 × 128 × 65    25.15%           1.8223
    4 k              0                            64 × 64 × 65      25.16%           1.8257
    4 k              0                            32 × 32 × 33      21.41%           1.8468
    4 k              100 k                        128 × 128 × 65    24.15%           1.6112
    4 k              100 k                        64 × 64 × 65      24.11%           1.6159
    4 k              100 k                        32 × 32 × 33      20.85%           1.8251
    4 k              400 k                        128 × 128 × 65    24.22%           1.4596
    4 k              400 k                        64 × 64 × 65      23.95%           1.4656
    4 k              400 k                        32 × 32 × 33      20.10%           1.9146
    0                100 k                        128 × 128 × 65    74.36%           0.3741
    0                100 k                        64 × 64 × 65      74.38%           0.3899
    0                100 k                        32 × 32 × 33      74.14%           0.5226

Analysis of Table 7 (Kolmogorov Flow, $Re = 500$, $T = 0.5$):

  • This table investigates the effect of additional PDE instances (virtual data points generated purely by the PDE constraints) on solution error and equation error (the PDE residual).
  • Impact of PDE Instances: Comparing rows with the same number of data samples but increasing PDE instances (e.g., 400 data vs. 400 data + 40k PDE vs. 400 data + 160k PDE), the solution error consistently decreases, and the equation error generally decreases or remains stable. This indicates that adding PDE constraints helps improve the generalization ability of the operator.
  • Impact of Data Samples: Increasing the number of data samples from 400 to 4k leads to a significant reduction in solution error (e.g., from ~33% to ~25% at 128x128x65 resolution for 0 PDE instances). This shows that while physics constraints are powerful, supervised data still provides stronger direct guidance.
  • Pure PDE Training (0 data, 100k PDE): This scenario yields a very high solution error (around 74%) but a very low equation error (around 0.37-0.52). This is an interesting trade-off: the model nearly satisfies the PDE (low equation error), yet its solution can be far from the ground truth (high solution error). This happens when the problem is under-constrained, i.e., the PDE loss alone is not sufficient to pinpoint the unique solution operator in the absence of any data-driven guidance. PINO's strength lies in combining both sources of supervision to balance this trade-off.

6.3. Ablation Studies / Parameter Analysis

While the paper doesn't present classical ablation studies (removing components of the PINO architecture), it implicitly performs several analyses that serve a similar purpose by demonstrating the incremental value of its components:

  • Impact of PDE Loss (vs. Data-only FNO): Tables 1, 2, 3, 6 directly compare Data-only FNO (or conceptually, a data-only operator) against PINO (Data + PDE loss). This clearly shows that PDE loss is crucial for super-resolution, data efficiency, and overall generalization accuracy.

  • Impact of Number of PDE Instances: Table 7 shows that increasing additional PDE instances (virtual data points) consistently improves the solution error of the operator, demonstrating the value of physics constraints as a form of "synthetic data."

  • Impact of Instance-Wise Fine-Tuning: The comparisons in Figure 4 and Table 4 (PINO vs. PINN variants) and the results on Long Temporal Transient Flow and Chaotic Kolmogorov Flow highlight that using a pre-trained operator ansatz and performing instance-wise fine-tuning significantly boosts accuracy and convergence speed compared to starting from scratch. The anchor loss $\mathcal{L}_{\mathrm{op}}$ also plays a role here in stabilizing fine-tuning.

  • Transfer Learning Across Reynolds Numbers: Table 8 and Figure 8 analyze the effect of using pre-trained operator ansatzes from different Reynolds numbers, essentially demonstrating the reusability and generalizability of the learned operators.

    These analyses collectively validate the design choices of PINO, showing that the integration of physics constraints, multi-resolution training, and the two-phase learning scheme are effective.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper successfully introduces the Physics-Informed Neural Operator (PINO) framework, which thoughtfully bridges the gap between physics-informed optimization (PINNs) and data-driven neural operator learning (FNOs). PINO's core innovation lies in its hybrid learning approach, which synergistically combines available training data (even low-resolution) with rigorous physics constraints derived from the governing PDEs. The framework's two-phase learning strategy—an operator learning phase followed by an instance-wise fine-tuning phase—enables it to learn a robust and generalizable solution operator across parametric PDE families.

Key findings demonstrate PINO's ability to:

  1. Achieve Zero-Shot Super-Resolution: Accurately predict solutions at resolutions significantly higher than the training data by imposing PDE constraints at higher resolutions.

  2. Operate with High Data Efficiency: Learn complex operators effectively with few to no labeled training data, addressing a major limitation of purely data-driven methods.

  3. Outperform Baselines: Consistently yield higher accuracy and faster convergence compared to PINNs and FNOs on diverse and challenging PDEs, including Burgers', Darcy, and Navier-Stokes equations (e.g., complex turbulent flows and long temporal transients).

  4. Solve Inverse Problems Efficiently: Apply to inverse problems with high accuracy and remarkable speed-ups (e.g., 3000x faster than MCMC), ensuring physically valid solutions through PDE loss.

  5. Demonstrate Transferability: Generalize across different parameters (e.g., Reynolds numbers) and handle complex boundary conditions (e.g., Lid-cavity flow), enhancing its practical utility.

    Overall, PINO presents a compelling advancement in ML for PDEs, offering a versatile tool that combines the best aspects of data-driven and physics-informed paradigms.

7.2. Limitations & Future Work

The authors acknowledge several limitations and suggest exciting directions for future research:

7.2.1. Limitations

  • Scalability to Higher Dimensions: Since PINO is currently implemented with an FNO backbone that heavily relies on FFT, extending it to very high-dimensional problems (e.g., 3D+ spatial domains) can be challenging due to the computational cost and memory requirements of FFT in higher dimensions.
  • Optimization Efficiency in Fine-Tuning: While instance-wise fine-tuning improves accuracy, the authors note that gradient descent methods may not converge as fast as using finer grids. This suggests that the optimization landscape during fine-tuning could still be challenging, and more advanced optimization techniques might be needed.
  • Trade-off of Accuracy and Complexity: The paper implies an ongoing challenge in balancing the computational complexity of the model with the desired accuracy, especially when pushing for very high resolutions or long time horizons.

7.2.2. Future Work

  • Transferring PINN Techniques: Explore how various techniques and analyses developed for PINNs (e.g., adaptive weighting strategies, domain decomposition methods) can be effectively transferred and integrated into the PINO framework.
  • Overcoming Trade-offs: Investigate methods to mitigate the hard trade-off between accuracy and computational complexity, perhaps through more efficient architectures or dynamic resolution adaptation.
  • Generalization Across Geometries: Research how to make PINO models transfer effectively across different geometries, which is a major challenge for many ML-based PDE solvers. Fourier continuation is a step in this direction but more generalized approaches are needed.
  • Software Library Development: Develop a software library of pre-trained PINO models, which would make the technology more accessible and applicable for a broad set of conditions, leveraging PINO's excellent extrapolation property.

7.3. Personal Insights & Critique

7.3.1. Personal Insights

The PINO paper offers a truly significant step forward in the field of ML for PDEs. The most impactful insight is the elegant integration of physics constraints at higher resolutions with low-resolution data within an operator learning framework. This multi-resolution hybrid loss is a powerful concept that effectively tackles the zero-shot super-resolution problem, a critical challenge for data-driven models. The idea of using a pre-trained neural operator as an ansatz for instance-wise fine-tuning is also brilliant; it effectively transforms the notoriously difficult optimization problem of PINNs into a much more tractable one. This two-phase approach allows for broad generalization while retaining high accuracy for specific problems.

The application to inverse problems is particularly compelling. By embedding the PDE loss, PINO ensures that inferred parameters are physically valid, avoiding the pitfalls of purely data-driven inversion that might yield non-physical results. The demonstrated 3000x speedup over MCMC highlights its potential to revolutionize scientific discovery and engineering design.

The claim that PINO performs "optimization in the space of functions" rather than "point-wise optimization" (like PINNs) hints at a deeper theoretical advantage. While the paper provides practical evidence, further theoretical exploration of how the operator architecture intrinsically alters the optimization landscape could be a fascinating avenue.

7.3.2. Critique

Despite its strengths, some aspects could benefit from further clarification or exploration:

  • Balancing Hybrid Loss: The hyperparameters ($\alpha$, $\beta$, $\lambda$) that balance the data loss and PDE loss (and potentially the anchor loss) are crucial. While mentioned, the paper does not delve deeply into robust strategies for tuning them. The interplay between these loss terms, especially when data is scarce or at very low resolution, could be sensitive.

  • Computational Cost of High-Resolution PDE Loss: While beneficial for super-resolution, calculating the PDE loss at high resolutions (e.g., $256 \times 256 \times 65$ for Kolmogorov flow) can still be computationally intensive during the operator learning phase. The trade-off between the resolution of the PDE constraints and the overall training time is an important practical consideration.

  • Theoretical Justification for Optimization Landscape Improvement: The statement "Optimization of the set of coefficients and basis is easier than just optimizing a single function as in PINNs" for the operator ansatz is empirically supported but could benefit from more formal mathematical analysis regarding the smoothness or convexity properties of the operator's loss landscape compared to a traditional neural network's loss landscape.

  • Generalization to Arbitrary Geometries: While Fourier continuation is a clever trick for non-periodic boundaries, FNO (and thus PINO) still struggles with truly arbitrary, complex geometries. The paper mentions "neural operators with deformations for PDEs on general geometries" [9] as a related work, suggesting this is an active research area. PINO's inherent reliance on Fourier transforms makes it less naturally suited for highly irregular domains compared to, for example, Graph Neural Operators or mesh-based neural networks.

  • Interpretation of "Equation Error" vs. "Solution Error": In Table 7, for the "0 data, 100k PDE instances" case, there's a very low equation error but a very high solution error. This indicates that the model perfectly satisfies the PDE but is far from the true solution. This is a crucial point: simply satisfying the PDE doesn't guarantee the correct solution without sufficient constraints (e.g., boundary/initial conditions, or direct data to define the specific solution within the family of PDE solutions). While PINO balances this, it highlights a potential pitfall of purely physics-informed approaches when the problem is ill-posed or under-constrained in specific scenarios.

    The potential for PINO in areas like weather forecasting, airfoil designs, and turbulence control is immense. Its ability to learn general physical laws from minimal data and extrapolate to high fidelity opens doors for faster, more accurate, and more accessible scientific simulations.
