
HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions

Published: September 5, 2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

HyPINO is a multi-physics neural operator for zero-shot generalization across diverse PDEs without task-specific fine-tuning; it combines a Swin Transformer-based hypernetwork with mixed supervision and achieves strong accuracy on standard benchmarks.

Abstract

We present HyPINO, a multi-physics neural operator designed for zero-shot generalization across a broad class of PDEs without requiring task-specific fine-tuning. Our approach combines a Swin Transformer-based hypernetwork with mixed supervision: (i) labeled data from analytical solutions generated via the Method of Manufactured Solutions (MMS), and (ii) unlabeled samples optimized using physics-informed objectives. The model maps PDE parameterizations to target Physics-Informed Neural Networks (PINNs) and can handle linear elliptic, hyperbolic, and parabolic equations in two dimensions with varying source terms, geometries, and mixed Dirichlet/Neumann boundary conditions, including interior boundaries. HyPINO achieves strong zero-shot accuracy on seven benchmark problems from PINN literature, outperforming U-Nets, Poseidon, and Physics-Informed Neural Operators (PINO). Further, we introduce an iterative refinement procedure that treats the residual of the generated PINN as a "delta PDE" and performs another forward pass to generate a corrective PINN. Summing their contributions and repeating this process forms an ensemble whose combined solution progressively reduces the error on six benchmarks and achieves a >100x lower $L_2$ loss in the best case, while retaining forward-only inference. Additionally, we evaluate the fine-tuning behavior of PINNs initialized by HyPINO and show that they converge faster and to lower final error than both randomly initialized and Reptile-meta-learned PINNs on five benchmarks, performing on par on the remaining two. Our results highlight the potential of this scalable approach as a foundation for extending neural operators toward solving increasingly complex, nonlinear, and high-dimensional PDE problems. The code and model weights are publicly available at https://github.com/rbischof/hypino.

In-depth Reading

1. Bibliographic Information

1.1. Title

HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions

1.2. Authors

  • Rafael Bischof (Computational Design Lab, ETH Zurich, Switzerland)

  • Michal Piovari (Computational Design Lab, ETH Zurich, Switzerland)

  • Michael A. Kraus (Institute of Structural Mechanics and Design, TU Darmstadt, Germany)

  • Siddhartha Mishra (Seminar for Applied Mathematics, ETH Zurich, Switzerland)

  • Bernd Bickel (Computational Design Lab, ETH Zurich, Switzerland)

    The corresponding author is Rafael Bischof (rabischof@ethz.ch). The authors are affiliated with well-known academic institutions, primarily ETH Zurich, a prestigious research university in Switzerland, and TU Darmstadt in Germany, indicating a strong academic and research-oriented background in computational science, applied mathematics, and machine learning.

1.3. Journal/Conference

The paper is published on arXiv, a preprint server, as version v4. While arXiv itself is not a peer-reviewed journal or conference, it is a widely recognized platform for disseminating early research findings in fields including machine learning and computational physics. The listing records a publication timestamp of 2025-09-05T13:59:25 UTC; as a preprint, the work may still be under peer review or awaiting formal publication elsewhere.

1.4. Publication Year

The arXiv listing indicates a publication date of 2025-09-05, so this is a very recent work.

1.5. Abstract

This paper introduces HyPINO, a multi-physics neural operator that leverages a Swin Transformer-based hypernetwork with a unique mixed supervision strategy. The model is designed for zero-shot generalization across diverse Partial Differential Equations (PDEs), including linear elliptic, hyperbolic, and parabolic types in two dimensions, with varying source terms, geometries, and mixed boundary conditions. The training data combines labeled analytical solutions generated by the Method of Manufactured Solutions (MMS) and unlabeled samples optimized using physics-informed objectives. HyPINO consistently outperforms existing baselines like U-Nets, Poseidon, and PINO on seven benchmark problems. A novel iterative refinement procedure is also proposed, which progressively reduces error by generating corrective PINNs based on residual "delta PDEs," achieving significant $L_2$ loss reductions. Furthermore, PINNs initialized by HyPINO demonstrate faster convergence and lower final errors during fine-tuning compared to randomly initialized or Reptile-meta-learned PINNs. The authors suggest that this scalable approach holds promise for tackling more complex, nonlinear, and high-dimensional PDE problems, aiming to serve as a foundation for advanced neural operators.

Official Source: https://arxiv.org/abs/2509.05117v4 PDF Link: https://arxiv.org/pdf/2509.05117v4.pdf Publication Status: Preprint on arXiv (version v4), dated September 5, 2025.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper addresses is the limitation of existing neural operators in solving Partial Differential Equations (PDEs), particularly their sample inefficiency and tendency to generalize only within narrowly defined problem families. Current neural operators often require large amounts of labeled data, typically generated by expensive high-fidelity solvers, and struggle to handle simultaneous variations in PDE operators, geometries, and boundary conditions without extensive fine-tuning. While physics-informed losses offer a way to reduce reliance on labeled data by providing self-supervision, purely physics-based training can be unstable and suffer from spectral bias or mode collapse. This creates a significant bottleneck in developing general-purpose, foundational, and multi-physics simulators capable of handling a broad range of scientific computing tasks.

The problem is important because PDEs are fundamental to modeling phenomena in nearly all scientific and engineering disciplines. Efficient and generalized PDE solvers are crucial for accelerating scientific discovery, engineering design, and simulation. The current gaps in research include the inability of neural operators to generalize widely across diverse PDE types and configurations (e.g., varying operators, source terms, geometries, and boundary conditions simultaneously) and the heavy data requirements for training them.

The paper's entry point or innovative idea is to combine physics-informed learning with a scalable synthetic data pipeline generated via the Method of Manufactured Solutions (MMS). This hybrid approach, coupled with a Swin Transformer-based hypernetwork, aims to achieve zero-shot generalization across a broad class of PDEs and provide robust PINN initializations without extensive task-specific fine-tuning.

2.2. Main Contributions / Findings

The paper makes several primary contributions:

  1. A hybrid physics-informed and supervised learning framework for multi-physics PDE solving: The HyPINO model integrates a Swin Transformer hypernetwork to generate PINN weights, allowing it to adapt to various PDE configurations. It uniquely combines physics-informed losses (for unlabeled data) with supervised losses (for MMS-generated analytical solutions), enabling broad generalization.

  2. A scalable data generation pipeline combining random physics sampling with MMS-based supervised examples: This pipeline efficiently creates a diverse dataset of PDE instances, encompassing linear elliptic, hyperbolic, and parabolic equations in 2D, with varying source terms, geometries (including interior boundaries), and mixed Dirichlet/Neumann boundary conditions. This addresses the data bottleneck faced by many neural operator approaches.

  3. An ensemble-based refinement mechanism for improved accuracy: An iterative refinement procedure is introduced where the residual error of a generated PINN is treated as a "delta PDE," which then prompts the hypernetwork to generate a corrective PINN. Summing these contributions forms an ensemble that progressively reduces prediction error, offering a lightweight alternative to traditional fine-tuning while retaining forward-only inference.

    The key conclusions and findings are:

  • Strong zero-shot accuracy: HyPINO achieves superior zero-shot generalization performance on seven diverse PDE benchmarks, outperforming baselines like U-Nets, Poseidon, and PINO.
  • Significant error reduction through iterative refinement: The proposed refinement procedure substantially reduces $L_2$ loss (over 100x in the best case) on most benchmarks, demonstrating its effectiveness in improving prediction accuracy. This mechanism is also shown to be generic and applicable to other physics-informed neural operators.
  • Efficient PINN initialization for fine-tuning: PINNs initialized with HyPINO weights converge faster and to lower final errors during subsequent fine-tuning compared to randomly initialized or Reptile-meta-learned PINNs. This highlights HyPINO's utility as a robust initialization strategy. These findings collectively demonstrate that HyPINO provides a scalable and data-efficient approach towards developing more general-purpose neural operators for multi-physics problems and potentially serves as a foundation for "world-model predictors."

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

Partial Differential Equations (PDEs)

Partial Differential Equations (PDEs) are mathematical equations that involve an unknown function of several independent variables and its partial derivatives with respect to those variables. They are used to model a wide range of physical phenomena, such as heat conduction, wave propagation, fluid flow, and electromagnetism.

  • Example: The heat equation $\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$ describes how temperature $u$ changes over time $t$ and space $x$.
  • Types:
    • Elliptic PDEs: Describe steady-state phenomena (e.g., equilibrium, time-independent problems). The Poisson equation ($\Delta u = f$) and Laplace equation ($\Delta u = 0$) are common examples, often found in electrostatics or steady-state heat distribution.
    • Parabolic PDEs: Describe time-dependent diffusion processes. The heat equation is a prime example.
    • Hyperbolic PDEs: Describe time-dependent wave propagation phenomena. The wave equation is a classic example.
  • Boundary Conditions (BCs): Conditions specified at the boundaries of the domain, essential for uniquely determining the solution of a PDE.
    • Dirichlet Boundary Conditions: Specify the value of the unknown function directly on the boundary (e.g., fixed temperature at a wall). Denoted as $u(\mathbf{x}) = g(\mathbf{x})$ on $\partial \Omega_D$.
    • Neumann Boundary Conditions: Specify the value of the normal derivative of the unknown function on the boundary (e.g., fixed heat flux across a surface). Denoted as $\frac{\partial u}{\partial n}(\mathbf{x}) = h(\mathbf{x})$ on $\partial \Omega_N$, where $\mathbf{n}$ is the outward normal vector.

Neural Operators

Neural operators are a class of neural networks designed to learn mappings between infinite-dimensional function spaces. Unlike traditional neural networks that learn mappings between finite-dimensional Euclidean spaces, neural operators learn operators that map an input function (e.g., PDE parameters or initial conditions) to an output function (e.g., the PDE solution). This allows them to generalize to unseen discretizations of the same underlying PDE and to problems with varying resolutions.

  • Key advantages: Zero-shot generalization (generalize to new inputs without retraining), fast inference, and full differentiability.

Physics-Informed Neural Networks (PINNs)

Physics-Informed Neural Networks (PINNs) are neural networks that embed the governing physical laws (expressed as PDEs) directly into their loss function. Instead of relying solely on labeled data, PINNs are trained to minimize two types of losses:

  1. Data loss: Measures the discrepancy between the PINN's predictions and any available labeled data.
  2. Physics-informed loss (residual loss): Measures how well the PINN's output satisfies the PDE at a set of collocation points. This loss is computed by differentiating the PINN's output with respect to its inputs using automatic differentiation. By minimizing both losses, PINNs can learn solutions that are consistent with both observed data and the underlying physical laws, even with sparse or no labeled data.

Hypernetworks

A hypernetwork is a neural network that generates the weights (or parameters) of another neural network, called the target network. In the context of PDEs, a hypernetwork can take PDE parameters (like coefficients, boundary conditions, or source terms) as input and output the weights of a PINN that is specialized to solve that particular PDE instance. This allows a single hypernetwork to implicitly learn a family of PINNs, each tailored to a specific PDE configuration.

Method of Manufactured Solutions (MMS)

The Method of Manufactured Solutions (MMS) is a technique used to verify the correctness and accuracy of PDE solvers (both numerical and, more recently, neural). Instead of trying to find the solution to a given PDE, MMS starts by choosing an arbitrary, sufficiently smooth analytical function $u(\mathbf{x})$ (the "manufactured solution"). Then, this chosen function is substituted into the PDE operator to analytically derive the corresponding source term $f(\mathbf{x})$ and boundary conditions $g(\mathbf{x}), h(\mathbf{x})$ that would yield $u(\mathbf{x})$ as the exact solution. This process generates an exact PDE problem-solution pair, which can then be used as ground truth for training or evaluating a solver.
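To make the MMS workflow concrete, here is a minimal SymPy sketch. The manufactured solution and the Poisson-type operator $-\Delta u = f$ are illustrative choices, not the paper's sampler.

```python
import sympy as sp

# Minimal MMS sketch (illustrative choices): pick a smooth "manufactured" solution,
# then derive the source term and boundary data that make it the exact solution of -Δu = f.
x, y = sp.symbols("x y")
u = sp.sin(sp.pi * x) * sp.exp(y)            # chosen analytical solution u(x, y)
f = -(sp.diff(u, x, 2) + sp.diff(u, y, 2))   # derived source term f = -Δu
g = u                                        # Dirichlet data on any boundary is u itself

u_fn = sp.lambdify((x, y), u, "numpy")       # callable ground truth for training/evaluation
f_fn = sp.lambdify((x, y), f, "numpy")
print(f_fn(0.3, -0.2))                       # evaluate the derived source at a sample point
```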

Swin Transformer

The Swin Transformer is a type of vision transformer architecture that achieves hierarchical representation by using shifted windows. Unlike standard transformers that compute self-attention globally, Swin Transformers compute self-attention within local windows, which makes them more computationally efficient. To allow for cross-window connections and hierarchical feature learning, the windows are shifted between successive layers. In HyPINO, a Swin Transformer is used as the encoder part of the hypernetwork to process grid-based PDE inputs, benefiting from its ability to capture both local and global dependencies efficiently.

Fourier Feature Mapping

Fourier feature mapping (or positional encoding) is a technique used to transform input coordinates into a higher-dimensional space using sinusoidal functions. For an input $x$, it maps to $[\sin(2\pi B x), \cos(2\pi B x)]$, where $B$ is a matrix of frequency bands. This transformation helps neural networks, especially PINNs and MLPs, to better learn high-frequency functions and mitigate spectral bias (the tendency of neural networks to learn low-frequency components before high-frequency ones). It is particularly effective for modeling complex, oscillating PDE solutions.
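A minimal NumPy sketch of this mapping is shown below; the Gaussian frequency matrix $B$ and its scale are illustrative assumptions (HyPINO's own encoders use exponentially spaced bands, described later).

```python
import numpy as np

def fourier_features(x, B):
    """Map coordinates x of shape (N, d) to [sin(2*pi*B x), cos(2*pi*B x)] for a frequency matrix B of shape (m, d)."""
    proj = 2.0 * np.pi * x @ B.T                                    # (N, m)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)    # (N, 2m)

rng = np.random.default_rng(0)
B = rng.normal(scale=4.0, size=(16, 2))      # random Gaussian frequencies (illustrative choice)
pts = rng.uniform(-1.0, 1.0, size=(5, 2))
print(fourier_features(pts, B).shape)        # (5, 32)
```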

Optimization Concepts

  • AdamW optimizer: An optimization algorithm that is a variant of Adam and incorporates weight decay (L2 regularization) directly into the optimization step, which often leads to better generalization performance.
  • Huber function: A loss function used in robust regression that is less sensitive to outliers than the squared error loss. It is quadratic for small errors and linear for large errors, providing a balance between L2 and L1 losses.
  • Cosine learning rate schedule: A common learning rate scheduling strategy where the learning rate decreases following a cosine curve. It starts high, gradually decreases to a minimum, and can optionally increase again, allowing for stable training initially and fine-tuning later. A minimal sketch combining these pieces is shown below.
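This is a minimal PyTorch example wiring the three pieces together (AdamW, Huber loss, cosine schedule); the model and hyperparameter values are placeholders, not those used in the paper.

```python
import torch

model = torch.nn.Linear(2, 1)                                    # stand-in for any network
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1000)
huber = torch.nn.HuberLoss()                                     # quadratic near 0, linear for large errors

x, y = torch.randn(32, 2), torch.randn(32, 1)
for step in range(1000):
    opt.zero_grad()
    loss = huber(model(x), y)
    loss.backward()
    opt.step()
    sched.step()                                                 # learning rate follows a cosine decay
```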

3.2. Previous Works

The paper builds upon and differentiates itself from several lines of prior research:

  1. Neural Operators (NOs):

    • General Operator Learning: Works by [15, 27, 28, 29, 31, 34] established neural operators as a paradigm for learning solution mappings for PDEs. These methods offer fast, mesh-free inference and generalization.
    • Foundation Models for PDEs: More recent efforts [17, 19, 35, 43, 52] aim to create large NOs (like Poseidon) that can ingest vast corpora of simulation data or equation specifications for broad cross-task transfer.
    • Limitations addressed by HyPINO: The paper points out that most existing NOs are sample inefficient [19] and target narrow PDE families (e.g., fixed equations with varying coefficients [6], boundary conditions [10], or domain shapes [55]). HyPINO addresses this by supporting concurrent variation of multiple operators, geometries, and boundary types.
  2. Physics-Informed Neural Operators (PINOs):

    • PINNs: The original concept of embedding governing equations into the loss function for PINNs [40, 47] provides self-supervision.
    • PINOs: PINO [29] (and related works [3, 12]) extended PINN principles to neural operator architectures by integrating residual losses to train from unlabeled residual samples.
    • Limitations addressed by HyPINO: The paper notes that these physics-informed approaches still require careful weighting of supervision terms, often struggle with stability, and can suffer from spectral bias for complex PDEs. HyPINO improves stability and generalization by combining physics-informed losses with supervised MMS data and leveraging Fourier feature mappings in its PINN target.
  3. Hypernetworks in PDE Solving (HyperPINNs):

    • HyperPINNs [8, 24] predict PINN weights for varying coefficients, extending the hypernetwork concept [13] to PDEs.
    • Subsequent works further extended this idea to boundary conditions, domain changes, and low-rank weight modulation [5, 10, 14, 36].
    • Limitations addressed by HyPINO: HyPINO moves beyond these by supporting simultaneous variations of multiple operators, geometries, and boundary types without task-specific fine-tuning, which existing models rarely achieve. It also uses a Swin Transformer as its hypernetwork encoder for improved feature extraction from grid-based PDE inputs.
  4. Method of Manufactured Solutions (MMS):

    • MMS [38] has long been used for numerical solver verification and more recently for PINN evaluation [23] and operator training [18].
    • Innovation in HyPINO: While MMS has been used, prior studies often focus on single equations (e.g., Poisson). HyPINO significantly expands its utility by leveraging MMS for multi-physics operator pre-training, creating a diverse and scalable synthetic dataset for broad PDE families.

Technological Evolution

The field of PDE solving has evolved from traditional mesh-based numerical methods (like Finite Element Method, Finite Difference Method) to data-driven approaches. Initially, neural networks were used as function approximators for PDE solutions. The introduction of PINNs allowed neural networks to incorporate physical laws directly, reducing reliance on labeled data. Concurrently, neural operators emerged to learn the solution operator itself, enabling generalization across an entire family of PDEs rather than just single instances. Within neural operators, approaches moved from MLP-based models to Fourier Neural Operators (FNOs) and DeepONets, which are better suited for function-to-function mappings.

More recently, the concept of hypernetworks (where one neural network generates the parameters for another) has been applied to PINNs, giving rise to HyperPINNs. These allow for parameterization of specific PDE aspects (e.g., coefficients). Simultaneously, the trend towards foundation models in AI has inspired researchers to build large, pre-trained neural operators (like Poseidon) capable of broader PDE generalization.

HyPINO fits within this timeline by integrating several cutting-edge concepts:

  • It combines the operator learning paradigm with physics-informed losses (like PINO).
  • It uses a sophisticated hypernetwork (like HyperPINN) for PINN weight generation.
  • Crucially, it employs a Swin Transformer as its hypernetwork encoder, bringing advanced vision transformer capabilities to PDE input encoding.
  • It introduces a novel, scalable synthetic data generation strategy using MMS and random sampling to overcome data limitations and enable multi-physics generalization.
  • It adds an iterative refinement procedure as a lightweight ensemble method, further pushing the boundaries of accuracy.

3.3. Differentiation Analysis

Compared to the main methods in related work, HyPINO offers several core differences and innovations:

  1. Scope of Generalization (Multi-Physics vs. Narrow Families):

    • Previous NOs/HyperPINNs: Typically focus on narrow PDE families where variations are limited to singular aspects (e.g., diffusion coefficients, specific boundary conditions, or domain shapes). Poseidon, while a large foundation model, primarily focuses on PDEs with varying source terms or coefficients, but less on diverse operator types and geometries simultaneously.
    • HyPINO's Innovation: Achieves zero-shot generalization across a broad class of PDEs (linear elliptic, hyperbolic, and parabolic) with simultaneous variations in PDE operators, source terms, geometries (including interior boundaries), and mixed Dirichlet/Neumann boundary conditions. This is a significant leap in flexibility and generality.
  2. Data Strategy (Hybrid Supervision & Scalable Synthetic Data):

    • Previous NOs: Mostly rely on large amounts of labeled simulation data from high-fidelity solvers, which is expensive and time-consuming to generate.
    • PINOs: Use physics-informed losses to alleviate data needs but can suffer from stability issues and spectral bias, making purely physics-based training challenging for complex PDEs.
    • HyPINO's Innovation: Employs a novel mixed supervision strategy:
      • Supervised data from MMS: Provides direct analytical ground truth, ensuring accuracy and overcoming the cost of traditional high-fidelity solvers.
      • Unsupervised physics-informed data: Generated by randomly sampling PDE operators, source terms, and boundary conditions, exposing the model to diverse and complex scenarios (e.g., interior boundaries) where MMS might be too restrictive or costly to apply. This hybrid approach mitigates the data bottleneck and addresses PINO's stability issues.
  3. Architecture (Swin Transformer Hypernetwork):

    • Previous HyperPINNs: While HyperPINNs exist, they often use simpler MLP-based hypernetworks or U-Net-like structures.
    • HyPINO's Innovation: Uses a Swin Transformer-based hypernetwork. The Swin Transformer is well-suited for processing grid-based PDE inputs due to its hierarchical feature learning and shifted window mechanism, allowing it to efficiently capture both local and global dependencies in the PDE parameterization. This choice contributes to better feature extraction and, consequently, more accurate PINN weight generation.
  4. Refinement Mechanism (Iterative Ensemble):

    • Previous Methods: Typically produce a single solution or rely on simple ensembles of independently trained models. Fine-tuning usually involves re-training parts of the model with backward passes.
    • HyPINO's Innovation: Introduces an iterative refinement procedure that generates an ensemble of corrective PINNs. By treating the residual error as a "delta PDE" and feeding it back into the hypernetwork, it produces progressive improvements. This is a lightweight alternative to traditional fine-tuning as it only requires forward-only inference for subsequent delta PINNs, making it efficient at test time.
  5. Robust Initialization for Fine-tuning:

    • Previous Methods: PINNs often rely on random initialization or basic meta-learning (like Reptile) which can lead to slower convergence or higher final errors during fine-tuning.

    • HyPINO's Innovation: The HyPINO-generated PINN parameters serve as an excellent prior, leading to faster convergence and lower final errors during task-specific fine-tuning, surpassing both random and Reptile-initialized models.

      In essence, HyPINO differentiates itself by achieving a higher degree of multi-physics zero-shot generalization through a clever synthesis of advanced hypernetwork architecture, a robust hybrid physics-informed and MMS-driven synthetic data generation strategy, and an efficient iterative refinement mechanism.

4. Methodology

4.1. Principles

The core idea of HyPINO is to learn a solution operator that maps the characteristics of a Partial Differential Equation (PDE) to the parameters (weights and biases) of a Physics-Informed Neural Network (PINN) that can solve that specific PDE instance. This is achieved using a hypernetwork architecture. The intuition is that instead of training a separate PINN for each PDE, a single hypernetwork can learn the underlying patterns in PDE formulations and efficiently generate tailored PINNs on demand. This enables zero-shot generalization, meaning the model can solve new PDEs (within the trained distribution) without needing specific fine-tuning.

To make this possible across a broad range of PDEs, HyPINO employs two key principles:

  1. Flexible PDE Parameterization: A standardized, rich input representation captures the PDE operator, source term, domain geometry, and boundary conditions, allowing the hypernetwork to "understand" diverse PDE problems.

  2. Hybrid Training Data: A synthetic dataset is created using a mix of Method of Manufactured Solutions (MMS) (for ground-truth labeled data) and purely physics-informed objectives (for unlabeled data). This provides both accuracy through supervision and broad coverage of complex scenarios through self-supervision, overcoming the data inefficiency of traditional neural operators.

    Additionally, the paper introduces an iterative refinement procedure based on residual errors. This principle allows the model to iteratively correct its own predictions, effectively building an ensemble of PINNs at inference time to progressively reduce error without requiring expensive backpropagation for fine-tuning.

The overall objective is to learn the solution operator that maps the tuple $(\mathcal{L}, f, g, h)$ to the solution $u$, where $\mathcal{L}$ is the linear differential operator, $f$ is the source term, $g$ represents Dirichlet boundary conditions, and $h$ represents Neumann boundary conditions. The hypernetwork $\Phi$ realizes this mapping: $ (\mathbf{c}, F, M_g, M_h, V_g, V_h) \longmapsto \theta^\star \quad \text{such that} \quad u_{\theta^\star} \approx u, $ where $\mathbf{c}$ denotes the vector of PDE coefficients, $F$ the discretized source function, $M_g$ and $M_h$ the Dirichlet and Neumann boundary condition location grids, $V_g$ and $V_h$ the Dirichlet and Neumann boundary condition value grids, and $u$ the reference solution.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. PDE Parameterization

To represent a diverse set of linear PDEs for the neural network, HyPINO uses a flexible and efficient parameterization. The PDE is defined over a bounded domain $\Omega \subset \mathbb{R}^2$ with boundary $\partial \Omega = \partial \Omega_D \cup \partial \Omega_N$. The goal is to find a function $u: \Omega \to \mathbb{R}^m$ satisfying: $ \mathcal{L}[u](\mathbf{x}) = f(\mathbf{x}) \ \text{in } \Omega, \quad u(\mathbf{x}) = g(\mathbf{x}) \ \text{on } \partial \Omega_D, \quad \frac{\partial u}{\partial n}(\mathbf{x}) = h(\mathbf{x}) \ \text{on } \partial \Omega_N, $ where $\mathcal{L}$ is a linear differential operator up to second order, $f$ is the source term, and $g$, $h$ are boundary functions.

This PDE instance is parameterized as follows:

  1. Source Term ($f$): The function $f$ is discretized on a uniform grid over $\Omega$, resulting in a 2D array $F$ representing its values at grid points.
  2. Boundary Conditions ($g$, $h$): Boundary conditions are parameterized using two 2D grids for each boundary type (Dirichlet and Neumann):
    • A binary mask $M$: Indicates the presence of the boundary at each grid point (1 for points closest to the boundary, 0 elsewhere). So, $M_g$ for Dirichlet and $M_h$ for Neumann.
    • A value grid $V$: Stores the corresponding boundary values ($g$ for Dirichlet or $h$ for Neumann) at the marked cells, with zeros elsewhere. So, $V_g$ for Dirichlet and $V_h$ for Neumann.
  3. Differential Operator ($\mathcal{L}$): The operator $\mathcal{L}$ is parameterized by its coefficients as a vector $\mathbf{c} = (c_1, c_2, c_3, c_4, c_5) \in \mathbb{R}^5$, following [21]: $ \mathcal{L}[u](\mathbf{x}) = c_1 u + c_2 u_x + c_3 u_y + c_4 u_{xx} + c_5 u_{yy} $ Here, $u_x = \frac{\partial u}{\partial x}$, $u_y = \frac{\partial u}{\partial y}$, $u_{xx} = \frac{\partial^2 u}{\partial x^2}$, and $u_{yy} = \frac{\partial^2 u}{\partial y^2}$. The coefficients $c_i$ determine the specific type of linear PDE.

The combined input to the hypernetwork is the tuple $(\mathbf{c}, F, M_g, M_h, V_g, V_h)$.
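As a concrete illustration, the sketch below assembles this tuple as NumPy arrays for a simple Poisson problem with homogeneous Dirichlet conditions on the outer square. The grid resolution and the specific source term are assumptions made for illustration only.

```python
import numpy as np

H = W = 224                                  # grid resolution (assumed; matches the 224x224 grids mentioned later)
xs, ys = np.linspace(-1, 1, W), np.linspace(-1, 1, H)
X, Y = np.meshgrid(xs, ys)

# Example operator -u_xx - u_yy (Poisson): c multiplies (u, u_x, u_y, u_xx, u_yy)
c = np.array([0.0, 0.0, 0.0, -1.0, -1.0])

F = np.exp(-((X - 0.2) ** 2 + (Y + 0.1) ** 2) / 0.05)    # discretized source term f on the grid

M_g = np.zeros((H, W)); M_h = np.zeros((H, W))
M_g[0, :] = M_g[-1, :] = M_g[:, 0] = M_g[:, -1] = 1.0    # Dirichlet mask on the outer square
V_g = np.zeros((H, W))                                   # homogeneous Dirichlet values g = 0
V_h = np.zeros((H, W))                                   # no Neumann boundary in this example

pde_input = (c, F, M_g, M_h, V_g, V_h)                   # input tuple for the hypernetwork
```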

4.2.2. Neural Operator Architecture

The HyPINO model is based on a hypernetwork that maps the PDE parameterization to the weights $\theta^\star$ of a target PINN.

(A) Input Embeddings

  1. Coefficient Embedding: The vector of operator coefficients $\mathbf{c} \in \mathbb{R}^5$ is first embedded into a fixed-length representation $z_C \in \mathbb{R}^{d_C}$. This embedding uses a Fourier feature encoder followed by a fully connected layer. The Fourier feature mapping helps in avoiding spectral bias and mode collapse, especially in physics-informed settings [44, 46].
  2. Grid Embeddings: Each grid-valued input ($F$, $M_g$, $M_h$, $V_g$, $V_h$) is processed individually:
    • It is passed through a Fourier feature mapping layer, which augments the input with sinusoidal encodings using five exponentially increasing frequency bands ($0.1 \cdot 2^i$, $i \in \{0, 1, 2, 3, 4\}$). This enhances the network's ability to represent high-frequency content.

    • This is followed by two convolutional layers with a kernel size of 3 and strides of 2.

    • For the boundary location grids $M_g$ and $M_h$, this process yields embeddings $z_D^1, z_D^2$ for Dirichlet boundaries and $z_N^1, z_N^2$ for Neumann boundaries.

    • For the boundary value grids $V_g$ and $V_h$, it yields $z_g$ (Dirichlet values) and $z_h$ (Neumann values).

    • The source term $F$ yields the embedding $z_f$.

      The final spatial embedding $z_G$ is constructed by combining these processed embeddings: $ z_G = \left[ z_D^1 \odot z_g + z_D^2 \parallel z_N^1 \odot z_h + z_N^2 \parallel z_f \right], $ where $\odot$ denotes element-wise multiplication and $[\cdot \parallel \cdot \parallel \cdot]$ denotes concatenation along the channel dimension. This composition applies spatial masking to the boundary value embeddings using the boundary location masks, ensuring that information is injected only at semantically meaningful locations (i.e., where a boundary actually exists).

(B) Encoding

The grid embedding $z_G$ is then processed by a sequence of $K$ Swin Transformer blocks, denoted $\{\mathcal{SW}_i\}_{i=1}^K$. Each Swin Transformer block's output is interleaved with a FiLM (Feature-wise Linear Modulation) layer [39] conditioned on the coefficient embedding $z_C$. Let $z^{(i)} \in \mathbb{R}^{H_i \times W_i \times C_i}$ be the output of block $\mathcal{SW}_i$, with $z^{(0)} = z_G$. The modulation is defined as: $ z^{(i+1)} = \gamma_i(z_C) \odot \mathcal{SW}_i(z^{(i)}) + \beta_i(z_C), $ where $\gamma_i(z_C)$ and $\beta_i(z_C)$ are scale and shift parameters, respectively, generated by small MLPs (Multi-Layer Perceptrons) that take the coefficient embedding $z_C$ as input: $ \gamma_i(z), \beta_i(z): \mathbb{R}^{d_C} \rightarrow \mathbb{R}^{C_i}. $ The $\odot$ symbol represents channel-wise scaling broadcast across spatial dimensions. This design ensures that the latent grid features at each stage are adaptively modulated by the global PDE operator coefficients $z_C$. Following Swin Transformer U-Net architectures [4, 11], all intermediate latent representations $\{z^{(i)}\}_{i=1}^K$ are retained.
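The FiLM step can be sketched in a few lines of PyTorch; the Swin blocks themselves would come from an existing implementation, so only the modulation is shown, and the MLP design for $\gamma_i$ and $\beta_i$ is an assumption.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Scale-and-shift modulation of grid features by the coefficient embedding z_C (sketch)."""
    def __init__(self, d_c: int, channels: int):
        super().__init__()
        self.gamma = nn.Sequential(nn.Linear(d_c, channels), nn.GELU(), nn.Linear(channels, channels))
        self.beta = nn.Sequential(nn.Linear(d_c, channels), nn.GELU(), nn.Linear(channels, channels))

    def forward(self, z, z_c):
        # z: (B, H, W, C) output of a Swin block; z_c: (B, d_c) coefficient embedding
        g = self.gamma(z_c)[:, None, None, :]      # channel-wise scale, broadcast over space
        b = self.beta(z_c)[:, None, None, :]       # channel-wise shift, broadcast over space
        return g * z + b

film = FiLM(d_c=64, channels=96)
z, z_c = torch.randn(2, 56, 56, 96), torch.randn(2, 64)
print(film(z, z_c).shape)                          # torch.Size([2, 56, 56, 96])
```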

(C) Pooling

To aggregate the spatial information from the Swin Transformer encoding into a compact latent representation suitable for generating PINN parameters, Multi-Head Attention Pooling [25, 54] is applied. For each output $z^{(i)} \in \mathbb{R}^{H_i \times W_i \times C_i}$ from the $i$-th FiLM-modulated Swin block, its spatial dimensions ($H_i$, $W_i$) are flattened to create a sequence of tokens $kv_i \in \mathbb{R}^{H_i W_i \times C_i}$. These tokens serve as both keys and values in the attention mechanism.

For each layer $i \in \{1, \ldots, K\}$, a set of $T$ trainable query vectors $q_i \in \mathbb{R}^{T \times C_i}$ is defined, where $T$ corresponds to the total number of weight and bias tensors in the target PINN. The pooled representation $p_i$ is then computed via multi-head attention: $ p_i = \mathrm{MultiHeadAttention}_i(q_i, kv_i, kv_i), \quad p_i \in \mathbb{R}^{T \times C_i}. $ The MultiHeadAttention function calculates attention scores between queries and keys, then uses these scores to weigh the values. Finally, the pooled outputs $\{p_i\}_{i=1}^K$ from all Swin blocks are concatenated along the channel dimension to form a unified latent matrix $p$: $ p = \left[ p_1 \parallel p_2 \parallel \cdots \parallel p_K \right] \in \mathbb{R}^{T \times \left(\sum_{i=1}^K C_i\right)}. $ This matrix $p$ contains one latent vector per target weight or bias tensor. Each row of $p$ is then fed into a dedicated MLP that projects it to the appropriate shape and dimensionality required by the corresponding weight matrix or bias vector of the PINN.

(D) Target PINN Architecture

The target PINN is an MLP with Fourier feature mapping [44] for its input and multiplicative skip connections [45] within its hidden layers (a code sketch follows the list below).

  1. Input Encoding: Given a spatial input $\mathbf{x} \in \mathbb{R}^2$, a non-trainable Fourier feature mapping encodes it as: $ \xi(\mathbf{x}) = \left[ \sin(2\pi \mathbf{B}\mathbf{x}), \cos(2\pi \mathbf{B}\mathbf{x}), \mathbf{x} \right] \in \mathbb{R}^{2N+2}, $ where $\mathbf{B} \in \mathbb{R}^{N \times 2}$ is a matrix containing exponentially spaced frequency bands. This encoding helps the PINN represent high-frequency components of the solution and mitigate spectral bias.
  2. Network Structure with Skip Connections: The encoded input $\xi(\mathbf{x})$ is projected through three parallel transformations to form initial activations for the skip connections: $ z_0 = \tanh(W_{\mathrm{in}}\xi + b_0), \quad z_u = \tanh(U\xi + b_u), \quad z_v = \tanh(V\xi + b_v), $ where $W_{\mathrm{in}}, U, V \in \mathbb{R}^{d \times (2N+2)}$ are weight matrices and $b_0, b_u, b_v \in \mathbb{R}^d$ are bias vectors, with $d$ being the width of the latent layers. The subsequent hidden layers incorporate multiplicative skip connections: $ z_{i+1} = z_u \odot \tanh(W_i z_i + b_i) + z_v \odot \left(1 - \tanh(W_i z_i + b_i)\right), \quad i = 0, \ldots, T-2, $ where $W_i \in \mathbb{R}^{d \times d}$ and $b_i \in \mathbb{R}^d$. The tanh activation function is used due to its bounded output range, which provides stability during hypernetwork training. These skip connections enhance gradient propagation and enable dynamic depth modulation by allowing the hypernetwork to effectively mask some layers by generating appropriate weights.
  3. Output Layer: The final prediction $u_\theta(\mathbf{x})$ is obtained via a linear transformation of the last hidden layer's output: $ u_\theta(\mathbf{x}) = W_{\mathrm{out}} z_{T-1} + b_{\mathrm{out}}, \quad W_{\mathrm{out}} \in \mathbb{R}^{1 \times d}, \ b_{\mathrm{out}} \in \mathbb{R}. $ The hypernetwork therefore generates the complete set of parameters $\theta^\star$ for this target PINN: $ \{W_0, U, V, b_0, b_u, b_v\}, \quad \{W_i, b_i\}_{i=1}^{T-2}, \quad W_{\mathrm{out}}, b_{\mathrm{out}}, $ where $T$ is the number of layers in the PINN.
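Below is a functional PyTorch sketch of this target PINN, written so that all weights are supplied externally (as a hypernetwork would provide them). The width, depth, and frequency bands are placeholder values, and the random parameters stand in for hypernetwork outputs.

```python
import torch

def pinn_forward(x, B, params):
    """Target-PINN forward pass: Fourier features plus multiplicative skip connections (sketch)."""
    proj = 2 * torch.pi * x @ B.T
    xi = torch.cat([torch.sin(proj), torch.cos(proj), x], dim=-1)         # (N, 2*N_freq + 2)
    z = torch.tanh(xi @ params["W_in"].T + params["b_0"])
    z_u = torch.tanh(xi @ params["U"].T + params["b_u"])
    z_v = torch.tanh(xi @ params["V"].T + params["b_v"])
    for W_i, b_i in zip(params["W_hid"], params["b_hid"]):
        h = torch.tanh(z @ W_i.T + b_i)
        z = z_u * h + z_v * (1 - h)                                        # multiplicative skip connection
    return z @ params["W_out"].T + params["b_out"]

d, n_freq, depth = 64, 8, 3                                                # assumed sizes
B = 0.1 * (2.0 ** torch.arange(n_freq)).unsqueeze(1) * torch.ones(1, 2)    # exponentially spaced bands
params = {
    "W_in": torch.randn(d, 2 * n_freq + 2), "U": torch.randn(d, 2 * n_freq + 2),
    "V": torch.randn(d, 2 * n_freq + 2),
    "b_0": torch.zeros(d), "b_u": torch.zeros(d), "b_v": torch.zeros(d),
    "W_hid": [torch.randn(d, d) for _ in range(depth)],
    "b_hid": [torch.zeros(d) for _ in range(depth)],
    "W_out": torch.randn(1, d), "b_out": torch.zeros(1),
}
x = torch.rand(10, 2) * 2 - 1                                              # collocation points in [-1, 1]^2
print(pinn_forward(x, B, params).shape)                                    # torch.Size([10, 1])
```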

4.2.3. Data Sampling

A synthetic dataset of PDE instances is created by randomly drawing the differential operator $\mathcal{L}$, domain $\Omega$, boundary data, source term $f$, and, when available, a reference solution $u$. The dataset is a mix of two classes: supervised and unsupervised samples.

(A) Class I: Supervised PDEs

For these samples, an analytical solution $u$ is chosen first using MMS. Then:

  • The source term $f$ is computed by applying $\mathcal{L}$ to $u$: $f = \mathcal{L}[u]$.
  • The boundary conditions $g(\mathbf{x}) = u(\mathbf{x})$ and/or $h(\mathbf{x}) = \frac{\partial u}{\partial n}(\mathbf{x})$ are derived by evaluating $u(\mathbf{x})$ and its normal derivative on $\partial \Omega$. These samples provide the analytical solution $u(\mathbf{x})$ and its derivatives, which can be used for additional supervised losses during training, alongside the physics-informed loss.

(B) Class II: Unsupervised PDEs

For these samples, the analytical solution $u$ is unknown.

  • The differential operator $\mathcal{L}$ and domain $\Omega$ are sampled.
  • The source term $f$ is set to a spatially constant random value drawn from $\mathcal{N}(0, 10^2)$.
  • Boundary conditions are sampled subject to constraints that maximize the probability of well-posedness. These samples rely solely on the physics-informed loss, as reference solutions are unavailable. They are crucial for exposing the model to complexities like interior boundaries, inclusions, and discontinuities.

(C) Sampling Differential Operators

The set of all possible terms in the linear differential operators is $B = \{u, u_x, u_y, u_{xx}, u_{yy}\}$.

  1. The number of terms $n$ is sampled from a uniform discrete distribution: $n \sim \mathrm{Uniform}(\{1, 2, 3\})$.
  2. Then, $n$ terms are randomly selected from $B$ without replacement.
  3. Each selected term $T_i$ is assigned a coefficient $c_i \sim \mathrm{Uniform}([-2, 2])$.
  4. The differential operator is defined as $\mathcal{L}[u] = \sum_{i=1}^n c_i T_i[u]$ (see the sketch below).
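A small NumPy version of the four steps above (sketch):

```python
import numpy as np

TERMS = ["u", "u_x", "u_y", "u_xx", "u_yy"]             # the basis B of operator terms

def sample_operator(rng):
    """Sample coefficients c for L[u] = sum_i c_i T_i[u] as described above (sketch)."""
    n = rng.integers(1, 4)                               # n ~ Uniform({1, 2, 3})
    idx = rng.choice(len(TERMS), size=n, replace=False)  # n distinct terms, no replacement
    c = np.zeros(5)
    c[idx] = rng.uniform(-2.0, 2.0, size=n)              # coefficients ~ Uniform([-2, 2])
    return c

rng = np.random.default_rng(42)
c = sample_operator(rng)
print({t: round(float(ci), 3) for t, ci in zip(TERMS, c) if ci != 0.0})
```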

(D) Sampling Analytical Solutions via MMS

Algorithm 1 outlines the procedure for generating random, differentiable functions that serve as analytical solutions for MMS:

Algorithm 1: Sampling procedure for random, differentiable functions that can be used as analytical solutions with MMS.

  1. Initialize u(x, y) ← 0
  2. Sample n ∼ Uniform({6, 7, ..., 10})
  3. for i = 1 to n do
  4.   Sample a ∼ {0, Uniform([-10, 10])}
  5.   Sample b ∼ {0, Uniform([-10, 10])}
  6.   Sample c, d, e ∼ Uniform([-2π, 2π])
  7.   Randomly select ψ ∈ {x, sin, cos, tanh, (1 + e^(-x))^(-1), (1 + x^2)^(-1)}
  8.   Compute term = d ⋅ ψ(a x + b y + c) + e
  9.   Randomly choose a combination rule:
  10.    if add then u(x, y) ← u(x, y) + term
  11.    else if multiply then u(x, y) ← u(x, y) ⋅ term
  12.    else if compose then u(x, y) ← d ⋅ ψ(a ⋅ u(x, y) + c) + e
  13. end
  14. return u(x, y)

  • The initial solution u(x, y) is set to 0.
  • The number of iterative updates $n$ is drawn uniformly from $\{6, \ldots, 10\}$.
  • In each update, a nonlinear function $\psi$ is chosen from a predefined library.
  • Coefficients $a, b, c, d, e$ are sampled from the specified uniform distributions.
  • The new term $d \cdot \psi(ax + by + c) + e$ is then incorporated into the current solution u(x, y) using one of three randomly chosen rules: addition, multiplication, or composition ($d \cdot \psi(a \cdot u(x, y) + c) + e$). A NumPy sketch of this sampler is shown below.
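Below, a NumPy implementation of Algorithm 1 that returns a callable manufactured solution; the 50/50 choice between zero and a uniform draw for $a$ and $b$, and the uniform choice among the three combination rules, are assumptions about details the algorithm leaves implicit.

```python
import numpy as np

PSI = [lambda z: z, np.sin, np.cos, np.tanh,
       lambda z: 1.0 / (1.0 + np.exp(-z)),              # logistic sigmoid
       lambda z: 1.0 / (1.0 + z ** 2)]

def sample_mms_solution(rng):
    """Algorithm 1 sketch: build a random differentiable u(x, y) by repeatedly
    adding, multiplying, or composing terms of the form d * psi(a*x + b*y + c) + e."""
    u = lambda x, y: np.zeros_like(np.asarray(x, dtype=float))
    n = rng.integers(6, 11)                             # n ~ Uniform({6, ..., 10})
    for _ in range(n):
        a = 0.0 if rng.random() < 0.5 else rng.uniform(-10, 10)   # 50/50 split is an assumption
        b = 0.0 if rng.random() < 0.5 else rng.uniform(-10, 10)
        c, d, e = rng.uniform(-2 * np.pi, 2 * np.pi, size=3)
        psi = PSI[rng.integers(len(PSI))]
        rule = rng.choice(["add", "multiply", "compose"])
        prev = u
        if rule == "add":
            u = lambda x, y, p=prev, a=a, b=b, c=c, d=d, e=e, f=psi: p(x, y) + d * f(a * x + b * y + c) + e
        elif rule == "multiply":
            u = lambda x, y, p=prev, a=a, b=b, c=c, d=d, e=e, f=psi: p(x, y) * (d * f(a * x + b * y + c) + e)
        else:  # compose
            u = lambda x, y, p=prev, a=a, c=c, d=d, e=e, f=psi: d * f(a * p(x, y) + c) + e
    return u

u = sample_mms_solution(np.random.default_rng(0))
print(u(0.25, -0.5))                                    # evaluate the manufactured solution at a point
```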

(E) Sampling Physical Domains

Domains $\Omega \subset [-1, 1]^2$ are generated using randomized Constructive Solid Geometry (CSG) [32].

  • The outer boundary $\partial \Omega_{\mathrm{outer}}$ is initially the square $[-1, 1]^2$.
  • Inner boundaries $\partial \Omega_{\mathrm{inner},i}$ are formed by subtracting randomly sampled geometric primitives (e.g., disks, polygons, rectangles) from the outer region using CSG operations.
  • In time-dependent PDEs, $y = -1$ can represent the initial time.

(F) Sampling Boundary Conditions

Boundary conditions (Dirichlet and Neumann) are imposed on $\partial \Omega$.

  • Outer Boundary $\partial \Omega_{\mathrm{outer}}$: The type of PDE (elliptic, parabolic, hyperbolic) determines the initial boundary conditions for well-posedness.
    • Elliptic: Dirichlet conditions on $\partial \Omega_{\mathrm{outer}}$.
    • Parabolic: Initial condition at $y = -1$, and Dirichlet conditions on the spatial boundaries $\partial \Omega_{\mathrm{outer}} \setminus \{y = 1\}$.
    • Hyperbolic: Initial condition at $y = -1$, Dirichlet conditions on the spatial boundaries, and Neumann conditions on the initial-time boundary $\partial \Omega_{\mathrm{outer}} \cap \{y = -1\}$.
  • Inner Boundaries $\partial \Omega_{\mathrm{inner},i}$: Each inner boundary component is independently assigned either a Dirichlet or Neumann condition, or both.
  • Value Assignment:
    • Class I (Supervised): $g(\mathbf{x}) = u(\mathbf{x})$ and $h(\mathbf{x}) = \frac{\partial u}{\partial n}(\mathbf{x})$ are directly computed from the known analytical solution $u$.
    • Class II (Unsupervised): The source term $f(\mathbf{x})$ is set to zero on the boundary. Boundary values are sampled to be consistent with $\mathcal{L}[u]$: if $u$ appears alone in $\mathcal{L}[u]$, then $u(\mathbf{x}) = 0$ on $\partial \Omega$; if first-order terms ($u_x$, $u_y$) appear alone, constant Dirichlet values are used; otherwise, linear profiles are allowed. This helps maximize the probability of well-posedness despite the unknown ground truth.

4.2.4. Objective Function

For each PDE instance, the HyPINO model outputs the weights $\theta^\star$ for a target PINN $u_{\theta^\star}$. The total loss function is a weighted sum of several terms:

  1. Residual Loss ($\mathcal{L}_{\mathrm{R}}$): Measures how well the PINN's prediction satisfies the PDE within the domain $\Omega$. $ \mathcal{L}_{\mathrm{R}} = \frac{1}{|\Omega|} \sum_{\mathbf{x} \in \Omega} \rho\left( \mathcal{L}[u_{\theta^\star}](\mathbf{x}) - f(\mathbf{x}) \right) $ Here, $|\Omega|$ is the area of the domain (or number of collocation points), $\mathcal{L}[u_{\theta^\star}](\mathbf{x})$ is the result of applying the differential operator to the PINN's prediction, $f(\mathbf{x})$ is the source term, and $\rho(\cdot)$ is the Huber function [20]. The Huber function is used to make the loss less sensitive to outliers.
  2. Dirichlet Boundary Loss ($\mathcal{L}_{\mathrm{D}}$): Penalizes deviations from the specified Dirichlet boundary conditions on $\partial \Omega_D$. $ \mathcal{L}_{\mathrm{D}} = \frac{1}{|\partial \Omega_D|} \sum_{\mathbf{x} \in \partial \Omega_D} \rho\left( u_{\theta^\star}(\mathbf{x}) - g(\mathbf{x}) \right) $ Here, $|\partial \Omega_D|$ is the length of the Dirichlet boundary (or number of collocation points), $u_{\theta^\star}(\mathbf{x})$ is the PINN's prediction on the boundary, and $g(\mathbf{x})$ is the prescribed Dirichlet value.
  3. Neumann Boundary Loss ($\mathcal{L}_{\mathrm{N}}$): Penalizes deviations from the specified Neumann boundary conditions on $\partial \Omega_N$. $ \mathcal{L}_{\mathrm{N}} = \frac{1}{|\partial \Omega_N|} \sum_{\mathbf{x} \in \partial \Omega_N} \rho\left( \nabla u_{\theta^\star}(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x}) - h(\mathbf{x}) \right) $ Here, $|\partial \Omega_N|$ is the length of the Neumann boundary (or number of collocation points), $\nabla u_{\theta^\star}(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x})$ is the normal derivative of the PINN's prediction on the boundary, and $h(\mathbf{x})$ is the prescribed Neumann value.
  4. Sobolev Loss ($\mathcal{L}_{\mathrm{S}}$): Applied only for supervised samples where the analytical solution $u$ is known. It penalizes errors in function values, gradients, and second derivatives, providing stronger supervision. $ \mathcal{L}_{\mathrm{S}} = \frac{1}{|\Omega|} \sum_{\mathbf{x} \in \Omega} \sum_{k=0}^{2} \lambda_{\mathrm{S}}^{(k)} \rho\left( \nabla^k u_{\theta^\star}(\mathbf{x}) - \nabla^k u(\mathbf{x}) \right) $ Here, $\nabla^0 u = u$, $\nabla^1 u = \nabla u$, and $\nabla^2 u$ represents the second derivatives (e.g., the Hessian). The $\lambda_{\mathrm{S}}^{(k)}$ are weighting coefficients for each order of derivative.

The total loss is a weighted sum of these active terms: $ \mathcal{L}_{\mathrm{total}} = \lambda_{\mathrm{R}} \mathcal{L}_{\mathrm{R}} + \lambda_{\mathrm{D}} \mathcal{L}_{\mathrm{D}} + \lambda_{\mathrm{N}} \mathcal{L}_{\mathrm{N}} + \mathcal{L}_{\mathrm{S}}, $ where $\mathcal{L}_{\mathrm{R}}$ is always included, $\mathcal{L}_{\mathrm{D}}$ and $\mathcal{L}_{\mathrm{N}}$ are applied when collocation points fall on the respective boundaries, and $\mathcal{L}_{\mathrm{S}}$ is active only for supervised samples. A sketch of this combination is shown below.
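The combination can be written in PyTorch-style pseudocode as follows. The error tensors are assumed to be precomputed pointwise residuals, and the weights are placeholders rather than the paper's values.

```python
import torch
import torch.nn.functional as F

def total_loss(res, dirichlet_err=None, neumann_err=None, sobolev_errs=None,
               lam_R=1.0, lam_D=1.0, lam_N=1.0, lam_S=(1.0, 0.1, 0.01)):
    """Weighted sum of Huber-penalized loss terms (sketch; weights are illustrative)."""
    huber = lambda e: F.huber_loss(e, torch.zeros_like(e))      # rho(.) applied to pointwise errors
    loss = lam_R * huber(res)                                   # PDE residual inside the domain
    if dirichlet_err is not None:
        loss = loss + lam_D * huber(dirichlet_err)              # Dirichlet boundary mismatch
    if neumann_err is not None:
        loss = loss + lam_N * huber(neumann_err)                # Neumann boundary mismatch
    if sobolev_errs is not None:                                # supervised (MMS) samples only
        for lam_k, err_k in zip(lam_S, sobolev_errs):           # errors in u, grad u, second derivatives
            loss = loss + lam_k * huber(err_k)
    return loss
```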

4.2.5. Residual-Driven Iterative Refinement

HyPINO introduces an iterative refinement procedure to improve solution accuracy at inference time, similar to multi-stage neural networks [49]. This forms an ensemble of corrective PINNs.

Given a PDE instance $(\mathcal{L}, f, g, h)$:

  1. Initial Prediction: The hypernetwork $\Phi$ generates the weights for an initial PINN $u^{(0)}$: $ u^{(0)} := u_{\Phi(\mathcal{L}, f, g, h)} $
  2. Compute Residuals: The residual errors of this initial prediction are computed:
    • Residual source term: $r_f^{(0)} = \mathcal{L}[u^{(0)}] - f$
    • Residual Dirichlet conditions: $r_D^{(0)} = u^{(0)} - g$
    • Residual Neumann conditions: $r_N^{(0)} = \nabla u^{(0)} \cdot \mathbf{n} - h$
  3. Generate Corrective PINN: These residuals are then treated as a "delta PDE" and fed back into the hypernetwork $\Phi$ to obtain a corrective PINN $\delta u^{(1)}$: $ \delta u^{(1)} := u_{\Phi(\mathcal{L}, r_f^{(0)}, r_D^{(0)}, r_N^{(0)})}. $ Essentially, the hypernetwork learns to solve for the error of the previous prediction.
  4. Update Solution: The updated solution $u^{(1)}$ is the sum of the initial and corrective PINNs: $ u^{(1)} := u^{(0)} + \delta u^{(1)} $
  5. Iterative Process: This process is repeated for $t = 0, \ldots, T-1$ rounds: $ u^{(t+1)} := u^{(t)} + \delta u^{(t+1)}, \quad \text{with} \quad \delta u^{(t+1)} := u_{\Phi(\mathcal{L}, r_f^{(t)}, r_D^{(t)}, r_N^{(t)})}. $ After $T$ rounds, the final solution is the sum of all contributions: $ u^{(T)} = u^{(0)} + \sum_{t=1}^{T} \delta u^{(t)} $ This model is denoted $\mathrm{HyPINO}^i$, where $i$ is the number of refinement rounds (so the ensemble contains $i+1$ PINNs). During this process, only the small PINNs (the $\delta u^{(t)}$) are differentiated to compute residuals; the hypernetwork $\Phi$ remains in inference mode, making the refinement computationally efficient (forward-only passes). A schematic sketch of this loop follows.
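In this sketch, `hypernet`, `pde.residuals`, and `pde.replace` are hypothetical helpers standing in for the released implementation; it only illustrates the control flow of the refinement loop.

```python
def iterative_refinement(hypernet, pde, n_rounds=3):
    """Residual-driven refinement sketch: hypernet(pde) returns a callable PINN,
    pde.residuals(u) returns the delta-PDE data (r_f, r_D, r_N). Both are hypothetical helpers."""
    ensemble = [hypernet(pde)]                           # u^(0) from the first forward pass

    def u_sum(x):                                        # current ensemble prediction u^(t)
        return sum(member(x) for member in ensemble)

    for _ in range(n_rounds):
        r_f, r_D, r_N = pde.residuals(u_sum)             # residual source term and boundary errors
        delta_pde = pde.replace(f=r_f, g=r_D, h=r_N)     # "delta PDE" keeps the same operator L
        ensemble.append(hypernet(delta_pde))             # corrective PINN, forward pass only
    return u_sum                                         # u^(T) = u^(0) + sum of corrections
```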

5. Experimental Setup

5.1. Datasets

The study evaluates HyPINO and baseline models on seven standard PDE benchmarks drawn from the PINN literature. All problems are reformulated over the canonical domain $[-1, 1]^2$. Problems originally defined over different ranges are mapped to $[-1, 1]^2$ using affine transformations: $ \tilde{x} = \frac{2(x - a_x)}{b_x - a_x} - 1, \quad \tilde{y} = \frac{2(y - a_y)}{b_y - a_y} - 1, $ where $(x, y)$ are the original coordinates, $(a_x, b_x)$ and $(a_y, b_y)$ define the original domain ranges, and $(\tilde{x}, \tilde{y}) \in [-1, 1]^2$ are the normalized coordinates.
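For reference, this affine normalization is a one-line helper (illustrative sketch):

```python
def to_canonical(x, y, ax, bx, ay, by):
    """Affine map from [ax, bx] x [ay, by] onto the canonical domain [-1, 1]^2."""
    return 2.0 * (x - ax) / (bx - ax) - 1.0, 2.0 * (y - ay) / (by - ay) - 1.0

print(to_canonical(0.5, 0.25, 0.0, 1.0, 0.0, 1.0))   # (0.0, -0.5)
```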

Here are the benchmark problems:

  1. HT - 1D Heat Equation:

    • Equation: $\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$, with $\alpha = 0.1$.

    • Domain: $x \in [0, 1]$, $t \in [0, 1]$.

    • Boundary Conditions: $u(0, t) = u(1, t) = 0$.

    • Initial Condition: $u(x, 0) = \sin(\frac{n \pi x}{L})$ for $0 < x < L$, with $L = 1$ and $n \in \{1, 2, \ldots\}$.

    • Analytical Solution: $u(x, t) = \exp(-\frac{n^2 \pi^2 \alpha t}{L^2}) \sin(\frac{n \pi x}{L})$.

    • Adapted from DeepXDE [32]. The following figure (Figure 5 from the original paper) shows the parameterization of the 1D Heat PDE:

      Figure 5: Parameterization of the 1D Heat PDE. Panels (a)-(f) show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

  2. HZ - 2D Helmholtz Equation:

    • Equation: $\Delta u(x, y) + k^2 u(x, y) = f(x, y)$, with $(x, y) \in [-1, 1]^2$.

    • Boundary Conditions: $u(-1, y) = u(1, y) = u(x, -1) = u(x, 1) = 0$.

    • Common Instance: $f(x, y) = (-\pi^2 - (4\pi)^2 + k^2) \sin(\pi x) \sin(4\pi y)$, $u(x, y) = \sin(\pi x) \sin(4\pi y)$.

    • Adapted from DeepXDE [32]. The following figure (Figure 6 from the original paper) shows the parameterization of the 2D Helmholtz PDE:

      Figure 6: Parameterization of the 2D Helmholtz PDE. The panels show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

  3. HZ-G - Helmholtz on an Irregular Geometry:

    • Equation: $-\Delta u(x, y) + k^2 u(x, y) = f(x, y)$, where $\Omega = [-1, 1]^2 \setminus \Omega_{\mathrm{circle}}$.

    • Domain: The square $[-1, 1]^2$ with four circular regions removed.

    • Source Term: $f(x_1, x_2) = A \cdot \mu_2 x_2 \sin(\mu_1 \pi x_1) \sin(\mu_2 \pi x_2)$, with $\mu_1 = 1$, $\mu_2 = 4$, $k = 8$, $A = 10$.

    • Boundary Conditions: $u(x, y) = 0.2$ on $\partial \Omega_{\mathrm{rec}}$ (outer rectangular boundary), $u(x, y) = 1.0$ on $\partial \Omega_{\mathrm{circle}}$ (boundaries of the interior circles).

    • Circles $R_i$: Defined by specific center coordinates and radii.

    • Adapted from PINNacle [16]. The following figure (Figure 7 from the original paper) shows the parameterization of the 2D Helmholtz-type (Poisson-Boltzmann) PDE with complex geometry:

      Figure 7: Parameterization of the 2D Helmholtz-type (Poisson-Boltzmann) PDE with complex geometry. Panels (a)-(f) show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

  4. PS-C - Poisson with Four Circular Interior Boundaries:

    • Equation: $-\Delta u(x, y) = 0$, with $(x, y) \in \Omega$.

    • Domain: Rectangle $\Omega_{\mathrm{rec}} = [-0.5, 0.5]^2$ with four interior circular exclusions $R_i$.

    • Boundary Conditions: $u(x, y) = 0$ on $\partial R_i$, $u(x, y) = 1$ on $\partial \Omega_{\mathrm{rec}}$.

    • Adapted from PINNacle [16]. The following figure (Figure 8 from the original paper) shows the parameterization of the 2D Poisson PDE with circular inner boundaries:

      Figure 8: Parameterization of the 2D Poisson PDE with circular inner boundaries. Panels (a)-(f) show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

  5. PS-L - Poisson on an L-shaped Domain:

    • Equation: $-u_{xx} - u_{yy} = 1$, with $(x, y) \in \Omega$.

    • Domain: L-shaped region $\Omega = [-1, 1]^2 \setminus [0, 1]^2$.

    • Boundary Conditions: $u(x, y) = 0$ on $\partial \Omega$.

    • Adapted from DeepXDE [32]. The following figure (Figure 9 from the original paper) shows the parameterization of the 2D Poisson PDE on an L-shaped domain:

      Figure 9: Parameterization of the 2D Poisson PDE on an L-shaped domain. The panels show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

  6. PS-G - Poisson with a Gaussian Vorticity Field:

    • Equation: $-\Delta u(x, y) = f(x, y)$, with $(x, y) \in (0, 1)^2$.

    • Boundary Conditions: $u(x, y) = 0$ (homogeneous Dirichlet) on $\partial \Omega$.

    • Source Term: $f(x, y) = \sum_{i=1}^N \exp\left(-\frac{(x - \mu_{x,i})^2 + (y - \mu_{y,i})^2}{2\sigma_i^2}\right)$, where $N \sim \mathrm{Geom}(0.4)$, $\mu_{x,i}, \mu_{y,i} \sim \mathcal{U}[0, 1]$, and $\sigma_i \sim \mathcal{U}[0.025, 0.1]$.

    • A sample from the dataset introduced in [19]. The following figure (Figure 10 from the original paper) shows the parameterization of the 2D Poisson PDE with a Gaussian superposition vorticity field:

      Figure 10: Parameterization of the 2D Poisson PDE with a Gaussian superposition vorticity field. Panels (a)-(f) show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

  7. WV - 1D Wave Equation:

    • Equation: $\frac{\partial^2 u}{\partial t^2} - 4 \frac{\partial^2 u}{\partial x^2} = 0$, with $(x, t) \in [0, 1] \times [0, 1]$.

    • Boundary Conditions: $u(0, t) = u(1, t) = 0$.

    • Initial Conditions: $u(x, 0) = \sin(\pi x) + \frac{1}{2} \sin(4\pi x)$, $\frac{\partial u}{\partial t}(x, 0) = 0$.

    • Analytical Solution: $u(x, t) = \sin(\pi x) \cos(2\pi t) + \frac{1}{2} \sin(4\pi x) \cos(8\pi t)$.

    • Adapted from PINNacle [16]. The following figure (Figure 11 from the original paper) shows the parameterization of the 1D Wave PDE:

      Figure 11: Parameterization of the 1D Wave PDE. Panels (a)-(f) show the Dirichlet boundary $\partial \Omega_D$, the function $g(x)$, the Neumann boundary $\partial \Omega_N$, the function $h(x)$, the source term $f(x)$, and the solution $u(x)$.

5.2. Evaluation Metrics

The performance of the models is primarily evaluated using Mean Squared Error (MSE) and Symmetric Mean Absolute Percentage Error (SMAPE), defined below; a short computation sketch follows the two definitions.

  1. Mean Squared Error (MSE):

    • Conceptual Definition: MSE is a commonly used metric to quantify the average of the squares of the errors or deviations, i.e., the difference between the estimated values and the actual values. It measures the average magnitude of the errors, with larger errors having a disproportionately larger impact due to squaring. It is effective for assessing the absolute accuracy of predictions.
    • Mathematical Formula: $ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $
    • Symbol Explanation:
      • NN: The total number of data points (or spatial/temporal points in the solution grid).
      • yiy_i: The actual (ground truth) value of the solution at point ii.
      • y^i\hat{y}_i: The predicted value of the solution at point ii.
  2. Symmetric Mean Absolute Percentage Error (SMAPE):

    • Conceptual Definition: SMAPE is a measure of prediction accuracy used in statistics, particularly for forecasting. It is an extension of Mean Absolute Percentage Error (MAPE) that addresses MAPE's asymmetry (it can be infinite or undefined if the actual value is zero) by making the denominator symmetric. SMAPE expresses error as a percentage, which is often easier to interpret than MSE for understanding relative prediction quality.
    • Mathematical Formula: $ \mathrm{SMAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \frac{|A_t - F_t|}{(|A_t| + |F_t|)/2} $
    • Symbol Explanation:
      • NN: The total number of data points.
      • AtA_t: The actual value at time (or point) tt.
      • FtF_t: The forecast (predicted) value at time (or point) tt.
      • AtFt|A_t - F_t|: The absolute error between the actual and predicted values.
      • (At+Ft)/2(|A_t| + |F_t|)/2: The average of the absolute actual and predicted values, used for normalization.
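
As a reference, the two metrics above can be computed as in the following minimal sketch; the array names and the small epsilon guard against an all-zero denominator are illustrative assumptions.

```python
# Sketch: MSE and SMAPE between a predicted and a reference solution grid,
# matching the formulas above.
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))

def smape(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    # Symmetric normalization; eps guards points where both values are zero.
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / (denom + eps)))

y_true = np.sin(np.linspace(0, np.pi, 100))
y_pred = y_true + 0.01
print(mse(y_true, y_pred), smape(y_true, y_pred))
```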

5.3. Baselines

The performance of HyPINO is compared against three baseline models:

  1. U-Net [41]:

    • Description: A convolutional encoder-decoder network known for its effectiveness in image segmentation tasks. In this context, it shares HyPINO's encoder architecture for processing PDE inputs but replaces the hypernetwork decoder with a convolutional decoder. This decoder directly outputs a $224 \times 224$ solution grid, matching the resolution of the input tensors.
    • Training: Trained exclusively on supervised data (PDEs with analytical solutions).
    • Parameters: 62 million (62M) trainable parameters.
    • Configuration: Trained for 30,000 batches with a batch size of 128 and an initial learning rate of $10^{-4}$.
  2. Poseidon [19]:

    • Description: A large, pretrained neural operator model. The paper uses the Poseidon-B checkpoint. It is adapted to this study's PDE parameterization by adjusting the dimensionality of its embedding and lead-time-conditioned layer normalization layers (originally designed for 1D time input) to match the size of the 5-dimensional PDE operator vector.
    • Training: Fine-tuned only on supervised data.
    • Parameters: Approximately 158 million (158M) parameters.
    • Configuration: Trained for 30,000 batches with a batch size of 128 and an initial learning rate of $10^{-4}$.
  3. PINO (Physics-Informed Neural Operator) [29]:

    • Description: A Fourier neural operator (FNO) [28] architecture, which learns solution operators in Fourier space. It is adapted to accept 5-channel grid inputs (for the PDE components) and to condition on the PDE operator using FiLM layers, similar to HyPINO; a minimal FiLM sketch is shown after this list.
    • Training: Trained using the same hybrid supervision and curriculum as HyPINO, including both physics-informed losses and supervised losses.
    • Parameters: 33 million (33M) parameters.
    • Configuration: Trained for 30,000 batches with a batch size of 128 and an initial learning rate of $10^{-4}$.
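
For readers unfamiliar with FiLM conditioning, the following is a minimal sketch of how feature maps can be modulated by a 5-dimensional PDE operator vector; the layer sizes, class name, and channel count are assumptions, not the baseline's actual implementation.

```python
# Sketch: FiLM-style conditioning of convolutional features on a conditioning
# vector (here, the 5-dimensional PDE operator vector).
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # Maps the conditioning vector to a per-channel scale and shift.
        self.to_scale_shift = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, H, W), cond: (batch, cond_dim)
        gamma, beta = self.to_scale_shift(cond).chunk(2, dim=-1)
        return gamma[:, :, None, None] * features + beta[:, :, None, None]

film = FiLM(cond_dim=5, num_channels=64)
x = torch.randn(2, 64, 56, 56)
op_vec = torch.randn(2, 5)            # the 5-dimensional PDE operator vector
print(film(x, op_vec).shape)          # torch.Size([2, 64, 56, 56])
```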

5.4. Training

The HyPINO model generates weights for a target PINN configured with three hidden layers, each containing 32 hidden units. The entire HyPINO model comprises 77 million (77M) trainable parameters. Training is conducted on 4 NVIDIA RTX 4090 GPUs.

The training process is divided into two distinct phases:

  1. Phase 1 (Initial 10,000 batches):
    • Data: All samples are supervised, meaning they come with known analytical solutions generated via MMS.
    • Loss Weights:
      • $\lambda_R = 0.01$ (residual loss)
      • $\lambda_S^{(0)} = 1$ (Sobolev loss on function values)
      • $\lambda_S^{(1)} = 0.1$ (Sobolev loss on first derivatives)
      • $\lambda_S^{(2)} = 0.01$ (Sobolev loss on second derivatives)
      • $\lambda_D = 10$ (Dirichlet boundary loss)
      • $\lambda_N = 1$ (Neumann boundary loss)
  2. Phase 2 (Remaining 20,000 batches):
    • Data: Each batch consists of 50% supervised samples (with analytical solutions) and 50% unsupervised samples (without analytical solutions, relying solely on physics-informed losses).
    • Loss Weights:
      • $\lambda_R = 0.1$ (residual loss)
      • $\lambda_S^{(0)} = 1$ (Sobolev loss on function values)
      • $\lambda_S^{(1)} = 1$ (Sobolev loss on first derivatives)
      • $\lambda_S^{(2)} = 0.1$ (Sobolev loss on second derivatives)
      • $\lambda_D = 10$ (Dirichlet boundary loss)
      • $\lambda_N = 1$ (Neumann boundary loss)

The AdamW optimizer is used for training, with a cosine learning rate schedule that decays the learning rate from an initial $10^{-4}$ down to $10^{-6}$. The batch size is fixed at 128 for all experiments.
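
As a minimal sketch, the phase-dependent weights listed above could be combined into a single training objective as follows; the loss-term names and the phase switch at 10,000 batches are illustrative assumptions based on the reported schedule, not the authors' code.

```python
# Sketch: assembling the weighted HyPINO training loss from precomputed
# per-term losses, with the two-phase weights reported above.
import torch

PHASE_WEIGHTS = {
    1: dict(R=0.01, S0=1.0, S1=0.1, S2=0.01, D=10.0, N=1.0),  # batches < 10k, supervised only
    2: dict(R=0.1,  S0=1.0, S1=1.0, S2=0.1,  D=10.0, N=1.0),  # batches >= 10k, 50/50 mix
}

def total_loss(losses, batch_idx: int) -> torch.Tensor:
    # `losses` maps term names to scalar tensors computed elsewhere.
    w = PHASE_WEIGHTS[1 if batch_idx < 10_000 else 2]
    return (w["R"] * losses["residual"]
            + w["S0"] * losses["sobolev_0"]
            + w["S1"] * losses["sobolev_1"]
            + w["S2"] * losses["sobolev_2"]
            + w["D"] * losses["dirichlet"]
            + w["N"] * losses["neumann"])

dummy = {k: torch.tensor(1.0) for k in
         ["residual", "sobolev_0", "sobolev_1", "sobolev_2", "dirichlet", "neumann"]}
print(total_loss(dummy, 5_000), total_loss(dummy, 20_000))
```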

6. Results & Analysis

6.1. Core Results Analysis

The experimental results demonstrate HyPINO's strong zero-shot generalization capabilities and the effectiveness of its iterative refinement procedure.

The following are the results from Table 1 of the original paper:

Each cell reports MSE / SMAPE.

| Model | HT | HZ | HZ-G | PS-C | PS-L | PS-G | WV |
|---|---|---|---|---|---|---|---|
| U-Net | 3.5e-2 / 67 | 3.7e-2 / 68 | 6.9e-2 / 68 | 2.7e-2 / 33 | 3.9e-3 / 112 | 9.2e-1 / 159 | 3.7e-1 / 144 |
| Poseidon | 7.1e-2 / 47 | 3.3e-3 / 28 | 1.3e-1 / 65 | 5.3e-2 / 93 | 3.5e-3 / 111 | 7.2e-1 / 155 | 8.7e-1 / 138 |
| PINO | 1.4e-2 / 38 | 2.0e-2 / 51 | 6.1e-2 / 60 | 1.7e-1 / 65 | 3.3e-3 / 51 | 3.1e-1 / 70 | 3.0e-1 / 149 |
| PINO$^3$ | 1.3e-2 / 47 | 7.2e-3 / 48 | 4.6e-2 / 64 | 2.8e-2 / 63 | 4.6e-3 / 62 | 2.3e-2 / 43 | 3.1e-1 / 127 |
| PINO$^{10}$ | 3.9e-2 / 78 | 5.1e-3 / 39 | 1.4e-1 / 75 | 1.1e-2 / 48 | 1.0e-3 / 47 | 1.8e-2 / 38 | 8.5e-1 / 139 |
| HyPINO | 2.3e-2 / 42 | 5.7e-3 / 36 | 1.3e-1 / 64 | 5.6e-2 / 86 | 1.7e-4 / 39 | 1.8e-1 / 61 | 2.9e-1 / 150 |
| HyPINO$^3$ | 4.9e-4 / 11 | 2.7e-3 / 31 | 1.6e-2 / 38 | 3.4e-3 / 18 | 1.9e-4 / 36 | 6.6e-3 / 25 | 2.3e-1 / 134 |
| HyPINO$^{10}$ | 8.0e-5 / 7 | 1.6e-3 / 22 | 1.9e-2 / 40 | 2.3e-3 / 15 | 2.7e-4 / 40 | 5.0e-3 / 24 | 1.2e-1 / 96 |

Analysis of Zero-Shot Performance (HyPINO vs. Baselines):

  • HyPINO (without refinement) shows consistently strong results, achieving an average rank of 2.00 across all tasks, outperforming U-Net (3.00), Poseidon (2.86), and PINO (2.14). This is particularly noteworthy given that HyPINO operates in a significantly less structured output space (generating PINN weights) compared to the grid-based outputs of the baselines.
  • The results generally support the idea that models trained with physics-informed objectives (PINO, HyPINO) tend to outperform those relying solely on supervised data (U-Net, Poseidon). This highlights the benefits of embedding physical laws for better generalization, especially when training data might not perfectly cover the target tasks.
  • HyPINO achieves the lowest MSE on PS-L (1.7e-4) among all base models, indicating its strong performance on specific geometries.

Analysis of Iterative Refinement:

  • The iterative refinement approach (denoted HyPINO$^i$ and PINO$^i$, where $i$ is the number of refinement iterations) significantly boosts performance; a procedural sketch follows this analysis.

  • After just three refinement iterations (HyPINO$^3$), there are substantial reductions in prediction error across all but one benchmark. For instance, the MSE for PS-C and PS-G decreases by over an order of magnitude, and for HT it decreases by almost two orders of magnitude (from 2.3e-2 to 4.9e-4).

  • With ten refinement iterations (HyPINO$^{10}$), the model achieves state-of-the-art performance on five out of seven benchmarks. It outperforms the best baseline models by factors ranging from 2.1 (on HZ, against Poseidon) to 173 (on HT, against PINO).

  • The refinement mechanism is generic: PINO$^3$ and PINO$^{10}$ also improve over plain PINO, though generally less markedly than HyPINO's refinements.

  • The degradation on PS-L at higher refinement counts (HyPINO$^{10}$ reaches 2.7e-4, slightly worse than HyPINO's 1.7e-4) is attributed to its already low initial error and small solution magnitudes, which may push the correction terms outside the training distribution.

    The paper hypothesizes that iterative refinement works by correcting systematic biases introduced during training on synthetic data. These consistent errors can be systematically addressed in subsequent iterations, leading to more effective ensembles than simply combining independent PINNs.
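
A minimal sketch of this forward-only refinement loop is given below; `hypino`, `residual_of`, and `build_delta_pde` are placeholders for the trained operator and problem-specific routines, not the authors' code.

```python
# Sketch of the forward-only refinement loop described above. `hypino` maps a
# PDE parameterization to a callable PINN, `residual_of` evaluates the residual
# of the current ensemble, and `build_delta_pde` recasts that residual as a
# corrective "delta PDE".
def iterative_refinement(hypino, pde, residual_of, build_delta_pde, num_iters=10):
    pinns = [hypino(pde)]  # initial forward pass -> first PINN

    def ensemble(x):
        # The combined solution is the sum of all generated PINNs' outputs.
        return sum(p(x) for p in pinns)

    for _ in range(num_iters):
        res = residual_of(ensemble, pde)        # error of the current ensemble
        delta_pde = build_delta_pde(pde, res)   # residual treated as a new PDE
        pinns.append(hypino(delta_pde))         # corrective PINN, forward pass only

    return ensemble
```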

The following figure (Figure 3 from the original paper) illustrates these trends, showing mean squared error and relative error as functions of the number of refinement iterations.

Figure 3: Effect of iterative refinement on HyPINO predictions across benchmarks. MSE (left) and relative error (right) as functions of refinement iterations. Relative error at iteration $i$ is the ratio of MSE at iteration $i$ to that at iteration 0.

As seen in the figure, MSE (left panel) generally decreases with more refinement iterations, and the relative error (right panel, $E_i/E_0$) consistently reduces for most benchmarks, highlighting the progressive improvement.

The following are the results from Table 2 of the original paper, visually comparing predictions and errors.

Table 2: Comparison of predictions and errors of HyPINO after zero, three, and 10 refinement rounds across all benchmark PDEs. Each row shows the reference solution, the prediction, and their difference.

This table visually confirms the improvement from iterative refinement, showing how the prediction (Prediction) approaches the ground truth (Reference) and how the difference (Diff) diminishes with more refinement rounds. For the challenging WV benchmark, the model effectively extends the undulating shape further across the time dimension, as shown in the last row of Table 2.

6.2. Ablation Studies / Parameter Analysis

Resolution Invariance Ablation

An ablation study was performed on the Helmholtz benchmark (HZ) to assess the resolution invariance of HyPINO. Although HyPINO's output is a continuous PINN, its input PDE parameterization is discretized on a fixed $224 \times 224$ grid. The study varied the input source function resolution from 28 to 448 and then resized it to $224 \times 224$ for processing.

The following are the results from Table 3 of the original paper:

| Input resolution | 28 | 56 | 96 | 112 | 140 | 168 | 196 | 224 | 280 | 336 | 392 | 448 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SMAPE | 38.04 | 35.78 | 35.91 | 36.00 | 36.05 | 36.05 | 36.05 | 36.04 | 36.05 | 36.03 | 36.04 | 36.04 |

Analysis: The SMAPE values vary by less than 0.3 between resolutions of 56 and 448, indicating approximate resolution invariance. Performance only starts to deteriorate at very coarse resolutions ($28 \times 28$). This suggests that HyPINO can effectively handle inputs from various resolutions, provided they are not excessively coarse.
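
The resizing step used in this ablation could look like the following sketch; the choice of bilinear interpolation is an assumption.

```python
# Sketch: resizing a source-term grid of arbitrary resolution to the fixed
# 224x224 input expected by the encoder.
import torch
import torch.nn.functional as F

def to_model_resolution(field: torch.Tensor, size: int = 224) -> torch.Tensor:
    # field: (H, W) -> (1, 1, size, size)
    return F.interpolate(field[None, None], size=(size, size),
                         mode="bilinear", align_corners=False)

coarse = torch.randn(28, 28)
fine = torch.randn(448, 448)
print(to_model_resolution(coarse).shape, to_model_resolution(fine).shape)
```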

Fine-tuning Behavior with Different Initializations

The paper investigates the utility of HyPINO-generated PINN parameters as an initialization for fine-tuning on specific PDE instances. It compares three initialization strategies:

  1. HyPINO-initialized PINNs.

  2. Randomly initialized PINNs.

  3. PINNs initialized via Reptile meta-learning [37]. (Reptile was trained on the synthetic dataset with 10,000 outer-loop and 1,000 inner-loop cycles).

PINN fine-tuning is performed over 10,000 steps using the Adam optimizer, with a learning rate starting at $10^{-4}$ and decaying to $10^{-7}$ via a cosine schedule.
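
A minimal sketch of this fine-tuning setup is shown below; `pinn` and `pinn_loss` are placeholders for the (HyPINO-, Reptile-, or randomly) initialized network and its physics-informed objective, not the authors' code.

```python
# Sketch: Adam fine-tuning over 10,000 steps with a cosine learning-rate
# schedule decaying from 1e-4 to 1e-7, as described above.
import torch

def finetune(pinn: torch.nn.Module, pinn_loss, steps: int = 10_000) -> torch.nn.Module:
    opt = torch.optim.Adam(pinn.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps, eta_min=1e-7)
    for _ in range(steps):
        opt.zero_grad()
        loss = pinn_loss(pinn)   # physics-informed loss of the current PINN
        loss.backward()
        opt.step()
        sched.step()
    return pinn
```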

The following figure (Figure 4 from the original paper) shows convergence results on the 1D Heat Equation (HT) benchmark:

Figure 4: Convergence on the 1D Heat Equation (HT) for randomly initialized PINNs (blue), Reptile-initialized PINNs (orange), and HyPINO-initialized PINNs (green).

As shown in the figure, HyPINO-initialized PINNs (green line) consistently start with a lower loss and converge faster to a lower final error compared to randomly initialized PINNs (blue line) and Reptile-initialized PINNs (orange line).

The following figures (Figure 13 and Figure 14 from the original paper) show convergence results across all other benchmarks and ensemble comparisons:

Figure 13: Convergence of PINNs when fine-tuned on each of the benchmark PDE problems, comparing ensemble sizes: (a) single PINN, (b) ensemble of size 4, (c) ensemble of size 11, where an ensemble of size $i$ is either $i$ randomly initialized PINNs (blue), $i$ PINNs initialized via Reptile (orange), or one PINN initialized via HyPINO followed by $i - 1$ refinement rounds (green).

Figure 14: Convergence of PINNs when fine-tuned on each of the benchmark PDE problems (continued), with the same ensemble sizes and color coding as Figure 13.

Analysis of Fine-tuning Performance:

  • HyPINO-initialized PINNs consistently start with lower loss and converge to lower final error on 4 out of 7 benchmarks. They perform on par with baselines on two benchmarks and underperform on only one.

  • Quantitatively, a randomly initialized PINN requires an average of 1,068 steps to reach the initial MSE of a HyPINO-initialized model.

  • For ensembles, randomly initialized ensembles require an average of 1,617 and 1,772 steps, respectively, to match the MSE of HyPINO$^3$ and HyPINO$^{10}$.

  • Reptile-initialized PINNs converge rapidly initially (first 1,000 steps) due to their meta-training configuration but tend to plateau earlier and converge to higher final errors than HyPINO initializations.

    These findings strongly suggest that HyPINO offers a robust initialization strategy for training PINNs, leading to faster and more accurate fine-tuning.

L-BFGS Fine-Tuning with Different Initializations

To further validate the effectiveness of HyPINO initializations, additional fine-tuning experiments were conducted using the L-BFGS optimizer, a second-order optimization method.
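
A minimal sketch of such an L-BFGS fine-tuning step in PyTorch is given below; the hyperparameters and the `pinn_loss` placeholder are assumptions, not the authors' configuration.

```python
# Sketch: L-BFGS fine-tuning of an initialized PINN. torch.optim.LBFGS requires
# a closure that re-evaluates the loss at each internal iteration.
import torch

def finetune_lbfgs(pinn: torch.nn.Module, pinn_loss, max_iter: int = 500) -> torch.nn.Module:
    opt = torch.optim.LBFGS(pinn.parameters(), lr=1.0, max_iter=max_iter,
                            history_size=50, line_search_fn="strong_wolfe")

    def closure():
        opt.zero_grad()
        loss = pinn_loss(pinn)   # physics-informed loss of the current PINN
        loss.backward()
        return loss

    opt.step(closure)
    return pinn
```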

The following are the results from Table 4 of the original paper, showing iterations required to match the initial MSE of a HyPINO-initialized PINN with L-BFGS:

| Initialization | HT | HZ | HZ-G | PS-C | PS-L | PS-G | WV |
|---|---|---|---|---|---|---|---|
| Random Init | 4 | 20 | N/A | 36 | 34 | 11 | 35 |
| Reptile Init | 4 | 22 | 211 | 22 | 65 | 9 | 27 |

Analysis: HyPINO initialization allows L-BFGS to reach its initial accuracy much faster. For instance, Random Init takes 36 steps on PS-C and 34 on PS-L to reach HyPINO's starting error, while Reptile Init takes 22 and 65 steps, respectively. For HZ-G, Random Init never reaches HyPINO's initial accuracy, while Reptile Init takes 211 steps. This confirms that HyPINO provides a significantly better starting point for fine-tuning, even with second-order optimizers.

The following are the results from Table 5 of the original paper, showing final MSE after L-BFGS fine-tuning:

| Initialization | HT | HZ | HZ-G | PS-C | PS-L | PS-G | WV |
|---|---|---|---|---|---|---|---|
| Random Init | 2.93e-9 | 1.15e-7 | 2.89e-1 | 3.18e-4 | 7.05e-5 | 5.69e-4 | 2.68e-2 |
| Reptile Init | 2.69e-9 | 2.18e-7 | 3.55e-2 | 9.34e-4 | 8.66e-5 | 5.68e-4 | 3.80e-4 |
| HyPINO Init | 1.62e-9 | 1.52e-7 | 1.74e-2 | 8.19e-5 | 6.87e-5 | 5.69e-4 | 1.94e-2 |

Analysis: HyPINO initializations continue to be effective with L-BFGS. HyPINO achieves the lowest final MSE on four benchmarks (HT, PS-C, PS-L, HZ-G) and is competitive on PS-G. Only on WV does Reptile achieve the best result, and on HZ, Random Init slightly outperforms HyPINO. These differences are significant given the high computational cost of L-BFGS iterations, implying that a good initialization is crucial for efficient and effective optimization.

6.3. Additional Visualizations

The following figure (Figure 12 from the original paper) shows the visual progression of iterative refinement across different samples.

Figure 12: Visual progression of iterative refinement across different samples. Each row shows: (a) HyPINO prediction $u^{(0)}$, (b) 1st refinement $\delta u^{(1)}$, (c) 2nd refinement $\delta u^{(2)}$, (d) final prediction $u^{(0)} + \delta u^{(1)} + \delta u^{(2)}$, and (e) ground truth.

This visualization provides qualitative evidence for the effectiveness of the iterative refinement procedure, showing how the corrections ($\delta u^{(1)}$, $\delta u^{(2)}$) progressively improve the initial HyPINO prediction ($u^{(0)}$) to match the ground truth (e).

7. Conclusion & Reflections

7.1. Conclusion Summary

The paper introduces HyPINO, a novel multi-physics neural operator that excels in zero-shot generalization across a broad spectrum of Partial Differential Equations (PDEs). Its key innovation lies in combining a Swin Transformer-based hypernetwork with a mixed supervision strategy. This strategy utilizes both labeled data derived from the Method of Manufactured Solutions (MMS) and unlabeled samples optimized through physics-informed objectives. HyPINO can handle linear elliptic, hyperbolic, and parabolic PDEs in two dimensions, accommodating variations in operators, source terms, complex geometries (including interior boundaries), and mixed boundary conditions. The model consistently outperforms existing baselines like U-Nets, Poseidon, and PINO in zero-shot accuracy on diverse benchmarks.

Furthermore, HyPINO introduces an iterative refinement procedure that significantly enhances prediction accuracy by treating residual errors as "delta PDEs" and generating corrective PINNs. This ensemble-based approach drastically reduces the $L_2$ loss (by more than 100x in the best case) while retaining forward-only inference. Finally, HyPINO-generated PINN parameters serve as superior initializations for fine-tuning, leading to faster convergence and lower final errors compared to random or Reptile-meta-learned initializations.

7.2. Limitations & Future Work

The authors acknowledge several limitations of the current HyPINO implementation:

  1. Scope of PDEs: It is currently restricted to linear 2D PDEs with spatially uniform coefficients. This narrows the class of PDEs it can address, as many real-world phenomena involve nonlinear effects, spatially varying properties, or higher dimensions.

  2. Increased Complexity: Extending the framework to more complex PDEs will likely necessitate increased model capacity, either by scaling the architecture or improving the target networks' parameter generation process.

    Based on these limitations, the authors suggest several directions for future work:

  1. Increased Input Dimensionality: Extending HyPINO to higher-dimensional PDEs.

  2. Spatially Varying Coefficients: Incorporating PDEs whose coefficients vary across the domain.

  3. Nonlinear PDEs: Adapting the framework to handle nonlinear PDEs, which are significantly more challenging.

  4. Coupled Systems: Modeling coupled systems of PDEs, which are common in multi-physics scenarios.

    They believe some extensions might be achievable with modest modifications to the data generation, input encoding, or training processes, while others will require more substantial architectural enhancements.

7.3. Personal Insights & Critique

HyPINO represents a significant step forward in the quest for generalized PDE solvers using neural operators. The clever combination of a Swin Transformer hypernetwork, MMS-based supervised data, and physics-informed self-supervision for a diverse synthetic dataset is particularly insightful. This hybrid data strategy is crucial for overcoming the data inefficiency of pure neural operators and the stability issues of pure physics-informed approaches. The ability to generalize zero-shot across different PDE types, geometries, and boundary conditions simultaneously is a major achievement, pushing beyond the narrower scope of many prior works.

The iterative refinement procedure is an elegant solution to improve accuracy without incurring the heavy computational cost of backpropagation during inference or extensive fine-tuning. This forward-only ensemble approach is practical and demonstrates a clear path to boosting performance at deployment. The robustness of HyPINO's initialization for fine-tuning also highlights its potential as a foundational model component.

However, some aspects warrant further consideration:

  • Scalability to True Nonlinearity and High Dimensions: While the paper suggests extensibility, nonlinear PDEs and high-dimensional problems introduce significant challenges (e.g., stiffness, turbulence, complex interactions) that may not be easily overcome by simply scaling the architecture or refining data generation. The spectral bias and mode collapse issues, though mitigated by Fourier features, can re-emerge in more complex scenarios.

  • Complexity of MMS Data Generation: While MMS provides ground truth, generating truly representative and diverse analytical solutions for complex multi-physics, nonlinear PDEs in higher dimensions could become prohibitively complex. The current Algorithm 1 for MMS is already quite intricate for 2D linear PDEs.

  • Interpretability of Hypernetwork Outputs: The hypernetwork outputs a large set of PINN weights. While effective, understanding why certain PINN parameters are generated for a given PDE remains largely a black box. Better interpretability could lead to more robust designs and debugging.

  • Computational Cost of Hypernetwork Training: Training a 77M-parameter hypernetwork on 4 RTX 4090 GPUs for 30,000 batches is substantial. Scaling this to more complex PDEs might require even larger models and more resources, which could limit accessibility for researchers without high-end computing.

  • Generalization to Truly Unseen Physics: The current benchmarks, while diverse, are drawn from the PINN literature. It would be interesting to see how HyPINO performs on PDEs with physical phenomena qualitatively different from those seen during training, rather than just quantitative variations within known types.

    Overall, HyPINO offers a practical and powerful framework for advancing neural operators towards multi-physics and foundation model capabilities. Its strengths in zero-shot generalization and efficient refinement suggest a promising direction for AI-driven scientific computing, providing a solid foundation for tackling increasingly complex PDE challenges.
