HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions
TL;DR Summary
HyPINO is introduced as a multi-physics neural operator for zero-shot generalization across various PDEs without task-specific fine-tuning, combining a Swin Transformer hypernetwork with mixed supervision to achieve strong accuracy on benchmark problems.
Abstract
We present HyPINO, a multi-physics neural operator designed for zero-shot generalization across a broad class of PDEs without requiring task-specific fine-tuning. Our approach combines a Swin Transformer-based hypernetwork with mixed supervision: (i) labeled data from analytical solutions generated via the Method of Manufactured Solutions (MMS), and (ii) unlabeled samples optimized using physics-informed objectives. The model maps PDE parameterizations to target Physics-Informed Neural Networks (PINNs) and can handle linear elliptic, hyperbolic, and parabolic equations in two dimensions with varying source terms, geometries, and mixed Dirichlet/Neumann boundary conditions, including interior boundaries. HyPINO achieves strong zero-shot accuracy on seven benchmark problems from PINN literature, outperforming U-Nets, Poseidon, and Physics-Informed Neural Operators (PINO). Further, we introduce an iterative refinement procedure that treats the residual of the generated PINN as "delta PDE" and performs another forward pass to generate a corrective PINN. Summing their contributions and repeating this process forms an ensemble whose combined solution progressively reduces the error on six benchmarks and achieves a >100x lower loss in the best case, while retaining forward-only inference. Additionally, we evaluate the fine-tuning behavior of PINNs initialized by HyPINO and show that they converge faster and to lower final error than both randomly initialized and Reptile-meta-learned PINNs on five benchmarks, performing on par on the remaining two. Our results highlight the potential of this scalable approach as a foundation for extending neural operators toward solving increasingly complex, nonlinear, and high-dimensional PDE problems. The code and model weights are publicly available at https://github.com/rbischof/hypino.
In-depth Reading
1. Bibliographic Information
1.1. Title
HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions
1.2. Authors
-
Rafael Bischof (Computational Design Lab, ETH Zurich, Switzerland)
-
Michal Piovarči (Computational Design Lab, ETH Zurich, Switzerland)
-
Michael A. Kraus (Institute of Structural Mechanics and Design, TU Darmstadt, Germany)
-
Siddhartha Mishra (Seminar for Applied Mathematics, ETH Zurich, Switzerland)
-
Bernd Bickel (Computational Design Lab, ETH Zurich, Switzerland)
The corresponding author is Rafael Bischof (rabischof@ethz.ch). The authors are affiliated with well-known academic institutions, primarily ETH Zurich, a prestigious research university in Switzerland, and TU Darmstadt in Germany, indicating a strong academic and research-oriented background in computational science, applied mathematics, and machine learning.
1.3. Journal/Conference
The paper is published on arXiv, a preprint server, as version v4. While arXiv itself is not a peer-reviewed journal or conference, it is a widely recognized platform for disseminating early research findings in fields including machine learning and computational physics. The listed publication timestamp is 2025-09-05T13:59:25.000Z (UTC); as is typical for preprints, the paper may still be undergoing peer review or awaiting formal publication.
1.4. Publication Year
The indicated publication date on arXiv is 2025-09-05, suggesting it is a very recent or upcoming work.
1.5. Abstract
This paper introduces HyPINO, a multi-physics neural operator that leverages a Swin Transformer-based hypernetwork with a unique mixed supervision strategy. The model is designed for zero-shot generalization across diverse Partial Differential Equations (PDEs), including linear elliptic, hyperbolic, and parabolic types in two dimensions, with varying source terms, geometries, and mixed boundary conditions. The training data combines labeled analytical solutions generated by the Method of Manufactured Solutions (MMS) and unlabeled samples optimized using physics-informed objectives. HyPINO consistently outperforms existing baselines like U-Nets, Poseidon, and PINO on seven benchmark problems. A novel iterative refinement procedure is also proposed, which progressively reduces error by generating corrective PINNs based on residual "delta PDEs," achieving significant loss reductions. Furthermore, PINNs initialized by HyPINO demonstrate faster convergence and lower final errors during fine-tuning compared to randomly initialized or Reptile-meta-learned PINNs. The authors suggest that this scalable approach holds promise for tackling more complex, nonlinear, and high-dimensional PDE problems, aiming to serve as a foundation for advanced neural operators.
1.6. Original Source Link
Official Source: https://arxiv.org/abs/2509.05117v4 PDF Link: https://arxiv.org/pdf/2509.05117v4.pdf Publication Status: Preprint on arXiv (v4), dated September 5, 2025.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper addresses is the limitation of existing neural operators in solving Partial Differential Equations (PDEs), particularly their sample inefficiency and tendency to generalize only within narrowly defined problem families. Current neural operators often require large amounts of labeled data, typically generated by expensive high-fidelity solvers, and struggle to handle simultaneous variations in PDE operators, geometries, and boundary conditions without extensive fine-tuning. While physics-informed losses offer a way to reduce reliance on labeled data by providing self-supervision, purely physics-based training can be unstable and suffer from spectral bias or mode collapse. This creates a significant bottleneck in developing general-purpose, foundational, and multi-physics simulators capable of handling a broad range of scientific computing tasks.
The problem is important because PDEs are fundamental to modeling phenomena in nearly all scientific and engineering disciplines. Efficient and generalized PDE solvers are crucial for accelerating scientific discovery, engineering design, and simulation. The current gaps in research include the inability of neural operators to generalize widely across diverse PDE types and configurations (e.g., varying operators, source terms, geometries, and boundary conditions simultaneously) and the heavy data requirements for training them.
The paper's entry point or innovative idea is to combine physics-informed learning with a scalable synthetic data pipeline generated via the Method of Manufactured Solutions (MMS). This hybrid approach, coupled with a Swin Transformer-based hypernetwork, aims to achieve zero-shot generalization across a broad class of PDEs and provide robust PINN initializations without extensive task-specific fine-tuning.
2.2. Main Contributions / Findings
The paper makes several primary contributions:

- A hybrid physics-informed and supervised learning framework for multi-physics PDE solving: The HyPINO model integrates a Swin Transformer hypernetwork to generate PINN weights, allowing it to adapt to various PDE configurations. It uniquely combines physics-informed losses (for unlabeled data) with supervised losses (for MMS-generated analytical solutions), enabling broad generalization.
- A scalable data generation pipeline combining random physics sampling with MMS-based supervised examples: This pipeline efficiently creates a diverse dataset of PDE instances, encompassing linear elliptic, hyperbolic, and parabolic equations in 2D, with varying source terms, geometries (including interior boundaries), and mixed Dirichlet/Neumann boundary conditions. This addresses the data bottleneck faced by many neural operator approaches.
- An ensemble-based refinement mechanism for improved accuracy: An iterative refinement procedure is introduced in which the residual error of a generated PINN is treated as a "delta PDE", which then prompts the hypernetwork to generate a corrective PINN. Summing these contributions forms an ensemble that progressively reduces prediction error, offering a lightweight alternative to traditional fine-tuning while retaining forward-only inference.

The key conclusions and findings are:

- Strong zero-shot accuracy: HyPINO achieves superior zero-shot generalization on seven diverse PDE benchmarks, outperforming baselines such as U-Nets, Poseidon, and PINO.
- Significant error reduction through iterative refinement: The proposed refinement procedure substantially reduces L2 loss (over 100x in the best case) on most benchmarks, demonstrating its effectiveness in improving prediction accuracy. The mechanism is generic and also applicable to other physics-informed neural operators.
- Efficient PINN initialization for fine-tuning: PINNs initialized with HyPINO weights converge faster and to lower final errors during subsequent fine-tuning than randomly initialized or Reptile-meta-learned PINNs, highlighting HyPINO's utility as a robust initialization strategy.

These findings collectively demonstrate that HyPINO provides a scalable and data-efficient approach towards developing more general-purpose neural operators for multi-physics problems, and potentially a foundation for "world-model predictors."
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
Partial Differential Equations (PDEs)
Partial Differential Equations (PDEs) are mathematical equations that involve an unknown function of several independent variables and its partial derivatives with respect to those variables. They are used to model a wide range of physical phenomena, such as heat conduction, wave propagation, fluid flow, and electromagnetism.
- Example: The heat equation describes how temperature changes over time and space (e.g., $u_t = \alpha \Delta u$).
- Types:
- Elliptic PDEs: Describe steady-state phenomena (e.g., equilibrium, time-independent problems). The Poisson equation ($\Delta u = f$) and Laplace equation ($\Delta u = 0$) are common examples, often found in electrostatics or steady-state heat distribution.
- Parabolic PDEs: Describe time-dependent diffusion processes. The heat equation is a prime example.
- Hyperbolic PDEs: Describe time-dependent wave propagation phenomena. The wave equation is a classic example.
- Boundary Conditions (BCs): Conditions specified at the boundaries of the domain, essential for uniquely determining the solution of a PDE.
  - Dirichlet Boundary Conditions: Specify the value of the unknown function directly on the boundary (e.g., fixed temperature at a wall), denoted as $u = g$ on $\partial\Omega_D$.
  - Neumann Boundary Conditions: Specify the value of the normal derivative of the unknown function on the boundary (e.g., fixed heat flux across a surface), denoted as $\partial u / \partial n = h$ on $\partial\Omega_N$, where $\mathbf{n}$ is the outward normal vector.
Neural Operators
Neural operators are a class of neural networks designed to learn mappings between infinite-dimensional function spaces. Unlike traditional neural networks that learn mappings between finite-dimensional Euclidean spaces, neural operators learn operators that map an input function (e.g., PDE parameters or initial conditions) to an output function (e.g., the PDE solution). This allows them to generalize to unseen discretizations of the same underlying PDE and to problems with varying resolutions.
- Key advantages: Zero-shot generalization (generalizing to new inputs without retraining), fast inference, and full differentiability.
Physics-Informed Neural Networks (PINNs)
Physics-Informed Neural Networks (PINNs) are neural networks that embed the governing physical laws (expressed as PDEs) directly into their loss function. Instead of relying solely on labeled data, PINNs are trained to minimize two types of losses:
- Data loss: Measures the discrepancy between the PINN's predictions and any available labeled data.
- Physics-informed loss (residual loss): Measures how well the PINN's output satisfies the PDE at a set of collocation points. This loss is computed by differentiating the PINN's output with respect to its inputs using automatic differentiation.

By minimizing both losses, PINNs can learn solutions that are consistent with both observed data and the underlying physical laws, even with sparse or no labeled data.
Hypernetworks
A hypernetwork is a neural network that generates the weights (or parameters) of another neural network, called the target network. In the context of PDEs, a hypernetwork can take PDE parameters (like coefficients, boundary conditions, or source terms) as input and output the weights of a PINN that is specialized to solve that particular PDE instance. This allows a single hypernetwork to implicitly learn a family of PINNs, each tailored to a specific PDE configuration.
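To make this concrete, here is a toy PyTorch sketch (all sizes and layer choices are illustrative, not HyPINO's architecture): a small MLP maps a 5-dimensional PDE parameter vector to the flattened weights of a tiny target network.

```python
# Toy hypernetwork sketch: an MLP maps a 5-dimensional PDE parameter vector
# to the flat weights of a tiny target network; all sizes are illustrative.
import torch
import torch.nn as nn

# Target net: 2 -> 32 -> 1, i.e. W1 (32x2), b1 (32,), W2 (1x32), b2 (1,).
n_out = 64 + 32 + 32 + 1
hypernet = nn.Sequential(nn.Linear(5, 128), nn.Tanh(), nn.Linear(128, n_out))

def target_forward(x: torch.Tensor, flat: torch.Tensor) -> torch.Tensor:
    # Slice the flat vector into the target network's weights and biases.
    w1, b1, w2, b2 = torch.split(flat, [64, 32, 32, 1])
    h = torch.tanh(x @ w1.view(32, 2).T + b1)
    return h @ w2.view(1, 32).T + b2

flat = hypernet(torch.randn(5))                       # weights for one PDE instance
print(target_forward(torch.randn(4, 2), flat).shape)  # torch.Size([4, 1])
```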
Method of Manufactured Solutions (MMS)
The Method of Manufactured Solutions (MMS) is a technique used to verify the correctness and accuracy of PDE solvers (both numerical and, more recently, neural). Instead of trying to find the solution to a given PDE, MMS starts by choosing an arbitrary, sufficiently smooth analytical function (the "manufactured solution"). Then, this chosen function is substituted into the PDE operator to analytically derive the corresponding source term and boundary conditions that would yield as the exact solution. This process generates an exact PDE problem-solution pair, which can then be used as ground truth for training or evaluating a solver.
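The following SymPy sketch illustrates MMS for a Poisson-type operator $\mathcal{L}[u] = u_{xx} + u_{yy}$; the manufactured solution is an arbitrary example, not one from the paper's pipeline.

```python
# MMS sketch: pick a smooth solution, then derive the source term and
# boundary data analytically so the problem-solution pair is exact.
import sympy as sp

x, y = sp.symbols("x y")

# 1) Choose an arbitrary smooth "manufactured" solution u(x, y).
u = sp.sin(sp.pi * x) * sp.cos(sp.pi * y)

# 2) Apply the differential operator to obtain the matching source term.
f = sp.diff(u, x, 2) + sp.diff(u, y, 2)    # f = L[u] = u_xx + u_yy

# 3) Boundary data follow from u itself, e.g. on the edge x = 1:
g_right = u.subs(x, 1)                     # Dirichlet value g = u
h_right = sp.diff(u, x).subs(x, 1)         # Neumann value h = du/dn (n = (1, 0))

print(sp.simplify(f))                      # exact source paired with known u
```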
Swin Transformer
The Swin Transformer is a type of vision transformer architecture that achieves hierarchical representation by using shifted windows. Unlike standard transformers that compute self-attention globally, Swin Transformers compute self-attention within local windows, which makes them more computationally efficient. To allow for cross-window connections and hierarchical feature learning, the windows are shifted between successive layers. In HyPINO, a Swin Transformer is used as the encoder part of the hypernetwork to process grid-based PDE inputs, benefiting from its ability to capture both local and global dependencies efficiently.
Fourier Feature Mapping
Fourier feature mapping (or positional encoding) is a technique used to transform input coordinates into a higher-dimensional space using sinusoidal functions. For an input $\mathbf{v}$, it maps $\mathbf{v}$ to $[\sin(2\pi \mathbf{B}\mathbf{v}), \cos(2\pi \mathbf{B}\mathbf{v})]$, where $\mathbf{B}$ is a matrix of frequency bands. This transformation helps neural networks, especially PINNs and MLPs, better learn high-frequency functions and mitigate spectral bias (the tendency of neural networks to learn low-frequency components before high-frequency ones). It is particularly effective for modeling complex, oscillating PDE solutions.
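A minimal NumPy sketch of such a mapping; the exponentially spaced frequency bands are an assumption for illustration, not the paper's exact choice.

```python
# Fourier feature mapping xi(x) = [sin(2*pi*B x), cos(2*pi*B x), x].
import numpy as np

def fourier_features(x: np.ndarray, num_bands: int = 5) -> np.ndarray:
    """x: (..., 2) coordinates -> (..., 4*num_bands + 2) encoded features."""
    freqs = 2.0 ** np.arange(num_bands)                    # 1, 2, 4, 8, 16
    proj = 2.0 * np.pi * x[..., None, :] * freqs[:, None]  # (..., bands, 2)
    proj = proj.reshape(*x.shape[:-1], -1)
    return np.concatenate([np.sin(proj), np.cos(proj), x], axis=-1)

print(fourier_features(np.zeros((1, 2))).shape)  # (1, 22) for five bands
```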
Optimization Concepts
- AdamW optimizer: A variant of Adam that incorporates weight decay (L2 regularization) directly into the optimization step, which often leads to better generalization performance.
- Huber function: A loss function used in robust regression that is less sensitive to outliers than the squared error loss. It is quadratic for small errors and linear for large errors, providing a balance between L2 and L1 losses.
- Cosine learning rate schedule: A common learning rate scheduling strategy where the learning rate decreases following a cosine curve. It starts high, gradually decreases to a minimum, and can optionally increase again, allowing for stable training initially and fine-tuning later.
3.2. Previous Works
The paper builds upon and differentiates itself from several lines of prior research:
-
Neural Operators (NOs):
- General Operator Learning: Works by [15, 27, 28, 29, 31, 34] established neural operators as a paradigm for learning solution mappings for PDEs. These methods offer fast, mesh-free inference and generalization.
- Foundation Models for PDEs: More recent efforts [17, 19, 35, 43, 52] aim to create large NOs (like Poseidon) that can ingest vast corpora of simulation data or equation specifications for broad cross-task transfer.
- Limitations addressed by HyPINO: The paper points out that most existing NOs are sample inefficient [19] and target narrow PDE families (e.g., fixed equations with varying coefficients [6], boundary conditions [10], or domain shapes [55]). HyPINO addresses this by supporting concurrent variation of multiple operators, geometries, and boundary types.
-
Physics-Informed Neural Operators (PINOs):
- PINNs: The original concept of embedding governing equations into the loss function for PINNs [40, 47] provides self-supervision.
- PINOs: PINO [29] (and related works [3, 12]) extended PINN principles to neural operator architectures by integrating residual losses to train from unlabeled residual samples.
- Limitations addressed by HyPINO: The paper notes that these physics-informed approaches still require careful weighting of supervision terms, often struggle with stability, and can suffer from spectral bias for complex PDEs. HyPINO improves stability and generalization by combining physics-informed losses with supervised MMS data and leveraging Fourier feature mappings in its PINN target.
-
Hypernetworks in PDE Solving (HyperPINNs):
HyperPINNs[8, 24] predictPINNweights for varying coefficients, extending thehypernetworkconcept [13] toPDEs.- Subsequent works further extended this idea to boundary conditions, domain changes, and low-rank weight modulation [5, 10, 14, 36].
- Limitations addressed by HyPINO:
HyPINOmoves beyond these by supporting simultaneous variations of multiple operators, geometries, and boundary types withouttask-specific fine-tuning, which existing models rarely achieve. It also uses aSwin Transformeras itshypernetworkencoder for improved feature extraction from grid-basedPDEinputs.
-
Method of Manufactured Solutions (MMS):
MMS[38] has long been used for numerical solver verification and more recently forPINN evaluation[23] andoperator training[18].- Innovation in HyPINO: While
MMShas been used, prior studies often focus on single equations (e.g.,Poisson).HyPINOsignificantly expands its utility by leveragingMMSformulti-physics operator pre-training, creating a diverse and scalable synthetic dataset for broadPDEfamilies.
Technological Evolution
The field of PDE solving has evolved from traditional mesh-based numerical methods (like Finite Element Method, Finite Difference Method) to data-driven approaches. Initially, neural networks were used as function approximators for PDE solutions. The introduction of PINNs allowed neural networks to incorporate physical laws directly, reducing reliance on labeled data. Concurrently, neural operators emerged to learn the solution operator itself, enabling generalization across an entire family of PDEs rather than just single instances. Within neural operators, approaches moved from MLP-based models to Fourier Neural Operators (FNOs) and DeepONets, which are better suited for function-to-function mappings.
More recently, the concept of hypernetworks (where one neural network generates the parameters for another) has been applied to PINNs, giving rise to HyperPINNs. These allow for parameterization of specific PDE aspects (e.g., coefficients). Simultaneously, the trend towards foundation models in AI has inspired researchers to build large, pre-trained neural operators (like Poseidon) capable of broader PDE generalization.
HyPINO fits within this timeline by integrating several cutting-edge concepts:
- It combines the operator learning paradigm with physics-informed losses (like PINO).
- It uses a sophisticated hypernetwork (like HyperPINN) for PINN weight generation.
- Crucially, it employs a Swin Transformer as its hypernetwork encoder, bringing advanced vision transformer capabilities to PDE input encoding.
- It introduces a novel, scalable synthetic data generation strategy using MMS and random sampling to overcome data limitations and enable multi-physics generalization.
- It adds an iterative refinement procedure as a lightweight ensemble method, further pushing the boundaries of accuracy.
3.3. Differentiation Analysis
Compared to the main methods in related work, HyPINO offers several core differences and innovations:
-
Scope of Generalization (Multi-Physics vs. Narrow Families):
- Previous NOs/HyperPINNs: Typically focus on narrow PDE families where variations are limited to singular aspects (e.g., diffusion coefficients, specific boundary conditions, or domain shapes). Poseidon, while a large foundation model, primarily targets PDEs with varying source terms or coefficients, but less so diverse operator types and geometries simultaneously.
- HyPINO's Innovation: Achieves zero-shot generalization across a broad class of PDEs (linear elliptic, hyperbolic, and parabolic) with simultaneous variations in PDE operators, source terms, geometries (including interior boundaries), and mixed Dirichlet/Neumann boundary conditions. This is a significant leap in flexibility and generality.
-
Data Strategy (Hybrid Supervision & Scalable Synthetic Data):
- Previous NOs: Mostly rely on large amounts of labeled simulation data from high-fidelity solvers, which is expensive and time-consuming to generate.
- PINOs: Use physics-informed losses to alleviate data needs but can suffer from stability issues and spectral bias, making purely physics-based training challenging for complex PDEs.
- HyPINO's Innovation: Employs a novel mixed supervision strategy. Supervised data from MMS provides direct analytical ground truth, ensuring accuracy and avoiding the cost of traditional high-fidelity solvers. Unsupervised physics-informed data, generated by randomly sampling PDE operators, source terms, and boundary conditions, exposes the model to diverse and complex scenarios (e.g., interior boundaries) where MMS might be too restrictive or costly to apply. This hybrid approach mitigates the data bottleneck and addresses PINO's stability issues.
-
Architecture (Swin Transformer Hypernetwork):
- Previous HyperPINNs: While HyperPINNs exist, they often use simpler MLP-based hypernetworks or U-Net-like structures.
- HyPINO's Innovation: Uses a Swin Transformer-based hypernetwork. The Swin Transformer is well suited to processing grid-based PDE inputs thanks to its hierarchical feature learning and shifted window mechanism, allowing it to efficiently capture both local and global dependencies in the PDE parameterization. This choice contributes to better feature extraction and, consequently, more accurate PINN weight generation.
-
Refinement Mechanism (Iterative Ensemble):
- Previous Methods: Typically produce a single solution or rely on simple ensembles of independently trained models. Fine-tuning usually involves re-training parts of the model with backward passes.
- HyPINO's Innovation: Introduces an iterative refinement procedure that generates an ensemble of corrective PINNs. By treating the residual error as a "delta PDE" and feeding it back into the hypernetwork, it produces progressive improvements. This is a lightweight alternative to traditional fine-tuning, as it requires only forward passes for subsequent delta PINNs, making it efficient at test time.
-
Robust Initialization for Fine-tuning:
- Previous Methods: PINNs often rely on random initialization or basic meta-learning (like Reptile), which can lead to slower convergence or higher final errors during fine-tuning.
- HyPINO's Innovation: The HyPINO-generated PINN parameters serve as an excellent prior, leading to faster convergence and lower final errors during task-specific fine-tuning, surpassing both random and Reptile-initialized models.

In essence, HyPINO differentiates itself by achieving a higher degree of multi-physics zero-shot generalization through a clever synthesis of an advanced hypernetwork architecture, a robust hybrid physics-informed and MMS-driven synthetic data generation strategy, and an efficient iterative refinement mechanism.
4. Methodology
4.1. Principles
The core idea of HyPINO is to learn a solution operator that maps the characteristics of a Partial Differential Equation (PDE) to the parameters (weights and biases) of a Physics-Informed Neural Network (PINN) that can solve that specific PDE instance. This is achieved using a hypernetwork architecture. The intuition is that instead of training a separate PINN for each PDE, a single hypernetwork can learn the underlying patterns in PDE formulations and efficiently generate tailored PINNs on demand. This enables zero-shot generalization, meaning the model can solve new PDEs (within the trained distribution) without needing specific fine-tuning.
To make this possible across a broad range of PDEs, HyPINO employs two key principles:
- Flexible PDE Parameterization: A standardized, rich input representation captures the PDE operator, source term, domain geometry, and boundary conditions, allowing the hypernetwork to "understand" diverse PDE problems.
- Hybrid Training Data: A synthetic dataset is created using a mix of the Method of Manufactured Solutions (MMS) (for ground-truth labeled data) and purely physics-informed objectives (for unlabeled data). This provides both accuracy through supervision and broad coverage of complex scenarios through self-supervision, overcoming the data inefficiency of traditional neural operators.

Additionally, the paper introduces an iterative refinement procedure based on residual errors. This principle allows the model to iteratively correct its own predictions, effectively building an ensemble of PINNs at inference time that progressively reduces error without requiring expensive backpropagation for fine-tuning.
The overall objective is to learn the solution operator that maps the tuple $(\mathcal{L}, f, g, h)$ to the solution $u$, where $\mathcal{L}$ is the linear differential operator, $f$ is the source term, $g$ represents Dirichlet boundary conditions, and $h$ represents Neumann boundary conditions. The hypernetwork $\Phi$ realizes this mapping:
$
\left( \mathbf{c},\, F,\, M_g,\, M_h,\, V_g,\, V_h \right) \;\longmapsto\; \theta^{\star} \quad \text{such that} \quad u_{\theta^{\star}} \approx u,
$
where $\mathbf{c}$ denotes the vector of PDE coefficients, $F$ the discretized source function, $M_g$ and $M_h$ the Dirichlet and Neumann boundary condition location grids, $V_g$ and $V_h$ the Dirichlet and Neumann boundary condition value grids, and $u$ the reference solution.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. PDE Parameterization
To represent a diverse set of linear PDEs for the neural network, HyPINO uses a flexible and efficient parameterization. The PDE is defined over a bounded domain $\Omega \subset \mathbb{R}^2$ with boundary $\partial\Omega$. The goal is to find a function $u$ satisfying:
$
\mathcal{L}[u](\mathbf{x}) = f(\mathbf{x}) \quad \text{in } \Omega, \qquad u(\mathbf{x}) = g(\mathbf{x}) \quad \text{on } \partial\Omega_D, \qquad \frac{\partial u}{\partial n}(\mathbf{x}) = h(\mathbf{x}) \quad \text{on } \partial\Omega_N,
$
where $\mathcal{L}$ is a linear differential operator up to second order, $f$ is the source term, and $g$, $h$ are boundary functions.
This PDE instance is parameterized as follows:
- Source Term ($f$): The function $f$ is discretized on a uniform grid over $\Omega$, resulting in a 2D array $F$ representing its values at grid points.
- Boundary Conditions ($g$, $h$): Boundary conditions are parameterized using two 2D grids for each boundary type (Dirichlet and Neumann):
  - A binary mask: Indicates the presence of the boundary at each grid point (1 for points closest to the boundary, 0 elsewhere). This gives $M_g$ for Dirichlet and $M_h$ for Neumann.
  - A value grid: Stores the corresponding boundary values ($g$ for Dirichlet or $h$ for Neumann) at the marked cells, with zeros elsewhere. This gives $V_g$ for Dirichlet and $V_h$ for Neumann.
- Differential Operator ($\mathcal{L}$): The operator is parameterized by its coefficients as a vector $\mathbf{c} \in \mathbb{R}^5$, following [21]:
$
\mathcal{L}[u](\mathbf{x}) = c_1 u + c_2 u_x + c_3 u_y + c_4 u_{xx} + c_5 u_{yy}
$
Here, $u_x = \partial u / \partial x$, $u_y = \partial u / \partial y$, $u_{xx} = \partial^2 u / \partial x^2$, and $u_{yy} = \partial^2 u / \partial y^2$. The coefficients determine the specific type of linear PDE.

The combined input to the hypernetwork is the tuple $(\mathbf{c}, F, M_g, M_h, V_g, V_h)$; a minimal sketch of assembling one such sample follows.
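The grid size (96) and the example PDE below are assumptions for illustration, not the paper's configuration.

```python
# Assemble one PDE input tuple (c, F, M_g, M_h, V_g, V_h) on a uniform grid.
import numpy as np

n = 96
xs = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")

c = np.array([0.0, 0.0, 0.0, 1.0, 1.0])    # coefficients: L[u] = u_xx + u_yy
F = np.sin(np.pi * X) * np.sin(np.pi * Y)  # discretized source term f

# Dirichlet conditions (u = 0) on the whole outer boundary.
M_g = np.zeros((n, n))
M_g[0, :] = M_g[-1, :] = M_g[:, 0] = M_g[:, -1] = 1.0  # boundary location mask
V_g = np.zeros((n, n))                                  # boundary values

# No Neumann boundary in this sample: empty mask and value grid.
M_h = np.zeros((n, n))
V_h = np.zeros((n, n))

sample = (c, F, M_g, M_h, V_g, V_h)        # the hypernetwork input tuple
```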
4.2.2. Neural Operator Architecture
The HyPINO model is based on a hypernetwork that maps the PDE parameterization to the weights of a target PINN.
(A) Input Embeddings
- Coefficient Embedding: The vector of operator coefficients $\mathbf{c}$ is first embedded into a fixed-length representation $z_C \in \mathbb{R}^{d_C}$. This embedding uses a Fourier feature encoder followed by a fully connected layer. The Fourier feature mapping helps avoid spectral bias and mode collapse, especially in physics-informed settings [44, 46].
- Grid Embeddings: Each grid-valued input ($F$, $M_g$, $M_h$, $V_g$, $V_h$) is processed individually:
  - It is passed through a Fourier feature mapping layer, which augments the input with sinusoidal encodings using five exponentially increasing frequency bands. This enhances the network's ability to represent high-frequency content.
  - This is followed by two convolutional layers with a kernel size of 3 and strides of 2.
  - For the boundary location grids $M_g$ and $M_h$, this process yields embeddings $z_D^1, z_D^2$ for Dirichlet boundaries and $z_N^1, z_N^2$ for Neumann boundaries.
  - For the boundary value grids $V_g$ and $V_h$, it yields $z_g$ (Dirichlet values) and $z_h$ (Neumann values).
  - The source term $F$ yields the embedding $z_f$.

The final spatial embedding is constructed by concatenating these processed embeddings (see the sketch below):
$
z_G = \left[ z_D^1 \odot z_g + z_D^2 \,\|\, z_N^1 \odot z_h + z_N^2 \,\|\, z_f \right],
$
where $\odot$ denotes element-wise multiplication and $\|$ denotes concatenation along the channel dimension. This composition applies spatial masking to the boundary value embeddings using the boundary location masks, ensuring that information is injected only at semantically meaningful locations (i.e., where a boundary actually exists).
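A hedged PyTorch sketch of this masked composition; tensor shapes (batch, channels, height, width) are assumed, and the embeddings would come from the Fourier-feature and convolutional stages described above.

```python
# Compose z_G = [z_D1 * z_g + z_D2 || z_N1 * z_h + z_N2 || z_f].
import torch

def compose_grid_embedding(z_D1, z_D2, z_g, z_N1, z_N2, z_h, z_f):
    dirichlet = z_D1 * z_g + z_D2   # inject values only where the Dirichlet mask is active
    neumann = z_N1 * z_h + z_N2     # likewise for Neumann boundaries
    return torch.cat([dirichlet, neumann, z_f], dim=1)  # channel-wise concatenation
```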
(B) Encoding
The grid embedding $z_G$ is then processed by a sequence of Swin Transformer blocks, denoted $\mathcal{SW}_i$. Each Swin Transformer block's output is interleaved with a FiLM (Feature-wise Linear Modulation) layer [39] conditioned on the coefficient embedding $z_C$.
Let $z_G^{(i)}$ be the output of block $i$, with $z_G^{(0)} = z_G$. The modulation is defined as:
$
z^{(i+1)} = \gamma_i(z_C) \odot \mathcal{SW}_i\left(z_G^{(i)}\right) + \beta_i(z_C),
$
where $\gamma_i$ and $\beta_i$ are scale and shift parameters, respectively, generated by small MLPs (Multi-Layer Perceptrons) that take the coefficient embedding as input:
$
\gamma_i(z), \beta_i(z) : \mathbb{R}^{d_C} \rightarrow \mathbb{R}^{C_i}
$
The symbol $\odot$ represents channel-wise scaling broadcast across spatial dimensions. This design ensures that the latent grid features at each stage are adaptively modulated by the global PDE operator coefficients $\mathbf{c}$. Following Swin Transformer U-Net architectures [4, 11], all intermediate latent representations are retained.
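A minimal FiLM sketch in PyTorch; the two-layer MLPs generating $\gamma$ and $\beta$ are placeholders for the paper's "small MLPs".

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Scale and shift grid features conditioned on the coefficient embedding."""

    def __init__(self, d_c: int, channels: int):
        super().__init__()
        self.gamma = nn.Sequential(nn.Linear(d_c, channels), nn.Tanh(), nn.Linear(channels, channels))
        self.beta = nn.Sequential(nn.Linear(d_c, channels), nn.Tanh(), nn.Linear(channels, channels))

    def forward(self, z_grid: torch.Tensor, z_c: torch.Tensor) -> torch.Tensor:
        # z_grid: (B, C, H, W); z_c: (B, d_c). Broadcast scale/shift over space.
        g = self.gamma(z_c)[:, :, None, None]
        b = self.beta(z_c)[:, :, None, None]
        return g * z_grid + b
```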
(C) Pooling
To aggregate the spatial information from the Swin Transformer encoding into a compact latent representation suitable for generating PINN parameters, Multi-Head Attention Pooling [25, 54] is applied.
For each output $z^{(i)}$ from the $i$-th FiLM-modulated Swin block, its spatial dimensions ($H_i \times W_i$) are flattened to create a sequence of tokens $kv_i \in \mathbb{R}^{(H_i W_i) \times C_i}$. These tokens serve as both keys and values in the attention mechanism.
For each layer $i$, a set of trainable query vectors $q_i \in \mathbb{R}^{T \times C_i}$ is defined, where $T$ corresponds to the total number of weight and bias tensors in the target PINN. The pooled representation is then computed via multi-head attention:
$
p_i = \mathrm{MultiHeadAttention}_i(q_i, kv_i, kv_i), \quad p_i \in \mathbb{R}^{T \times C_i}.
$
The MultiHeadAttention function calculates attention scores between queries and keys, then uses these scores to weigh the values.
Finally, the pooled outputs from all $K$ Swin blocks are concatenated along the channel dimension to form a unified latent matrix $p$:
$
p = \left[ p_1 \,\|\, p_2 \,\|\, \cdots \,\|\, p_K \right] \in \mathbb{R}^{T \times \left( \sum_{i=1}^{K} C_i \right)}.
$
This matrix contains one latent vector per target weight or bias tensor. Each row of $p$ is then fed into a dedicated MLP that projects it to the appropriate shape and dimensionality required by the corresponding weight matrix or bias vector of the PINN. A pooling sketch follows.
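The sketch below uses PyTorch's built-in multi-head attention; the head count is illustrative (the channel width must be divisible by it).

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool flattened spatial tokens into T latent vectors, one per target tensor."""

    def __init__(self, channels: int, num_tensors: int, num_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tensors, channels))
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, H*W, C) serve as keys and values; output: (B, T, C).
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)
        return pooled
```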
(D) Target PINN Architecture
The target PINN is an MLP with Fourier feature mapping [44] for its input and multiplicative skip connections [45] within its hidden layers (see the sketch after this list).
- Input Encoding: Given a spatial input $\mathbf{x} \in \mathbb{R}^2$, a non-trainable Fourier feature mapping encodes it as
$
\xi(\mathbf{x}) = \left[ \sin\left(2\pi \mathbf{B}\mathbf{x}\right), \cos\left(2\pi \mathbf{B}\mathbf{x}\right), \mathbf{x} \right] \in \mathbb{R}^{2N+2},
$
where $\mathbf{B} \in \mathbb{R}^{N \times 2}$ is a matrix containing exponentially spaced frequency bands. This encoding helps the PINN represent high-frequency components of the solution and mitigate spectral bias.
- Network Structure with Skip Connections: The encoded input is projected through three parallel transformations to form initial activations for the skip connections:
$
z_0 = \tanh(W_{\mathrm{in}}\xi + b_0), \quad z_u = \tanh(U\xi + b_u), \quad z_v = \tanh(V\xi + b_v),
$
where $W_{\mathrm{in}}, U, V$ are weight matrices and $b_0, b_u, b_v$ are bias vectors, with $d$ being the width of the latent layers. The subsequent hidden layers incorporate multiplicative skip connections:
$
z_{i+1} = z_u \odot \tanh(W_i z_i + b_i) + z_v \odot \left(1 - \tanh(W_i z_i + b_i)\right), \quad i = 0, \dots, T-2,
$
where $W_i \in \mathbb{R}^{d \times d}$ and $b_i \in \mathbb{R}^d$. The tanh activation function is used due to its bounded output range, which provides stability during hypernetwork training. These skip connections enhance gradient propagation and enable dynamic depth modulation by allowing the hypernetwork to effectively mask some layers by generating appropriate weights.
- Output Layer: The final prediction is obtained via a linear transformation of the last hidden layer's output:
$
u_{\theta}(\mathbf{x}) = W_{\mathrm{out}} z_{T-1} + b_{\mathrm{out}}, \quad W_{\mathrm{out}} \in \mathbb{R}^{1 \times d}, \; b_{\mathrm{out}} \in \mathbb{R}.
$

The hypernetwork therefore generates the complete set of parameters for this target PINN:
$
\left\{ W_0, U, V, b_0, b_u, b_v \right\}, \quad \left\{ W_i, b_i \right\}_{i=1}^{T-2}, \quad W_{\mathrm{out}}, b_{\mathrm{out}},
$
where $T$ is the number of layers in the PINN.
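A functional PyTorch sketch of this forward pass under the definitions above; the `params` dictionary stands in for the hypernetwork's output, and `B` is the fixed frequency matrix.

```python
import torch

def pinn_forward(x: torch.Tensor, params: dict, B: torch.Tensor) -> torch.Tensor:
    # x: (N, 2) coordinates; B: (num_bands, 2) non-trainable frequency matrix.
    proj = 2.0 * torch.pi * x @ B.T
    xi = torch.cat([torch.sin(proj), torch.cos(proj), x], dim=-1)

    z = torch.tanh(xi @ params["W_in"].T + params["b_0"])
    z_u = torch.tanh(xi @ params["U"].T + params["b_u"])
    z_v = torch.tanh(xi @ params["V"].T + params["b_v"])

    for W, b in zip(params["W_hidden"], params["b_hidden"]):
        a = torch.tanh(z @ W.T + b)
        z = z_u * a + z_v * (1.0 - a)      # multiplicative skip connection

    return z @ params["W_out"].T + params["b_out"]  # (N, 1) prediction
```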
4.2.3. Data Sampling
A synthetic dataset of PDE instances is created by randomly drawing the differential operator $\mathcal{L}$, domain $\Omega$, boundary data, source term $f$, and, when available, a reference solution $u$. The dataset is a mix of two classes: supervised and unsupervised samples.
(A) Class I: Supervised PDEs
For these samples, an analytical solution $u$ is chosen first using MMS. Then:
- The source term is computed by applying $\mathcal{L}$ to $u$: $f = \mathcal{L}[u]$.
- The boundary conditions $g$ and/or $h$ are derived by evaluating $u$ and its normal derivative on $\partial\Omega$.

These samples provide the analytical solution and its derivatives, which can be used for additional supervised losses during training, alongside the physics-informed loss.
(B) Class II: Unsupervised PDEs
For these samples, the analytical solution is unknown.
- The differential operator $\mathcal{L}$ and domain $\Omega$ are sampled.
- The source term $f$ is set to a spatially constant random value.
- Boundary conditions are sampled subject to constraints that maximize the probability of well-posedness.

These samples rely solely on the physics-informed loss, as reference solutions are unavailable. They are crucial for exposing the model to complexities like interior boundaries, inclusions, and discontinuities.
(C) Sampling Differential Operators
The set of all possible terms in the linear differential operators is $\{u, u_x, u_y, u_{xx}, u_{yy}\}$.
- The number of terms $k$ is sampled from a uniform discrete distribution.
- Then, $k$ terms are randomly selected from this set without replacement.
- Each selected term is assigned a random coefficient.
- The differential operator $\mathcal{L}$ is defined as the sum of the selected terms weighted by their coefficients.
(D) Sampling Analytical Solutions via MMS
Algorithm 1 outlines the procedure for generating random, differentiable functions that serve as analytical solutions for MMS:
- The initial solution u(x, y) is set to 0.
- The number of iterative updates is drawn from a uniform distribution between 6 and 10.
- In each update, a nonlinear function is chosen from a predefined library.
- Coefficients a, b, c, d, e are sampled from specified uniform distributions.
- The new term is then incorporated into the current solution u(x, y) using one of three randomly chosen rules: addition, multiplication, or composition (a rough sketch follows).
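In the Python sketch below, the function library, coefficient ranges, and the exact composition rule are illustrative stand-ins for Algorithm 1's choices.

```python
import random
import sympy as sp

x, y = sp.symbols("x y")
LIBRARY = [sp.sin, sp.cos, sp.exp, sp.tanh]

def sample_manufactured_solution(rng: random.Random) -> sp.Expr:
    u = sp.Integer(0)
    for _ in range(rng.randint(6, 10)):        # number of iterative updates
        phi = rng.choice(LIBRARY)              # nonlinear function from the library
        a, b, c = (rng.uniform(-2.0, 2.0) for _ in range(3))
        term = phi(a * x + b * y + c)
        rule = rng.choice(["add", "mul", "compose"])
        if rule == "add":
            u = u + term
        elif rule == "mul":
            u = u * term
        else:
            u = phi(u + a * x + b * y)         # compose the new function with u
    return u

print(sample_manufactured_solution(random.Random(0)))
```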
(E) Sampling Physical Domains
Domains are generated using randomized Constructive Solid Geometry (CSG) [32].
- The outer boundary is initially the unit square.
- Inner boundaries are formed by subtracting randomly sampled geometric primitives (e.g., disks, polygons, rectangles) from the outer region using CSG operations.
- In time-dependent PDEs, one edge of the square can represent the initial time.
(F) Sampling Boundary Conditions
Boundary conditions (Dirichlet and Neumann) are imposed on $\partial\Omega$.
- Outer Boundary: The type of PDE (elliptic, parabolic, hyperbolic) determines the initial boundary conditions imposed for well-posedness.
  - Elliptic: Dirichlet conditions on the outer boundary.
  - Parabolic: An initial condition on the initial-time boundary, and Dirichlet conditions on the spatial boundaries.
  - Hyperbolic: An initial condition on the initial-time boundary, Dirichlet conditions on the spatial boundaries, and Neumann conditions on the initial-time boundary.
- Inner Boundaries: Each inner boundary component is independently assigned either a Dirichlet or Neumann condition, or both.
- Value Assignment:
  - Class I (Supervised): $g$ and $h$ are directly computed from the known analytical solution $u$.
  - Class II (Unsupervised): The source term is set to zero on the boundary, and boundary values are sampled to be consistent with the operator $\mathcal{L}$; for example, if only first-order terms ($u_x$, $u_y$) appear, constant Dirichlet values are used, and otherwise linear profiles are allowed. This helps maximize the probability of well-posedness despite the unknown ground truth.
4.2.4. Objective Function
For each PDE instance, the HyPINO model outputs the weights $\theta^{\star}$ for a target PINN $u_{\theta^{\star}}$. The total loss function is a weighted sum of several terms:
- Residual Loss ($\mathcal{I}_{\mathrm{R}}$): Measures how well the PINN's prediction satisfies the PDE within the domain $\Omega$:
$
\mathcal{I}_{\mathrm{R}} = \frac{1}{|\Omega|} \sum_{\mathbf{x} \in \Omega} \rho\left( \mathcal{L}[u_{\theta^{\star}}](\mathbf{x}) - f(\mathbf{x}) \right)
$
Here, $|\Omega|$ is the area of the domain (in practice, the number of collocation points), $\mathcal{L}[u_{\theta^{\star}}]$ is the result of applying the differential operator to the PINN's prediction, $f$ is the source term, and $\rho$ is the Huber function [20], used to make the loss less sensitive to outliers.
- Dirichlet Boundary Loss ($\mathcal{I}_{\mathrm{D}}$): Penalizes deviations from the specified Dirichlet boundary conditions on $\partial\Omega_D$:
$
\mathcal{I}_{\mathrm{D}} = \frac{1}{|\partial\Omega_D|} \sum_{\mathbf{x} \in \partial\Omega_D} \rho\left( u_{\theta^{\star}}(\mathbf{x}) - g(\mathbf{x}) \right)
$
Here, $|\partial\Omega_D|$ is the length of the Dirichlet boundary (in practice, the number of collocation points on it), $u_{\theta^{\star}}(\mathbf{x})$ is the PINN's prediction on the boundary, and $g(\mathbf{x})$ is the prescribed Dirichlet value.
- Neumann Boundary Loss ($\mathcal{I}_{\mathrm{N}}$): Penalizes deviations from the specified Neumann boundary conditions on $\partial\Omega_N$:
$
\mathcal{I}_{\mathrm{N}} = \frac{1}{|\partial\Omega_N|} \sum_{\mathbf{x} \in \partial\Omega_N} \rho\left( \nabla u_{\theta^{\star}}(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x}) - h(\mathbf{x}) \right).
$
Here, $|\partial\Omega_N|$ is the length of the Neumann boundary (in practice, the number of collocation points on it), $\nabla u_{\theta^{\star}}(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x})$ is the normal derivative of the PINN's prediction on the boundary, and $h(\mathbf{x})$ is the prescribed Neumann value.
- Sobolev Loss ($\mathcal{I}_{\mathrm{S}}$): Applied only for supervised samples where the analytical solution $u$ is known. It penalizes errors in function values, gradients, and second derivatives, providing stronger supervision:
$
\mathcal{I}_{\mathrm{S}} = \frac{1}{|\Omega|} \sum_{\mathbf{x} \in \Omega} \sum_{k=0}^{2} \lambda_{\mathrm{S}}^{(k)} \rho\left( \nabla^k u_{\theta^{\star}}(\mathbf{x}) - \nabla^k u(\mathbf{x}) \right).
$
Here, $\nabla^0 u = u$, $\nabla^1 u$ is the gradient, and $\nabla^2 u$ represents the second derivatives (e.g., the Hessian). The $\lambda_{\mathrm{S}}^{(k)}$ are weighting coefficients for each order of derivative.
The total loss is a weighted sum of these active terms (a minimal residual-loss sketch follows):
$
\mathcal{I} = \lambda_{\mathrm{R}} \mathcal{I}_{\mathrm{R}} + \lambda_{\mathrm{D}} \mathcal{I}_{\mathrm{D}} + \lambda_{\mathrm{N}} \mathcal{I}_{\mathrm{N}} + \mathcal{I}_{\mathrm{S}},
$
where $\mathcal{I}_{\mathrm{R}}$ is always included. $\mathcal{I}_{\mathrm{D}}$ and $\mathcal{I}_{\mathrm{N}}$ are applied when collocation points fall on the respective boundaries. $\mathcal{I}_{\mathrm{S}}$ is active only for supervised samples.
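As an illustration, here is a hedged PyTorch sketch of the residual loss for the example operator $\mathcal{L}[u] = u_{xx} + u_{yy}$, with derivatives obtained via automatic differentiation.

```python
import torch
import torch.nn.functional as Fnn

def residual_loss(u_theta, xy: torch.Tensor, f_vals: torch.Tensor) -> torch.Tensor:
    """Huber-penalized PDE residual at collocation points xy (N, 2)."""
    xy = xy.requires_grad_(True)
    u = u_theta(xy)                                       # (N, 1) prediction
    grads = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]
    u_xx = torch.autograd.grad(grads[:, 0].sum(), xy, create_graph=True)[0][:, 0]
    u_yy = torch.autograd.grad(grads[:, 1].sum(), xy, create_graph=True)[0][:, 1]
    res = u_xx + u_yy - f_vals                            # L[u_theta] - f
    return Fnn.huber_loss(res, torch.zeros_like(res))     # rho = Huber function
```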
4.2.5. Residual-Driven Iterative Refinement
HyPINO introduces an iterative refinement procedure to improve solution accuracy at inference time, similar to multi-stage neural networks [49]. This forms an ensemble of corrective PINNs.
Given a PDE instance $(\mathcal{L}, f, g, h)$:
- Initial Prediction: The hypernetwork generates the weights for an initial PINN:
$
u^{(0)} := u_{\Phi(\mathcal{L}, f, g, h)}
$
- Compute Residuals: The residual errors of this initial prediction are computed:
  - Residual source term: $r_f^{(0)} = f - \mathcal{L}[u^{(0)}]$
  - Residual Dirichlet conditions: $r_D^{(0)} = g - u^{(0)}$ on $\partial\Omega_D$
  - Residual Neumann conditions: $r_N^{(0)} = h - \partial u^{(0)} / \partial n$ on $\partial\Omega_N$
- Generate Corrective PINN: These residuals are then treated as a "delta PDE" and fed back into the hypernetwork to obtain a corrective PINN $\delta u^{(1)}$:
$
\delta u^{(1)} := u_{\Phi(\mathcal{L},\, r_f^{(0)},\, r_D^{(0)},\, r_N^{(0)})}.
$
Essentially, the hypernetwork learns to solve for the error of the previous prediction.
- Update Solution: The updated solution is the sum of the initial and corrective PINNs:
$
u^{(1)} := u^{(0)} + \delta u^{(1)}
$
- Iterative Process: This process is repeated for $T$ rounds (see the sketch after this passage):
$
u^{(t+1)} := u^{(t)} + \delta u^{(t+1)}, \quad \text{with} \quad \delta u^{(t+1)} := u_{\Phi(\mathcal{L},\, r_f^{(t)},\, r_D^{(t)},\, r_N^{(t)})}.
$
After $T$ rounds, the final solution is the sum of all contributions:
$
u^{(T)} = u^{(0)} + \sum_{t=1}^{T} \delta u^{(t)}
$
This model is denoted as HyPINO$^T$, where $T$ is the number of refinement rounds (meaning there are $T+1$ PINNs in the ensemble). During this process, only the small PINNs (the $u^{(t)}$ and $\delta u^{(t)}$) are differentiated to compute residuals; the hypernetwork remains in inference mode, making the refinement computationally efficient (forward-only passes).
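A forward-only sketch of the refinement loop; `hypernet` and `residuals` are placeholder callables standing in for HyPINO's hypernetwork and residual computation.

```python
from typing import Callable, List

def refine(hypernet: Callable, residuals: Callable, pde_spec, T: int) -> Callable:
    """Build u^(T) = u^(0) + sum_t delta_u^(t) using forward passes only."""
    members: List[Callable] = [hypernet(pde_spec)]           # u^(0)
    for _ in range(T):
        ensemble = lambda x, ms=tuple(members): sum(m(x) for m in ms)
        delta_spec = residuals(ensemble, pde_spec)           # "delta PDE": same L, residual f, g, h
        members.append(hypernet(delta_spec))                 # corrective delta-PINN
    return lambda x: sum(m(x) for m in members)              # ensemble solution
```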
5. Experimental Setup
5.1. Datasets
The study evaluates HyPINO and baseline models on seven standard PDE benchmarks drawn from the PINN literature. All problems are reformulated over the canonical domain $[-1, 1]^2$. Problems originally defined over different ranges are mapped to $[-1, 1]^2$ using affine transformations:
$
\tilde{x} = \frac{2(x - a_x)}{b_x - a_x} - 1, \quad \tilde{y} = \frac{2(y - a_y)}{b_y - a_y} - 1
$
where (x, y) are the original coordinates, $[a_x, b_x]$ and $[a_y, b_y]$ define the original domain ranges, and $(\tilde{x}, \tilde{y})$ are the normalized coordinates.
Here are the benchmark problems:
-
HT - 1D Heat Equation:
-
Equation: , with .
-
Domain: .
-
Boundary Conditions: .
-
Initial Condition: for , , .
-
Analytical Solution: .
-
Adapted from DeepXDE [32]. The following figure (Figure 5 from the original paper) shows the parameterization of the 1D Heat PDE:
This figure is a schematic showing the parameterization of the 1D heat PDE: panels (a) through (f) visualize the Dirichlet boundary mask, g(x), the Neumann boundary mask, h(x), the source term f(x), and the solution u(x) over the spatial domain.
-
-
HZ - 2D Helmholtz Equation:
-
Equation: , with .
-
Boundary Conditions: .
-
Common Instance: , .
-
Adapted from DeepXDE [32]. The following figure (Figure 6 from the original paper) shows the parameterization of the 2D Helmholtz PDE:
This figure shows the parameterization of the 2D Helmholtz equation in six panels: the Dirichlet and Neumann boundary masks, the functions g(x) and h(x), the source term f(x), and the solution u(x), illustrating how each parameter and condition shapes the solution.
-
-
HZ-G - Helmholtz on an Irregular Geometry:
-
Equation: , where .
-
Domain: Unit square with four circular regions removed.
-
Source Term: , with .
-
Boundary Conditions: on (outer rectangular boundary), on (boundaries of interior circles).
-
Circles : Defined by specific center coordinates and radii.
-
Adapted from PINNacle [16]. The following figure (Figure 7 from the original paper) shows the parameterization of the 2D Helmholtz-type (Poisson-Boltzmann) PDE with complex geometry:
This figure shows the parameterization of the 2D Helmholtz-type (Poisson-Boltzmann) PDE with complex geometry. The subplots show the Dirichlet boundary (a), the function g(x) (b), the Neumann boundary (c), the function h(x) (d), the source term f(x) (e), and the solution u(x) (f), illustrating the influence of the complex geometry.
-
-
PS-C - Poisson with Four Circular Interior Boundaries:
-
Equation: , with .
-
Domain: Rectangle with four interior circular exclusions .
-
Boundary Conditions: on , on .
-
Adapted from PINNacle [16]. The following figure (Figure 8 from the original paper) shows the parameterization of the 2D Poisson PDE with circular inner boundaries:
This figure shows the parameterization of the 2D Poisson PDE with circular inner boundaries: (a) the Dirichlet boundary mask, (b) g(x), (c) the Neumann boundary mask, (d) h(x), (e) f(x), and (f) the solution u(x). Together these panels describe the PDE and its boundary conditions.
-
-
PS-L - Poisson on an L-shaped Domain:
-
Equation: , with .
-
Domain: L-shaped region .
-
Boundary Conditions: on .
-
Adapted from DeepXDE [32]. The following figure (Figure 9 from the original paper) shows the parameterization of the 2D Poisson PDE on an L-shaped domain:
This figure shows the parameterization of the 2D Poisson PDE on the L-shaped domain, including the Dirichlet boundary mask, g(x), the Neumann boundary mask, h(x), f(x), and u(x).
-
-
PS-G - Poisson with a Gaussian Vorticity Field:
-
Equation: , with .
-
Boundary Conditions: (homogeneous
Dirichlet) on . -
Source Term: , where , , and .
-
A sample from the dataset introduced in [19]. The following figure (Figure 10 from the original paper) shows the parameterization of the 2D Poisson PDE with Gaussian superposition vorticity field:
This figure shows the parameterization of the 2D Poisson equation with the Gaussian superposition vorticity field: (a) the domain boundary, (b) g(x), (c) the Neumann boundary, (d) h(x), (e) the source term f(x), and (f) the solution u(x), with color indicating the value distribution of each quantity.
-
-
WV - 1D Wave Equation:
-
Equation: , with .
-
Boundary Conditions: .
-
Initial Conditions: , .
-
Analytical Solution: .
-
Adapted from PINNacle [16]. The following figure (Figure 11 from the original paper) shows the parameterization of the 1D Wave PDE:
This figure shows the parameterization of the 1D wave PDE in six panels: (a) the Dirichlet boundary, (b) g(x), (c) the Neumann boundary, (d) h(x), (e) the source term f(x), and (f) the solution u(x), with color gradients reflecting the value of each quantity.
-
5.2. Evaluation Metrics
The performance of the models is primarily evaluated using Mean Squared Error (MSE) and Symmetric Mean Absolute Percentage Error (SMAPE).
-
Mean Squared Error (MSE):
- Conceptual Definition: MSE is a commonly used metric that quantifies the average of the squared errors, i.e., the squared differences between the estimated values and the actual values. Larger errors have a disproportionately larger impact due to squaring, making MSE effective for assessing the absolute accuracy of predictions.
- Mathematical Formula: $ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $
- Symbol Explanation:
  - $N$: The total number of data points (or spatial/temporal points in the solution grid).
  - $y_i$: The actual (ground truth) value of the solution at point $i$.
  - $\hat{y}_i$: The predicted value of the solution at point $i$.
-
Symmetric Mean Absolute Percentage Error (SMAPE):
- Conceptual Definition: SMAPE is a measure of prediction accuracy used in statistics, particularly for forecasting. It extends the Mean Absolute Percentage Error (MAPE) by symmetrizing the denominator, which avoids MAPE's asymmetry (MAPE can be infinite or undefined when the actual value is zero). SMAPE expresses error as a percentage, which is often easier to interpret than MSE for judging relative prediction quality.
- Mathematical Formula: $ \mathrm{SMAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \frac{|A_t - F_t|}{(|A_t| + |F_t|)/2} $
- Symbol Explanation:
  - $N$: The total number of data points.
  - $A_t$: The actual value at time (or point) $t$.
  - $F_t$: The forecast (predicted) value at time (or point) $t$.
  - $|A_t - F_t|$: The absolute error between the actual and predicted values.
  - $(|A_t| + |F_t|)/2$: The average of the absolute actual and predicted values, used for normalization.

Reference implementations of both metrics are sketched below.
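The small `eps` guard against zero denominators in the NumPy sketch below is an added safeguard, not from the paper.

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))

def smape(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0 + eps
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))
```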
5.3. Baselines
The performance of HyPINO is compared against three baseline models:
-
U-Net [41]:
- Description: A convolutional encoder-decoder network known for its effectiveness in image segmentation tasks. In this context, it shares HyPINO's encoder architecture for processing PDE inputs but replaces the hypernetwork decoder with a convolutional decoder that directly outputs a solution grid matching the resolution of the input tensors.
- Training: Trained exclusively on supervised data (PDEs with analytical solutions).
- Parameters: 62 million (62M) trainable parameters.
- Configuration: Trained for 30,000 batches with a batch size of 128 and an initial learning rate of .
- Description: A
-
Poseidon [19]:
- Description: A large, pretrained neural operator model; the paper uses the Poseidon-B checkpoint. It is adapted to this study's PDE parameterization by adjusting the dimensionality of its embedding and lead-time-conditioned layer normalization layers (originally designed for 1D time input) to match the size of the 5-dimensional PDE operator vector.
- Training: Fine-tuned only on supervised data.
- Parameters: Approximately 158 million (158M) parameters.
- Configuration: Trained for 30,000 batches with a batch size of 128 and an initial learning rate of .
- Description: A large,
-
PINO (Physics-Informed Neural Operator) [29]:
- Description: A Fourier neural operator (FNO) [28] architecture, which learns solution operators in Fourier space. It is adapted to accept 5-channel grid inputs (for the PDE components) and to condition on the PDE operator using FiLM layers, similar to HyPINO.
- Training: Trained using the same hybrid supervision and curriculum as HyPINO, including both physics-informed and supervised losses.
- Parameters: 33 million (33M) parameters.
- Configuration: Trained for 30,000 batches with a batch size of 128 and an initial learning rate of .
- Description: A
5.4. Training
The HyPINO model generates weights for a target PINN configured with three hidden layers, each containing 32 hidden units. The entire HyPINO model comprises 77 million (77M) trainable parameters. Training is conducted on 4 NVIDIA RTX 4090 GPUs.
The training process is divided into two distinct phases:
- Phase 1 (Initial 10,000 batches):
- Data: All samples are supervised, meaning they come with known analytical solutions generated via MMS.
- Loss Weights (specific values are given in the paper):
  - $\lambda_{\mathrm{R}}$ (residual loss)
  - $\lambda_{\mathrm{S}}^{(0)}$ (Sobolev loss for function values)
  - $\lambda_{\mathrm{S}}^{(1)}$ (Sobolev loss for first derivatives/gradients)
  - $\lambda_{\mathrm{S}}^{(2)}$ (Sobolev loss for second derivatives)
  - $\lambda_{\mathrm{D}}$ (Dirichlet boundary loss)
  - $\lambda_{\mathrm{N}}$ (Neumann boundary loss)
- Phase 2 (Remaining 20,000 batches):
- Data: Each batch consists of 50% supervised samples (with analytical solutions) and 50% unsupervised samples (without analytical solutions, relying solely on physics-informed losses).
- Loss Weights (specific values are given in the paper):
  - $\lambda_{\mathrm{R}}$ (residual loss)
  - $\lambda_{\mathrm{S}}^{(0)}$ (Sobolev loss for function values)
  - $\lambda_{\mathrm{S}}^{(1)}$ (Sobolev loss for first derivatives/gradients)
  - $\lambda_{\mathrm{S}}^{(2)}$ (Sobolev loss for second derivatives)
  - $\lambda_{\mathrm{D}}$ (Dirichlet boundary loss)
  - $\lambda_{\mathrm{N}}$ (Neumann boundary loss)

The AdamW optimizer is used for training, with a cosine learning rate schedule that decays the learning rate from its initial value down to a minimum. The batch size is fixed at 128 for all experiments.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate HyPINO's strong zero-shot generalization capabilities and the effectiveness of its iterative refinement procedure.
The following are the results from Table 1 of the original paper (each cell reports MSE / SMAPE):
| Model | HT | HZ | HZ-G | PS-C | PS-L | PS-G | WV |
|---|---|---|---|---|---|---|---|
| U-Net | 3.5e-2 / 67 | 3.7e-2 / 68 | 6.9e-2 / 68 | 2.7e-2 / 33 | 3.9e-3 / 112 | 9.2e-1 / 159 | 3.7e-1 / 144 |
| Poseidon | 7.1e-2 / 47 | 3.3e-3 / 28 | 1.3e-1 / 65 | 5.3e-2 / 93 | 3.5e-3 / 111 | 7.2e-1 / 155 | 8.7e-1 / 138 |
| PINO | 1.4e-2 / 38 | 2.0e-2 / 51 | 6.1e-2 / 60 | 1.7e-1 / 65 | 3.3e-3 / 51 | 3.1e-1 / 70 | 3.0e-1 / 149 |
| PINO3 | 1.3e-2 / 47 | 7.2e-3 / 48 | 4.6e-2 / 64 | 2.8e-2 / 63 | 4.6e-3 / 62 | 2.3e-2 / 43 | 3.1e-1 / 127 |
| PINO10 | 3.9e-2 / 78 | 5.1e-3 / 39 | 1.4e-1 / 75 | 1.1e-2 / 48 | 1.0e-3 / 47 | 1.8e-2 / 38 | 8.5e-1 / 139 |
| HyPINO | 2.3e-2 / 42 | 5.7e-3 / 36 | 1.3e-1 / 64 | 5.6e-2 / 86 | 1.7e-4 / 39 | 1.8e-1 / 61 | 2.9e-1 / 150 |
| HyPINO3 | 4.9e-4 / 11 | 2.7e-3 / 31 | 1.6e-2 / 38 | 3.4e-3 / 18 | 1.9e-4 / 36 | 6.6e-3 / 25 | 2.3e-1 / 134 |
| HyPINO10 | 8.0e-5 / 7 | 1.6e-3 / 22 | 1.9e-2 / 40 | 2.3e-3 / 15 | 2.7e-4 / 40 | 5.0e-3 / 24 | 1.2e-1 / 96 |
Analysis of Zero-Shot Performance (HyPINO vs. Baselines):
- HyPINO (without refinement) shows consistently strong results, achieving an average rank of 2.00 across all tasks, outperforming U-Net (3.00), Poseidon (2.86), and PINO (2.14). This is particularly noteworthy given that HyPINO operates in a significantly less structured output space (generating PINN weights) compared to the grid-based outputs of the baselines.
- The results generally support the idea that models trained with physics-informed objectives (PINO, HyPINO) tend to outperform those relying solely on supervised data (U-Net, Poseidon). This highlights the benefit of embedding physical laws for better generalization, especially when the training data does not perfectly cover the target tasks.
- HyPINO achieves the lowest MSE on PS-L (1.7e-4) among all base models, indicating its strong performance on specific geometries.
Analysis of Iterative Refinement:
- The iterative refinement approach (denoted HyPINO3 and HyPINO10 in Table 1) significantly boosts performance.
- After just three refinement iterations (HyPINO3), there are substantial reductions in prediction error across all but one benchmark. For instance, MSE for PS-C and PS-G decreases by over an order of magnitude, and for HT it decreases by almost two orders of magnitude (from 2.3e-2 to 4.9e-4).
- With ten refinement iterations (HyPINO10), the model achieves state-of-the-art performance on five out of seven benchmarks. It outperforms the best baseline models by factors ranging from 2.1 (on HZ against Poseidon) to 173 (on HT against PINO).
- The refinement mechanism is shown to be generic, as PINO3 and PINO10 also demonstrate improved performance, though generally less pronounced than HyPINO's.
- The degradation on PS-L at higher refinement counts (HyPINO10 reaches 2.7e-4, slightly worse than base HyPINO's 1.7e-4) is attributed to its already low initial error and small solution magnitudes, which may push correction terms outside the training distribution.

The paper hypothesizes that iterative refinement works by correcting systematic biases introduced during training on synthetic data. Because these errors are consistent, they can be systematically addressed in subsequent iterations, leading to more effective ensembles than simply combining independent PINNs.
The following figure (Figure 3 from the original paper) illustrates these trends, showing mean squared error and relative error as functions of the number of refinement iterations.
This figure plots HyPINO's mean squared error (MSE, left panel) and relative error (right panel) as functions of the number of refinement iterations; curves of different colors represent different methods.
As seen in the figure, MSE (left panel) generally decreases with more refinement iterations, and the relative error (right panel) consistently decreases for most benchmarks, highlighting the progressive improvement.
The following are the results from Table 2 of the original paper, visually comparing predictions and errors.
This figure shows HyPINO's predictions and errors on the benchmark PDEs. Each row shows the reference, the prediction, and their difference for a given scenario, with the final columns showing results after zero, three, and ten refinement rounds.
This table visually confirms the improvement from iterative refinement, showing how the prediction (Prediction) approaches the ground truth (Reference) and how the difference (Diff) diminishes with more refinement rounds. For the challenging WV benchmark, the model effectively extends the undulating shape further across the time dimension, as shown in the last row of Table 2.
6.2. Ablation Studies / Parameter Analysis
Resolution Invariance Ablation
An ablation study was performed on the Helmholtz benchmark (HZ) to assess the resolution invariance of HyPINO. Although HyPINO's output is a continuous PINN, its input PDE parameterization is discretized on a fixed grid. The study varied the resolution of the input source function from 28 to 448 and then resized it to the model's fixed input resolution for processing; a resampling sketch follows.
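A minimal resampling sketch under stated assumptions: the target resolution (224 here) and the bilinear interpolation mode are guesses for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def resize_source(field: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Resample a 2D source-function grid to the model's fixed input size.

    size=224 and bilinear mode are assumptions for illustration only.
    """
    x = field[None, None]  # (H, W) -> (1, 1, H, W) for interpolate
    x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    return x[0, 0]
```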
The following are the results from Table 3 of the original paper:
| Resolution | 28 | 56 | 96 | 112 | 140 | 168 |
|---|---|---|---|---|---|---|
| SMAPE | 38.04 | 35.78 | 35.91 | 36.00 | 36.05 | 36.05 |

| Resolution | 196 | 224 | 280 | 336 | 392 | 448 |
|---|---|---|---|---|---|---|
| SMAPE | 36.05 | 36.04 | 36.05 | 36.03 | 36.04 | 36.04 |
Analysis: The SMAPE values vary by less than 0.3 between resolutions of 56 and 448, indicating approximate resolution invariance. Performance only starts to deteriorate at very coarse resolutions (at 28, SMAPE rises to 38.04). This suggests that HyPINO can effectively handle inputs at a wide range of resolutions, provided they are not excessively coarse. A sketch of the SMAPE metric follows.
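For reference, one common definition of SMAPE on discretized solution fields is sketched below; the paper's exact normalization may differ, so treat this form as an assumption.

```python
import numpy as np

def smape(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Symmetric mean absolute percentage error, in percent (one common variant)."""
    num = np.abs(pred - ref)
    den = (np.abs(pred) + np.abs(ref)) / 2.0 + eps  # eps avoids division by zero
    return float(100.0 * np.mean(num / den))
```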
Fine-tuning Behavior with Different Initializations
The paper investigates the utility of HyPINO-generated PINN parameters as an initialization for fine-tuning on specific PDE instances. It compares three initialization strategies:

- HyPINO-initialized PINNs.
- Randomly initialized PINNs.
- PINNs initialized via Reptile meta-learning [37] (Reptile was trained on the synthetic dataset with 10,000 outer-loop and 1,000 inner-loop cycles).

PINN fine-tuning is performed over 10,000 steps using the Adam optimizer, with the learning rate decayed via a cosine schedule; a minimal training-loop sketch follows.
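A minimal sketch of this fine-tuning loop, assuming a PyTorch PINN module and a hypothetical `physics_loss` callable that assembles the PDE and boundary residuals; the learning-rate endpoints are placeholders, since the section above elides the exact values.

```python
import torch

def fine_tune(pinn, physics_loss, steps=10_000, lr0=1e-3, lr_min=1e-5):
    """Fine-tune a PINN (e.g., HyPINO-initialized) with Adam + cosine decay.

    lr0 and lr_min are illustrative placeholders, not values from the paper.
    """
    opt = torch.optim.Adam(pinn.parameters(), lr=lr0)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps, eta_min=lr_min)
    for _ in range(steps):
        opt.zero_grad()
        loss = physics_loss(pinn)  # PDE residual + boundary-condition terms
        loss.backward()
        opt.step()
        sched.step()
    return pinn
```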
The following figure (Figure 4 from the original paper) shows convergence results on the 1D Heat Equation (HT) benchmark:
The figure plots MSE convergence on the 1D heat equation (HT) for randomly initialized PINNs (blue), Reptile-initialized PINNs (orange), and HyPINO-initialized PINNs (green); the HyPINO initialization converges markedly faster than the alternatives.
As shown in the figure, HyPINO-initialized PINNs (green line) consistently start with a lower loss and converge faster to a lower final error compared to randomly initialized PINNs (blue line) and Reptile-initialized PINNs (orange line).
The following figures (Figure 13 and Figure 14 from the original paper) show convergence results across all other benchmarks and ensemble comparisons:
The figure (Figure 13) plots PINN convergence during fine-tuning on the benchmark PDE problems, comparing ensemble sizes (a single PINN, an ensemble of 4, and an ensemble of 11, shown in different colors); ensembles built from HyPINO-initialized PINNs after refinement rounds exhibit the best convergence behavior.

The figure (Figure 14) shows the effect of the initialization method on PINN convergence: six subplots plot MSE against iteration count for the different ensemble sizes (single PINN, 4 PINNs, 11 PINNs) on benchmarks such as the 2D Poisson and 1D wave equations, with blue for randomly initialized, orange for Reptile-initialized, and green for HyPINO-initialized PINNs.
Analysis of Fine-tuning Performance:
- HyPINO-initialized PINNs consistently start with lower loss and converge to lower final error on 4 out of 7 benchmarks. They perform on par with the baselines on two benchmarks and underperform on only one.
- Quantitatively, a randomly initialized PINN requires an average of 1,068 steps to reach the initial MSE of a HyPINO-initialized model.
- For ensembles, matching the initial MSE of the HyPINO3 and HyPINO10 ensembles (of sizes 4 and 11) requires an average of 1,617 and 1,772 steps, respectively, for randomly initialized ensembles; a sketch of this steps-to-match metric follows this list.
- Reptile-initialized PINNs converge rapidly at first (within the first 1,000 steps) due to their meta-training configuration, but tend to plateau earlier and converge to higher final errors than HyPINO initializations.

These findings strongly suggest that HyPINO offers a robust initialization strategy for training PINNs, leading to faster and more accurate fine-tuning.
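The steps-to-match numbers above can be computed from a recorded loss curve with a helper like the one below (a sketch of the assumed metric, not the authors' code).

```python
def steps_to_match(loss_curve, target_mse):
    """First step at which a training curve reaches a target MSE, else None."""
    for step, mse in enumerate(loss_curve):
        if mse <= target_mse:
            return step
    return None  # never reached within the recorded curve (cf. N/A in Table 4)
```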
L-BFGS Fine-Tuning with Different Initializations
To further validate the effectiveness of HyPINO initializations, additional fine-tuning experiments were conducted using the L-BFGS optimizer, a quasi-Newton method that exploits second-order (curvature) information.
The following are the results from Table 4 of the original paper, showing iterations required to match the initial MSE of a HyPINO-initialized PINN with L-BFGS:
| Init | HT | HZ | HZ-G | PS-C | PS-L | PS-G | WV |
|---|---|---|---|---|---|---|---|
| Random Init | 4 | 20 | N/A | 36 | 34 | 11 | 35 |
| Reptile Init | 4 | 22 | 211 | 22 | 65 | 9 | 27 |
Analysis: The baselines need many L-BFGS iterations just to match HyPINO's starting accuracy. For instance, Random Init takes 36 steps on PS-C and 34 on PS-L to reach HyPINO's initial error, while Reptile Init takes 22 and 65 steps, respectively. On HZ-G, Random Init never reaches HyPINO's initial accuracy, and Reptile Init needs 211 steps. This confirms that HyPINO provides a significantly better starting point for fine-tuning, even with second-order optimizers; a minimal L-BFGS sketch follows.
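A minimal sketch of the corresponding L-BFGS fine-tuning using PyTorch's closure-based interface; the history size and line-search choice are illustrative, not settings reported in the paper.

```python
import torch

def lbfgs_fine_tune(pinn, physics_loss, max_iter=200):
    """Fine-tune a PINN with L-BFGS (quasi-Newton, closure-based in PyTorch)."""
    opt = torch.optim.LBFGS(
        pinn.parameters(),
        max_iter=max_iter,
        history_size=50,
        line_search_fn="strong_wolfe",  # illustrative; PyTorch's default is None
    )

    def closure():
        opt.zero_grad()
        loss = physics_loss(pinn)  # PDE residual + boundary-condition terms
        loss.backward()
        return loss

    opt.step(closure)  # runs up to max_iter L-BFGS iterations internally
    return pinn
```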
The following are the results from Table 5 of the original paper, showing final MSE after L-BFGS fine-tuning:
| Init | HT | HZ | HZ-G | PS-C | PS-L | PS-G | WV |
|---|---|---|---|---|---|---|---|
| Random Init | 2.93e-9 | 1.15e-7 | 2.89e-1 | 3.18e-4 | 7.05e-5 | 5.69e-4 | 2.68e-2 |
| Reptile Init | 2.69e-9 | 2.18e-7 | 3.55e-2 | 9.34e-4 | 8.66e-5 | 5.68e-4 | 3.80e-4 |
| HyPINO Init | 1.62e-9 | 1.52e-7 | 1.74e-2 | 8.19e-5 | 6.87e-5 | 5.69e-4 | 1.94e-2 |
Analysis: HyPINO initializations continue to be effective with L-BFGS. HyPINO achieves the lowest final MSE on four benchmarks (HT, HZ-G, PS-C, PS-L) and is competitive on PS-G. Only on WV does Reptile achieve the best result, and on HZ, Random Init slightly outperforms HyPINO. These differences matter given the high per-iteration cost of L-BFGS, underscoring that a good initialization is crucial for efficient and effective optimization.
6.3. Additional Visualizations
The following figure (Figure 12 from the original paper) shows the visual progression of iterative refinement across different samples.
The figure illustrates the visual progression of iterative refinement. Each row shows (a) the HyPINO prediction, (b) the first correction, (c) the second correction, (d) the final prediction, and (e) the ground truth, tracing the step-by-step improvement for different samples.
This visualization provides qualitative evidence for the effectiveness of the iterative refinement procedure, showing how the successive corrections (panels (b) and (c)) progressively improve the initial HyPINO prediction (panel (a)) until the final prediction (panel (d)) closely matches the ground truth (panel (e)).
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper introduces HyPINO, a novel multi-physics neural operator that excels in zero-shot generalization across a broad spectrum of Partial Differential Equations (PDEs). Its key innovation lies in combining a Swin Transformer-based hypernetwork with a mixed supervision strategy. This strategy utilizes both labeled data derived from the Method of Manufactured Solutions (MMS) and unlabeled samples optimized through physics-informed objectives. HyPINO can handle linear elliptic, hyperbolic, and parabolic PDEs in two dimensions, accommodating variations in operators, source terms, complex geometries (including interior boundaries), and mixed boundary conditions. The model consistently outperforms existing baselines like U-Nets, Poseidon, and PINO in zero-shot accuracy on diverse benchmarks.
Furthermore, HyPINO introduces an iterative refinement procedure that significantly enhances prediction accuracy by treating residual errors as "delta PDEs" and generating corrective PINNs. This ensemble-based approach drastically reduces the L2 loss (a more than 100x improvement in the best case) while retaining forward-only inference. Finally, HyPINO-generated PINN parameters serve as superior initializations for fine-tuning, leading to faster convergence and lower final errors compared to random or Reptile-meta-learned initializations.
7.2. Limitations & Future Work
The authors acknowledge several limitations of the current HyPINO implementation:

- Scope of PDEs: It is currently restricted to linear 2D PDEs with spatially uniform coefficients. This narrows the class of PDEs it can address, as many real-world phenomena involve nonlinear effects, spatially varying properties, or higher dimensions.
- Increased Complexity: Extending the framework to more complex PDEs will likely necessitate increased model capacity, either by scaling the architecture or by improving the target networks' parameter generation process.

Based on these limitations, the authors suggest several directions for future work:

- Increased Input Dimensionality: Extending HyPINO to higher-dimensional PDEs.
- Spatially Varying Coefficients: Incorporating PDEs whose coefficients are not uniform across the domain.
- Nonlinear PDEs: Adapting the framework to handle nonlinear PDEs, which are significantly more challenging.
- Coupled Systems: Modeling coupled systems of PDEs, which are common in multi-physics scenarios.

They believe some extensions might be achievable with modest modifications to the data generation, input encoding, or training processes, while others will require more substantial architectural enhancements.
7.3. Personal Insights & Critique
HyPINO represents a significant step forward in the quest for generalized PDE solvers using neural operators. The clever combination of a Swin Transformer hypernetwork, MMS-based supervised data, and physics-informed self-supervision for a diverse synthetic dataset is particularly insightful. This hybrid data strategy is crucial for overcoming the data inefficiency of pure neural operators and the stability issues of pure physics-informed approaches. The ability to generalize zero-shot across different PDE types, geometries, and boundary conditions simultaneously is a major achievement, pushing beyond the narrower scope of many prior works.
The iterative refinement procedure is an elegant solution to improve accuracy without incurring the heavy computational cost of backpropagation during inference or extensive fine-tuning. This forward-only ensemble approach is practical and demonstrates a clear path to boosting performance at deployment. The robustness of HyPINO's initialization for fine-tuning also highlights its potential as a foundational model component.
However, some aspects warrant further consideration:
- Scalability to True Nonlinearity and High Dimensions: While the paper suggests extensibility, nonlinear PDEs and high-dimensional problems introduce significant challenges (e.g., stiffness, turbulence, complex interactions) that may not be easily overcome by simply scaling the architecture or refining data generation. The spectral bias and mode collapse issues, though mitigated by Fourier features, can re-emerge in more complex scenarios.
- Complexity of MMS Data Generation: While MMS provides ground truth, generating truly representative and diverse analytical solutions for complex multi-physics, nonlinear PDEs in higher dimensions could become prohibitively complex. The current Algorithm 1 for MMS is already quite intricate for 2D linear PDEs.
- Interpretability of Hypernetwork Outputs: The hypernetwork outputs a large set of PINN weights. While effective, understanding why certain PINN parameters are generated for a given PDE remains largely a black box. Better interpretability could lead to more robust designs and easier debugging.
- Computational Cost of Hypernetwork Training: Training a 77M-parameter hypernetwork on 4 RTX 4090 GPUs for 30,000 batches is substantial. Scaling this to more complex PDEs might require even larger models and more resources, which could limit accessibility for researchers without high-end computing.
- Generalization to Truly Unseen Physics: The current benchmarks, while diverse, are drawn from the PINN literature. It would be interesting to see how HyPINO performs on PDEs with physical phenomena qualitatively different from those seen during training, rather than just quantitative variations within known types.

Overall, HyPINO offers a practical and powerful framework for advancing neural operators toward multi-physics and foundation-model capabilities. Its strengths in zero-shot generalization and efficient refinement suggest a promising direction for AI-driven scientific computing, providing a solid foundation for tackling increasingly complex PDE challenges.