Restora-Flow: Mask-Guided Image Restoration with Flow Matching
TL;DR Summary
Restora-Flow is a novel training-free image restoration method that utilizes flow matching guided by a degradation mask, incorporating trajectory correction. It shows superior perceptual quality and processing time compared to existing diffusion and flow matching methods across various restoration tasks (inpainting, super-resolution, and denoising) on both natural and medical datasets.
Abstract
Flow matching has emerged as a promising generative approach that addresses the lengthy sampling times associated with state-of-the-art diffusion models and enables a more flexible trajectory design, while maintaining high-quality image generation. This capability makes it suitable as a generative prior for image restoration tasks. Although current methods leveraging flow models have shown promising results in restoration, some still suffer from long processing times or produce over-smoothed results. To address these challenges, we introduce Restora-Flow, a training-free method that guides flow matching sampling by a degradation mask and incorporates a trajectory correction mechanism to enforce consistency with degraded inputs. We evaluate our approach on both natural and medical datasets across several image restoration tasks involving a mask-based degradation, i.e., inpainting, super-resolution and denoising. We show superior perceptual quality and processing time compared to diffusion and flow matching-based reference methods.
Mind Map
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
The central topic of this paper is Restora-Flow, a novel method for mask-guided image restoration using Flow Matching.
1.2. Authors
The authors of the paper are:
- Arnela Hadzic
- Franz Thaler
- Lea Bogensperger
- Simon Johannes Joham
- Martin Urschler

Their affiliations indicate a strong presence in medical informatics and related fields:
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria (Arnela Hadzic, Franz Thaler, Simon Johannes Joham, Martin Urschler)
- Division of Medical Physics and Biophysics, Medical University of Graz, Graz, Austria (Franz Thaler)
- Graz University of Technology, Graz, Austria (Franz Thaler) - Note: this affiliation is garbled in the source text and appears to refer to an institute at Graz University of Technology.
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland (Lea Bogensperger)
1.3. Journal/Conference
The paper was posted (UTC) on 2025-11-25T10:22:26.000Z. While the specific journal or conference is not explicitly mentioned in the provided text, the arXiv link indicates it is currently a preprint, of the kind typically submitted to major conferences or journals in machine learning or computer vision (e.g., NeurIPS, ICLR, CVPR, ECCV, MICCAI) for peer review and eventual publication. Given the late-2025 posting date, it is likely targeting a 2026 conference cycle. These venues are highly reputable and influential in the field of generative models and image processing.
1.4. Publication Year
The publication year is 2025.
1.5. Abstract
The paper introduces Restora-Flow, a training-free method for mask-guided image restoration using Flow Matching. It leverages Flow Matching as a generative prior, which is a promising approach for high-quality image generation with faster sampling times and more flexible trajectory design compared to diffusion models. The core idea of Restora-Flow is to guide Flow Matching sampling using a degradation mask and incorporate a trajectory correction mechanism to ensure consistency with degraded inputs. The method is evaluated on both natural and medical datasets across various mask-based image restoration tasks, including inpainting, super-resolution, and denoising. The results demonstrate superior perceptual quality and processing time compared to existing diffusion and Flow Matching-based reference methods.
1.6. Original Source Link
The original source link is: https://arxiv.org/abs/2511.20152.
The PDF link is: https://arxiv.org/pdf/2511.20152v2.pdf.
This paper is currently a preprint on arXiv, indicating it is publicly available but may not have undergone formal peer review or official publication yet.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper addresses is image restoration, which involves recovering an original image from its degraded observation. Many critical restoration tasks, such as denoising, super-resolution, and inpainting, can be framed as inverse problems involving mask-based degradation. The objective is to produce a restored image that not only possesses high visual quality but also maintains fidelity to the degraded input.
Prior research has widely utilized diffusion models as powerful generative priors for these tasks, demonstrating success in guiding restoration. However, a significant limitation of diffusion models is their long sampling times due to highly curved sampling trajectories and challenges posed by intermediate noisy steps. More recently, Flow Matching (FM) has emerged as an alternative generative modeling approach, known for its significantly straighter trajectories and faster training and sampling times while maintaining high-quality image generation. Although existing flow-based methods show promise in restoration, they still face challenges such as relatively long processing times, over-smoothed results, or the introduction of artifacts.
The paper's entry point and innovative idea is to leverage the advantages of Flow Matching to overcome these limitations in image restoration. Specifically, it aims to develop a training-free method that combines mask-guided sampling with a novel trajectory correction mechanism to enforce consistency with the degraded input, thereby achieving both high restoration quality and fast processing times.
2.2. Main Contributions / Findings
The paper's primary contributions are:
- Introduction of Restora-Flow: a training-free algorithm specifically designed for mask-based inverse problems. It utilizes unconditional flow prior models and mask-guided fusion during Flow Matching sampling.
- Novel Trajectory Correction Mechanism: to enhance the fidelity of the restoration process, a new correction mechanism is introduced that guides the flow trajectory towards better alignment with observed data. This mechanism refines samples by extrapolating towards the clean image and reintroducing noise, improving consistency between restored and known regions.
- Comprehensive Evaluation and State-of-the-Art Performance: the method is rigorously evaluated on both computer vision (natural) and medical datasets across various settings for inpainting, super-resolution, and denoising tasks. The findings demonstrate that Restora-Flow achieves superior perceptual quality (lower LPIPS) and faster processing times compared to related diffusion and Flow Matching-based approaches. Notably, it often achieves state-of-the-art results when jointly considering reconstruction quality and processing time.
- Simplicity in Hyperparameter Tuning: Restora-Flow requires only one hyperparameter to optimize (the number of ODE steps), as the number of corrections (C = 1) is fixed across all experiments and datasets, simplifying its application.

The key conclusions are that Restora-Flow effectively addresses the trade-off between restoration quality and processing time in mask-based image restoration. By intelligently guiding Flow Matching with a mask and a trajectory correction, it overcomes the limitations of previous generative prior-based methods, offering a robust, fast, and high-quality solution.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
3.1.1. Image Restoration
Image restoration is the process of estimating an original, clean image $x$ from a degraded observation $z$. The degradation can be due to various factors like noise, blur, missing pixels, or low resolution. This process is often formulated as an inverse problem, where the observed degraded image is related to the original image by a degradation operator $H$ and additive noise $\epsilon$: $z = Hx + \epsilon$. The goal is to "invert" this process to recover $x$.
3.1.2. Generative Models
Generative models are a class of artificial intelligence models that learn the underlying distribution of a dataset and can then generate new samples that resemble the training data. For image restoration, they are used as "priors" to guide the restoration process, ensuring that the recovered images are realistic and consistent with the learned data distribution. Two prominent types discussed in this paper are Diffusion Models and Flow Matching.
3.1.3. Diffusion Models
Diffusion Models (also known as Denoising Diffusion Probabilistic Models or DDPMs) are a class of generative models that work by gradually adding noise to data (forward diffusion process) and then learning to reverse this process (reverse denoising process) to generate new data. They define a sequence of latent variables $x_0, \ldots, x_T$, where $x_0$ is the original data and $x_T$ is pure noise. The model learns to predict the noise added at each step to reverse the diffusion process and generate a clean image from noise.
While powerful, a significant limitation of diffusion models is their long sampling time because they typically require many sequential steps (e.g., hundreds or thousands) to denoise from pure noise back to a clear image, leading to highly curved sampling trajectories.
3.1.4. Flow Matching (FM)
Flow Matching (FM) is a more recent generative modeling approach that addresses some limitations of diffusion models. Instead of defining discrete noise steps, Flow Matching learns a continuous-time vector field (or velocity field) that transports samples from a simple base distribution (e.g., Gaussian noise) at time $t = 0$ to the complex target data distribution at time $t = 1$. This transformation is governed by an Ordinary Differential Equation (ODE).
The key advantage of Flow Matching is that it learns significantly straighter trajectories between the noise and data distributions, enabling faster and more efficient sampling. This makes it a promising alternative to diffusion models for tasks requiring high-quality generation with reduced computational cost.
3.1.5. Ordinary Differential Equation (ODE)
An Ordinary Differential Equation (ODE) is a mathematical equation that relates a function with its derivatives. In the context of Flow Matching, the ODE describes how a sample evolves over continuous time under the influence of the learned velocity field $v_{\theta,t}$. Integrating this ODE allows samples to be generated from noise to data.
3.1.6. Mask-Based Degradation
Mask-based degradation refers to image degradation scenarios where a binary mask explicitly indicates known (unmasked) and unknown (masked) regions of an image.
- Inpainting: Filling in missing or corrupted parts of an image (unknown regions indicated by the mask).
- Super-resolution: Enhancing the resolution of a low-resolution image. This can be viewed as a mask-based problem if the degradation operator involves downsampling, effectively "masking" the high-frequency information.
- Denoising: Removing noise from an image. While not always explicitly mask-based, the paper proposes a time-dependent mask for denoising to control the influence of the noisy input.
- Occlusion Removal: A specific type of inpainting where objects occlude (cover) parts of an image, and the task is to remove these occlusions and fill in the background.
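To make these mask-based degradations concrete, the following NumPy sketch builds simple versions of the corresponding operators on a dummy image. The helper names (`box_mask`, `random_mask`, `degrade`) are illustrative stand-ins, not code from the paper.

```python
import numpy as np

def box_mask(h, w, box):
    """Binary mask: 1 = known pixel, 0 = missing (to be inpainted)."""
    m = np.ones((h, w), dtype=np.float32)
    top, left = (h - box) // 2, (w - box) // 2
    m[top:top + box, left:left + box] = 0.0  # centered square is unknown
    return m

def random_mask(h, w, drop_ratio, rng):
    """Randomly mark a fraction of pixels as unknown."""
    return (rng.random((h, w)) >= drop_ratio).astype(np.float32)

def degrade(x, mask=None, factor=None, sigma=0.0, rng=None):
    """Apply a simple mask / downsampling / noise degradation z = Hx + noise."""
    z = x.copy()
    if mask is not None:                 # inpainting: zero out unknown pixels
        z = z * mask
    if factor is not None:               # naive super-resolution operator: subsample
        z = z[::factor, ::factor]
    if sigma > 0 and rng is not None:    # denoising: additive Gaussian noise
        z = z + sigma * rng.standard_normal(z.shape)
    return z

rng = np.random.default_rng(0)
x = rng.random((64, 64)).astype(np.float32)       # stand-in clean image
z_inpaint = degrade(x, mask=box_mask(64, 64, 20))
z_random  = degrade(x, mask=random_mask(64, 64, 0.7, rng))
z_sr      = degrade(x, factor=2)
z_noisy   = degrade(x, sigma=0.2, rng=rng)
print(z_inpaint.shape, z_random.shape, z_sr.shape, z_noisy.shape)
```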
3.1.7. Maximum A Posteriori (MAP) Estimation
Maximum A Posteriori (MAP) estimation is a statistical method for estimating an unknown quantity (e.g., the original image $x$) based on observed data (e.g., the degraded image $z$) and prior knowledge about the quantity. It seeks to find the value of $x$ that maximizes the posterior probability $P(x \mid z)$, which can be expressed as:
$ \hat{x} = \arg \max_x P(x \mid z) = \arg \max_x P(z \mid x) P(x) $
In image restoration, this is often formulated as minimizing a cost function:
$ \hat{x} = \arg \min_x \mathcal{D}(Hx, z) + \mathcal{R}_\theta(x) $
Here, $\mathcal{D}(Hx, z)$ is the data fidelity term, which measures how well the restored image $Hx$ (after applying the degradation operator $H$) matches the observed degraded image $z$. $\mathcal{R}_\theta(x)$ is the prior term (or regularization term), which encodes learned knowledge about the properties of natural (or medical) images, parameterized by $\theta$.
3.1.8. Perception-Distortion Trade-off
The perception-distortion trade-off [3] refers to a fundamental challenge in image quality assessment and generative modeling. It states that it is generally impossible to simultaneously optimize for both distortion metrics (which measure pixel-wise accuracy, like PSNR and SSIM) and perceptual metrics (which measure visual realism and human-like perception, like LPIPS). Models that produce highly realistic, perceptually pleasing images might have higher pixel-wise differences from the ground truth, while models that minimize pixel-wise errors might appear overly smooth or lack realistic textures. The paper acknowledges this trade-off by using both types of metrics.
3.2. Previous Works
3.2.1. Direct Mapping Methods
Traditional deep learning approaches for image restoration often learn a direct mapping from degraded images to clean images by minimizing a reconstruction loss. Examples include SRCNN for super-resolution [9] or DnCNN for denoising [38]. These methods typically require large datasets of paired degraded and clean images and need to be retrained for each new task (e.g., a different degradation level or type).
3.2.2. Diffusion-Based Prior Methods
To overcome the limitations of paired datasets and task-specific retraining, deep generative priors, particularly diffusion models, have been widely adopted for inverse problems. These methods guide the generative process to ensure consistency with degraded observations.
-
- DDRM (Denoising Diffusion Restoration Models) [18]: Tackles linear inverse problems by employing a singular value decomposition of the degradation operator $H$.
- DDNM (Denoising Diffusion Null-Space Model) [34]: Addresses inverse problems in a zero-shot manner by utilizing range-null space decomposition as a guidance function, meaning it can solve problems without task-specific training.
- RePaint [23]: Focuses on inpainting tasks by using unmasked regions to guide the diffusion process. It iteratively resamples unknown regions while keeping known regions fixed or slightly perturbed.
- ΠGDM (Pseudoinverse-Guided Diffusion Models) [29]: Incorporates a vector-Jacobian product as additional guidance to ensure consistency between denoising results and degraded measurements.
- RED-Diff [24]: Formulates image restoration as an optimization problem, minimizing a measurement consistency loss while applying score-matching regularization from the diffusion model.

A common limitation of many diffusion-based methods is the long sampling time required to generate high-quality images.
3.2.3. Flow Matching-Based Prior Methods
More recently, Flow Matching has gained attention as a prior for image restoration due to its potential for faster sampling.
- OT-ODE (Optimal Transport ODE) [26]: Incorporates a gradient correction term (similar to ΠGDM) to guide the flow-based generation process. It demonstrated superior perceptual quality compared to diffusion paths.
- D-Flow [2]: Formulates the restoration as a source point optimization problem, minimizing a cost function associated with the initial point in the Flow Matching framework. However, this requires backpropagation through an ODE solver, leading to relatively long sampling times (reported as 5 to 15 minutes per sample).
- Flow-Priors [40]: Decomposes the flow's trajectory into several local objectives and uses Tweedie's formula [11] to sequentially optimize these objectives through gradient steps. This results in reduced sampling time compared to D-Flow.
- PnP-Flow (Plug-and-Play Flow) [25]: Combines Plug-and-Play (PnP) methods [33] with Flow Matching without requiring backpropagation. PnP methods iteratively alternate between a data fidelity step (e.g., enforcing consistency with observations) and a denoising step (using a pre-trained denoiser, in this case the flow model). While faster, PnP-Flow often tends to produce over-smoothed results.
3.3. Technological Evolution
The field of image restoration has evolved from traditional signal processing techniques (e.g., Wiener filtering) to optimization-based methods with handcrafted priors, and then to data-driven deep learning methods. Initially, deep learning focused on direct mapping using Convolutional Neural Networks (CNNs). The next major leap involved leveraging deep generative priors, first with Generative Adversarial Networks (GANs) and then more prominently with Diffusion Models. Diffusion Models set a new standard for image generation quality but were computationally expensive during sampling. Flow Matching emerged as a successor, aiming to achieve comparable or superior generation quality with significantly faster sampling by learning straighter trajectories.
Restora-Flow fits into this timeline by building upon the latest advancements in Flow Matching. It further refines Flow Matching for inverse problems by introducing a training-free mask-guided sampling approach combined with a trajectory correction mechanism, specifically targeting the limitations of existing flow-based methods (speed and over-smoothing) while maintaining high perceptual quality.
3.4. Differentiation Analysis
Compared to the main methods in related work, Restora-Flow differentiates itself primarily through:
- Training-Free Nature: Similar to some diffusion and flow-based methods, Restora-Flow operates training-free by using an unconditional flow prior model. This means the generative model does not need to be fine-tuned for specific restoration tasks, making it versatile.
- Mask-Guided Fusion: It explicitly incorporates mask-guidance (inspired by RePaint for diffusion models) into the Flow Matching sampling process. This ensures that known regions from the degraded input are preserved.
- Novel Trajectory Correction Mechanism: This is a key innovation. Unlike OT-ODE, which uses gradient correction, or D-Flow and Flow-Priors, which rely on optimization and Tweedie's formula, Restora-Flow uses a simple yet effective forward extrapolation and noise reintroduction step. This trajectory correction explicitly projects the fused sample closer to the data manifold and re-aligns it with the generative trajectory, preventing misalignments and artifacts. It counters the divergence from the learned data distribution that can occur with naive mask fusion in Flow Matching.
- Speed and Perceptual Quality: Restora-Flow achieves superior perceptual quality (lower LPIPS) while also demonstrating significantly faster processing times compared to diffusion models (RePaint, DDNM+) and most existing flow-based methods (D-Flow, Flow-Priors). Even PnP-Flow, which is fast, tends to produce over-smoothed results, a problem Restora-Flow aims to avoid. OT-ODE can be fast in some cases but often yields artifacts.
- Simplicity: It introduces only one hyperparameter (the number of ODE steps), with the number of correction steps fixed at C = 1, making it simpler to deploy than methods requiring extensive hyperparameter tuning for different tasks and datasets.
4. Methodology
4.1. Principles
The core idea behind Restora-Flow is to leverage the speed and high-quality generation capabilities of Flow Matching models for mask-based image restoration tasks. The method operates training-free, meaning it uses an already trained unconditional Flow Matching model as a prior. The two main principles are:
- Mask-Guided Sampling: During the continuous generation process of Flow Matching (which normally generates an image from pure noise), the algorithm guides the sample at each time step using the known, unmasked regions of the degraded input. This ensures fidelity to the observed data.
- Trajectory Correction: A novel mechanism is introduced to address the misalignment that can occur when fusing known data with the evolving sample. This correction actively refines the generative path by projecting the sample towards the clean image manifold and then reintroducing appropriate noise, thereby maintaining consistency with the learned flow trajectory and preventing artifacts or deviations from realism. The goal is to achieve both high reconstruction quality and fast processing times.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Flow Matching (FM) Background
Flow Matching learns a continuous velocity field that defines a deterministic transformation from a simple base distribution to a complex target data distribution.
The method aims to learn a velocity field $v_{\theta,t}$ of the probability flow $\Psi_t$. This field governs how a sample evolves from a simple base distribution (e.g., the standard normal distribution $\mathcal{N}(0, I)$) at $t = 0$ to the target data distribution at $t = 1$. The training objective for this velocity field is given by the conditional Flow Matching loss:
$ \operatorname*{min}_{\theta} \mathbb{E}_{t, x_1, x_0} \Big[ \frac{1}{2} \Big\lVert v_{\theta,t} \big( \Psi_t(x_0) \big) - \big( x_1 - x_0 \big) \Big\rVert^2 \Big] $
Where:
- $\theta$: The parameters of the neural network approximating the velocity field.
- $\mathbb{E}_{t, x_1, x_0}$: Expectation over the random variables $t$, $x_1$, and $x_0$.
- $t$: A continuous time variable, sampled uniformly from $[0, 1]$.
- $x_1$: A sample from the target data distribution.
- $x_0$: A sample from a simple base distribution, typically standard normal noise $\mathcal{N}(0, I)$.
- $v_{\theta,t}(\Psi_t(x_0))$: The learned velocity field predicted by the neural network at time $t$ for the sample $\Psi_t(x_0)$.
- $x_1 - x_0$: The true velocity vector that maps $x_0$ to $x_1$ along a straight-line path.
- $\Psi_t(x_0) = t x_1 + (1 - t) x_0$: The conditional flow (linear interpolation) between $x_0$ and $x_1$ at time $t$. This is a straight-line path.

The loss function encourages the learned velocity field to predict the direction of change ($x_1 - x_0$) required to move from the interpolated point towards the target data point $x_1$. This allows for simulation-free training, meaning the ODE does not need to be solved during training.
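To illustrate the simulation-free objective above, here is a minimal PyTorch sketch of one conditional Flow Matching training step on a toy MLP; the network, data, and function names are placeholders rather than the paper's U-Net setup.

```python
import torch
import torch.nn as nn

dim = 16 * 16  # toy flattened "image" size
velocity_net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))
optimizer = torch.optim.Adam(velocity_net.parameters(), lr=1e-4)

def fm_training_step(x1):
    """One conditional Flow Matching step: regress v_theta(psi_t) onto (x1 - x0)."""
    x0 = torch.randn_like(x1)                       # base sample ~ N(0, I)
    t = torch.rand(x1.shape[0], 1)                  # t ~ U[0, 1]
    psi_t = t * x1 + (1.0 - t) * x0                 # linear conditional flow
    target = x1 - x0                                # straight-line velocity
    pred = velocity_net(torch.cat([psi_t, t], dim=1))
    loss = 0.5 * ((pred - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

batch = torch.rand(32, dim)  # stand-in training images in [0, 1]
print(fm_training_step(batch))
```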
Once the velocity field is learned, new images can be generated by integrating the corresponding Ordinary Differential Equation (ODE):
$ \frac{\mathrm{d}}{\mathrm{d}t} \Psi_t(x) = v_{\theta,t} ( \Psi_t(x) ) $
This equation describes how the sample changes over time. To obtain samples, numerical integration methods are used. For example, using the explicit Euler integration scheme with a small time step $\Delta_t$, the estimate of the sample is updated iteratively:
$ x_{t + \Delta_t} = x_t + \Delta_t v_{\theta,t} ( x_t ) $
Here, $x_t$ represents the current estimate of the image at time $t$, and $x_{t+\Delta_t}$ is the updated estimate at the next time step. The term $\Delta_t v_{\theta,t}(x_t)$ represents the small change applied to $x_t$ based on the predicted velocity at that point. The sampling process starts with $x_0 \sim \mathcal{N}(0, I)$ at $t = 0$ and iteratively updates until $t = 1$, yielding a generated image.
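As a concrete illustration, a minimal NumPy sketch of this Euler sampling loop is shown below; `velocity(x, t)` is a hypothetical stand-in for the trained network $v_{\theta,t}$.

```python
import numpy as np

def velocity(x, t):
    """Placeholder for the learned velocity field v_theta,t(x).
    Here it simply pulls the sample towards zero as t grows."""
    return -x * (1.0 - t)

def sample_unconditional(shape, n_steps=64, rng=None):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)      # x_0 ~ N(0, I)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity(x, t)     # explicit Euler update
    return x

img = sample_unconditional((64, 64))
print(img.shape, float(img.mean()))
```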
4.2.2. Restora-Flow Algorithm for Mask-Based Restoration
The goal of image restoration is to find the optimal image that is consistent with the degraded observation and also adheres to a learned image prior. This is formulated as a Maximum A Posteriori (MAP) estimation problem:
$ \hat{x} = \arg \operatorname*{min}_x \mathcal{D}(Hx, z) + \mathcal{R}_\theta(x) $
Where:
- $\hat{x}$: The estimated original image.
- $H$: The degradation operator (e.g., downsampling for super-resolution, zeroing out pixels for inpainting).
- $\mathcal{D}(Hx, z)$: The data fidelity term, typically a mean squared error (MSE) or similar measure, quantifying how well the restored image matches the observed degraded input in the known regions.
- $\mathcal{R}_\theta(x)$: The prior term (or regularization term), which leverages the learned Flow Matching model (parameterized by $\theta$) to ensure the restored image is realistic and belongs to the data manifold.

Simply running the Flow Matching generation (Eq. (3)) from noise will produce a generic image, but it will not necessarily be consistent with the specific degraded observation $z$. Therefore, the sampling process must be guided towards the MAP solution. Restora-Flow opts for mask-guidance due to its applicability to various mask-based degradations.
4.2.2.1. Incorporating Mask-Guidance
Restora-Flow incorporates mask-guidance, drawing inspiration from RePaint [23]. This involves fusing the time-dependent sample $x_t$ with the unmasked portion of the original degraded image $z$.
First, the degraded observation $z$ is adapted to match the noise level present in the current flow estimate $x_t$. This is done through a convex combination with Gaussian noise $\epsilon$:
$ z' = t z + (1-t) \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) $
Where:
- $z'$: The noise-adapted version of the degraded observation.
- $t$: The current time step in the ODE integration (from 0 to 1).
- $z$: The original degraded observation.
- $\epsilon$: A sample from a standard normal distribution, $\mathcal{N}(0, I)$.
- As $t$ approaches 0 (more noise in $x_t$), more noise is added to $z'$. As $t$ approaches 1 (less noise in $x_t$), $z'$ becomes closer to $z$.

Next, this noise-adapted $z'$ is fused with the current flow estimate $x_t$ using a binary mask $m$:
$ x_t' = m \odot z' + (1-m) \odot x_t $
Where:
- $x_t'$: The mask-guided (fused) sample at time $t$.
- $m$: A binary mask with $m = 1$ for known (unmasked) regions and $m = 0$ for unknown (masked) regions.
- $\odot$: Element-wise multiplication.
- $m \odot z'$: Preserves the known regions from the noise-adapted degraded observation $z'$.
- $(1 - m) \odot x_t$: Retains the unknown (to be restored) regions from the current flow estimate $x_t$.

A naive approach would be to simply use this fused sample $x_t'$ in the Flow Matching update equation:
$ x_{t + \Delta_t} = x_t' + \Delta_t v_{\theta,t} ( x_t' ) $
However, the authors point out that this naive approach leads to visible misalignments at mask boundaries because $x_t'$ might diverge from the distribution of samples that the Flow Matching model was trained on. In essence, $x_t'$ might not lie on a trajectory consistent with the learned generative process, causing the model to produce low-quality results or artifacts.
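For illustration, the following NumPy sketch implements Eqs. (5)-(7): noise-adapting the observation, fusing it through the mask, and taking the naive Euler step. The function names are illustrative, and `velocity(x, t)` is again a placeholder for the pretrained flow network.

```python
import numpy as np

def velocity(x, t):
    """Placeholder for the pretrained flow network v_theta,t(x)."""
    return -x * (1.0 - t)

def naive_mask_guided_step(x_t, z, m, t, dt, rng):
    """Eqs. (5)-(7): noise-adapt z, fuse via the mask, then one Euler step."""
    eps = rng.standard_normal(z.shape)
    z_prime = t * z + (1.0 - t) * eps            # Eq. (5): match the noise level of x_t
    x_fused = m * z_prime + (1.0 - m) * x_t      # Eq. (6): keep known regions from z'
    return x_fused + dt * velocity(x_fused, t)   # Eq. (7): naive flow update

rng = np.random.default_rng(0)
z = rng.random((64, 64))                          # degraded observation
m = (rng.random((64, 64)) > 0.7).astype(float)    # 1 = known, 0 = unknown
x_t = rng.standard_normal((64, 64))               # current flow sample
x_next = naive_mask_guided_step(x_t, z, m, t=0.5, dt=1 / 64, rng=rng)
print(x_next.shape)
```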
4.2.2.2. Trajectory Correction
To address the issues of misalignment and improve sample quality, Restora-Flow introduces a trajectory correction mechanism. This mechanism is applied after the initial mask-guided update (Eq. (7)) and within each ODE step.
The correction involves two main steps:
- Forward Extrapolation: The updated sample $x_{t+\Delta_t}$ (from Eq. (7)) is projected forward along the velocity field towards the endpoint $t = 1$ (the clean image manifold). This acts as a learned denoiser, helping to correct misalignments and push the sample closer to the data manifold: $ \widetilde{x}_1 = x_{t + \Delta_t} + \left( 1 - (t + \Delta_t) \right) v_{\theta, t + \Delta_t} ( x_{t + \Delta_t} ) $ Where:
  - $\widetilde{x}_1$: The extrapolated estimate of the clean image (at $t = 1$).
  - $x_{t+\Delta_t}$: The sample after the mask-guided ODE update.
  - $v_{\theta, t+\Delta_t}(x_{t+\Delta_t})$: The velocity field predicted for the updated sample at time $t + \Delta_t$.
  - $1 - (t + \Delta_t)$: A scaling factor that accounts for the remaining time to $t = 1$.
- Reintroduction of Noise and Rescaling: To place the sample back at the correct location along the generative trajectory and allow for stochasticity and diversity in generation, the extrapolated clean image $\widetilde{x}_1$ is scaled back to the current time $t$ and combined with new Gaussian noise $\eta$: $ x_t \leftarrow t \widetilde{x}_1 + (1 - t) \eta, \quad \eta \sim \mathcal{N}(0, I) $ Here, $x_t$ (the variable used for the next iteration) is updated. This step weights the denoised estimate by $t$ (more weight on $\widetilde{x}_1$ as $t \to 1$) and the noise by $1 - t$ (more weight on noise as $t \to 0$). This ensures that the sample remains on a valid Flow Matching trajectory while being consistent with the denoised estimate.
The authors empirically find that even a single correction step (i.e., C = 1) per ODE iteration significantly improves alignment and reconstruction quality, as shown in Figure 2.
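The correction of Eqs. (8)-(9) amounts to two lines of code: extrapolate to a clean-image estimate, then re-noise it back to time $t$. The sketch below illustrates this in NumPy with a placeholder velocity field; the function names are illustrative only.

```python
import numpy as np

def velocity(x, t):
    """Placeholder for the pretrained flow network v_theta,t(x)."""
    return -x * (1.0 - t)

def trajectory_correction(x_next, t, dt, rng):
    """Eqs. (8)-(9): forward-extrapolate to t=1, then rescale and re-noise to time t."""
    t_next = t + dt
    x1_tilde = x_next + (1.0 - t_next) * velocity(x_next, t_next)  # Eq. (8)
    eta = rng.standard_normal(x_next.shape)
    return t * x1_tilde + (1.0 - t) * eta                          # Eq. (9)

rng = np.random.default_rng(0)
x_next = rng.standard_normal((64, 64))     # sample after the mask-guided Euler step
x_corrected = trajectory_correction(x_next, t=0.5, dt=1 / 64, rng=rng)
print(x_corrected.shape)
```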
The overall Restora-Flow sampling process for mask-based image restoration (like inpainting and super-resolution) is detailed in Algorithm 1.
4.2.3. Algorithm 1: Mask-Guided Restora-Flow Sampling
Algorithm 1 outlines the steps for mask-based image restoration using Restora-Flow.
Algorithm 1 Mask-Guided Restora-Flow Sampling
1: Input: learned flow network v_θ, degraded observation z, number of ODE steps T (with Δt = 1/T), number of corrections C, mask m
2: Sample x ~ N(0, I)
3: for t = 0, Δt, ..., 1 − Δt do
4:   for c = 0, ..., C do
5:     Sample ε ~ N(0, I)
6:     z' ← 0 if t = 0, else z' ← t z + (1 − t) ε
7:     x' ← m ⊙ z' + (1 − m) ⊙ x
8:     x ← x' + Δt v_θ,t(x')
9:     if c < C and t + Δt < 1 then
10:      Sample η ~ N(0, I)
11:      x̃1 ← x + (1 − (t + Δt)) v_θ,t+Δt(x)
12:      x ← t x̃1 + (1 − t) η
13:    else
14:      t ← t + Δt
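Read together with the listing above, a compact Python sketch of the full sampler could look as follows. The placeholder `velocity` function stands in for the pretrained network, and the control flow reflects the reading of Algorithm 1 given in the breakdown below rather than the authors' reference implementation.

```python
import numpy as np

def velocity(x, t):
    """Placeholder for the pretrained flow network v_theta,t(x)."""
    return -x * (1.0 - t)

def restora_flow_masked(z, m, n_steps=64, n_corrections=1, rng=None):
    """Mask-guided Restora-Flow sampling (sketch of Algorithm 1, C = n_corrections)."""
    rng = rng or np.random.default_rng(0)
    dt = 1.0 / n_steps
    x = rng.standard_normal(z.shape)                   # line 2: x ~ N(0, I)
    t = 0.0
    while t < 1.0 - 1e-8:                              # line 3: outer ODE loop
        for c in range(n_corrections + 1):             # line 4: inner loop
            eps = rng.standard_normal(z.shape)         # line 5
            z_prime = 0.0 if t == 0.0 else t * z + (1.0 - t) * eps   # line 6, Eq. (5)
            x_fused = m * z_prime + (1.0 - m) * x      # line 7, Eq. (6)
            x = x_fused + dt * velocity(x_fused, t)    # line 8, Eq. (7)
            if c < n_corrections and t + dt < 1.0:     # line 9: correction condition
                eta = rng.standard_normal(z.shape)     # line 10
                x1_tilde = x + (1.0 - (t + dt)) * velocity(x, t + dt)  # line 11, Eq. (8)
                x = t * x1_tilde + (1.0 - t) * eta     # line 12, Eq. (9)
            else:
                t += dt                                # lines 13-14: advance time
                break
    return x

rng = np.random.default_rng(0)
z = rng.random((64, 64))                               # degraded observation
m = (rng.random((64, 64)) > 0.5).astype(float)         # binary degradation mask
restored = restora_flow_masked(z, m, n_steps=32, n_corrections=1, rng=rng)
print(restored.shape)
```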
Step-by-step breakdown of Algorithm 1:
- Line 1 (Input):
  - $v_\theta$: The pre-trained Flow Matching velocity field network.
  - $z$: The degraded input image.
  - $T$: The total number of ODE integration steps, with step size $\Delta_t = 1/T$.
  - $C$: The number of trajectory correction steps within each ODE iteration. The paper empirically sets $C = 1$.
  - $m$: The binary mask distinguishing known and unknown regions.
- Line 2 (Initialization): An initial sample $x$ is drawn from a standard normal distribution, $\mathcal{N}(0, I)$. This represents the starting point (pure noise) at $t = 0$.
- Line 3 (Outer Loop - Time Integration): The algorithm iterates through time steps from $t = 0$ up to $t = 1$. Each iteration performs an ODE update.
- Line 4 (Inner Loop - Correction Steps): For each time step $t$, an inner loop runs for $C + 1$ iterations. The first iteration performs the initial mask-guided update, and subsequent iterations apply the trajectory correction.
- Line 5 (Sample Noise): A random noise vector $\epsilon$ is sampled from a standard normal distribution, $\mathcal{N}(0, I)$.
- Line 6 (Noise-Adapt Degraded Observation): The degraded observation $z$ is adapted to the current noise level. If $t = 0$, $z'$ is set to 0 (the input contributes no information initially, as the sample is pure noise). Otherwise, it is a convex combination of $z$ and $\epsilon$, as per Eq. (5).
- Line 7 (Mask-Guided Fusion): The current sample is fused with the noise-adapted degraded observation $z'$ using the mask $m$, as per Eq. (6). Known regions come from $z'$, unknown regions from the current sample. The result is the fused sample $x'$.
- Line 8 (Flow Matching Update): The sample is updated using the explicit Euler step, applying the velocity field to the fused sample $x'$, as per Eq. (7). (Note: the extra prime symbol in the paper's listing appears to be a minor formatting artifact; the standard Euler step of Eq. (7) is intended.)
- Line 9 (Correction Condition): This condition checks whether a correction should still be applied ($c < C$) and whether this is not the very last ODE time step. The correction is not applied at the very last step because the image is already nearly clean.
- Line 10 (Sample Noise for Correction): If a correction step is performed, another noise vector $\eta$ is sampled from $\mathcal{N}(0, I)$.
- Line 11 (Forward Extrapolation): The current sample is projected forward to estimate the clean image $\widetilde{x}_1$ at $t = 1$, using the velocity field at the current time, as per Eq. (8).
- Line 12 (Reintroduction of Noise and Rescaling): The sample is rescaled using the extrapolated clean image $\widetilde{x}_1$ and the newly sampled noise $\eta$, placing it back on a valid Flow Matching trajectory at time $t$, as per Eq. (9). The updated sample is then used for the next inner-loop iteration or the next ODE step.
- Lines 13-14 (Time Increment): If no correction is performed (i.e., the corrections for the current ODE interval are exhausted, or it is the final ODE step), the time $t$ of the outer loop is advanced by $\Delta_t$. (Note: the indentation of lines 13-14 in the paper's listing is slightly ambiguous, but by the overall logic $t$ is incremented after all corrections for the current ODE interval.)
4.2.4. Algorithm 2: Restora-Flow Sampling for Denoising
For image denoising, Restora-Flow uses a slightly different strategy, described in Algorithm 2, which can be viewed as a special case of mask-guidance where the mask is time-dependent and global.
Algorithm 2 Restora-Flow Sampling for Denoising
1: Input: degraded observation z with noise level σ, number of ODE steps T (with Δt = 1/T)
2: Sample x ~ N(0, I)
3: for t = 0, Δt, ..., 1 − Δt do
4:   Sample ε ~ N(0, I)
5:   z' ← t z
6:   x ← x + Δt v_θ,t(x)
7:   x ← 1[t ≤ t*] z' + 1[t > t*] x   (t* is a switching time determined by σ)
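A minimal NumPy sketch of this denoising variant is given below. Because the exact switching rule is not fully recoverable from the transcription, the threshold `t_switch` (here tied to the noise level σ) is an assumption, and `velocity` is again a placeholder for the pretrained network; the step-by-step breakdown below explains each line.

```python
import numpy as np

def velocity(x, t):
    """Placeholder for the pretrained flow network v_theta,t(x)."""
    return -x * (1.0 - t)

def restora_flow_denoise(z, sigma, n_steps=64, rng=None):
    """Denoising sketch (Algorithm 2): use the scaled noisy observation early,
    let the flow prior finish the trajectory afterwards."""
    rng = rng or np.random.default_rng(0)
    dt = 1.0 / n_steps
    x = rng.standard_normal(z.shape)          # line 2: x ~ N(0, I)
    t_switch = 1.0 / (1.0 + sigma)            # assumed threshold derived from sigma
    for i in range(n_steps):                  # line 3: ODE loop
        t = i * dt
        z_prime = t * z                       # line 5: scaled observation
        x = x + dt * velocity(x, t)           # line 6: Euler flow update
        if t <= t_switch:                     # line 7: time-dependent global "mask"
            x = z_prime                       # early steps: observation dominates
    return x

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
denoised = restora_flow_denoise(noisy, sigma=0.2, n_steps=64, rng=rng)
print(denoised.shape)
```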
Step-by-step breakdown of Algorithm 2:
- Line 1 (Input):
  - $z$: The noisy degraded observation.
  - $\sigma$: The estimated noise level of the observation.
  - $T$: The number of ODE steps, with step size $\Delta_t = 1/T$.
- Line 2 (Initialization): An initial sample $x$ is drawn from a standard normal distribution. This is the starting point for Flow Matching.
- Line 3 (Time Integration Loop): The algorithm iterates through time steps from $t = 0$ up to $t = 1$.
- Line 4 (Sample Noise): A random noise vector $\epsilon$ is sampled. Note: $\epsilon$ is sampled but not explicitly used in the final update of this algorithm; this might be a leftover from a more general formulation or an oversight in simplification.
- Line 5 (Scaled Observation): The degraded observation $z$ is scaled by $t$ to obtain $z'$. This effectively reduces the influence of noise as the Flow Matching process progresses towards a clean image.
- Line 6 (Flow Matching Update): The sample is updated using the standard Euler step of Flow Matching, with $x$ as the input to the velocity field; this applies the generative prior.
- Line 7 (Time-Dependent Masking/Fusion): This is the core difference for denoising. A time-dependent global mask is implicitly used, expressed via indicator functions that evaluate to 1 when their condition holds and 0 otherwise. Interpretation:
  - For early time steps (below a threshold determined by the noise level $\sigma$), the updated sample is directly set to $z'$. The noisy observation, scaled by $t$, thus acts as the dominant input or initialization of the sampling process up to a point determined by $\sigma$.
  - For later time steps, the ODE evolves the solution without further direct influence from $z$. The Flow Matching model takes over to generate the clean image, relying purely on its learned prior knowledge.

In essence, the noisy observation is used as an initialization that influences the early stages of the Flow Matching generation. Once the process has evolved sufficiently beyond the threshold, the Flow Matching model completes the denoising based on its generative capabilities.

Figure 2 visually demonstrates the impact of the trajectory correction mechanism, showing that $C = 1$ offers the best trade-off between quality and speed.
-
Figure 2. Restora-Flow samples with and without correction steps. Empirically, one correction step (C = 1) offers the best trade-off between high reconstruction quality and fast processing.
5. Experimental Setup
5.1. Datasets
The authors utilized four diverse datasets to thoroughly assess the performance of Restora-Flow:
-
CelebA [22]:
- Characteristics: Features training images of celebrity faces.
- Size: Resized to pixels.
- Domain: Natural images, specifically human faces.
- Test Set: 100 test images.
- Purpose: Common benchmark for generative models and face-related tasks.
-
AFHQ-Cat [5]:
- Characteristics: Includes training images of cat faces.
- Size: Resized to pixels.
- Domain: Natural images, specifically animal faces.
- Test Set: 100 test images.
- Purpose: Evaluates performance on a different natural image domain with potentially more complex textures than human faces.
-
COCO [19]:
- Characteristics: Contains training images of various object types and complex scenes.
- Size: Resized to pixels.
- Domain: Natural images, highly diverse and complex scenes.
- Test Set: 100 validation images.
- Purpose: Assesses versatility on a broad range of natural images with high variability.
-
X-ray Hand [13, 36]:
- Characteristics: Comprises 895 hand radiographs.
- Size: Resized to pixels.
- Domain: Medical images (radiographs).
- Test Set: 298 test images.
- Purpose: Crucial for demonstrating the method's applicability and robustness in a specialized, sensitive domain like medical imaging, which often has different image statistics than natural images.

These datasets were chosen to represent a range of image complexities, sizes, and domains (natural vs. medical), providing a comprehensive evaluation of Restora-Flow's versatility and robustness across different scenarios.
5.2. Evaluation Metrics
The paper uses a combination of distortion metrics and perceptual metrics to evaluate the quality of the restored images, acknowledging the perception-distortion trade-off [3].
5.2.1. Peak Signal-to-Noise Ratio (PSNR)
- Conceptual Definition:
PSNR is a common image quality metric that quantifies the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. It is most easily defined via the Mean Squared Error (MSE). A higher PSNR value indicates better image quality, meaning less distortion or noise. It primarily measures pixel-wise accuracy.
- Mathematical Formula: First, the Mean Squared Error (MSE) between two images $I$ and $K$ of size $m \times n$ is calculated as: $ MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} [I(i,j) - K(i,j)]^2 $ Then, PSNR (in decibels) is calculated as: $ PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right) $
- Symbol Explanation:
  - $I$: The original (ground truth) image.
  - $K$: The restored (degraded) image.
  - $m$, $n$: The dimensions (height and width) of the images.
  - $I(i,j)$, $K(i,j)$: The pixel values at coordinates $(i,j)$ in images $I$ and $K$, respectively.
  - $MAX_I$: The maximum possible pixel value of the image. For 8-bit grayscale images, this is 255. For images normalized to [0, 1], this is 1.
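The formulas above translate directly into a few lines of NumPy; the function below assumes images normalized to [0, 1], so that $MAX_I = 1$.

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images with pixel range [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
noisy = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0.0, 1.0)
print(round(psnr(ref, noisy), 2))
```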
5.2.2. Structural Similarity Index (SSIM)
- Conceptual Definition:
SSIM is a perceptual metric that evaluates the similarity between two images. Unlike PSNR, which focuses on absolute errors, SSIM attempts to model how the human visual system perceives image quality, considering three key factors: luminance, contrast, and structure. A value closer to 1 indicates higher similarity.
- Mathematical Formula: For two image patches $x$ and $y$: $ SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} $
- Symbol Explanation:
  - $\mu_x$, $\mu_y$: The mean (luminance) of image patches $x$ and $y$, respectively.
  - $\sigma_x$, $\sigma_y$: The standard deviation (contrast) of image patches $x$ and $y$, respectively.
  - $\sigma_{xy}$: The covariance between image patches $x$ and $y$ (structure correlation).
  - $c_1$, $c_2$: Stability constants to avoid division by a small denominator.
  - $L$: The dynamic range of the pixel values (e.g., 255 for 8-bit images).
  - $k_1$, $k_2$: Small constant values (e.g., $k_1 = 0.01$, $k_2 = 0.03$), with $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$.
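For reference, a simplified single-window SSIM can be computed directly from the formula above; note that standard implementations (e.g., scikit-image) average SSIM over local windows, so values will differ slightly. The constants $k_1 = 0.01$ and $k_2 = 0.03$ are the commonly used defaults.

```python
import numpy as np

def ssim_global(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole images (the usual metric averages local windows)."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
distorted = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0.0, 1.0)
print(round(ssim_global(ref, distorted), 3))
```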
5.2.3. Learned Perceptual Image Patch Similarity (LPIPS)
- Conceptual Definition:
LPIPS [39] is a perceptual similarity metric designed to correlate well with human judgment of image quality. Instead of comparing pixels directly, LPIPS computes the distance between deep features extracted from pre-trained convolutional neural networks (e.g., AlexNet, VGG, SqueezeNet). It measures how perceptually similar two images are; a lower LPIPS score indicates higher perceptual similarity. It is particularly useful for evaluating the realism and perceptual quality of generative models.
- Mathematical Formula: LPIPS does not have a single, simple closed form like PSNR or SSIM. It is calculated as the weighted distance between feature stacks extracted from a pre-trained deep neural network (often an image classification network like AlexNet or VGG) at various layers: $ LPIPS(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \lVert w_l \odot (\phi_l(x)_{h,w} - \phi_l(x_0)_{h,w}) \rVert_2^2 $
- Symbol Explanation:
  - $x$: The reference image.
  - $x_0$: The generated/restored image.
  - $l$: Index of a convolutional layer in the chosen pre-trained network.
  - $\phi_l$: The feature map extracted from layer $l$.
  - $H_l$, $W_l$: Height and width of the feature map at layer $l$.
  - $w_l$: A learnable weight vector that scales the feature differences at each layer. These weights are learned by training the LPIPS model on a dataset of human perceptual similarity judgments.
  - $\odot$: Element-wise multiplication.
  - $\lVert \cdot \rVert_2^2$: Squared Euclidean ($\ell_2$) norm.
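In practice, LPIPS is usually computed with the authors' `lpips` PyTorch package rather than re-implemented; the snippet below shows the typical usage pattern, assuming inputs as (N, 3, H, W) tensors scaled to [-1, 1].

```python
import torch
import lpips  # pip install lpips; downloads pretrained weights on first use

# AlexNet backbone is the variant most commonly reported in papers.
loss_fn = lpips.LPIPS(net="alex")

# Two random RGB images in [-1, 1], shape (N, 3, H, W).
img_ref = torch.rand(1, 3, 64, 64) * 2.0 - 1.0
img_restored = torch.rand(1, 3, 64, 64) * 2.0 - 1.0

with torch.no_grad():
    distance = loss_fn(img_ref, img_restored)  # lower = perceptually more similar
print(float(distance))
```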
5.3. Baselines
Restora-Flow was compared against a comprehensive set of baseline methods, including both diffusion-based and flow-based approaches.
5.3.1. Flow-Based Baselines
-
- OT-ODE [26]: A flow-based method that uses a gradient correction term to guide the generation process.
- Flow-Priors [40]: Decomposes the flow's trajectory into local objectives and optimizes them iteratively.
- D-Flow [2]: Formulates restoration as a source point optimization problem, requiring backpropagation through ODE solvers.
- PnP-Flow [25]: Combines Plug-and-Play methods with Flow Matching without backpropagation.

For experiments on CelebA and AFHQ-Cat, hyperparameters for these flow-based baselines were adopted from [25], where a grid search was performed. For COCO, CelebA's hyperparameters were used, and for X-ray Hand, AFHQ-Cat's hyperparameters were used.
5.3.2. Diffusion-Based Baselines
-
- RePaint [23]: A diffusion-based method that uses unmasked regions to guide the diffusion process for inpainting tasks.
- DDNM+ [34]: A diffusion-based method for zero-shot image restoration using range-null space decomposition.

For RePaint, standard hyperparameters (jump length of 10, 10 resampling steps) were used. For DDNM+, optimal performance was achieved with tuned time-travel trick parameters.
5.3.3. Implementation Details
- Flow-Based Models: Pretrained CelebA and AFHQ-Cat models from [25] were used. For COCO and X-ray Hand, new Flow Matching models were trained from scratch. All flow-based reference methods were implemented using the framework from [25].
- Diffusion-Based Models: For a fair comparison, DDPMs were trained from scratch using the same U-Net architecture and training parameters as the flow-based models (e.g., learning rate 1e-4, batch size 128/64, 200/400 epochs, 250 diffusion time steps). For X-ray Hand, pretrained models from [14] and the MED-DDPM framework [10, 14] were used.
- Restora-Flow Hyperparameters: A fixed number of correction steps (C = 1) was used across all experiments and datasets. The only optimized hyperparameter was the number of ODE steps:
  - Natural image datasets (CelebA, COCO): 64 ODE steps for denoising and box inpainting, 128 ODE steps for super-resolution and random inpainting, and 256 ODE steps for the larger super-resolution setting.
  - Medical X-ray Hand dataset: 64 ODE steps for denoising and super-resolution; 32 ODE steps for box inpainting and occlusion removal.
- Hardware: NVIDIA A100 GPUs and an NVIDIA GeForce RTX 3090 were used, depending on the image resolution of the experiments.
5.4. Tasks
The evaluation covered a range of mask-based image restoration tasks:
- Denoising: Removing Gaussian measurement noise from images, with one noise level for CelebA, COCO and AFHQ-Cat (σ = 0.2, cf. Table 1) and a separate level for X-ray Hand.
- Box Inpainting: Filling in a square region in the center of the image, using a 40 × 40 centered mask for CelebA and COCO (cf. Table 1) and dataset-specific mask sizes for AFHQ-Cat and X-ray Hand.
- Super-resolution: Increasing the resolution of an image, with 2× super-resolution for CelebA, COCO and X-ray Hand (cf. Table 1) and a different factor for AFHQ-Cat.
- Random Inpainting: Filling in randomly masked pixels, with a mask covering 70% of pixels for CelebA, COCO and AFHQ-Cat.
- Occlusion Removal: A clinically motivated task on the X-ray Hand dataset, where synthetically added occlusions (as described in [14]) are removed.

All experiments, unless noted, additionally included Gaussian measurement noise.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results consistently demonstrate the superior performance of Restora-Flow across various tasks and datasets, particularly in perceptual quality (lower LPIPS) and processing time.
The following are the results from Table 1 of the original paper:
| Model | Denoising σ = 0.2 | Box inpainting 40 × 40 | Super-resolution 2× | Random inpainting 70% | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Time in s ↓ | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Time in s ↓ | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Time in s ↓ | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Time in s ↓ |
| CelebA | ||||||||||||||||
| RePaint [23] | N/A | N/A | N/A | N/A | 0.016 | 0.967 | 30.81 | 32.89 | 0.014 | 0.946 | 32.59 | 32.89 | 0.014 | 0.945 | 32.37 | 32.89 |
| DDNM+ [34] | 0.076 | 0.885 | 30.70 | 11.57 | 0.019 | 0.969 | 31.05 | 11.57 | 0.046 | 0.905 | 30.02 | 11.57 | 0.031 | 0.920 | 30.83 | 11.57 |
| OT-ODE [26] | 0.033 | 0.858 | 30.36 | 2.95 | 0.022 | 0.954 | 29.85 | 3.68 | 0.055 | 0.870 | 28.65 | 3.76 | 0.051 | 0.871 | 28.41 | 3.76 |
| Flow-Priors [40] | 0.132 | 0.767 | 29.27 | 26.22 | 0.020 | 0.969 | 31.17 | 26.22 | 0.110 | 0.722 | 28.52 | 26.22 | 0.019 | 0.944 | 32.34 | 26.22 |
| D-Flow [2] | 0.099 | 0.695 | 24.64 | 22.73 | 0.041 | 0.907 | 29.77 | 65.81 | 0.031 | 0.894 | 31.30 | 71.43 | 0.021 | 0.931 | 32.48 | 131.78 |
| PnP-Flow [25] | 0.056 | 0.910 | 32.12 | 4.60 | 0.045 | 0.941 | 30.48 | 4.60 | 0.058 | 0.908 | 31.37 | 4.60 | 0.022 | 0.954 | 33.55 | 4.60 |
| Restora-Flow | 0.019 | 0.922 | 33.09 | 0.58 | 0.018 | 0.964 | 30.91 | 2.06 | 0.014 | 0.952 | 33.59 | 3.63 | 0.015 | 0.947 | 32.71 | 3.63 |
| AFHQ-Cat | ||||||||||||||||
| RePaint [23] | N/A | N/A | N/A | N/A | 0.043 | 0.939 | 26.26 | 86.23 | 0.139 | 0.701 | 24.71 | 86.23 | 0.034 | 0.897 | 30.93 | 86.23 |
| DDNM+ [34] | 0.170 | 0.818 | 29.06 | 13.74 | 0.048 | 0.942 | 25.16 | 13.74 | 0.462 | 0.534 | 19.69 | 13.74 | 0.065 | 0.876 | 30.12 | 13.74 |
| OT-ODE [26] | 0.078 | 0.814 | 29.73 | 5.54 | 0.048 | 0.924 | 24.36 | 6.94 | 0.285 | 0.565 | 21.85 | 7.28 | 0.094 | 0.839 | 28.87 | 7.28 |
| Flow-Priors [40] | 0.153 | 0.771 | 29.43 | 67.10 | 0.054 | 0.942 | 26.04 | 67.05 | 0.271 | 0.565 | 23.50 | 67.30 | 0.046 | 0.909 | 31.82 | 67.69 |
| D-Flow [2] | 0.184 | 0.648 | 24.98 | 44.45 | 0.112 | 0.839 | 26.17 | 126.09 | 0.123 | 0.707 | 25.34 | 261.84 | 0.056 | 0.878 | 30.97 | 266.18 |
| PnP-Flow [25] | 0.165 | 0.864 | 31.10 | 9.86 | 0.124 | 0.904 | 26.18 | 9.86 | 0.180 | 0.790 | 26.95 | 46.26 | 0.042 | 0.930 | 33.07 | 19.15 |
| Restora-Flow | 0.051 | 0.899 | 32.35 | 0.72 | 0.047 | 0.939 | 25.96 | 3.96 | 0.158 | 0.761 | 26.33 | 14.48 | 0.034 | 0.914 | 31.99 | 7.48 |
| COCO | ||||||||||||||||
| RePaint [23] | N/A | N/A | N/A | N/A | 0.093 | 0.922 | 21.20 | 32.89 | 0.046 | 0.856 | 25.84 | 32.89 | 0.038 | 0.876 | 26.82 | 32.89 |
| DDNM+ [34] | 0.162 | 0.805 | 27.04 | 11.57 | 0.112 | 0.925 | 21.71 | 11.57 | 0.257 | 0.682 | 19.05 | 11.57 | 0.069 | 0.845 | 25.80 | 11.57 |
| OT-ODE [26] | 0.066 | 0.810 | 27.52 | 2.95 | 0.073 | 0.914 | 23.40 | 3.68 | 0.146 | 0.745 | 23.83 | 3.76 | 0.130 | 0.763 | 23.98 | 3.76 |
| Flow-Priors [40] | 0.116 | 0.751 | 27.08 | 26.22 | 0.084 | 0.927 | 23.58 | 26.22 | 0.112 | 0.698 | 24.93 | 26.22 | 0.055 | 0.855 | 25.97 | 26.22 |
| D-Flow [2] | 0.252 | 0.552 | 21.19 | 22.73 | 0.115 | 0.825 | 23.46 | 65.81 | 0.083 | 0.778 | 24.80 | 71.43 | 0.053 | 0.840 | 26.29 | 131.78 |
| PnP-Flow [25] | 0.128 | 0.855 | 28.97 | 4.60 | 0.121 | 0.892 | 24.56 | 4.60 | 0.118 | 0.827 | 26.73 | 4.60 | 0.053 | 0.896 | 28.13 | 4.60 |
| Restora-Flow | 0.026 | 0.905 | 30.57 | 0.58 | 0.084 | 0.929 | 24.80 | 2.06 | 0.044 | 0.877 | 27.44 | 3.63 | 0.040 | 0.881 | 27.37 | 3.63 |
| X-ray Hand | ||||||||||||||||
| RePaint [23] | N/A | N/A | N/A | N/A | 0.046 | 0.821 | 23.90 | 17.02 | 0.074 | 0.767 | 20.04 | 17.02 | 0.032 | 0.898 | 29.66 | 17.02 |
| DDNM+ [34] | 0.057 | 0.819 | 23.78 | 13.35 | 0.059 | 0.801 | 22.76 | 13.35 | 0.143 | 0.635 | 14.10 | 13.35 | 0.047 | 0.884 | 26.57 | 13.35 |
| OT-ODE [26] | 0.026 | 0.853 | 27.83 | 8.73 | 0.038 | 0.801 | 23.58 | 11.17 | 0.076 | 0.684 | 22.01 | 11.17 | 0.029 | 0.845 | 26.55 | 11.17 |
| Flow-Priors [40] | 0.033 | 0.885 | 28.58 | 68.54 | 0.035 | 0.882 | 25.74 | 68.59 | 0.162 | 0.460 | 20.38 | 68.62 | 0.023 | 0.933 | 27.07 | 68.59 |
| D-Flow [2] | 0.077 | 0.630 | 24.09 | 101.66 | 0.145 | 0.588 | 13.61 | 285.55 | 0.127 | 0.639 | 15.26 | 361.22 | 0.110 | 0.587 | 22.23 | 361.22 |
| PnP-Flow [25] | 0.052 | 0.843 | 25.17 | 20.48 | 0.054 | 0.822 | 23.67 | 20.35 | 0.029 | 0.884 | 25.88 | 102.29 | 0.045 | 0.889 | 26.83 | 20.35 |
| Restora-Flow | 0.021 | 0.912 | 31.34 | 0.50 | 0.035 | 0.846 | 24.67 | 4.03 | 0.037 | 0.857 | 24.66 | 7.95 | 0.017 | 0.935 | 33.51 | 4.03 |
-
CelebA Dataset:
- Perceptual Quality (LPIPS): Restora-Flow consistently achieves the best LPIPS scores across all tasks (denoising, box inpainting, super-resolution, random inpainting), indicating superior perceptual quality and visual realism. For example, in denoising, Restora-Flow reaches 0.019 compared to the next best OT-ODE at 0.033. In super-resolution, Restora-Flow matches RePaint's 0.014, making it competitive with the best diffusion method.
- Distortion Metrics (SSIM, PSNR): Restora-Flow achieves superior SSIM and PSNR scores for denoising and super-resolution. For box inpainting and random inpainting, it is a close second to Flow-Priors and PnP-Flow, respectively, but still very competitive.
- Processing Time: A standout advantage is Restora-Flow's processing time, which is remarkably faster than all other methods. For instance, in denoising it takes only 0.58 seconds, significantly faster than DDNM+ (11.57 s), RePaint (N/A for denoising, but 32.89 s for inpainting and super-resolution), and even the faster flow-based methods OT-ODE (2.95 s) and PnP-Flow (4.60 s). This makes Restora-Flow roughly an order of magnitude faster than RePaint and DDNM+ at comparable or better quality.
- Overall: The plots in Figure 4 further emphasize this, showing Restora-Flow occupying the bottom-left region (best quality, fastest time) across all metrics and tasks on CelebA.
-
AFHQ-Cat and COCO Datasets:
- Restora-Flow maintains its strong performance on these diverse natural image datasets. It generally outperforms all other flow-based methods in LPIPS for most tasks, with a few instances where it is a very close second (e.g., super-resolution on AFHQ-Cat, box inpainting on COCO).
- For SSIM and PSNR, it also consistently achieves the best results or closely matches the top performers.
- Crucially, these high-quality results are achieved with the fastest processing times in most settings. OT-ODE is occasionally faster (e.g., super-resolution on AFHQ-Cat at 7.28 s vs. Restora-Flow at 14.48 s), but delivers inferior reconstruction quality in those instances. Diffusion-based baselines (RePaint, DDNM+) often yield competitive quality but at a significantly higher computational cost. Figure 1 (left) visually confirms Restora-Flow's superior speed-quality trade-off for AFHQ-Cat.
-
X-ray Hand Dataset (Medical Domain):
- The results on the medical dataset further confirm Restora-Flow's versatility and robustness. It consistently achieves excellent scores across LPIPS, SSIM, and PSNR compared to baselines, while operating in a fraction of the time.
- In the clinically motivated occlusion removal task, Restora-Flow again shows improved perceptual quality at lower processing times, which is significant for downstream medical tasks.

Visual results (Figure 3 for CelebA, Figure 6 for AFHQ-Cat, and supplemental figures) corroborate these quantitative findings, showing Restora-Flow generating artifact-free and realistic images that maintain texture. In contrast, some baselines (OT-ODE, Flow-Priors, DDNM+) produce artifacts, D-Flow is slow and struggles with some object reconstruction, and PnP-Flow often yields over-smoothed results. For medical images, while overall variability is lower, Restora-Flow shows clear advantages in maintaining anatomical detail and cleanly removing occlusions compared to baselines that might alter known regions or leave residual artifacts.
Figure 4. Visual representation of quantitative results on CelebA. Restora-Flow is compared to related work methods (other shapes) on four different tasks (colors). The plots show LPIPS (left), SSIM (center) and PSNR (right) on the y-axis, and processing time on the x-axis (all plots). For better visualization and comparison, each plot is separated into two parts with different scales on the x-axis.
Figure 1. The image is a chart and schematic showing the comparison of different image restoration methods in terms of score and processing time for denoising, super-resolution, random inpainting, and box inpainting tasks. The left chart displays the performance of various methods at different time intervals, while the right side illustrates the results for each task.
Figure 3. The image is an illustration comparing different image restoration methods, showcasing results for denoising, box inpainting, super-resolution, and random inpainting tasks. Each row displays the recovery effects of different methods, including the original image and the result produced using Restora-Flow.
Figure 6. The image is a comparative illustration of the effects of different image restoration methods. It presents the results of denoising, box inpainting, super-resolution, and random inpainting, comparing them with the original image and other methods such as RePaint, DDNM, and Flow-Priors, and ultimately showing the results of Restora-Flow.
6.2. Ablation Studies / Parameter Analysis
The authors conducted an ablation study to investigate the influence of two key parameters for Restora-Flow: the number of ODE steps and the number of correction steps (C). This study focused on super-resolution on the CelebA dataset, evaluating LPIPS, SSIM, and PSNR against processing time.
The following figure (Figure 5 from the original paper) shows the ablation study results:
Figure 5. Ablation of ODE steps (indicated by markers) and correction steps C for 2× super-resolution on CelebA, comparing LPIPS (top), SSIM (middle) and PSNR (bottom) to processing time. ODE steps increase from left to right and represent 4, 8, 16, 32, 64, 128 and 256, respectively. For better visualization, the two smallest ODE step counts (4 and 8) are omitted for one of the correction settings. The circle indicates the selected hyperparameters. Time is per image and displayed on a logarithmic scale.
Analysis of Figure 5:
- Impact of Correction Steps (C):
  - As expected, increasing the number of correction steps C leads to longer evaluation times for the same number of ODE steps, since each correction adds computational overhead.
  - The most significant improvement in quality (lower LPIPS, higher SSIM/PSNR) occurs when moving from C = 0 (no correction) to C = 1. For C = 0, the quality metrics are substantially worse, especially LPIPS and PSNR, indicating that the trajectory correction is critical for high-quality restoration.
  - Increasing C beyond 1 offers diminishing returns in quality while further increasing processing time. For LPIPS, the larger C settings show very similar performance, often with C = 1 being slightly better or on par for comparable ODE steps.
  - This empirically validates the authors' choice to fix C = 1 for all experiments, as it offers the best trade-off between high reconstruction quality and fast processing, requiring minimal additional computational cost beyond the ODE integration itself.
- Impact of ODE Steps:
  - For a fixed C, increasing the number of ODE steps generally leads to better quality metrics (lower LPIPS, higher SSIM/PSNR) up to a point where the metrics saturate. More ODE steps mean finer integration of the velocity field, leading to more accurate trajectories.
  - However, increasing the number of ODE steps also directly increases processing time.
  - The plots show a clear trade-off between fast sampling and better scores. A low number of correction steps (specifically C = 1) is advantageous because it allows good quality to be reached with a manageable number of ODE steps and without excessive processing time.
  - For C = 1, the quality metrics (LPIPS, SSIM, PSNR) show good performance starting from 32 or 64 ODE steps, with further increases providing only marginal gains. The chosen hyperparameters (64 or 128 ODE steps, depending on the task) reflect this saturation point for the best balance.

The ablation study confirms that the trajectory correction mechanism is a vital component of Restora-Flow, significantly improving reconstruction quality, and that C = 1 is a well-justified choice. It also highlights the typical trade-off between computational cost and quality when selecting the number of ODE steps, which is common in ODE-based generative models.
6.3. Qualitative Results
The qualitative results, as illustrated in Figure 3 (CelebA), Figure 6 (AFHQ-Cat), and supplemental figures, provide strong visual evidence supporting the quantitative superiority of Restora-Flow.
- OT-ODE: Often produces artifacts, especially noticeable in box inpainting tasks on CelebA and AFHQ-Cat, and in random inpainting on COCO. This suggests that its gradient correction might not fully resolve inconsistencies or maintain perceptual quality across diverse tasks.
- Flow-Priors: Tends to generate noisy reconstructions in super-resolution on CelebA and introduces artifacts in both super-resolution and random inpainting on AFHQ-Cat and COCO, indicating issues with maintaining smoothness and realism.
- D-Flow: While generally yielding realistic results, it suffers from slow processing times. More critically, it sometimes struggles to accurately reconstruct certain objects (e.g., for denoising on COCO) and can substantially alter known input regions (e.g., box inpainting on X-ray Hand), suggesting sensitivity to hyperparameters or inherent limitations in its optimization approach.
- PnP-Flow: Despite achieving good SSIM and PSNR scores, PnP-Flow frequently produces over-smoothed results across all datasets. This is a common characteristic of Plug-and-Play methods that prioritize distortion metrics, often at the expense of perceptual realism; the lack of fine textures and details makes images appear less natural.
- Diffusion-based baselines (DDNM, RePaint): DDNM often introduces artifacts in super-resolution tasks, while RePaint generally produces visually realistic results but is significantly slower than Restora-Flow.

In stark contrast, Restora-Flow consistently generates artifact-free and realistic images that maintain texture while ensuring fast processing across all conducted experiments. The restored images preserve fine details and natural appearances without the common pitfalls of other methods (artifacts, over-smoothing, or altered known regions). This comprehensive qualitative analysis highlights Restora-Flow's ability to balance perceptual quality with fidelity and speed.
For the medical X-ray Hand dataset, while overall visual variability among methods is lower due to the inherent uniformity of X-ray images, Restora-Flow still demonstrates clear advantages. For instance, in occlusion removal, other baselines often leave partially visible occlusions, whereas Restora-Flow effectively removes them and produces clean, high-quality reconstructions that are crucial for clinical interpretation. The consistency and quality, combined with faster processing, make Restora-Flow particularly suitable for sensitive applications like medical imaging.
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper successfully introduces Restora-Flow, a novel training-free algorithm for mask-guided image restoration based on Flow Matching. By combining mask-guided sampling with an effective trajectory correction mechanism, Restora-Flow overcomes critical limitations of existing diffusion- and flow-based methods. It consistently achieves superior perceptual quality (lower LPIPS) and significantly reduced processing times across various tasks (denoising, inpainting, super-resolution, occlusion removal) and diverse datasets (natural and medical). A key advantage is its simplicity: only the number of ODE steps needs to be chosen, as the number of corrections is fixed at C = 1.
7.2. Limitations & Future Work
The primary limitation acknowledged by the authors is that Restora-Flow is currently designed for mask-based degradation operators. This means it might not be directly applicable to image restoration tasks involving more complex, non-mask-based degradations (e.g., motion blur, specific camera artifacts, or complex non-linear degradation models) without modification.
As future work, the authors plan to extend the algorithm to tackle image restoration tasks that involve these non-mask-based degradation operators. This would significantly broaden the applicability of Restora-Flow to an even wider range of real-world scenarios.
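As a pointer to what such an extension could look like, the sketch below shows the range-/null-space style data-consistency step used by methods like DDNM for general linear degradations; it replaces the binary-mask projection with a pseudo-inverse of the degradation operator. This is only an illustration of the concept, not the authors' planned extension; the operator names and the 2× average-pooling example are assumptions.

```python
import torch
import torch.nn.functional as F

def general_data_consistency(x_hat, y, A, A_pinv):
    """Data-consistency step for a general linear degradation y = A(x).

    x_hat  -- current estimate of the clean image
    y      -- degraded observation
    A      -- callable applying the degradation operator
    A_pinv -- callable applying a (pseudo-)inverse of A

    The range-space component of x_hat (the part A can observe) is replaced by
    the one dictated by y, while the null-space component, which the generative
    prior is free to fill in, is left untouched.
    """
    return x_hat + A_pinv(y - A(x_hat))

# Example: 2x downsampling by average pooling. Replicating each low-resolution
# pixel (nearest-neighbour upsampling) is an exact pseudo-inverse here, since
# averaging a replicated 2x2 block recovers the low-resolution value.
A = lambda x: F.avg_pool2d(x, kernel_size=2)
A_pinv = lambda r: F.interpolate(r, scale_factor=2, mode='nearest')

x_hat = torch.rand(1, 3, 64, 64)   # dummy current estimate
y = A(torch.rand(1, 3, 64, 64))    # dummy degraded observation
x_consistent = general_data_consistency(x_hat, y, A, A_pinv)
assert torch.allclose(A(x_consistent), y, atol=1e-6)  # observation now matched
```

Note that for a binary mask m, A is element-wise multiplication by m and acts as its own pseudo-inverse, so the mask-based replacement used by Restora-Flow can be viewed as a special case of this more general step.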
7.3. Personal Insights & Critique
This paper presents a highly valuable contribution to the field of image restoration, particularly in demonstrating the practical superiority of Flow Matching over diffusion models for this specific application. The training-free nature and the emphasis on processing speed while maintaining perceptual quality are highly desirable characteristics for real-world deployment.
Inspirations and Applications to Other Domains:
- Efficiency for Real-time Applications: The significant reduction in processing time makes Restora-Flow highly inspirational for real-time image enhancement systems, such as in autonomous driving (e.g., removing sensor noise or inpainting occluded regions from cameras), video conferencing (real-time denoising or super-resolution), or medical imaging during live procedures.
- Medical Imaging Potential: Its strong performance on the X-ray Hand dataset, especially for occlusion removal, suggests immense potential for other medical imaging modalities. The consistency and artifact-free generation could be crucial for diagnostic accuracy and downstream AI tasks (like segmentation or classification) where noisy or degraded inputs are common.
- Simplicity of Use: The minimal hyperparameter tuning (just the number of ODE steps) is a significant advantage. This ease of use could encourage broader adoption by practitioners who might be deterred by the complex tuning requirements of other generative models.
- Foundation for Other Inverse Problems: The core idea of mask-guided generation with trajectory correction could be a robust framework for other inverse problems beyond image restoration, such as computational photography tasks or even some scientific image analysis where parts of the data are known or missing.
Potential Issues, Unverified Assumptions, or Areas for Improvement:
- Generalization to Non-Mask Degradations: The current limitation to mask-based degradations is a critical point. Extending Restora-Flow to non-mask-based problems will likely require a more generalized data fidelity term and potentially a different correction mechanism that does not rely on a binary mask. This is a non-trivial extension.
- Fidelity Term in Denoising (Algorithm 2): In Algorithm 2 for denoising, a noise term is sampled but not explicitly used in the final update, and the observation is only incorporated for part of the sampling trajectory. While this works, it might be interesting to explore whether a more continuous or adaptive data fidelity term throughout the ODE steps could further improve denoising performance, especially for noise distributions beyond simple Gaussian. Since the sampled noise does not appear in the update equation as one would expect, this may be a typo in the algorithm or a simplified presentation.
- Universality of C = 1: While C = 1 is empirically shown to be optimal for the tested scenarios, it is worth considering whether highly complex degradation types or very diverse datasets might benefit from a dynamically chosen or slightly higher C. The authors' empirical evidence, however, suggests that C = 1 is robust.
- Computational Cost of Velocity Field Evaluation: While the ODE integration is faster than diffusion sampling, the underlying velocity field is still a deep neural network (a U-Net). The speed-up primarily comes from fewer total evaluations due to straighter trajectories. Further optimizations of the velocity field inference itself (e.g., knowledge distillation, smaller models) could push real-time performance even further.
- Memory Footprint: Flow Matching models, like diffusion models, can be memory-intensive due to their U-Net architecture. While not explicitly discussed, the memory footprint could be a consideration for very high-resolution images or embedded systems.

Overall, Restora-Flow represents a significant step forward, showcasing the practical utility and efficiency of Flow Matching in a crucial application domain. Its elegant design and empirical success make it a strong candidate among future generative prior-based image restoration methods.