NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution
TL;DR Summary
This paper reviews the NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution, detailing the proposed methods for restoring noisy and blurry RAW images and for 2x upscaling of RAW Bayer images; 230 participants registered and 45 submitted results.
Abstract
This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is twofold: (i) restore RAW images with blur and noise degradations, and (ii) upscale RAW Bayer images by 2x, considering unknown noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. This report presents the current state-of-the-art in RAW Restoration.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
The central topic of this paper is the NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution. It focuses on reviewing the solutions and results from this challenge.
1.2. Authors
The paper lists a large number of authors, indicating a collaborative effort typical of challenge reports, where organizers and winning teams are credited. The primary organizers appear to be:
- Marcos V. Conde *†
- Radu Timofte *
- Zihao Lu *
- Xiangyu Kong
- Xiaoxia Xing
- Fan Wang
- Suejin Han
- MinKyu Park
- Tianyu Zhang
- Xin Luo
- Yeda Chen
- Dong Liu
- Li Pang
- Yuhang Yang
- Hongzhong Wang
- Xiangyong Cao
- Ruixuan Jiang
- Senyan Xu
- Siyuan Jiang
- Xueyang Fu
- Zheng-Jun Zha
- Tianyu Hao
- Yuhong He
- Ruoqi Li
- Yueqi Yang
- Xiang Yu
- Guanlan Hoong
- Minmin Yi
- Yuanjia Chen
- Liwen Zhang
- Zijie Jin
- Cheng Li
- Lian Liu
- Wei Song
- Heng Sun
- Yubo Wang
- Jinghua Wang
- Jiajie Lu
- Watchara Ruangsang
Their affiliations include academic institutions (e.g., University of Science and Technology of China, Xi'an Jiaotong University, Nanjing University, Harbin Institute of Technology, Huazhong University of Science and Technology, Northeastern University, Dalian University of Technology, Politecnico di Milano, Chulalongkorn University) and industry research labs (e.g., Samsung R&D Institute China - Beijing (SRC-B), The Department of Camera Innovation Group, Samsung Electronics, Shanghai Shuangshen Information Technology Co., Ltd., Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Xiaomi Inc., E-surfing Vision Technology Co., Ltd, China, Institute of Automation, Chinese Academy of Sciences). This mix of academic and industry participants highlights the practical relevance and research interest in
RAW image processing.
1.3. Journal/Conference
The paper is published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025. CVPR is one of the premier conferences in computer vision, known for presenting cutting-edge research. Workshop papers, like this one, often focus on specific challenges or emerging topics, providing a snapshot of the current state-of-the-art in a specialized area.
1.4. Publication Year
2025
1.5. Abstract
The abstract introduces the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, emphasizing its relevance to modern Image Signal Processing (ISP) pipelines. It notes that RAW image restoration is less explored than its RGB counterpart. The challenge had two main goals: (i) restoring RAW images affected by blur and noise, and (ii) upscaling RAW Bayer images by 2x despite unknown noise and blur. A total of 230 participants registered, with 45 teams submitting results. The report aims to present the current state-of-the-art in RAW Restoration.
1.6. Original Source Link
https://arxiv.org/abs/2506.02197
1.7. PDF Link
https://arxiv.org/pdf/2506.02197v2.pdf
2. Executive Summary
2.1. Background & Motivation
The core problem the paper addresses, via the NTIRE 2025 Challenge, is the restoration and super-resolution of RAW images. This problem is crucial for modern Image Signal Processing (ISP) pipelines, particularly in portable camera devices.
Why is this problem important?
- Physical Limitations of Portable Devices: Smaller sensors in portable devices lead to reduced light collection, lower signal-to-noise ratios (SNR), and limited optical resolution. This makes it challenging to achieve high resolution, low noise, and sharpness simultaneously. Image restoration tasks are thus indispensable.
- RAW vs. sRGB Processing: RAW images contain more pristine sensor data with minimal processing (linear amplification, white balance), preserving a nearly linear response. In contrast, images processed through the ISP pipeline (demosaicing, tone mapping, gamma correction, color adjustment) undergo strong nonlinear operations that can introduce irreversible information loss, amplify errors, and limit the performance of tasks like denoising and super-resolution when conducted in the sRGB domain. While some methods attempt to reconstruct RAW from sRGB, they often fail to recover original detail patterns.
- Generalization and Robustness: sRGB images from the same sensor can vary significantly across manufacturers due to different ISP-tuning standards and stylistic preferences. This makes cross-sensor model generalization and robustness difficult when relying on sRGB inputs. Processing in the RAW domain offers greater consistency.
- Computational Constraints: Portable devices have limited computational resources, making model size and computational complexity critical factors in model design.
What is the paper's entry point or innovative idea?
The paper's entry point is the NTIRE 2025 Challenge, which explicitly focuses on RAW image restoration and super-resolution. By organizing this challenge, the authors aim to:
- Push the boundaries of research in the less-explored RAW domain.
- Provide a standardized platform and dataset for developing and benchmarking new methods.
- Address the practical needs of ISP pipelines in portable devices by considering both restoration quality and computational efficiency.
- Highlight the advantages of operating directly on RAW data before complex ISP transformations.
2.2. Main Contributions / Findings
The paper's primary contributions stem from the organization and reporting of the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge:
- Challenge Design and Setup: The paper details the two-fold challenge: (i) RAW image restoration from blur and noise, and (ii) 2x RAW Bayer image super-resolution with unknown noise and blur. It defines the degradation models, dataset properties, and evaluation protocols for both tracks.
- Comprehensive Dataset Provision: It describes the use of existing datasets like BSRAW and Adobe MIT5K, and new data collected from diverse smartphone sensors (Samsung Galaxy S9, iPhone X, Google Pixel 10, Vivo X90, Samsung S21) to create robust training and testing sets for both tasks. Pre-processing steps like normalization and RGGB Bayer pattern conversion are also detailed.
- Benchmark of State-of-the-Art Solutions: The paper presents a comprehensive benchmark of the solutions submitted by 45 teams (out of 230 registered participants). These solutions represent the current state-of-the-art in RAW image restoration and super-resolution, covering a wide range of architectures from efficient to general models.
- Detailed Analysis of Top Solutions: It provides in-depth descriptions of several top-performing methods, highlighting their architectural designs (e.g., RawRTSR, SMFFRaw, Multi-PromptIR, ERIRNet), training strategies (e.g., knowledge distillation, multi-stage training, reparameterization), and efficiency considerations.
- Key Findings:
  - Significant Quality Improvement: The proposed methods demonstrate great ability to increase RAW image quality and resolution, reduce blurriness and noise, and avoid detectable color artifacts, even for full-resolution 12MP outputs.
  - RAW vs. sRGB Advantages: Reinforces the idea that processing in the RAW domain offers superior performance compared to sRGB due to the preservation of more detail information and linearity.
  - Efficiency vs. Performance Trade-off: The challenge included efficient tracks with parameter constraints, pushing participants to design lightweight yet high-performing models. Teams like Samsung AI Camera excelled in both general and efficient tracks.
  - Remaining Challenges: Identifies more realistic downsampling and the difficulty of simultaneously tackling denoising and deblurring effectively as open research problems.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand this paper, a reader should be familiar with several fundamental concepts in imaging and deep learning:
- RAW Images: RAW images are unprocessed or minimally processed data captured directly by a digital camera's sensor. Unlike JPEG or sRGB images, they retain much more information, including a wider dynamic range, higher bit depth (e.g., 10-14 bits per pixel), and a linear response to light. They are often saved in proprietary formats or standardized as DNG (Digital Negative). Processing RAW images directly avoids information loss introduced by an Image Signal Processor (ISP).
- Bayer Pattern: Most digital camera sensors use a Bayer filter array (or Bayer pattern), a color filter mosaic that arranges red, green, and blue color filters on a square grid of photosensors. Typically, there are twice as many green filters as red or blue, as the human eye is more sensitive to green light. A common arrangement is RGGB. RAW images are essentially Bayer-patterned images before demosaicing. The challenge often "packs" these Bayer patterns into a 4-channel representation for easier processing by deep learning models (e.g., RGGB pixels are grouped into a block, forming a 4-channel pixel).
- Image Signal Processor (ISP) Pipeline: The ISP pipeline is a series of computational imaging steps that transform the raw sensor data into a viewable image (e.g., JPEG or sRGB). Key steps include:
  - Demosaicing (Debayering): Interpolating missing color information at each pixel from its neighbors to create a full-color RGB image.
  - White Balance: Adjusting color temperatures to ensure white objects appear white.
  - Noise Reduction: Removing sensor noise.
  - Sharpening: Enhancing edge details.
  - Tone Mapping/Gamma Correction: Adjusting the image's brightness and contrast to fit the display's dynamic range and human perception (which is non-linear).
  - Color Adjustment: Applying stylistic color enhancements.
  The paper argues that performing restoration tasks before these non-linear ISP operations is beneficial.
- Image Restoration: A general term for improving the quality of an image that has been degraded by various factors. In this paper, it specifically refers to:
  - Denoising: Removing unwanted random variations in image intensity (noise) that can obscure details. Common types include Gaussian noise and shot noise.
  - Deblurring: Reversing the effects of blur, which can be caused by camera motion (motion blur), subject motion, or misfocus (defocus blur). This often involves estimating and reversing a Point Spread Function (PSF), which describes how a point of light is spread out in the image.
- Super-Resolution (SR): The process of enhancing the resolution of an image, typically by generating a high-resolution (HR) image from a low-resolution (LR) input. This involves hallucinating plausible details that were lost during downsampling. The challenge focuses on 2x upscaling.
- Convolutional Neural Networks (CNNs): A class of deep neural networks widely used for image processing tasks. They consist of layers that perform convolutional operations (applying learnable filters to detect patterns), pooling (downsampling), and activation functions. CNNs are effective at learning hierarchical features from images.
- Transformers (in Computer Vision): Originally developed for natural language processing, Transformers have been adapted for computer vision. Key features include:
  - Self-Attention: A mechanism that allows the model to weigh the importance of different parts of the input sequence (or image patches) when processing a particular element. This enables capturing long-range dependencies.
  - Multi-Head Attention: Multiple self-attention mechanisms running in parallel, each learning different relationships.
  - Feed-Forward Networks (FFNs): Simple fully connected layers applied independently to each position. Transformer blocks often incorporate Layer Normalization for stable training.
- Peak Signal-to-Noise Ratio (PSNR): A common metric for evaluating the quality of image reconstruction. It quantifies the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation. Higher PSNR values indicate better image quality.
  - Conceptual Definition: PSNR is widely used to measure the quality of reconstruction of lossy compression codecs or image restoration algorithms. Because many signals have a very wide dynamic range, PSNR is usually expressed on the logarithmic decibel (dB) scale. A higher PSNR generally indicates a better reconstruction.
  - Mathematical Formula: $ \mathrm{PSNR} = 10 \cdot \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right) $
  - Symbol Explanation:
    - $\mathrm{MAX}_I$: The maximum possible pixel value of the image. For an 8-bit image, this is 255; for floating-point RAW images normalized to [0, 1], it is 1.
    - $\mathrm{MSE}$: Mean Squared Error between the ground-truth image $I$ and the reconstructed image $K$: $ \mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} [I(i,j) - K(i,j)]^2 $, where $m$ and $n$ are the dimensions of the image.
- Structural Similarity Index Measure (SSIM): Another common metric for image quality assessment, designed to be more perceptually relevant than PSNR. It measures the similarity between two images based on luminance, contrast, and structure. Values range from -1 to 1, with 1 indicating perfect similarity.
  - Conceptual Definition: SSIM is a perceptual metric that quantifies the perceived quality degradation caused by processing operations such as data compression or loss. It is designed to model the human visual system's perception of structural information in an image. Instead of comparing absolute pixel differences, SSIM considers three key comparison components: luminance, contrast, and structure.
  - Mathematical Formula: $ \mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} $
  - Symbol Explanation:
    - $x$: Reference image (ground truth).
    - $y$: Processed image.
    - $\mu_x$, $\mu_y$: Averages of $x$ and $y$.
    - $\sigma_x^2$, $\sigma_y^2$: Variances of $x$ and $y$.
    - $\sigma_{xy}$: Covariance of $x$ and $y$.
    - $c_1 = (k_1 L)^2$, $c_2 = (k_2 L)^2$: Small constants to avoid division by zero, where $L$ is the dynamic range of the pixel values (e.g., 255 for 8-bit grayscale images) and $k_1 = 0.01$, $k_2 = 0.03$ are small default constants.

A short computational sketch of both metrics is given below.
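This minimal sketch (not from the paper) computes PSNR by hand and uses scikit-image for SSIM; the array shapes and noise level are illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(gt: np.ndarray, pred: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR in dB; max_val is 1.0 for [0, 1]-normalized RAW data, 255 for 8-bit."""
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# Hypothetical packed RGGB RAW tensors of shape (H, W, 4), normalized to [0, 1].
gt = np.random.rand(128, 128, 4).astype(np.float32)
pred = np.clip(gt + 0.01 * np.random.randn(*gt.shape), 0.0, 1.0).astype(np.float32)

print(f"PSNR: {psnr(gt, pred):.2f} dB")
print(f"SSIM: {structural_similarity(gt, pred, data_range=1.0, channel_axis=-1):.4f}")
```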
3.2. Previous Works
The paper references several prior works, both as foundational concepts for the challenge and as baselines for comparison.
- BSRAW [18]: This is a key reference for the NTIRE 2025 challenge, used for both the dataset and the degradation pipeline. BSRAW (Improving Blind RAW Image Super-Resolution) likely introduced methods for synthesizing realistic low-resolution (LR) RAW images from high-resolution (HR) RAW images by modeling various degradations (noise, blur, downsampling). Its contributions likely include a robust degradation pipeline and a RAWSR dataset, which are directly utilized here.
- NTIRE 2024 RAWSR Challenge [19]: The current challenge builds directly on its predecessor. The NTIRE 2024 RAWSR Challenge already explored deep RAW image super-resolution, and its top-performing methods (RBSFormer, BSRAW) are used as baselines, providing a performance context for the 2025 challenge.
- PMRID [65]: (Practical Deep Raw Image Denoising on Mobile Devices) This model is a baseline for the RAW Image Restoration (RAWIR) track. As its name suggests, it focuses on efficient RAW denoising suitable for mobile devices, indicating a practical, lightweight approach.
- MOFA [9]: (A Model Simplification Roadmap for Image Restoration on Mobile Devices) Another baseline for RAWIR, MOFA emphasizes model simplification for image restoration on mobile devices. This aligns with the challenge's focus on efficiency.
- NAFNet [7]: (Simple Baselines for Image Restoration) NAFNet is a widely recognized lightweight U-Net-like model that achieves strong image restoration performance without complex attention mechanisms. It is a key baseline for RAWIR and is often a component or inspiration for submitted solutions due to its efficiency and effectiveness. Many solutions in this challenge (e.g., Team Samsung AI, Team NJU, Team ER-NAFNet) build upon or adapt NAFNet's NafBlock architecture.
- XRestormer [10]: (A Comparative Study of Image Restoration Networks for General Backbone Network Design) This is used as a teacher model in the knowledge distillation strategies of Team Samsung AI for both RAWSR and RAWIR. XRestormer likely represents a strong, potentially more complex, image restoration network that provides good features for student models to learn from.
- RBSFormer [32, 33]: (Enhanced Transformer Network for Raw Image Super-Resolution) This Transformer-based network was a top performer in the NTIRE 2024 RAWSR Challenge and serves as a strong baseline in NTIRE 2025 RAWSR. Team USTC-VIDAR explicitly uses a streamlined version of RBSFormer. It is known for its Transformer blocks and efficient feature extraction.
- SwinFIR-Tiny [76]: (Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution) Team Miers bases their RAWIR solution on SwinFIR-Tiny, a variant of SwinIR (a Transformer-based model for image restoration) optimized for efficiency.
- MPRNet [74]: (Multi-Stage Progressive Image Restoration) Team WIRTeam uses MPRNet as a baseline for their LMPR-Net in RAWIR. MPRNet is a multi-stage network that decomposes image restoration into multiple sub-tasks, processing images progressively to handle various degradations.
- PromptIR [51]: (Prompting for All-in-One Image Restoration) Team WIRTeam's Multi-PromptIR for RAWIR is based on PromptIR, which uses a prompt mechanism to guide the restoration process, potentially improving performance in complex scenarios.
- Restormer [75]: (Efficient Transformer for High-Resolution Image Restoration) Also referenced by Team WIRTeam for Multi-PromptIR, Restormer is known for its efficient Transformer architecture for high-resolution image restoration.
- SYEnet [23]: (A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-Time Performance on Mobile Device) Team EiffLowCVer bases their RepRawSR on SYEnet, emphasizing lightweight design and real-time performance on mobile devices.
3.3. Technological Evolution
The field of image restoration has seen a significant evolution, moving from traditional signal processing methods to deep learning approaches, and more recently, from sRGB domain processing to RAW domain processing, with a growing emphasis on efficiency for mobile deployment.
- Early Days (Traditional Methods): Initially, image restoration relied on classical signal processing techniques, often model-based (e.g., Wiener filtering for denoising, iterative deconvolution for deblurring). These methods were mathematically rigorous but often struggled with complex, real-world degradations and generalization.
- Rise of Deep Learning: The advent of Convolutional Neural Networks (CNNs) revolutionized image restoration. CNNs could learn complex mappings from degraded to clean images, significantly outperforming traditional methods. Early CNNs like SRCNN for super-resolution demonstrated the power of end-to-end learning.
- U-Net Architectures and Attention: Architectures like U-Net became popular for their ability to capture multi-scale features through encoder-decoder structures with skip connections, which are vital for pixel-level restoration tasks. Later, attention mechanisms, inspired by Transformers, were integrated into CNNs (e.g., Squeeze-and-Excitation (SE) blocks, Channel Attention) or used as full Transformer blocks to capture global dependencies and improve feature learning.
- Shift to RAW Domain Processing: Historically, most image restoration research focused on sRGB images, as these are the common output format. However, researchers realized the limitations of sRGB due to irreversible information loss from the ISP pipeline. This led to a critical shift towards processing RAW images directly, leveraging their linear response and richer data. Papers like BSRAW and challenges like NTIRE RAWSR are at the forefront of this shift.
- Efficiency for Mobile/Edge Devices: As deep learning models grew in complexity, their deployment on resource-constrained mobile and edge devices became a major challenge. This spurred research into lightweight network design, knowledge distillation, reparameterization, and quantization. Many solutions in NTIRE 2025 explicitly address these efficiency concerns with efficient tracks and parameter constraints.

This paper's work fits squarely within the current wave of RAW domain deep learning with a strong focus on efficiency. It benchmarks the latest CNN- and Transformer-based architectures tailored for RAW data and mobile deployment.
3.4. Differentiation Analysis
Compared to the main methods in related work, the NTIRE 2025 Challenge and its submitted solutions offer several core differentiations and innovations:
- Exclusive Focus on RAW Domain: While sRGB image restoration is mature, RAW domain processing is less explored. This challenge specifically pushes the boundaries of restoration and super-resolution directly on RAW Bayer data. This is a crucial differentiation, as RAW data offers higher fidelity and a more linear representation, potentially leading to superior results compared to sRGB-based methods that suffer from ISP-induced information loss.
- Combined Restoration and Super-Resolution: The challenge tackles denoising, deblurring, and super-resolution concurrently on RAW images. This integrated approach is more aligned with real-world ISP pipeline needs, where multiple degradations often coexist. The solutions need to be robust to unknown noise and blur.
- Emphasis on Efficiency: The introduction of an Efficient Track (e.g., max 200K parameters) explicitly drives innovation in lightweight model design for mobile devices. This is a significant differentiator from many academic papers that prioritize absolute performance without stringent efficiency constraints. Techniques like reparameterization, knowledge distillation, narrow-and-deep architectures, and depthwise separable convolutions are heavily utilized and benchmarked here.
- Diverse Real-World Degradations: The challenge uses sophisticated degradation pipelines (e.g., BSRAW, AISP) that simulate realistic noise profiles and multiple blur kernels. Crucially, participants are encouraged to develop their own degradation pipelines to generate more realistic training data, pushing models beyond simplistic synthetic degradations. The RAWIR track even incorporates dark frame noise captured from multiple mobile sensors.
- Benchmarking of Modern Architectures: The challenge serves as a proving ground for contemporary deep learning architectures adapted to RAW data. This includes U-Net variants (NAFNet, ER-NAFNet), Transformer-based models (RBSFormer, SwinFIR-Tiny variants, PromptIR/Restormer variants), and novel combinations (MambaIRv2). The results provide a direct comparison of how these different architectural philosophies perform in the RAW domain under challenging conditions.
- Addressing Cross-Sensor Generalization: By using datasets from diverse camera sensors (Canon, Nikon, Samsung, iPhone, Google Pixel, Vivo), the challenge implicitly encourages the development of models that are robust and generalizable across different hardware, mitigating the ISP-tuning variability seen in sRGB images.
4. Methodology
4.1. Principles
The core principle behind the methods used in the NTIRE 2025 Challenge is to leverage deep learning models to directly process RAW image data for restoration (denoising and deblurring) and super-resolution. The intuition is that by operating on the raw, linear sensor data before the complex, non-linear transformations of an Image Signal Processor (ISP) pipeline, more original information is preserved, leading to higher quality restoration and upscaling.
The general theoretical basis involves learning a complex mapping function $f_\theta$ that takes a degraded low-resolution (LR) RAW image $x$ and outputs a high-resolution (HR) clean RAW image $\hat{y}$, i.e., $\hat{y} = f_\theta(x)$. This function is typically parameterized by a deep neural network.
For RAW Super-Resolution (RAWSR), the principle is to learn to:

- Remove noise and blur: Tackle the degradations present in the LR RAW image.
- Upscale: Increase the resolution, by a factor of 2, by inferring missing pixel information.
- Maintain RAW properties: Ensure the output remains a RAW Bayer image with correct color patterns and linear response.

For RAW Image Restoration (RAWIR), the principle is to learn to:

- Remove noise: Address various noise profiles, including shot noise and heteroscedastic Gaussian noise.
- Remove blur: Undo the effects of various blur kernels (PSFs).
- Produce a clean RAW image: Output a RAW image that is free from degradation but retains original detail.

Both tasks emphasize efficiency for deployment on mobile devices, requiring models with a low parameter count and computational complexity.
4.2. Core Methodology In-depth (Layer by Layer)
The challenge is structured into two main tracks: RAW Image Super-Resolution (RAWSR) and RAW Image Restoration (RAWIR).
4.2.1. NTIRE 2025 RAWSR Challenge Setup
Degradation Pipeline:
The low-resolution (LR) degraded images for training are generated online using the degradation pipeline proposed in BSRAW [18]. This pipeline simulates realistic degradations, considering:
- Different noise profiles.
- Multiple blur kernels (PSFs).
- A simple downsampling strategy to synthesize LR RAW images.

Participants are allowed to apply other augmentation techniques or expand this degradation pipeline to generate more realistic training data.
Data Pre-processing:
The challenge dataset is based on BSRAW [18] and the NTIRE 2024 RAWSR Challenge [19], using images from the Adobe MIT5K dataset [3].
- Filtering: Images are manually filtered for diversity, natural properties, and sharpness (removing extremely dark, overexposed, or blurry images; only in-focus, sharp images with low ISO are considered).
- Normalization: All RAW images are normalized based on their black level and bit-depth (e.g., 10, 12, 14 bits per pixel).
- Bayer Pattern Conversion: Images are converted ("packed") into the RGGB Bayer pattern (4 channels). This is crucial as it allows transformations and degradations to be applied without damaging the original color pattern information, treating each Bayer block as a 4-channel pixel (a packing sketch is given below).

Training Data: 1064 clean high-resolution (HR) RAW images are provided.
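As an illustration of this pre-processing, here is a minimal NumPy sketch of normalization plus RGGB packing; the black level and bit depth are illustrative values, not taken from the paper.

```python
import numpy as np

def normalize_raw(bayer: np.ndarray, black_level: float = 64.0, bits: int = 10) -> np.ndarray:
    """Subtract the black level and scale by the sensor's white level (illustrative values)."""
    white_level = 2 ** bits - 1
    return np.clip((bayer.astype(np.float32) - black_level) / (white_level - black_level), 0.0, 1.0)

def pack_rggb(bayer: np.ndarray) -> np.ndarray:
    """Pack an (H, W) RGGB mosaic into an (H/2, W/2, 4) tensor with channels R, G1, G2, B."""
    return np.stack(
        [bayer[0::2, 0::2],   # R
         bayer[0::2, 1::2],   # G1
         bayer[1::2, 0::2],   # G2
         bayer[1::2, 1::2]],  # B
        axis=-1,
    )

packed = pack_rggb(normalize_raw(np.random.randint(64, 1024, (512, 512))))
print(packed.shape)  # (256, 256, 4)
```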
Testing: Three testing splits are used:

- Validation: 40 images of 1024px for model development.
- Test 1MP: 200 images at 1MP resolution.
- Full-resolution Test: The same 200 test images at full resolution. Participants process LR RAW images and submit their HR results, without access to the ground-truth images.
4.2.2. NTIRE 2025 RAW Image Restoration (RAWIR) Challenge Setup
Degradation Pipeline:
Participants use the baseline degradation pipeline [18] to simulate realistic degradations. They are also encouraged to develop their own degradation pipelines to simulate more realistic blur and noise. The core components include:
- Different noise profiles.
- Multiple blur kernels (PSFs).

Training Data: 2139 clean patches are provided for training.
Data Pre-processing:
The dataset uses images from diverse sensors (Samsung Galaxy S9, iPhone X from RAW2RAW [1], plus Google Pixel 10, Vivo X90, and Samsung S21).
- Filtering: Images are manually filtered for high quality, sharpness, and clear details (captured under low ISO, in-focus, and with proper exposure). Original RAW files are in DNG format, unprocessed by the ISP.
- Normalization: All images are normalized based on camera black level and bit depth.
- Bayer Pattern Conversion: Images are converted to the RGGB Bayer pattern (4 channels).
- Cropping: Images are cropped into non-overlapping (packed) patches.
Testing: The synthetic test dataset is generated by applying the challenge's degradation pipeline at three different levels:

- Test Level 1: Degradation is only sampled noise from real noise profiles: $y = x + n$, where $x$ is the clean image and $n$ is the noise.
- Test Level 2: Degradation is noise and/or blur, with 0.3 probability of blur and 0.5 of real noise: $y = (x * k) + n$, where $k$ is a blur kernel.
- Test Level 3: All images have realistic blur and noise: $y = (x * k) + n$.
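A schematic NumPy sketch of these three levels; the Gaussian PSF, the heteroscedastic noise parameters, and the sampling probabilities are simplified stand-ins for the challenge's actual pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def degrade(x: np.ndarray, level: int) -> np.ndarray:
    """x: packed (H, W, 4) clean RAW in [0, 1]. Returns the degraded observation y."""
    y = x.copy()
    # Level 2 blurs with probability 0.3; Level 3 always blurs.
    if level == 3 or (level == 2 and rng.random() < 0.3):
        y = gaussian_filter(y, sigma=(1.0, 1.0, 0.0))  # stand-in for a sampled PSF k
    # Heteroscedastic Gaussian approximation of shot + read noise (illustrative values).
    shot, read = 0.01, 0.002
    sigma = np.sqrt(shot * np.clip(y, 0.0, 1.0) + read ** 2)
    return np.clip(y + rng.normal(size=y.shape) * sigma, 0.0, 1.0)
```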
4.2.3. Top Solutions for RAWSR
4.2.3.1. RawRTSR: Raw Real-Time Super Resolution (Team Samsung AI)
Method Description:
The model structure is based on CASR [71]. To meet parameter and inference-time requirements, it adopts knowledge distillation and reparameterization. XRestormer [10] acts as the teacher model, and two student models (Efficient and General) are trained from this shared teacher. Re-parameterized convolution blocks are used during training and then fused into a unified convolution layer for deployment.
The task involves denoising, super-resolution, and blur degradation, which are decomposed into two fundamental processes: denoising and detail enhancement. Both networks employ a straight-through architecture integrating these modules.
- Efficient Model (RawRTSR) (see Figure 2, student model part):
  - Denoising Module: Reduces image resolution via unPixelshuffle downsampling to capture global information for noise removal. Processes features through four convolutions, then restores resolution via upsampling.
  - Detail Enhancement Module: Employs five convolutions to recover fine textures. Residual connections from the original input prevent excessive detail loss during denoising.
  - Upscaling: The final output is upscaled through pixel shuffle operations to achieve super-resolution.
  - Channel Count: Maintains a maximum feature channel number of 48 throughout both modules.
- General Model (RawRTSR-L) (see Figure 3):
  - Channel Count: Increases the number of feature channels from 48 to 64 to enhance representational capacity compared to RawRTSR.
  - Channel Attention: Incorporates a channel attention mechanism to adaptively recalibrate feature responses, preventing information redundancy during the denoising stage due to channel expansion.
Implementation Details:
- Synthetic Degradation: Two methods are used to obtain low quality (LQ) images:
  - Randomly add noise and blur multiple times in the RAW domain.
  - Convert RAW to RGB, add motion blur and noise, then convert back to RAW.
- Training Steps (Three steps): All steps use PyTorch and an A100 GPU.
  - Separate Training: Teacher and student models are trained separately. LQ patches are cropped. AdamW [47] optimizer (weight decay 0.0001) with a learning rate of 0.0005 for 800 epochs.
  - Feature Distillation: Model initialized with weights from step 1. Feature distillation is used, with learning rate 0.00005 for 800 epochs.
  - Fine-tuning: Model initialized with weights from step 2. (Details on loss or epochs for this step are not fully elaborated beyond initialization.)
- Reparameterization: The final submitted model is the student model after reparameterization, to ensure parameter and inference time requirements are met.

The following figure (Figure 2 from the original paper) shows the network architecture of the student model and the submitted model for RAW image super-resolution processing:
Figure 2: A schematic of the network structures of the student model and the submitted model for RAW image super-resolution. The upper part shows the teacher model, where the low-resolution input (LR) passes through several convolutional layers and SSAB modules to progressively produce the high-resolution output (SR). The lower part shows the student model's training mode and the submitted model's inference mode, highlighting the operations and structural design of each stage.
The following figure (Figure 3 from the original paper) illustrates the overall structure of the RawRTSR-L network:
Figure 3: A schematic of the overall structure of the RawRTSR-L network. The low-resolution (LR) image on the left passes through the denoising module and the detail-enhancement module to produce the super-resolution (SR) image on the right. The architecture comprises multiple convolution and activation layers that effectively improve RAW image quality.
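The report does not spell out the exact branch layout, but structural reparameterization generally means training with parallel convolution branches and fusing them into a single convolution for deployment. A minimal PyTorch sketch of fusing a parallel 3x3 + 1x1 pair (the branch types are an assumption, for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConv(nn.Module):
    """Train-time: parallel 3x3 and 1x1 convs. Inference: one fused 3x3 conv."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.conv3(x) + self.conv1(x)

    def fuse(self) -> nn.Conv2d:
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels, 3, padding=1)
        # Pad the 1x1 kernel to 3x3 and sum the weights and biases of both branches.
        w1 = F.pad(self.conv1.weight, [1, 1, 1, 1])
        fused.weight.data = self.conv3.weight.data + w1
        fused.bias.data = self.conv3.bias.data + self.conv1.bias.data
        return fused
```

After training, `RepConv.fuse()` yields a plain `nn.Conv2d` whose output matches the two-branch forward pass, which is how the parameter and latency budgets can be met at inference.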
4.2.3.2. Streamlined Transformer Network for RealTime Raw Image Super Resolution (Team USTC-VIDAR)
Method Description:
The overall framework (Figure 4) is a streamlined version of RBSFormer [33], designed for efficient processing.
- Main Branch: Consists of a convolution, cascaded transformer blocks, and an upsample block.
- Residual Branch: Contains only an upsample block.
- Upsample Block: Employs a convolution followed by a PixelShuffle operation [58] to upscale features by 2x (a sketch of this standard pattern follows below).
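A sketch of such an upsample block (the standard conv + PixelShuffle pattern, not the team's exact code); the channel count and 3x3 kernel are assumptions:

```python
import torch.nn as nn

def upsample_block(channels: int, scale: int = 2) -> nn.Sequential:
    """Conv expands channels by scale**2; PixelShuffle trades channels for resolution."""
    return nn.Sequential(
        nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),  # (B, C*s^2, H, W) -> (B, C, s*H, s*W)
    )
```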
Computational Complexity Mitigation:
- Transformers: Self-attention modules and feed-forward networks are the primary sources of complexity.
- InceptionNeXt [72]: Incorporated for efficient spatial feature extraction during the Q, K, and V projection, leveraging partial convolution and depth-wise convolution.
- ShuffleNet [78]: Adopted for the feed-forward networks, with channel groups to reduce input projection parameters while maintaining cross-channel communication.
- Output Projection: Streamlined using element-wise multiplication with a depth-wise convolution gate (inspired by [33, 48]).
- Parameters: Set mainly by the number of transformer blocks and channel groups.
Implementation Details:
- Dataset: Trained exclusively on the provided NTIRE 2025 dataset.
- Augmentation: Random horizontal flips, vertical flips, and transpositions.
- Degradation: Degraded images simulated using the BSRAW degradation pipeline with additional PSF kernels [26].
- Training (Two stages):
  - Stage 1: Batch size 8, patch size 192, decayed learning rate. ~12 hours on NVIDIA RTX 3090 GPUs.
  - Stage 2: Batch size 64, patch size 256, decayed learning rate. ~31 hours on A800 GPUs.
- Optimizer: Adam with default hyperparameters.
- Loss Function: Combination of Charbonnier loss and a Frequency loss [49], with the latter assigned a weight of 0.5 (a sketch of this combined objective follows after the figure).

The following figure (Figure 4 from the original paper) shows Team USTC's framework for RAW image super-resolution:
Figure 4: A schematic of Team USTC's framework for RAW image super-resolution. The input low-resolution RAW image is processed by multiple Transformer blocks, and the high-resolution RAW image is generated via upsampling.
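A hedged sketch of the combined objective; the Charbonnier epsilon and this particular FFT-magnitude form of the frequency loss are assumptions, and [49] may differ in detail:

```python
import torch

def charbonnier(pred, target, eps: float = 1e-3):
    """Smooth L1-like penalty: mean of sqrt((x - y)^2 + eps^2)."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def frequency_loss(pred, target):
    """L1 distance between 2D FFT spectra (one common 'frequency loss' form)."""
    fp = torch.fft.fft2(pred, norm="ortho")
    ft = torch.fft.fft2(target, norm="ortho")
    return (fp - ft).abs().mean()

def total_loss(pred, target, freq_weight: float = 0.5):
    # Frequency term weighted by 0.5, as stated in the implementation details.
    return charbonnier(pred, target) + freq_weight * frequency_loss(pred, target)
```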
4.2.3.3. SMFFRaw: Simplified Multi-Level Feature Fusion Network for RAW Image Super-Resolution (Team XJTU)
Method Description:
SMFFRaw is a computationally efficient network based on MFFSSR [41], designed to bridge the gap between high restoration quality and computational overhead. It employs a novel iterative training strategy. The architecture (Figure 5) has three main components:
- Shallow Feature Extraction: A simple convolutional operation extracts shallow features from the degraded input image.
- Deep Feature Extraction: Deep features are extracted using a sequence of Hybrid Attention Feature Extraction Block (HAFEB) modules. Each HAFEB consists of:
  - Point-wise Convolution (Pconv)
  - Depthwise Convolution (DWconv)
  - Reparameterized Convolution (RepConv) (no reparameterization during inference)
  - Channel Attention (CA)
  - Large Kernel Attention (LKA)
- Reconstruction: The feature map is first upsampled using a convolutional layer with pixel shuffle [58], then added to the bilinearly interpolated input to produce the final super-resolved result. This design reduces training complexity while enhancing SR performance.
Implementation Details:
- Dataset: Solely uses the provided challenge dataset.
- Augmentations: Common augmentations (rotation, flipping) along with mixup [76].
- Degradation: The BSRAW degradation pipeline [18] generates RAW-degraded image pairs.
- Training (Five stages): (See Table 5)
  - Each phase gradually introduces more complex degradations or increases patch size.
  - Adam optimizer [35] with initial learning rate 1e-3, decaying to 1e-6 using Cosine Annealing.
  - Loss function: Combination of Charbonnier loss [64] and frequency loss [31] for the first four stages; MSE and frequency loss for the final stage.
- Hardware: PyTorch on RTX 4090 GPUs.

The following figure (Figure 5 from the original paper) illustrates the overall framework of the SMFFRaw model:
Figure 5: A schematic of the overall framework of the SMFFRaw model and its components, including shallow feature extraction, deep feature extraction, and image reconstruction, along with the Hybrid Attention Feature Extraction Block (HAFEB) and its Channel Attention (CA) and Large Kernel Attention (LKA) mechanisms.
4.2.3.4. An Enhanced Transformer Network for Raw Image Super-Resolution (Team EGROUP)
Method Description:
This approach leverages the RBSFormer [32] architecture, directly processing RAW images for super-resolution. Operating in the RAW domain avoids complexities of non-linear transformations from ISP. The pipeline maintains a three-component structure:
- Shallow Feature Extraction: Given a low-resolution RAW image $I_{LR}$, shallow features are extracted as $ F_s = \mathrm{Conv}_{3 \times 3}(I_{LR}) $, where $\mathrm{Conv}_{3 \times 3}$ is a $3 \times 3$ convolutional operation.
- Deep Feature Extraction: Transformer blocks are used to extract deep features: $ F_i = \mathcal{H}_{tb_i}(F_{i-1}), \quad i = 1, 2, \ldots, K $ and $ F_d = \mathrm{Conv}_{3 \times 3}(F_K) $, where $\mathcal{H}_{tb_i}$ represents the $i$-th transformer block and $K$ is the total number of blocks. $F_K$, the output of the last transformer block, is processed by another convolution to obtain $F_d$.
- Reconstruction: The high-resolution (HR) image is reconstructed by aggregating features: $ I_{HR} = \mathcal{H}_{rec}(I_{LR}, F_d) = \mathrm{Up}(F_s + F_d) $, where $\mathcal{H}_{rec}$ is the reconstruction head and $\mathrm{Up}$ denotes an upsampling operation, typically using PixelShuffle, combined with a residual connection between the shallow and deep features.
Implementation Details:
- Dataset: Official training dataset provided by organizers.
- Augmentation: Random noise and blur degradation patterns in the RAW domain.
- Optimizer: AdamW.
- Learning Rate: Initial value decayed with cosine annealing.
- Hardware: PyTorch 1.11.0 with two NVIDIA 4090 GPUs.
- Training Parameters: Batch size 8, crop size 192.
- Loss Function: Trained with loss first, then fine-tuned with FFT loss.

The following figure (Figure 6 from the original paper) shows the architecture of the RBSFormer [32] used by Team EGROUP for RAW image super-resolution:
Figure 6: A schematic of the RBSFormer architecture used by Team EGROUP for RAW image super-resolution. It includes the RAW low-resolution input (Raw LR), multiple Transformer blocks, the enhanced cross-covariance attention (EXCA), and the enhanced gated feed-forward network (EGFN); the legend explains each module's function and the data flow, ending with the high-resolution output (Raw HR).
4.2.3.5. A fast neural network to do super-resolution based on NAFSSR (Team NJU)
Method Description:
Team NJU RSR proposes a CNN framework for RAW image super-resolution based on the NAFBlock from NAFSSR [14]. It adopts reparameterization for inference, fusing Batch Normalization parameters into the convolutional layers.
The architecture (Figure 7) redesigns the NAFBlock by replacing the SimpleGate component with a CNN layer and a GeLU activation function, and by removing the FFN component to constrain the parameter count. Layer Normalization is replaced with Batch Normalization for easier training and more efficient inference (since BN can be fused).
Computational Cost: For a 4-channel RGGB RAW input patch, NAFBN has 11.90 GFLOPS and 189K trainable parameters after BN fusion; BN fusion and half-precision inference further reduce latency on an NVIDIA RTX 3090 compared to the unfused model.
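Folding BatchNorm into a preceding convolution is a standard identity; a minimal PyTorch sketch (assuming a `Conv2d` immediately followed by a `BatchNorm2d`):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BN's affine transform into the conv weights for inference."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sigma per channel
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```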
Implementation Details:
- Dataset: NTIRE 2025 RAW Image Super Resolution challenge data, with the degradation pipeline [18].
- Framework: PyTorch on a single vGPU-32.
- Model: 12 NAFBlocks of width 48.
- Optimizer: AdamW.
- Batch Normalization: Momentum set to 0.03.
- Learning Rate: Initial value decayed with Cosine Annealing.
- Training: ~7 hours.
- Data Augmentation (each applied with probability 0.5):
  - Random patch cropping.
  - Random white balance (on unit-interval normalized images).
  - Random horizontal or vertical flips.
  - Random right-angle rotations.
  - Exposure adjustment (linear scaling in [-0.1, 0.1]).
  - Random downsampling with AvgPool2d and bicubic interpolation (0.3 probability) for lower-resolution images.
- Loss Function: loss.

The following figure (Figure 7 from the original paper) shows the NAFBN proposed by Team NJU RSR:
Figure 7: A schematic of the NAFBN architecture proposed by Team NJU RSR. A low-resolution (LR) image is processed by three convolutional layers and multiple NAFBlock modules, and the result is finally converted to the super-resolution (SR) output via Pixel Shuffle.
The following figure (Figure 8 from the original paper) shows the adapted NAFBlock used by Team NJU RSR:
Figure 8: A schematic of the adapted NAFBlock used by Team NJU RSR. The block combines convolutional and batch-normalization layers with depthwise convolution and a channel attention mechanism.
4.2.3.6. An efficient neural network baseline report using Mamba (Team TYSL)
Method Description:
Team TYSL implemented MambaIRv2 [24] on RAW data to provide a baseline from a different perspective. The architecture (Figure 9) is a simplified, lightweight version of MambaIRv2.
- Key Components: As shown in Figure 9, W-MSA and ASSM modules, followed by merging and reconstruction steps.
- Motivation: Chosen for its potential for lightweighting and its novel application to RAW data.
- Downsampling Analysis: Explored various downsampling methods, including direct bicubic downsampling on each channel, AvgPool2D (as provided in the competition), and bicubic downsampling with bias. AvgPool2D performed significantly better. The team highlights the impact of downsampling methods when test-set images are synthetic.
Proposed Downsampling Method (Figure 10): Based on the intuition that the central pixel value of a region is closer to the region's average.
- For a unit downsampled by 2x (RGGB pattern), a red pixel after downsampling represents the average of the top-left pixels before downsampling.
- If direct bicubic downsampling or AvgPool2D is applied to the red channel, the resulting value approximates the pixel at coordinate (1,1) in the original image, which ends up at the bottom-right of the downsampled red pixel, not its center.
- Therefore, they propose bicubic interpolation using the nearest 16 points, with the interpolation point positioned at the center of the downsampled red pixel, to achieve more precise downsampling (a sketch of the baseline options follows below).
Implementation Details:
- Hardware: Server with several A100 GPUs, PyTorch framework.
- Batch Size: 64.
- Learning Rate: 8e-4.
- Training Time: ~26 hours (slow due to the image degradation process).
- Dataset: Provided training set, adhering to the given degradation pipeline (except for downsampling), with no image enhancement.

The following figure (Figure 9 from the original paper) illustrates the structure of MambaIRv2:
Figure 9: A diagram of the MambaIRv2 structure, including the W-MSA and ASSM modules along with the merging and reconstruction steps.
The following figure (Figure 10 from the original paper) shows the downsampling structure:
Figure 10: A schematic of bicubic interpolation and average pooling for image downsampling. Part (a) shows the basic bicubic interpolation and average pooling processes; part (b) shows bicubic interpolation with bias and its per-color computation.
4.2.3.7. RepRawSR: Accelerating Raw Image Super-Resolution with Reparameterization (Team EiffLowCVer)
Method Description:
Team EiffLowCVer designed two lightweight variants of SYEnet [23] for RAW image super-resolution, incorporating structural reparameterization and efficient network design.
- RepTiny-21k:
  - Increases the number of feature extraction modules to four (from SYEnet's one).
  - Introduces skip connections (red arrows in Figure 11) to mitigate gradient vanishing and stabilize training.
  - Channel number set to 16 for efficiency.
  - Achieves 5.65G FLOPs and 21k parameters.
- RepLarge-97k:
  - Increases the channel width to 32.
  - Uses only one Feature Extraction Module.
  - Incorporates the FEBlock (a pre-processing module from SYEnet) for super-resolution.
  - Parameters and FLOPs increase significantly compared to Tiny-21k.

The team observed that increasing the number of feature extraction modules with skip connections provides an optimal balance between performance and speed. An additional tail generates a second predicted image from intermediate feature maps, which is included in the loss calculation during training for stability but removed during inference for efficiency.
Implementation Details:
- Optimizer: Adam.
- Learning Rate Scheduler: CosineAnnealingRestartLR.
- Hardware: NVIDIA GeForce RTX 3090 24GB.
- Datasets: 1,064 RAW images provided by organizers, with 40 images randomly selected for validation.
- Training Time: 22 hours for Tiny-21k, 26 hours for Large-97k.
- Training Strategies (Multi-stage):
  - Stage 1: 100,000 training steps. Randomly cropped GT patches, random rotation and flipping. LQ images generated using the organizer's degradation pipeline.
  - Stage 2: Patch size increased; training continued for an additional 50,000 steps.

The following figure (Figure 11 from the original paper) shows the main branch of RepRawSR proposed by Team EffiLowCVer:
Figure 11: A diagram of the main branch of RepRawSR proposed by Team EffiLowCVer, containing the feature extraction modules and a channel attention mechanism; the main operations consist of multiple convolutional layers and batch normalization. The Tiny-21k variant iterates the feature extraction module four times.
4.2.3.8. ECAN: Efficient Channel Attention Network for RAW Image Super-Resolution (Team CUEE-MDAP)
Method Description:
ECAN aims to be an efficient super-resolution algorithm for the NTIRE 2025 Efficient Track (parameter limit of 200K). It is a CNN-based model, trained end-to-end on the NTIRE 2025 RAW training dataset without external pre-trained models.
The ECAN architecture (Figure 12) uses four stages:
- Shallow Feature Extraction: A convolution on the 4-channel RAW input.
- Deep Feature Extraction: 8 EfficientResidualBlocks with a global skip connection. Each EfficientResidualBlock uses (a block sketch is given below):
  - An inverted residual structure with depthwise separable convolutions (inspired by MobileNetV2 [57]).
  - A Squeeze-and-Excitation (SE) block [27] for channel attention. This focus on channel interaction aligns with the group's work on Multi-FusNet [55] and channel attention networks [79].
- Upsampling: PixelShuffle.
- Reconstruction: A convolution to the 4-channel RAW output.

Efficiency: Only 93,092 parameters, well below the limit. Computational cost is estimated at 21.82 GMACs (or 43.65 GFLOPs) when scaling to a 1MP output, with inference time measured per output megapixel on an NVIDIA RTX 4090.
Implementation Details:
- Framework: PyTorch.
- Optimizer: AdamW, with weight decay.
- Learning Rate: Initial value decayed with cosine annealing.
- Hardware: NVIDIA RTX 4090 (24GB).
- Datasets: NTIRE 2025 RAW training set. No extra data.
- Augmentation: Random 90/180/270 rotations, horizontal flips.
- Degradation: Gaussian Blur + Gaussian Noise.
- Training Time: 600 epochs (~1.6 hours).
- Training Strategies: End-to-end from scratch, with Automatic Mixed Precision (AMP). Batch size 64. Gradient clipping at 1.0.

The following figure (Figure 12 from the original paper) illustrates the efficient residual block stack structure used in RAW image processing:
Figure 12: A schematic of the efficient residual block stack used for RAW image processing, including the global skip connection, pixel shuffle, and Squeeze-and-Excitation (SE) blocks, detailing the input, feature extraction, and output stages. This structure plays a key role in ISP and super-resolution reconstruction.
4.2.4. Top Solutions for RAWIR
4.2.4.1. Efficient RAW Image Restoration (Team SamsungAI)
Method Description:
This solution for the RAW Restoration Challenge is primarily based on NAFNet [7], a lightweight network. To satisfy parameter constraints and preserve performance, Samsung AI reduced NAFNet parameters and implemented a distillation strategy. X-Restormer [10] is adopted as the teacher model. Two student models (ERIRNet-S and ERIRNet-T) are trained through knowledge distillation from this shared teacher.
The method addresses coupled degradation processes (noise suppression and blur correction). They select the NafBlock architecture from NAFNet [7] as a pivotal component due to its SOTA efficacy in joint denoising and deblurring. To achieve parameter efficiency, they designed networks based on a narrow-and-deep architectural principle, reducing channel dimensions while preserving layer depth.
- ERIRNet-S (Figure 15): A simplified version of NAFNet with reduced channel width and fewer encoder-decoder blocks for improved efficiency.
- ERIRNet-T (Figure 16): Further reduces complexity by decreasing the number of blocks and using smaller FFN expansion ratios. It replaces PixelUnshuffle layers with ConvTranspose, enabling deeper architectures under a strict parameter budget.
Implementation Details:
- Training Process (Three stages - Figure 14):
  - Stage 1 - Train Base Model: Each ERIRNet variant (S and T) is trained independently using the original ground-truth supervision.
  - Stage 2 - Train Teacher Model: A Teacher Model based on X-Restormer is trained and fine-tuned for each mobile device.
  - Stage 3 - Distillation with Teacher Model: Knowledge distillation is applied. Models are initialized with weights from Stage 1, and the teacher's outputs are used as targets to guide both ERIRNet-S and ERIRNet-T (a sketch of this objective follows after Figure 14).
- Framework: PyTorch on an A100 GPU.
- Optimizer: Adam [35].
- Loss Function: loss.
- Learning Rate: Initial 1e-4 (Stage 1), reduced to 1e-5 (Stage 3).
- Scheduler: MultiStepLR scheduler for learning rate decay.
- Training Epochs: 1000 (Stage 1), 2000 (Stage 2), 1000 (Stage 3).
- Batch Size: 16.
The following figure (Figure 14 from the original paper) shows Training Stage Description by Samsung AI:
Figure 14: A schematic of the training stages. The upper part shows Stage 2 with the X-Restormer teacher model, the lower part Stage 1 with the ERIRNet base model, and the right side Stage 3 with the distillation loss.
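A sketch of the Stage-3 distillation objective (a generic output-distillation setup; the mixing weight between ground truth and teacher targets is an assumption, as the report does not specify it):

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, degraded, gt, alpha: float = 0.5):
    """Supervise the student with both ground truth and the frozen teacher's output."""
    with torch.no_grad():
        teacher_out = teacher(degraded)            # teacher from Stage 2, kept frozen
    student_out = student(degraded)
    loss_gt = F.l1_loss(student_out, gt)           # original supervision
    loss_kd = F.l1_loss(student_out, teacher_out)  # distillation target
    return (1 - alpha) * loss_gt + alpha * loss_kd
```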
The following figure (Figure 15 from the original paper) shows Architectures of ERIRNet-S, with reduced channels and fewer blocks. Proposal by Samsung AI:
Figure 15: A schematic of the ERIRNet-S architecture, built from NafBlocks and PixelShuffle modules with downsampling and upsampling convolutional layers.
The following figure (Figure 16 from the original paper) shows Architectures of ERIRNet-T, with ConvTranspose and reduced FFN expansion. Proposal by Samsung AI:
Figure 16: A schematic of the ERIRNet-T architecture, showing the NafBlock configurations and the down/upsampling path used for RAW image restoration. Convolutional and transposed-convolutional layers handle blur and noise, and the annotations indicate each layer's output size and channel count.
4.2.4.2. Modified SwinFIR-Tiny for Raw Image Restoration (Team Miers)
Method Description:
The method is an improvement of SwinFIR-Tiny [76]. It enhances the model's feature representation capability by aggregating the outputs of different RSTB (Residual Swin Transformer Block) modules. It also incorporates the HAB (Hybrid Attention Block) module from HAT [8] via zero convolution and applies reparameterization techniques [23] to keep the final model within the parameter budget.
The complete network architecture is systematically illustrated in Figure 17.
- Baseline: SwinFIR-Tiny [76].
- RSTBs: Four Residual Swin Transformer Blocks for hierarchical feature extraction. Each RSTB contains 5 or 6 Hybrid Attention Blocks (HABs) and 1 HSFB.
- HABs: Integrate multi-head self-attention and a locally constrained attention module to capture multi-scale contextual information.
- FeatureFusion Strategy: A hierarchical FeatureFusion strategy systematically aggregates outputs from each RSTB block to mitigate shallow feature degradation during deep feature extraction.
- Enhancements: Integrates the Channel Attention Block (CAB [8]) and the CovRep5 [23] module to improve noise robustness and blur resilience.
Implementation Details:
- Framework: PyTorch, modified from the SwinFIR project.
- Dataset: Divided into training (2,099 samples) and validation (40 samples).
- Data Degradation: BSRAW degradation pipeline [18], with enhanced noise levels:
  - log_max_shot_noise increased from -3 to -2.
  - One heteroscedastic-Gaussian-noise parameter range extended from (5e-3, 5e-2) to (5e-3, 1e-1).
  - Another range extended from (1e-3, 1e-2) to (1e-3, 5e-2).
- Hardware: Four H800 80G GPUs.
- Data Augmentation: mixup.
- Optimizer: Adam.
- Loss Function: Charbonnier loss.
- Development Stages (Four stages):
  - Baseline Training: Original SwinFIR-Tiny with the original data degradation. Initial learning rate 2e-4, batch size 8.
  - Module Addition: Added the Feature Fusion module, Channel Attention module, and ConvRep5 module. Initialized with weights from stage 1, with the original data degradation. Initial learning rate 2e-4, batch size 8.
  - CAB and Noise Intensity: Introduced the CAB module using zero convolution and increased the noise intensity. Initial learning rate 3e-5, batch size 8.
  - Fine-tuning: Further adjusted the initial learning rate to 2e-5, reduced batch size to 2, and increased the input size.
- Final Model: Utilized reparameterization to convert the ConvRep5 module into a standard convolution.

The following figure (Figure 17 from the original paper) shows the CABATTSwinFIR method proposed by Team Miers (Xiaomi Inc.):
Figure 17: A schematic of the overall CABATTSwinFIR architecture and its components, including the RSTB, HAB, and feature fusion blocks that process and enhance the RAW image.
4.2.4.3. Multi-PromptIR: Multi-scale Prompt-base Raw Image Restoration (Team WIRTeam)
Method Description:
Building on PromptIR [51] and Restormer [75], Multi-PromptIR transforms a degraded RAW image into a high-quality, clear image. The core is a four-layer encoder-decoder architecture with Transformer Blocks [75] for cross-channel global feature extraction.
- Encoding Stage: Incorporates images at reduced resolutions (1/2, 1/4, and 1/8 of the original size) to enrich the encoding process, drawing on the success of multi-resolution images [11].
- Decoding Phase: Adopts a specialized prompt mechanism [51], consisting of a Prompt Generation Module (PGM) and a Prompt Interaction Module (PIM). The model combines CNNs and Transformers, as shown in Figure 18.
Implementation Details:
- Optimizer: AdamW (initial learning rate reduced with cosine annealing).
- Hardware: NVIDIA A100 (80G).
- Datasets: Provided datasets only. PSF blur and noise are added to the original images for synthetic degradations.
- Training: End-to-end for 700 epochs.
- Data Augmentation: Horizontal and vertical flips.
- Inference: Splits input degraded images into patches, restores them, then merges the processed patches (a tiling sketch follows below).
- Training Time: ~12 hours.
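Patch-based inference like this is commonly implemented as below (a simplified non-overlapping tiling sketch; in practice overlapping tiles with blending are often used to hide seams):

```python
import torch

@torch.no_grad()
def tiled_restore(model, x: torch.Tensor, tile: int = 256) -> torch.Tensor:
    """x: (1, C, H, W) degraded RAW; restore tile-by-tile and stitch the result."""
    _, _, h, w = x.shape
    out = torch.zeros_like(x)
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            patch = x[:, :, top:top + tile, left:left + tile]
            out[:, :, top:top + tile, left:left + tile] = model(patch)
    return out
```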
The following figure (Figure 18 from the original paper) shows the overall architecture proposed by Team WIRTeam:
Figure 18: A schematic of the overall architecture proposed by WIRTeam. The input RAW image (left) is restored to a clean output (right) through a pipeline of downsampling and Transformer modules; the key components are the Prompt Interaction Module (PIM) and the Prompt Generation Module (PGM), which use information at different depths to progressively restore the image and handle blur and noise.
4.2.4.4. LMPR-Net: Lightweight Multi-Stage Progressive RAW Restoration (Team WIRTeam)
Method Description:
Considering the potential of multi-stage feature interaction, LMPR-Net is a lightweight model for RAW image restoration based on MPRNet [74]. The multi-stage model decomposes the RAW image restoration task into multiple subtasks to address various degradation information (noise, blurring, unknown degradations).
- Original Resolution Block (ORB): Composed of convolution and a channel attention mechanism for cross-channel key feature extraction.
- SAM (Stage-wise Attention Module): Efficiently refines incoming features at each stage.
- Lightness: Simplified components, with the hidden dimension set to 8.
- Depthwise Overparameterized Convolution [4]: Introduced to increase training speed and improve expressive power without significantly increasing computational cost.
- Parameter Count: Kept small by the lightweight design.
Implementation Details:
- Framework: PyTorch.
- Optimizer: AdamW (initial learning rate reduced with cosine annealing).
- Hardware: A single NVIDIA GeForce RTX 4090 (24G).
- Datasets: Provided datasets only. PSF blur and noise are added for synthetic degradations.
- Data Augmentation: Horizontal and vertical flips.
- Training: End-to-end for 600 epochs.
- Loss Function: Charbonnier loss [5], to avoid overly smooth restored images.
- Inference: Partitions input degraded images into patches, processes them, then seamlessly merges.
- Training Time: ~10 hours.
The following figure (Figure 19 from the original paper) shows the overall architecture of the LMPR-Net method proposed by Team WIRTeam:
Figure 19: A schematic of the overall architecture of the LMPR-Net method proposed by Team WIRTeam. The network restores degraded RAW images through multiple stages, using convolutions, attention modules, and a U-Net structure.
4.2.4.5. ER-NAFNet Raw Restoration (Team ER-NAFNet)
Method Description:
ER-NAFNet is a U-shaped framework for Efficient RAW image Restoration. Its architecture and compression mechanisms are built upon the NAFNet block proposed in NAFNet [6]. It enhances efficiency and performance.
The network is trained directly from 4-channel RGGB RAW data with a complex blurring and noise degradation pipeline from AISP [20].
Dataset Degradation Method: Inspired by [53], [61], [77].
Bayer pattern RAW imagesare cropped () to speed up training and learn real degradation.- Sequence of degradation operations:
multiple blurring,hardware specific added noise,random digital gainbylumaorexposure compensations. - Noise Handling: Dark frames captured using multiple
mobile sensorsto modelreal dark current noise. Highlight AreaPixels removed to preventhighlight color cast.
Model Framework (Figure 20):
- Shallow Feature Extraction: A convolutional filter extracts shallow feature encodings from the low-quality RAW image.
- Deep Feature Extraction: A classic U-shaped architecture with skip connections performs deep feature extraction.
- NAFNet Block: Plays a pivotal role in the U-shaped architecture, addressing the limitations of attention mechanisms. It integrates dilated convolutions to capture both fine-grained details and large-scale patterns, eliminating the reliance on complex attention mechanisms while delivering strong performance at reduced computational overhead.
- SimpleGate and Simple Channel Attention (SCA) modules: Incorporated to focus on relevant features and suppress irrelevant ones, enhancing feature representations (see the block sketch after this list).
- RAW Reconstruction: A convolution reconstructs the high-quality (HQ) image.
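For readers unfamiliar with the NAFNet block, the sketch below shows its two signature components, SimpleGate and Simplified Channel Attention (SCA), following NAFNet [6]. Channel counts are placeholders, and the team's exact block (including the dilated convolutions mentioned above) may differ.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    # Splits channels in half and multiplies them: a parameter-free
    # replacement for nonlinear activations used throughout NAFNet.
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class NAFBlockSketch(nn.Module):
    def __init__(self, c=16):
        super().__init__()
        self.norm = nn.GroupNorm(1, c)  # common stand-in for LayerNorm2d
        self.pw1 = nn.Conv2d(c, 2 * c, 1)
        self.dw = nn.Conv2d(2 * c, 2 * c, 3, padding=1, groups=2 * c)
        self.gate = SimpleGate()
        # Simplified Channel Attention: global pooling + one 1x1 conv
        self.sca = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1))
        self.pw2 = nn.Conv2d(c, c, 1)

    def forward(self, x):
        y = self.pw1(self.norm(x))
        y = self.gate(self.dw(y))   # 2c -> c channels after the gate
        y = y * self.sca(y)         # channel reweighting
        return x + self.pw2(y)      # residual connection
```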
Implementation Details:
- Dataset: NTIRE 2024 official challenge data [20]; the Development Phase submission set is used for validation.
- Framework: PyTorch.
- Model Parameters: Width 16; [2, 2, 4, 8] encoder blocks and [2, 2, 2, 2] decoder blocks per stage; middle block number set to 6.
- Data Augmentation: Simple horizontal or vertical flips, channel shifts, and mixup augmentations for stability, convergence, and performance (a Bayer-aware flip sketch follows Figure 20 below).
- Loss Function: pixel-wise reconstruction loss.
- Optimization: AdamW [46] optimizer (weight decay 0.00001) with a cosine annealing strategy.
- Learning Rate: decayed from its initial value to a lower bound over 300,000 iterations.
- Training Parameters: Batch size 12, patch size 512.
- Hardware: A100 GPUs.

The following figure (Figure 20 from the original paper) shows the overall network architecture proposed by Team ER-NAFNet:
This figure illustrates the overall network architecture proposed by Team ER-NAFNet. It contains multiple NAFNet blocks together with the corresponding layer normalization and convolution operations, showing the network's data flow and structural design.
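One subtlety behind the flip and channel-shift augmentations listed above: naively flipping packed RGGB data shifts the Bayer phase. The sketch below shows a phase-preserving flip; the channel ordering (R, G on the red row, G on the blue row, B) is an assumption about the packing convention, not a detail confirmed by the team.

```python
import torch

def flip_packed_rggb(x, horizontal=True):
    """Flip a packed RGGB tensor (N, 4, H, W) while preserving Bayer phase.

    Channel order assumed: 0=R, 1=G (red row), 2=G (blue row), 3=B.
    A plain spatial flip would swap column (or row) parity in the
    underlying mosaic, so the affected channels are swapped afterwards.
    """
    if horizontal:
        x = torch.flip(x, dims=[3])
        return x[:, [1, 0, 3, 2]]  # swap R<->G1 and G2<->B (column parity)
    x = torch.flip(x, dims=[2])
    return x[:, [2, 3, 0, 1]]      # swap R<->G2 and G1<->B (row parity)
```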
5. Experimental Setup
5.1. Datasets
The challenge utilized datasets designed to represent real-world RAW image characteristics and degradations, focusing on both Super-Resolution (RAWSR) and Restoration (RAWIR).
5.1.1. RAWSR Challenge Dataset
- Source: Based on BSRAW [18] and the NTIRE 2024 RAWSR Challenge [19]. Images come from the Adobe MIT5K dataset [3], captured by multiple Canon and Nikon DSLR cameras.
- Characteristics:
  - Manually filtered for diversity and natural properties, removing extremely dark or overexposed images.
  - Only in-focus, sharp images with low ISO are considered.
- Pre-processing:
  - All RAW images are normalized based on their black level and bit depth.
  - Images are converted ("packed") into the well-known RGGB Bayer pattern (4 channels), which allows transformations and degradations to be applied without damaging the original color-pattern information (see the packing sketch after Figure 1 below).
- Scale:
  - Training: 1064 clean high-resolution (HR) RAW images.
  - Low-Resolution (LR) Degradation Generation: LR degraded images are generated on-line during training using the degradation pipeline proposed in BSRAW [18], which includes different noise profiles, multiple blur kernels (PSFs), and a simple downsampling strategy. Participants can expand this.
- Testing:
  - Validation: 40 images of 1024px resolution (used during model development).
  - Test 1MP: 200 images at roughly 1MP resolution.
  - Full-resolution Test: the same 200 test images at full resolution.
  - Participants process the corresponding LR RAW images without access to the ground truth.

The following figure (Figure 1 from the original paper) shows samples of the NTIRE 2025 RAW Image Super-Resolution Challenge testing set:
This figure shows samples from the NTIRE 2025 RAW Image Super-Resolution Challenge testing set, comparing high-resolution (HR) ground truth against low-resolution (LR) inputs. The upper half shows a flower-bed scene and the lower half an urban storefront, both highlighting the detail differences that restoration must recover.
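The "packing" pre-processing used by both challenge datasets maps each 2×2 Bayer cell to one 4-channel pixel. Here is a minimal sketch, assuming the mosaic's top-left pixel is red; the 10-bit white level and the black level of 64 are placeholder values, since each camera has its own calibration.

```python
import numpy as np

def pack_rggb(bayer, black_level=64, white_level=2**10 - 1):
    """Normalize and pack a Bayer mosaic (H, W) into (H/2, W/2, 4).

    Normalization by black level and bit depth mirrors the dataset
    pre-processing; the RGGB phase (top-left = R) is an assumption.
    """
    norm = (bayer.astype(np.float32) - black_level) / (white_level - black_level)
    norm = np.clip(norm, 0.0, 1.0)
    return np.stack(
        [norm[0::2, 0::2],   # R
         norm[0::2, 1::2],   # G on red rows
         norm[1::2, 0::2],   # G on blue rows
         norm[1::2, 1::2]],  # B
        axis=-1,
    )
```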
5.1.2. RAWIR Challenge Dataset
- Source: Uses images from diverse sensors, including:
  - Samsung Galaxy S9 and iPhone X (from RAW2RAW [1]).
  - Additional images collected with Google Pixel 10, Vivo X90, and Samsung S21 to enrich diversity.
- Characteristics:
  - Covers various scenes (indoor and outdoor), lighting conditions (day and night), and subjects.
  - Manually filtered to ensure high quality, sharpness, and clear details (captured in focus, at low ISO, with proper exposure).
  - Original RAW files are saved in DNG format, unprocessed by the smartphones' ISPs.
- Pre-processing:
  - All images are normalized based on the camera black level and bit depth (e.g., 10, 12, or 14 bits per pixel).
  - Images are converted to the RGGB Bayer pattern (4 channels).
  - Images are cropped into non-overlapping (packed) patches.
- Scale:
  - Training: 2139 clean patches are provided for training models.
  - Degradation Generation: Participants use the baseline degradation pipeline [18] to simulate realistic degradations, or develop their own pipelines; the core components include different noise profiles and multiple blur kernels (PSFs).
- Testing: A synthetic test dataset is generated by applying a degradation pipeline at three levels (a level-protocol sketch follows Figure 13 below):
  - Test Level 1: only sampled noise from real noise profiles.
  - Test Level 2: noise and/or blur, with 0.3 probability of blur and 0.5 of real noise.
  - Test Level 3: all images have realistic blur and noise.
The following figure (Figure 13 from the original paper) shows RAW image samples from RawIR Dataset:
This figure shows several RAW image samples from the RawIR dataset, each capturing details in a different environment (signs, plants, streets), illustrating the diversity of scenes covered by the restoration and super-resolution challenge.
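The branching logic of the three test levels described above can be summarized in a few lines. In the sketch below, Gaussian blur and Gaussian noise stand in for the challenge's PSFs and calibrated noise profiles, so it illustrates only the level protocol, not the actual degradations.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_test_sample(clean, level, rng=None):
    """Degrade one clean packed-RAW patch (H, W, 4) per the stated protocol.

    Level 1: noise only; Level 2: blur w.p. 0.3 and noise w.p. 0.5;
    Level 3: always blur and noise. Blur/noise models are placeholders.
    """
    rng = rng or np.random.default_rng()
    blur = lambda x: gaussian_filter(x, sigma=(1.0, 1.0, 0))       # spatial blur only
    noise = lambda x: np.clip(x + rng.normal(0, 0.01, x.shape), 0, 1)
    if level == 1:
        return noise(clean)
    if level == 2:
        out = blur(clean) if rng.random() < 0.3 else clean
        return noise(out) if rng.random() < 0.5 else out
    return noise(blur(clean))                                       # level 3
```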
5.2. Evaluation Metrics
The primary quantitative evaluation metrics used in the challenge are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). All fidelity metrics are calculated in the RAW domain.
- Peak Signal-to-Noise Ratio (PSNR):
  - Conceptual Definition: PSNR is a common objective metric used to quantify the quality of image reconstruction. It compares the maximum possible power of a signal to the power of the distorting noise, and is expressed in decibels (dB), with higher values indicating better quality (less distortion or noise). PSNR is widely used in image compression and restoration to assess the fidelity of a processed image to its ground-truth version.
  - Mathematical Formula: $ \mathrm{PSNR} = 10 \cdot \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right) $
  - Symbol Explanation:
    - $\mathrm{MAX}_I$: the maximum possible pixel value of the image. For RAW images normalized to [0, 1], this value is 1.
    - $\mathrm{MSE}$: the Mean Squared Error, i.e., the average squared intensity difference between the reference and processed images: $ \mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} [I(i,j) - K(i,j)]^2 $ where $I$ is the ground-truth image, $K$ is the restored or super-resolved image, $m$ and $n$ are the image height and width, and $I(i,j)$ and $K(i,j)$ are the pixel values at coordinates $(i,j)$.
- Structural Similarity Index Measure (SSIM):
  - Conceptual Definition: SSIM is a perceptual metric that measures the similarity between two images. Unlike PSNR, which measures absolute error, SSIM quantifies perceived quality through three independent comparisons: luminance, contrast, and structure. A value of 1 indicates perfect structural similarity, while values closer to 0 indicate less similarity. It often correlates better with human visual perception than PSNR.
  - Mathematical Formula: $ \mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} $
  - Symbol Explanation:
    - $x$: the reference (ground-truth) image patch.
    - $y$: the comparison (processed) image patch.
    - $\mu_x$, $\mu_y$: the means of patches $x$ and $y$.
    - $\sigma_x^2$, $\sigma_y^2$: the variances of patches $x$ and $y$.
    - $\sigma_{xy}$: the covariance of patches $x$ and $y$.
    - $c_1 = (k_1 L)^2$: a small constant preventing division by zero in the luminance term, where $k_1$ is a small scalar (e.g., 0.01) and $L$ is the dynamic range of pixel values (e.g., 1 for normalized RAW images).
    - $c_2 = (k_2 L)^2$: a small constant preventing division by zero in the contrast term, where $k_2$ is a small scalar (e.g., 0.03).
In addition to these fidelity metrics, the challenge also reports implementation details such as the number of parameters (in millions, Par.), MACs (Multiply-Accumulate Operations, in Giga) or FLOPs (Floating-Point Operations, in Giga), and inference time (in ms), especially for the Efficient tracks.
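For concreteness, here is how these metrics are typically computed on packed RAW arrays. The PSNR function follows the formula above directly; for SSIM, a library implementation such as scikit-image's is the usual choice, with the four packed planes treated as channels. The helper names are illustrative, not part of the challenge toolkit.

```python
import numpy as np
from skimage.metrics import structural_similarity

def raw_psnr(gt, pred, max_val=1.0):
    # PSNR = 10 * log10(MAX^2 / MSE), computed directly in the RAW domain
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def raw_ssim(gt, pred):
    # Treat the 4 packed RGGB planes as channels; data_range=1.0 assumes
    # RAW values normalized to [0, 1]
    return structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
```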
5.3. Baselines
For both challenge tracks, several existing and top-performing methods from previous years are used as references or baselines to compare against the new solutions.
5.3.1. RAWSR Challenge Baselines
- RBSFormer [33] (Enhanced Transformer Network for Raw Image Super-Resolution): A top-performing method from the NTIRE 2024 RAWSR Challenge, with 3.3 million parameters, achieving a PSNR of 43.649 and SSIM of 0.987. It serves as a strong Transformer-based baseline.
- BSRAW [18] (Improving Blind Raw Image Super-Resolution): Another strong baseline from NTIRE 2024, with 1.5 million parameters, achieving a PSNR of 42.853 and SSIM of 0.986. This method also provides the degradation pipeline used in the 2025 challenge.
- Bicubic [19]: A standard interpolation method used as a basic, non-deep-learning reference point. It achieves a PSNR of 36.038 and SSIM of 0.952, highlighting the significant improvements offered by deep learning models.
5.3.2. RAWIR Challenge Baselines
- PMRID [65] (Practical Deep Raw Image Denoising on Mobile Devices): A main baseline model, known for efficient image denoising on mobile devices. It achieves PSNR values of 42.41, 38.43, and 35.97 dB for Test Levels 1, 2, and 3, with 1.032 million parameters and 1.21 GMACs.
- MOFA [9] (A Model Simplification Roadmap for Image Restoration on Mobile Devices): Another main baseline, focusing on model simplification for image restoration on mobile devices. It achieves PSNR values of 42.54, 38.71, and 36.33 dB for Test Levels 1, 2, and 3, with 0.971 million parameters and 1.14 GMACs.
- NAFNet [7] (Simple Baselines for Image Restoration): Included as a representative U-Net-like model and a popular efficient image denoising method. It achieves PSNR values of 43.50, 39.70, and 37.49 dB for Test Levels 1, 2, and 3, with 1.130 million parameters and 3.99 GMACs.
- RawIR [20] (Toward Efficient Deep Blind Raw Image Restoration): Another baseline, achieving PSNR values of 44.20, 40.30, and 38.30 dB for Test Levels 1, 2, and 3, with 1.5 million parameters and 12.3 GMACs.

These baselines provide context for evaluating the performance and efficiency of the new methods submitted to the NTIRE 2025 challenge, demonstrating how the top-performing solutions improve upon the previous state-of-the-art.
6. Results & Analysis
6.1. Core Results Analysis
The NTIRE 2025 Challenge presented two tracks: RAW Image Super-Resolution (RAWSR) and RAW Image Restoration (RAWIR). Both tracks included an Efficient category with tight parameter constraints and a General category without such limitations. The results demonstrate significant advancements in RAW image processing, with various deep learning architectures achieving high fidelity and, in some cases, remarkable efficiency.
6.1.1. RAWSR Challenge Results
The results for the RAWSR Challenge are presented in Table 1, showing PSNR/SSIM on the complete testing set (200 images), parameters, and a comparison with NTIRE 2024 baselines.
The following are the results from Table 1 of the original paper:
| Method | Track | PSNR (dB) | SSIM | # Par. (M) |
| SMFFRaw-S 4.3 | Efficient | 42.12 | 0.9433 | 0.18 |
| RawRTSR 4.1 | Efficient | 41.74 | 0.9417 | 0.19 |
| NAFBN 4.5 | Efficient | 40.67 | 0.9347 | 0.19 |
| MambaIRv2 4.6 | Efficient | 40.32 | 0.9396 | 0.19 |
| RepRAW-SR-Tiny 4.7 | Efficient | 40.01 | 0.9297 | 0.02 |
| RepRAW-SR-Large 4.7 | Efficient | 40.56 | 0.9339 | 0.09 |
| ECAN 4.8 | Efficient | 39.13 | 0.9057 | 0.09 |
| USTC 4.2 | General | 42.70 | 0.9479 | 1.94 |
| SMFFRaw-S 4.3 | General | 42.60 | 0.9467 | 1.99 |
| RawRTSR-L 4.1 | General | 42.58 | 0.9475 | 0.26 |
| ERBSFormer 4.4 | General | 42.45 | 0.9448 | 3.30 |
| ER-NAFNet 5.5 | General | 41.17 | 0.9348 | - |
| RBSFormer [33] | 2024 | 43.649 | 0.987 | 3.3 |
| BSRAW [18] | 2024 | 42.853 | 0.986 | 1.5 |
| Bicubic [19] | 2024 | 36.038 | 0.952 | - |
Analysis:
- Efficient Track:
  - SMFFRaw-S (Team XJTU) achieved the highest PSNR (42.12 dB) and SSIM (0.9433) with 0.18M parameters, demonstrating excellent performance under strict efficiency constraints.
  - RawRTSR (Team Samsung AI) also performed very well, close behind SMFFRaw-S, with 41.74 dB PSNR and 0.19M parameters.
  - All efficient methods stay below the 0.2M parameter limit, showcasing the success in developing lightweight RAWSR models.
- General Track:
  - USTC (Team USTC-VIDAR) led this track with 42.70 dB PSNR and 0.9479 SSIM using 1.94M parameters, indicating that their streamlined Transformer network (an RBSFormer variant) is highly effective.
  - SMFFRaw-S and RawRTSR-L also showed strong performance in the general track, maintaining their competitive edge even with slightly higher parameter counts. Notably, RawRTSR-L achieved 42.58 dB PSNR with a relatively low 0.26M parameters, suggesting strong efficiency even in the general track.
- Comparison to NTIRE 2024 Baselines:
  - The 2024 baselines (RBSFormer [33] and BSRAW [18]) achieved higher PSNR (43.649 dB and 42.853 dB) and SSIM (0.987 and 0.986). This suggests that while the 2025 solutions are highly competitive, especially in efficiency, the absolute fidelity of the top 2024 methods still sets a high bar. Note that evaluation datasets or exact degradation models may vary slightly between years, complicating direct comparison.
  - All deep learning methods drastically outperform the Bicubic baseline (36.038 dB PSNR), highlighting the substantial gains from neural network approaches.

The report concludes that the methods greatly improve RAW image quality and resolution, even for 12MP outputs, without detectable color artifacts. It notes that (synthetic) RAW image super-resolution can be solved similarly to RAW denoising, but more realistic downsampling remains an open challenge.
6.1.2. RAWIR Challenge Results
The results for the RAWIR Challenge are presented in Table 2, showing SSIM/PSNR results for three degradation levels, along with parameters and MACs.
The following are the results from Table 2 of the original paper:
| Method | Type | Level 1 (SSIM / PSNR) | Level 2 (SSIM / PSNR) | Level 3 (SSIM / PSNR) | # Params. (M) | # MACs (G) |
| Test Images (degraded input) | - | 0.953 / 39.56 | 0.931 / 35.30 | 0.907 / 33.03 | - | - |
| PMRID [65] | Baseline | 0.982 / 42.41 | 0.965 / 38.43 | 0.951 / 35.97 | 1.032 | 1.21 |
| NAFNet [7] | Baseline | 0.983 / 43.50 | 0.972 / 39.70 | 0.962 / 37.49 | 1.130 | 3.99 |
| MOFA [9] | Baseline | 0.982 / 42.54 | 0.966 / 38.71 | 0.974 / 36.33 | 0.971 | 1.14 |
| RawIR [20] | Baseline | 0.984 / 44.20 | 0.978 / 40.30 | 0.974 / 38.30 | 1.5 | 12.3 |
| Samsung AI 5.1 | Efficient | 0.991 / 45.10 | 0.980 / 40.82 | 0.971 / 38.46 | 0.19 | 10.98 |
| LMPR-Net 5.4 | Efficient | 0.989 / 42.57 | 0.973 / 39.17 | 0.961 / 37.26 | 0.19 | 2.63 |
| Samsung AI 5.1 | General | 0.993 / 46.04 | 0.985 / 42.25 | 0.978 / 40.10 | 4.97 | 23.79 |
| Miers 5.2 | General | 0.993 / 45.72 | 0.983 / 41.73 | 0.974 / 39.50 | 4.76 | - |
| Multi-PromptIR 5.3 | General | 0.986 / 44.80 | 0.978 / 41.38 | 0.968 / 38.96 | 39.92 | 158.24 |
| ER-NAFNet 5.5 | General | 0.992 / 45.10 | 0.972 / 39.32 | 0.953 / 36.13 | 4.57 | - |
Analysis:
- Test Levels:
  - Level 1 (noise only): Easiest degradation. Most solutions perform well, with top entries achieving PSNR over 45 dB. Samsung AI (General) leads with 46.04 dB PSNR.
  - Level 2 (noise and/or blur): Intermediate degradation. Samsung AI (General) again leads with 42.25 dB PSNR; Miers follows closely with 41.73 dB.
  - Level 3 (realistic blur and noise): Most challenging degradation. Samsung AI (General) achieves 40.10 dB PSNR. This level shows the greatest performance gap between methods, as combining denoising and deblurring is complex.
- Efficient Track:
  - Samsung AI (Efficient) wins this track convincingly, with 45.10, 40.82, and 38.46 dB PSNR across the three levels, all with only 0.19M parameters. This highlights their successful distillation strategy and NAFNet-based architectural modifications (ERIRNet-T).
  - LMPR-Net (Team WIRTeam) is the second best in the efficient track, with 0.19M parameters and 2.63 GMACs, showing good PSNR values (42.57, 39.17, and 37.26 dB).
- General Track:
  - Samsung AI (General) is the overall winner for RAWIR, achieving the best PSNR and SSIM scores across all degradation levels. Their ERIRNet-S model, with 4.97M parameters, delivers state-of-the-art performance.
  - Miers (Team Xiaomi Inc.), with their Modified SwinFIR-Tiny, follows closely, especially at the more challenging levels (41.73 dB for Level 2, 39.50 dB for Level 3).
  - Multi-PromptIR (Team WIRTeam) has a significantly higher parameter count (39.92M) and MACs (158.24 G) but achieves slightly lower PSNR than Samsung AI and Miers. This indicates that larger models do not always guarantee the best performance, or that their architectural choices are less optimized for this challenge's degradations despite the added complexity.
- Comparison to Baselines: All top challenge solutions notably improve upon the provided baselines (PMRID, NAFNet, MOFA, RawIR), demonstrating progress in RAW image restoration. For instance, Samsung AI (General) achieves 46.04 dB PSNR at Level 1, significantly higher than RawIR's 44.20 dB.

Qualitative results (Figure 21) show that most solutions perform excellently for Level 1 (noise removal). For Levels 2 and 3 (noise + blur), Samsung AI and Miers show the best results, while others such as WIRTeam and ChickentRun (implied ER-NAFNet) show less effective blur removal. The report notes that models struggle to tackle denoising and deblurring simultaneously, especially blur, as these may require opposing operations.
6.2. Data Presentation (Tables)
Tables 1 and 2, reproduced in full in Section 6.1 above, contain the complete quantitative results from the original paper.
6.3. Ablation Studies / Parameter Analysis
Some teams provided details on their architectural choices and efficiency optimizations, which can be seen as implicit ablation or parameter analyses.
- Team EiffLowCVer (RepRawSR): They explicitly conducted an ablation study (Table 7 in the paper) evaluating their model variants (RepTiny-21k, RepLarge-97k) against a NAFNet-1.9M baseline:
  - NAFNet-1.9M (baseline): 9.68 GFLOPs, 40.21 dB PSNR (self-val), 41.70 dB PSNR (val).
  - RepLarge-97k: 24.42 GFLOPs, 39.22 dB PSNR (self-val), 40.80 dB PSNR (val).
  - RepTiny-21k: 5.65 GFLOPs, 39.00 dB PSNR (self-val).
  This study shows the trade-off between FLOPs/parameters and PSNR. While RepLarge-97k and RepTiny-21k are far more parameter-efficient, they achieve slightly lower PSNR than the NAFNet baseline, particularly on the validation set; RepTiny-21k offers extremely low FLOPs and parameter count at a further PSNR cost. Their analysis also noted that increasing the number of feature extraction modules with skip connections in RepTiny-21k was key to balancing performance and speed. The reparameterization idea behind these variants is sketched below.
- Team Samsung AI (RawRTSR & ERIRNet): Their work implicitly details a parameter analysis by offering Efficient (e.g., RawRTSR, ERIRNet-T) and General (e.g., RawRTSR-L, ERIRNet-S) variants.
  - For RawRTSR, increasing feature channels from 48 to 64 and adding channel attention (in RawRTSR-L) raised PSNR from 41.74 dB to 42.58 dB with only a small increase in parameters (0.19M to 0.26M).
  - For ERIRNet, the Efficient version (ERIRNet-T, 0.19M params) still achieved very strong performance (e.g., 45.10 dB PSNR at Level 1), while the General version (ERIRNet-S, 4.97M params) pushed PSNR even higher (46.04 dB at Level 1). This clearly demonstrates how architectural choices (channel width, block count, ConvTranspose vs. PixelUnshuffle) directly shape the performance-efficiency trade-off.
  - Their knowledge distillation strategy, using X-Restormer as a teacher, is a key factor in how the smaller student models (like ERIRNet-T) achieve high performance despite their limited parameter budgets; a generic sketch of this training scheme follows.
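The sketch below shows generic output-space knowledge distillation of the kind described: a frozen teacher's restorations supervise the student alongside the ground truth. The L1 losses and the mixing weight alpha are assumptions for illustration, not the team's reported recipe.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, degraded, clean, alpha=0.5):
    """One output-distillation step: the student fits both the clean
    target and the frozen teacher's restoration. alpha is a placeholder
    mixing weight, not a value reported by the team."""
    with torch.no_grad():
        teacher_out = teacher(degraded)   # teacher stays frozen
    student_out = student(degraded)
    loss = (1 - alpha) * F.l1_loss(student_out, clean) \
           + alpha * F.l1_loss(student_out, teacher_out)
    return loss
```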
- Team Miers (Modified SwinFIR-Tiny): Their multi-stage training process, gradually adding modules such as Feature Fusion, Channel Attention, ConvRep5, and CAB via zero convolution, effectively acts as an ablation study of their architectural enhancements. Adjusting noise intensity and patch sizes at different stages also highlights their parameter-tuning strategy.

These details, while not always presented as formal ablation tables, provide crucial insight into how different teams designed, optimized, and fine-tuned their models for specific performance and efficiency targets within the challenge.
7. Conclusion & Reflections
7.1. Conclusion Summary
The NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution successfully pushed the boundaries of RAW image processing in two critical areas: restoration (denoising and deblurring) and super-resolution. The challenge highlighted the growing importance of processing RAW data directly within modern Image Signal Processing (ISP) pipelines, mitigating the irreversible information loss associated with sRGB processing.
Key contributions include the establishment of a robust challenge framework, comprehensive datasets based on real-world RAW images and sophisticated degradation pipelines, and a thorough benchmarking of diverse state-of-the-art solutions. A total of 45 teams submitted results, showcasing innovative approaches primarily leveraging Convolutional Neural Networks (CNNs) and Transformers.
The challenge demonstrated that deep learning methods can significantly improve RAW image quality and resolution, outperforming traditional baselines by a large margin. Notably, Team Samsung AI Camera emerged as a strong performer, winning both the general and efficient tracks in the RAWIR challenge and demonstrating leading performance in RAWSR. Their success was often attributed to efficient NAFNet-based architectures and effective knowledge distillation strategies. The efficient tracks, with their strict parameter budgets (e.g., 200K parameters), successfully spurred the development of lightweight yet high-performing models suitable for mobile devices.
7.2. Limitations & Future Work
The paper and challenge results implicitly and explicitly point to several limitations and suggest future research directions:
- Realistic Downsampling: The report explicitly states that more realistic downsampling remains an open challenge for RAWSR. While synthetic degradation pipelines are used, creating low-resolution RAW images that faithfully mimic real-world acquisition physics is complex and crucial for real-world applicability.
- Joint Denoising and Deblurring: For RAWIR, the paper observes that models struggle to tackle denoising and deblurring simultaneously, especially blur, as these may require opposing operations. This indicates a need for more robust, specialized architectures or training strategies that can disentangle and address these coupled degradations more effectively.
- Generalization to Diverse Real-World Conditions: While the datasets include diverse camera sensors and scenes, real-world conditions present an even wider array of unknown noise profiles, blur types, and lighting scenarios. Further research is needed to ensure models generalize robustly to entirely unseen degradation patterns.
- Computational Efficiency vs. Absolute Performance: Although efficient models made significant strides, a gap in absolute performance often remains compared to larger, less constrained models (e.g., comparing 2025 efficient RAWSR methods to 2024 general-track baselines). Future work could focus on techniques that close this gap without sacrificing efficiency.
- Specific ISP Integration: While operating on RAW is beneficial, the ultimate goal is to integrate these improved RAW outputs back into a full ISP pipeline. Research on how restored RAW images interact with subsequent ISP stages (like demosaicing and tone mapping), and on optimizing or replacing parts of the entire pipeline, could be a valuable direction.
7.3. Personal Insights & Critique
This paper and the NTIRE 2025 Challenge represent a vital step forward in computational photography. My personal insights and critiques are:
- Value of RAW Domain Processing: The challenge strongly reinforces the critical advantage of processing RAW images over sRGB. By operating on linear sensor data, models have access to richer information, leading to better restoration outcomes. This paradigm shift is essential for pushing image quality limits, especially on mobile devices.
- Importance of Efficiency: The efficient track is a brilliant addition. In real-world scenarios, particularly on smartphones, computational budgets are extremely tight. Forcing participants to innovate within these constraints drives practical solutions and fosters research into lightweight architectures, knowledge distillation, and reparameterization, all directly transferable to commercial products.
- Synthetic Degradation Gap: While the degradation pipelines are sophisticated, a fundamental challenge with synthetic degradations is the "reality gap." Models trained solely on synthetic data may not perform optimally on real-world degraded images due to unmodeled complexities. Encouraging participants to develop their own degradation pipelines is a good step, but fully learning from real-world degraded-clean RAW pairs (which are hard to acquire) remains the ultimate goal. The observation that more realistic downsampling is an open challenge highlights this issue specifically for SR.
- Complexity of Joint Restoration: The difficulty of effectively tackling both denoising and deblurring simultaneously is a recurring theme in image restoration. These two tasks can have conflicting objectives (e.g., aggressive denoising may smooth out fine details that are then hard to deblur). This suggests that multi-task learning for these degradations might benefit from more sophisticated disentanglement strategies or adaptive modules that prioritize based on estimated degradation levels.
- Impact on ISP Pipelines: The methods and findings from this challenge have direct implications for the future of smartphone ISP pipelines. Instead of fixed, hand-tuned ISP algorithms, we may see more dynamic, AI-driven modules operating on RAW data to deliver real-time, high-quality image enhancement. The "replace the mobile camera ISP with a single deep learning model" vision mentioned in the introduction is becoming increasingly tangible.
- Future Transferability: Many of the architectural principles and optimization techniques (e.g., NafBlocks, Transformer block variants, reparameterization, knowledge distillation) developed for this RAW-domain challenge are highly transferable. They could be adapted to other low-level vision tasks or to RAW processing in other imaging modalities (e.g., medical imaging, scientific cameras).

In conclusion, the NTIRE 2025 Challenge successfully catalyzes innovation in RAW image processing, providing a robust benchmark and highlighting crucial areas for future research at the intersection of image quality, computational efficiency, and real-world applicability.