Underwater images suffer from severe degradations, including color distortions, reduced visibility, and loss of structural details caused by wavelength-dependent attenuation and scattering. Existing enhancement methods focus primarily on spatial-domain processing, neglecting the frequency domain’s potential to capture global color distributions and long-range dependencies. To address these limitations, we propose FUSION, a dual-domain deep learning framework that jointly leverages spatial and frequency domain information. FUSION processes each RGB channel independently through multi-scale convolutional kernels and adaptive attention mechanisms in the spatial domain, while simultaneously extracting global structural information via FFT-based frequency attention. A Frequency Guided Fusion module integrates complementary features from both domains, followed by inter-channel fusion and adaptive channel recalibration to ensure balanced color distributions. Extensive experiments on benchmark datasets (UIEB, EUVP, SUIM-E) demonstrate that FUSION achieves state-of-the-art performance, consistently outperforming existing methods in reconstruction fidelity (highest PSNR of 23.717 dB and SSIM of 0.883 on UIEB), perceptual quality (lowest LPIPS of 0.112 on UIEB), and visual enhancement metrics (best UIQM of 3.414 on UIEB), while requiring significantly fewer parameters (0.28 M) and lower computational complexity, making it well suited for real-time underwater imaging applications.
The FUSION framework enhances underwater images by processing them in both spatial and frequency domains. Starting with an input image of size H×W×3, we split it into its three color channels (D_R, D_G, D_B). In the spatial path, each channel undergoes multi-scale convolutions (3×3 for red, 5×5 for green, and 7×7 for blue) followed by channel and spatial attention (CBAM) and a residual connection to preserve fine details. Concurrently, in the frequency path, each channel is transformed via a 2D FFT to extract its magnitude, which is refined through two 1×1 convolutions and a Frequency Attention mechanism that produces weighted magnitude maps. The refined magnitude is recombined with original phase information and passed through an IFFT to yield frequency-domain features.
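The following is a minimal PyTorch sketch of the per-channel frequency path described above, assuming a sigmoid-gated attention map and illustrative channel widths (the spatial path with multi-scale convolutions and CBAM is omitted for brevity); it is not the reference implementation.

import torch
import torch.nn as nn


class FrequencyBranch(nn.Module):
    """Sketch of the frequency path: FFT -> magnitude refinement -> phase recombination -> IFFT."""

    def __init__(self, feat: int = 16):  # hidden width is an assumption
        super().__init__()
        # Two 1x1 convolutions that refine the magnitude spectrum.
        self.refine = nn.Sequential(
            nn.Conv2d(1, feat, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, kernel_size=1),
        )
        # Frequency attention: a weight map applied to the refined magnitude (assumed sigmoid gate).
        self.attn = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: a single color channel, shape (B, 1, H, W)
        spec = torch.fft.fft2(x, norm="ortho")            # 2D FFT
        mag, phase = spec.abs(), spec.angle()             # split into magnitude and phase
        mag = self.refine(mag) * self.attn(mag)           # refined, attention-weighted magnitude
        spec = torch.polar(mag, phase)                    # recombine with the original phase
        return torch.fft.ifft2(spec, norm="ortho").real   # back to spatial-domain features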
For each channel, spatial and frequency features are concatenated and fused via a small convolutional block (Frequency Guided Fusion), followed by adding back the original input channel in a residual fashion. The three fused channels are then concatenated to form a joint representation, which is projected to a higher-dimensional space and further combined with aggregated frequency features through a learned transform. A global CBAM module refines this fused representation, and a decoder reconstructs a preliminary enhanced RGB image. Finally, an Adaptive Channel Calibration step computes per-channel scaling factors from global image statistics and applies them to balance color distributions, yielding the final enhanced output.
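Below is a minimal sketch of the Adaptive Channel Calibration step, assuming the global image statistics are channel-wise means and that a small gating network produces the per-channel scaling factors; the layer sizes and sigmoid gating are illustrative assumptions rather than the exact design.

import torch
import torch.nn as nn


class AdaptiveChannelCalibration(nn.Module):
    """Sketch: compute per-channel scaling factors from global statistics and rebalance colors."""

    def __init__(self, channels: int = 3, hidden: int = 8):  # hidden size is an assumption
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: preliminary enhanced image, shape (B, 3, H, W)
        stats = x.mean(dim=(2, 3))                              # global per-channel statistics
        scale = self.gate(stats).unsqueeze(-1).unsqueeze(-1)    # per-channel scaling factors
        return x * scale                                        # balanced color distribution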
Visual comparison of FUSION against other state-of-the-art methods on the UIEB test set; note the restored natural colors and enhanced details.
FUSION effectively recovers contrast and corrects color casts compared to competing approaches on EUVP.
Visual examples showing the impact of removing key modules: frequency attention, frequency branch, fusion, channel calibration, local attention, and global attention.
FUSION achieves a favorable balance between parameter count (0.28 M) and computational cost (36.73 GFLOPs), outperforming larger models in both quality and efficiency.
@InProceedings{FUSION,
  author    = {Jaskaran Singh Walia and Shravan Venkatraman and Pavithra LK},
  title     = {FUSION: Frequency-guided Underwater Spatial Image recOnstructioN},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2025}
}