
VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis

Published: 12/24/2024
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

VisionGRU introduces a novel linear-complexity RNN architecture using a simplified minGRU and hierarchical bidirectional scanning to efficiently analyze high-resolution images. This method achieves superior performance compared to ViTs while significantly reducing computational and memory costs, especially for high-resolution images.

Abstract

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are two dominant models for image analysis. While CNNs excel at extracting multi-scale features and ViTs effectively capture global dependencies, both suffer from high computational costs, particularly when processing high-resolution images. Recently, state-space models (SSMs) and recurrent neural networks (RNNs) have attracted attention due to their efficiency. However, their performance in image classification tasks remains limited. To address these challenges, this paper introduces VisionGRU, a novel RNN-based architecture designed for efficient image classification. VisionGRU leverages a simplified Gated Recurrent Unit (minGRU) to process large-scale image features with linear complexity. It divides images into smaller patches and progressively reduces the sequence length while increasing the channel depth, thus facilitating multi-scale feature extraction. A hierarchical 2DGRU module with bidirectional scanning captures both local and global contexts, improving long-range dependency modeling, particularly for tasks like semantic segmentation. Experimental results on the ImageNet and ADE20K datasets demonstrate that VisionGRU outperforms ViTs, significantly reducing memory usage and computational costs, especially for high-resolution images. These findings underscore the potential of RNN-based approaches for developing efficient and scalable computer vision solutions. Codes will be available at https://github.com/YangLiu9208/VisionGRU.
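
To make the linear-complexity recurrence concrete, the following is a minimal PyTorch sketch of a minGRU-style layer, in which the update gate and candidate state depend only on the current token rather than on the previous hidden state. The class name, layer sizes, and the explicit loop are illustrative assumptions rather than the paper's implementation (the released code at the GitHub link above is authoritative); in practice the same recurrence can be evaluated with a parallel scan.

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Sketch of a simplified GRU (minGRU-style) layer.

    The gate z_t and candidate state h~_t are computed from the current
    input x_t only, so the recurrence
        h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
    costs O(L) in the sequence length L and admits a parallel scan.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_gate = nn.Linear(dim, dim)    # update gate z_t
        self.to_hidden = nn.Linear(dim, dim)  # candidate state h~_t

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), e.g. a flattened sequence of patch tokens
        b, seq_len, dim = x.shape
        z = torch.sigmoid(self.to_gate(x))
        h_tilde = self.to_hidden(x)
        h = torch.zeros(b, dim, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(seq_len):  # written sequentially for clarity
            h = (1 - z[:, t]) * h + z[:, t] * h_tilde[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)    # (batch, seq_len, dim)
```

Running `MinGRU(96)(torch.randn(2, 196, 96))` returns a tensor of shape `(2, 196, 96)`; doubling the number of tokens roughly doubles the work, which is the linear scaling the abstract refers to.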


In-depth Reading

English Analysis

1. Bibliographic Information

  • Title: VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis
  • Authors: Shicheng Yin, Kaixuan Yin, Weixing Chen, Enbo Huang, Yang Liu
    • Affiliations: Sun Yat-sen University, Nanning Normal University. The authors are primarily from academic institutions, suggesting a research-focused contribution.
  • Journal/Conference: The paper is available on arXiv, which is a preprint server. This means it has not yet undergone formal peer review for a conference or journal but is shared to disseminate findings quickly.
  • Publication Year: 2024 (based on the arXiv ID).
  • Abstract: The paper introduces VisionGRU, a novel architecture based on Recurrent Neural Networks (RNNs) for efficient image analysis. It aims to solve the high computational cost problem of dominant models like Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), especially with high-resolution images. The core of VisionGRU is a simplified Gated Recurrent Unit (minGRU) that processes image features with linear complexity. The architecture is hierarchical, reducing sequence length while increasing channel depth to extract multi-scale features. A key component is the 2DGRU module, which uses bidirectional scanning to capture both local and global context. Experiments on ImageNet (classification) and ADE20K (segmentation) show that VisionGRU outperforms ViTs with significantly lower memory and computational costs. A rough sketch of the bidirectional scan and the stage-wise downsampling follows this list.
  • Original Source Link:
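
As referenced at the end of the abstract summary above, the sketch below illustrates, under stated assumptions, how a bidirectional scan block and a between-stage downsampling step could be wired together, reusing the MinGRU layer from the earlier sketch. The additive fusion of the two scan directions, the LayerNorm placement, and the 2x2 strided convolution are assumptions for illustration; the paper's 2DGRU module and hierarchy may differ in detail.

```python
import torch
import torch.nn as nn

class BiScanBlock(nn.Module):
    """Bidirectional scan over a flattened patch sequence.

    One minGRU pass reads the tokens in forward order, a second pass reads
    them in reverse, and the two outputs are fused with a residual
    connection (fusion rule assumed, not taken from the paper).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.fwd = MinGRU(dim)   # MinGRU as defined in the earlier sketch
        self.bwd = MinGRU(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        f = self.fwd(x)
        b = torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])
        return x + self.norm(f + b)

class Downsample(nn.Module):
    """Between stages: halve the spatial grid and double the channels,
    so the token sequence shrinks 4x while features get richer."""

    def __init__(self, dim: int, grid: int):
        super().__init__()
        self.grid = grid  # side length of the current patch grid
        self.reduce = nn.Conv2d(dim, dim * 2, kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape  # n == grid * grid
        x = x.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        x = self.reduce(x)                    # (b, 2d, grid/2, grid/2)
        return x.flatten(2).transpose(1, 2)   # (b, n/4, 2d)
```

Stacking several `BiScanBlock`s per stage and applying `Downsample` between stages gives the multi-scale pyramid the abstract describes: for example, `Downsample(96, 56)` maps a `(2, 56*56, 96)` token tensor to `(2, 28*28, 192)`.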
