VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis
TL;DR Summary
VisionGRU introduces a novel linear-complexity RNN architecture that uses a simplified minGRU and hierarchical bidirectional scanning to efficiently analyze high-resolution images. The method achieves performance superior to ViTs while significantly reducing computational and memory costs.
Abstract
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are two dominant models for image analysis. While CNNs excel at extracting multi-scale features and ViTs effectively capture global dependencies, both suffer from high computational costs, particularly when processing high-resolution images. Recently, state-space models (SSMs) and recurrent neural networks (RNNs) have attracted attention due to their efficiency. However, their performance in image classification tasks remains limited. To address these challenges, this paper introduces VisionGRU, a novel RNN-based architecture designed for efficient image classification. VisionGRU leverages a simplified Gated Recurrent Unit (minGRU) to process large-scale image features with linear complexity. It divides images into smaller patches and progressively reduces the sequence length while increasing the channel depth, thus facilitating multi-scale feature extraction. A hierarchical 2DGRU module with bidirectional scanning captures both local and global contexts, improving long-range dependency modeling, particularly for tasks like semantic segmentation. Experimental results on the ImageNet and ADE20K datasets demonstrate that VisionGRU outperforms ViTs, significantly reducing memory usage and computational costs, especially for high-resolution images. These findings underscore the potential of RNN-based approaches for developing efficient and scalable computer vision solutions. Codes will be available at https://github.com/YangLiu9208/VisionGRU.
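To make the linear-complexity recurrence concrete, below is a minimal PyTorch sketch of a simplified GRU (minGRU) layer applied to a flattened sequence of image patches. The update gate and candidate state depend only on the current input, which is what keeps the cost linear in sequence length. The module name, layer sizes, and the explicit Python loop (rather than a parallel scan) are illustrative assumptions and not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Minimal sketch of a simplified GRU (minGRU) over a patch sequence.

    The update gate z_t and candidate state h~_t are computed from the
    current input only (not from h_{t-1}), so the recurrence
    h_t = (1 - z_t) * h_{t-1} + z_t * h~_t costs O(sequence length) and
    could also be evaluated with a parallel scan. Names and sizes here
    are illustrative, not the paper's code.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_gate = nn.Linear(dim, dim)        # produces update gate z_t
        self.to_candidate = nn.Linear(dim, dim)   # produces candidate state h~_t

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), e.g. a flattened sequence of image patches
        b, n, d = x.shape
        z = torch.sigmoid(self.to_gate(x))        # (b, n, d)
        h_tilde = self.to_candidate(x)            # (b, n, d)
        h = torch.zeros(b, d, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(n):                        # sequential form of the scan
            h = (1 - z[:, t]) * h + z[:, t] * h_tilde[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)        # (b, n, d)


# Toy usage: 196 patch tokens of width 96, as in a 14x14 patch grid.
tokens = torch.randn(2, 196, 96)
features = MinGRU(96)(tokens)
print(features.shape)  # torch.Size([2, 196, 96])
```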
Mind Map
In-depth Reading
English Analysis
1. Bibliographic Information
- Title: VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis
- Authors: Shicheng Yin, Kaixuan Yin, Weixing Chen, Enbo Huang, Yang Liu
- Affiliations: Sun Yat-sen University, Nanning Normal University. The authors are primarily from academic institutions, suggesting a research-focused contribution.
- Journal/Conference: The paper is available on arXiv, which is a preprint server. This means it has not yet undergone formal peer review for a conference or journal but is shared to disseminate findings quickly.
- Publication Year: 2024 (based on the arXiv ID).
- Abstract: The paper introduces VisionGRU, a novel architecture based on Recurrent Neural Networks (RNNs) for efficient image analysis. It aims to solve the high computational cost of dominant models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), especially on high-resolution images. The core of VisionGRU is a simplified Gated Recurrent Unit (minGRU) that processes image features with linear complexity. The architecture is hierarchical, reducing sequence length while increasing channel depth to extract multi-scale features. A key component is the 2DGRU module, which uses bidirectional scanning to capture both local and global context; a sketch of this idea follows the list below. Experiments on ImageNet (classification) and ADE20K (segmentation) show that VisionGRU outperforms ViTs with significantly lower memory and computational costs.
- Original Source Link:
- arXiv Link: https://arxiv.org/abs/2412.18178
- PDF Link: http://arxiv.org/pdf/2412.18178
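As a companion to the minGRU sketch above, the snippet below illustrates the bidirectional-scanning idea behind the 2DGRU module: a 2D feature map is flattened into a patch sequence, scanned once in raster order and once in reverse, and the two outputs are merged so every position receives context from both directions. Sharing the gate/candidate projections across directions and the simple additive merge are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn


def min_gru_scan(z: torch.Tensor, h_tilde: torch.Tensor) -> torch.Tensor:
    """Sequential minGRU-style scan: h_t = (1 - z_t) * h_{t-1} + z_t * h~_t."""
    b, n, d = z.shape
    h = torch.zeros(b, d, device=z.device, dtype=z.dtype)
    outputs = []
    for t in range(n):
        h = (1 - z[:, t]) * h + z[:, t] * h_tilde[:, t]
        outputs.append(h)
    return torch.stack(outputs, dim=1)


class BidirectionalScan2D(nn.Module):
    """Illustrative bidirectional scan over a 2D feature map (assumed design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_gate = nn.Linear(dim, dim)
        self.to_candidate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from an earlier stage
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)                       # (b, h*w, c)
        z = torch.sigmoid(self.to_gate(seq))
        h_tilde = self.to_candidate(seq)
        fwd = min_gru_scan(z, h_tilde)                           # raster-order scan
        bwd = min_gru_scan(z.flip(1), h_tilde.flip(1)).flip(1)   # reversed scan
        merged = fwd + bwd                                       # simple additive merge
        return merged.transpose(1, 2).reshape(b, c, h, w)


# Toy usage: a 96-channel, 14x14 feature map.
fmap = torch.randn(2, 96, 14, 14)
out = BidirectionalScan2D(96)(fmap)
print(out.shape)  # torch.Size([2, 96, 14, 14])
```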