CNN-Transformer Hybrid Models for Object Detection: A Comprehensive Review

Lyuyang Gao

doi:10.61173/19bc6r78

Keywords:

CNN-Transformer Hybrid Model, Serial Architecture Fusion Approach, Parallel Architecture Fusion Method

Abstract

Object detection is a core computer vision task for which conventional convolutional neural networks (CNNs) were initially the primary approach. The emergence of the Transformer architecture has since improved detection accuracy and generalization, playing a pivotal role in advancing intelligent systems across various domains. Recently, integrating CNN and Transformer architectures has become a key area of investigation in object detection. These hybrid architectures are designed to combine CNNs’ strength in local feature extraction with Transformers’ capacity for global context modeling, which improves accuracy across a range of object recognition scenarios. This review begins with a concise overview of CNNs and Transformers and critically analyzes their respective advantages and limitations. It then systematically examines state-of-the-art hybrid architectures and their optimization strategies, and presents a comparison and summary in tabular form to facilitate clear performance evaluation. The paper closes by discussing the prospects of hybrid models in object detection and insights to guide further research.
