Evolutionary Trajectory and Groundbreaking Innovations in the YOLO Object Detection Algorithm
DOI: https://doi.org/10.61173/541gkj05
Keywords: Convolutional Neural Networks, single-stage object detection, YOLO, algorithms
Abstract
This paper systematically traces the evolution of the YOLO series (versions 1 through 12) in computer vision object detection. YOLOv1, introduced in 2015, pioneered the single-stage paradigm that treats detection as a single regression problem. It enables end-to-end detection by dividing the input image into an S×S grid. Its Fast YOLO variant demonstrated notable real-time performance on the PASCAL VOC dataset. Subsequent versions brought significant advances. v2 incorporated batch normalization and anchor priors and improved efficiency with the Darknet-19 backbone, while YOLO9000 extended recognition to more than 9,000 object categories. v3 improved accuracy through the Darknet-53 backbone and multi-scale feature fusion. v4 formalized the modular "Backbone-Neck-Head" design. Versions 8 through 12 continued this progress: v8's C2f module strengthened feature fusion, v9 addressed unreliable gradient information caused by information loss in deep networks with its Programmable Gradient Information (PGI) framework, v10 achieved NMS-free end-to-end detection, v11 improved efficiency with the C3k2 module, and v12 enhanced real-time capability with the R-ELAN structure. Across these iterations, the series has substantially improved in detection speed, accuracy, and robustness in complex scenes. These gains have established it as a mainstream solution for real-time object detection. The technology holds considerable promise for demanding applications such as embodied intelligence, medical diagnostics, and tunnel inspection.
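To make the S×S grid idea concrete, here is a minimal Python sketch. It is an illustration and not the authors' or Darknet's code. It assumes the YOLOv1 PASCAL VOC configuration: an S = 7 grid, B = 2 boxes per cell, and C = 20 classes. Every function name below is hypothetical. The sketch shows two things. First, each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities, so the network emits one fixed-size tensor in a single pass. Second, the cell containing an object's centre is the one responsible for detecting it.

```python
# Hypothetical sketch of YOLOv1-style grid assignment (illustrative only).
# Defaults assume the YOLOv1 PASCAL VOC setting: S=7, B=2, C=20.


def output_shape(S=7, B=2, C=20):
    """Each of the S*S cells predicts B boxes (x, y, w, h, confidence)
    plus C conditional class probabilities."""
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, S=7):
    """Return (row, col) of the grid cell that contains the box centre.

    cx, cy are the centre coordinates normalised to [0, 1] by image
    width and height; that cell is responsible for the object.
    """
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        raise ValueError("centre must be normalised to [0, 1]")
    # Clamp so a centre exactly on the right/bottom edge maps to the last cell.
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


def cell_offsets(cx, cy, S=7):
    """Centre position relative to its responsible cell, in [0, 1].
    In YOLOv1, x and y are expressed relative to the cell bounds."""
    row, col = responsible_cell(cx, cy, S)
    return cx * S - col, cy * S - row


if __name__ == "__main__":
    print(output_shape())               # (7, 7, 30)
    print(responsible_cell(0.5, 0.25))  # (1, 3)
    print(cell_offsets(0.5, 0.25))      # (0.5, 0.75)
```

With these defaults, the output is a 7×7×30 tensor. An object centred at (0.5, 0.25) falls in row 1, column 3, and its centre sits at offsets (0.5, 0.75) inside that cell.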