Toward Real-Time and Efficient Edge Intelligence: Advances and Challenges in Lightweight Machine Learning
DOI:
https://doi.org/10.61173/bwn0ez36

Keywords:
Machine Learning, knowledge distillation, Lightweight model

Abstract
Deploying advanced Machine Learning (ML), particularly Deep Neural Networks (DNNs), on resource-constrained edge devices is crucial for realizing low-latency, privacy-preserving, and reliable edge intelligence applications. However, a significant gap exists between the high computational, memory, and energy demands of state-of-the-art models and the severe limitations inherent to edge hardware. This review systematically analyzes the field of lightweight ML for edge devices, aiming to bridge this gap.

Methods: Focusing on the inference phase, the review critically examines three primary technical pillars: (1) Model Compression techniques, including knowledge distillation, network pruning (structured and unstructured), and quantization; (2) Efficient Neural Architecture Design of inherently compact models (e.g., MobileNet, ShuffleNet, EfficientNet series); and (3) Hardware-aware Optimization and Adaptation, encompassing operator fusion, dedicated inference engines, and leveraging heterogeneous systems.

Results and Conclusion: The analysis highlights key achievements in reducing model size, complexity, and latency while maintaining accuracy. However, fundamental challenges persist, including the accuracy-efficiency tradeoff, hardware fragmentation, the memory wall bottleneck, and privacy/security concerns during deployment. Emerging solutions like neural-symbolic learning, adaptive federated learning, hardware-aware Neural Architecture Search (NAS), Processing-in-Memory (PIM) accelerators, and cross-stack co-design frameworks represent promising future directions. Overcoming these challenges is strategically vital for unlocking the full potential of ubiquitous, real-time edge intelligence.
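Of the compression techniques named above, quantization is perhaps the most concrete to illustrate. The following is a minimal sketch of symmetric per-tensor 8-bit post-training quantization; the helper names are hypothetical, and production toolchains implement far more elaborate calibration and per-channel schemes.

```python
# Minimal sketch of symmetric int8 post-training quantization.
# Helper names are illustrative, not from any specific framework.

def quantize_int8(weights):
    """Map float weights to int8 using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid div-by-zero
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.003, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# w_hat approximates w; the rounding error per weight is at most scale/2
```

Storing `q` instead of `w` cuts weight memory by 4x versus float32, at the cost of a bounded rounding error per weight, which is the essence of the accuracy-efficiency tradeoff the abstract highlights.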