Abstract: To address the low detection accuracy for small-scale objects and the large parameter counts of models for aerial images of dense traffic, we propose UAVDet, a lightweight and efficient aerial image detection model. First, we design the large-kernel separable attention spatial pooling module (LSKASPM) to strengthen the model's ability to capture spatial and semantic information for small-scale objects. Next, we construct the deformable context feature-guided aggregation module (C2f-DCG) to improve feature understanding across multiple scales. Then, we introduce the multi-scale feature fusion module (MSFM), which aggregates features from the high-resolution detection branch (SHead) and provides finer-grained global features. Finally, we apply layer-adaptive magnitude-based pruning (LAMP), a layer-wise adaptive sparse pruning technique driven by network weight magnitudes, to reduce the model's parameter count. Experimental results on the public VisDrone dataset show that the model achieves an average detection accuracy of 47.2% and a missed detection rate of 47.5% across ten common urban traffic target classes. The model has 6.3M parameters and an inference speed of 197 frames per second, outperforming existing public algorithms. The code will be made publicly available at https://github.com/XMUT-Vsion-Lab/UAVDet.
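To illustrate the pruning step, the following is a minimal NumPy sketch of the standard LAMP score as it is commonly described: within each layer, a weight's squared magnitude is divided by the sum of squared magnitudes of all weights in that layer that are at least as large, and the lowest-scoring weights are then removed globally. It is not the UAVDet implementation. The function names `lamp_scores` and `lamp_prune`, the `sparsity` parameter, and the representation of layers as plain arrays are assumptions made for illustration.

```python
import numpy as np

def lamp_scores(weights):
    """Return a LAMP score for every weight in one layer (same shape as input)."""
    w = np.asarray(weights, dtype=np.float64)
    sq = w.ravel() ** 2
    order = np.argsort(sq)                  # indices in ascending magnitude
    sorted_sq = sq[order]
    # Suffix sum: for each sorted position, the total squared magnitude
    # of that weight and every larger weight in the layer.
    suffix = np.cumsum(sorted_sq[::-1])[::-1]
    scores_sorted = np.divide(
        sorted_sq, suffix, out=np.zeros_like(sorted_sq), where=suffix > 0
    )
    scores = np.empty_like(scores_sorted)
    scores[order] = scores_sorted           # undo the sort
    return scores.reshape(w.shape)

def lamp_prune(layers, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the lowest scores.

    Scores are compared globally across layers. Ties at the threshold are all
    pruned, so slightly more than the requested fraction may be removed.
    """
    layers = [np.asarray(w, dtype=np.float64) for w in layers]
    scores = [lamp_scores(w) for w in layers]
    all_scores = np.concatenate([s.ravel() for s in scores])
    k = int(sparsity * all_scores.size)     # number of weights to remove
    if k == 0:
        return [w.copy() for w in layers]
    threshold = np.partition(all_scores, k - 1)[k - 1]
    return [np.where(s > threshold, w, 0.0) for w, s in zip(layers, scores)]
```

Because the largest weight in every layer receives a score of 1, no layer is emptied entirely unless the requested sparsity forces it, which is what makes the resulting per-layer sparsity adaptive. In practice the pruned model would typically be fine-tuned afterwards; that step is outside this sketch.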