UAVDet: Lightweight Detection Algorithm for Aerial Imagery of Dense Traffic Targets

Affiliations:

School of Mechanical and Automotive Engineering
1. Xiamen University of Technology; 2. Xiamen University

Fund Project:

Supported by the Natural Science Foundation of Fujian Province (2023J01439)

Abstract:

To address the low detection accuracy for small-scale targets and the large parameter counts of existing models on aerial images of dense traffic, we propose UAVDet, a lightweight and efficient detection model for aerial imagery. First, we design a large-kernel separable attention spatial pooling module (LSKASPM) to strengthen the model's ability to capture the spatial and semantic information of small-scale targets. Second, we construct a deformable context feature-guided aggregation module (C2f-DCG) to improve the model's feature understanding of targets at all scales. Then, we propose a multi-scale feature fusion module (MSFM) that aggregates finer-grained global features for the newly introduced high-resolution detection branch (SHead). Finally, we apply layer-adaptive magnitude-based sparse pruning (LAMP) to reduce the model's parameter count. Experimental results on the public VisDrone dataset show that the model achieves an average detection accuracy of 47.2% and a missed detection rate of 47.5% on ten common classes of urban traffic targets, with 6.3M parameters and an inference speed of 197 frames per second, outperforming existing public algorithms. The code will be released at https://github.com/XMUT-Vsion-Lab/UAVDet.
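The pruning step is the most self-contained part of the pipeline, so a minimal sketch of it may help. The block below implements the LAMP score of Lee et al.: within each layer, a weight's squared magnitude is divided by the sum of squared magnitudes of all weights in that layer that are at least as large, and the weights with the smallest scores are then removed under a single global threshold. This is an unstructured, PyTorch-based sketch written from the published criterion; the function names (`lamp_scores`, `global_lamp_prune`) and the masking setup are illustrative assumptions, not the UAVDet authors' released implementation.

```python
import torch


def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score for every weight in one layer (Lee et al., 2020).

    score(u) = w_u^2 / sum over {v : |w_v| >= |w_u|} of w_v^2,
    computed on the flattened weight tensor of a single layer.
    """
    w2 = weight.detach().flatten().pow(2)
    # Sort squared magnitudes in descending order.
    sorted_w2, order = torch.sort(w2, descending=True)
    # For each sorted position, the denominator is the inclusive prefix sum,
    # i.e. the total squared magnitude of all weights at least as large.
    denom = torch.cumsum(sorted_w2, dim=0)
    scores_sorted = sorted_w2 / denom.clamp_min(1e-12)
    # Scatter the scores back to the original (unsorted) positions.
    scores = torch.empty_like(scores_sorted)
    scores[order] = scores_sorted
    return scores.view_as(weight)


def global_lamp_prune(model: torch.nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the `sparsity` fraction of weights with the lowest LAMP scores,
    selected under one global threshold across all Conv2d/Linear layers."""
    layers = [m for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    all_scores = torch.cat([lamp_scores(m.weight).flatten() for m in layers])
    k = int(sparsity * all_scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_scores, k).values
    for m in layers:
        mask = (lamp_scores(m.weight) > threshold).to(m.weight.dtype)
        m.weight.data.mul_(mask)
```

Because scores are normalized within each layer before the single global cut, layers whose weights are collectively small are not pruned away wholesale, which is what makes the resulting sparsity layer-adaptive.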

History
  • Received: 2024-11-18
  • Last revised: 2024-12-04
  • Accepted: 2025-02-20