Research on Multimodal Information Fusion Dynamic Target Recognition Method for Autonomous Driving
Author:
Affiliation:

1. Guangdong Industry Polytechnic; 2. R&D Center, GAC AION New Energy Automobile Co., Ltd.; 3. South China University of Technology; 4. Guangzhou City University of Technology

Funding:

National Natural Science Foundation of China (General Program)


Research on Multimodal Information Fusion Dynamic Target Recognition Method for Autonomous Driving
Author:
Affiliation:

1. Guangzhou City University of Technology; 2. GAC AION New Energy Automobile Co., Ltd.; 3. South China University of Technology; 4. Engineering Research Institute, Guangzhou City University of Technology

Abstract:

Vehicle detection in autonomous driving environments involves numerous small targets and severe occlusion. To address this, a multimodal information fusion target recognition method for autonomous driving is proposed. The method comprises the following improvements. 1. A ResNet50 network improved with a spatial attention mechanism and hybrid dilated convolution: selective kernel convolution replaces the 3×3 standard convolutions in the conv2_x and conv3_x stages, so that the network can dynamically adjust its receptive field according to feature size; in the conv4_x stage, a sawtooth-patterned hybrid dilated convolution with dilation rates [1, 2, 1, 2, 1, 2] lets the network capture multi-scale contextual information and strengthens its feature extraction. 2. GIoU loss function: the localization loss in YOLOv3 is replaced with the GIoU loss, which is more practical to apply. 3. A pedestrian-vehicle target classification and recognition algorithm based on the fusion of two kinds of data, which effectively improves detection accuracy. Experimental results show that, compared with the OFTNet, VoxelNet, and Faster R-CNN networks, the method improves mAP by up to 0.05 in daytime and up to 0.09 at night, and also converges better.

Abstract:

Vehicle detection in the autonomous driving environment is complicated by numerous small targets and severe target occlusion. In this paper, a multimodal information fusion dynamic target recognition method for autonomous driving is proposed. The method mainly includes the following improvements. 1. An improved ResNet50 network based on a spatial attention mechanism and hybrid dilated convolution: the 3×3 standard convolutions in the conv2_x and conv3_x stages are replaced with selective kernel convolution, which allows the network to dynamically adjust the size of its receptive field according to the feature size; a sawtooth-patterned hybrid dilated convolution with dilation rates [1, 2, 1, 2, 1, 2] is used in the conv4_x stage to enable the network to capture multi-scale contextual information and improve its feature extraction capability. 2. GIoU loss function: the localization loss function in YOLOv3 is replaced with the GIoU loss, which offers better operability in practical applications. 3. A human-vehicle target classification and recognition algorithm based on the fusion of two kinds of data, which effectively improves target detection accuracy. Experimental results show that, compared with OFTNet, VoxelNet, and Faster R-CNN, the proposed method improves mAP by up to 0.05 in daytime and 0.09 at night, with better convergence.
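The sawtooth dilation schedule [1, 2, 1, 2, 1, 2] is chosen to avoid the "gridding" artifact of stacked dilated convolutions. A small illustrative sketch (a hypothetical 1-D `coverage` helper, not code from the paper) shows that the alternating schedule reaches every input offset inside its receptive field, while a constant dilation rate of 2 skips all odd offsets:

```python
def coverage(dilations, k=3):
    """Offsets of input positions reachable by stacking 1-D dilated
    convolutions with kernel size k and the given per-layer dilation rates."""
    taps = {0}  # start from the output position itself
    for d in dilations:
        # each existing tap fans out to k taps spaced d apart
        taps = {t + d * j for t in taps for j in range(-(k // 2), k // 2 + 1)}
    return taps

sawtooth = coverage([1, 2, 1, 2, 1, 2])   # the schedule used in conv4_x
constant = coverage([2, 2, 2, 2, 2, 2])   # naive constant dilation

print(sorted(sawtooth) == list(range(-9, 10)))  # → True: contiguous coverage, no gridding
print(1 in constant)                            # → False: odd offsets are never sampled
```

Both stacks have a comparable receptive field, but only the sawtooth pattern samples it densely, which is the motivation given for the hybrid design.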
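The GIoU loss that replaces YOLOv3's localization term can be sketched for a single pair of axis-aligned boxes as follows. This is an illustrative implementation of the standard GIoU formulation, not code from the paper; it omits the degenerate-box guards a production version would need:

```python
def giou_loss(a, b):
    """GIoU loss for two boxes given as (x1, y1, x2, y2); loss = 1 - GIoU."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # intersection area
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # smallest enclosing box C; the extra term penalizes empty space in C,
    # so the loss stays informative even when the boxes do not overlap
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou

print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # → 0.0 (perfect overlap)
```

Unlike plain IoU, whose gradient vanishes for disjoint boxes, GIoU remains negative and distance-sensitive there, which is what makes it practical as a drop-in localization loss.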

History
  • Received: 2023-04-26
  • Last revised: 2023-07-13
  • Accepted: 2023-08-24