[Keywords]
[Abstract]
BEV-based multi-sensor fusion perception algorithms for autonomous driving have made significant progress in recent years and continue to drive the field forward. Within this line of research, the transformation of multi-view images into the BEV space and the fusion of multi-modal features remain the key challenges. In this paper, we propose MSEPE-CRN, a camera and millimeter-wave radar fusion perception algorithm for 3D object detection. It exploits edge features and radar point clouds to improve the accuracy of depth prediction, thereby enabling an accurate transformation of multi-view images into BEV features. In addition, a multi-scale deformable large-kernel attention mechanism is introduced for modal fusion to resolve the misalignment caused by the large discrepancy between features from different sensors. Experimental results on the open-source nuScenes dataset show that, compared with the baseline network, mAP improves by 2.17% and NDS by 1.93%, while mATE, mAOE, and mAVE improve by 2.58%, 8.08%, and 2.13%, respectively. These results indicate that our algorithm can effectively improve a vehicle's ability to perceive moving obstacles on the road and has practical value.
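To make the fusion step described above concrete, the following is a minimal, hypothetical sketch of a multi-scale large-kernel attention block for fusing camera and radar BEV features. It is not the authors' implementation: the deformable sampling is approximated here with dilated depthwise convolutions, and the module names, channel sizes, and dilation settings (MultiScaleFusion, 80-channel BEV inputs, dilations 1/2/3) are illustrative assumptions.

# Sketch only: multi-scale large-kernel attention fusion of camera/radar BEV maps.
# Deformable sampling is approximated with dilated depthwise convolutions;
# all names and hyperparameters below are assumptions, not the paper's code.
import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    """LKA-style attention: depthwise conv + dilated depthwise conv + 1x1 conv."""

    def __init__(self, channels: int, dilation: int = 3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(
            channels, channels, 7, padding=3 * dilation,
            groups=channels, dilation=dilation,
        )
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The convolution stack produces an attention map that reweights the input.
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn


class MultiScaleFusion(nn.Module):
    """Fuse camera and radar BEV features with large-kernel attention at several dilations."""

    def __init__(self, cam_channels: int = 80, radar_channels: int = 80,
                 out_channels: int = 128, dilations=(1, 2, 3)):
        super().__init__()
        self.reduce = nn.Conv2d(cam_channels + radar_channels, out_channels, 1)
        self.branches = nn.ModuleList(
            [LargeKernelAttention(out_channels, d) for d in dilations]
        )
        self.out = nn.Conv2d(out_channels * len(dilations), out_channels, 1)

    def forward(self, cam_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modalities, then attend at multiple receptive-field scales.
        x = self.reduce(torch.cat([cam_bev, radar_bev], dim=1))
        x = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.out(x)


if __name__ == "__main__":
    cam = torch.randn(1, 80, 128, 128)    # camera BEV features (B, C, H, W)
    radar = torch.randn(1, 80, 128, 128)  # radar BEV features
    fused = MultiScaleFusion()(cam, radar)
    print(fused.shape)                    # torch.Size([1, 128, 128, 128])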
[CLC Number]
U469.79
[Funding Project]
Open Fund of Foshan Xianhu Laboratory, Advanced Energy Science and Technology Guangdong Laboratory (XHD2020-003)