Abstract: BEV (bird's eye view)-based multi-sensor fusion perception algorithms for autonomous driving have made significant progress in recent years and continue to advance the field. Within this line of research, multi-view image-to-BEV conversion and multi-modal feature fusion remain the key challenges. In this paper, we propose MSEPE-CRN, a camera and millimeter-wave radar fusion perception algorithm for 3D object detection. It exploits edge features and radar point clouds to improve depth prediction accuracy, enabling accurate conversion of multi-view images into BEV features. In addition, a multi-scale deformable large kernel attention mechanism is introduced for modal fusion to address the feature misalignment caused by large discrepancies between the features of different sensors. Experimental results on the nuScenes open-source dataset show that, compared with the baseline network, the proposed algorithm improves mAP by 2.17% and NDS by 1.93%, and reduces the error metrics mATE, mAOE, and mAVE by 2.58%, 8.08%, and 2.13%, respectively. The algorithm effectively improves the vehicle's ability to perceive moving obstacles on the road and has practical value.