[Keywords]
[Abstract]
Food image segmentation plays a crucial role in food volume estimation, yet the performance of many segmentation algorithms suffers from the fine structure of food and from challenges that arise during image capture, such as blurred boundaries and overexposure. To address these problems, an attention-based complementary fusion RGB-D food image segmentation network (RGB-D ABCFNet) is proposed. The network adopts an overall U-shaped structure consisting of an encoding stage and a decoding stage. In the encoding stage, the proposed expanded head channel attention module extracts the channel features of the depth map that are most useful for segmentation; through layer-by-layer addition, the depth features and RGB features complement each other. In the decoding stage, the proposed multi-head spatial attention module better recovers detail and location information, and the extracted semantic features map more accurately onto the semantic segmentation result. In addition, a multi-class food semantic segmentation dataset, Nutrition-Pix, is constructed, and extensive comparison and ablation experiments on it show that the proposed model outperforms current methods with a mean intersection-over-union (mIoU) of 87.5%.
[Key words]
[Abstract]
Food image segmentation plays an important role in the field of food volume estimation, but there is still much room for improvement in its performance due to the fine structure of food and challenges that arise during image capture, such as blurred boundaries and overexposure. To solve these problems, a complementary fusion RGB-D food image segmentation network based on an attention mechanism (RGB-D ABCFNet) is proposed. The network adopts a U-shaped structure and is divided into an encoding process and a decoding process. In the encoding process, the proposed Expand Head Channel Attention Module (EHCAM) extracts the channel features of the depth map that are most helpful for segmentation, so that the depth-map features complement the RGB feature maps through layer-by-layer addition. In the decoding process, the proposed Multi-Head Spatial Attention Module (MHSAM) enables detail and location information to be well recovered, and the extracted semantic features map more accurately onto the semantic segmentation results. In addition, a multi-class food semantic segmentation dataset, Nutrition-Pix, is constructed, and extensive comparison and ablation experiments conducted on it show that the proposed model is superior to current methods with an mIoU of 87.5%.
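The encoding stage described above reweights depth-map channels with channel attention and adds them to the RGB features layer by layer. The abstract does not give EHCAM's internals, so the following is only a minimal NumPy sketch of a generic squeeze-and-excitation-style channel attention followed by additive fusion; all weight shapes and the `reduction` ratio are illustrative assumptions, not the paper's design.

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """Generic channel attention (squeeze-and-excitation style):
    global-average-pool each channel, pass through a small bottleneck,
    and gate the channels with a sigmoid. Random weights stand in for
    learned parameters purely for illustration."""
    c, h, w = feat.shape
    s = feat.mean(axis=(1, 2))                    # squeeze: (c,)
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    z = np.maximum(w1 @ s, 0.0)                   # ReLU bottleneck
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))           # per-channel gate in (0, 1)
    return feat * a[:, None, None]                # reweight channels

def fuse(rgb_feat, depth_feat):
    """Complementary fusion at one encoder layer: reweight the depth
    channels, then add them to the RGB features elementwise."""
    return rgb_feat + channel_attention(depth_feat)

rgb = np.ones((8, 16, 16))      # toy RGB feature map: 8 channels, 16x16
depth = np.ones((8, 16, 16))    # toy depth feature map, same shape
out = fuse(rgb, depth)
print(out.shape)                # (8, 16, 16)
```

In the real network this fusion would repeat at every encoder level, so that each scale of the RGB stream receives the attention-selected depth information.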
[CLC number]
[Fund program]
National Natural Science Foundation of China (Grant No. 61771338)