[Keywords]
[Abstract]
Traditional multi-label classification methods can only roughly locate the semantic regions of an image and cannot fully exploit the label correlations among those regions. To address this problem, the author proposes an improved Semantic Attention Graph Representation (SAGR) algorithm consisting of two main modules: 1) a Semantic Location (SL) module, which uses a visual attention mechanism and multimodal techniques to precisely locate the semantic targets in an image and aggregates the semantic information of the target regions to obtain a feature representation for each label category; and 2) a Semantic Correlation (SC) module, which lets the obtained semantic feature representations interact through a graph structure and captures the dynamic label dependencies in the image with a graph attention network. Experimental results show that SAGR raises mAP to 93.5% on Pascal VOC2007 and 84.2% on MirFlickr25k, outperforming traditional methods.
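The data flow through the two modules can be illustrated with a minimal sketch. It is not the SAGR implementation: it assumes plain dot-product attention, omits all learned weights (a real graph attention network would add learned projections and a nonlinearity), and treats the label graph as fully connected. The function names `semantic_location` and `semantic_correlation`, and the toy region and label vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors scaled by their weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def semantic_location(region_feats, label_embs):
    """Assumed SL step: each label attends over image regions and
    pools them into one label-specific feature vector."""
    pooled = []
    for emb in label_embs:
        attn = softmax([dot(emb, r) for r in region_feats])
        pooled.append(weighted_sum(attn, region_feats))
    return pooled


def semantic_correlation(label_feats):
    """Assumed SC step: one parameter-free attention pass over a
    fully connected label graph, so edge weights depend on the image."""
    updated = []
    for f_i in label_feats:
        attn = softmax([dot(f_i, f_j) for f_j in label_feats])
        updated.append(weighted_sum(attn, label_feats))
    return updated


# Toy input: three image regions and two label embeddings, all 2-D.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[2.0, 0.0], [0.0, 2.0]]

label_feats = semantic_location(regions, labels)
refined = semantic_correlation(label_feats)
```

Because the attention weights in `semantic_correlation` are recomputed from the pooled features of each image, the label dependencies vary per image rather than being fixed co-occurrence statistics, which is the sense in which the abstract calls them dynamic.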
[CLC number]
[Funding]