Semantic Attention Graph Representation Algorithm for Multi-Label Image Classification
Affiliation:

Chongqing University

Abstract:

Traditional multi-label image classification methods can only roughly localize the semantic regions of an image and cannot fully exploit the label correlations among those regions. To address this, the author proposes an improved Semantic Attention Graph Representation (SAGR) algorithm built from two main modules: 1) a Semantic Location (SL) module, which uses a visual attention mechanism and multimodal techniques to precisely localize the semantic objects in an image, then aggregates the semantic information of the object regions to obtain a feature representation for each label category; 2) a Semantic Correlation (SC) module, which lets the resulting semantic feature representations interact through a graph structure and captures the dynamic label dependencies in the image with a graph attention network. Experimental results show that SAGR raises mAP to 93.5% on Pascal VOC2007 and 84.2% on MirFlickr25k, outperforming traditional methods.
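The two modules described in the abstract can be illustrated with a small NumPy sketch. This is not the author's implementation: the abstract gives no equations, so the label-guided attention pooling (a Hadamard-product fusion of region features and label embeddings), the single-head graph attention layer over a fully connected label graph, and every name and dimension below (`semantic_location`, `graph_attention`, `toy_forward`, `W_f`, `W_e`, and so on) are assumptions chosen only to show the data flow, with random weights in place of trained ones.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_location(feat, label_emb, W_f, W_e, w):
    """Assumed SL step: one attention map per label over image positions.

    feat:      (N, D) visual features at N spatial positions
    label_emb: (C, E) one embedding per label (e.g. word vectors)
    returns:   (C, D) per-label features and the (C, N) attention maps
    """
    f = feat @ W_f                                   # (N, K)
    e = label_emb @ W_e                              # (C, K)
    fused = np.tanh(e[:, None, :] * f[None, :, :])   # (C, N, K) Hadamard fusion
    attn = softmax(fused @ w, axis=1)                # normalize over positions
    return attn @ feat, attn

def graph_attention(x, W, a_src, a_dst, slope=0.2):
    """Assumed SC step: single-head graph attention on a fully connected label graph."""
    h = x @ W                                               # (C, F)
    scores = (h @ a_src)[:, None] + (h @ a_dst)[None, :]    # (C, C) pairwise scores
    scores = np.where(scores > 0, scores, slope * scores)   # LeakyReLU
    alpha = softmax(scores, axis=1)                         # attention over neighbours
    return alpha @ h, alpha

def toy_forward(feat, label_emb, p):
    # SL: label-specific features; SC: exchange information between labels.
    label_feat, attn = semantic_location(feat, label_emb, p["W_f"], p["W_e"], p["w"])
    node_feat, alpha = graph_attention(label_feat, p["W_g"], p["a_src"], p["a_dst"])
    logits = (node_feat * p["W_cls"]).sum(axis=1) + p["b_cls"]   # one logit per label
    probs = 1.0 / (1.0 + np.exp(-logits))                        # independent sigmoids
    return probs, attn, alpha

# Hypothetical sizes: 7x7 feature map, 20 labels (as in Pascal VOC).
N, D, E, K, F, C = 49, 32, 16, 8, 12, 20
rng = np.random.default_rng(0)
params = {
    "W_f": rng.normal(size=(D, K)), "W_e": rng.normal(size=(E, K)),
    "w": rng.normal(size=K), "W_g": rng.normal(size=(D, F)),
    "a_src": rng.normal(size=F), "a_dst": rng.normal(size=F),
    "W_cls": rng.normal(size=(C, F)), "b_cls": np.zeros(C),
}
feat = rng.normal(size=(N, D))
label_emb = rng.normal(size=(C, E))
probs, attn, alpha = toy_forward(feat, label_emb, params)
```

Here `attn[c]` plays the role of a localization map for label `c`, and `alpha[i, j]` is how strongly label `i` attends to label `j`. Because `alpha` is recomputed from the features of each image, the label dependencies are input-dependent, which is one plausible reading of the "dynamic" dependencies the abstract mentions.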

History
  • Received: 2021-11-29
  • Last revised: 2022-03-21
  • Accepted: 2022-03-29