Semantic Attention Graph Representation Algorithm for Multi-Label Image Classification
Authors: YANG Guang-chao, CHEN Ke
Affiliation: Chongqing University



Abstract:

Traditional multi-label methods can only roughly localize the semantic regions of an image and cannot fully exploit the label correlations that exist between those regions. To address this, the authors propose an improved Semantic Attention Graph Representation (SAGR) algorithm with two main parts: 1) a Semantic Location (SL) module, which uses a visual attention mechanism and multimodal techniques to precisely localize the semantic objects in an image and aggregates the semantic information of the object regions into a feature representation for each label category; 2) a Semantic Correlation (SC) module, which lets the resulting semantic feature representations interact through a graph structure and captures the dynamic label dependencies in the image with a graph attention network. Experiments show that SAGR raises mAP to 93.5% on Pascal VOC2007 and 84.2% on MirFlickr25k, outperforming traditional methods.

Keywords: semantic features; label co-occurrence; multi-label classification; graph attention network
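
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how attention scores are computed, so plain dot-product scoring stands in for the SL module, and a single-head graph attention layer over a fully connected label graph stands in for the SC module. The function names, the projection matrix `W`, the attention vector `a`, and the toy inputs are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    # Convex combination of equally sized vectors.
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]

def semantic_localization(region_feats, label_embs):
    """SL stand-in (assumed form): one attention-pooled feature per label.

    Each label embedding scores every image region by dot product; the
    softmax of those scores weights the regions that are pooled together.
    """
    label_feats = []
    for emb in label_embs:
        attn = softmax([dot(emb, r) for r in region_feats])
        label_feats.append(weighted_sum(attn, region_feats))
    return label_feats

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def graph_attention(label_feats, W, a):
    """SC stand-in (assumed form): single-head graph attention.

    Every label node attends to every label node, including itself.
    W projects node features; a scores the concatenation [h_i || h_j].
    """
    h = [[dot(row, f) for row in W] for f in label_feats]  # h_i = W f_i
    out = []
    for hi in h:
        scores = [leaky_relu(dot(a, hi + hj)) for hj in h]  # list concat = [h_i || h_j]
        alpha = softmax(scores)
        out.append(weighted_sum(alpha, h))
    return out

# Toy inputs (hypothetical): 3 image regions, 2 labels, feature dimension 2.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[2.0, 0.0], [0.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.5, -0.5]

label_features = semantic_localization(regions, labels)
updated_features = graph_attention(label_features, W, a)
```

In a full model, each updated label feature would feed a per-label classifier, and `W` and `a` would be learned rather than fixed; the sketch only shows how per-label pooling and attention-weighted message passing fit together.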


History

Received: 2021-11-29
Last revised: 2022-03-21
Accepted: 2022-03-29
