A generative adversarial network based on self-attention mechanism for text-to-image generation

Authors: HUANG Hongyu, GU Zifeng

CLC number: TP311

Funding: Supported by the Chongqing Natural Science Foundation (cstc2014jcyjA40030).
Abstract:

Automatic image generation has long been an important challenge in computer vision, and text-to-image generation is a major branch of this field. With the rapid development of deep learning, the advent of generative adversarial networks (GANs) has reinvigorated image generation, since GANs can produce vivid and diverse images. In this paper, we introduce the self-attention mechanism into GANs and propose GAN-SelfAtt to improve the quality of the generated images. We implement GAN-SelfAtt on two GAN frameworks, WGAN and WGAN-GP. Experimental results show that self-attention improves the clarity of the generated images, because it compensates for the limitation of convolution, which captures correlations only within a local pixel region. In addition, GAN-SelfAtt trains more stably and avoids the mode-collapse problem of the original GAN.

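GAN-SelfAtt is implemented on WGAN and WGAN-GP, which stabilize training by constraining the critic to be approximately 1-Lipschitz; WGAN-GP enforces this with a gradient penalty evaluated at random interpolates between real and fake samples. A minimal sketch of that penalty term, assuming a generic scalar critic `D` (a numerical gradient is used here purely for illustration; real implementations obtain it by automatic differentiation, and the weight `lam = 10` follows the WGAN-GP paper):

```python
import numpy as np

def grad(D, x, h=1e-5):
    """Central-difference gradient of a scalar critic D at point x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        g.flat[i] = (D(x + e) - D(x - e)) / (2 * h)
    return g

def gradient_penalty(D, real, fake, rng, lam=10.0):
    """WGAN-GP term: lam * E[(||grad D(x_hat)||_2 - 1)^2], where each
    x_hat is a random interpolate between a real and a fake sample."""
    pens = []
    for r, f in zip(real, fake):
        eps = rng.uniform()
        x_hat = eps * r + (1.0 - eps) * f
        pens.append((np.linalg.norm(grad(D, x_hat)) - 1.0) ** 2)
    return lam * float(np.mean(pens))
```

The penalty pulls the critic's gradient norm toward 1 along the real-fake interpolation lines, which is what replaces the weight clipping of the original WGAN and removes its capacity problems.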
Cite this article:

HUANG Hongyu, GU Zifeng. A generative adversarial network based on self-attention mechanism for text-to-image generation[J]. Journal of Chongqing University, 2020, 43(3): 55-61.
History
  • Received: 2019-03-27
  • Published online: 2020-03-31