[关键词]
[摘要]
文本分类是自然语言处理中的一项基础研究,目的是为输入的文本输出特定的标签类别。而文本表示是文本分类的中间环节,也是文本分类的重要内容。针对短文本语义信息较少,难以表征的问题,本文提出了一种融合标签信息和自注意力图卷积神经网络的文本表示方法。该方法利用标签和文本之间的语义联系,构造了基于标签注意力的单个文本表示,然后利用自注意力图卷积神经网络提取多个文本的全局特征,获得融合全局特征的特定文本表示用于文本分类。最后将得到的文本输入分类器中,得到分类结果。通过在MR和R8数据集的实验结果表明,相比于其他文本分类模型,本文所提出的模型在MR数据集上F1值提升2.58%,准确率提升2.02%;在R8数据集上F1值提升3.52%,准确率提升2.25%。
[Keywords]
[Abstract]
Text classification is a fundamental task in natural language processing that assigns a label category to an input text. Text representation is an intermediate step of text classification and a key part of it. To address the difficulty of representing short texts, which carry little semantic information, this paper proposes a text representation method that combines label information with a self-attention graph convolutional neural network. The method uses the semantic relation between labels and texts to construct a label-attention-based representation of each individual text, and then uses the self-attention graph convolutional neural network to extract global features across multiple texts, yielding a text representation that fuses these global features. Finally, the resulting text representation is fed into a classifier to obtain the classification result. Experiments on the MR and R8 datasets show that, compared with other text classification models, the proposed model improves the F1 score by 2.58% and accuracy by 2.02% on MR, and improves the F1 score by 3.52% and accuracy by 2.52% on R8.
[CLC number]
TP311
[Funding]
National Natural Science Foundation of China (61966020, 61762056).