Sentence similarity computation based on syntactic dependency convolutional neural network

doi:10.11835/j.issn.1000-582X.2020.09.005

Volume 43, Issue 9, 2020, pp. 41-53

CLC Number: TP391.1

Abstract:

Sentence similarity computation is a basic task in many natural language processing applications, and its accuracy directly affects the performance of language-related systems, especially machine translation, plagiarism detection, query ranking and question answering. Traditional methods compute sentence similarity from shallow features such as morphology, word sequence and grammatical structure; deep learning methods can integrate deep semantic features and achieve better results. However, deep learning methods based on convolutional neural networks must overcome defects such as a narrow receptive field and insufficient capture of long-distance dependencies when extracting text features. In this paper, a DCNN (dependency convolutional neural network) model was established that uses dependency-based syntactic analysis to capture information over longer distances. We parsed the text with Stanford NLP and then extracted the relationships between words as binary combinations or triplets. The dependency information embedded in these word combinations serves as supplementary lexical information and was added, together with the original sentence, to the convolutional neural network input, thus constructing a dependency CNN. Experimental results show that the long-distance dependency information effectively improves similarity computation in the proposed dependency model: on the MSRP (Microsoft Research Paraphrase Corpus) dataset, the accuracy and F1 value are 80.33% and 85.91%, respectively. The Pearson correlation coefficient of the model reaches 87.5 on the SICK (Sentences Involving Compositional Knowledge) dataset and 92.2 on the MSRvid (Microsoft Video Paraphrase Corpus) dataset.
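The input construction described in the abstract can be illustrated with a minimal sketch. Several points here are assumptions, not details confirmed by the abstract: the dependency parse is written by hand instead of being produced by Stanford NLP, a "binary combination" is read as a (head, dependent) word pair, a "triplet" is read as a (grandparent, parent, child) chain of two linked arcs, and the combinations are simply appended after the original tokens. The function names are hypothetical.

```python
from typing import List, Tuple

# One arc of a dependency parse: (head_index, relation, dependent_index).
# Indices are 0-based token positions; -1 marks the artificial root.
Dependency = Tuple[int, str, int]


def dependency_pairs(tokens: List[str], deps: List[Dependency]) -> List[Tuple[str, str]]:
    """Return (head, dependent) word pairs, skipping the root arc."""
    return [(tokens[h], tokens[d]) for h, _, d in deps if h >= 0]


def dependency_triplets(tokens: List[str], deps: List[Dependency]) -> List[Tuple[str, str, str]]:
    """Return (grandparent, parent, child) chains formed by two linked arcs."""
    head_of = {d: h for h, _, d in deps}
    triplets = []
    for h, _, d in deps:
        g = head_of.get(h, -1)
        if h >= 0 and g >= 0:
            triplets.append((tokens[g], tokens[h], tokens[d]))
    return triplets


def build_dcnn_input(tokens: List[str], deps: List[Dependency]) -> List[str]:
    """Original sentence followed by the words of every pair and triplet.

    Assumption: combinations are concatenated after the sentence; the paper
    may combine them with the sentence differently.
    """
    sequence = list(tokens)
    for pair in dependency_pairs(tokens, deps):
        sequence.extend(pair)
    for triplet in dependency_triplets(tokens, deps):
        sequence.extend(triplet)
    return sequence


# Hand-written parse of a toy sentence (illustrative only).
tokens = ["the", "dog", "chased", "a", "cat"]
deps = [
    (-1, "root", 2),   # chased is the root
    (2, "nsubj", 1),   # chased -> dog
    (1, "det", 0),     # dog -> the
    (2, "obj", 4),     # chased -> cat
    (4, "det", 3),     # cat -> a
]

if __name__ == "__main__":
    print(dependency_pairs(tokens, deps))
    print(dependency_triplets(tokens, deps))
    print(build_dcnn_input(tokens, deps))
```

In this sketch, a pair such as ("chased", "cat") places two words side by side even though they are not adjacent in the sentence, so a convolution filter with a small window can see them together. That is the intuition behind using dependency information to widen the effective receptive field.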

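Citation:

铉静, 吴琼, 魏从悦, 伍星. Sentence similarity computation based on syntactic dependency convolutional neural network [J]. Journal of Chongqing University, 2020, 43(9): 41-53.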

History

Received: January 11, 2020
Online: September 29, 2020
