Abstract: Sentence sentiment classification is an important task for mining the emotional semantics of text. Currently, the best-performing sentence sentiment classifiers are deep neural network models built on BERT. However, their performance depends heavily on large amounts of high-quality labeled training data. In practice, labeled data is usually scarce, so deep neural networks overfit small datasets and struggle to capture the implicit sentiment features of sentences. Existing semi-supervised models exploit the features of large numbers of unlabeled samples, but they still propagate errors introduced by incorrectly pseudo-labeled samples. Moreover, once the test data has been labeled, these models make no further use of the feature information it contains. This paper therefore proposes a semi-supervised sentence sentiment classification model. First, a weighting mechanism based on k-nearest neighbors assigns higher weights to samples with higher confidence, minimizing the propagation of erroneous information during parameter learning. Second, a two-stage training mechanism allows the model to correct misclassified samples in the test data in a timely manner. Extensive experiments on multiple datasets show that the method achieves good performance on small datasets.
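The abstract does not give the exact weighting formula, so the following is only a minimal sketch of one plausible k-nearest-neighbor confidence weight, under stated assumptions. Each pseudo-labeled sample is assumed to have a sentence embedding (for example a BERT sentence vector) and a softmax output. Its weight is taken to be the model's probability for the pseudo-label multiplied by the fraction of its k nearest labeled neighbors (by cosine similarity) that share that label. This product rule and the names `knn_pseudo_label_weights` and `weighted_cross_entropy` are hypothetical illustrations, not the paper's method.

```python
import numpy as np

def knn_pseudo_label_weights(labeled_emb, labeled_y, unlabeled_emb, pseudo_y, probs, k=5):
    """Hypothetical weight: p(pseudo label) * kNN label agreement (assumed scheme)."""
    # Cosine similarity between every unlabeled and every labeled embedding.
    lab = labeled_emb / np.linalg.norm(labeled_emb, axis=1, keepdims=True)
    unl = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
    sim = unl @ lab.T                                   # shape (n_unlabeled, n_labeled)
    k = min(k, lab.shape[0])
    nn_idx = np.argsort(-sim, axis=1)[:, :k]            # indices of the k most similar labeled samples
    neighbor_labels = labeled_y[nn_idx]                 # shape (n_unlabeled, k)
    agreement = (neighbor_labels == pseudo_y[:, None]).mean(axis=1)
    confidence = probs[np.arange(len(pseudo_y)), pseudo_y]
    return confidence * agreement                       # low weight when neighbors disagree

def weighted_cross_entropy(probs, pseudo_y, weights, eps=1e-12):
    """Cross-entropy on pseudo-labels, averaged with the per-sample weights."""
    nll = -np.log(probs[np.arange(len(pseudo_y)), pseudo_y] + eps)
    return float((weights * nll).sum() / max(weights.sum(), eps))

# Toy example: two labeled clusters and two pseudo-labeled samples, the second mislabeled.
labeled_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labeled_y = np.array([0, 0, 1, 1])
unlabeled_emb = np.array([[1.0, 0.05], [0.05, 1.0]])
pseudo_y = np.array([0, 0])
probs = np.array([[0.9, 0.1], [0.6, 0.4]])

weights = knn_pseudo_label_weights(labeled_emb, labeled_y, unlabeled_emb, pseudo_y, probs, k=2)
loss = weighted_cross_entropy(probs, pseudo_y, weights)
print(weights, loss)
```

In this toy case the second sample sits among class-1 neighbors but carries pseudo-label 0, so its weight drops to zero and it no longer contributes to the loss. This is the kind of error suppression the weighting mechanism is meant to provide.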