A semi-supervised model for sentence-level sentiment classification
Authors: Su Jing (苏静), Murtadha Ahmed
Affiliation:

School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, P. R. China

About the author:

Su Jing (born 1987), female, Ph.D., works mainly on natural language processing and artificial intelligence. E-mail: sujing@mail.nwpu.edu.cn.

Fund project:

Supported by the National Natural Science Foundation of China (62172335).



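    Abstract (Chinese version):

    Sentence sentiment classification aims to mine the sentiment semantics of text, and deep network models based on BERT (bidirectional encoder representations from transformers) currently perform best. The performance of such models depends heavily on large amounts of high-quality labeled data. In practice, however, labeled samples are often scarce, so deep neural networks (DNNs) tend to overfit small sample sets and struggle to capture the implicit sentiment features of sentences. Existing semi-supervised models make effective use of features from unlabeled samples. However, they do not adequately handle the gradual accumulation of errors that introducing unlabeled samples can cause. Moreover, after predicting on the test set, these models do not re-evaluate or correct their previous labels, so they cannot fully exploit the feature information in the test data. This study proposes a new semi-supervised sentence sentiment classification model. First, the model introduces a weighting mechanism based on the K-nearest neighbors (KNN) algorithm. This mechanism assigns higher weights to high-confidence samples to minimize the propagation of erroneous information during training. Second, it adopts a two-stage training strategy that lets the model promptly correct misclassified samples in the test data. Experiments on multiple datasets show that the model performs well even on small sample sets.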

    Abstract:

    Sentence sentiment classification is an important task for extracting emotional semantics from text. Currently, the best-performing approaches use deep neural networks, particularly BERT-based models. However, these models need large, high-quality labeled datasets to perform well. In practice, labeled data are usually scarce, so the models overfit small datasets and struggle to capture subtle sentiment features. Existing semi-supervised models exploit features from large unlabeled datasets. However, they remain vulnerable to errors introduced by pseudo-labeled samples, which can accumulate during training. Moreover, once the test data have been labeled, these models do not revisit or correct those labels, so they fail to exploit the feature information in the test data. To address these issues, this paper proposes a semi-supervised sentence sentiment classification model. First, a K-nearest-neighbor (KNN)-based weighting mechanism assigns higher weights to high-confidence samples to minimize error propagation during parameter learning. Second, a two-stage training mechanism enables the model to correct misclassified samples in the test data. Extensive experiments on multiple datasets show that the method achieves strong performance on small datasets.
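
The abstract does not give the exact form of the KNN-based weights. The following is a minimal sketch under an assumed definition, not the authors' implementation. A pseudo-labeled sentence's weight is taken to be the fraction of its `k` nearest labeled neighbors whose gold label matches the sentence's pseudo-label. Neighbors are found in some sentence-embedding space, for example BERT features. That weight then scales the sentence's term in a cross-entropy loss. The function names `knn_confidence_weights` and `weighted_cross_entropy`, the Euclidean distance and `k=3` are all hypothetical choices.

```python
import math
from typing import List, Sequence


def knn_confidence_weights(
    labeled_emb: List[Sequence[float]],
    labeled_y: List[int],
    pseudo_emb: List[Sequence[float]],
    pseudo_y: List[int],
    k: int = 3,
) -> List[float]:
    """Hypothetical KNN weighting (an assumption, not the paper's formula):
    weight = share of the k nearest labeled samples whose gold label
    agrees with the pseudo-label, so every weight lies in [0, 1]."""
    if not 1 <= k <= len(labeled_emb):
        raise ValueError("k must be between 1 and the number of labeled samples")
    weights = []
    for emb, y_hat in zip(pseudo_emb, pseudo_y):
        # Rank labeled samples by Euclidean distance to this embedding.
        nearest = sorted(
            range(len(labeled_emb)),
            key=lambda i: math.dist(emb, labeled_emb[i]),
        )[:k]
        agree = sum(1 for i in nearest if labeled_y[i] == y_hat)
        weights.append(agree / k)
    return weights


def weighted_cross_entropy(
    probs: List[Sequence[float]], targets: List[int], weights: List[float]
) -> float:
    """Weighted mean negative log-likelihood; zero-weight samples are skipped."""
    total = sum(
        w * -math.log(p[t]) for p, t, w in zip(probs, targets, weights) if w > 0
    )
    denom = sum(weights)
    return total / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy 2-D "embeddings": class 0 near the origin, class 1 near (1, 1).
    labeled_emb = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
    labeled_y = [0, 0, 0, 1, 1, 1]
    # Pseudo-labels as a hypothetical base classifier might assign them.
    pseudo_emb = [[0.05, 0.05], [0.95, 0.95], [0.4, 0.55]]
    pseudo_y = [0, 0, 1]
    w = knn_confidence_weights(labeled_emb, labeled_y, pseudo_emb, pseudo_y, k=3)
    print(w)  # [1.0, 0.0, 0.3333333333333333]
    # Hypothetical model probabilities for the three pseudo-labeled sentences.
    probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
    print(weighted_cross_entropy(probs, pseudo_y, w))
```

In this toy run, the second sentence is pseudo-labeled 0 but all its labeled neighbors are class 1. It therefore gets weight 0.0 and contributes nothing to the loss. That is the damping of error propagation from unreliable pseudo-labels that the abstract describes. How the paper actually defines confidence, and how the two-stage correction uses it, is not specified here.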

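Cite this article:

Su Jing, Murtadha Ahmed. A semi-supervised model for sentence-level sentiment classification[J]. Journal of Chongqing University (重庆大学学报), 2024, 47(12): 100-113.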

History
  • Received: 2023-12-11
  • Published online: 2025-01-06