Pun Detection Based on Pseudo-Labels and Transfer Learning
DOI:
Authors:
Affiliations:

1. Guangzhou Key Laboratory of Multilingual Intelligent Processing, School of Information Science and Technology, Guangdong University of Foreign Studies; 2. Guangzhou City University of Technology; 3. School of Mechanical and Automotive Engineering, South China University of Technology; 4. Guangdong Industry Polytechnic

Author biography:

Corresponding author:

CLC number:

Funding:

Guangzhou Science and Technology Plan Project (202102020637); Guangzhou Science and Technology Plan Project (202002030227); Guangdong University of Foreign Studies Teacher-Student Cooperation Project (21SS10); Guangdong Province Soft Science Project (2019A101002108)

    Abstract:

    As an important branch of humor research, the study of puns has developed into an active research frontier. To address the shortage of pun samples, this paper proposes a pun detection model based on pseudo-labels and transfer learning (Pun Detection based on Pseudo-label and Transfer Learning). The model first uses contextual semantics, phoneme vectors, and an attention mechanism to generate pseudo-labels. It then combines transfer learning with confidence scores to select usable pseudo-labels. Finally, the pseudo-labeled data and the real data are mixed to train the network, and the pseudo-labeling and mixed-training steps are repeated. This alleviates, to some extent, the problem that pun samples are scarce and difficult to obtain. Pun detection experiments on the SemEval 2017 shared task 7 dataset and the Pun of The Day dataset show that the model outperforms existing mainstream pun detection methods on both.
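The loop the abstract describes (generate pseudo-labels, keep only the confident ones, retrain on the mixed data, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier is a toy stand-in for the paper's network built on contextual semantics, phoneme vectors, and attention; the transfer-learning part of the selection step is not modeled; and the names `fit_centroids`, `predict_proba`, `self_train`, the 2-D features, and the `threshold` value are all hypothetical.

```python
import math

def fit_centroids(xs, ys):
    """Toy stand-in for the network: one mean vector per class (0 = non-pun, 1 = pun)."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = tuple(sum(col) / len(points) for col in zip(*points))
    return centroids

def predict_proba(centroids, x):
    """Probability of class 1: softmax over negative squared distances to the centroids."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    z = max(-50.0, min(50.0, d1 - d0))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(z))

def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, rounds=3):
    """Repeat: pseudo-label the pool, keep confident labels, retrain on the mixed data."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = fit_centroids(train_x, train_y)
    for _ in range(rounds):
        kept, remaining = [], []
        for x in pool:
            p1 = predict_proba(model, x)
            label = 1 if p1 >= 0.5 else 0
            confidence = max(p1, 1.0 - p1)
            # Assumption: a fixed confidence threshold decides which pseudo-labels are usable.
            if confidence >= threshold:
                kept.append((x, label))
            else:
                remaining.append(x)
        if not kept:
            break  # nothing confident enough to add, so stop early
        train_x += [x for x, _ in kept]
        train_y += [y for _, y in kept]
        pool = remaining
        model = fit_centroids(train_x, train_y)  # retrain on real + pseudo-labeled data
    return model, train_x, train_y, pool

# Hypothetical 2-D features standing in for sentence representations.
labeled_x = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [(0.5, 0.5), (4.0, 5.0), (2.25, 2.25)]

model, train_x, train_y, pool = self_train(labeled_x, labeled_y, unlabeled_x)
print(len(train_x), "training examples;", len(pool), "left unlabeled")
```

In this toy run the two unlabeled points near a class centroid receive pseudo-labels and join the training set, while the point midway between the classes never reaches the threshold and stays in the pool. A higher threshold admits fewer but cleaner pseudo-labels; a lower one grows the training set faster at the risk of reinforcing early mistakes.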

History
  • Received: 2021-06-19
  • Last revised: 2021-10-29
  • Accepted: 2021-11-24
  • Published online:
  • Publication date: