High-Dynamic Noise Robust Speech Recognition Model for Intelligent Inspection in Wind Farms

Affiliation:

1. Longyuan (Beijing) New Energy Engineering Technology Co., Ltd.; 2. iFLYTEK Research; 3. School of Microelectronics and Communication Engineering, Chongqing University

Fund Project:

Supported by the National Key Research and Development Program of China (2022YFF0608503)

    Abstract:

    Existing speech recognition technologies suffer from low recognition accuracy, difficulty recognizing specialized terminology, and the limited computing power of terminal devices in the high-dynamic noise environment of wind farms. To address these problems, this paper proposes a High-Dynamic Noise Robust Speech Recognition Model (HDNR-SRM). The model combines Cross-Parallel Transformer Neural Networks (CPTNN) with an Enhanced Wav2Vec 2.0 model (EW2). A CPTNN-based time-domain speech enhancement model generates dynamic masks that separate speech from noise; the quantization codebook of the EW2 model is fine-tuned on a wind farm inspection corpus to make the embedded representations of specialized vocabulary more discriminative; and lightweight designs such as a shared feature encoder and a cross-parallel attention mechanism reduce algorithmic complexity and substantially cut the parameter count, so that the model can meet the real-time computing requirements of terminal devices. Experimental results show that, at a signal-to-noise ratio of -10 dB in simulated and real high-dynamic noise environments, the character error rates of HDNR-SRM are 16.8% and 18.5%, respectively, which are 6.4%-16.7% and 7.1%-17.2% lower than those of the comparison models; the sentence error rates are 41.3% and 45.2%, respectively, which are 8.9%-23.4% and 7.9%-23.1% lower than those of the comparison models. In real wind farm inspection scenarios, over 6000 hours of on-site verification, the model recognized specialized terminology with 95.4% accuracy, increased work-order entry efficiency by 38.5%, and reduced the manual recording error rate by 42.7%. The model shows clear advantages in noise robustness, domain adaptability, and lightweight deployment, and provides reliable technical support for the intelligent inspection of wind farms.
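The abstract reports results as character error rate (CER) and sentence error rate (SER). For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed. It is an illustration only, not the authors' evaluation code: the function names and the two example sentences are made up for this sketch, and the paper's exact text normalization (for example, how punctuation and whitespace are handled) is not stated in the abstract.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions, insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference symbols into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return edits / total


def ser(refs: List[str], hyps: List[str]) -> float:
    """SER: fraction of sentences with at least one recognition error."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


# Hypothetical inspection phrases (not taken from the paper's corpus).
refs = ["检查齿轮箱油位", "偏航系统正常"]
hyps = ["检查齿轮箱油温", "偏航系统正常"]  # one substituted character in sentence 1

print(f"CER = {cer(refs, hyps):.4f}")  # 1 edit over 13 reference characters
print(f"SER = {ser(refs, hyps):.2f}")  # 1 of 2 sentences contains an error
```

Because a single wrong character marks the whole sentence as incorrect, SER is always at least as harsh as CER on the same output, which is consistent with the abstract reporting SER values (41.3%, 45.2%) well above the CER values (16.8%, 18.5%).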

History
  • Received: 2025-04-14
  • Last revised: 2025-04-25
  • Accepted: 2025-06-16