High-Dynamic Noise Robust Speech Recognition Model for Intelligent Inspection in Wind Farms
Affiliation:

1. Longyuan (Beijing) New Energy Engineering Technology Co. Ltd.; 2. iFLYTEK Research; 3. School of Microelectronics and Communication Engineering, Chongqing University


Abstract:

Existing speech recognition technologies face three problems in the high-dynamic noise environment of wind farms: low recognition accuracy, difficulty recognizing specialized terminology, and the limited computing power of terminal devices. To address these issues, this paper proposes a High-Dynamic Noise Robust Speech Recognition Model (HDNR-SRM), which integrates Cross-Parallel Transformer Neural Networks (CPTNN) with Enhanced Wav2Vec 2.0 (EW2). First, a CPTNN-based time-domain speech enhancement model generates dynamic masks that separate speech from noise. Second, the quantized codebook of the EW2 model is fine-tuned on a wind farm inspection corpus to improve the discriminability of the embedded representations of specialized vocabulary. Third, lightweight designs such as shared feature encoders and cross-parallel attention mechanisms reduce algorithmic complexity and significantly cut the parameter count, so that the model meets the real-time computing requirements of terminal devices. Experimental results show that at a signal-to-noise ratio of -10 dB, in simulated and real high-dynamic noise environments respectively, HDNR-SRM achieves character error rates of 16.8% and 18.5%, which are 6.4%-16.7% and 7.1%-17.2% lower than those of the comparison models, and sentence error rates of 41.3% and 45.2%, which are 8.9%-23.4% and 7.9%-23.1% lower. In wind farm inspection scenarios, after 6000 hours of on-site verification, the model reaches a recognition accuracy of 95.4% for specialized terminology, increases work efficiency by 38.5%, and reduces the manual recording error rate by 42.7%. The model shows clear advantages in noise robustness, domain adaptability, and lightweight deployment, providing reliable technical support for the intelligent inspection of wind farms.
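The abstract reports results as character error rate and sentence error rate at a signal-to-noise ratio of -10 dB, but does not spell out how these quantities are computed. The Python sketch below shows the conventional definitions under stated assumptions: noisy test audio is produced by scaling a noise signal and adding it to clean speech so the mixture hits a target SNR; character error rate is the character-level Levenshtein distance (substitutions + deletions + insertions) summed over the test set and divided by the total number of reference characters; sentence error rate is the fraction of sentences containing at least one error. The function names (`mix_at_snr`, `edit_distance`, `corpus_cer`, `corpus_ser`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> list[float]:
    """Add scaled noise to speech so the mixture has the target SNR in dB (assumed additive-noise model)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    n = len(speech)
    p_speech = sum(x * x for x in speech) / n
    p_noise = sum(v * v for v in noise[:n]) / n
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)), solved for gain
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def corpus_ser(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of sentences whose hypothesis differs from the reference."""
    return sum(1 for r, h in zip(refs, hyps) if r != h) / len(refs)
```

For example, with references `["check gearbox", "blade ok"]` and hypotheses `["check gearbax", "blade ok"]`, `corpus_cer` counts one substitution over 21 reference characters (spaces are counted as characters in this toy example), while `corpus_ser` is 0.5 because one of the two sentences contains an error. This illustrates why sentence error rates in the abstract are much higher than character error rates: a single wrong character makes the whole sentence count as an error.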

History
  • Received: April 14, 2025
  • Revised: April 25, 2025
  • Adopted: June 16, 2025