Abstract: Existing speech recognition technologies suffer from low recognition accuracy, difficulty recognizing specialized terminology, and the limited computational power of terminal devices in the high-dynamic noise environment of wind farms. To address these issues, this paper proposes a High-Dynamic Noise Robust Speech Recognition Model (HDNR-SRM) that integrates Cross-Parallel Transformer Neural Networks (CPTNN) and Enhanced Wav2Vec 2.0 (EW2). The model employs a CPTNN-based time-domain speech enhancement model to generate dynamic masks that separate speech from noise; fine-tunes the quantized codebook of the EW2 model on the wind farm inspection corpus to make the embedded representations of specialized vocabulary more discriminative; and reduces algorithmic complexity through lightweight designs such as shared feature encoders and cross-parallel attention mechanisms, significantly reducing the parameter count to meet the real-time computational requirements of terminal devices. Experimental results show that, at a signal-to-noise ratio of -10 dB in simulated and real high-dynamic noise environments, HDNR-SRM achieves character error rates of 16.8% and 18.5%, respectively, which are 6.4%-16.7% and 7.1%-17.2% lower than those of the comparison models, and sentence error rates of 41.3% and 45.2%, respectively, which are 8.9%-23.4% and 7.9%-23.1% lower than those of the comparison models. In wind farm inspection scenarios, after 6,000 hours of on-site verification, the model recognizes specialized terminology with 95.4% accuracy, increases work efficiency by 38.5%, and reduces the manual recording error rate by 42.7%. The model shows significant advantages in noise robustness, domain adaptability, and lightweight deployment, providing reliable technical support for the intelligent inspection of wind farms.
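For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how character error rate (CER) and sentence error rate (SER) are conventionally computed. It assumes the standard definitions (total character-level edit distance divided by total reference characters, and the fraction of sentences containing at least one error); the abstract does not state the exact scoring procedure used, and the function names and sample transcripts are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (character level)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Total edit distance divided by total reference characters (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def sentence_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Fraction of sentences whose hypothesis differs from the reference."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


# Hypothetical inspection transcripts, for illustration only.
refs = ["check gearbox oil", "yaw bearing"]
hyps = ["check gearbox oil", "yaw baring"]
cer = character_error_rate(refs, hyps)
ser = sentence_error_rate(refs, hyps)
```

Because a single wrong character marks the whole sentence as incorrect, SER is always at least as pessimistic as CER on the same data, which is consistent with the reported sentence error rates being much higher than the character error rates.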