[Keywords]
[Abstract]
To improve the accuracy of urban sound classification and make models easier to deploy, an urban sound classification method based on an improved dual-channel one-dimensional convolutional neural network is proposed. First, the Fbank features of the audio are flattened along two different directions, the time-frame direction and the Mel-band direction, to obtain one-dimensional data. Second, the two-dimensional convolutions in the AlexNet model are replaced with one-dimensional convolutions and the model structure is modified: for each flattening direction, the receptive field of the first convolution is enlarged accordingly and the convolution stride is increased to reduce the amount of feature data. Finally, a dual-channel convolutional neural network model is designed using the improved AlexNet model and decision fusion. To verify the effectiveness of the method, urban sound classification experiments were carried out on the UrbanSound8K dataset. The results show that the method achieves a classification accuracy of 96.76% and effectively reduces model size, which facilitates application in scenarios with limited storage and computing resources.
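The two flattening directions and the decision-fusion step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy matrix shape, and in particular the fusion rule (a plain average of the two channels' class probabilities), since the abstract does not state which fusion rule the authors use.

```python
# Illustrative sketch only: names, shapes, and the averaging rule are assumptions,
# not the authors' implementation.

def flatten_by_time(fbank):
    """Concatenate frames in order: all bands of frame 0, then frame 1, and so on."""
    return [value for frame in fbank for value in frame]

def flatten_by_band(fbank):
    """Concatenate Mel bands in order: band 0 over all frames, then band 1, and so on."""
    return [value for band in zip(*fbank) for value in band]

def fuse_decisions(probs_a, probs_b):
    """Average two per-class probability vectors; return (best class index, fused vector)."""
    fused = [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused

# Toy Fbank matrix: 3 time frames (rows) x 2 Mel bands (columns).
fbank = [[1.0, 2.0],
         [3.0, 4.0],
         [5.0, 6.0]]

time_major = flatten_by_time(fbank)   # input for one channel
band_major = flatten_by_band(fbank)   # input for the other channel

# Hypothetical softmax outputs of the two channels for a 3-class toy problem.
label, fused = fuse_decisions([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
```

Both flattened sequences contain the same values in a different order, so each one-dimensional network sees different neighbouring elements under its first convolution kernel; this is presumably why the abstract adjusts the first receptive field per flattening direction.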
[CLC number]
[Funding]
Fundamental Research Program of Shanxi Province (Youth Fund) (20210302124544); Applied Basic Research Program of Shanxi Province (201901D111094)