Urban sound classification method based on improved dual-channel 1DCNN
Author:
Affiliation:

1. College of Electronic Information and Optical Engineering, Taiyuan University of Technology; 2. Shanxi Academy of Advanced Research and Innovation

Fund Project:

Natural Science Foundation of Shanxi Province (Youth Fund) (20210302124544); Applied Basic Research Project of Shanxi Province (201901D111094)

    Abstract:

    A new urban sound classification method based on an improved dual-channel one-dimensional convolutional neural network (1DCNN) is proposed to improve classification accuracy while keeping the model easy to deploy. First, the Fbank features of the audio are flattened along two different directions, across time frames and across Mel frequency bands, to obtain one-dimensional data. Second, the two-dimensional convolutions of the AlexNet model are replaced with one-dimensional convolutions and the network structure is modified: for each flattening direction, the receptive field of the first convolution is enlarged and its stride is increased accordingly to reduce the amount of feature data. Finally, a dual-channel convolutional neural network is designed from the two modified AlexNet branches and a decision-fusion strategy. To verify the effectiveness of the method, urban sound classification experiments were carried out on the UrbanSound8K dataset. The results show that the proposed method reaches a classification accuracy of 96.76% and effectively reduces the model size, making it suitable for scenarios with limited storage and computing resources.
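    The two flattening directions described above can be illustrated with a short sketch. The snippet below is not the authors' code: it assumes a log-Mel filter-bank (Fbank) front end computed with torchaudio's Kaldi-compatible routine, placeholder frame settings and file name, and one particular assignment of row-wise versus column-wise unrolling to the "time-frame" and "Mel-band" directions.

import torch
import torchaudio

# Load a clip (placeholder path) and compute log Mel filter-bank (Fbank) features.
# Frame settings and the number of Mel bins are illustrative, not the paper's.
waveform, sr = torchaudio.load("siren.wav")                     # (channels, samples)
fbank = torchaudio.compliance.kaldi.fbank(
    waveform, sample_frequency=sr, num_mel_bins=64,
    frame_length=25.0, frame_shift=10.0
)                                                               # (n_frames, n_mel_bins)

# Channel 1: flatten frame by frame (time-frame direction) -> (1, 1, n_frames * 64)
x_time = fbank.reshape(1, 1, -1)

# Channel 2: flatten band by band (Mel-band direction) -> (1, 1, 64 * n_frames)
x_mel = fbank.t().reshape(1, 1, -1)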
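    Replacing AlexNet's two-dimensional convolutions with one-dimensional ones and enlarging the first layer's kernel and stride can likewise be sketched. The PyTorch module below is a hypothetical layout assembled from the abstract's description only; the channel widths, kernel sizes, strides, and classifier sizes are placeholders, not the paper's reported configuration.

import torch
import torch.nn as nn

class AlexNet1D(nn.Module):
    """AlexNet-style branch with 1-D convolutions (illustrative layer sizes)."""

    def __init__(self, num_classes: int = 10, first_kernel: int = 96, first_stride: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            # Enlarged receptive field and stride in the first convolution to cut
            # the amount of one-dimensional feature data early on.
            nn.Conv1d(1, 64, kernel_size=first_kernel, stride=first_stride), nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=3, stride=2),
            nn.Conv1d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=3, stride=2),
            nn.Conv1d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=3, stride=2),
        )
        self.pool = nn.AdaptiveAvgPool1d(6)   # fixed-length output regardless of clip length
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6, 1024), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(1024, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, length)
        x = self.pool(self.features(x))
        return self.classifier(torch.flatten(x, 1))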
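    Finally, the dual-channel decision fusion can be sketched as two branches scoring the same clip and their class probabilities being combined. Simple equal-weight averaging is used below as a stand-in for the fusion rule; it builds on the AlexNet1D sketch above and the x_time / x_mel inputs from the first sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_predictions(branch_time: nn.Module, branch_mel: nn.Module,
                     x_time: torch.Tensor, x_mel: torch.Tensor) -> torch.Tensor:
    """Average the softmax outputs of the two branches and return class indices."""
    with torch.no_grad():
        p_time = F.softmax(branch_time(x_time), dim=1)  # probabilities from the time-frame branch
        p_mel = F.softmax(branch_mel(x_mel), dim=1)     # probabilities from the Mel-band branch
        fused = 0.5 * (p_time + p_mel)                  # decision-level fusion (equal weights assumed)
    return fused.argmax(dim=1)

# Usage with the sketches above:
# branch_time, branch_mel = AlexNet1D(), AlexNet1D()
# labels = fuse_predictions(branch_time, branch_mel, x_time, x_mel)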

    References
    [1] Ye J, Kobayashi T, Murakawa M. Urban sound event classification based on local and global features aggregation[J]. Applied Acoustics, 2017, 117: 246-256.
    [2] Bai J, Chen C, Chen J F. A multi-feature fusion based method for urban sound tagging[C]//2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Piscataway: IEEE, 2019: 1313-1317.
    [3] Bai J S, Chen J F, Wang M. Multimodal urban sound tagging with spatiotemporal context[J]. IEEE Transactions on Cognitive and Developmental Systems, 2023, 15(2): 555-565.
    [4] Salamon J, Jacoby C, Bello J P. A dataset and taxonomy for urban sound research[C]//Proceedings of the 2014 ACM Conference on Multimedia. New York: ACM, 2014: 1041-1044.
    [5] Liu F L, Li W H, Gong W G. Deformable feature map residual network for urban sound recognition[J]. Journal of Computer-Aided Design & Computer Graphics, 2020, 32(11): 1853-1862. (in Chinese)
    [6] Cao Y, Fei H B, Li P, et al. Acoustic scene classification method based on multi-stream convolution and data augmentation[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2022, 50(4): 40-46. (in Chinese)
    [7] Tran T, Pham N T, Lundgren J. A deep learning approach for detecting drill bit failures from a small sound dataset[J]. Scientific Reports, 2022, 12(1): 1-13.
    [8] Sinha H, Awasthi V, Ajmera P K. Audio classification using braided convolutional neural networks[J]. IET Signal Processing, 2020, 14(7): 448-454.
    [9] Park H, Yoo C D. CNN-based learnable gammatone filterbank and equal-loudness normalization for environmental sound classification[J]. IEEE Signal Processing Letters, 2020, 27: 411-415.
    [10] Ragab M G, Abdulkadir S J, Aziz N, et al. An ensemble one-dimensional convolutional neural network with Bayesian optimization for environmental sound classification[J/OL]. Applied Sciences, 2021, 11(10)[2022-10-18]. https://doi.org/10.3390/app11104660.
    [11] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
    [12] Hershey S, Chaudhuri S, Ellis D P W, et al. CNN architectures for large-scale audio classification[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2017: 131-135.
    [13] Wang W B, Zhang B, Zeng W R, et al. Power quality disturbance classification of one-dimensional convolutional neural networks based on feature fusion[J]. Power System Protection and Control, 2020, 48(6): 53-60. (in Chinese)
    [14] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning. Cambridge: JMLR, 2015: 448-456.
    [15] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
    [16] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J/OL]. arXiv, 2014: 1409.1556[2022-10-18]. http://arxiv.org/abs/1409.1556.
    [17] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
    [18] Huang P, Qiu W N. A robust decision fusion strategy for SAR target recognition[J]. Remote Sensing Letters, 2018, 9(6): 507-514.
History
  • Received: 2023-12-05
  • Revised: 2024-01-06
  • Accepted: 2024-02-22