Voiceprint recognition based on knowledge distillation and ResNet

Authors: RONG Yujun¹, FANG Yifan², TIAN Peng², CHENG Jiawei²
Affiliations: 1. China Mobile (Hangzhou) Information Technology Co., Ltd.; 2. College of Automation, Chongqing University of Posts and Telecommunications
CLC number: TP751
Fund project: Supported by the Ministry of Education-China Mobile Research Fund (MCM20180404).

Voiceprint recognition based on knowledge distillation and ResNet

Author:

RONG Yujun¹
FANG Yifan²
TIAN Peng²
CHENG Jiawei²

Affiliation:

1.China Mobile Hangzhou Informat Technol Co Ltd;2.Chongqing Univ Posts Telecommun,Coll Automat

Fund Project:

Ministry of Education China Mobile Research Fund


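Abstract:

To address channel mismatch and the incomplete extraction of voiceprint features from short utterances or under noisy conditions in voiceprint recognition, this paper proposes a method that combines a traditional approach with deep learning: an I-Vector model serves as the teacher model for knowledge distillation into a ResNet student model. A ResNet network based on metric learning is constructed, and an attentive statistics pooling layer is introduced to capture and emphasize the important information in voiceprint features and improve their distinguishability. A joint training loss function is designed that combines the mean square error (MSE) with a metric-learning-based loss, reducing computational complexity and strengthening the model's learning capability. Finally, the trained model is tested on voiceprint recognition; compared with voiceprint recognition models built on several deep learning methods, it achieves the lowest equal error rate (EER), 3.229%, indicating that the model performs voiceprint recognition more effectively.

Keywords: deep learning; knowledge distillation; voiceprint recognition; speaker recognition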
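
The abstract states that the joint training loss combines MSE with a metric-learning-based loss but gives no formula. The following is a minimal sketch under stated assumptions, not the authors' implementation: the two terms are combined as a weighted sum with a hypothetical weight `alpha`, the MSE is taken between the student embedding and the teacher I-Vector of the same utterance, and a cosine triplet hinge loss stands in for the unspecified metric-learning term. All function names and default values are illustrative.

```python
import math


def mse(student, teacher):
    """Mean squared error between a student embedding and a teacher i-vector."""
    if len(student) != len(teacher):
        raise ValueError("embeddings must have the same dimension")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def triplet_cosine_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss asking cos(anchor, positive) to exceed cos(anchor, negative) by margin.

    Assumption: a stand-in for the metric-learning loss, which the abstract does not name.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def joint_loss(anchor, positive, negative, teacher_ivector, alpha=0.5, margin=0.2):
    """Weighted sum of the distillation (MSE) term and the metric-learning term.

    Assumption: alpha and the weighted-sum form are hypothetical.
    """
    distill = mse(anchor, teacher_ivector)
    metric = triplet_cosine_loss(anchor, positive, negative, margin)
    return alpha * distill + (1.0 - alpha) * metric


if __name__ == "__main__":
    student = [1.0, 0.0]        # student embedding of the anchor utterance
    same_speaker = [0.9, 0.1]   # embedding of another utterance, same speaker
    other_speaker = [0.0, 1.0]  # embedding of a different speaker
    teacher = [0.8, 0.1]        # teacher i-vector for the anchor utterance
    print(joint_loss(student, same_speaker, other_speaker, teacher))
```

In this sketch the distillation term pulls the student embedding toward the teacher's I-Vector, while the metric term separates speakers in the student's embedding space; the student and teacher embeddings are assumed to share the same dimension.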

Abstract:

Channel mismatch and incomplete acquisition of voiceprint features under short-speech or noisy conditions are two thorny problems in voiceprint recognition. This paper proposes a solution that combines traditional techniques with deep learning. An I-Vector model was used as the teacher model to distill knowledge into a ResNet student model. A ResNet network based on metric learning was constructed, with an attentive statistics pooling layer to capture and emphasize the critical information in voiceprint features and improve their distinguishability. The mean square error (MSE) was combined with a metric-learning-based loss in a joint training loss to reduce computational complexity and enhance the model's learning capability. The trained model was then evaluated on a voiceprint recognition test. Compared with voiceprint recognition models based on various deep learning methods, it achieved the lowest equal error rate (EER), 3.229%, indicating that the model performs voiceprint recognition more effectively.

Key words: deep learning; knowledge distillation; voiceprint recognition; speaker verification
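
The headline metric in the abstract is the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure can be computed from verification scores, the function below sweeps every observed score as a threshold and returns the point where the two rates are closest. The function name, the "accept when score >= threshold" convention, and the sample scores are assumptions for illustration; they are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for a set of verification trial scores.

    A trial is accepted when score >= threshold.
    FAR: fraction of non-target (impostor) trials accepted.
    FRR: fraction of target (same-speaker) trials rejected.
    The EER is approximated as the mean of FAR and FRR at the threshold
    where their absolute difference is smallest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best = None  # (gap, eer, threshold)
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, thr)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical similarity scores, for illustration only.
    targets = [0.9, 0.8, 0.7, 0.4]
    nontargets = [0.6, 0.3, 0.2, 0.1]
    eer, thr = equal_error_rate(targets, nontargets)
    print(f"EER = {eer:.3%} at threshold {thr}")
```

On this scale, the reported EER of 3.229% corresponds to a value of 0.03229 returned by the function.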


History

Received: 2021-07-05
Last revised: 2021-08-05
Accepted: 2021-08-30
