An i-vector speaker recognition method based on spectrogram features

Fund project: Supported by the Chongqing Municipal Education Achievement Transformation Fund (KJZH14207).

    Abstract:

    To address the incomplete speech information that results when mel-frequency cepstrum coefficients (MFCC) are used as the input features for identity vector (i-vector) speaker recognition, an i-vector speaker recognition method based on spectrogram features is proposed. The speech signal is first pre-emphasized, framed and windowed, and then converted into a spectrogram by the short-time Fourier transform. The spectrogram is submitted to a Gaussian universal background model, and a suitable low-dimensional linear subspace manifold structure is selected within the high-dimensional mean supervector space to construct normally distributed vectors that serve as i-vectors. The resulting i-vectors are reduced in dimension by linear discriminant analysis and stored. Finally, the log-likelihood ratio (LLR) method is used to score the i-vectors from the training and testing stages, completing speaker recognition. Numerical experiments on the TIMIT database show that the proposed method achieves a lower equal error rate (EER) than a recognition method that uses MFCC as features.
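
The front-end steps named in the abstract (pre-emphasis, framing, windowing, short-time Fourier transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `log_spectrogram`, the 0.97 pre-emphasis coefficient, the 25 ms / 10 ms frame and hop lengths, the Hamming window, and the 512-point FFT are all assumptions chosen as common defaults, since the abstract does not state these values. The 16 kHz default matches the TIMIT sampling rate.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16000, frame_ms=25.0, hop_ms=10.0,
                    pre_emphasis=0.97, n_fft=512):
    """Return a log-power spectrogram of shape (n_frames, n_fft // 2 + 1).

    All default parameter values are illustrative assumptions, not values
    taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)

    # Pre-emphasis: y[n] = x[n] - a * x[n-1], boosting high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))

    # Zero-pad very short signals so that at least one full frame exists.
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))

    # Framing: build an index matrix of overlapping frames.
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    idx = (np.arange(frame_len)[None, :]
           + hop_len * np.arange(n_frames)[:, None])

    # Windowing: taper each frame with a Hamming window.
    frames = emphasized[idx] * np.hamming(frame_len)

    # Short-time Fourier transform: one real FFT per frame.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    power = np.abs(spectrum) ** 2 / n_fft

    # Log compression; the small constant avoids log(0).
    return 10.0 * np.log10(power + 1e-10)


if __name__ == "__main__":
    t = np.arange(16000) / 16000.0          # one second at 16 kHz
    tone = np.sin(2 * np.pi * 1000.0 * t)   # 1 kHz test tone
    spec = log_spectrogram(tone)
    print(spec.shape)
```

With these assumed settings, a one-second signal at 16 kHz yields 98 frames of 257 frequency bins each. In the pipeline described by the abstract, such frames would take the place of MFCC vectors as input to the Gaussian universal background model.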

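Cite this article

FENG Huizong, WANG Yunfang. An i-vector speaker recognition method based on spectrogram [J]. Journal of Chongqing University, 2017, 40(5): 88-94.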

History
  • Received: 2016-10-21
  • Published online: 2017-06-03