The Journal of the Acoustical Society of Japan

Integrating pitch and LPC-residual information with LPC-cepstrum for text-independent speaker recognition



Abstract

In speaker recognition, when cepstral coefficients are calculated from the LPC analysis parameters, the prediction error, or LPC residual signal, is usually ignored. However, there is evidence that it contains speaker-specific information. The fundamental frequency of the speech signal, or pitch, which is usually extracted from the LPC residual, has been used for speaker recognition, but because of its high intraspeaker variability it is also often ignored. This paper describes our approach to integrating the pitch and LPC-residual with the LPC-cepstrum in a Gaussian Mixture Model (GMM) based speaker recognition system. The pitch and/or LPC-residual are treated as additional features alongside the main LPC-derived cepstral coefficients and are represented as the logarithm of F_0 and as a filter-bank mel-frequency cepstral coefficient (MFCC) vector, respectively. The second task of this research was to verify whether the correlation between the different information sources is useful for speaker recognition. For the experiments we used the NTT database, which consists of high-quality speech samples. The speaker recognition system was evaluated in three modes: integrating only the pitch, only the LPC-residual, or both. The results showed that adding the pitch gives a significant improvement only when the correlation between the pitch and the cepstral coefficients is used. Adding only the LPC-residual also gives a significant improvement but, in contrast to the pitch, using its correlation with the cepstral coefficients has little effect. The best results were achieved using both the pitch and the LPC-residual: a 98.5% speaker identification rate and a 0.21% speaker verification equal error rate, compared with 97.0% and 1.07%, respectively, for the baseline system.
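The integration idea in the abstract can be illustrated with a small sketch. It appends the logarithm of F_0 to a cepstral vector and compares two ways of scoring a frame: treating the cepstral and pitch streams as independent, or modelling them jointly with a cross-covariance term. This is only one plausible reading of "using the correlation". The function names, the two-dimensional toy setup, and the use of a single Gaussian in place of a full GMM are assumptions made for illustration, not the authors' implementation.

```python
import math


def augment_frame(cepstrum, f0_hz):
    """Append log F0 to a cepstral vector (hypothetical helper).

    Only voiced frames are handled here; how unvoiced frames are
    treated is not stated in the abstract.
    """
    if f0_hz <= 0:
        raise ValueError("F0 must be positive for a voiced frame")
    return list(cepstrum) + [math.log(f0_hz)]


def log_gauss_1d(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_gauss_2d(x, y, mx, my, vx, vy, cov):
    """Log density of a bivariate Gaussian with cross-covariance `cov`."""
    det = vx * vy - cov * cov
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x - mx, y - my
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (vy * dx * dx - 2 * cov * dx * dy + vx * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


def score_independent(c, log_f0, mc, mf, vc, vf):
    """Score the two streams separately and add the log-likelihoods."""
    return log_gauss_1d(c, mc, vc) + log_gauss_1d(log_f0, mf, vf)


def score_joint(c, log_f0, mc, mf, vc, vf, cov):
    """Score both streams with one density that models their correlation."""
    return log_gauss_2d(c, log_f0, mc, mf, vc, vf, cov)
```

With `cov` set to zero, the joint score reduces to the independent one, so any difference between the two modes comes entirely from the cross-covariance term. In a real system each speaker model would be a mixture of such densities over the full feature vector.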
