IEICE Transactions on Information and Systems

Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

Abstract

In this paper, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine it with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. MFCCs predominantly capture vocal tract information: they use only the magnitude of the Fourier transform of time-domain speech frames, and the phase information is ignored. Because the phase information contains rich voice source information, it is expected to be highly complementary to MFCCs. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in phase caused by the clipping position of the input speech, and the combination of the phase information and MFCCs performed remarkably better than MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. To analyze the effect of the phase information and MFCCs in noisy conditions, we use spectral subtraction, a method that skips frames with low energy or low signal-to-noise ratio (SNR), and models trained on noisy speech. The proposed method was evaluated on the NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary/non-stationary noise added. MFCCs outperformed the phase information on clean speech. On noisy speech, however, the degradation of the phase information was significantly smaller than that of MFCCs. With models trained on clean speech, the phase information alone even outperformed MFCCs in many cases. Deleting unreliable frames (frames with low energy or low SNR) improved speaker identification performance significantly.
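The abstract does not give the phase normalization formula. The sketch below illustrates one way to remove the dependence on clipping position, under an explicit assumption: moving the clipping position by tau samples acts like a time shift, which adds a phase term linear in frequency (-2*pi*k*tau/N for DFT bin k). Forcing the phase at a base bin to a fixed target and applying the proportional correction to the other bins cancels that term. The function name, `base_bin`, and the target value `pi/4` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def normalized_phase_features(frame, bins, base_bin=1, target=np.pi / 4):
    """Hypothetical sketch of clipping-position phase normalization.

    Assumption: a change of clipping position behaves like a time shift,
    adding a phase term that is linear in frequency. Setting the phase at
    `base_bin` to `target` and shifting every other bin proportionally
    cancels that linear term (exactly, modulo 2*pi, for bins that are
    integer multiples of `base_bin`).
    """
    bins = np.asarray(bins)
    theta = np.angle(np.fft.fft(frame))
    # Linear-in-frequency correction that maps theta[base_bin] to `target`
    correction = (bins / base_bin) * (target - theta[base_bin])
    theta_norm = theta[bins] + correction
    # Represent each phase on the unit circle so 2*pi wrapping is irrelevant
    return np.concatenate([np.cos(theta_norm), np.sin(theta_norm)])
```

With `base_bin=1`, a circularly shifted copy of a frame (`np.roll(frame, 7)`) yields the same features even though its raw phases differ. In a real front end the frame would be windowed and clipping also changes the samples at the frame edges, so the invariance is only approximate.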
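Spectral subtraction is one of the noise-handling methods the abstract lists. The sketch below is generic power spectral subtraction on a frames-by-bins magnitude array; it assumes (hypothetically, not from the paper) that the first few frames are speech-free and can serve as the noise estimate, and that an over-subtraction factor `alpha` and a spectral floor `floor` are used.

```python
import numpy as np

def spectral_subtraction(noisy_mag, n_noise_frames=5, alpha=1.0, floor=0.01):
    """Power spectral subtraction on a (frames x bins) magnitude array.

    Assumptions (illustrative only): the first `n_noise_frames` frames
    contain noise only; `alpha` is the over-subtraction factor and
    `floor` sets the minimum output magnitude relative to the input.
    """
    noisy_pow = np.asarray(noisy_mag, dtype=float) ** 2
    noise_pow = noisy_pow[:n_noise_frames].mean(axis=0)  # per-bin noise power
    clean_pow = noisy_pow - alpha * noise_pow
    # Clamp to a small fraction of the noisy power to avoid negative power
    clean_pow = np.maximum(clean_pow, (floor ** 2) * noisy_pow)
    return np.sqrt(clean_pow)
```

Because subtraction only modifies the magnitude, a common choice is to resynthesize with the noisy phase; how the paper handled phase-based features after spectral subtraction is not stated in the abstract.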
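The abstract reports that skipping unreliable frames (low energy or low SNR) improves identification but does not give the selection rule. The sketch below assumes a simple one: frame energy is the sum of squared samples, the noise energy is estimated as a low percentile of all frame energies, and frames whose estimated SNR falls below a threshold are dropped. The function name, the 5 dB threshold, and the 10th-percentile noise estimate are all hypothetical.

```python
import numpy as np

def select_reliable_frames(frames, snr_threshold_db=5.0, noise_percentile=10.0):
    """Keep frames whose estimated SNR is at least `snr_threshold_db`.

    frames: array of shape (n_frames, frame_length).
    Assumptions (illustrative only): energy = sum of squared samples and
    noise energy = a low percentile of the frame energies.
    Returns the kept frames and the boolean mask of kept frames.
    """
    energy = np.sum(np.asarray(frames, dtype=float) ** 2, axis=1) + 1e-12  # avoid log(0)
    noise_energy = np.percentile(energy, noise_percentile)
    snr_db = 10.0 * np.log10(energy / noise_energy)
    mask = snr_db >= snr_threshold_db
    return frames[mask], mask
```

The kept frames would then be the only ones scored against the speaker models, so low-SNR frames, where both MFCCs and phase are least reliable, no longer contribute to the decision.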
Integrating the phase information with MFCCs reduced the speaker identification error rate by about 30% to 60% relative to the standard MFCC-based method.
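The abstract does not say how the two features are integrated. A common approach, assumed here, is score-level fusion: a weighted sum of the per-speaker log-likelihoods from an MFCC-based model and a phase-based model, followed by choosing the best-scoring speaker. The function names, the weight `alpha`, and the array layout are hypothetical. The second helper computes a relative error reduction, assuming that is what "error reduction rate" means here.

```python
import numpy as np

def fuse_and_identify(ll_mfcc, ll_phase, alpha=0.5):
    """Hypothetical score-level fusion of per-speaker log-likelihoods.

    ll_mfcc, ll_phase: arrays of shape (n_test_utterances, n_speakers).
    alpha: weight on the phase score (an assumed tuning parameter).
    Returns the index of the identified speaker for each utterance.
    """
    fused = (1.0 - alpha) * np.asarray(ll_mfcc) + alpha * np.asarray(ll_phase)
    return np.argmax(fused, axis=1)

def relative_error_reduction(baseline_error, new_error):
    """Fraction by which `new_error` reduces `baseline_error`."""
    return (baseline_error - new_error) / baseline_error
```

Under this definition, with a hypothetical baseline error rate of 10%, a 30% to 60% relative reduction would correspond to error rates of 7% to 4%.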