Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

Longbiao WANG; Kazue MINAMI; Kazumasa YAMAMOTO; Seiichi NAKAGAWA

Abstract

In this paper, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine it with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. MFCCs mainly capture vocal tract information. They use only the magnitude of the Fourier transform of time-domain speech frames and ignore the phase information. Because the phase carries rich voice source information, it is expected to be highly complementary to MFCCs. Furthermore, several studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the phase variation that depends on the clipping position of the input speech. The combination of this phase information with MFCCs performed remarkably better than MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. We use three techniques to analyze the effect of the phase information and MFCCs in noisy conditions: spectral subtraction, a method that skips frames with low energy or low signal-to-noise (SN) ratio, and models trained on noisy speech. We evaluated the proposed method on the NTT database and the JNAS (Japanese Newspaper Article Sentences) database with added stationary and non-stationary noise. For clean speech, MFCCs outperformed the phase information. For noisy speech, however, the phase information degraded significantly less than MFCCs. With models trained on clean speech, the phase information alone even outperformed MFCCs in many cases. Deleting unreliable frames (frames with low energy or low SN ratio) improved speaker identification performance significantly.
By integrating the phase information with MFCCs, we reduced the speaker identification error rate by about 30%-60% relative to the standard MFCC-based method.
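The abstract points out two properties of the Fourier transform that motivate the phase feature. First, MFCCs keep only the magnitude. Second, the raw phase changes with the position at which a frame is clipped from the signal. The sketch below illustrates the idea under stated assumptions and is not the authors' published algorithm. It computes a naive DFT. It then removes the frame-position dependence with a phase correction that is linear in frequency and forces a chosen base bin to a phase of π/4. Finally, it maps each phase to (cos, sin) so that angles differing by 2π coincide. The function names, the base-bin choice and the π/4 target are illustrative assumptions.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def normalized_phase(frame, base_bin, bins):
    """Hypothetical position-normalized phase feature (illustrative sketch).

    A shift of the frame adds a phase term linear in the bin index k, so adding
    (k / base_bin) * (pi/4 - theta[base_bin]) to every phase removes it and pins
    the base bin at pi/4.
    """
    theta = [cmath.phase(x) for x in dft(frame)]
    offset = math.pi / 4 - theta[base_bin]
    features = []
    for k in bins:
        corrected = theta[k] + (k / base_bin) * offset
        # Unit-circle coordinates make the feature insensitive to 2*pi wrapping.
        features.extend([math.cos(corrected), math.sin(corrected)])
    return features


# Example: a 16-sample periodic frame and a circularly shifted copy of it.
frame = [
    math.sin(2 * math.pi * t / 16)
    + 0.5 * math.cos(4 * math.pi * t / 16 + 0.3)
    + 0.25 * math.sin(6 * math.pi * t / 16 + 1.0)
    for t in range(16)
]
shifted = frame[3:] + frame[:3]
print(normalized_phase(frame, 1, [1, 2, 3]))
print(normalized_phase(shifted, 1, [1, 2, 3]))  # same values up to rounding
```

A time shift adds a phase term proportional to frequency. The correction cancels that term exactly for bins that are integer multiples of the base bin, which with `base_bin=1` means every bin. For other bins a residual 2π ambiguity can remain.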
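Spectral subtraction is one of the noise-handling techniques listed in the abstract. The abstract gives no parameters, so the following is a generic power-spectral-subtraction sketch, not the configuration used in the paper. The noise spectrum is taken as the per-bin mean of the first few frames, which are assumed to be speech-free. It is subtracted from every frame with a factor `alpha`, and the result is floored at `beta` times the noisy power. `n_noise_frames`, `alpha` and `beta` are hypothetical defaults.

```python
def spectral_subtraction(power_frames, n_noise_frames=5, alpha=1.0, beta=0.01):
    """Subtract an estimated noise power spectrum from each frame.

    power_frames: list of frames, each a list of per-bin power values.
    Returns (cleaned_frames, noise_estimate).
    """
    if not 0 < n_noise_frames <= len(power_frames):
        raise ValueError("need at least n_noise_frames frames")
    leading = power_frames[:n_noise_frames]
    # Noise estimate: per-bin mean over the leading (assumed speech-free) frames.
    noise = [sum(col) / n_noise_frames for col in zip(*leading)]
    cleaned = [
        # Flooring keeps power positive where the noise estimate exceeds the frame.
        [max(p - alpha * n, beta * p) for p, n in zip(frame, noise)]
        for frame in power_frames
    ]
    return cleaned, noise


# Five noise-only frames followed by one frame containing speech energy.
frames = [[1.0, 2.0]] * 5 + [[5.0, 2.5]]
cleaned, noise = spectral_subtraction(frames)
print(noise)        # [1.0, 2.0]
print(cleaned[-1])  # [4.0, 0.5]
```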
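The abstract also reports that skipping unreliable frames with low energy or low SN ratio improved performance considerably. It does not specify the criterion, so the sketch below assumes one simple rule. It estimates each frame's SN ratio in dB from the frame energy and a known noise energy, and keeps only frames at or above a threshold. The 5 dB threshold, the framing parameters and the noise-energy input are illustrative assumptions.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Energy (sum of squares) of each full frame of the signal."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def reliable_frames(energies, noise_energy, snr_threshold_db=5.0):
    """Indices of frames whose estimated SN ratio reaches the threshold."""
    if noise_energy <= 0:
        raise ValueError("noise_energy must be positive")
    keep = []
    for i, energy in enumerate(energies):
        # Crude speech-energy estimate, floored so the logarithm stays defined.
        speech = max(energy - noise_energy, 1e-12)
        snr_db = 10.0 * math.log10(speech / noise_energy)
        if snr_db >= snr_threshold_db:
            keep.append(i)
    return keep


print(frame_energies([1, 1, 2, 2], frame_len=2, hop=2))  # [2, 8]
print(reliable_frames([1.0, 1.5, 20.0, 5.0], noise_energy=1.0))  # [2, 3]
```

In this sketch, only the retained frame indices would be passed on to speaker scoring.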
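The keywords list GMMs, and the abstract states that the phase information is integrated with MFCCs. A common way to do this is score-level fusion, which is assumed here rather than taken from the abstract. Each speaker has one GMM per feature type, and the two average log-likelihoods are combined linearly with a weight `alpha`. The speaker with the highest combined score is chosen. The diagonal-covariance GMM scorer, the weight and the speaker labels below are illustrative assumptions.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        component_lls = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            component_lls.append(ll)
        # Log-sum-exp over mixture components for numerical stability.
        top = max(component_lls)
        total += top + math.log(sum(math.exp(c - top) for c in component_lls))
    return total / len(frames)


def identify(mfcc_scores, phase_scores, alpha=0.5):
    """Return the speaker maximizing (1 - alpha) * L_mfcc + alpha * L_phase."""
    combined = {
        spk: (1.0 - alpha) * mfcc_scores[spk] + alpha * phase_scores[spk]
        for spk in mfcc_scores
    }
    return max(combined, key=combined.get), combined


# Hypothetical per-speaker scores from MFCC-based and phase-based GMMs.
mfcc = {"A": -10.0, "B": -9.0}
phase = {"A": -5.0, "B": -8.0}
print(identify(mfcc, phase, alpha=0.0)[0])  # B (MFCC only)
print(identify(mfcc, phase, alpha=0.5)[0])  # A (fused)
```

Setting `alpha` to 0 or 1 recovers the single-feature systems. In practice the weight would typically be tuned on held-out data.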
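The abstract does not define the error reduction rate. Assuming the usual relative definition, let E_MFCC be the identification error rate of the MFCC baseline and E_comb that of the combined system:

```latex
\mathrm{ERR} = \frac{E_{\mathrm{MFCC}} - E_{\mathrm{comb}}}{E_{\mathrm{MFCC}}} \times 100\,\%
```

With hypothetical error rates, a drop from 10% to 7% corresponds to a 30% reduction, and a drop from 10% to 4% corresponds to a 60% reduction. These are the two ends of the range reported above.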

Bibliographic details

Source: IEICE Transactions on Information and Systems, 2010, No. 9, pp. 2397-2406 (10 pages)
Authors: Longbiao WANG; Kazue MINAMI; Kazumasa YAMAMOTO; Seiichi NAKAGAWA
Affiliations:
Shizuoka University, Hamamatsu-shi, 432-8560 Japan
Toyohashi University of Technology, Toyohashi-shi, 441-8580 Japan
Toyohashi University of Technology, Toyohashi-shi, 441-8580 Japan
Toyohashi University of Technology, Toyohashi-shi, 441-8580 Japan
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: speaker identification; phase information; MFCC; noisy environment; GMM
Added to database: 2022-08-18 00:27:04

Similar literature

1. Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions [J]. Longbiao WANG, Kazue MINAMI, Kazumasa YAMAMOTO. IEICE Transactions on Information and Systems, 2010, No. 9.
2. Improving short utterance speaker verification by combining MFCC and Entrocy in Noisy conditions [J]. Al-karawi Khamis A., Mohammed Duraid Y. Multimedia Tools and Applications, 2021, No. 14.
3. Combining evidence from residual phase and MFCC features for speaker recognition [J]. Murty K.S.R., Yegnanarayana B. IEEE Signal Processing Letters, 2006, No. 1.
4. Speaker identification by combining MFCC and phase information in noisy environments [C]. Wang, Longbiao, Minami, Kazue, Yamamoto, Kazumasa. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), 2010.
5. Characterization of Speaker Recognition in Noisy Channels [D]. Ghilduta, Robert, 2012.
6. Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions [O]. Youngja Nam, Chankyu Lee, 2021.
7. Speaker Identification by Combining MFCC and Kohonen Neural Networks in Noisy Environments [O]. 2018.
