Journal of Intelligent Systems

Speaker Identification Using Empirical Mode Decomposition-Based Voice Activity Detection Algorithm under Realistic Conditions

Abstract

Speaker recognition (SR) under mismatched conditions is a challenging task. Speech signals are nonlinear and nonstationary and are therefore difficult to analyze under realistic conditions. Moreover, in real conditions the nature of the noise present in the speech data is not known a priori, so the performance of speaker identification (SI) and speaker verification (SV) degrades considerably. An SR system uses a voice activity detector (VAD) as its front-end subsystem, and the performance of most VADs deteriorates under degraded or realistic conditions in which noise plays a major role. Speech analysis and processing using Norden E. Huang's empirical mode decomposition (EMD) combined with the Hilbert transform, commonly referred to as the Hilbert-Huang transform (HHT), has recently become an emerging trend. EMD is an a posteriori, adaptive, time-domain data analysis tool that is widely accepted by the research community, and its use for speech analysis and processing in speech recognition and SR tasks has been increasing. EMD-based VAD has become an important adaptive subsystem of the SR system that largely mitigates the effect of mismatch between the training and testing phases. We recently developed a VAD algorithm using a zero-frequency filter-assisted peaking resonator (ZFFPR) and EMD. In this article, the efficacy of this EMD-based VAD algorithm is studied at the front end of a text-independent, language-independent SI task. The speaker data were collected in three languages at five different places, namely home, street, laboratory, college campus, and restaurant, under realistic conditions using the EDIROL-R09 HR, a 24-bit WAV/MP3 recorder. The performance of the proposed SI task is compared, in terms of percentage identification rate, against that of an SI task using a traditional energy-based VAD at its front end.
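The abstract describes EMD only as an adaptive, a posteriori tool. The following sketch illustrates the core sifting idea in pure Python: an intrinsic mode function (IMF) is extracted by repeatedly subtracting the mean of the upper and lower envelopes, the IMF is removed, and the process repeats on the residue. It is a simplified illustration of generic EMD, not the authors' ZFFPR/EMD VAD, and it omits the Hilbert transform step of the HHT. Assumptions: the envelopes are piecewise-linear (Huang's original EMD uses cubic splines), the envelope ends are held flat, and the stopping tolerance `tol`, `max_iter`, and `max_imfs` are illustrative values; all function names are hypothetical.

```python
import math


def _local_extrema(x):
    """Return indices of interior local maxima and minima (plateaus count once)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def _envelope(indices, x):
    """Piecewise-linear envelope through x at `indices`, held flat to both ends.

    Simplification (assumption): classic EMD uses cubic-spline envelopes.
    """
    n = len(x)
    knots = [0] + indices + [n - 1]
    values = [x[indices[0]]] + [x[i] for i in indices] + [x[indices[-1]]]
    env = [0.0] * n
    for k in range(len(knots) - 1):
        a, b = knots[k], knots[k + 1]
        va, vb = values[k], values[k + 1]
        for j in range(a, b + 1):
            env[j] = va + (vb - va) * (j - a) / (b - a)
    return env


def sift(x, max_iter=50, tol=0.05):
    """Extract one IMF candidate by repeatedly removing the local envelope mean."""
    h = list(x)
    for _ in range(max_iter):
        maxima, minima = _local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build both envelopes
        upper = _envelope(maxima, h)
        lower = _envelope(minima, h)
        new_h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        # Stop when a sifting step barely changes h (a sum-ratio variant of
        # Huang's SD criterion; `tol` is an assumed value).
        change = sum((a - b) ** 2 for a, b in zip(h, new_h))
        energy = sum(a * a for a in h) or 1.0
        h = new_h
        if change / energy < tol:
            break
    return h


def emd(x, max_imfs=8):
    """Decompose x into IMFs plus a residue so that sum(imfs) + residue == x."""
    residue = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _local_extrema(residue)
        if len(maxima) + len(minima) < 2:
            break  # residue is (nearly) monotonic: decomposition finished
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue
```

The one property this sketch guarantees by construction is that the IMFs and the residue add back up to the input (up to floating-point rounding); how cleanly each IMF separates the oscillatory components depends on the envelope and stopping choices above.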
For both the EMD-based and the energy-based VAD, the widely used Mel-frequency cepstral coefficients (MFCCs) are computed with frame processing (20-ms frame size, 10-ms frame shift) over the voiced speech regions that the respective VAD technique extracts from the realistic speech utterances, and they serve as the feature vectors for speaker modeling with Gaussian mixture models (GMMs). The experimental results show that the SI task with the ZFFPR- and EMD-based VAD at its front end performs better than the SI task with a short-term energy-based VAD at its front end, which is somewhat encouraging.
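The baseline in the comparison is a short-term energy VAD, and both systems rely on 20-ms frames with a 10-ms shift. The following is a minimal sketch of such a baseline, not the article's implementation. It assumes the VAD uses the same framing as the MFCC stage and labels a frame as speech when its mean energy exceeds a fixed fraction of the utterance's peak frame energy; `threshold_ratio=0.1` and all function names are hypothetical. It also shows how a percentage identification rate is computed from per-utterance decisions. MFCC extraction and GMM training are omitted.

```python
import math


def frame_signal(x, fs, frame_ms=20.0, shift_ms=10.0):
    """Split x into overlapping frames (20-ms frames, 10-ms shift by default)."""
    frame_len = int(round(fs * frame_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, shift)]


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def energy_vad(x, fs, threshold_ratio=0.1):
    """Return one flag per frame: True = speech, False = non-speech.

    Assumption: fixed threshold relative to the peak frame energy.
    """
    energies = [short_term_energy(f) for f in frame_signal(x, fs)]
    peak = max(energies, default=0.0)
    return [e > threshold_ratio * peak for e in energies]


def identification_rate(predicted, actual):
    """Percentage of test utterances assigned to the correct speaker."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

For example, on a 1-s signal sampled at 8 kHz whose first half is a near-silent tone (amplitude 0.001) and whose second half is a 200 Hz tone of amplitude 0.5, `energy_vad` yields 99 frames: 49 non-speech frames followed by 50 speech frames. A fixed energy threshold like this is exactly what becomes unreliable when the noise level is unknown, which is the weakness the EMD-based front end is meant to address.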