Journal: EURASIP Journal on Audio, Speech, and Music Processing

Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification



Abstract

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few studies have focused on DNNs for distant-talking speaker recognition. In this study, we present a bottleneck feature derived from a DNN and a cepstral-domain denoising autoencoder (DAE)-based dereverberation method for distant-talking speaker identification, and we propose a combination of the two approaches. For the DNN-based bottleneck feature, we note that DNNs can transform the reverberant speech feature into a new feature space with greater discriminative ability for distant-talking speaker recognition. In contrast, cepstral-domain DAE-based dereverberation tries to suppress reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving distant-talking speaker recognition performance. Because the discriminative DNN-based bottleneck feature and DAE-based dereverberation are strongly complementary, combining the two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set in which the reverberant environments differed from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as the multichannel least mean squares (MCLMS) method. Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved performance.

Keywords: Speaker recognition; Bottleneck features; Denoising autoencoder; Deep neural network; Reverberant speech
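
The abstract does not say how the likelihoods of the two systems are combined. As a minimal sketch, assuming a weighted linear combination of per-speaker log-likelihoods (a common score-fusion choice, not necessarily the authors' method), the idea could look like the following. The function names, the weight `alpha`, and all scores are hypothetical and chosen only for illustration.

```python
def fuse_log_likelihoods(ll_bottleneck, ll_dae, alpha=0.5):
    """Weighted sum of per-speaker log-likelihoods from two systems.

    alpha is a hypothetical fusion weight in [0, 1]; the abstract does
    not state how (or whether) such a weight is set.
    """
    if ll_bottleneck.keys() != ll_dae.keys():
        raise ValueError("both systems must score the same speakers")
    return {
        spk: alpha * ll_bottleneck[spk] + (1.0 - alpha) * ll_dae[spk]
        for spk in ll_bottleneck
    }


def identify(scores):
    """Return the speaker with the highest score."""
    return max(scores, key=scores.get)


def relative_error_reduction(baseline_err, new_err):
    """Relative error rate reduction in percent, versus a baseline."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Illustrative scores only; these are not taken from the paper.
ll_bn = {"spk_a": -120.0, "spk_b": -118.5, "spk_c": -125.0}
ll_dae = {"spk_a": -110.0, "spk_b": -114.0, "spk_c": -119.0}

fused = fuse_log_likelihoods(ll_bn, ll_dae, alpha=0.5)
print(identify(ll_bn), identify(ll_dae), identify(fused))
```

In this toy case the two systems disagree on the top speaker, and the fused score decides between them. In practice a fusion weight like `alpha` would typically be tuned on held-out data.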
