Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Zhaofeng Zhang; Longbiao Wang; Atsuhiko Kai; Takanori Yamada; Weifeng Li; Masahiro Iwahashi

Abstract

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral-domain denoising autoencoder (DAE)-based dereverberation method are presented for distant-talking speaker identification, and a combination of the two approaches is proposed. For the DNN-based bottleneck feature, we noted that DNNs can transform the reverberant speech feature into a new feature space with greater discriminative ability for distant-talking speaker recognition. In contrast, cepstral-domain DAE-based dereverberation tries to suppress reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving distant-talking speaker recognition performance. Since the DNN-based discriminant bottleneck feature and DAE-based dereverberation are strongly complementary, the combination of the two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set, with reverberant environments differing from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as multichannel least mean squares (MCLMS). Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved the performance.

Keywords: Speaker recognition; Bottleneck features; Denoising autoencoder; Deep neural network; Reverberant speech
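The abstract mentions combining the likelihoods of the two systems and reports relative error rate reductions, but it does not give the combination rule. The following is a minimal sketch, assuming a linear weighted sum of per-speaker log-likelihoods; the function names, the weight of 0.5, the speaker labels, and all score values are hypothetical and are not taken from the paper.

```python
def fuse_log_likelihoods(ll_bnf, ll_dae, weight=0.5):
    """Weighted sum of per-speaker log-likelihoods from two systems.

    ll_bnf: scores from a bottleneck-feature system (hypothetical values)
    ll_dae: scores from a DAE-dereverberated-feature system (hypothetical values)
    weight: assumed interpolation weight for the bottleneck system
    """
    if ll_bnf.keys() != ll_dae.keys():
        raise ValueError("both systems must score the same set of speakers")
    return {spk: weight * ll_bnf[spk] + (1.0 - weight) * ll_dae[spk]
            for spk in ll_bnf}


def identify(scores):
    """Return the speaker with the highest score."""
    return max(scores, key=scores.get)


def relative_error_reduction(baseline_err, new_err):
    """Relative error rate reduction in percent, given two error rates."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Hypothetical scores for one test utterance and three enrolled speakers.
ll_bnf = {"spk_a": -120.0, "spk_b": -118.5, "spk_c": -125.0}
ll_dae = {"spk_a": -110.0, "spk_b": -116.0, "spk_c": -119.0}

fused = fuse_log_likelihoods(ll_bnf, ll_dae, weight=0.5)
decision = identify(fused)
```

In this toy case the bottleneck system alone would pick spk_b, the DAE system alone would pick spk_a, and the fused score picks spk_a. The helper relative_error_reduction shows how a figure such as those in the abstract is computed from a baseline error rate and a new error rate; for example, going from a hypothetical 20.0% to 15.0% is a 25.0% relative reduction.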


Bibliographic record

Source: EURASIP Journal on Audio, Speech, and Music Processing | 2015, Issue 1 | 13 pages
Authors: Zhaofeng Zhang; Longbiao Wang; Atsuhiko Kai; Takanori Yamada; Weifeng Li; Masahiro Iwahashi
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Date added to database: 2022-08-18 10:16:46

Similar literature

1. Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition [J]. Ren Bo, Wang Longbiao, Lu Liang. Multimedia Tools and Applications. 2016, Issue 9.
2. Single-channel Dereverberation for Distant-Talking Speech Recognition by Combining Denoising Autoencoder and Temporal Structure Normalization [J]. Ueda Yuma, Wang Longbiao, Kai Atsuhiko. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2016, Issue 2.
3. Robust and Fast Temperature Extraction for Brillouin Optical Time-Domain Analyzer by Using Denoising Autoencoder-Based Deep Neural Networks [J]. Wang Biwei, Guo Nan, Wang Liang. IEEE Sensors Journal. 2020, Issue 7.
4. Improvement of distant-talking speaker identification using bottleneck features of DNN [C]. Takanori Yamada, Longbiao Wang, Atsuhiko Kai. Conference of the International Speech Communication Association. 2013.
5. A Framework for Enhancing Speaker Age and Gender Classification by Using a New Feature Set and Deep Neural Network Architectures [D]. Abumallouh, Arafat. 2017.
6. Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments [O]. Kyoungjin Noh, Joon-Hyuk Chang. 2020.
7. Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification [O]. Zhaofeng Zhang, Longbiao Wang, Atsuhiko Kai. 2015.
