Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

Abstract

We propose an approach to reverberant speech recognition that adopts deep learning in both the front-end and the back-end of a reverberant speech recognition system, together with a novel method to improve the dereverberation performance of the front-end network using phone-class information. At the front-end, we adopt a deep autoencoder (DAE) to enhance the speech feature parameters, and speech recognition is performed in the back-end using DNN-HMM acoustic models trained on multi-condition data. The system was evaluated on the ASR task of the Reverb Challenge 2014. The DNN-HMM system trained on the multi-condition training set achieved a conspicuously higher word accuracy than the MLLR-adapted GMM-HMM system trained on the same data. Furthermore, feature enhancement with the deep autoencoder contributed to improved recognition accuracy, especially in the more adverse conditions. While the mapping between reverberant and clean speech in DAE-based dereverberation is conventionally learned from acoustic information alone, we presume the mapping also depends on phone information. Therefore, we propose a new scheme (pDAE), which appends a phone-class feature to the standard acoustic features as input. Two types of phone-class feature are investigated. One is the hard recognition result of monophones, and the other is a soft representation derived from the posterior outputs of a monophone DNN. Either type of augmented feature yields a significant improvement (7–8 % relative) over the standard DAE.
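The input augmentation described in the abstract can be illustrated with a minimal sketch. It only shows how a per-frame phone-class feature, either hard (one-hot monophone label) or soft (posterior vector), might be concatenated with acoustic features before being fed to a DAE. The function names, feature dimensions, and number of phone classes are hypothetical and are not taken from the paper, and the softmax stands in for whatever output layer the monophone DNN actually uses.

```python
import numpy as np


def hard_phone_feature(phone_ids, num_phones):
    """One-hot encode a per-frame sequence of monophone labels (hypothetical helper)."""
    phone_ids = np.asarray(phone_ids)
    one_hot = np.zeros((phone_ids.shape[0], num_phones), dtype=np.float32)
    one_hot[np.arange(phone_ids.shape[0]), phone_ids] = 1.0
    return one_hot


def soft_phone_feature(scores):
    """Turn per-frame monophone scores into posteriors with a softmax (assumed form)."""
    scores = np.asarray(scores, dtype=np.float32)
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def augment(acoustic, phone_feature):
    """Append the phone-class feature to each acoustic frame along the feature axis."""
    return np.concatenate([acoustic, phone_feature], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_frames, acoustic_dim, num_phones = 5, 40, 4  # illustrative sizes only
    acoustic = rng.standard_normal((num_frames, acoustic_dim)).astype(np.float32)

    hard = hard_phone_feature([0, 0, 2, 3, 1], num_phones)
    soft = soft_phone_feature(rng.standard_normal((num_frames, num_phones)))

    print(augment(acoustic, hard).shape)  # frames x (acoustic_dim + num_phones)
    print(augment(acoustic, soft).shape)
```

In this sketch both variants produce an input of the same width, so the same network shape could accept either one; whether the paper's setup shares this property is not stated in the abstract.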
