Conference paper: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

A Hybrid Approach to Combining Conventional and Deep Learning Techniques for Single-Channel Speech Enhancement and Recognition

Abstract

Conventional speech-enhancement techniques employ statistical signal-processing algorithms. They are computationally efficient and improve speech quality even under unknown noise conditions, which is why they are preferred for deployment in unpredictable environments. One limitation of these algorithms is that they fail to suppress non-stationary noise, and this hinders their broad usage. Emerging algorithms based on deep learning promise to overcome this limitation of conventional methods. However, these algorithms under-perform when presented with noise conditions that were not captured in the training data. In this paper, we propose a single-channel speech-enhancement technique that combines the benefits of both worlds to achieve the best listening quality and recognition accuracy under noise conditions that are both unknown and non-stationary. Our method uses a conventional speech-enhancement algorithm to produce an intermediate representation of the input data by multiplying the noisy input spectrogram features by gain vectors (known as the suppression rule). We process this intermediate representation with a recurrent neural network based on long short-term memory (LSTM) units. Furthermore, we train this network to jointly learn two targets: a direct estimate of the clean-speech features and a noise-reduction mask. With this LSTM multi-style training (LSTM-MT) architecture, we demonstrate a PESQ improvement of 0.76 and a relative word-error-rate reduction of 47.73%.
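The two ingredients the abstract names, a suppression rule that multiplies the noisy spectrogram by per-bin gains and a two-target training objective, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the Wiener-style gain with a maximum-likelihood a priori SNR estimate, the noise estimate taken from the first few frames, the ideal-ratio-mask target, the mean-squared-error terms, and the `weight` parameter are all assumptions, and every function name is hypothetical. The LSTM itself is omitted; `pred_clean` and `pred_mask` stand in for its two output heads.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def suppression_gain(noisy_mag, noise_psd, gain_floor=0.1):
    """Per-bin gain from a Wiener-style suppression rule (assumed, not the paper's)."""
    posterior_snr = noisy_mag ** 2 / (noise_psd + EPS)
    # Maximum-likelihood a priori SNR estimate: posterior SNR minus one, floored at 0.
    prior_snr = np.maximum(posterior_snr - 1.0, 0.0)
    gain = prior_snr / (1.0 + prior_snr)
    # A gain floor limits speech distortion and musical noise.
    return np.maximum(gain, gain_floor)


def intermediate_features(noisy_mag, noise_frames=10, gain_floor=0.1):
    """Multiply the noisy magnitude spectrogram (frames x bins) by its gain vectors.

    Assumption: the first `noise_frames` frames are speech-free and are used
    to estimate the noise power spectral density.
    """
    noise_psd = np.mean(noisy_mag[:noise_frames] ** 2, axis=0, keepdims=True)
    gain = suppression_gain(noisy_mag, noise_psd, gain_floor)
    return gain * noisy_mag, gain


def multi_target_loss(pred_clean, pred_mask, clean_mag, noisy_mag, weight=0.5):
    """Joint loss over the network's two outputs (hypothetical weighting).

    Term 1: direct estimate of the clean-speech features.
    Term 2: noise-reduction mask, here an ideal ratio mask clipped to [0, 1].
    """
    ideal_mask = np.clip(clean_mag / (noisy_mag + EPS), 0.0, 1.0)
    direct_term = np.mean((pred_clean - clean_mag) ** 2)
    mask_term = np.mean((pred_mask - ideal_mask) ** 2)
    return weight * direct_term + (1.0 - weight) * mask_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = np.abs(rng.normal(size=(50, 129)))  # synthetic stand-in spectrogram
    features, gain = intermediate_features(noisy)
    print(features.shape)  # (50, 129)
```

In this sketch the gain-multiplied `features` would be the LSTM's input, and `multi_target_loss` shows how a single objective can train the direct clean-speech estimate and the mask jointly.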