A Hybrid Approach to Combining Conventional and Deep Learning Techniques for Single-Channel Speech Enhancement and Recognition



Abstract

Conventional speech-enhancement techniques employ statistical signal-processing algorithms. They are computationally efficient and improve speech quality even under unknown noise conditions. For these reasons, they are preferred for deployment in unpredictable environments. One limitation of these algorithms is that they fail to suppress non-stationary noise, which hinders their broad usage. Emerging algorithms based on deep learning promise to overcome this limitation of conventional methods. However, these algorithms underperform when presented with noise conditions that were not captured in the training data. In this paper, we propose a single-channel speech-enhancement technique that combines the benefits of both worlds to achieve the best listening quality and recognition accuracy under noise conditions that are both unknown and non-stationary. Our method utilizes a conventional speech-enhancement algorithm to produce an intermediate representation of the input data by multiplying noisy input spectrogram features with gain vectors (known as the suppression rule). We process this intermediate representation through a recurrent neural network based on long short-term memory (LSTM) units. Furthermore, we train this network to jointly learn two targets: a direct estimate of clean-speech features and a noise-reduction mask. Based on this LSTM multi-style training (LSTM-MT) architecture, we demonstrate a PESQ improvement of 0.76 and a relative word-error rate reduction of 47.73%.
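The two steps described in the abstract (multiplying noisy spectrogram features by a gain vector, then training on two targets jointly) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not say which suppression rule is used, so the Wiener gain below is an assumption, as are the gain floor, the weighting parameter `alpha`, and all function names.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Assumed suppression rule: Wiener gain from a crude a-priori SNR estimate."""
    # A-posteriori SNR minus one, clipped at zero, approximates the a-priori SNR.
    snr_post = noisy_power / np.maximum(noise_power, 1e-12)
    snr_prio = np.maximum(snr_post - 1.0, 0.0)
    gain = snr_prio / (1.0 + snr_prio)
    # Hypothetical gain floor; keeps the gain from dropping to zero.
    return np.maximum(gain, floor)

def intermediate_representation(noisy_mag, noise_power):
    """Multiply noisy magnitude-spectrogram features by per-bin gains.

    noisy_mag:   array of shape (frames, bins)
    noise_power: array of shape (bins,), broadcast over frames
    """
    gain = wiener_gain(noisy_mag ** 2, noise_power)
    return gain * noisy_mag

def joint_loss(pred_clean, pred_mask, clean_mag, target_mask, alpha=0.5):
    """Hypothetical two-target objective: weighted sum of two mean-squared errors."""
    loss_clean = np.mean((pred_clean - clean_mag) ** 2)
    loss_mask = np.mean((pred_mask - target_mask) ** 2)
    return alpha * loss_clean + (1.0 - alpha) * loss_mask

# One frame, two frequency bins, unit noise power in both bins.
noisy_mag = np.array([[2.0, 1.0]])
noise_power = np.array([1.0, 1.0])
features = intermediate_representation(noisy_mag, noise_power)
```

In the proposed method, an array like `features` would be the input to the LSTM network, whose two outputs (a clean-speech estimate and a noise-reduction mask) would be scored by an objective of the kind sketched in `joint_loss`.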


Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2018 | p1892-2535 | 5 pages
Authors
Yan-Hui Tu; Ivan Tashev; Shuayb Zarar; Chin-Hui Lee
Original format: PDF
Chinese Library Classification: TN912-53
Keywords
statistical speech enhancement; speech recognition; deep learning; recurrent networks


Similar documents


1. Single-Channel Speech Enhancement Techniques for Distant Speech Recognition [J]. Jaya Kumar Ashwini, Ramaswamy Kumaraswamy. Journal of Intelligent Systems. 2013, No. 2.
2. A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech [J]. Yan-Hui Tu, Jun Du, Chin-Hui Lee. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2018, No. 7.
3. Noise Robust Speech Recognition System Using Multimodal Audio-Visual Approach Using Different Deep Learning Classification Techniques [J]. Eslam E. El Maghraby, Amr M. Gody, Mohamed Hesham Farouk. International Journal of Advanced Computer Research. 2020, No. 47.
4. A Hybrid Approach to Combining Conventional and Deep Learning Techniques for Single-Channel Speech Enhancement and Recognition [C]. Yan-Hui Tu, Ivan Tashev, Shuayb Zarar, Chin-Hui Lee. IEEE International Conference on Acoustics, Speech and Signal Processing. 2018.
5. A Combined Statistical and Machine Learning Approach for Single Channel Speech Enhancement [D]. Tseng, Hung-Wei. 2015.
6. Recognition of Mould Colony on Unhulled Paddy Based on Computer Vision Using Conventional Machine-Learning and Deep Learning Techniques [O]. Ke Sun, Zhengjie Wang, Kang Tu.
7. DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement [O]. Sujan Kumar Roy, Aaron Nicolson, Kuldip K. Paliwal. 2021.
