Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks

Lin Zhou; Siyuan Lu; Qiuyue Zhong; Ying Chen; Yibin Tang; Yan Zhou

Abstract

Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers are often stationary or move slowly during normal communication. In this case, the spatial features of consecutive speech frames are highly correlated, which provides additional spatial information that helps speaker separation. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM), which effectively learns the temporal dynamics of spatial features. In detail, an LSTM-based speaker separation algorithm is proposed that extracts the spatial features in each time-frequency (TF) unit and forms the corresponding feature vector. We then treat speaker separation as a supervised learning problem, in which a modified ideal ratio mask (IRM) is defined as the training target for LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. Specifically, in tests under untrained acoustic conditions with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing deep neural network (DNN) based method in terms of PESQ and STOI. This indicates that our method is more robust in untrained conditions.
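The abstract names per-TF-unit spatial features and an IRM training target but gives neither the feature set nor the modified IRM definition. The sketch below is therefore only an illustration under stated assumptions: it uses two common binaural cues, interaural level difference (ILD) and interaural phase difference (IPD), and the textbook IRM rather than the paper's modified one. All function names and the `beta` parameter are hypothetical.

```python
import cmath
import math

EPS = 1e-12  # guards against division by zero and log(0)


def spatial_cues(left, right):
    """Return (ILD in dB, IPD in radians) for one TF unit.

    `left` and `right` are the complex STFT coefficients of the
    left-ear and right-ear signals at the same frame and frequency bin.
    ILD/IPD are assumed cues; the abstract only says "spatial features".
    """
    ild_db = 20.0 * math.log10((abs(left) + EPS) / (abs(right) + EPS))
    ipd = cmath.phase(left * right.conjugate())  # wrapped to [-pi, pi]
    return ild_db, ipd


def ideal_ratio_mask(target_energy, interference_energy, beta=0.5):
    """Textbook IRM for one TF unit (NOT the paper's modified IRM).

    The mask is the target's share of the total energy, raised to
    `beta`; beta=0.5 is a commonly used choice.
    """
    ratio = target_energy / (target_energy + interference_energy + EPS)
    return ratio ** beta


def frame_features(left_frame, right_frame):
    """Build one frame's feature vector: all ILDs followed by all IPDs."""
    cues = [spatial_cues(l, r) for l, r in zip(left_frame, right_frame)]
    return [ild for ild, _ in cues] + [ipd for _, ipd in cues]


if __name__ == "__main__":
    # Two frequency bins of one frame (made-up coefficients).
    left = [2 + 0j, 0 + 1j]
    right = [1 + 0j, 1 + 0j]
    print(frame_features(left, right))
    print(ideal_ratio_mask(1.0, 1.0))
```

In a setup like the one the abstract describes, a sequence of such per-frame feature vectors would be the LSTM input and the per-unit mask values the regression target; the network itself is omitted here because its architecture is not specified in this record.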

Bibliographic details

Source
Computers, Materials & Continua | 2020, Issue 3 | pp. 1373-1386 | 14 pages
Authors
Lin Zhou; Siyuan Lu; Qiuyue Zhong; Ying Chen; Yibin Tang; Yan Zhou
Author affiliations

School of Information Science and Engineering, Southeast University, Nanjing 210096, China

School of Information Science and Engineering, Southeast University, Nanjing 210096, China

School of Information Science and Engineering, Southeast University, Nanjing 210096, China

School of Information Science and Engineering, Southeast University, Nanjing 210096, China; Department of Psychiatry, Columbia University and NYSPI, New York 10032, USA

College of Internet of Things Engineering, Hohai University, Changzhou 213022, China

College of Internet of Things Engineering, Hohai University, Changzhou 213022, China

Indexing information
Original format: PDF
Language: English
Chinese Library Classification
Keywords
Binaural speech separation; long and short time memory networks; feature vectors; ideal ratio mask;

Similar literature

1. Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments [J]. Li Ruwei, Li Tao, Sun Xiaoyue. Applied Acoustics. 2020, Issue Nova
2. Enhancing the energy efficiency of wireless-communicated binaural hearing aids for speech separation driven by soft-computing algorithms [J]. R. Gil-Pita, L. Cuadra, E. Alexandre. Applied Soft Computing. 2012, Issue 7
3. Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses [J]. Huang Kun-Yi, Wu Chung-Hsien, Su Ming-Hsiang. Pattern Recognition: The Journal of the Pattern Recognition Society. 2019
4. Spatial and coherence cues based time-frequency masking for binaural reverberant speech separation [C]. Alinaghi Atiyeh, Wang Wenwu, Jackson Philip JB. IEEE International Conference on Acoustics, Speech and Signal Processing. 2013
5. Short-time Independent Component Analysis for blind separation of speech sources [D]. Zhang, Jing. 2007
6. Deep Learning Based Binaural Speech Separation in Reverberant Environments [O]. Xueliang Zhang, DeLiang Wang
7. Spatial and coherence cues based time-frequency masking for binaural reverberant speech separation [O]. Alinaghi, A, Wang, W, Jackson, PJB. 2013
