
Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks

Abstract

Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers in normal communication are very often stationary or move only slowly. In this case, the spatial features of consecutive speech frames are highly correlated, and this correlation provides additional spatial information that helps speaker separation. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM), which effectively learns the temporal dynamics of the spatial features. Specifically, the proposed LSTM-based speaker separation algorithm extracts the spatial features of each time-frequency (TF) unit and forms the corresponding feature vector. We then treat speaker separation as a supervised learning problem, in which a modified ideal ratio mask (IRM) is defined as the training target for the LSTM. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. In particular, in untrained acoustic test conditions with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing DNN-based method in terms of PESQ and STOI. This indicates that the method is more robust in untrained conditions.
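The abstract names the ingredients of the method but not their exact form (which spatial cues are used, or how the IRM is modified). The following is a minimal Python sketch under stated assumptions: it computes two commonly used binaural cues, the interaural level difference (ILD) and the interaural phase difference (IPD), for every TF unit, and the standard IRM with exponent β = 0.5 rather than the paper's modified version. The function names, the (cos, sin) IPD encoding, and the toy inputs are hypothetical illustrations, and the LSTM itself is omitted.

```python
import cmath
import math

EPS = 1e-12  # guards against log(0) and division by zero in silent TF units


def binaural_features(left, right):
    """Spatial cues for every TF unit of a binaural STFT (illustrative sketch).

    left, right: nested lists [frame][freq] of complex STFT coefficients
    for the left- and right-ear signals (assumed already computed).
    Returns nested lists [frame][freq] of (ild_db, cos_ipd, sin_ipd).
    """
    features = []
    for frame_l, frame_r in zip(left, right):
        row = []
        for xl, xr in zip(frame_l, frame_r):
            # Interaural level difference (ILD) in dB
            ild_db = 20.0 * math.log10((abs(xl) + EPS) / (abs(xr) + EPS))
            # Interaural phase difference (IPD) in [-pi, pi]
            ipd = cmath.phase(xl * xr.conjugate())
            # Assumed (cos, sin) encoding avoids the jump at +/- pi
            row.append((ild_db, math.cos(ipd), math.sin(ipd)))
        features.append(row)
    return features


def ideal_ratio_mask(target, interference, beta=0.5):
    """Standard (unmodified) IRM per TF unit: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    target, interference: nested lists [frame][freq] of complex STFT
    coefficients of the target speaker and of everything else
    (competing speaker plus noise); these are known only for training data.
    """
    mask = []
    for frame_s, frame_n in zip(target, interference):
        row = []
        for s, n in zip(frame_s, frame_n):
            ps, pn = abs(s) ** 2, abs(n) ** 2
            row.append((ps / (ps + pn + EPS)) ** beta)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Scale each mixture TF unit by its mask value; the mixture phase is kept."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


if __name__ == "__main__":
    # One frame, two frequency bins (toy values)
    left = [[1 + 0j, 2j]]
    right = [[1 + 0j, 1 + 0j]]
    print(binaural_features(left, right))
    print(ideal_ratio_mask([[3 + 0j]], [[4 + 0j]]))  # sqrt(9 / 25), about 0.6
```

In a supervised setup of this kind, the per-frame feature vectors would be fed to the LSTM as a sequence, so that its recurrent state can exploit the correlation between consecutive frames, and the network would be trained to predict the mask. At test time only the mixture is available, so the predicted mask (not the ideal one) would be passed to `apply_mask` together with a reference-ear mixture STFT.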

Bibliographic details

  • Source
    Computers, Materials & Continua, 2020, Issue 3, pp. 1373-1386 (14 pages)
  • Author affiliations

    School of Information Science and Engineering, Southeast University, Nanjing 210096, China

    School of Information Science and Engineering, Southeast University, Nanjing 210096, China

    School of Information Science and Engineering, Southeast University, Nanjing 210096, China

    School of Information Science and Engineering, Southeast University, Nanjing 210096, China; Department of Psychiatry, Columbia University and NYSPI, New York 10032, USA

    College of Internet of Things Engineering, Hohai University, Changzhou 213022, China

    College of Internet of Things Engineering, Hohai University, Changzhou 213022, China

  • Original format: PDF
  • Text language: English
  • Keywords

    Binaural speech separation; long and short time memory networks; feature vectors; ideal ratio mask;