Multimedia Tools and Applications
Deep recurrent neural networks based binaural speech segregation for the selection of closest target of interest

Abstract

An auditory attention model is presented that performs binaural source segregation together with full localization of a target speech signal in a multi-talker environment. Joint acoustic features, namely monaural, binaural, and direct-to-reverberant ratio (DRR) features, are incorporated into a deep recurrent neural network (DRNN) based joint discriminative model for speech source segregation. The monaural and binaural features are extracted from binaural mixtures of two speakers using mean Hilbert envelope coefficients (MHEC) and interaural time and level differences, respectively. The performance of the DRNN-based segregation is evaluated in terms of signal-to-interference, signal-to-distortion, and signal-to-artifacts ratios and compared with existing architectures, including a deep neural network (DNN). The proposed system is found to be better suited than monaural speech segregation, especially when the desired target and the interfering sources are located at different positions. The study also proposes full localization of the segregated speech source, which makes it possible to select the desired speaker of interest from an input acoustic mixture in a reverberant environment. The developed system can handle the binaural segregation problem under multi-source and reverberant conditions, and the auditory attention model provides accurate information about the speech sources even when the desired target is located at 2 m or more with a higher reverberation time.
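To make the binaural cues mentioned above concrete, the following is a minimal NumPy sketch of how per-frame interaural time and level differences can be estimated from a two-channel mixture. It is not the authors' implementation: the frame length, hop size, sample rate, maximum lag, and the helper names frame_signal and binaural_cues are assumptions made for the example, and the paper's MHEC and DRR features are not shown.

```python
"""Illustrative sketch (not the published code): per-frame ITD/ILD cues
from a two-channel (binaural) signal."""
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    # Split a 1-D signal into overlapping frames (assumed framing parameters).
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def binaural_cues(left, right, fs=16000, frame_len=512, hop=256, max_lag=32):
    """Return per-frame ITD (seconds) and ILD (dB) estimates."""
    L = frame_signal(left, frame_len, hop)
    R = frame_signal(right, frame_len, hop)
    itd = np.zeros(len(L))
    ild = np.zeros(len(L))
    for t, (l, r) in enumerate(zip(L, R)):
        # Cross-correlate the two ear signals and pick the lag with maximum correlation.
        xcorr = np.correlate(l, r, mode="full")
        centre = len(xcorr) // 2
        lags = np.arange(-max_lag, max_lag + 1)
        best = lags[np.argmax(xcorr[centre - max_lag: centre + max_lag + 1])]
        itd[t] = best / fs
        # Energy ratio between the ears in dB (epsilon avoids log of zero).
        ild[t] = 10.0 * np.log10((np.sum(l ** 2) + 1e-12) / (np.sum(r ** 2) + 1e-12))
    return itd, ild
```

Such frame-level cues, stacked with monaural features, would form the joint feature vector fed to the segregation network.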
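The abstract describes a DRNN-based joint discriminative model but gives no architectural details, so the following PyTorch sketch only illustrates the general idea of a recurrent network that estimates a time-frequency mask from per-frame joint features. The layer sizes, feature dimensionality, and the class name RecurrentMaskEstimator are assumptions made for the example, not the published model.

```python
"""Illustrative sketch (not the authors' model): recurrent mask estimation
for target-speech segregation from joint per-frame features."""
import torch
import torch.nn as nn

class RecurrentMaskEstimator(nn.Module):
    def __init__(self, feat_dim=100, hidden=256, n_layers=2, n_freq=257):
        super().__init__()
        # Stacked bidirectional recurrent layers over the frame sequence.
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=n_layers,
                           batch_first=True, bidirectional=True)
        # Per-frame projection to a soft time-frequency mask for the target.
        self.proj = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, feats):
        # feats: (batch, frames, feat_dim) joint monaural + binaural features
        h, _ = self.rnn(feats)
        return self.proj(h)           # (batch, frames, n_freq) mask in [0, 1]

# Toy usage: estimate a mask for a short feature sequence and apply it to
# the mixture magnitude spectrogram of one ear (random tensors stand in for data).
model = RecurrentMaskEstimator()
feats = torch.randn(1, 300, 100)      # assumed joint feature tensor
mix_mag = torch.rand(1, 300, 257)     # mixture magnitude spectrogram
mask = model(feats)
target_mag = mask * mix_mag           # segregated target estimate
```

The estimated target can then be evaluated with the signal-to-interference, signal-to-distortion, and signal-to-artifacts ratios cited in the abstract.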
