Multimedia Tools and Applications
Deep recurrent neural networks based binaural speech segregation for the selection of closest target of interest

Abstract

An auditory attention model is presented that performs binaural source segregation together with full localization of a target speech signal in a multi-talker environment. Joint acoustic features, namely monaural, binaural, and direct-to-reverberant ratio (DRR) features, are incorporated into a deep recurrent neural network (DRNN) based joint discriminative model for speech source segregation. The monaural and binaural features are extracted from binaural mixtures of two speakers using mean Hilbert envelope coefficients (MHEC) and interaural time and level differences, respectively. The performance of the DRNN-based segregation is evaluated in terms of signal-to-interference, signal-to-distortion, and signal-to-artifacts ratios and compared with existing architectures, including a deep neural network (DNN). The proposed system is found to be better suited than monaural speech segregation, especially when the desired target and the interfering sources are located at different positions. The study also proposes full localization of the segregated speech source, which makes it possible to select the desired speaker of interest from an input acoustic mixture in a reverberant environment. The developed system can handle the binaural segregation problem under multi-source and reverberant conditions, and the auditory attention model provides accurate information about the speech sources even when the desired target is located at 2 m or more with a higher reverberation time.
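To make the binaural cues mentioned above concrete, the following is a minimal NumPy sketch of how per-frame interaural time and level differences can be estimated from a two-channel mixture. It is not the authors' implementation: the frame length, hop size, sample rate, maximum lag, and the helper names frame_signal and binaural_cues are assumptions made for the example, and the paper's MHEC and DRR features are not shown.

```python
"""Illustrative sketch (not the published code): per-frame ITD/ILD cues
from a two-channel (binaural) signal."""
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    # Split a 1-D signal into overlapping frames (assumed framing parameters).
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def binaural_cues(left, right, fs=16000, frame_len=512, hop=256, max_lag=32):
    """Return per-frame ITD (seconds) and ILD (dB) estimates."""
    L = frame_signal(left, frame_len, hop)
    R = frame_signal(right, frame_len, hop)
    itd = np.zeros(len(L))
    ild = np.zeros(len(L))
    for t, (l, r) in enumerate(zip(L, R)):
        # Cross-correlate the two ear signals and pick the lag with maximum correlation.
        xcorr = np.correlate(l, r, mode="full")
        centre = len(xcorr) // 2
        lags = np.arange(-max_lag, max_lag + 1)
        best = lags[np.argmax(xcorr[centre - max_lag: centre + max_lag + 1])]
        itd[t] = best / fs
        # Energy ratio between the ears in dB (epsilon avoids log of zero).
        ild[t] = 10.0 * np.log10((np.sum(l ** 2) + 1e-12) / (np.sum(r ** 2) + 1e-12))
    return itd, ild
```

Such frame-level cues, stacked with monaural features, would form the joint feature vector fed to the segregation network.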
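The abstract describes a DRNN-based joint discriminative model but gives no architectural details, so the following PyTorch sketch only illustrates the general idea of a recurrent network that estimates a time-frequency mask from per-frame joint features. The layer sizes, feature dimensionality, and the class name RecurrentMaskEstimator are assumptions made for the example, not the published model.

```python
"""Illustrative sketch (not the authors' model): recurrent mask estimation
for target-speech segregation from joint per-frame features."""
import torch
import torch.nn as nn

class RecurrentMaskEstimator(nn.Module):
    def __init__(self, feat_dim=100, hidden=256, n_layers=2, n_freq=257):
        super().__init__()
        # Stacked bidirectional recurrent layers over the frame sequence.
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=n_layers,
                           batch_first=True, bidirectional=True)
        # Per-frame projection to a soft time-frequency mask for the target.
        self.proj = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, feats):
        # feats: (batch, frames, feat_dim) joint monaural + binaural features
        h, _ = self.rnn(feats)
        return self.proj(h)           # (batch, frames, n_freq) mask in [0, 1]

# Toy usage: estimate a mask for a short feature sequence and apply it to
# the mixture magnitude spectrogram of one ear (random tensors stand in for data).
model = RecurrentMaskEstimator()
feats = torch.randn(1, 300, 100)      # assumed joint feature tensor
mix_mag = torch.rand(1, 300, 257)     # mixture magnitude spectrogram
mask = model(feats)
target_mag = mask * mix_mag           # segregated target estimate
```

The estimated target can then be evaluated with the signal-to-interference, signal-to-distortion, and signal-to-artifacts ratios cited in the abstract.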
