Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

Abstract

Monaural source separation is important for many real-world applications. It is challenging because only a single channel of information is available, so without additional constraints an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including speech separation, singing voice separation, and speech denoising. Jointly optimizing the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30–4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30–2.48 dB GNSDR gain and 4.32–5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.
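
The abstract does not give the form of the masking layer or of the discriminative criterion, so the following is only a minimal sketch under stated assumptions. It assumes a soft ratio mask computed from the magnitudes of two raw network outputs and applied to the mixture spectrum, and a squared-error objective that subtracts a small penalty (weighted by a hypothetical `gamma`) for resembling the other source. The function names, the `eps` term, and the default `gamma` are illustrative choices, not values taken from the paper.

```python
def soft_mask_layer(pred_a, pred_b, mixture, eps=1e-8):
    """Apply an assumed soft ratio mask to one frame of a magnitude spectrum.

    pred_a, pred_b: raw network outputs for the two sources (hypothetical names).
    mixture: magnitude spectrum of the mixed signal for the same frame.
    Returns the two masked estimates, which sum to (approximately) the mixture.
    """
    est_a, est_b = [], []
    for a, b, z in zip(pred_a, pred_b, mixture):
        total = abs(a) + abs(b) + eps  # eps (an assumption) avoids division by zero
        est_a.append(abs(a) / total * z)
        est_b.append(abs(b) / total * z)
    return est_a, est_b


def squared_error(x, y):
    """Sum of squared differences between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def discriminative_loss(est_a, est_b, ref_a, ref_b, gamma=0.05):
    """Assumed discriminative objective: fit the own target, and be rewarded
    for differing from the other source's target. gamma=0 gives plain
    squared error."""
    return (squared_error(est_a, ref_a) - gamma * squared_error(est_a, ref_b)
            + squared_error(est_b, ref_b) - gamma * squared_error(est_b, ref_a))


# Toy frame with three frequency bins.
pred_a = [3.0, 1.0, 0.5]
pred_b = [1.0, 1.0, 1.5]
mixture = [4.0, 2.0, 1.0]
est_a, est_b = soft_mask_layer(pred_a, pred_b, mixture)
loss = discriminative_loss(est_a, est_b, ref_a=[3.0, 1.0, 0.2], ref_b=[1.0, 1.0, 0.8])
```

Because the two masks are normalized by the same denominator, the estimates add back up to the mixture in every bin. This is one way to read the "reconstruction constraint" mentioned in the abstract: the network cannot invent or discard energy, it can only decide how the mixture is shared between the sources.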
