Journal: Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Deep Learning for Talker-Dependent Reverberant Speaker Separation: An Empirical Study

Abstract

Speaker separation refers to the problem of separating speech signals from a mixture of simultaneous speakers. Previous studies are limited to addressing the speaker separation problem in anechoic conditions. This paper addresses the problem of talker-dependent speaker separation in reverberant conditions, which are characteristic of real-world environments. We employ recurrent neural networks with bidirectional long short-term memory (BLSTM) to separate and dereverberate the target speech signal. We propose two-stage networks to effectively deal with both speaker separation and speech dereverberation. In the two-stage model, the first stage separates and dereverberates two-talker mixtures and the second stage further enhances the separated target signal. We have extensively evaluated the two-stage architecture, and our empirical results demonstrate large improvements over unprocessed mixtures and clear performance gains over single-stage networks across a wide range of target-to-interferer ratios and reverberation times in simulated as well as recorded rooms. Moreover, we show that time-frequency masking yields better performance than spectral mapping for reverberant speaker separation.
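
The distinction between time-frequency masking and spectral mapping, and the two-stage chaining, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which mask is used or how the second stage consumes the first stage's output, so the energy-ratio mask, the names `ratio_mask`, `apply_mask`, and `two_stage`, and the use of plain callables in place of trained BLSTMs are all assumptions made for illustration.

```python
from typing import Callable, List

# Magnitude spectrogram as frames x frequency bins (illustrative type alias).
Spectrogram = List[List[float]]
# Stand-in for a trained network that predicts a mask from a spectrogram.
MaskEstimator = Callable[[Spectrogram], Spectrogram]


def ratio_mask(target: Spectrogram, residual: Spectrogram) -> Spectrogram:
    """Masking target: per-bin target energy over total energy, in [0, 1].

    Assumed form (an energy-ratio mask); the abstract does not name the mask.
    With spectral mapping, the training target would instead be `target`
    itself, and the network would output magnitudes directly.
    """
    mask = []
    for t_row, r_row in zip(target, residual):
        row = []
        for t, r in zip(t_row, r_row):
            total = t * t + r * r
            row.append(t * t / total if total > 0.0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(mask: Spectrogram, mixture: Spectrogram) -> Spectrogram:
    """Scale each time-frequency bin of the mixture by its mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture)
    ]


def two_stage(
    mixture: Spectrogram, stage1: MaskEstimator, stage2: MaskEstimator
) -> Spectrogram:
    """Chain two mask estimators (hypothetical wiring of the two stages).

    Stage 1 is meant to separate and dereverberate the mixture; stage 2
    takes the stage-1 estimate and enhances it further.
    """
    first_estimate = apply_mask(stage1(mixture), mixture)
    return apply_mask(stage2(first_estimate), first_estimate)
```

In this sketch `residual` stands for everything that is not the desired signal (interfering talker plus reverberation). A masking network only has to predict bounded values in [0, 1], whereas a mapping network must predict unbounded magnitudes; that is one commonly cited reason masking can be easier to learn, offered here as background rather than as the paper's explanation.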