
Demystifying TasNet: A Dissecting Approach



Abstract

In recent years, time-domain speech separation has excelled over frequency-domain separation in single-channel scenarios and noise-free environments. In this paper, we dissect the gains of the time-domain audio separation network (TasNet) approach by gradually replacing components of a frequency-domain separation system based on utterance-level permutation invariant training (u-PIT) until the TasNet system is reached, thus blending components of frequency-domain approaches with those of time-domain approaches. Some of the intermediate variants achieve signal-to-distortion ratio (SDR) gains comparable to those of TasNet, but retain the advantages of frequency-domain processing: compatibility with classic signal processing tools such as frequency-domain beamforming, and the human interpretability of the masks. Furthermore, we show that the scale-invariant signal-to-distortion ratio (si-SDR) criterion used as the loss function in TasNet is related to a logarithmic mean square error criterion, and that it is this criterion which contributes most reliably to the performance advantage of TasNet. Finally, we critically assess which gains in a noise-free single-channel environment generalize to more realistic reverberant conditions.
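The stated link between si-SDR and a logarithmic mean square error can be checked numerically. The sketch below is not taken from the paper. It assumes the commonly used si-SDR definition (project the estimate onto the target, then compare projection energy to residual energy) and uses synthetic signals. The helper names `dot`, `si_sdr`, and `log_mse` are hypothetical. Under these assumptions, si-SDR equals the target power in dB minus the log-MSE of a suitably rescaled estimate.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB (common definition, assumed here)."""
    # Optimal scaling of the target that best explains the estimate.
    alpha = dot(estimate, target) / dot(target, target)
    projection = [alpha * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]
    return 10 * math.log10(dot(projection, projection) / dot(residual, residual))


def log_mse(estimate, target):
    """Logarithmic mean square error in dB."""
    err = [e - t for e, t in zip(estimate, target)]
    return 10 * math.log10(dot(err, err) / len(err))


# Synthetic example: a scaled, noisy copy of a random target signal.
random.seed(0)
target = [random.gauss(0, 1) for _ in range(2000)]
estimate = [0.5 * t + 0.1 * random.gauss(0, 1) for t in target]

# Rescale the estimate by beta = 1 / alpha, which removes the gain mismatch.
beta = dot(target, target) / dot(estimate, target)
rescaled = [beta * e for e in estimate]
target_power_db = 10 * math.log10(dot(target, target) / len(target))

lhs = si_sdr(estimate, target)
rhs = target_power_db - log_mse(rescaled, target)
print(f"si-SDR: {lhs:.3f} dB, target power minus log-MSE: {rhs:.3f} dB")
```

Because the target power does not depend on the estimate, maximizing si-SDR in this formulation amounts to minimizing a log-MSE on the rescaled estimate. Multiplying `estimate` by any nonzero constant leaves `si_sdr` unchanged, which is the scale invariance the name refers to.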
