Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

TASNET: TIME-DOMAIN AUDIO SEPARATION NETWORK FOR REAL-TIME, SINGLE-CHANNEL SPEECH SEPARATION

Abstract

Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, low-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition has inherent problems, such as the decoupling of phase and magnitude and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on the encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal and noncausal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where low-power, real-time implementation is desirable, such as hearable and telecommunication devices.
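
The pipeline described in the abstract (encode short waveform segments into nonnegative weights, apply one mask per source to those weights, decode each masked result back to a waveform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random bases, the ReLU used to obtain nonnegative outputs, the softmax placeholder that stands in for a trained separation network, and all names and default sizes (`segment`, `encode`, `placeholder_masks`, `decode`, `separate`, `seg_len`, `n_basis`) are assumptions chosen only to show the data flow.

```python
import numpy as np


def segment(mixture, seg_len):
    """Split a 1-D waveform into non-overlapping segments, zero-padding the tail."""
    n_seg = -(-len(mixture) // seg_len)  # ceiling division
    padded = np.zeros(n_seg * seg_len)
    padded[:len(mixture)] = mixture
    return padded.reshape(n_seg, seg_len)


def encode(segments, enc_basis):
    """Project each segment onto the encoder basis; ReLU keeps outputs nonnegative.

    ReLU is an assumption of this sketch, used only to satisfy nonnegativity.
    """
    return np.maximum(segments @ enc_basis, 0.0)


def placeholder_masks(weights, n_sources, rng):
    """Stand-in for a trained separation network (hypothetical).

    Draws random logits and applies a softmax across sources, so the masks
    are nonnegative and sum to one at every (segment, basis) position.
    """
    logits = rng.standard_normal((n_sources,) + weights.shape)
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=0, keepdims=True)


def decode(masked_weights, dec_basis):
    """Synthesize waveform segments from masked encoder outputs."""
    return masked_weights @ dec_basis


def separate(mixture, seg_len=40, n_basis=64, n_sources=2, seed=0):
    """Run the encode -> mask -> decode flow with untrained, random parameters."""
    rng = np.random.default_rng(seed)
    enc_basis = rng.standard_normal((seg_len, n_basis))  # would be learned
    dec_basis = rng.standard_normal((n_basis, seg_len))  # would be learned
    weights = encode(segment(mixture, seg_len), enc_basis)
    masks = placeholder_masks(weights, n_sources, rng)
    sources = [
        decode(mask * weights, dec_basis).reshape(-1)[:len(mixture)]
        for mask in masks
    ]
    return np.stack(sources), masks


if __name__ == "__main__":
    mix = np.sin(np.linspace(0.0, 20.0, 1000))
    est_sources, est_masks = separate(mix)
    print(est_sources.shape, est_masks.shape)
```

Because the parameters here are random, the outputs are not meaningful separations; the sketch only shows that no time-frequency transform is involved and that the algorithmic delay of such a scheme is tied to the segment length rather than to a long analysis window.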