Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective

Abstract

This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources, with their magnitudes accurately estimated and under a geometric constraint, the absolute phase difference between each source and the mixture can be uniquely determined; in addition, the source phases at each time-frequency (T-F) unit can be narrowed down to only two candidates. To pick the right candidate, we propose three algorithms based on iterative phase reconstruction, group delay estimation, and phase-difference sign prediction. State-of-the-art results are obtained on the publicly available wsj0-2mix and 3mix corpora.
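
The key observation can be illustrated with the law of cosines. At one T-F unit the mixture Y = S1 + S2 and the two sources form a triangle in the complex plane, so the three magnitudes fix the angle between each source and the mixture, and the two sources must lie on opposite sides of the mixture. The following is a minimal NumPy sketch of that geometry, not the authors' implementation: the function name `phase_candidates`, its parameters, and the clipping used to handle magnitudes that violate the triangle inequality are assumptions made for illustration.

```python
import numpy as np

def phase_candidates(mix, mag1, mag2, eps=1e-8):
    """Return the two candidate (phase1, phase2) pairs per T-F unit.

    mix  : complex STFT of the mixture (any shape)
    mag1 : magnitude of source 1 (same shape, assumed accurate)
    mag2 : magnitude of source 2 (same shape, assumed accurate)
    Hypothetical helper for illustration only.
    """
    mag_y = np.abs(mix)
    phase_y = np.angle(mix)

    # Law of cosines: |S2|^2 = |Y|^2 + |S1|^2 - 2|Y||S1|cos(theta1 - thetaY)
    cos1 = (mag_y**2 + mag1**2 - mag2**2) / np.maximum(2 * mag_y * mag1, eps)
    cos2 = (mag_y**2 + mag2**2 - mag1**2) / np.maximum(2 * mag_y * mag2, eps)

    # Clip so that estimates violating the triangle inequality
    # still yield a valid angle (an assumption of this sketch).
    delta1 = np.arccos(np.clip(cos1, -1.0, 1.0))
    delta2 = np.arccos(np.clip(cos2, -1.0, 1.0))

    # The sources sit on opposite sides of the mixture, so the signs
    # of the two phase differences are opposite: two candidates remain.
    cand_a = (phase_y + delta1, phase_y - delta2)
    cand_b = (phase_y - delta1, phase_y + delta2)
    return cand_a, cand_b
```

With exact magnitudes, both candidates add up to the mixture, and one of them coincides with the true source phases. Choosing between them (the sign of the phase difference) is the part that the three algorithms named in the abstract address; this sketch does not attempt it.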