Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Teacher-student Deep Clustering for Low-delay Single Channel Speech Separation

Abstract

The recently proposed deep clustering algorithm introduced significant advances in monaural speaker-independent multi-speaker speech separation. Deep clustering operates on magnitude spectrograms using bidirectional recurrent networks and K-means clustering, both of which require offline operation, i.e., algorithm latency is longer than utterance length. This paper evaluates architectures for reduced-latency deep clustering by combining: (1) block processing to efficiently propagate the memory encoded by the recurrent network, and (2) teacher-student learning, where low-latency models learn from an offline teacher. Compared to our best-performing offline model, we only lose 0.3 dB SDR at a latency of 1.2 seconds and 0.7 dB SDR at a latency of 0.6 seconds on the publicly available wsj0-2mix dataset. Moreover, by providing a detailed analysis of the failure cases for our low-latency speech separation models, we show that the cause of this performance gap is related to frame-level permutation errors, where the network fails to accurately track speaker identity throughout an utterance.
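
To make the offline constraint mentioned in the abstract concrete, the following is a minimal sketch of the K-means step in deep clustering: per-bin embeddings for a whole utterance are clustered, and each cluster becomes a binary time-frequency mask. This is an illustration and not the authors' implementation. The function name `kmeans_masks`, the `(T, F, D)` embedding layout, and the farthest-point initialization are all assumptions made for this example.

```python
import numpy as np


def kmeans_masks(embeddings, num_speakers=2, num_iters=20):
    """Cluster time-frequency embeddings into binary masks (illustrative).

    embeddings: array of shape (T, F, D), one D-dimensional vector per
                time-frequency bin (hypothetical layout for this sketch).
    Returns an array of shape (num_speakers, T, F) with 0/1 entries.
    """
    T, F, D = embeddings.shape
    points = embeddings.reshape(T * F, D).astype(np.float64)

    # Deterministic farthest-point initialization (an assumption made here
    # so the sketch is reproducible; not taken from the paper).
    centers = [points[0]]
    for _ in range(1, num_speakers):
        dists = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.stack(centers)

    labels = np.zeros(T * F, dtype=int)
    for _ in range(num_iters):
        # Assign every bin to its nearest center.
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Move each center to the mean of its assigned bins.
        for k in range(num_speakers):
            members = points[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)

    masks = np.stack(
        [(labels == k).reshape(T, F) for k in range(num_speakers)]
    )
    return masks.astype(np.float32)
```

Because the clustering above consumes embeddings from every frame of the utterance at once, it cannot emit a mask for the first frame until the last frame has been seen, which is the latency problem the abstract describes.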