Teacher-student Deep Clustering for Low-delay Single Channel Speech Separation

Abstract

The recently proposed deep clustering algorithm introduced significant advances in monaural speaker-independent multi-speaker speech separation. Deep clustering operates on magnitude spectrograms using bidirectional recurrent networks and K-means clustering, both of which require offline operation, i.e., algorithm latency is longer than utterance length. This paper evaluates architectures for reduced-latency deep clustering by combining: (1) block processing to efficiently propagate the memory encoded by the recurrent network, and (2) teacher-student learning, where low-latency models learn from an offline teacher. Compared to our best performing offline model, we only lose 0.3 dB SDR at a latency of 1.2 seconds and 0.7 dB SDR at a latency of 0.6 seconds on the publicly available wsj0-2mix dataset. Moreover, by providing a detailed analysis of the failure cases for our low-latency speech separation models, we show that the cause of this performance gap is related to frame-level permutation errors, where the network fails to accurately track speaker identity throughout an utterance.
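The block-processing idea in the abstract can be illustrated with a toy sketch. It is not the paper's architecture: the tanh cell, the weight names `W_h` and `W_x`, the sizes, and the `block_len` value are all assumptions chosen for illustration. The sketch only shows that a unidirectional recurrent cell run block by block, with its hidden state carried across block boundaries, reproduces the output of a single pass over the whole utterance, so latency is bounded by the block length rather than the utterance length.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    """One step of a toy tanh recurrent cell (illustrative, not the paper's network)."""
    return np.tanh(h @ W_h + x @ W_x)

def run_rnn(frames, h0, W_h, W_x):
    """Run the cell over frames of shape (T, F); return outputs (T, H) and the final state."""
    h = h0
    outputs = []
    for x in frames:
        h = rnn_step(h, x, W_h, W_x)
        outputs.append(h)
    return np.stack(outputs), h

def run_rnn_in_blocks(frames, h0, W_h, W_x, block_len):
    """Process frames block by block, carrying the hidden state across block boundaries."""
    h = h0
    chunks = []
    for start in range(0, len(frames), block_len):
        block = frames[start:start + block_len]
        out, h = run_rnn(block, h, W_h, W_x)  # the state from this block seeds the next one
        chunks.append(out)
    return np.concatenate(chunks)

# Hypothetical sizes: 50 spectrogram frames, 8 frequency bins, 4 hidden units.
rng = np.random.default_rng(0)
T, F, H = 50, 8, 4
frames = rng.standard_normal((T, F))
W_h = 0.5 * rng.standard_normal((H, H))
W_x = 0.5 * rng.standard_normal((F, H))
h0 = np.zeros(H)

full_out, _ = run_rnn(frames, h0, W_h, W_x)
block_out = run_rnn_in_blocks(frames, h0, W_h, W_x, block_len=12)
```

The equivalence holds here only because the toy cell is unidirectional. A bidirectional layer also looks ahead in time, so restricting it to a block changes its output, which is consistent with the abstract reporting a performance gap for the low-latency models.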

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2019 | p666-1327 | 5 pages
Authors
Ryo Aihara; Toshiyuki Hanazawa; Yohei Okato; Gordon Wichern; Jonathan Le Roux
Original format: PDF
Chinese Library Classification: TN912-53
Keywords
cocktail party problem; speaker-independent speech separation; deep clustering; low-latency; chimera network

Similar literature

1. Deep clustering-based single-channel speech separation and recent advances [J]. Ryo Aihara, Gordon Wichern, Jonathan Le Roux. Acoustical Science and Technology. 2020, No. 2
2. A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech [J]. Yan-Hui Tu, Jun Du, Chin-Hui Lee. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2018, No. 7
3. Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation [J]. Saleem Nasir, Khattak Muhammad Irfan. Applied Acoustics. 2020, No. Octa
4. Teacher-student Deep Clustering for Low-delay Single Channel Speech Separation [C]. Ryo Aihara, Toshiyuki Hanazawa, Yohei Okato, IEEE International Conference on Acoustics, Speech and Signal Processing. 2019
5. Single-channel Speech Separation Based on Instantaneous Frequency [D]. Gu, Lingyun. 2010
6. Impact of phase estimation on single-channel speech separation based on time-frequency masking [O]. Florian Mayer, Donald S. Williamson, Pejman Mowlaee
7. Deep clustering-based single-channel speech separation and recent advances [O]. Ryo Aihara, Gordon Wichern, Jonathan Le Roux. 2020
