The Journal of the Acoustical Society of America

Long short-term memory for speaker generalization in supervised speech separation

Abstract

Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation: without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. (C) 2017 Acoustical Society of America.
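A minimal sketch of the formulation described in the abstract, assuming a PyTorch implementation: a unidirectional LSTM maps noisy-speech features (e.g., log-magnitude spectra) to a time-frequency mask in [0, 1] for each frame. The layer sizes, feature dimension, and the mean-squared-error mask loss below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class LSTMMaskEstimator(nn.Module):
    def __init__(self, feat_dim=257, hidden_dim=512, num_layers=2):
        super().__init__()
        # Unidirectional LSTM: no future frames are required, matching the
        # low-latency setting discussed in the abstract.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        # Per-frame projection to a mask value in [0, 1] for each frequency bin.
        self.proj = nn.Sequential(nn.Linear(hidden_dim, feat_dim), nn.Sigmoid())

    def forward(self, noisy_features):
        # noisy_features: (batch, frames, feat_dim)
        h, _ = self.lstm(noisy_features)
        return self.proj(h)  # estimated time-frequency mask

# Toy usage with placeholder data and a mean-squared-error mask loss.
model = LSTMMaskEstimator()
noisy = torch.randn(4, 100, 257)        # fake batch of noisy-speech features
target_mask = torch.rand(4, 100, 257)   # placeholder supervised mask targets
loss = nn.functional.mse_loss(model(noisy), target_mask)
loss.backward()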
