The Journal of the Acoustical Society of America

Long short-term memory for speaker generalization in supervised speech separation

Abstract

Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation: without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. (C) 2017 Acoustical Society of America.
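A minimal sketch of the formulation described in the abstract, assuming a PyTorch implementation: a unidirectional LSTM maps noisy-speech features (e.g., log-magnitude spectra) to a time-frequency mask in [0, 1] for each frame. The layer sizes, feature dimension, and the mean-squared-error mask loss below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class LSTMMaskEstimator(nn.Module):
    def __init__(self, feat_dim=257, hidden_dim=512, num_layers=2):
        super().__init__()
        # Unidirectional LSTM: no future frames are required, matching the
        # low-latency setting discussed in the abstract.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        # Per-frame projection to a mask value in [0, 1] for each frequency bin.
        self.proj = nn.Sequential(nn.Linear(hidden_dim, feat_dim), nn.Sigmoid())

    def forward(self, noisy_features):
        # noisy_features: (batch, frames, feat_dim)
        h, _ = self.lstm(noisy_features)
        return self.proj(h)  # estimated time-frequency mask

# Toy usage with placeholder data and a mean-squared-error mask loss.
model = LSTMMaskEstimator()
noisy = torch.randn(4, 100, 257)        # fake batch of noisy-speech features
target_mask = torch.rand(4, 100, 257)   # placeholder supervised mask targets
loss = nn.functional.mse_loss(model(noisy), target_mask)
loss.backward()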
