The Journal of the Acoustical Society of America

Long short-term memory for speaker generalization in supervised speech separation



Abstract

Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. The LSTM model is also found to be more advantageous for low-latency speech separation: even without future frames, it performs better than the DNN model that uses future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation.
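The abstract formulates separation as learning a time-frequency mask. A common training target in this line of work is the ideal ratio mask (IRM), which assigns each time-frequency unit a value in [0, 1] according to the fraction of mixture energy that belongs to speech. The sketch below illustrates the idea with NumPy; the exponent, the flooring constant, and the toy spectrogram values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal ratio mask (IRM), a common training target in supervised
    speech separation.  Each time-frequency unit gets a value in [0, 1]
    reflecting how much of the mixture energy is speech.  beta and the
    small flooring constant are illustrative choices."""
    speech_energy = speech_mag ** 2
    noise_energy = noise_mag ** 2
    return (speech_energy / (speech_energy + noise_energy + 1e-12)) ** beta

# Toy magnitude spectrograms (frequency bins x time frames).
speech = np.array([[3.0, 0.0],
                   [1.0, 2.0]])
noise = np.array([[1.0, 2.0],
                  [1.0, 0.0]])

mask = ideal_ratio_mask(speech, noise)

# Applying the mask to the mixture magnitude attenuates
# noise-dominated units and preserves speech-dominated ones.
mixture = speech + noise
enhanced = mask * mixture
```

In the supervised setting described by the abstract, a network (a DNN or, here, an LSTM) is trained to predict such a mask from acoustic features of the noisy mixture; at test time the predicted mask is applied to the mixture spectrogram to recover the speech estimate.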
