The Journal of the Acoustical Society of America

Long short-term memory for speaker generalization in supervised speech separation



Abstract

Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. The LSTM model is also found to be more advantageous for low-latency speech separation: even without future frames, it performs better than the DNN model that uses future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation.
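The abstract formulates separation as learning a time-frequency mask. A common training target in this line of work is the ideal ratio mask (IRM), which assigns each time-frequency unit a value in [0, 1] according to the fraction of mixture energy that belongs to speech. The sketch below illustrates the idea with NumPy; the exponent, the flooring constant, and the toy spectrogram values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal ratio mask (IRM), a common training target in supervised
    speech separation.  Each time-frequency unit gets a value in [0, 1]
    reflecting how much of the mixture energy is speech.  beta and the
    small flooring constant are illustrative choices."""
    speech_energy = speech_mag ** 2
    noise_energy = noise_mag ** 2
    return (speech_energy / (speech_energy + noise_energy + 1e-12)) ** beta

# Toy magnitude spectrograms (frequency bins x time frames).
speech = np.array([[3.0, 0.0],
                   [1.0, 2.0]])
noise = np.array([[1.0, 2.0],
                  [1.0, 0.0]])

mask = ideal_ratio_mask(speech, noise)

# Applying the mask to the mixture magnitude attenuates
# noise-dominated units and preserves speech-dominated ones.
mixture = speech + noise
enhanced = mask * mixture
```

In the supervised setting described by the abstract, a network (a DNN or, here, an LSTM) is trained to predict such a mask from acoustic features of the noisy mixture; at test time the predicted mask is applied to the mixture spectrogram to recover the speech estimate.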
