Journal of signal processing systems for signal, image, and video technology

A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech

Abstract

We propose a novel speaker-dependent (SD) multi-condition (MC) training approach to the joint learning of deep neural network (DNN) acoustic models and an explicit speech separation structure for recognition of multi-talker mixed speech in a single-channel setting. First, an MC acoustic modeling framework is established to train an SD-DNN model in multi-talker scenarios. Assuming the speaker identities in the mixed speech are known, such a recognizer significantly reduces decoding complexity and improves recognition accuracy over recognizers that use speaker-independent DNN models with a complicated joint decoding structure. In addition, an SD regression DNN that maps the acoustic features of mixed speech to the speech features of a target speaker is jointly trained with the SD-DNN-based acoustic models. Experimental results on the Speech Separation Challenge (SSC) small-vocabulary recognition task show that the proposed approach under multi-condition training achieves an average word error rate (WER) of 3.8%, a relative WER reduction of 65.1% from the top-performing, DNN-based pre-processing-only approach we proposed earlier under clean-condition training (Tu et al. 2016). Furthermore, the proposed joint training DNN framework yields a relative WER reduction of 13.2% from state-of-the-art systems under multi-condition training. Finally, the effectiveness of the proposed approach is also verified on the Wall Street Journal (WSJ0) medium-vocabulary continuous speech recognition task in a simulated multi-talker setting.
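The abstract's central architectural idea, joint training of a speaker-dependent separation front-end together with the acoustic model, can be illustrated with a short sketch. The following is a minimal PyTorch sketch, not the authors' implementation: the class names SeparationDNN and AcousticDNN, all layer sizes, the feature dimension, the senone count, and the loss weight alpha are illustrative assumptions. It shows the cascade the abstract describes: a regression DNN maps mixed-speech features to estimated target-speaker features, a DNN acoustic model classifies those features into senones, and a combined regression-plus-recognition loss lets recognition gradients flow back into the separation network.

import torch
import torch.nn as nn

FEAT_DIM = 40      # assumed log-Mel filterbank dimension
NUM_SENONES = 256  # assumed tied-state inventory for a small-vocabulary task

class SeparationDNN(nn.Module):
    """Speaker-dependent regression DNN: mixed-speech features ->
    estimated clean features of the target speaker (hypothetical sizes)."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, feat_dim),  # linear output for regression
        )
    def forward(self, x):
        return self.net(x)

class AcousticDNN(nn.Module):
    """Speaker-dependent DNN acoustic model: features -> senone logits."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=1024, senones=NUM_SENONES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, senones),
        )
    def forward(self, x):
        return self.net(x)

sep, am = SeparationDNN(), AcousticDNN()
opt = torch.optim.SGD(list(sep.parameters()) + list(am.parameters()), lr=0.01)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()
alpha = 0.5  # assumed weight balancing separation and recognition losses

def joint_step(mixed, clean_target, senone_labels):
    """One joint update: the recognition loss back-propagates through the
    acoustic model into the separation front-end, so both are updated."""
    enhanced = sep(mixed)
    logits = am(enhanced)
    loss = ce(logits, senone_labels) + alpha * mse(enhanced, clean_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch of 8 frames (a real system would splice context windows of frames).
loss = joint_step(torch.randn(8, FEAT_DIM),
                  torch.randn(8, FEAT_DIM),
                  torch.randint(0, NUM_SENONES, (8,)))

Whether the regression loss is kept during joint training (nonzero alpha) or used only for pre-training the front-end is a design choice the sketch leaves open; the weighted sum above is one common way to combine the two objectives.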
