A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition

Abstract

We propose a novel speaker-dependent (SD) approach to joint training of deep neural networks (DNNs) with an explicit speech separation structure for multi-talker speech recognition in a single-channel setting. First, a multi-condition training strategy is designed for an SD-DNN recognizer in multi-talker scenarios; it significantly reduces decoding runtime and improves recognition accuracy over approaches that use speaker-independent DNN models with a complicated joint decoding framework. In addition, an SD regression DNN that maps the acoustic features of mixed speech to the speech features of a target speaker is jointly trained with the SD recognition DNN used for acoustic modeling. Our experiments on the Speech Separation Challenge (SSC) task show that the proposed SD recognition system under multi-condition training achieves an average word error rate (WER) of 3.8%, a relative WER reduction of 65.1% over the DNN preprocessing approach under clean-condition training [1]. Furthermore, the jointly trained DNN system yields a relative WER reduction of 13.2% over the state-of-the-art systems under multi-condition training.
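The relative WER reductions quoted in the abstract follow the usual definition, 100 × (baseline − new) / baseline. The short Python sketch below spells out that arithmetic. The function names are hypothetical and chosen for illustration only, and the baseline WER it prints is only what the two reported figures (3.8% and 65.1%) imply under that definition; the abstract itself does not report it.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer: float, relative_reduction: float) -> float:
    """Invert the formula above to recover the baseline WER (in percent)."""
    if not 0.0 <= relative_reduction < 100.0:
        raise ValueError("relative_reduction must be in [0, 100)")
    return new_wer / (1.0 - relative_reduction / 100.0)


if __name__ == "__main__":
    # Figures reported in the abstract: 3.8% WER, 65.1% relative reduction.
    # The baseline below is inferred from them, not stated in the abstract.
    baseline = implied_baseline_wer(3.8, 65.1)
    print(f"implied clean-condition baseline WER: {baseline:.1f}%")  # 10.9%
```

Under these assumptions the clean-condition baseline works out to roughly 10.9% WER. The 13.2% figure cannot be converted the same way, because the abstract does not give the WER of the state-of-the-art systems it is measured against.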

Bibliographic details

Source
International Symposium on Chinese Spoken Language Processing | 2016 | pp. 1-5 | 5 pages
Authors
Yan-Hui Tu; Jun Du; Li-Rong Dai; Chin-Hui Lee
Original format: PDF
Keywords
Training; Speech; Speech recognition; Hidden Markov models; Acoustics; Target recognition; Silicon

Similar literature

1. A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech [J]. Yan-Hui Tu, Jun Du, Chin-Hui Lee. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2018, No. 7.
2. An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition [J]. Bo Wu, Kehuang Li, Fengpei Ge, et al. IEEE Journal of Selected Topics in Signal Processing. 2017, No. 8.
3. A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge [J]. Sun Lei, Du Jun, Gao Tian, et al. IEEE Journal of Selected Topics in Signal Processing. 2019, No. 4.
4. A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition [C]. Yan-Hui Tu, Jun Du, Li-Rong Dai, Chin-Hui Lee. International Symposium on Chinese Spoken Language Processing. 2016.
5. Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition [D]. Liu, Yuzong. 2016.
6. Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training [O]. Arun Narayanan, DeLiang Wang. 2014.
