IEEE/ACM Transactions on Audio, Speech, and Language Processing

Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition



Abstract

We investigate techniques based on deep neural networks (DNNs) for attacking the single-channel multi-talker speech recognition problem. Our proposed approach contains five key ingredients: a multi-style training strategy on artificially mixed speech data, a separate DNN to estimate senone posterior probabilities of the louder and softer speakers at each frame, a weighted finite-state transducer (WFST)-based two-talker decoder to jointly estimate and correlate the speaker and speech, a speaker switching penalty estimated from the energy pattern change in the mixed speech, and a confidence-based system combination strategy. Experiments on the 2006 speech separation and recognition challenge task demonstrate that our proposed DNN-based system has remarkable noise robustness to the interference of a competing speaker. The best setup of our proposed systems achieves an average word error rate (WER) of 18.8% across different SNRs and outperforms the state-of-the-art IBM superhuman system by 2.8% absolute, with fewer assumptions.
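The multi-style training strategy rests on artificially mixing a target utterance with a competing-talker utterance at a range of signal-to-noise ratios, so the acoustic model sees both louder and softer target conditions. The sketch below illustrates one way such mixed training data could be generated; the function name `mix_at_snr` and the SNR grid are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mix_at_snr(target: np.ndarray, interferer: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a target utterance with a competing-talker utterance at a given SNR.

    Both inputs are 1-D float arrays at the same sample rate; the interferer
    is tiled or trimmed to the target's length before scaling.
    """
    # Match lengths: tile the interferer if it is shorter than the target.
    if len(interferer) < len(target):
        reps = int(np.ceil(len(target) / len(interferer)))
        interferer = np.tile(interferer, reps)
    interferer = interferer[: len(target)]

    # Scale the interferer so the target-to-interferer energy ratio equals
    # the requested SNR in dB: 10*log10(P_target / (scale^2 * P_interf)) = snr_db.
    target_power = np.mean(target ** 2) + 1e-12
    interf_power = np.mean(interferer ** 2) + 1e-12
    scale = np.sqrt(target_power / (interf_power * 10 ** (snr_db / 10.0)))
    return target + scale * interferer

# Example SNR grid (in dB) for multi-style training; positive values leave the
# target louder, negative values leave the competing talker louder.
snr_grid_db = [6, 3, 0, -3, -6, -9]
```

Sweeping the grid over both positive and negative SNRs is what lets separate DNNs specialize on the louder and the softer speaker, since every mixture provides a frame-level example of each condition.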


