IEEE/ACM Transactions on Audio, Speech, and Language Processing

Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition



Abstract

We investigate techniques based on deep neural networks (DNNs) for attacking the single-channel multi-talker speech recognition problem. Our proposed approach contains five key ingredients: a multi-style training strategy on artificially mixed speech data, a separate DNN to estimate senone posterior probabilities of the louder and softer speakers at each frame, a weighted finite-state transducer (WFST)-based two-talker decoder to jointly estimate and correlate the speaker and speech, a speaker switching penalty estimated from the energy pattern change in the mixed speech, and a confidence-based system combination strategy. Experiments on the 2006 speech separation and recognition challenge task demonstrate that our proposed DNN-based system has remarkable noise robustness to the interference of a competing speaker. The best setup of our proposed systems achieves an average word error rate (WER) of 18.8% across different SNRs and outperforms the state-of-the-art IBM superhuman system by 2.8% absolute, with fewer assumptions.
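The multi-style training strategy rests on artificially mixing a target utterance with a competing-talker utterance at a range of signal-to-noise ratios, so the acoustic model sees both louder and softer target conditions. The sketch below illustrates one way such mixed training data could be generated; the function name `mix_at_snr` and the SNR grid are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mix_at_snr(target: np.ndarray, interferer: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a target utterance with a competing-talker utterance at a given SNR.

    Both inputs are 1-D float arrays at the same sample rate; the interferer
    is tiled or trimmed to the target's length before scaling.
    """
    # Match lengths: tile the interferer if it is shorter than the target.
    if len(interferer) < len(target):
        reps = int(np.ceil(len(target) / len(interferer)))
        interferer = np.tile(interferer, reps)
    interferer = interferer[: len(target)]

    # Scale the interferer so the target-to-interferer energy ratio equals
    # the requested SNR in dB: 10*log10(P_target / (scale^2 * P_interf)) = snr_db.
    target_power = np.mean(target ** 2) + 1e-12
    interf_power = np.mean(interferer ** 2) + 1e-12
    scale = np.sqrt(target_power / (interf_power * 10 ** (snr_db / 10.0)))
    return target + scale * interferer

# Example SNR grid (in dB) for multi-style training; positive values leave the
# target louder, negative values leave the competing talker louder.
snr_grid_db = [6, 3, 0, -3, -6, -9]
```

Sweeping the grid over both positive and negative SNRs is what lets separate DNNs specialize on the louder and the softer speaker, since every mixture provides a frame-level example of each condition.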


