Venue: IEEE Automatic Speech Recognition and Understanding Workshop

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition



Abstract

Recently, the end-to-end approach has proven its efficacy in monaural multi-speaker speech recognition. However, high word error rates (WERs) still prevent these systems from being used in practical applications. On the other hand, the spatial information in multi-channel signals has proven helpful in far-field speech recognition tasks. In this work, we propose a novel neural sequence-to-sequence (seq2seq) architecture, MIMO-Speech, which extends the original seq2seq to deal with multi-channel input and multi-channel output so that it can fully model multi-channel multi-speaker speech separation and recognition. MIMO-Speech is a fully neural end-to-end framework that is optimized only via an ASR criterion. It comprises: 1) a monaural masking network, 2) a multi-source neural beamformer, and 3) a multi-output speech recognition model. With this processing, the input overlapped speech is directly mapped to text sequences. We further adopt a curriculum learning strategy that makes better use of the training set and improves performance. Experiments on the spatialized wsj1-2mix corpus show that our model reduces WER by more than 60% compared to the single-channel system, and that the separation function described above produces high-quality enhanced signals (SI-SDR = 23.1 dB).
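The first two stages named in the abstract (a masking network followed by a multi-source beamformer) can be illustrated with a small NumPy sketch. The abstract does not say which beamformer is used, so this sketch assumes a mask-based MVDR formulation, which is one common choice. The function names `masked_covariance`, `mvdr_weights`, and `beamform`, the array layouts, and the `ref_channel` parameter are all hypothetical and are not taken from the paper. The masks are treated as given inputs here, whereas in the described system a neural network would estimate them.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array of shape (channels, freqs, frames)
    mask: real array of shape (freqs, frames), values in [0, 1]
    returns: complex array of shape (freqs, channels, channels)
    """
    weighted = stft * mask[None]
    # Sum over frames of m[f, t] * x[f, t] x[f, t]^H
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=-1)[:, None, None] + eps)


def mvdr_weights(cov_target, cov_interf, ref_channel=0, eps=1e-8):
    """Assumed MVDR filter: w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S).

    Both covariances have shape (freqs, channels, channels).
    returns: complex filter of shape (freqs, channels)
    """
    channels = cov_target.shape[-1]
    # Small diagonal loading keeps the interference covariance invertible.
    cov_interf = cov_interf + eps * np.eye(channels)
    numerator = np.linalg.solve(cov_interf, cov_target)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    # Selecting one column is the product with the one-hot vector u.
    return numerator[..., ref_channel] / (trace + eps)


def beamform(stft, weights):
    """Apply y[f, t] = w[f]^H x[f, t]; returns shape (freqs, frames)."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)
```

For a two-speaker mixture, one would call `masked_covariance` once per speaker mask, then call `mvdr_weights` for each speaker with the other speaker's covariance (plus any noise covariance) as the interference. Each call to `beamform` then yields one single-channel stream, and these streams would be passed to the recognition model.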
