Computer Speech and Language

Hybrid-task learning for robust automatic speech recognition



Abstract

In order to properly train an automatic speech recognition system, speech with its annotated transcriptions is most often required. The amount of real annotated data recorded in noisy and reverberant conditions is extremely limited, especially compared to the amount of data that can be simulated by adding noise to clean annotated speech. Thus, using both real and simulated data is important in order to improve robust speech recognition, as this increases the amount and diversity of training data (thanks to the simulated data) while also reducing the mismatch between training and operation of the system (thanks to the real data). Another promising method applied to speech recognition in noisy and reverberant conditions is multi-task learning. The idea is to train one acoustic model to solve simultaneously at least two tasks that are different but related, with speech recognition being the main task. A successful auxiliary task consists of generating clean speech features using a regression loss (as a denoising auto-encoder). This auxiliary task, however, uses clean speech as its targets, which implies that real data cannot be used for it. To tackle this problem, a Hybrid-Task Learning system is proposed. This system switches frequently between multi-task and single-task learning depending on whether the input is simulated or real data, respectively. The hybrid architecture allows us to benefit from both real and simulated data while still using a denoising auto-encoder as the auxiliary task of a multi-task setup. We show that the relative improvement brought by the proposed hybrid-task learning architecture reaches up to 4.4% compared to the traditional single-task learning approach on the CHiME4 database. We also demonstrate the benefits of the hybrid approach compared to multi-task learning or adaptation.
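
The per-batch switching described in the abstract can be pictured with a short sketch. The PyTorch-style code below is only an illustration under assumed names and hyper-parameters (HybridTaskModel, hybrid_loss, the auxiliary weight alpha); it is not the authors' implementation. A shared encoder feeds an ASR classification head and a feature-denoising regression head; simulated batches, for which clean reference features exist, use the multi-task loss, while real batches fall back to the single ASR loss.

```python
# Illustrative sketch of hybrid-task switching; names and dimensions are assumptions.
import torch
import torch.nn as nn

class HybridTaskModel(nn.Module):
    """Shared encoder with two heads: ASR senone classification (main task)
    and clean-feature regression (auxiliary denoising auto-encoder)."""
    def __init__(self, feat_dim=40, hidden_dim=512, num_senones=2000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.asr_head = nn.Linear(hidden_dim, num_senones)   # classification head
        self.denoise_head = nn.Linear(hidden_dim, feat_dim)  # regression head

    def forward(self, noisy_feats):
        h = self.encoder(noisy_feats)
        return self.asr_head(h), self.denoise_head(h)

ce_loss = nn.CrossEntropyLoss()
mse_loss = nn.MSELoss()

def hybrid_loss(model, noisy_feats, senone_targets, clean_feats=None, alpha=0.1):
    """Single-task ASR loss for real data (no clean reference available);
    multi-task ASR + denoising loss for simulated data."""
    logits, denoised = model(noisy_feats)
    loss = ce_loss(logits, senone_targets)
    if clean_feats is not None:  # simulated batch: clean targets exist
        loss = loss + alpha * mse_loss(denoised, clean_feats)
    return loss
```

In this sketch the decision is made per batch simply by whether clean reference features are supplied, so real and simulated data can be freely interleaved during training, as the hybrid-task setup requires.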
