Conference: Annual Conference on Neural Information Processing Systems

Attention-Based Models for Speech Recognition



Abstract

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation. We extend the attention mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER on single utterances and 20% on 10-times longer (repeated) utterances. Finally, we propose a change to the attention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to the 17.6% level.
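The key idea described in the abstract is to make the attention scoring function location-aware: in addition to comparing the decoder state with the encoder outputs (content-based attention), the model also looks at where it attended on the previous step by convolving the previous alignment with learned filters. The sketch below illustrates one attention step under this scheme. It is a minimal, illustrative implementation, not the authors' exact configuration: all parameter names, shapes, and the sharpening factor `beta` are assumptions introduced here for clarity.

```python
import numpy as np

def location_aware_attention_step(s_prev, h, alpha_prev, params):
    """One decoding step of location-aware attention (illustrative sketch).

    s_prev     : (n,)   previous decoder state
    h          : (T, m) encoder outputs for T input frames
    alpha_prev : (T,)   attention weights from the previous step
    params     : dict with projections W (n,d), V (m,d), U (c,d),
                 conv filters F (c,k), scoring vector w (d,),
                 and inverse temperature beta (scalar)
    """
    W, V, U, F, w, beta = (params[k] for k in ("W", "V", "U", "F", "w", "beta"))

    # Location features: convolve the previous alignment with learned filters,
    # so the scorer knows where the model attended last (location-awareness).
    k = F.shape[1]
    padded = np.pad(alpha_prev, (k // 2, k // 2))
    f = np.stack(
        [np.convolve(padded, F[c], mode="valid")[: len(alpha_prev)]
         for c in range(F.shape[0])],
        axis=1,
    )  # (T, c)

    # Additive scores combining content (decoder state vs. encoder outputs)
    # and location (convolved previous alignment) information.
    scores = np.tanh(s_prev @ W + h @ V + f @ U) @ w  # (T,)

    # Sharpened softmax over input frames; the paper also discusses smoothing
    # to keep attention from concentrating too much on single frames.
    scores = beta * (scores - scores.max())
    alpha = np.exp(scores)
    alpha /= alpha.sum()

    # Context vector: expected encoder output under the attention weights.
    context = alpha @ h  # (m,)
    return context, alpha
```

In a full recognizer, this step would run once per output phoneme, with `alpha` fed back as `alpha_prev` on the next step; the location features are what let the alignment keep moving forward on utterances much longer than those seen in training.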
