Conference: Annual Conference on Neural Information Processing Systems

Attention-Based Models for Speech Recognition



Abstract

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation. We extend the attention mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER on single utterances and 20% on 10-times longer (repeated) utterances. Finally, we propose a change to the attention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to the 17.6% level.
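The key idea described in the abstract is to make the attention scoring function location-aware: in addition to comparing the decoder state with the encoder outputs (content-based attention), the model also looks at where it attended on the previous step by convolving the previous alignment with learned filters. The sketch below illustrates one attention step under this scheme. It is a minimal, illustrative implementation, not the authors' exact configuration: all parameter names, shapes, and the sharpening factor `beta` are assumptions introduced here for clarity.

```python
import numpy as np

def location_aware_attention_step(s_prev, h, alpha_prev, params):
    """One decoding step of location-aware attention (illustrative sketch).

    s_prev     : (n,)   previous decoder state
    h          : (T, m) encoder outputs for T input frames
    alpha_prev : (T,)   attention weights from the previous step
    params     : dict with projections W (n,d), V (m,d), U (c,d),
                 conv filters F (c,k), scoring vector w (d,),
                 and inverse temperature beta (scalar)
    """
    W, V, U, F, w, beta = (params[k] for k in ("W", "V", "U", "F", "w", "beta"))

    # Location features: convolve the previous alignment with learned filters,
    # so the scorer knows where the model attended last (location-awareness).
    k = F.shape[1]
    padded = np.pad(alpha_prev, (k // 2, k // 2))
    f = np.stack(
        [np.convolve(padded, F[c], mode="valid")[: len(alpha_prev)]
         for c in range(F.shape[0])],
        axis=1,
    )  # (T, c)

    # Additive scores combining content (decoder state vs. encoder outputs)
    # and location (convolved previous alignment) information.
    scores = np.tanh(s_prev @ W + h @ V + f @ U) @ w  # (T,)

    # Sharpened softmax over input frames; the paper also discusses smoothing
    # to keep attention from concentrating too much on single frames.
    scores = beta * (scores - scores.max())
    alpha = np.exp(scores)
    alpha /= alpha.sum()

    # Context vector: expected encoder output under the attention weights.
    context = alpha @ h  # (m,)
    return context, alpha
```

In a full recognizer, this step would run once per output phoneme, with `alpha` fed back as `alpha_prev` on the next step; the location features are what let the alignment keep moving forward on utterances much longer than those seen in training.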
