International Conference on Speech and Computer

Investigating Joint CTC-Attention Models for End-to-End Russian Speech Recognition


Abstract

We propose an application of attention-based models for automatic recognition of continuous Russian speech. We experimented with three types of attention mechanism, data augmentation based on tempo and pitch perturbations, and a beam search pruning method. Moreover, we propose using the sparsemax function as the probability distribution generator for the attention mechanism in our task. We experimented with joint CTC-Attention encoder-decoders that use deep convolutional networks to compress input features or waveform spectrograms, and we also experimented with a Highway LSTM model as an encoder. We performed experiments on a small Russian speech dataset with a total duration of more than 60 h. The proposed methods improved recognition accuracy, and the beam search optimization method gave better performance in terms of speech decoding speed.
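To make two of the ideas in the abstract concrete, below is a minimal PyTorch-style sketch of (a) a sparsemax normalizer that can replace softmax inside an attention mechanism and (b) a joint CTC-attention training objective that linearly interpolates the two losses. The tensor shapes, the mixing weight ctc_weight, and the function names are illustrative assumptions; the abstract does not describe the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016) over the last dimension:
    Euclidean projection onto the probability simplex, which yields
    attention weights that can be exactly zero (unlike softmax)."""
    z, _ = torch.sort(scores, dim=-1, descending=True)
    cum = z.cumsum(dim=-1) - 1.0
    k = torch.arange(1, scores.size(-1) + 1,
                     device=scores.device, dtype=scores.dtype)
    support = (k * z > cum).to(scores.dtype)      # 1 + k*z_k > sum_{j<=k} z_j
    k_z = support.sum(dim=-1, keepdim=True)       # size of the support set
    tau = cum.gather(-1, k_z.long() - 1) / k_z    # threshold tau(z)
    return torch.clamp(scores - tau, min=0.0)

def joint_ctc_attention_loss(ctc_log_probs, enc_lengths,
                             dec_logits, ctc_targets, att_targets,
                             target_lengths, ctc_weight=0.3, blank_id=0):
    """Multi-task objective L = w * L_CTC + (1 - w) * L_att.
    ctc_log_probs: (T, B, V) log-softmax outputs of the encoder's CTC head.
    dec_logits:    (B, L, V) attention decoder outputs.
    ctc_targets:   (B, S) label indices, true lengths in target_lengths.
    att_targets:   (B, L) label indices, padded with -100 (ignored)."""
    l_ctc = F.ctc_loss(ctc_log_probs, ctc_targets, enc_lengths,
                       target_lengths, blank=blank_id, zero_infinity=True)
    l_att = F.cross_entropy(dec_logits.transpose(1, 2), att_targets,
                            ignore_index=-100)
    return ctc_weight * l_ctc + (1.0 - ctc_weight) * l_att
```

The interpolation weight trades the monotonic alignment prior of CTC against the flexibility of the attention decoder; the value used here is only a placeholder, not the setting reported in the paper.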