International Conference on Speech and Computer

Investigating Joint CTC-Attention Models for End-to-End Russian Speech Recognition


Abstract

We propose an application of attention-based models for automatic recognition of continuous Russian speech. We experimented with three types of attention mechanism, data augmentation based on tempo and pitch perturbations, and a beam search pruning method. Moreover, we propose using the sparsemax function as the probability distribution generator for the attention mechanism in our task. We experimented with joint CTC-Attention encoder-decoders that use deep convolutional networks to compress input features or waveform spectrograms, and we also experimented with a Highway LSTM model as an encoder. We performed experiments on a small Russian speech dataset with a total duration of more than 60 h. The proposed methods improved recognition accuracy, and the beam search optimization method gave better performance in terms of speech decoding speed.
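To make two of the ideas in the abstract concrete, below is a minimal PyTorch-style sketch of (a) a sparsemax normalizer that can replace softmax inside an attention mechanism and (b) a joint CTC-attention training objective that linearly interpolates the two losses. The tensor shapes, the mixing weight ctc_weight, and the function names are illustrative assumptions; the abstract does not describe the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016) over the last dimension:
    Euclidean projection onto the probability simplex, which yields
    attention weights that can be exactly zero (unlike softmax)."""
    z, _ = torch.sort(scores, dim=-1, descending=True)
    cum = z.cumsum(dim=-1) - 1.0
    k = torch.arange(1, scores.size(-1) + 1,
                     device=scores.device, dtype=scores.dtype)
    support = (k * z > cum).to(scores.dtype)      # 1 + k*z_k > sum_{j<=k} z_j
    k_z = support.sum(dim=-1, keepdim=True)       # size of the support set
    tau = cum.gather(-1, k_z.long() - 1) / k_z    # threshold tau(z)
    return torch.clamp(scores - tau, min=0.0)

def joint_ctc_attention_loss(ctc_log_probs, enc_lengths,
                             dec_logits, ctc_targets, att_targets,
                             target_lengths, ctc_weight=0.3, blank_id=0):
    """Multi-task objective L = w * L_CTC + (1 - w) * L_att.
    ctc_log_probs: (T, B, V) log-softmax outputs of the encoder's CTC head.
    dec_logits:    (B, L, V) attention decoder outputs.
    ctc_targets:   (B, S) label indices, true lengths in target_lengths.
    att_targets:   (B, L) label indices, padded with -100 (ignored)."""
    l_ctc = F.ctc_loss(ctc_log_probs, ctc_targets, enc_lengths,
                       target_lengths, blank=blank_id, zero_infinity=True)
    l_att = F.cross_entropy(dec_logits.transpose(1, 2), att_targets,
                            ignore_index=-100)
    return ctc_weight * l_ctc + (1.0 - ctc_weight) * l_att
```

The interpolation weight trades the monotonic alignment prior of CTC against the flexibility of the attention decoder; the value used here is only a placeholder, not the setting reported in the paper.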