Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
End-to-end speech recognition for languages with ideographic characters

Abstract

This paper describes a novel training method for acoustic models based on connectionist temporal classification (CTC) for Japanese end-to-end automatic speech recognition (ASR). End-to-end ASR can estimate characters directly without a pronunciation dictionary; however, the approach has been studied mostly for English. For languages such as Japanese, robust acoustic modeling is difficult. One issue is the large number of characters, including Japanese kanji, which increases the number of model parameters. In addition, kanji often have multiple pronunciations, which increases the variance of the acoustic features associated with each character. To address these problems, we propose end-to-end ASR based on bi-directional long short-term memory (BLSTM) networks. The proposal combines two approaches: reducing the number of dimensions of the BLSTM and adding character strings to the output layer labels. Dimensional compression decreases the number of parameters, while output label expansion reduces the variance of acoustic features. As a result, we obtain a robust model with a small number of parameters. Experimental results on Japanese broadcast programs show that combining the two approaches significantly improved the word error rate compared with the conventional character-based end-to-end approach.
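The two ideas in the abstract can be illustrated with a rough sketch. The abstract gives no layer sizes, label inventory, or selection procedure, so everything below is an assumption for illustration: `blstm_ctc_param_count` and `segment_longest_match` are hypothetical helpers, the parameter formula is the textbook LSTM count with one bias per gate, and greedy longest-match segmentation is only one plausible way to map a transcript onto an expanded label set, not necessarily the authors' method.

```python
def blstm_ctc_param_count(input_dim, hidden, num_layers, num_labels):
    """Trainable parameters of a stacked BLSTM followed by a CTC output layer.

    Assumes one bias vector per gate; some toolkits store two.
    """
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        # One LSTM direction: 4 gates, each with input weights,
        # recurrent weights, and a bias.
        per_direction = 4 * hidden * (layer_input + hidden + 1)
        total += 2 * per_direction      # forward + backward directions
        layer_input = 2 * hidden        # next layer sees both directions
    # Output layer over all labels plus one CTC blank symbol.
    total += (layer_input + 1) * (num_labels + 1)
    return total


def segment_longest_match(text, label_strings):
    """Split text into output labels, preferring the longest listed string.

    Characters not covered by any listed string become single-character labels.
    """
    max_len = max((len(s) for s in label_strings), default=1)
    labels, i = [], 0
    while i < len(text):
        # Try candidate strings from longest down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in label_strings:
                labels.append(text[i:i + n])
                i += n
                break
        else:
            labels.append(text[i])      # fall back to a single character
            i += 1
    return labels


if __name__ == "__main__":
    # Hypothetical sizes: 40-dim features, 3000 character labels.
    print(blstm_ctc_param_count(40, 512, 1, 3000))
    print(blstm_ctc_param_count(40, 256, 1, 3000))
    # Treating the string 今日 as one label separates its reading
    # from the readings of 今 and 日 in other contexts.
    print(segment_longest_match("今日は晴れ", {"今日"}))
```

In this sketch, the output layer grows linearly with the number of labels and the recurrent layers grow roughly quadratically with the hidden size, which is why shrinking the BLSTM dimension can offset the cost of a larger label set.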
