End-to-end speech recognition for languages with ideographic characters

Abstract

This paper describes a novel training method for acoustic models using connectionist temporal classification (CTC) for Japanese end-to-end automatic speech recognition (ASR). End-to-end ASR can estimate characters directly without a pronunciation dictionary; however, this approach has been studied mostly for English. Languages such as Japanese make robust acoustic modeling difficult. One issue is the large number of characters, including Japanese kanji, which increases the number of model parameters. In addition, kanji often have multiple pronunciations, which increases the variance of the acoustic features associated with each character. To address these problems, we propose end-to-end ASR based on bi-directional long short-term memory (BLSTM) networks. Our proposal combines two approaches: reducing the number of dimensions of the BLSTM and adding character strings to the output layer labels. Dimensional compression decreases the number of parameters, while output label expansion reduces the variance of acoustic features. Consequently, we obtain a robust model with a small number of parameters. Experimental results on Japanese broadcast programs show that combining these two approaches significantly improved the word error rate compared with the conventional character-based end-to-end approach.
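
As a rough illustration of the two ideas in the abstract, the sketch below counts the parameters of an output layer with and without a linear bottleneck, and applies the standard CTC collapse rule to a frame sequence whose label set includes a multi-character string. All names and sizes here are hypothetical (a 640-dimensional BLSTM output, 3,000 output labels, a bottleneck of 128, the label "今日"); the abstract does not give the actual dimensions or say how the compression is implemented, so the bottleneck is only one possible reading, not the authors' method.

```python
def dense_params(hidden_dim, num_labels):
    """Weights plus biases of a full projection from BLSTM output to labels."""
    return hidden_dim * num_labels + num_labels


def bottleneck_params(hidden_dim, rank, num_labels):
    """Same projection factored through a linear bottleneck of size `rank`.

    Assumption: no bias on the bottleneck itself, one bias per output label.
    """
    return hidden_dim * rank + rank * num_labels + num_labels


def ctc_collapse(frame_labels, blank="<b>"):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks.

    Labels may be single characters or whole character strings; the
    collapsed labels are simply concatenated.
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the arithmetic concrete.
    print(dense_params(640, 3000))
    print(bottleneck_params(640, 128, 3000))
    # "今日" is treated as one output label instead of two kanji labels.
    print(ctc_collapse(["<b>", "今日", "今日", "<b>", "は", "は", "<b>"]))
```

With these hypothetical sizes the full projection has 1,923,000 parameters and the factored one 468,920, which shows why a large character inventory makes the output layer a natural target for compression. In the collapse example, emitting "今日" as a single label means the model does not have to assign a separate acoustic span to each kanji.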


Bibliographic details

Source: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017, pp. 1228-1232 (5 pages)
Authors: Hitoshi Ito; Aiko Hagiwara; Manon Ichiki; Takeshi Mishima; Shoei Sato; Akio Kobayashi
Original format: PDF
Keywords: Acoustics; Hidden Markov models; Training; Matrix decomposition; Mathematical model; Robustness; Bidirectional control
Date indexed: 2022-08-26 15:24:56

