IEEE Automatic Speech Recognition and Understanding Workshop

Attention Based On-Device Streaming Speech Recognition with Large Speech Corpus



Abstract

In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pretraining, and data augmentation methods. In addition, we compressed our models to more than 3.4 times smaller using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization, bringing the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models and achieved a relative improvement of 36% on average in word error rate (WER) for target domains, including the general domain.
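The abstract does not spell out the paper's iterative hyper-LRA procedure, but the core idea of low-rank approximation can be sketched with a plain truncated SVD: a dense weight matrix W is replaced by the product of two thin factors, trading a small reconstruction error for a large reduction in stored parameters. The matrix size and rank below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical 1024x1024 dense weight matrix standing in for one layer
# of an on-device ASR model (values are random, for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)

def low_rank_approx(W, rank):
    """Factor W into two thin matrices via truncated SVD.

    Storing U_r (m x r) and V_r (r x n) instead of W (m x n) reduces
    the parameter count by roughly m*n / (r*(m + n)).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

U_r, V_r = low_rank_approx(W, rank=128)
W_approx = U_r @ V_r
compression = W.size / (U_r.size + V_r.size)   # 4x for rank 128 here
```

An iterative scheme such as the paper's would alternate factorization with fine-tuning so that accuracy loss stays small at each compression step; this one-shot sketch shows only the storage arithmetic behind a > 3x reduction.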
