IEEE Automatic Speech Recognition and Understanding Workshop

Attention Based On-Device Streaming Speech Recognition with Large Speech Corpus



Abstract

In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pretraining, and data augmentation methods. In addition, we compressed our models to more than 3.4 times smaller using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization, bringing the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models and achieved a relative improvement of 36% on average in word error rate (WER) for target domains, including the general domain.
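The abstract does not spell out the paper's iterative hyper-LRA procedure, but the core idea of low-rank approximation can be sketched with a plain truncated SVD: a dense weight matrix W is replaced by the product of two thin factors, trading a small reconstruction error for a large reduction in stored parameters. The matrix size and rank below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical 1024x1024 dense weight matrix standing in for one layer
# of an on-device ASR model (values are random, for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)

def low_rank_approx(W, rank):
    """Factor W into two thin matrices via truncated SVD.

    Storing U_r (m x r) and V_r (r x n) instead of W (m x n) reduces
    the parameter count by roughly m*n / (r*(m + n)).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

U_r, V_r = low_rank_approx(W, rank=128)
W_approx = U_r @ V_r
compression = W.size / (U_r.size + V_r.size)   # 4x for rank 128 here
```

An iterative scheme such as the paper's would alternate factorization with fine-tuning so that accuracy loss stays small at each compression step; this one-shot sketch shows only the storage arithmetic behind a > 3x reduction.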
