Asilomar Conference on Signals, Systems, and Computers

A Review of On-Device Fully Neural End-to-End Automatic Speech Recognition Algorithms


Abstract

In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components, such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, and a decoder based on a Weighted Finite State Transducer (WFST). To obtain sufficiently high recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. The corresponding WFST therefore becomes enormous, which prohibits on-device implementation. Recently, fully neural end-to-end speech recognition algorithms have been proposed. Examples include systems based on Connectionist Temporal Classification (CTC), the Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder (AED) models, Monotonic Chunk-wise Attention (MoChA), and transformers. These fully neural systems require much smaller memory footprints than conventional algorithms, so their on-device implementation has become feasible. In this paper, we review such end-to-end speech recognition models and extensively discuss their structures, performance, and advantages compared to conventional algorithms.
