A Review of On-Device Fully Neural End-to-End Automatic Speech Recognition Algorithms

Abstract

In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on. To obtain sufficiently high speech recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. Hence, the corresponding WFST becomes enormous, which prohibits on-device implementation. Recently, fully neural network end-to-end speech recognition algorithms have been proposed. Examples include speech recognition systems based on Connectionist Temporal Classification (CTC), Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder models (AED), Monotonic Chunk-wise Attention (MoChA), transformer-based speech recognition systems, and so on. These fully neural network-based systems require much smaller memory footprints than conventional algorithms, so their on-device implementation has become feasible. In this paper, we review such end-to-end speech recognition models. We extensively discuss their structures, performance, and advantages compared to conventional algorithms.
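
As an illustration of one of the model families listed above, CTC, the sketch below shows greedy (best-path) decoding: take the highest-scoring label in each frame, merge consecutive repeats, then drop the blank symbol. It is a toy example and is not taken from the paper; the function name `ctc_greedy_decode`, the blank index 0, and the three-symbol vocabulary are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Pick the best label per frame, merge repeats, then drop blanks."""
    # Best-path step: index of the highest score in each frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    merged = [label for label, _ in groupby(best_path)]
    # Remove the blank symbol from the merged sequence.
    return [label for label in merged if label != blank]


# Toy scores over a hypothetical vocabulary: 0 = blank, 1 = "a", 2 = "b".
scores = [
    [0.10, 0.80, 0.10],  # "a"
    [0.20, 0.70, 0.10],  # "a" again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # "a" (kept, because a blank separates it)
    [0.10, 0.10, 0.80],  # "b"
]
decoded = ctc_greedy_decode(scores)
print(decoded)  # [1, 1, 2]
```

On the toy scores the best path is 1, 1, 0, 1, 2, which collapses to [1, 1, 2]; the blank in the third frame is what lets the repeated label 1 survive merging.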

Bibliographic details

Source
Asilomar Conference on Signals, Systems, and Computers, 2020, pp. 277-283 (7 pages)
Authors
Chanwoo Kim; Dhananjaya Gowda; Dongsoo Lee; Jiyeon Kim; Ankur Kumar; Sungsoo Kim; Abhinav Garg; Changwoo Han
Original format
PDF
Keywords
Transducers; Recurrent neural networks; Quantization (signal); Program processors; Computational modeling; Speech recognition; Classification algorithms
