International Symposium on Microarchitecture

UNFOLD: A Memory-Efficient Speech Recognizer Using On-The-Fly WFST Composition



Abstract

Accurate, real-time Automatic Speech Recognition (ASR) requires huge memory storage and computational power. The main bottleneck in state-of-the-art ASR systems is the Viterbi search on a Weighted Finite State Transducer (WFST). The WFST is a graph-based model created by composing an Acoustic Model (AM) and a Language Model (LM) offline. Offline composition simplifies the implementation of a speech recognizer, as only one WFST has to be searched. However, the size of the composed WFST is huge, typically larger than a Gigabyte, resulting in a large memory footprint and high memory bandwidth requirements. In this paper, we take a completely different approach and propose a hardware accelerator for speech recognition that composes the AM and LM graphs on-the-fly. In our ASR system, the fully-composed WFST is never generated in main memory. Instead, only the subset required for decoding each input speech fragment is dynamically generated from the AM and LM models. In addition to the direct benefits of this on-the-fly composition, the resulting approach is more amenable to further reduction in storage requirements through compression techniques. The resulting accelerator, called UNFOLD, performs the decoding in real-time using the compressed AM and LM models, and reduces the size of the datasets from more than one Gigabyte to less than 40 Megabytes, which can be very important in small form factor mobile and wearable devices. Besides, UNFOLD improves energy-efficiency by orders of magnitude with respect to CPUs and GPUs. Compared to state-of-the-art Viterbi search accelerators, the proposed ASR system provides a 31x reduction in memory footprint and 28% energy savings on average.
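The key idea in the abstract is that composed states can be generated lazily as (AM state, LM state) pairs during decoding, so the full composed WFST never exists in memory. The following is a minimal sketch of that idea with tiny hypothetical toy transducers (not the UNFOLD hardware design or its actual graph formats); the `AM` and `LM` tables, labels, and weights are illustrative only.

```python
# Lazy (on-the-fly) WFST composition sketch: a composed state is a pair
# (AM state, LM state), and its outgoing arcs are generated on demand
# rather than being precomputed in a fully-composed graph.

# Toy acoustic-model transducer:
#   state -> list of (input_label, output_word, weight, next_state)
AM = {
    0: [("a1", "A", 0.5, 1)],
    1: [("a2", "B", 0.3, 2)],
    2: [],
}

# Toy language-model transducer:
#   state -> {input_word: (weight, next_state)}
LM = {
    0: {"A": (0.2, 1)},
    1: {"B": (0.4, 2)},
    2: {},
}

def expand(state):
    """Lazily yield arcs leaving a composed (am, lm) state.

    An arc exists only where an AM arc's output word matches an arc
    accepted by the current LM state; weights are combined (added,
    as in the tropical semiring over negative log-probabilities).
    """
    am_s, lm_s = state
    for in_lab, word, am_w, am_next in AM[am_s]:
        if word in LM[lm_s]:
            lm_w, lm_next = LM[lm_s][word]
            yield in_lab, word, am_w + lm_w, (am_next, lm_next)

def decode(start=(0, 0)):
    """Follow the single path in this toy composition, accumulating
    the output words and the total path weight."""
    words, total, state = [], 0.0, start
    while True:
        arcs = list(expand(state))
        if not arcs:
            return words, total
        _, word, w, state = arcs[0]
        words.append(word)
        total += w
```

Note that only the composed states actually reached during decoding ever exist, which is the memory saving the paper exploits; a real decoder would run a beam-pruned Viterbi search over `expand` rather than follow a single path.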
