A Generic, Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator.

Abstract

This dissertation describes a scalable hardware accelerator for speech recognition. We propose a generic hardware architecture that can be used with multiple software packages that use HMM-based speech recognition. We implement a two-pass decoding algorithm with an approximate N-best, time-synchronous Viterbi beam search. The observation probability calculation (senone scoring) and the first decoding pass, which uses a simple language model, are implemented in hardware. The word lattice produced by this first pass is used by the software for the second pass, which applies a more sophisticated N-gram language model. This allows a very large, generic language model to be used with our hardware. We opt for a logic-on-memory approach, using high-bandwidth NOR Flash memory to improve random read performance for senone scoring and first-pass decoding, both of which are memory-intensive operations. For senone scoring, we store all of the acoustic model data in NOR Flash memory. For decoding, we partition data accesses among DRAM, SRAM, and NOR Flash, which allows these accesses to proceed in parallel and improves performance. We arrange our data structures so that DRAM is accessed entirely sequentially, improving memory access efficiency. We use techniques such as block scoring and caching of word and HMM models to reduce overall power consumption and further improve performance. Using a word lattice to communicate between hardware and software keeps the communication overhead low compared to other partitioning schemes. This architecture provides a 4.3X speedup over a 2.4 GHz Intel Core 2 Duo processor running the CMU Sphinx recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator improves speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
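For context on the senone-scoring step the abstract mentions: in HMM/GMM recognizers such as CMU Sphinx, an observation probability is the log-likelihood of a frame's feature vector under each senone's Gaussian mixture model. The sketch below is purely illustrative and assumes diagonal-covariance GMMs; the function name, array shapes, and precomputed terms are hypothetical and are not taken from the dissertation.

import numpy as np

def score_senones(frame, means, inv_vars, log_norms, log_mix_weights):
    """Illustrative senone scoring for a single feature frame.

    frame:           (D,) acoustic feature vector
    means:           (S, M, D) Gaussian means, per senone and mixture component
    inv_vars:        (S, M, D) inverse diagonal variances
    log_norms:       (S, M) precomputed 0.5*(D*log(2*pi) + sum(log var)) terms
    log_mix_weights: (S, M) log mixture weights
    Returns an (S,) vector of per-senone log-likelihoods.
    """
    diff = frame - means                                               # broadcasts to (S, M, D)
    log_comp = -0.5 * np.sum(diff * diff * inv_vars, axis=-1) - log_norms   # per-component log density
    weighted = log_mix_weights + log_comp                              # (S, M)
    m = weighted.max(axis=-1, keepdims=True)                           # log-sum-exp over mixtures
    return np.squeeze(m, -1) + np.log(np.exp(weighted - m).sum(axis=-1))

Per the abstract, the accelerator performs this step in hardware with the acoustic model data held in NOR Flash, since the per-frame, per-senone reads make it a memory-intensive operation.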
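The first decoding pass is described as an approximate N-best, time-synchronous Viterbi beam search that uses a simple language model and emits a word lattice for software rescoring with a full N-gram model. The minimal token-passing sketch below shows the general shape of such a beam search; the Token structure, transition-table format, and beam threshold are assumptions for illustration, not the hardware design from the dissertation.

from dataclasses import dataclass

@dataclass
class Token:
    state: int          # current HMM state id
    score: float        # accumulated log score
    words: tuple = ()   # word ids emitted so far (stand-in for lattice back-pointers)

def first_pass(frames, transitions, score_senones, beam_width=200.0):
    """One time-synchronous Viterbi pass with beam pruning (illustrative).

    frames:        iterable of feature vectors
    transitions:   dict state -> list of (next_state, log_prob, word_or_None)
    score_senones: callable(frame) -> dict state -> acoustic log score
    Returns surviving tokens, from which a word lattice would be assembled.
    """
    active = {0: Token(state=0, score=0.0)}             # start token in state 0
    for frame in frames:
        acoustic = score_senones(frame)                 # the hardware senone-scoring step
        nxt = {}
        for tok in active.values():
            for state, log_p, word in transitions.get(tok.state, []):
                score = tok.score + log_p + acoustic.get(state, float("-inf"))
                words = tok.words + (word,) if word is not None else tok.words
                best = nxt.get(state)
                if best is None or score > best.score:  # Viterbi: keep best path per state
                    nxt[state] = Token(state, score, words)
        if nxt:                                         # beam pruning around the best score
            top = max(t.score for t in nxt.values())
            nxt = {s: t for s, t in nxt.items() if t.score >= top - beam_width}
        active = nxt
    return list(active.values())

In the scheme the abstract describes, only the resulting word lattice crosses the hardware/software boundary, which is what keeps the communication overhead low.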

Bibliographic details

  • Author: Bapat, Ojas Ashok
  • Author affiliation: North Carolina State University
  • Degree grantor: North Carolina State University
  • Subject: Computer Engineering; Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pagination: 117 p.
  • Total pages: 117
  • Format: PDF
  • Language: English (eng)
