A Generic, Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator.

Abstract

This dissertation describes a scalable hardware accelerator for speech recognition. We propose a generic hardware architecture that can be used with multiple software packages that use HMM-based speech recognition. We implement a two-pass decoding algorithm with an approximate N-best, time-synchronous Viterbi beam search. The observation probability calculation (senone scoring) and the first decoding pass, which uses a simple language model, are implemented in hardware. The word lattice produced by this first pass is used by the software for the second pass, which applies a more sophisticated N-gram language model. This allows a very large, generic language model to be used with our hardware. We opt for a logic-on-memory approach, using high-bandwidth NOR Flash memory to improve random read performance for senone scoring and first-pass decoding, both of which are memory-intensive operations. For senone scoring, we store all of the acoustic model data in NOR Flash memory. For decoding, we partition data accesses among DRAM, SRAM, and NOR Flash, which allows these accesses to proceed in parallel and improves performance. We arrange our data structures so that DRAM is accessed entirely sequentially, improving memory access efficiency. We use techniques such as block scoring and caching of word and HMM models to reduce overall power consumption and further improve performance. Using a word lattice to communicate between hardware and software keeps the communication overhead low compared to other partitioning schemes. This architecture provides a 4.3X speedup over a 2.4 GHz Intel Core 2 Duo processor running the CMU Sphinx recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator improves speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
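For context on the senone-scoring step the abstract mentions: in HMM/GMM recognizers such as CMU Sphinx, an observation probability is the log-likelihood of a frame's feature vector under each senone's Gaussian mixture model. The sketch below is purely illustrative and assumes diagonal-covariance GMMs; the function name, array shapes, and precomputed terms are hypothetical and are not taken from the dissertation.

import numpy as np

def score_senones(frame, means, inv_vars, log_norms, log_mix_weights):
    """Illustrative senone scoring for a single feature frame.

    frame:           (D,) acoustic feature vector
    means:           (S, M, D) Gaussian means, per senone and mixture component
    inv_vars:        (S, M, D) inverse diagonal variances
    log_norms:       (S, M) precomputed 0.5*(D*log(2*pi) + sum(log var)) terms
    log_mix_weights: (S, M) log mixture weights
    Returns an (S,) vector of per-senone log-likelihoods.
    """
    diff = frame - means                                               # broadcasts to (S, M, D)
    log_comp = -0.5 * np.sum(diff * diff * inv_vars, axis=-1) - log_norms   # per-component log density
    weighted = log_mix_weights + log_comp                              # (S, M)
    m = weighted.max(axis=-1, keepdims=True)                           # log-sum-exp over mixtures
    return np.squeeze(m, -1) + np.log(np.exp(weighted - m).sum(axis=-1))

Per the abstract, the accelerator performs this step in hardware with the acoustic model data held in NOR Flash, since the per-frame, per-senone reads make it a memory-intensive operation.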
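The first decoding pass is described as an approximate N-best, time-synchronous Viterbi beam search that uses a simple language model and emits a word lattice for software rescoring with a full N-gram model. The minimal token-passing sketch below shows the general shape of such a beam search; the Token structure, transition-table format, and beam threshold are assumptions for illustration, not the hardware design from the dissertation.

from dataclasses import dataclass

@dataclass
class Token:
    state: int          # current HMM state id
    score: float        # accumulated log score
    words: tuple = ()   # word ids emitted so far (stand-in for lattice back-pointers)

def first_pass(frames, transitions, score_senones, beam_width=200.0):
    """One time-synchronous Viterbi pass with beam pruning (illustrative).

    frames:        iterable of feature vectors
    transitions:   dict state -> list of (next_state, log_prob, word_or_None)
    score_senones: callable(frame) -> dict state -> acoustic log score
    Returns surviving tokens, from which a word lattice would be assembled.
    """
    active = {0: Token(state=0, score=0.0)}             # start token in state 0
    for frame in frames:
        acoustic = score_senones(frame)                 # the hardware senone-scoring step
        nxt = {}
        for tok in active.values():
            for state, log_p, word in transitions.get(tok.state, []):
                score = tok.score + log_p + acoustic.get(state, float("-inf"))
                words = tok.words + (word,) if word is not None else tok.words
                best = nxt.get(state)
                if best is None or score > best.score:  # Viterbi: keep best path per state
                    nxt[state] = Token(state, score, words)
        if nxt:                                         # beam pruning around the best score
            top = max(t.score for t in nxt.values())
            nxt = {s: t for s, t in nxt.items() if t.score >= top - beam_width}
        active = nxt
    return list(active.values())

In the scheme the abstract describes, only the resulting word lattice crosses the hardware/software boundary, which is what keeps the communication overhead low.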

Bibliographic details

  • Author: Bapat, Ojas Ashok
  • Author affiliation: North Carolina State University
  • Degree grantor: North Carolina State University
  • Subject: Computer Engineering; Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pagination: 117 p.
  • Total pages: 117
  • Format: PDF
  • Language: English (eng)
