Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine

Abstract

Effectively exploiting the resources available on modern multicore and manycore processors for tasks such as large vocabulary continuous speech recognition (LVCSR) is far from trivial. While prior works have demonstrated the effectiveness of manycore graphics processing units (GPUs) for high-throughput, limited-vocabulary speech recognition, they are unsuitable for recognition with large acoustic and language models due to the limited 1-6 GB of memory on GPUs. To overcome this limitation, we introduce a novel architecture for WFST-based LVCSR that jointly leverages manycore GPUs and multicore processors (CPUs) to efficiently perform recognition even when large acoustic and language models are applied. In the proposed approach, recognition is performed on the GPU using an H-level WFST, composed using a unigram language model. During decoding, partial hypotheses generated over this network are rescored on-the-fly using a large language model, which resides on the CPU. By maintaining N-best hypotheses during decoding, our proposed architecture obtains comparable accuracy to a standard CPU-based WFST decoder while improving decoding speed by a factor of 11×.
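
The rescoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it runs entirely on the CPU, the toy bigram model stands in for the large language model, and all names and probabilities (`UNIGRAM`, `BIGRAM`, `extend_and_rescore`, `n_best`) are hypothetical. It shows only the general idea that a word scored with a unigram weight in the decoding network has that weight swapped for a context-dependent score, and why keeping more than one hypothesis before rescoring matters.

```python
import heapq
import math
from typing import NamedTuple


class Hypothesis(NamedTuple):
    words: tuple   # word history so far
    score: float   # accumulated log score (acoustic + language model)


# Toy models with made-up probabilities, for illustration only.
UNIGRAM = {"the": math.log(0.5), "cat": math.log(0.25), "hat": math.log(0.25)}
BIGRAM = {("the", "cat"): math.log(0.6), ("the", "hat"): math.log(0.1)}


def large_lm_logp(history, word):
    """Stand-in for the large model: bigram lookup with unigram fallback."""
    prev = history[-1] if history else None
    return BIGRAM.get((prev, word), UNIGRAM[word])


def extend_and_rescore(hyp, word, acoustic_logp):
    """Extend hyp by one word and replace its unigram score on the fly."""
    # Score as the unigram-based decoding network would assign it.
    first_pass = hyp.score + acoustic_logp + UNIGRAM[word]
    # Correction: add the large-model score, subtract the unigram score.
    correction = large_lm_logp(hyp.words, word) - UNIGRAM[word]
    return Hypothesis(hyp.words + (word,), first_pass + correction)


def n_best(hyps, n):
    """Keep the n highest-scoring hypotheses."""
    return heapq.nlargest(n, hyps, key=lambda h: h.score)


start = Hypothesis(("the",), 0.0)
candidates = [
    extend_and_rescore(start, "cat", acoustic_logp=-5.0),
    extend_and_rescore(start, "hat", acoustic_logp=-4.5),
]
best = n_best(candidates, 1)[0]
print(best.words)
```

In this toy case both words have the same unigram score, so the first pass alone would prefer "hat" on acoustic score. After the correction is applied, "cat" ranks first, which is only possible because both candidates survived until rescoring.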

Bibliographic details

Source
Annual Conference of the International Speech Communication Association | 2012 | pp. 1034-1037 | 4 pages
Authors
Jungsuk Kim; Jike Chong; Ian Lane
Original format: PDF
Keywords
Large Vocabulary Continuous Speech Recognition; WFST; On-The-Fly Rescoring; Graphics Processing Units;

Similar documents

1. Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition [J]. Hori T., Hori C., Minami Y. IEEE Transactions on Audio, Speech, and Language Processing. 2007, no. 4
2. A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition [J]. Takaaki Hori, Naoki Oka, Masaharu Katoh. 情報処理学会論文誌. 1999, no. 4
3. A fast and memory-efficient N-gram language model lookup method for large vocabulary continuous speech recognition [J]. Xiaolong Li, Yunxin Zhao. Computer Speech and Language. 2007, no. 1
4. Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine [C]. Jungsuk Kim, Jike Chong, Ian Lane. INTERSPEECH 2012. 2012
5. An Error Detection and Correction Framework to Improve Large Vocabulary Continuous Speech Recognition [D]. Zhou, Zhengyu. 2009
6. A systematic comparison of contemporary automatic speech recognition engines for conversational clinical speech [O]. Jodi Kodish-Wachs, Emin Agassi, Patrick Kenny III, 2018
7. Hierarchical Hybrid Language models for Open Vocabulary Continuous Speech Recognition using WFST [O]. Basha Shaik Mahaboob Ali, Rybach David, Hahn Stefan, 2012
