IEEE International Conference on Multimedia and Expo

Scalable HMM based Inference Engine in Large Vocabulary Continuous Speech Recognition

Abstract

Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this paper we explore a design space for application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. The challenge is not only to define a software architecture that exposes sufficient fine-grained application concurrency, but also to efficiently synchronize between an increasing number of concurrent tasks and to effectively exploit the parallelism opportunities in today's highly parallel processors. We propose four application-level implementation alternatives that we call "algorithm styles", and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and an NVIDIA GTX280 manycore processor. The highest-performing algorithm style varies with the implementation platform. On a 44-minute speech data set, we demonstrate substantial speedups of 3.4× on the Core i7 and 10.5× on the GTX280 over a highly optimized sequential implementation on the Core i7, without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.
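
The core of such an inference engine is a Viterbi-style beam search that, for every frame of audio, expands the outgoing arcs of all currently active states in the recognition network. The sketch below is a minimal, hypothetical illustration of that per-frame step, not the authors' implementation: the CSR-style arc storage, the per-label acoustic costs, the beam-pruning threshold, and the atomic-min cost update are assumptions chosen to show where the fine-grained concurrency and inter-task synchronization described in the abstract arise.

```cpp
// Hypothetical sketch (not the authors' code) of one frame of Viterbi
// beam-search expansion over an irregular recognition network.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <vector>

struct Arc {                  // one transition in the knowledge network
    uint32_t dst;             // destination state
    float    weight;          // transition cost (negative log probability)
    int32_t  label;           // acoustic label for this arc; -1 = epsilon
};

struct Graph {                // CSR-style adjacency: state s owns arcs
    std::vector<uint32_t> arc_offset;   // in [arc_offset[s], arc_offset[s+1])
    std::vector<Arc>      arcs;
};

constexpr float kInf = std::numeric_limits<float>::infinity();

// Expand all arcs leaving the currently active states for one audio frame.
// Each source state's work is independent, so the outer loop could be
// distributed across CPU threads or GPU thread blocks; the atomic "min"
// marks where concurrent tasks reaching the same destination synchronize.
void expand_frame(const Graph& g,
                  const std::vector<uint32_t>& active,
                  const std::vector<float>& cur_cost,
                  const std::vector<float>& acoustic_cost,  // per label, this frame
                  std::vector<std::atomic<float>>& next_cost,
                  float beam) {
    // Beam-pruning threshold relative to the best active hypothesis.
    float best = kInf;
    for (uint32_t s : active) best = std::min(best, cur_cost[s]);
    const float threshold = best + beam;

    for (uint32_t s : active) {            // sequential here; parallel in practice
        const float base = cur_cost[s];
        if (base > threshold) continue;    // prune unlikely hypotheses
        for (uint32_t a = g.arc_offset[s]; a < g.arc_offset[s + 1]; ++a) {
            const Arc& arc = g.arcs[a];
            const float obs  = (arc.label >= 0) ? acoustic_cost[arc.label] : 0.0f;
            const float cand = base + arc.weight + obs;
            // Atomic min: keep only the cheapest path into arc.dst even if
            // several source states update it concurrently.
            float old = next_cost[arc.dst].load(std::memory_order_relaxed);
            while (cand < old &&
                   !next_cost[arc.dst].compare_exchange_weak(old, cand)) {
            }
        }
    }
}

int main() {
    // Toy 3-state graph: 0 -> 1 -> 2, plus a costlier direct arc 0 -> 2.
    Graph g;
    g.arcs       = {{1, 0.5f, 0}, {2, 3.0f, 1}, {2, 0.5f, 1}};
    g.arc_offset = {0, 2, 3, 3};

    std::vector<uint32_t> active   = {0};
    std::vector<float>    cur_cost = {0.0f, kInf, kInf};
    std::vector<float>    acoustic = {0.2f, 0.4f};   // hypothetical per-label costs
    std::vector<std::atomic<float>> next_cost(3);
    for (auto& c : next_cost) c.store(kInf);

    expand_frame(g, active, cur_cost, acoustic, next_cost, /*beam=*/10.0f);
    for (std::size_t s = 0; s < next_cost.size(); ++s)
        std::printf("state %zu: cost %.2f\n", s, next_cost[s].load());
    return 0;
}
```

In a parallel version of this step, the outer loop over active states would be mapped to CPU threads or GPU thread blocks, and the atomic minimum on next_cost is the point where concurrent tasks writing to a shared destination state must coordinate.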
