Journal: ACM Transactions on Parallel Computing

Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative Speculation

Abstract

The finite-state machine (FSM) is the key kernel behind many popular applications, including regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is extremely difficult because of their strong dependencies and unpredictable memory accesses. Previous efforts have largely focused on multi-core parallelization and used different approaches, including speculative and enumerative execution, both of which have been effective but also have limitations. With the increasing width and improving flexibility of SIMD instruction sets, this article focuses on combining SIMD and many/multi-core parallelism for FSMs. We have developed a novel strategy called enumerative speculation. Instead of speculating on a single state, as in speculative execution, or enumerating all possible states, as in enumerative execution, our strategy speculates transitions from several possible states, reducing both the prediction overheads of the speculative approach and the large amount of redundant work in the enumerative approach. A simple lookback approach produces a set of guessed states to achieve high speculation success rates in our enumerative speculation. In addition, to enable continued scalability of enumerative speculation with a large number of threads, we have developed a parallel merge method. We evaluate our method with four popular FSM applications: Huffman decoding, regular expression matching, HTML tokenization, and Div7. We obtain up to 2.5× speedup using SIMD on 1 core and up to 95× when combining SIMD with 60 cores of an Intel Xeon Phi. On a single core, we outperform the best single-state speculative execution version by an average of 1.6×, and when combining SIMD and many-core parallelism, we outperform enumerative execution by an average of 2×. Finally, when evaluated on a GPU, our parallel merge implementations are 2.02-6.74× more efficient than the corresponding sequential merge implementations and achieve better scalability on an Nvidia V100 GPU.
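The idea of enumerative speculation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is plain sequential Python with no SIMD or threading, it uses a simple sequential merge rather than the parallel merge method the article develops, and every name in it (`run_fsm`, `lookback_guesses`, `enumerative_speculation`, and the parameters `lookback` and `max_guesses`) is hypothetical. It assumes the FSM is given as a transition table indexed as `table[state][symbol]`, and it uses a Div7 machine (remainder of a binary number modulo 7) as the example input.

```python
from typing import Dict, List, Sequence


def run_fsm(table: Sequence[Sequence[int]], state: int, symbols: Sequence[int]) -> int:
    """Run the FSM sequentially from `state` over `symbols`."""
    for sym in symbols:
        state = table[state][sym]
    return state


def lookback_guesses(table, symbols, start, lookback, max_guesses) -> List[int]:
    """Guess start states for the chunk beginning at `start`.

    Every state is run over the `lookback` symbols preceding the chunk;
    the distinct end states are the candidates, truncated to `max_guesses`.
    """
    window = symbols[max(0, start - lookback):start]
    ends = {run_fsm(table, s, window) for s in range(len(table))}
    return sorted(ends)[:max_guesses]


def enumerative_speculation(table, initial_state, symbols, num_chunks,
                            lookback=4, max_guesses=4) -> int:
    n = len(symbols)
    bounds = [n * i // num_chunks for i in range(num_chunks + 1)]

    # Phase 1: each chunk is independent, so this loop could run in parallel.
    # Each chunk maps a few guessed start states to their end states.
    partial: List[Dict[int, int]] = []
    for i in range(num_chunks):
        chunk = symbols[bounds[i]:bounds[i + 1]]
        if i == 0:
            guesses = [initial_state]  # the first chunk's start state is known
        else:
            guesses = lookback_guesses(table, symbols, bounds[i], lookback, max_guesses)
        partial.append({g: run_fsm(table, g, chunk) for g in guesses})

    # Phase 2: sequential merge. A correct guess is a table lookup;
    # a misprediction falls back to re-executing that chunk.
    state = initial_state
    for i in range(num_chunks):
        if state in partial[i]:
            state = partial[i][state]
        else:
            state = run_fsm(table, state, symbols[bounds[i]:bounds[i + 1]])
    return state


# Div7: state is the remainder so far; reading bit b moves s to (2*s + b) % 7.
DIV7 = [[(2 * s + b) % 7 for b in (0, 1)] for s in range(7)]
bits = [int(c) for c in bin(1000003)[2:]]
result = enumerative_speculation(DIV7, 0, bits, num_chunks=4)
```

In this sketch the number of guessed states per chunk sits between the two extremes the abstract contrasts: `max_guesses=1` corresponds to single-state speculation, and `max_guesses=len(table)` corresponds to full enumeration. The fallback re-execution keeps the result correct regardless of how good the guesses are.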