Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative Speculation

PENG JIANG; YANG XIA; GAGAN AGRAWAL

Abstract

The finite-state machine (FSM) is the key kernel behind many popular applications, including regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is extremely difficult because of their strong dependencies and unpredictable memory accesses. Previous efforts have largely focused on multi-core parallelization and have used different approaches, including speculative and enumerative execution, both of which have been effective but also have limitations. With the increasing width and improving flexibility of SIMD instruction sets, this article focuses on combining SIMD and many/multi-core parallelism for FSMs. We have developed a novel strategy, called enumerative speculation. Instead of speculating on a single state, as in speculative execution, or enumerating all possible states, as in enumerative execution, our strategy speculates transitions from several possible states, reducing both the prediction overheads of the speculative approach and the large amount of redundant work in the enumerative approach. A simple lookback approach produces a set of guessed states to achieve high speculation success rates in our enumerative speculation. In addition, to enable continued scalability of enumerative speculation with a large number of threads, we have developed a parallel merge method. We evaluate our method with four popular FSM applications: Huffman decoding, regular expression matching, HTML tokenization, and Div7. We obtain up to 2.5× speedup using SIMD on 1 core and up to 95× combining SIMD with 60 cores of an Intel Xeon Phi. On a single core, we outperform the best single-state speculative execution version by an average of 1.6×, and, when combining SIMD and many-core parallelism, we outperform enumerative execution by an average of 2×. Finally, when evaluated on a GPU, we show that our parallel merge implementations are 2.02-6.74× more efficient than the corresponding sequential merge implementations and achieve better scalability on an Nvidia V100 GPU.
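
The idea of enumerative speculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-state example table, and the parameters `lookback` and `k` are all assumptions made for illustration. Plain loops stand in for SIMD lanes (one lane per guessed state) and for threads (one per chunk), and the merge step is a simple sequential one rather than the parallel merge the article proposes.

```python
from collections import Counter

# Hypothetical 3-state FSM over the alphabet {a, b}; state 2 means "just saw ab".
TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}

def run(table, state, text):
    """Sequentially apply the transition table to every input symbol."""
    for ch in text:
        state = table[state][ch]
    return state

def guess_states(table, lookback_text, k):
    """Lookback: run every state over the preceding symbols and keep the
    k most common end states as the guessed start states for a chunk."""
    ends = Counter(run(table, s, lookback_text) for s in table)
    return [s for s, _ in ends.most_common(k)]

def enumerative_speculation(table, start, text, num_chunks, lookback, k):
    """Return (final_state, misspeculated_chunks)."""
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]

    # Phase 1 (independent per chunk, so it could run in parallel):
    # execute each chunk from its few guessed start states.
    results = []
    for idx, chunk in enumerate(chunks):
        begin = idx * size
        if idx == 0:
            guesses = [start]  # the true start state is known for chunk 0
        else:
            window = text[max(0, begin - lookback):begin]
            guesses = guess_states(table, window, k)
        results.append({g: run(table, g, chunk) for g in guesses})

    # Phase 2 (sequential merge in this sketch): chain the chunk results,
    # re-executing a chunk only when its true start state was not guessed.
    state, misses = start, 0
    for chunk, mapping in zip(chunks, results):
        if state in mapping:
            state = mapping[state]
        else:
            misses += 1
            state = run(table, state, chunk)
    return state, misses

if __name__ == "__main__":
    text = "abbaabab" * 5
    print(enumerative_speculation(TABLE, 0, text, num_chunks=4, lookback=2, k=2))
    print(run(TABLE, 0, text))  # sequential reference result
```

With `k = 1` the sketch degenerates to single-state speculation, and with `k` equal to the number of states it becomes enumerative execution; intermediate values trade redundant work against the cost of re-executing misspeculated chunks.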

Bibliographic details

Source
ACM Transactions on Parallel Computing | 2020, Issue 3 | pp. 15.1-15.26 | 26 pages
Authors
PENG JIANG; YANG XIA; GAGAN AGRAWAL
Affiliations

The University of Iowa;

The Ohio State University;

Augusta University;

Original format: PDF
Language: English
Keywords
Finite-state machine; break dependence; SIMD
