首页> 外文期刊>International journal of parallel programming >A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments
【24h】

A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments

机译:针对多核,SIMD和云计算环境的推测性并行DFA成员资格测试

获取原文
获取原文并翻译 | 示例
           

摘要

We present techniques to parallelize membership tests for Deterministic Finite Automata (DFAs). Our method searches arbitrary regular expressions by matching multiple bytes in parallel using speculation. We partition the input string into chunks, match chunks in parallel, and combine the matching results. Our parallel matching algorithm exploits structural DFA properties to minimize the speculative overhead. Unlike previous approaches, our speculation is failure-free, i.e., (1) sequential semantics are maintained, and (2) speed-downs are avoided altogether. On architectures with a SIMD gather-operation for indexed memory loads, our matching operation is fully vectorized. The proposed load-balancing scheme uses an off-line profiling step to determine the matching capacity of each participating processor. Based on matching capacities, DFA matches are load-balanced on inhomogeneous parallel architectures such as cloud computing environments. We evaluated our speculative DFA membership test for a representative set of benchmarks from the Perl-compatible Regular Expression (PCRE) library and the PROSITE protein database. Evaluation was conducted on a 4 CPU (40 cores) shared-memory node of the Intel Academic Program Manycore Testing Lab (Intel MTL), on the Intel AVX2 SDE simulator for 8-way fully vectorized SIMD execution, and on a 20-node (288 cores) cluster on the Amazon EC2 computing cloud. Obtained speedups are on the order of O (1 + |P|-1/|Q|·γ), where | P | denotes the number of processors or SIMD units, | Q | denotes the number of DFA states, and 0 < γ ≤ 1 represents a statically computed DFA property. For all observed cases, we found that 0.02 < γ < 0.47. Actual speedups range from 2.3 × to 38.8× for up to 512 DFA states for PCRE, and between 1.3× and 19.9 × for up to 1,288 DFA states for PROSITE on a 40-core MTL node. Speedups on the EC2 computing cloud range from 5.0× to 65.8× for PCRE, and from 5.0× to 138.5× for PROSITE. Speedups of our C-based DFA matcher over the Perl-based ScanProsite scan tool range from 559.3 × to 15079.7 × on a 40-core MTL node. We show the scalability of our approach for input-sizes of up to 10 GB.
机译:我们提出了并行化确定性有限自动机(DFA)成员资格测试的技术。我们的方法通过使用推测并行匹配多个字节来搜索任意正则表达式。我们将输入字符串划分为多个块,并行匹配块,然后组合匹配结果。我们的并行匹配算法利用DFA的结构属性来最大程度地减少投机开销。与以前的方法不同,我们的猜测是无故障的,即,(1)保留了顺序语义,并且(2)完全避免了速度下降。在具有用于索引存储器负载的SIMD收集操作的体系结构上,我们的匹配操作已完全矢量化。所提出的负载平衡方案使用离线分析步骤来确定每个参与处理器的匹配能力。基于匹配能力,DFA匹配将在不均匀的并行体系结构(例如云计算环境)上进行负载平衡。我们评估了Perf兼容正则表达式(PCRE)库和PROSITE蛋白质数据库中一组代表性基准的推测性DFA成员资格测试。评估是在Intel学术计划Manycore测试实验室(Intel MTL)的4 CPU(40核)共享内存节点,用于8路完全矢量化SIMD执行的Intel AVX2 SDE模拟器以及20节点( 288个核心)群集在Amazon EC2计算云上。获得的加速比约为O(1 + | P | -1 / | Q |·γ),其中| P |表示处理器或SIMD单元的数量,|问|表示DFA状态数,0 <γ≤1表示静态计算的DFA属性。对于所有观察到的情况,我们发现0.02 <γ<0.47。对于40核MTL节点,对于PCRS最多512个DFA状态,实际加速范围为2.3×到38.8×,对于PROSITE最多1,288 DFA状态,实际加速范围为1.3×和19.9×。对于PCRE,EC2计算云的加速范围为5.0倍至65.8倍,对于PROSITE,加速范围为5.0倍至138.5倍。在40核MTL节点上,基于Perl的ScanProsite扫描工具的基于C的DFA匹配器的加速范围为559.3×到15079.7×。对于最大10 GB的输入大小,我们展示了我们方法的可扩展性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号