Robotomata: A framework for approximate pattern matching of big data on an automata processor

Abstract

Approximate pattern matching (APM) has been widely used in big data applications, e.g., genome data analysis, speech recognition, fraud detection, and computer vision. Although an automata-based approach is an efficient way to realize APM, the inherent sequentiality of automata deters its implementation on general-purpose parallel platforms, e.g., multicore CPUs and many-core GPUs. Recently, however, Micron has proposed its Automata Processor (AP), a processing-in-memory (PIM) architecture dedicated to non-deterministic finite automata (NFA) simulation. It has nominally achieved thousands-fold speedup over a multicore CPU for many big data applications. Alas, the AP ecosystem suffers from two major problems. First, the current APIs of AP require manual manipulation of all computational elements. Second, multiple rounds of time-consuming compilation are needed for large datasets. Both problems hinder programmer productivity and end-to-end performance. Therefore, we propose a paradigm-based approach to hierarchically generate automata on AP and use this approach to create Robotomata, a framework for APM on AP. By taking the types of APM paradigms, the desired pattern length, and the allowed number of errors as input, our framework can generate optimized APM-automata codes on AP, so as to improve programmer productivity. The generated codes can also maximize the reuse of pre-compiled macros and significantly reduce the time for reconfiguration. We evaluate Robotomata by comparing it to two state-of-the-art APM implementations on AP with real-world datasets. Our experimental results show that our generated codes can achieve up to 30.5x and 12.8x speedup with respect to configuration time while maintaining the computational performance. Compared to the counterparts on CPU, our codes achieve up to 393x overall speedup, even when including the reconfiguration costs. We highlight the importance of counting the configuration time towards the overall performance on AP, which would provide better insight into identifying essential hardware features, specifically for large-scale problem sizes.
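
To make the automata-based approach concrete, the following is a minimal software sketch of NFA simulation for APM, parameterized by the pattern and the allowed number of errors. It is not Robotomata's generated AP code and does not use any Micron API. It assumes edit distance (substitution, insertion, deletion) as the error model, which the abstract does not specify, and the names `approx_find` and `epsilon_closure` are hypothetical. Each state `(i, e)` means "the first `i` pattern characters are matched using `e` errors".

```python
def epsilon_closure(states, m, k):
    """Add states reachable by skipping pattern characters (deletions),
    which consume no input symbol."""
    closed = set(states)
    stack = list(states)
    while stack:
        i, e = stack.pop()
        if i < m and e < k and (i + 1, e + 1) not in closed:
            closed.add((i + 1, e + 1))
            stack.append((i + 1, e + 1))
    return closed


def approx_find(pattern, text, k):
    """Return the end positions in `text` at which `pattern` matches
    some substring with at most `k` edits."""
    m = len(pattern)
    active = epsilon_closure({(0, 0)}, m, k)
    hits = []
    for pos, ch in enumerate(text):
        nxt = {(0, 0)}  # a match may start at any text position
        for i, e in active:
            if i < m and pattern[i] == ch:
                nxt.add((i + 1, e))          # exact character match
            if e < k:
                nxt.add((i, e + 1))          # insertion: extra text character
                if i < m:
                    nxt.add((i + 1, e + 1))  # substitution
        active = epsilon_closure(nxt, m, k)
        if any(i == m for i, _ in active):   # some accepting state is active
            hits.append(pos)
    return hits


print(approx_find("abc", "xxabcxx", 0))
print(approx_find("abc", "xxabdxx", 1))
```

The loop must process one input symbol after another, which reflects the sequentiality the abstract mentions. All active states for a given symbol can be updated independently, and this is the per-symbol parallelism that NFA-oriented hardware is designed to exploit.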

Bibliographic record

Source
IEEE International Conference on Big Data | 2017 | pp. 283-292 | 10 pages
Authors
Xiaodong Yu; Kaixi Hou; Hao Wang; Wu-chun Feng
Original format: PDF
Keywords
Portable document format


Similar literature

1. Evaluating High Performance Pattern Matching on the Automata Processor [J]. Roy Indranil, Srivastava Ankit, Grimm Matt. IEEE Transactions on Computers, 2019, No. 8.
2. Big Data Approximating Control (BDAC): A new model-free estimation and control paradigm based on pattern matching and approximation [J]. Stanley G. M. Journal of Process Control, 2018.
3. Approximate symbolic pattern matching for protein sequence data [J]. Bill C.H. Chang, Saman K. Halgamuge. International Journal of Approximate Reasoning, 2003, No. 2a3.
4. Robotomata: A framework for approximate pattern matching of big data on an automata processor [C]. Xiaodong Yu, Kaixi Hou, Hao Wang. IEEE International Conference on Big Data, 2017.
5. The development of extended pattern matching operators and a supporting operator framework for relational database management systems [D]. Wagner, Paul Justen. 2001.
6. Dynamic partitioning of search patterns for approximate pattern matching using search schemes [O]. Luca Renders, Kathleen Marchal, Jan Fostier. 2021.
7. Approximate Pattern Matching in a Pattern Database System [O]. Larry So Davis, Nicholas Roussopoulos. 1979.
