首页> 外文学位 >Data structures, algorithms and architectures for efficient regular expression evaluation.
【24h】

Data structures, algorithms and architectures for efficient regular expression evaluation.

机译:高效的正则表达式评估的数据结构,算法和体系结构。

获取原文
获取原文并翻译 | 示例

摘要

Regular expression matching is a crucial task in several networking applications, where packet payloads need to be inspected at line rate over large sets of complex patterns. Modern, payload-scanning firewalls and intrusion detection systems are perhaps the most notable applications that rely on this mechanism. When targeting memory-based architectures, in which the pattern data structure is stored in off-chip memory such as DRAM, the design of efficient regular expression matching engines is subject to two basic resource requirements: memory size and memory bandwidth. When designing Field Programmable Gate Array implementations, in which the pattern data structure is stored in on-chip memories or logic gates, the focus is on minimizing logic cell utilization and allowing high operational dock frequency. Keeping these requirements feasible is a challenge as pattern sets grow in size and expressiveness.;In this work, we describe novel data structures and algorithms for high-speed regular expression evaluation and map the proposed schemes onto suitable parallel architectures. We aim to provide a comprehensive solution that is scalable with respect to both the size of the rule-set and the complexity of the individual rules. Our main contributions are the following. First, we propose an efficient and low complexity compression scheme for deterministic finite automata (DFAs), leading to a compression factor in excess of 98% across a set of representative data sets. Second, we identify and address problematic subpatterns, such as bounded and unbounded repetitions of wildcards and large character sets and back-references. Specifically, we propose a novel automaton with a limited memory footprint and a bounded memory bandwidth requirement that overcomes the limitations of pure DFA-based solutions. Third, we describe how to utilize our proposed techniques when targeting FPGA implementations. Finally, our tools for constructing and analyzing a range of alternative automata for regular-expression evaluation have been released as open-source, and are used actively by many researchers from around the world.
机译:在几种联网应用中,正则表达式匹配是一项至关重要的任务,在这种应用中,需要以线速检查大型复杂模式集上的数据包有效负载。现代的有效负载扫描防火墙和入侵检测系统可能是依赖此机制的最著名的应用程序。当针对基于存储器的体系结构(其中模式数据结构存储在片外存储器(例如DRAM)中)时,有效的正则表达式匹配引擎的设计要满足两个基本资源要求:存储器大小和存储器带宽。在设计将模式数据结构存储在片内存储器或逻辑门中的现场可编程门阵列实施方案时,重点在于最小化逻辑单元利用率并允许较高的操作停靠频率。随着模式集的大小和表达能力的增长,保持这些要求的可行性是一个挑战。在这项工作中,我们描述了用于高速正则表达式评估的新颖数据结构和算法,并将提出的方案映射到合适的并行体系结构上。我们旨在提供一种全面的解决方案,该解决方案可在规则集的大小和单个规则的复杂性方面进行扩展。我们的主要贡献如下。首先,我们为确定性有限自动机(DFA)提出了一种高效且低复杂度的压缩方案,导致一组代表性数据集的压缩系数超过98%。第二,我们确定并解决有问题的子模式,例如通配符的有界和无界重复以及大型字符集和反向引用。具体来说,我们提出了一种新颖的自动机,该自动机具有有限的内存占用空间和有限的内存带宽要求,从而克服了基于纯DFA解决方案的局限性。第三,我们描述了在针对FPGA实现时如何利用我们提出的技术。最终,我们为正则表达式评估构建和分析各种自动机的工具已作为开源发布,并已为世界各地的许多研究人员所积极使用。

著录项

  • 作者

    Becchi, Michela.;

  • 作者单位

    Washington University in St. Louis.;

  • 授予单位 Washington University in St. Louis.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2009
  • 页码 173 p.
  • 总页数 173
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号