Token-based dictionary pattern matching for text analytics


Abstract

When performing text analytics queries on unstructured text data, a large share of the processing time is spent on regular expression and dictionary matching. In this paper we present a compilable architecture for token-bound pattern matching with support for token pattern sequence detection. The architecture can match against several hundred dictionaries, each containing thousands of elements, at high throughput. A programmable state machine serves as the pattern detection engine to achieve deterministic performance while maintaining low storage requirements. For the detection of token sequences, dedicated circuitry is compiled based on a non-deterministic automaton. A cascaded result lookup ensures efficient storage while allowing multi-token elements to be detected and multiple dictionary hits to be reported. We implemented the architecture on an Altera Stratix IV GX530 and were able to process up to 16 documents in parallel at a peak throughput of 9.7 Gb/s.
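To make the idea of token-bound dictionary matching concrete, the following is a minimal software sketch, not the hardware architecture from the paper. It assumes a simple word tokenizer and a token-level trie; the set of active trie nodes plays the role of the non-deterministic automaton that tracks token sequences, and each node's `hits` set stands in for the result lookup that reports multiple dictionaries per match. All names (`build_matcher`, `tokenize`, `match`) and the sample dictionaries are hypothetical.

```python
import re


def build_matcher(dictionaries):
    """Build a token-level trie from {dictionary_name: [entry, ...]}."""
    trie = {"next": {}, "hits": set()}
    for name, entries in dictionaries.items():
        for entry in entries:
            node = trie
            for token in entry.lower().split():
                node = node["next"].setdefault(token, {"next": {}, "hits": set()})
            # One entry may belong to several dictionaries.
            node["hits"].add(name)
    return trie


def tokenize(text):
    """Assumed tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def match(trie, text):
    """Return (start_token, end_token, dictionary_name) for every hit."""
    tokens = tokenize(text)
    active = []  # (start_index, node) pairs: the active automaton states
    hits = []
    for i, token in enumerate(tokens):
        next_active = []
        # A new match may begin at every token boundary, so add the root.
        for start, node in active + [(i, trie)]:
            child = node["next"].get(token)
            if child is None:
                continue  # this partial match dies
            for name in sorted(child["hits"]):
                hits.append((start, i, name))
            if child["next"]:
                next_active.append((start, child))
        active = next_active
    return hits


dictionaries = {
    "cities": ["New York", "York"],
    "states": ["New York"],
    "teams": ["New York Yankees"],
}
trie = build_matcher(dictionaries)
print(match(trie, "He moved to New York."))
# [(3, 4, 'cities'), (3, 4, 'states'), (4, 4, 'cities')]
```

Because matching operates on whole tokens, an entry such as "York" does not fire inside "Yorkshire", which is the practical meaning of token-bound matching in this sketch.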


Bibliographic details

Source: International Conference on Field Programmable Logic and Applications, 2013, pp. 1-6 (6 pages)
Authors: Polig, Raphael; Atasu, Kubilay; Hagleitner, Christoph
Original format: PDF

