Performance of Multiple String Matching Algorithms in Text Mining

Abstract

Ever since the evolution of the Internet, information retrieval has been carried out by web surfers on a large scale. The volume of data grows daily as users' thirst for knowledge keeps increasing. Raw data needs to be processed before it can be used, which increases its potential value in all major areas such as education and business. Text mining is therefore an emerging area in which unstructured information is turned into relevant information. The text mining process can be divided into information extraction, topic tracking, summarization, categorization, clustering, concept linkage and information visualization. All of these steps can be applied to text only after it has been properly extracted from the web, and pattern matching (string matching) algorithms are used to retrieve proper results from this sea of information. In this paper we discuss three such algorithms: Aho-Corasick, Wu-Manber and Commentz-Walter. The performance of the algorithms is evaluated by implementing them in Python. Finally, the algorithm best suited for extracting information is identified.
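The record does not reproduce the authors' Python code or their measurements. As a rough illustration of the first of the three algorithms, here is a minimal Aho-Corasick sketch. It assumes a small dictionary of non-empty plain-string patterns. The function names `build_automaton` and `find_all` are hypothetical and chosen for this example. This is not the paper's implementation.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (sketch; assumes non-empty patterns)."""
    goto = [{}]       # goto[state][char] -> next state (trie edges)
    fail = [0]        # failure link of each state (0 is the root)
    output = [set()]  # patterns recognized on reaching each state

    # Phase 1: insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Phase 2: breadth-first traversal computes failure links, so every
    # shallower state is finished before its deeper descendants use it.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also reports the patterns of its longest proper suffix state.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def find_all(text, patterns):
    """Return sorted (start_index, pattern) pairs for every occurrence in text."""
    goto, fail, output = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Fall back along failure links until ch has a transition (or we are at the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return sorted(matches)


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
    # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once, left to right. The failure links keep the total work linear in the text length plus the number of matches reported, which is why Aho-Corasick is a common baseline for multiple string matching. Wu-Manber and Commentz-Walter take a different approach: they use Boyer-Moore-style shifts to skip over parts of the text. They are not sketched here.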

Bibliographic Record

Source
International Conference on Frontiers of Intelligent Computing: Theory and Applications | 2017 | xxiv, 685 pages | 11 pages
Authors
Ananthi Sheshasaayee; G. Thailambal
Original format: PDF
CLC classification: TP301.4-532
Keywords
Text mining; Aho Corasick; Wu Manber; Commentz Walter; Boyer Moore; Python

Date added: 2022-08-21 12:16:15

Similar Documents

1. New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J]. Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin. Journal of Supercomputing. 2018, No. 5
2. Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J]. Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin. Journal of Supercomputing. 2018, No. 5
3. Efficient String Matching Algorithm for Searching Large DNA and Binary Texts [J]. Al-Ssulami Abdulrakeeb M., Mathkour Hassan, Arafah Mohammed Amer. International Journal on Semantic Web and Information Systems. 2017, No. 4
4. Performance of Multiple String Matching Algorithms in Text Mining [C]. Ananthi Sheshasaayee, G. Thailambal. International Conference on Frontiers of Intelligent Computing: Theory and Applications. 2017
5. Multi-pattern string matching algorithms [D]. Zha, Xinyan. 2010
6. Development and Performance of Text-Mining Algorithms to Extract Socioeconomic Status from De-Identified Electronic Health Records [O]. Brittany M. Hollister, Nicole A. Restrepo, Eric Farber-Eger
7. Bits filter: a high-performance multiple string pattern matching algorithm for malware detection [O]. Lin Dan. 2010
8. Performance of Single-Keyword and Multiple-Keyword Pattern Matching Algorithms [R]. Watson, B. W. 1994