Conference: International Conference on Frontiers of Intelligent Computing: Theory and Applications

Performance of Multiple String Matching Algorithms in Text Mining

Abstract

Since the rise of the Internet, users have been retrieving information in large volumes. The amount of data grows every day as users' demand for knowledge grows. Raw data must be processed before it can be used, and processing increases its potential value in major areas such as education and business. Text mining is therefore an emerging area in which unstructured information is turned into relevant information. The text mining process can be divided into information extraction, topic tracking, summarization, categorization, clustering, concept linkage, and information visualization. All of these steps can be applied only once the text has been properly extracted from the web. Pattern matching, or string matching, algorithms are used to retrieve the right results from the sea of information. In this paper we discuss three algorithms: Aho-Corasick, Wu-Manber, and Commentz-Walter. The performance of the algorithms is measured by implementing them in Python. Finally, the most suitable algorithm for extracting information is identified.
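To make the idea of multiple string matching concrete, the following is a minimal Python sketch of the Aho-Corasick automaton, the first of the three algorithms named in the abstract. It is an illustrative textbook-style version, not the implementation evaluated in the paper; the function names `build_automaton` and `find_all` and the list-based table layout are assumptions made for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns that end at each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first traversal to compute failure links.
    # Depth-1 states keep the default failure link to the root (state 0).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we reach the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
    # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns are searched for, which is the property that makes this family of algorithms attractive for extracting many keywords from large bodies of text.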
