Journal: International Journal of Advanced Computer Research

Performance evaluation of top-k sequential mining methods on synthetic and real datasets

Abstract

Discovering sequential patterns in a large sequence database is an important problem in sequential pattern mining, a well-known data mining technique. Several articles have surveyed the field over the past few years, focusing mainly on improving the efficiency of algorithms through different techniques. Researchers have paid far less attention to the characteristics of the underlying data, even though the properties of the data strongly affect how data mining algorithms perform. This study complements work on top-k sequential pattern mining by providing a more in-depth analysis with respect to data properties and characteristics. The performance of top-k sequential pattern mining (TKS) was compared with that of top-k closed sequential pattern mining (TSP), the state-of-the-art algorithm for top-k sequential pattern mining, on real and synthetic datasets with varied characteristics. The impact of different parameters on the running time and memory usage of each algorithm was investigated. Extensive experiments show that TKS and TSP each have advantages and disadvantages on different types of data. Furthermore, because large amounts of data are continuously added to databases, sequential pattern mining (SPAM) is becoming popular. Various algorithms have been developed for mining sequential patterns. These algorithms have proved effective on smaller databases, but their performance may decline as the database grows. Hence these methods have to be amended to perform the mining process more efficiently.
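To make the task concrete, the following is a minimal brute-force sketch of top-k sequential pattern mining on a toy database. It is not the TKS or TSP algorithm evaluated in the paper. The function names, the `max_length` cap, and the simplification that every sequence element is a single item (rather than an itemset) are assumptions made for illustration only.

```python
import heapq


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def top_k_sequential_patterns(database, k, max_length=4):
    """Hypothetical helper: return k (support, pattern) pairs, highest support first.

    Patterns are grown one item at a time. Once k patterns are known, the
    support threshold is raised to the k-th best support, which is safe
    because extending a pattern can never increase its support.
    Ties at the k-th position are broken arbitrarily.
    """
    items = sorted({item for seq in database for item in seq})
    top = []          # min-heap of (support, pattern), at most k entries
    min_support = 1   # raised as better patterns are found
    frontier = [()]   # patterns to extend at the current length
    for _ in range(max_length):
        next_frontier = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                sup = support(candidate, database)
                if sup < min_support:
                    continue  # no extension of this candidate can qualify
                next_frontier.append(candidate)
                heapq.heappush(top, (sup, candidate))
                if len(top) > k:
                    heapq.heappop(top)
                if len(top) == k:
                    min_support = max(min_support, top[0][0])
        frontier = next_frontier
    return sorted(top, key=lambda entry: (-entry[0], entry[1]))


# Toy database: five sequences over the items a, b, c.
database = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["a"],
]
result = top_k_sequential_patterns(database, k=3)
print(result)
```

In this toy database, `('b',)` and `('a', 'b')` both have support 3, so `('b',)` is not a closed pattern: a super-pattern has the same support. A miner restricted to closed patterns, which is the setting TSP targets, would report `('a', 'b')` instead.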
