Performance evaluation of top-k sequential mining methods on synthetic and real datasets

Asima Jamil; Abdus Salam; Farhat Amin

Abstract

Discovering sequential patterns in a large sequence database is an important problem in sequential pattern mining, a well-known data mining technique. Several articles have surveyed the field over the past few years, focusing mainly on improving the efficiency of algorithms through different techniques. However, researchers have paid less attention to the characteristics of the underlying data that the algorithms use, even though the properties of the data strongly affect the execution of data mining algorithms. This study complements the top-k sequential pattern mining field by providing a further in-depth analysis with respect to data properties and characteristics. The performance of top-k sequential pattern mining (TKS) and top-k closed sequential pattern mining (TSP), the state-of-the-art algorithms for top-k sequential pattern mining, was evaluated on both synthetic and real datasets with varied characteristics. The impact of different parameters on the running time and memory usage of each algorithm was investigated. Extensive experiments show that TKS and TSP each have certain advantages and disadvantages on different types of data. Furthermore, due to the continuous addition of large amounts of data to databases, sequential pattern mining (SPAM) is becoming popular. Various algorithms have been developed for mining sequential patterns in data. These algorithms have proved to be more effective for smaller databases, but their performance may decline as the database size increases. Hence, these methods have to be amended to perform the mining process more efficiently.
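To make the task in the abstract concrete, the following is a minimal sketch of top-k sequential pattern mining together with a running-time and peak-memory measurement. It is a brute-force illustration only, not the TKS or TSP algorithm; the function names, the `max_len` cap, and the toy database are all assumptions introduced here for illustration.

```python
import time
import tracemalloc
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def top_k_sequential_patterns(database, k, max_len=3):
    """Brute force (hypothetical helper): count every candidate, keep the k best."""
    # Candidates: every distinct subsequence of length 1..max_len in the data.
    candidates = set()
    for seq in database:
        for n in range(1, max_len + 1):
            candidates.update(combinations(seq, n))
    # Support of a pattern = number of sequences that contain it.
    support = {p: sum(is_subsequence(p, s) for s in database) for p in candidates}
    # Highest support first; ties are broken by the pattern itself.
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def measure(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


# Toy sequence database (assumed data, one item per event).
toy_db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
patterns, seconds, peak_bytes = measure(top_k_sequential_patterns, toy_db, k=3)
print(patterns)  # [(('a',), 3), (('a', 'c'), 3), (('b',), 3)]
print(f"time: {seconds:.6f} s, peak memory: {peak_bytes} bytes")
```

Exhaustive enumeration like this grows quickly with sequence length, which is one reason the characteristics of the data matter when comparing dedicated algorithms on running time and memory.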

Bibliographic details

Source: International Journal of Advanced Computer Research, 2017, Issue 32, 9 pages
Authors: Asima Jamil; Abdus Salam; Farhat Amin
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. An efficient method for mining cross-timepoint gene regulation sequential patterns from time course gene expression datasets [J]. Chun-Pei Cheng, Yu-Cheng Liu, Yi-Lin Tsai. BMC Bioinformatics. 2013, Suppl. 12
2. An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets [J]. Drechsler J., Reiter J.P. Computational Statistics & Data Analysis. 2011, Issue 12
3. Evaluation of face datasets as tools for assessing the performance of face recognition methods [J]. Shamir L. International Journal of Computer Vision. 2008, Issue 3
4. TKAR: Efficient Mining of Top-k Association Rules on Real-Life Datasets [C]. O. Gireesha, O. Obulesu. International Conference on Frontiers of Intelligent Computing: Theory and Applications. 2017
5. Low-storage sequential methods for data mining and the analysis of massive datasets [D]. McDermott, James Patrick. 2003
6. An efficient method for mining cross-timepoint gene regulation sequential patterns from time course gene expression datasets [O]. Chun-Pei Cheng, Yu-Cheng Liu, Yi-Lin Tsai. 2013
7. An Efficient Method for Mining Top-K Closed Sequential Patterns [O]. Thi-Thiet Pham, Tung Do, Anh Nguyen. 2020
8. Methodologie d'Evaluation des Performances des Systemes Repartis en Temps Reel (Methodology of Performance Evaluation of Real Time Distributed Systems) [R]. Dutheilletlamonthezie, C., Zenie, A. 1987
