Journal: PeerJ Computer Science

TKFIM: Top-K frequent itemset mining technique based on equivalence classes

Abstract

Frequent itemset (FI) mining is a significant subject of data mining research. Over the last ten years, due to innovative developments, the quantity of data has grown exponentially, which poses new challenges for FI mining applications. Existing algorithms, both threshold-based and size-based, can produce misleading information. The support threshold plays a central role in generating frequent itemsets from a given dataset, yet selecting a suitable value is difficult for users who are unfamiliar with the dataset's characteristics. Algorithms that find FIs without a support threshold, however, perform poorly because of heavy computation. We therefore propose a method for discovering FIs without a support threshold, called Top-k frequent itemsets mining (TKFIM). It uses equivalence classes and set-theory concepts to mine FIs. The proposed procedure does not miss any FIs, so the mined frequent patterns are accurate. The results are compared with state-of-the-art techniques, namely Top-k miner and Build Once and Mine Once (BOMO). TKFIM outperforms both in terms of execution and performance, achieving gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Similarly, it achieves gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. It is therefore argued that the proposed procedure may be adopted on large datasets for better performance.
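
The general idea of threshold-free top-k mining with equivalence classes and set operations can be illustrated with a minimal sketch. This is not the TKFIM algorithm itself, whose details the abstract does not give. It is a generic Eclat-style depth-first search over prefix-based equivalence classes, where the support of an itemset is the size of the intersection of transaction-id sets, and a min-heap of size k supplies a threshold that rises as better itemsets are found. The function name `top_k_frequent_itemsets`, the toy transactions, and the arbitrary handling of ties at the k-th support are all assumptions made for illustration.

```python
import heapq


def top_k_frequent_itemsets(transactions, k):
    """Return up to k (itemset, support) pairs with the highest supports.

    Illustrative sketch only: ties at the k-th support are broken
    arbitrarily, and items are assumed to be mutually comparable.
    """
    if k <= 0:
        return []

    # Vertical layout: map each item to the set of transaction ids containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    heap = []  # min-heap of (support, itemset) holding the best k found so far

    def worth_keeping(support):
        # Once k itemsets are held, only strictly higher supports can enter.
        # A superset never has higher support, so failing here prunes a branch.
        if support == 0:
            return False
        return len(heap) < k or support > heap[0][0]

    def offer(itemset, support):
        if len(heap) < k:
            heapq.heappush(heap, (support, itemset))
        else:
            heapq.heapreplace(heap, (support, itemset))

    def mine(eq_class):
        # eq_class: itemsets that share one prefix, each paired with its tidset.
        for i, (itemset, tids) in enumerate(eq_class):
            if not worth_keeping(len(tids)):
                continue
            offer(itemset, len(tids))
            children = []  # the equivalence class whose prefix is `itemset`
            for other, other_tids in eq_class[i + 1:]:
                common = tids & other_tids  # set intersection gives the support
                if worth_keeping(len(common)):
                    children.append((itemset + other[-1:], common))
            mine(children)

    # Visit high-support items first so the dynamic threshold rises quickly.
    root = sorted(
        (((item,), tids) for item, tids in tidsets.items()),
        key=lambda entry: (-len(entry[1]), entry[0]),
    )
    mine(root)
    return [(itemset, support)
            for support, itemset in sorted(heap, key=lambda e: (-e[0], e[1]))]


toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
print(top_k_frequent_itemsets(toy, 3))
# [(('a',), 4), (('b',), 3), (('c',), 3)]
```

The caller supplies only k, not a minimum support: the smallest support in the heap plays the role of the threshold and is discovered during the search rather than chosen in advance.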
