...
首页> 外文期刊>Knowledge-Based Systems >An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies
【24h】

An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies

机译:使用新颖的阈值提高和修剪策略来挖掘前k个高实用项集的有效算法

获取原文
获取原文并翻译 | 示例
           

摘要

Top-k high utility itemset mining is the process of discovering the k itemsets having the highest utilities in a transactional database. In recent years, several algorithms have been proposed for this task. However, it remains very expensive both in terms of runtime and memory consumption. The reason is that current algorithms often generate a huge amount of candidate itemsets and are unable to prune the search space effectively. In this paper, we address this issue by proposing a novel algorithm named kHMC to discover the top-k high utility itemsets more efficiently. Unlike several algorithms for top-k high utility itemset mining, kHMC discovers high utility itemsets using a single phase. Furthermore, it employs three strategies named RIU, CUD, and COV to raise its internal minimum utility threshold effectively, and thus reduce the search space. The COV strategy introduces a novel concept of coverage. The concept of coverage can be employed to prune the search space in high utility itemset mining, or to raise the threshold in top-k high utility itemset mining, as proposed in this paper. Furthermore, kHMC relies on a novel co-occurrence pruning technique named EUCPT to avoid performing costly join operations for calculating the utilities of itemsets. Moreover, a novel pruning strategy named TEP is proposed for reducing the search space. To evaluate the performance of the proposed algorithm, extensive experiments have been conducted on six datasets having various characteristics. Results show that the proposed algorithm outperforms the state-of-the-art TKO and REPT algorithms for top-k high utility itemset mining both in terms of memory consumption and runtime. (C) 2016 Elsevier B.V. All rights reserved.
机译:前k个高实用性项目集挖掘是在事务性数据库中发现具有最高实用性的k个项目集的过程。近年来,已针对此任务提出了几种算法。但是,就运行时间和内存消耗而言,它仍然非常昂贵。原因是当前的算法通常会生成大量的候选项目集,并且无法有效地修剪搜索空间。在本文中,我们通过提出一种名为kHMC的新颖算法来更有效地发现前k个高实用项集,从而解决了这一问题。与用于top-k高效项目集挖掘的几种算法不同,kHMC使用单个阶段发现高效项目集。此外,它采用三种名为RIU,CUD和COV的策略来有效地提高其内部最小效用阈值,从而减少搜索空间。 COV策略引入了一种新颖的覆盖概念。如本文所提出的,覆盖率概念可用于修剪高效项集挖掘中的搜索空间,或提高前k位高效项集挖掘中的阈值。此外,kHMC依靠一种名为EUCPT的新颖的共现修剪技术来避免执行用于计算项目集效用的昂贵连接操作。此外,提出了一种新颖的修剪策略,称为TEP,以减少搜索空间。为了评估所提出算法的性能,对具有各种特征的六个数据集进行了广泛的实验。结果表明,该算法在内存消耗和运行时间方面均优于最新的TKO和REPT算法,可用于前k个高效项目集挖掘。 (C)2016 Elsevier B.V.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号