
An efficient structure for fast mining high utility itemsets

Abstract

High utility itemset mining has emerged as an important research issue in data mining because it has a wide range of real-life applications. Although a number of algorithms have been proposed in recent years, mining efficiency remains a major challenge: these algorithms suffer either from inefficient computation of candidate utilities or from the generation of a huge number of candidates. In this paper, we propose a novel data structure named PUN-list (PU-tree-Node list), which maintains both the utility information of an itemset and a utility upper bound to facilitate mining high utility itemsets. Based on PUN-lists, we present a method named MIP (Mining high utility Itemset using PUN-Lists) for efficiently mining high utility itemsets. MIP achieves its efficiency through three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly and repeated utility computation. Second, the utility of an itemset can be calculated efficiently by scanning its PUN-list, and the PUN-lists of longer itemsets can be constructed efficiently from the PUN-lists of shorter itemsets. Third, by using the utility upper bound stored in the PUN-lists as a pruning strategy, MIP discovers high utility itemsets directly from the search space, a set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that MIP is much faster than HUI-Miner, d2HUP, and UP-Growth+, especially on dense datasets.
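The abstract does not spell out the PUN-list layout (it is built over a PU-tree, whose details are in the full paper), so the sketch below is not MIP. As a hedged illustration of the three ideas the abstract names, it swaps in simpler HUI-Miner-style per-transaction utility lists: each list stores, per transaction, the itemset's utility and the utility of the items that follow it; longer itemsets' lists are built by joining shorter ones; and the sum of the two values serves as an upper bound for pruning the set-enumeration tree. It assumes the usual utility model (utility = quantity × unit profit, items first filtered by transaction-weighted utility), and the function names `join` and `mine_high_utility_itemsets` are hypothetical.

```python
from collections import defaultdict


def join(p_list, px_list, py_list):
    """Build the list of P+{x,y} from the lists of P, Px and Py.

    Entries are (tid, iutil, rutil). Subtracting P's utility avoids counting
    the shared prefix twice (HUI-Miner-style, not the paper's PUN-list join).
    """
    py_by_tid = {tid: (iu, ru) for tid, iu, ru in py_list}
    p_by_tid = {tid: iu for tid, iu, _ in p_list} if p_list is not None else {}
    joined = []
    for tid, iu_x, _ in px_list:
        if tid in py_by_tid:
            iu_y, ru_y = py_by_tid[tid]
            joined.append((tid, iu_x + iu_y - p_by_tid.get(tid, 0), ru_y))
    return joined


def mine_high_utility_itemsets(transactions, profit, min_util):
    """Return {frozenset(itemset): utility} for all itemsets with utility >= min_util."""
    # Transaction-weighted utility (TWU): sum of the utilities of the
    # transactions containing each item.
    twu = defaultdict(int)
    for t in transactions:
        tu = sum(q * profit[i] for i, q in t.items())
        for i in t:
            twu[i] += tu
    # Items with TWU below the threshold cannot appear in any high utility itemset.
    items = sorted((i for i in twu if twu[i] >= min_util), key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(items)}

    # One list per item: (tid, utility of the item, utility of later items).
    lists = {i: [] for i in items}
    for tid, t in enumerate(transactions):
        kept = sorted((i for i in t if i in rank), key=rank.get)
        rest = sum(t[i] * profit[i] for i in kept)
        for i in kept:
            u = t[i] * profit[i]
            rest -= u
            lists[i].append((tid, u, rest))

    results = {}

    def search(prefix, prefix_list, extensions):
        # Depth-first walk of the set-enumeration tree below `prefix`.
        for k, (x, x_list) in enumerate(extensions):
            iutil = sum(e[1] for e in x_list)  # exact utility, read off the list
            rutil = sum(e[2] for e in x_list)
            itemset = prefix + (x,)
            if iutil >= min_util:
                results[frozenset(itemset)] = iutil
            # iutil + rutil bounds every superset in this subtree; prune if too small.
            if iutil + rutil >= min_util:
                children = []
                for y, y_list in extensions[k + 1:]:
                    xy_list = join(prefix_list, x_list, y_list)
                    if xy_list:
                        children.append((y, xy_list))
                search(itemset, x_list, children)

    search((), None, [(i, lists[i]) for i in items])
    return results


db = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]
profit = {"a": 5, "b": 2, "c": 1}
print(mine_high_utility_itemsets(db, profit, 9))
```

With `min_util = 9`, this small example returns {a} (utility 15), {a, c} (11) and {a, b} (9); the remaining itemsets are either below the threshold or pruned by the upper bound before their lists are built. The actual PUN-list is reported to be more condensed, since it is built over a prefix tree rather than storing one entry per transaction.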