Journal: International Journal of Computer Science & Information Technology (IJCSIT)

CLOHUI: An Efficient Algorithm for Mining Closed High Utility Itemsets from Transaction Databases


Abstract

High-utility itemset mining (HUIM) is an important research topic in data mining, and many algorithms have been proposed for it. However, existing HUIM methods return too many high-utility itemsets (HUIs), which reduces both the efficiency and the effectiveness of mining, since users have to sift through a large number of HUIs to find useful ones. Recently a new representation, the closed+ high-utility itemset (CHUI), has been proposed. With this concept, the number of HUIs is reduced massively. Existing methods discover CHUIs from a transaction database in two phases. In phase I, each itemset is first checked for closedness. If the itemset is closed, an overestimation technique is used to set an upper bound on its utility in the database. Itemsets whose overestimated utilities are no less than a given threshold are selected as candidate CHUIs. In phase II, the candidate CHUIs generated in phase I are verified by computing their utilities in the database. However, these methods have two problems: 1) the number of candidate CHUIs is usually very large, so extensive memory is required; 2) computing closed itemsets is time-consuming. In this paper we therefore propose an efficient algorithm, CloHUI, for mining CHUIs from a transaction database. CloHUI does not generate any candidate CHUIs during the mining process, and it verifies closed itemsets from a tree structure. We also propose a strategy to make the verification process faster. Extensive experiments on sparse and dense datasets compare CloHUI with the state-of-the-art algorithm CHUD. The results show that on dense datasets CloHUI significantly outperforms CHUD: it is more than an order of magnitude faster and consumes less memory.
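To make the terms in the abstract concrete, the following is a minimal brute-force sketch of the two-phase idea it describes: an upper-bound check, a closedness check, and an exact utility check. It is neither CloHUI nor CHUD. The toy database `DB`, all function names, and the use of transaction-weighted utility (TWU) as the overestimation are assumptions made for illustration; the abstract does not name the overestimation technique. Each transaction is assumed to map an item to its already-computed utility in that transaction, and "closed" is taken in the usual sense that no proper superset has the same support.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility of that
# item in the transaction (assumed to be quantity * unit profit already).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"a": 5, "b": 4, "c": 1},
    {"b": 8},
]


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset, db):
    """Exact utility: sum of the itemset's item utilities over the
    transactions that contain it."""
    return sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))


def twu(itemset, db):
    """Transaction-weighted utility (assumed overestimation): total utility
    of every transaction containing the itemset. Always >= utility()."""
    return sum(sum(t.values()) for t in db if itemset.issubset(t))


def is_closed(itemset, db):
    """Closed if adding any single further item strictly lowers the support.
    Checking single-item extensions is enough, because a larger superset with
    equal support implies a one-item extension with equal support."""
    all_items = set().union(*db)
    sup = support(itemset, db)
    return all(support(itemset | {i}, db) < sup for i in all_items - itemset)


def closed_high_utility_itemsets(db, min_util):
    """Enumerate all itemsets (exponential; illustration only) and keep the
    closed ones whose exact utility reaches min_util."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if support(x, db) == 0:
                continue
            # Phase I style filter: upper bound first, then closedness.
            if twu(x, db) < min_util or not is_closed(x, db):
                continue
            # Phase II style verification: exact utility.
            u = utility(x, db)
            if u >= min_util:
                result[x] = u
    return result


if __name__ == "__main__":
    for itemset, u in closed_high_utility_itemsets(DB, 20).items():
        print(sorted(itemset), u)
```

On this toy data with a threshold of 20, `{a}` reaches a utility of 20 but is not closed, because `{a, c}` occurs in exactly the same transactions; only `{a, c}` is reported. The sketch enumerates every itemset, which is precisely the cost that dedicated algorithms are designed to avoid.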
