Efficient high utility itemset mining using buffered utility-lists

Abstract

Discovering high utility itemsets in transaction databases is a key task for studying customer behavior. It consists of finding groups of items that are bought together and yield a high profit. Several algorithms have been proposed to mine high utility itemsets, using various approaches and data structures of varying complexity. Among existing algorithms, one-phase algorithms employing the utility-list structure have been shown to be the most efficient. In recent years, the simplicity of the utility-list structure has led to the development of numerous utility-list based algorithms for various tasks related to utility mining. However, a major limitation of utility-list based algorithms is that creating and maintaining utility-lists is time-consuming and can consume a large amount of memory. The reasons are that numerous utility-lists are built and that the intersection/join operation used to construct a utility-list is costly. This paper addresses this issue by proposing an improved utility-list structure, called the utility-list buffer, that reduces memory consumption and speeds up the join operation. This structure is integrated into a novel algorithm named ULB-Miner (Utility-List Buffer for high utility itemset Miner), which introduces several new ideas to discover high utility itemsets more efficiently. ULB-Miner uses the utility-list buffer structure to store and retrieve utility-lists efficiently and to reuse memory during the mining process. The paper also introduces a linear-time method for constructing utility-list segments in a utility-list buffer. An extensive experimental study on various datasets shows that the proposed algorithm, relying on the novel utility-list buffer structure, is highly efficient in terms of both execution time and memory consumption. ULB-Miner is up to 10 times faster than the FHM and HUI-Miner algorithms and consumes up to 6 times less memory. Moreover, it performs well on both dense and sparse datasets.
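To make the idea of a shared buffer with segments more concrete, here is a minimal Python sketch. It is not the paper's implementation: the class name `UtilityListBuffer`, the three parallel arrays, the `(start, end)` segment representation, and the `truncate` method are all assumptions chosen for illustration. Each entry is assumed to be a `(tid, iutil, rutil)` triple, as is usual in utility-list based miners, and the join shown covers only the simplest case of combining two single-item lists that are sorted by transaction id.

```python
class UtilityListBuffer:
    """Hypothetical flat storage shared by all utility-lists.

    A utility-list is not a separate object but a segment (start, end)
    of three parallel arrays, so memory can be reused by truncation.
    """

    def __init__(self):
        self.tids = []    # transaction ids
        self.iutils = []  # utility of the itemset in that transaction
        self.rutils = []  # remaining utility in that transaction

    def append_segment(self, entries):
        """Store (tid, iutil, rutil) triples, sorted by tid; return the segment."""
        start = len(self.tids)
        for tid, iutil, rutil in entries:
            self.tids.append(tid)
            self.iutils.append(iutil)
            self.rutils.append(rutil)
        return (start, len(self.tids))

    def join(self, seg_x, seg_y):
        """Two-pointer join of two single-item segments.

        Runs in time linear in the two segment lengths because both
        segments are sorted by tid. The result is appended to the buffer.
        """
        i, end_x = seg_x
        j, end_y = seg_y
        start = len(self.tids)
        while i < end_x and j < end_y:
            if self.tids[i] < self.tids[j]:
                i += 1
            elif self.tids[i] > self.tids[j]:
                j += 1
            else:
                # Same transaction: utilities add up, and the remaining
                # utility is taken from the later item (assumed to be y).
                self.tids.append(self.tids[i])
                self.iutils.append(self.iutils[i] + self.iutils[j])
                self.rutils.append(self.rutils[j])
                i += 1
                j += 1
        return (start, len(self.tids))

    def sum_iutil(self, seg):
        """Total utility of the itemset represented by a segment."""
        return sum(self.iutils[seg[0]:seg[1]])

    def truncate(self, size):
        """Drop everything stored after position `size` to reuse memory."""
        del self.tids[size:]
        del self.iutils[size:]
        del self.rutils[size:]
```

In this sketch, a depth-first search could record `len(buf.tids)` before exploring an extension and call `truncate` with that value when backtracking, which is one plausible way to reuse memory during mining. Joining extensions of a non-empty prefix would additionally need to subtract the prefix utility, which is omitted here.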
