
An efficient algorithm to mine high average-utility itemsets



Abstract

With the ever-increasing number of applications of data mining, high-utility itemset mining (HUIM) has become a critical issue in recent decades. In traditional HUIM, the utility of an itemset is defined as the sum of the utilities of its items in the transactions where it appears. An important problem with this definition is that it does not take itemset length into account. Because the utility of a larger itemset is generally greater than that of a smaller one, traditional HUIM algorithms tend to be biased toward finding large itemsets. Thus, this definition is not a fair measurement of utility. To provide a better assessment of each itemset's utility, the task of high average-utility itemset mining (HAUIM) was proposed. It introduces the average-utility measure, which considers both the length of itemsets and their utilities, and is thus more appropriate in real-world situations. Several algorithms have been designed for this task. They can generally be categorized as either level-wise or pattern-growth approaches. Both, however, require a large amount of computation to find the actual high average-utility itemsets (HAUIs). In this paper, we present an average-utility (AU)-list structure to discover HAUIs more efficiently. A depth-first search algorithm named HAUI-Miner is proposed to explore the search space without candidate generation, and an efficient pruning strategy is developed to reduce the search space and speed up the mining process. Extensive experiments are conducted to compare the performance of HAUI-Miner with the state-of-the-art HAUIM algorithms in terms of runtime, number of determining nodes, memory usage and scalability.
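The two measures contrasted in the abstract can be illustrated with a minimal sketch. It assumes that the average utility of an itemset is its utility divided by the number of items it contains, which is the usual reading of "considers both the length of itemsets and their utilities". The toy database `DB` and the function names `utility` and `average_utility` are hypothetical and are not taken from the paper. The sketch does not implement the AU-list structure or HAUI-Miner.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps an item to the utility
# of that item in the transaction. All values are made up for illustration.
Transaction = Dict[str, int]

DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def utility(itemset: Iterable[str], db: List[Transaction]) -> int:
    """Sum the utilities of the itemset's items over every transaction
    that contains the whole itemset (the traditional HUIM measure)."""
    items = set(itemset)
    total = 0
    for transaction in db:
        if items.issubset(transaction.keys()):
            total += sum(transaction[item] for item in items)
    return total


def average_utility(itemset: Iterable[str], db: List[Transaction]) -> float:
    """Divide the utility by the itemset length (assumed definition)."""
    items = set(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    return utility(items, db) / len(items)
```

On this toy data, `{"a", "c"}` has a higher utility than `{"a"}` alone (22 versus 15) but a lower average utility (11.0 versus 15.0), which is the length bias the abstract describes.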

Bibliographic record

  • Source
    Advanced engineering informatics | 2016, Issue 2 | pp. 233-243 | 11 pages
  • Author affiliations

    School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China;

    School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China;

    School of Natural Sciences and Humanities, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China;

    Department of Computer Science and Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan;

    Department of Computer Science, University of Nevada, Las Vegas, USA;

    Department of Telecommunications, VSB-Technical University of Ostrava, Czech Republic;

  • Indexing information
  • Original format: PDF
  • Language: English (eng)
  • Chinese Library Classification
  • Keywords

    High average-utility itemsets; List structure; Data mining; HAUIM;

