An efficient algorithm to mine high average-utility itemsets

Jerry Chun-Wei Lin; Ting Li; Philippe Fournier-Viger; Tzung-Pei Hong; Justin Zhan; Miroslav Voznak


Abstract

With the ever-increasing number of data mining applications, high-utility itemset mining (HUIM) has become a critical issue in recent decades. In traditional HUIM, the utility of an itemset is defined as the sum of the utilities of its items in the transactions where it appears. An important problem with this definition is that it does not take itemset length into account. Because the utility of a larger itemset is generally greater than that of a smaller itemset, traditional HUIM algorithms tend to be biased toward finding large itemsets. Thus, this definition is not a fair measurement of utility. To provide a better assessment of each itemset's utility, the task of high average-utility itemset mining (HAUIM) was proposed. It introduces the average-utility measure, which considers both the length of itemsets and their utilities, and is thus more appropriate in real-world situations. Several algorithms have been designed for this task. They can generally be categorized as either level-wise or pattern-growth approaches. Both, however, require a large amount of computation to find the actual high average-utility itemsets (HAUIs). In this paper, we present an efficient average-utility (AU)-list structure to discover the HAUIs more efficiently. A depth-first search algorithm named HAUI-Miner is proposed to explore the search space without candidate generation, and an efficient pruning strategy is developed to reduce the search space and speed up the mining process. Extensive experiments are conducted to compare the performance of HAUI-Miner with the state-of-the-art HAUIM algorithms in terms of runtime, number of determining nodes, memory usage and scalability.
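
The difference between the two measures in the abstract can be illustrated with a small sketch. It assumes the definition commonly used in the HAUIM literature, where the average utility of an itemset is its utility divided by the number of items it contains; the abstract itself does not spell out the formula. The toy database, the function names and the absolute threshold `min_au` are all hypothetical, and the brute-force enumeration is only a baseline for illustration. It is not HAUI-Miner and does not use the AU-list structure or the paper's pruning strategy.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (for example, quantity times unit profit).
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def utility(itemset, db):
    """Traditional utility: sum the utilities of the itemset's items
    over every transaction that contains the whole itemset."""
    items = set(itemset)
    return sum(
        sum(t[i] for i in items)
        for t in db
        if items.issubset(t)
    )


def average_utility(itemset, db):
    """Assumed definition: utility divided by the itemset length."""
    return utility(itemset, db) / len(set(itemset))


def mine_brute_force(db, min_au):
    """Enumerate every itemset and keep those whose average utility
    reaches min_au. Exponential in the number of items; baseline only."""
    all_items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, k):
            au = average_utility(candidate, db)
            if au >= min_au:
                result[candidate] = au
    return result


if __name__ == "__main__":
    print(utility(("a",), transactions), utility(("a", "c"), transactions))
    print(average_utility(("a",), transactions),
          average_utility(("a", "c"), transactions))
    print(mine_brute_force(transactions, min_au=10))
```

In this toy data the itemset {a, c} has a higher traditional utility than {a} (22 versus 15) simply because it is longer, while its average utility is lower (11 versus 15). This is the length bias that the average-utility measure is meant to correct.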

Bibliographic record

Source
Advanced Engineering Informatics | 2016, Issue 2 | pp. 233-243 | 11 pages
Authors
Jerry Chun-Wei Lin; Ting Li; Philippe Fournier-Viger; Tzung-Pei Hong; Justin Zhan; Miroslav Voznak
Author affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China;

School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China;

School of Natural Sciences and Humanities, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China;

Department of Computer Science and Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan;

Department of Computer Science, University of Nevada, Las Vegas, USA;

Department of Telecommunications, VSB-Technical University of Ostrava, Czech Republic;

Original format: PDF
Language: English
Keywords
High average-utility itemsets; List structure; Data mining; HAUIM;

Similar literature

1. Efficient algorithm for mining high average-utility itemsets in incremental transaction databases [J]. Kim Donggyu, Yun Unil. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2017, Issue 1.
2. A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure [J]. Sethi Krishan Kumar, Ramesh Dharavath. Journal of Supercomputing. 2020, Issue 12.
3. Efficient Vertical Mining of High Average-Utility Itemsets Based on Novel Upper-Bounds [J]. Tin Truong, Hai Duong, Bac Le. IEEE Transactions on Knowledge and Data Engineering. 2019, Issue 2.
4. An Efficient Algorithm to Mine High Average-Utility Sequential Patterns [C]. Tiantian Xu. International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery. 2020.
5. New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases [D]. Peterson, Erich Allen. 2012.
6. Efficiently Hiding Sensitive Itemsets with Transaction Deletion Based on Genetic Algorithms [O]. Chun-Wei Lin, Binbin Zhang, Kuo-Tung Yang.
7. Maintenance of Discovered High Average-Utility Itemsets in Dynamic Databases [O]. Binbin Zhang, Jerry Lin, Yinan Shao. 2018.
