Engineering Applications of Artificial Intelligence

Efficient mining of high utility itemsets with multiple minimum utility thresholds



Abstract

Mining high utility itemsets is considered one of the important and challenging problems in the data mining literature. The problem offers a decision maker greater flexibility in using item utilities, such as profits and margins, to mine interesting and actionable patterns from databases. Most current work in the literature, however, applies a single minimum utility threshold value and fails to consider disparities in item characteristics. This paper proposes an efficient method (MHUI) to mine high utility itemsets with multiple minimum utility threshold values. The presented method generates high utility itemsets in a single phase, without an expensive intermediate candidate generation process. It introduces the concept of suffix minimum utility and presents generalized pruning strategies for efficiently mining high utility itemsets. The performance of the algorithm is evaluated against state-of-the-art methods (HUI-MMU-TE and HIMU-EUCP) on eight benchmark datasets. The experimental results show that the proposed method improves execution time by two to three orders of magnitude over the HUI-MMU-TE method. In addition, MHUI improves execution time by one to two orders of magnitude over the HIMU-EUCP method, especially on moderately long and dense benchmark datasets. The memory requirements of the proposed algorithm were also found to be significantly lower.
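To make the problem setting concrete, the following is a minimal brute-force sketch in Python. It is not the MHUI algorithm: it enumerates every itemset and uses none of the pruning strategies or the suffix minimum utility concept mentioned in the abstract. The toy transactions, unit profits, per-item thresholds, and all function names are hypothetical. The abstract does not say how an itemset's threshold is derived from its items' thresholds; the sketch assumes it is the smallest threshold among the items, a convention used in earlier multiple-threshold work.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]

# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}

# Hypothetical minimum utility threshold assigned to each item.
item_min_util = {"a": 15, "b": 10, "c": 8, "d": 12}


def utility(itemset, transaction):
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def total_utility(itemset):
    """Utility of an itemset summed over the whole database."""
    return sum(utility(itemset, t) for t in transactions)


def itemset_threshold(itemset):
    """ASSUMPTION: an itemset's threshold is the least of its items' thresholds."""
    return min(item_min_util[item] for item in itemset)


def brute_force_high_utility_itemsets():
    """Enumerate all itemsets and keep those that meet their own threshold."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = total_utility(itemset)
            if u >= itemset_threshold(itemset):
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in brute_force_high_utility_itemsets().items():
        print(itemset, u, "threshold:", itemset_threshold(itemset))
```

In this toy data the single item b falls short of its threshold while the itemset {a, b} meets its own. Utility is not monotone under itemset extension, so a low-utility itemset cannot simply be discarded along with its supersets, which is why dedicated pruning strategies such as those the abstract describes are needed.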
