Journal: IEEE Transactions on Knowledge and Data Engineering

Efficient Vertical Mining of High Average-Utility Itemsets Based on Novel Upper-Bounds

Abstract

Mining High Average-Utility Itemsets (HAUIs) in a quantitative database is an extension of the traditional problem of frequent itemset mining and has several practical applications. Discovering HAUIs is more challenging than mining frequent itemsets with the traditional support model because the average-utilities of itemsets do not satisfy the downward-closure property. To reduce the search space of itemsets, prior studies on HAUI mining have proposed various upper-bounds on the average-utilities of itemsets. However, the resulting algorithms can generate a huge number of unpromising HAUI candidates, which results in high memory consumption and long runtimes. To address this problem, this paper proposes four tight average-utility upper-bounds, based on a vertical database representation, and three efficient pruning strategies. Furthermore, a novel generic framework for comparing average-utility upper-bounds is presented. Based on these theoretical results, an efficient algorithm named dHAUIM is introduced for mining the complete set of HAUIs. dHAUIM represents the search space and quickly computes upper-bounds using a novel IDUL structure. Extensive experiments show that dHAUIM outperforms four state-of-the-art algorithms for mining HAUIs in terms of runtime on both real-life and synthetic databases. Moreover, the results show that the proposed pruning strategies dramatically reduce the number of candidate HAUIs.
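
To make the downward-closure issue concrete, the following is a minimal sketch and is not taken from the paper. It assumes the usual definition of average-utility (the utility of an itemset summed over the transactions that contain it, divided by the number of items in the itemset) and uses a hypothetical three-transaction database. The `auub` function illustrates one classical bound from the earlier literature (the sum of the maximum item utility in each supporting transaction); it is not one of the four bounds proposed in this paper, and it does not use the IDUL structure.

```python
# Hypothetical toy database: each transaction maps item -> utility
# (quantity x unit profit) of that item in the transaction.
DB = [
    {"a": 1, "b": 10},
    {"a": 1, "c": 2},
    {"b": 10, "c": 1},
]


def average_utility(itemset, db):
    """Utility of `itemset` over all transactions containing it,
    divided by the number of items in the itemset."""
    items = set(itemset)
    total = 0
    for transaction in db:
        if items.issubset(transaction):
            total += sum(transaction[i] for i in items)
    return total / len(items)


def auub(itemset, db):
    """Classical upper bound (assumed here, from prior work): sum of the
    largest item utility in each transaction that contains `itemset`."""
    items = set(itemset)
    return sum(
        max(transaction.values())
        for transaction in db
        if items.issubset(transaction)
    )


# A superset can have a higher average-utility than its subset,
# so average-utility itself cannot be used to prune supersets.
print(average_utility({"a"}, DB))       # 2.0
print(average_utility({"a", "b"}, DB))  # 5.5

# The bound never increases when items are added, and it never
# falls below the true average-utility.
print(auub({"a"}, DB))       # 12
print(auub({"a", "b"}, DB))  # 10
```

With a minimum average-utility threshold of 5, {a} is not a HAUI while its superset {a, b} is, so pruning on average-utility alone would miss {a, b}. Pruning on a bound such as `auub` is safe because the bound is at least as large as the true average-utility and cannot increase for supersets, but loose bounds keep many unpromising candidates alive, which is the cost the abstract refers to.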
