Conference: International Conference on Web-Age Information Management

Efficient Mining of Uncertain Data for High-Utility Itemsets



Abstract

High-utility itemset mining (HUIM) is emerging as an important research topic in data mining. Most HUIM algorithms can handle only precise data; however, uncertainty is embedded in big data collected from experimental measurements or noisy sensors in real-life applications. In this paper, an efficient algorithm, Mining Uncertain data for High-Utility Itemsets (MUHUI), is proposed to discover potential high-utility itemsets (PHUIs) from uncertain data. Based on the probability-utility-list (PU-list) structure, the MUHUI algorithm mines PHUIs directly, without candidate generation, and uses several efficient pruning strategies to avoid constructing PU-lists for numerous unpromising itemsets, thus greatly improving mining performance. Extensive experiments on both real-life and synthetic datasets show that the proposed algorithm significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability; in particular, MUHUI scales well on large-scale uncertain datasets.
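To make the problem setting concrete, the following is a minimal brute-force sketch of what "potential high-utility itemset" could mean. It is not the MUHUI algorithm and does not use PU-lists or pruning. The abstract does not state the formal definitions, so everything here is an assumption made for illustration: each transaction carries a single existence probability, an itemset's utility is the sum of quantity times unit profit over the transactions that contain it, its probability is the sum of those transactions' probabilities, and an itemset qualifies when both totals reach absolute thresholds. All names and data (`TRANSACTIONS`, `UNIT_PROFIT`, `brute_force_phuis`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each transaction is a pair of
# ({item: purchased quantity}, existence probability of the transaction).
TRANSACTIONS = [
    ({"a": 2, "b": 1}, 0.9),
    ({"a": 1, "c": 3}, 0.6),
    ({"a": 1, "b": 2, "c": 1}, 0.8),
    ({"b": 4}, 0.5),
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1}


def utility_and_probability(itemset, transactions, unit_profit):
    """Sum utility and probability over transactions containing every item."""
    total_utility = 0
    total_probability = 0.0
    for quantities, probability in transactions:
        if all(item in quantities for item in itemset):
            total_utility += sum(quantities[i] * unit_profit[i] for i in itemset)
            total_probability += probability
    return total_utility, total_probability


def brute_force_phuis(transactions, unit_profit, min_utility, min_probability):
    """Enumerate every itemset and keep those meeting both assumed thresholds.

    Exponential in the number of items; shown only to illustrate the problem
    that list-based miners with pruning are designed to solve efficiently.
    """
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility, probability = utility_and_probability(
                itemset, transactions, unit_profit
            )
            if utility >= min_utility and probability >= min_probability:
                result[itemset] = (utility, probability)
    return result


if __name__ == "__main__":
    found = brute_force_phuis(TRANSACTIONS, UNIT_PROFIT, 14, 1.5)
    for itemset, (utility, probability) in found.items():
        print(itemset, utility, round(probability, 2))
```

In this toy data the itemset `("a", "b")` has a higher utility than `("a",)` alone, which shows that utility is not downward-closed the way support is. That is one reason utility-based miners generally need dedicated upper bounds and pruning strategies rather than plain Apriori-style pruning.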
