Journal: Knowledge-Based Systems

Efficient algorithms for mining high-utility itemsets in uncertain databases


Abstract

High-utility itemset mining (HUIM) is a set of techniques for discovering patterns in transaction databases that takes both the quantity and the profit of items into account. However, most algorithms for mining high-utility itemsets (HUIs) assume that the information stored in the database is precise, i.e., that there is no uncertainty. In many real-life applications, an item or itemset is not simply present or absent in a transaction but is also associated with an existence probability. This is especially the case for data collected experimentally or with noisy sensors. Many algorithms have been proposed to mine frequent itemsets in uncertain databases, but mining HUIs in an uncertain database has not yet been addressed, although uncertainty is common in real-world applications. In this paper, a novel framework named potential high-utility itemset mining (PHUIM) in uncertain databases is proposed. Based on the tuple uncertainty model, it efficiently discovers itemsets that have not only high utility but also high existence probability in an uncertain database. The PHUI-UP algorithm (potential high-utility itemsets upper-bound-based mining algorithm) is first presented to mine potential high-utility itemsets (PHUIs) using a level-wise search. Since PHUI-UP adopts a generate-and-test approach to mine PHUIs, it suffers from the problem of repeatedly scanning the database. To address this issue, a second algorithm named PHUI-List (potential high-utility itemsets PU-list-based mining algorithm) is also proposed. The latter mines PHUIs directly without generating candidates, thanks to a novel probability-utility-list (PU-list) structure, thus greatly improving the scalability of PHUI mining. Substantial experiments were conducted on both real-life and synthetic datasets to assess the performance of the two designed algorithms in terms of runtime, number of patterns, memory consumption, and scalability.
(C) 2015 Elsevier B.V. All rights reserved.
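
To make the idea of a potential high-utility itemset concrete, the following is a minimal brute-force sketch in Python. It is not PHUI-UP or PHUI-List, and it is not taken from the paper. It assumes that the utility of an itemset is the sum of quantity times unit profit over the transactions containing it, and that under the tuple uncertainty model its probability is the sum of the existence probabilities of those transactions. The toy data, the function names, and the absolute thresholds `min_util` and `min_prob` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: (existence probability, {item: purchased quantity}).
TRANSACTIONS = [
    (0.9, {"a": 2, "b": 1}),
    (0.5, {"a": 1, "c": 3}),
    (0.8, {"b": 2, "c": 1}),
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over every transaction containing the itemset."""
    total = 0
    for _, quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * profit[item] for item in itemset)
    return total


def itemset_probability(itemset, transactions):
    """Sum the existence probabilities of the transactions containing the itemset
    (an assumed definition for the tuple uncertainty model)."""
    return sum(
        prob
        for prob, quantities in transactions
        if all(item in quantities for item in itemset)
    )


def brute_force_phuis(transactions, profit, min_util, min_prob):
    """Enumerate every itemset and keep those meeting both thresholds.

    Exponential in the number of items; for illustration only.
    """
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            util = itemset_utility(itemset, transactions, profit)
            prob = itemset_probability(itemset, transactions)
            if util >= min_util and prob >= min_prob:
                result[itemset] = (util, prob)
    return result


if __name__ == "__main__":
    found = brute_force_phuis(TRANSACTIONS, PROFIT, min_util=8, min_prob=0.8)
    for itemset, (util, prob) in found.items():
        print(itemset, util, round(prob, 2))
```

Exhaustive enumeration like this grows exponentially with the number of items, which is the cost that the upper-bound pruning of PHUI-UP and the PU-list structure of PHUI-List are described as avoiding.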
