Weighted frequent itemset mining over uncertain databases

Abstract

Frequent itemset mining (FIM) is a fundamental research topic that consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in a real-life environment is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted itemset mining and uncertain itemset mining have each been proposed: in the former, a user assigns weights to items to specify their relative importance; in the latter, existential probabilities represent uncertainty in transactions. However, no work has addressed both of these issues at the same time. In this paper, we address this important research problem by designing a new type of pattern named the high expected weighted itemset (HEWI) and the HEWI-Uapriori algorithm to efficiently discover HEWIs. HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to prune the search space and discard unpromising itemsets early. Substantial experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted itemset mining and uncertain itemset mining.
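The abstract does not give formal definitions, so the following Python sketch is only one plausible reading of the idea, not the paper's HEWI-Uapriori implementation. It assumes that an itemset's weight is the mean of its item weights, that its expected support is the sum over transactions of the product of its items' existential probabilities, and that the upper bound for a transaction is its largest item weight times its largest probability. The toy data and the names `DB`, `WEIGHTS`, `upper_bound`, `mine_two_phase`, and `min_ews` are hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical toy data: each transaction maps item -> existential probability.
DB = [
    {"a": 0.9, "b": 0.5, "c": 0.8},
    {"a": 0.6, "c": 0.4},
    {"b": 0.7, "c": 1.0},
]
# Hypothetical user-assigned item weights in (0, 1].
WEIGHTS = {"a": 0.2, "b": 0.9, "c": 0.5}


def expected_support(itemset, db):
    """Sum over transactions of the product of the items' probabilities."""
    return sum(
        prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def expected_weighted_support(itemset, db, weights):
    """Assumed definition: mean item weight times expected support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * expected_support(itemset, db)


def upper_bound(itemset, db, weights):
    """Assumed bound: per supporting transaction, max weight * max probability.

    It never underestimates the expected weighted support, and it cannot
    grow when the itemset grows, so it is safe for Apriori-style pruning.
    """
    return sum(
        max(weights[i] for i in t) * max(t.values())
        for t in db
        if all(i in t for i in itemset)
    )


def mine_two_phase(db, weights, min_ews):
    """Phase 1: level-wise candidates via the bound. Phase 2: exact check."""
    items = sorted({i for t in db for i in t})
    level = [(i,) for i in items if upper_bound((i,), db, weights) >= min_ews]
    candidates = []
    while level:
        candidates.extend(level)
        survivors = set(level)
        next_level = []
        for x in level:
            for i in items:
                if i <= x[-1]:
                    continue  # extend in sorted order to avoid duplicates
                y = x + (i,)
                # Keep y only if every subset one item smaller survived.
                if all(s in survivors for s in combinations(y, len(x))) \
                        and upper_bound(y, db, weights) >= min_ews:
                    next_level.append(y)
        level = next_level
    result = {}
    for x in candidates:
        ews = expected_weighted_support(x, db, weights)
        if ews >= min_ews:
            result[x] = ews
    return result
```

On this toy data, `mine_two_phase(DB, WEIGHTS, 0.32)` keeps `("a", "c")` even though `("a",)` alone falls below the threshold. Under the assumed definitions, the expected weighted support itself is therefore not downward closed, which is why a separate upper bound is used for pruning in the first phase.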