Journal: Procedia Computer Science

Approximation to Expected Support of Frequent Itemsets in Mining Probabilistic Sets of Uncertain Data



Abstract

Knowledge discovery and data mining aims to discover implicit, previously unknown, and useful knowledge from data. Frequent itemset mining, one of the popular knowledge discovery and data mining tasks, discovers knowledge in the form of sets of frequently co-occurring items, events, or objects. In many real-life applications, users mine frequent patterns from traditional databases of precise data, in which they know with certainty whether an item is present in a transaction. In many other real-life applications, users mine frequent itemsets from probabilistic sets of uncertain data, in which they are uncertain whether an item is present in a transaction. Each item in these probabilistic sets of uncertain data is often associated with an existential probability expressing the likelihood of its presence in that transaction. To mine frequent itemsets from these probabilistic datasets, many existing algorithms capture a large amount of information to compute expected support. To reduce the amount of space required, other algorithms capture some but not all of this information when computing or approximating expected support. The tradeoff is that the resulting upper bounds to expected support may not be tight. In this paper, we examine several upper bounds and recommend to the user which ones consume less space while providing a good approximation to the expected support of frequent itemsets in mining probabilistic sets of uncertain data.
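
To make the notion of expected support concrete, the sketch below uses the common definition from the uncertain-data mining literature: assuming items in a transaction are independent, the expected support of an itemset is the sum, over all transactions, of the product of the existential probabilities of its items. The toy dataset, the function names, and the `min_probability_bound` helper are hypothetical and chosen only for illustration. In particular, the bound shown here (replacing each product by its smallest factor) is a deliberately simple example of an upper bound that may not be tight; it is not claimed to be one of the bounds examined in the paper.

```python
from math import prod

# Hypothetical toy dataset: each transaction maps an item to its
# existential probability (the likelihood that the item is present).
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.5},
    {"a": 0.5, "b": 0.4},
]


def expected_support(itemset, transactions):
    """Sum over transactions of the product of the items' probabilities.

    Assumes items within a transaction are independent. A transaction
    that lacks any item of the itemset contributes 0.
    """
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total


def min_probability_bound(itemset, transactions):
    """Illustrative upper bound for a non-empty itemset.

    Each product of probabilities is replaced by its smallest factor.
    Since every probability is at most 1, the product can never exceed
    that factor, so the result is never below the expected support.
    """
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += min(t[item] for item in itemset)
    return total


exact = expected_support({"a", "b"}, transactions)       # 0.9*0.8 + 0.5*0.4
bound = min_probability_bound({"a", "b"}, transactions)  # 0.8 + 0.4
print(f"expected support = {exact:.2f}, upper bound = {bound:.2f}")
```

For the itemset {a, b}, the exact value is about 0.92 while the illustrative bound is about 1.2. The gap shows the tradeoff described in the abstract: a bound built from less information is cheaper to maintain but can overestimate expected support.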
