Conference: Pacific Rim International Conference on Artificial Intelligence

Efficient Probabilistic Frequent Itemset Mining in Big Sparse Uncertain Data

Abstract

Probabilistic frequent itemset (PFI) mining in uncertain data has recently drawn increasing attention from the data mining community. However, data generated in network environments, such as machine logs and retail transactions, tends to be big, sparse, and uncertain because of random factors such as unavoidable network latency, unfaithful collection, and unreliable transmission. As a result, most available PFI mining algorithms are not adequately effective on uncertain data that is very large and extremely sparse. To address this issue, we propose a novel tree structure, the ApproxFP-Tree, and a parallelized ApproxFP algorithm based on the MapReduce platform, which aim to mine all PFIs in big, sparse, and uncertain data efficiently. Experimental results on real-world and synthetic databases show that our approach is significantly more efficient than state-of-the-art algorithms.
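
The abstract does not define a PFI. As a minimal sketch, assume the definition commonly used in the uncertain-data mining literature: each item in a transaction carries an independent existence probability, and an itemset is a PFI when the probability that its support reaches a minimum support count is at least a given threshold. The code below computes that probability exactly with a small dynamic program. It is not the ApproxFP algorithm or the ApproxFP-Tree, which the abstract does not describe, and all names here (`itemset_probs`, `frequentness_probability`, `is_pfi`, `min_sup`, `tau`) are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: an uncertain transaction maps each item to its existence
# probability, and items and transactions are mutually independent.
Transaction = Dict[str, float]


def itemset_probs(db: List[Transaction], itemset: Iterable[str]) -> List[float]:
    """Probability that each transaction contains every item of the itemset."""
    items = list(itemset)
    probs = []
    for transaction in db:
        p = 1.0
        for item in items:
            p *= transaction.get(item, 0.0)  # a missing item has probability 0
        probs.append(p)
    return probs


def frequentness_probability(probs: List[float], min_sup: int) -> float:
    """P(support >= min_sup) for a sum of independent Bernoulli(p_i) trials."""
    # dist[k] is the probability that exactly k transactions seen so far
    # contain the itemset; the last slot accumulates "min_sup or more".
    dist = [1.0] + [0.0] * min_sup
    for p in probs:
        new = [0.0] * (min_sup + 1)
        for k, mass in enumerate(dist):
            if k == min_sup:
                new[k] += mass  # already frequent, stays frequent
            else:
                new[k] += mass * (1.0 - p)
                new[k + 1] += mass * p
        dist = new
    return dist[min_sup]


def is_pfi(db: List[Transaction], itemset: Iterable[str],
           min_sup: int, tau: float) -> bool:
    """True if the itemset is frequent with probability at least tau."""
    return frequentness_probability(itemset_probs(db, itemset), min_sup) >= tau


# Toy database, made up purely for illustration.
toy_db = [{"a": 0.5, "b": 1.0}, {"a": 0.5}, {"a": 1.0, "b": 0.5}]
```

With this toy database, `is_pfi(toy_db, {"a"}, min_sup=2, tau=0.7)` checks whether item `a` appears in at least two transactions with probability 0.7 or more. The dynamic program costs on the order of the number of transactions times `min_sup` per itemset, which hints at why exact computation becomes expensive on very large databases.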