Journal: International Journal of Performability Engineering

A Distributed Frequent Itemset Mining Algorithm for Uncertain Data



Abstract

With the rapid expansion of big data in all domains, improving the performance of frequent pattern mining on massive uncertain datasets has become a major research topic in recent years. Most conventional frequent pattern mining approaches use a single factor, such as expectation, probability, or weight, to determine item support, and algorithms that consider both probability and weight fail to maintain execution efficiency on big data. Therefore, we propose a distributed frequent itemset mining algorithm for uncertain data: Dfimud. First, Dfimud calculates the maximum probability weight value of 1-items and prunes the items whose value is less than the given threshold. Second, to reduce the number of dataset scans, a distributed Dfimud-tree structure inspired by FP-Tree is designed to mine frequent patterns. Finally, experiments on publicly available UCI datasets demonstrate that Dfimud outperforms other related approaches across various metrics. In addition, the empirical study shows that Dfimud has good scalability.
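The 1-item pruning step can be illustrated with a small sketch. The abstract does not define the maximum probability weight value, so the score used here (an item's weight multiplied by its largest existence probability in any transaction) is only an assumed stand-in, and the names `max_probability_weight` and `prune_items` are hypothetical rather than taken from the paper.

```python
from typing import Dict, List

# One uncertain transaction: item -> existence probability in that transaction.
Transaction = Dict[str, float]


def max_probability_weight(transactions: List[Transaction],
                           weights: Dict[str, float]) -> Dict[str, float]:
    """Assumed score: item weight times its largest existence probability.

    This is a stand-in for the paper's measure, which the abstract
    does not spell out. Items without a weight default to 1.0.
    """
    best: Dict[str, float] = {}
    for transaction in transactions:
        for item, prob in transaction.items():
            best[item] = max(best.get(item, 0.0), prob)
    return {item: weights.get(item, 1.0) * prob for item, prob in best.items()}


def prune_items(transactions: List[Transaction],
                weights: Dict[str, float],
                threshold: float) -> List[Transaction]:
    """Drop every item whose assumed score is below the threshold."""
    scores = max_probability_weight(transactions, weights)
    keep = {item for item, score in scores.items() if score >= threshold}
    return [{i: p for i, p in t.items() if i in keep} for t in transactions]


if __name__ == "__main__":
    data = [{"a": 0.9, "b": 0.2}, {"a": 0.5, "c": 0.6}]
    item_weights = {"a": 1.0, "b": 0.5, "c": 0.5}
    print(prune_items(data, item_weights, threshold=0.25))
```

In a distributed setting, each worker could compute the per-item maxima on its own partition and the partial results could be merged by taking the maximum per item; that split is likewise an assumption of this sketch, not a description of Dfimud.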
