Journal: Expert Systems with Applications

Incremental frequent itemsets mining based on frequent pattern tree and multi-scale

Abstract

Multi-scale analysis can reveal the structure and hierarchical characteristics of data objects, reflecting their essence from different perspectives and levels. This paper proposes an incremental frequent itemset mining algorithm that combines the frequent pattern tree with multi-scale theory, called FP-tree and Multi-Scale based Incremental Mining (FPMSIM). FPMSIM uses the classic FP-Growth algorithm to construct a pattern tree and generate frequent itemsets for the finer-grained dataset, which is called the benchmark-scale dataset. Each newly added dataset is also mined independently as a benchmark-scale dataset. The final frequent itemsets for the target scales are then derived through a scale-up process, during which the counts of some itemsets are unknown and must be estimated by comparing the similarity among benchmark-scale datasets. In this way, the maintenance process avoids costly dataset rescanning and tree-structure adjustment. The experimental results show that, although support estimation error leads to incomplete mining of frequent itemsets, this loss can be offset by gains in mining efficiency and I/O cost, especially in big data settings.
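The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FPMSIM implementation. The abstract does not give the estimation formula, so the sketch assumes a simple one: an unknown count is borrowed from the most similar benchmark-scale dataset, weighted by the Jaccard similarity of the two frequent-itemset collections and capped at the local support threshold. A brute-force counter stands in for FP-Growth, and all names (`mine_frequent`, `jaccard`, `scale_up`, `max_size`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent(transactions, min_support, max_size=3):
    """Brute-force stand-in for FP-Growth (assumption: small toy data only).

    Returns {itemset: count} for itemsets whose count reaches
    min_support * len(transactions).
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


def jaccard(a, b):
    """Jaccard similarity of two collections of itemsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def scale_up(datasets, min_support, max_size=3):
    """Merge per-dataset results into target-scale frequent itemsets.

    Each dataset is mined once and never rescanned. When an itemset is not
    frequent in a dataset, its count there is unknown and is estimated
    (hypothetical rule) from the most similar dataset that does know it.
    """
    mined = [mine_frequent(d, min_support, max_size) for d in datasets]
    sizes = [len(d) for d in datasets]
    total = sum(sizes)
    candidates = set().union(*mined)  # union of the dictionaries' keys

    result = {}
    for itemset in candidates:
        estimate = 0.0
        for i, frequent in enumerate(mined):
            if itemset in frequent:
                estimate += frequent[itemset]  # exact local count
                continue
            donors = [j for j, m in enumerate(mined) if itemset in m]
            best = max(donors, key=lambda j: jaccard(mined[i], mined[j]))
            similarity = jaccard(mined[i], mined[best])
            guess = similarity * mined[best][itemset] / sizes[best] * sizes[i]
            # The itemset was infrequent here, so cap at the local threshold.
            estimate += min(guess, min_support * sizes[i])
        if estimate >= min_support * total:
            result[itemset] = estimate
    return result


if __name__ == "__main__":
    base = [["a", "b"], ["a", "b"], ["a", "c"], ["b"]]
    increment = [["a", "b"], ["a"], ["a", "c"], ["c"]]
    print(scale_up([base, increment], min_support=0.5))
```

In this toy run, item `b` occurs in 4 of the 8 transactions and would be frequent at a 0.5 threshold if the data were rescanned, but the estimate undercounts it and it is dropped. That is the kind of incompleteness the abstract attributes to support estimation error, accepted here in exchange for never revisiting the original transactions.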