Journal: IEEE Transactions on Knowledge and Data Engineering

Fast and memory efficient mining of frequent closed itemsets


Abstract

This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database, and it adopts a particular visit and partitioning strategy of the search space based on an original theoretical framework, which formalizes the problem of closed itemset mining in detail. The algorithm adopts several optimizations aimed at saving both space and time in computing itemset closures and their supports. In particular, since one of the main problems in this type of algorithm is the multiple generation of the same closed itemset, we propose a new effective and memory-efficient pruning technique, which, unlike previous proposals, does not require the whole set of closed patterns mined so far to be kept in main memory. This technique also permits each visited partition of the search space to be mined independently in any order and, thus, also in parallel. Tests conducted on many publicly available data sets show that our algorithm is scalable and outperforms other state-of-the-art algorithms such as CLOSET+ and FP-CLOSE, in some cases by more than one order of magnitude. More importantly, the performance improvements become more significant as the support threshold decreases.
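
To make the terms in the abstract concrete, here is a minimal Python sketch of a bitwise vertical representation and of the closure test that defines a closed itemset. It is not the paper's algorithm: it enumerates every candidate itemset by brute force (exponential in the number of items) and uses none of the divide-and-conquer partitioning or pruning the abstract describes. All function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_vertical(transactions):
    """Map each item to an int bitmask; bit t is set if transaction t contains the item."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item] = vertical.get(item, 0) | (1 << tid)
    return vertical


def tidset(itemset, vertical, n_transactions):
    """Bitmask of the transactions that contain every item of `itemset`."""
    mask = (1 << n_transactions) - 1  # start with all transactions
    for item in itemset:
        mask &= vertical[item]  # bitwise AND intersects the tidsets
    return mask


def closure(itemset, vertical, n_transactions):
    """All items that occur in every transaction supporting `itemset`."""
    t = tidset(itemset, vertical, n_transactions)
    return frozenset(i for i, bits in vertical.items() if t & bits == t)


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force illustration: keep each frequent itemset equal to its own closure.

    Assumes min_support >= 1. Each candidate is visited exactly once, so no
    duplicate check against previously found closed itemsets is needed here;
    the price is an exponential number of candidates.
    """
    n = len(transactions)
    vertical = build_vertical(transactions)
    items = sorted(vertical)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            support = bin(tidset(combo, vertical, n)).count("1")
            if support < min_support:
                continue
            candidate = frozenset(combo)
            if closure(candidate, vertical, n) == candidate:
                result[candidate] = support
    return result
```

For the four transactions {a, b, c}, {a, b}, {a, c}, {a} with a minimum support of 2, this sketch returns {a} with support 4 and {a, b} and {a, c} with support 2 each. The itemsets {b} and {c} are frequent but not closed, because every transaction containing them also contains a.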
