Journal: Knowledge and Information Systems

A new concise representation of frequent itemsets using generators and a positive border


Abstract

The complete set of frequent itemsets can become undesirably large because of redundancy when the minimum support threshold is low or the database is dense. Several concise representations have been proposed to eliminate this redundancy. Generator-based representations rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes exceeds the total number of frequent itemsets. In this paper, we propose using a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is never larger than the corresponding complete set of frequent itemsets, so it is a truly concise representation. We also propose a generalized form of this representation. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. GrGrowth explores the search space with a depth-first strategy, which is much more efficient than the breadth-first strategy adopted by most existing generator mining algorithms. Our experimental results show that GrGrowth is significantly faster than level-wise algorithms for mining generator-based representations and is comparable to state-of-the-art algorithms for mining frequent closed itemsets.
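The abstract does not define its terms, so the following is only a brute-force sketch under assumed definitions: a generator is taken to be an itemset with no proper subset of equal support, and the positive border is taken to be the frequent non-generators whose proper subsets are all generators. The function name `concise_representation` and the toy database are hypothetical, and the exhaustive enumeration is not GrGrowth, which the abstract describes as a depth-first algorithm.

```python
from itertools import combinations


def concise_representation(transactions, min_sup):
    """Brute-force illustration (NOT GrGrowth): enumerate every itemset.

    Assumed definitions, not taken from the abstract:
      * generator: no proper subset has the same support;
      * positive border: frequent non-generators whose proper subsets
        are all generators (the empty set counts as a generator).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Support of every itemset; the empty set is contained in all transactions.
    support = {frozenset(): len(transactions)}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            support[s] = sum(1 for t in transactions if s <= t)

    frequent = {s for s, n in support.items() if s and n >= min_sup}

    # Support never increases when items are added, so comparing against
    # the subsets that drop exactly one item is sufficient.
    generators = {
        s for s in frequent
        if all(support[s - {i}] > support[s] for i in s)
    }

    def is_generator(s):
        return not s or s in generators

    positive_border = {
        s for s in frequent
        if s not in generators and all(is_generator(s - {i}) for i in s)
    }
    return frequent, generators, positive_border


# Hypothetical toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
frequent, generators, positive_border = concise_representation(db, min_sup=2)
print(len(frequent), len(generators), len(positive_border))  # prints: 7 5 1
```

In this toy database, {a, b} has the same support as {a}, so it is not a generator and lands on the positive border. The generators and the border together hold 6 itemsets against 7 frequent itemsets, which is consistent with the size bound stated in the abstract.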
