Journal: IEEE Transactions on Knowledge and Data Engineering

Mining High Utility Patterns in One Phase without Generating Candidates


Abstract

Utility mining is a new development in data mining technology. Among utility mining problems, utility mining with the itemset share framework is a hard one, as no anti-monotonicity property holds for the interestingness measure. Prior works on this problem all employ a two-phase, candidate generation approach, with one exception that is, however, inefficient and does not scale to large databases. The two-phase approach suffers from scalability issues due to the huge number of candidates. This paper proposes a novel algorithm that finds high utility patterns in a single phase without generating candidates. The novelties lie in a high utility pattern growth approach, a lookahead strategy, and a linear data structure. Concretely, our pattern growth approach searches a reverse set enumeration tree and prunes the search space by utility upper bounding. We also look ahead to identify high utility patterns without enumeration, using a closure property and a singleton property. Our linear data structure enables us to compute a tight bound for powerful pruning and to directly identify high utility patterns in an efficient and scalable way, which targets the root cause of the problems with prior algorithms. Extensive experiments on sparse and dense, synthetic and real-world data suggest that our algorithm is 1 to 3 orders of magnitude more efficient and more scalable than the state-of-the-art algorithms.
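To make the idea of single-phase mining with utility upper bounding concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses a plain forward set enumeration order rather than a reverse set enumeration tree, a simple "remaining utility" bound rather than the paper's tight bound, and it omits the lookahead strategy and the linear data structure. The function name `mine_high_utility`, the toy transactions, and the unit profits are all hypothetical, and positive quantities and profits are assumed.

```python
from typing import Dict, FrozenSet, List, Tuple

Transaction = Dict[str, int]  # item -> purchased quantity (assumed positive)


def mine_high_utility(
    transactions: List[Transaction],
    unit_profit: Dict[str, int],
    min_util: int,
) -> Dict[FrozenSet[str], int]:
    """Depth-first pattern growth with upper-bound pruning (illustrative only)."""
    order = sorted(unit_profit)                   # fixed global item order
    rank = {item: i for i, item in enumerate(order)}
    results: Dict[FrozenSet[str], int] = {}

    def grow(prefix: Tuple[str, ...],
             projected: List[Tuple[Transaction, int]],
             start: int) -> None:
        # projected holds (transaction, utility of prefix in that transaction)
        for k in range(start, len(order)):
            item = order[k]
            new_projected = []
            utility = 0   # exact utility of prefix + item
            bound = 0     # upper bound for every pattern in this subtree
            for t, prefix_util in projected:
                if item not in t:
                    continue
                u = prefix_util + t[item] * unit_profit[item]
                # utility of items that could still be appended later
                rest = sum(t[j] * unit_profit[j] for j in t if rank[j] > k)
                new_projected.append((t, u))
                utility += u
                bound += u + rest
            if bound < min_util:
                continue  # no extension of this pattern can reach min_util
            pattern = prefix + (item,)
            if utility >= min_util:
                results[frozenset(pattern)] = utility  # reported directly
            grow(pattern, new_projected, k + 1)

    grow((), [(t, 0) for t in transactions], 0)
    return results


# Hypothetical toy data: four transactions over items a-d.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "d": 2},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}
found = mine_high_utility(transactions, unit_profit, min_util=13)
```

On this toy data the pattern {d} has utility 12 and falls below the threshold of 13, while its superset {a, d} has utility 13 and qualifies. This illustrates why anti-monotonicity does not hold for utility and why a separate upper bound is needed for pruning.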
