
Summarizing itemset patterns


Abstract

Frequent-pattern mining has been studied extensively, with scalable methods developed for mining various kinds of patterns, including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not efficiency but interpretability, owing to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a number of patterns small enough for a user to handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of the parameter K. Polynomial-time algorithms are developed, together with several optimization heuristics that improve efficiency. Empirical studies indicate that compact summaries can be obtained on real datasets.
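The abstract does not spell out the generative model, so the following is only a minimal sketch of how supports might be recovered from K representatives without the original dataset. It assumes each representative is summarized by a hypothetical "profile": a weight (the fraction of transactions it stands for) plus a per-item occurrence probability, with items treated as independent within a profile. The names `profiles`, `estimated_support`, and `restoration_error`, and the use of average relative error, are illustrative assumptions rather than the paper's definitions.

```python
from math import prod

# Hypothetical profiles for K = 2 representatives (made-up numbers).
# Each entry: (weight, {item: probability that the item occurs}).
profiles = [
    (0.6, {"a": 0.9, "b": 0.8, "c": 0.1}),
    (0.4, {"a": 0.2, "b": 0.1, "c": 0.9}),
]

def estimated_support(itemset, profiles):
    """Estimate relative support as a weighted sum over profiles.

    Assumption: items are independent within a profile, so the chance
    that a profile generates the whole itemset is a product of
    per-item probabilities. Items missing from a profile count as 0.
    """
    return sum(
        weight * prod(probs.get(item, 0.0) for item in itemset)
        for weight, probs in profiles
    )

def restoration_error(true_supports, profiles):
    """Average relative error between true and estimated supports.

    true_supports maps each pattern (a frozenset of items) to its
    nonzero relative support in the original data.
    """
    errors = [
        abs(support - estimated_support(pattern, profiles)) / support
        for pattern, support in true_supports.items()
    ]
    return sum(errors) / len(errors)
```

Under these assumptions, one could compute `restoration_error` for several candidate values of K and prefer the smallest K beyond which the error stops improving noticeably, which is one plausible reading of using the restoration error to choose K.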
