ACM KDD International Conference on Knowledge Discovery and Data Mining (KDD 2008)

Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree

Abstract

Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential and many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact and discriminative patterns. The culprit is the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data onto different nodes. Then, at each node, it directly discovers a discriminative pattern to further divide its examples into purer subsets. Since the number of examples towards the leaf level is relatively small, the new approach is able to examine patterns with extremely low global support that could not be enumerated on the whole dataset by the two-step method. The discovered feature vectors are more accurate on some of the most difficult graph as well as frequent itemset problems than most recently proposed algorithms, while the total size is typically at least 50% smaller. Importantly, the minimum support of some discriminative patterns can be extremely low (e.g. 0.03%). In order to enumerate these low-support patterns, state-of-the-art frequent pattern algorithms either cannot finish due to huge memory consumption or have to enumerate 10^1 to 10^3 times more patterns before they can even be found. Software and datasets are available by contacting the author.
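
The recursive divide-and-conquer procedure described in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' released software: the helper mine_discriminative_pattern, the information-gain scoring, and the min_support, max_len, and min_node_size parameters are assumptions introduced here for illustration. Each tree node mines a pattern only over its own (shrinking) set of examples, which is why patterns with extremely low global support become reachable.

from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    # Shannon entropy of a label multiset; 0 for an empty list.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def mine_discriminative_pattern(transactions, labels, min_support=2, max_len=2):
    # Enumerate itemsets (up to max_len items) frequent at this node and
    # return the one with the highest information gain, or None if no
    # pattern improves purity. Parameters here are illustrative assumptions.
    base = entropy(labels)
    best, best_gain = None, 1e-9
    items = sorted({i for t in transactions for i in t})
    for size in range(1, max_len + 1):
        for pattern in combinations(items, size):
            covered = [set(pattern) <= t for t in transactions]
            if sum(covered) < min_support:
                continue
            pos = [l for l, c in zip(labels, covered) if c]
            neg = [l for l, c in zip(labels, covered) if not c]
            gain = base - (len(pos) * entropy(pos) + len(neg) * entropy(neg)) / len(labels)
            if gain > best_gain:
                best, best_gain = pattern, gain
    return best


def build_tree(transactions, labels, min_node_size=4):
    # Model-based search tree sketch: stop when the node is pure or small,
    # otherwise split on the pattern mined from this node's examples only.
    if len(set(labels)) <= 1 or len(labels) < min_node_size:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    pattern = mine_discriminative_pattern(transactions, labels)
    if pattern is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    match = [set(pattern) <= t for t in transactions]
    yes = [(t, l) for (t, l), m in zip(zip(transactions, labels), match) if m]
    no = [(t, l) for (t, l), m in zip(zip(transactions, labels), match) if not m]
    return {
        "pattern": pattern,
        "yes": build_tree([t for t, _ in yes], [l for _, l in yes], min_node_size),
        "no": build_tree([t for t, _ in no], [l for _, l in no], min_node_size),
    }


if __name__ == "__main__":
    # Toy itemset data (sets of items) with binary labels.
    data = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"d"}, {"a", "d"}, {"b", "c"}]
    y = [1, 1, 0, 0, 1, 0]
    print(build_tree(data, y))

The recursion mirrors the two ideas stressed in the abstract: pattern discovery and model construction are interleaved rather than run as two sequential steps, and the candidate space at each node is restricted to that node's examples, so the mined patterns stay compact and discriminative.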
