ACM KDD International Conference on Knowledge Discovery and Data Mining (KDD 2008)

Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree

Abstract

Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential and many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact and discriminative patterns. The culprit is the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data onto different nodes. Then, at each node, it directly discovers a discriminative pattern to further divide its examples into purer subsets. Since the number of examples towards the leaf level is relatively small, the new approach is able to examine patterns with extremely low global support that could not be enumerated on the whole dataset by the two-step method. The discovered feature vectors are more accurate on some of the most difficult graph as well as frequent itemset problems than most recently proposed algorithms, while the total size is typically at least 50% smaller. Importantly, the minimum support of some discriminative patterns can be extremely low (e.g. 0.03%). In order to enumerate these low-support patterns, state-of-the-art frequent pattern algorithms either cannot finish due to huge memory consumption or have to enumerate 10^1 to 10^3 times more patterns before they can even be found. Software and datasets are available by contacting the author.
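
The recursive divide-and-conquer procedure described in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' released software: the helper mine_discriminative_pattern, the information-gain scoring, and the min_support, max_len, and min_node_size parameters are assumptions introduced here for illustration. Each tree node mines a pattern only over its own (shrinking) set of examples, which is why patterns with extremely low global support become reachable.

from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    # Shannon entropy of a label multiset; 0 for an empty list.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def mine_discriminative_pattern(transactions, labels, min_support=2, max_len=2):
    # Enumerate itemsets (up to max_len items) frequent at this node and
    # return the one with the highest information gain, or None if no
    # pattern improves purity. Parameters here are illustrative assumptions.
    base = entropy(labels)
    best, best_gain = None, 1e-9
    items = sorted({i for t in transactions for i in t})
    for size in range(1, max_len + 1):
        for pattern in combinations(items, size):
            covered = [set(pattern) <= t for t in transactions]
            if sum(covered) < min_support:
                continue
            pos = [l for l, c in zip(labels, covered) if c]
            neg = [l for l, c in zip(labels, covered) if not c]
            gain = base - (len(pos) * entropy(pos) + len(neg) * entropy(neg)) / len(labels)
            if gain > best_gain:
                best, best_gain = pattern, gain
    return best


def build_tree(transactions, labels, min_node_size=4):
    # Model-based search tree sketch: stop when the node is pure or small,
    # otherwise split on the pattern mined from this node's examples only.
    if len(set(labels)) <= 1 or len(labels) < min_node_size:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    pattern = mine_discriminative_pattern(transactions, labels)
    if pattern is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    match = [set(pattern) <= t for t in transactions]
    yes = [(t, l) for (t, l), m in zip(zip(transactions, labels), match) if m]
    no = [(t, l) for (t, l), m in zip(zip(transactions, labels), match) if not m]
    return {
        "pattern": pattern,
        "yes": build_tree([t for t, _ in yes], [l for _, l in yes], min_node_size),
        "no": build_tree([t for t, _ in no], [l for _, l in no], min_node_size),
    }


if __name__ == "__main__":
    # Toy itemset data (sets of items) with binary labels.
    data = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"d"}, {"a", "d"}, {"b", "c"}]
    y = [1, 1, 0, 0, 1, 0]
    print(build_tree(data, y))

The recursion mirrors the two ideas stressed in the abstract: pattern discovery and model construction are interleaved rather than run as two sequential steps, and the candidate space at each node is restricted to that node's examples, so the mined patterns stay compact and discriminative.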
