Extraction of Frequent Tree Patterns without Subtrees Maintenance

Abstract

The inherent flexibility of trees in both structure and semantics lets them capture most kinds of data, model a wide variety of data sources, and produce an enormous amount of information. The ability to extract valuable knowledge from tree data is becoming increasingly important and desirable. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent patterns in massive tree datasets, because most of them rely on the Apriori property for candidate generation and frequency counting. The major problems stem from (1) modeling data as hierarchical tree structures, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input dataset, and (4) heavy memory dependency. A more efficient and practical approach for tree data is therefore required. In this paper, we systematically develop a pattern-growth method, in place of the Apriori method, for mining maximal frequent tree patterns, which are special frequent patterns of a set of trees. The proposed method not only removes the step of pruning infrequent subtrees but also entirely eliminates the need to generate candidate subtrees. Hence, it significantly improves the whole mining process.
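
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general pattern-growth idea, not the paper's method. It makes two simplifying assumptions: a tree is a `(label, children)` tuple, and a pattern is a root-anchored label path rather than a general subtree. All names (`grow`, `maximal_frequent_paths`) are hypothetical. Each pattern is extended only with labels actually seen in its projected part of the data, so no candidate set is generated or pruned.

```python
from collections import defaultdict


def grow(prefix, projected, min_support, maximal):
    """Extend `prefix` using only `projected`: tree id -> nodes matching the prefix."""
    # For every child label below the current prefix, collect the matching
    # child nodes per tree. Only labels present in the data are ever seen.
    extensions = defaultdict(lambda: defaultdict(list))
    for tree_id, nodes in projected.items():
        for _label, children in nodes:
            for child in children:
                extensions[child[0]][tree_id].append(child)

    extended = False
    for label in sorted(extensions):
        by_tree = extensions[label]
        # Support = number of distinct trees containing the extended path.
        if len(by_tree) >= min_support:
            extended = True
            grow(prefix + (label,), by_tree, min_support, maximal)

    # A frequent path with no frequent extension is maximal (among paths).
    if prefix and not extended:
        maximal.append(prefix)


def maximal_frequent_paths(trees, min_support):
    """Return maximal frequent root-anchored label paths of `trees`."""
    # A virtual root per tree lets the real root label be handled uniformly.
    projected = {tree_id: [("", [tree])] for tree_id, tree in enumerate(trees)}
    maximal = []
    grow((), projected, min_support, maximal)
    return maximal


if __name__ == "__main__":
    trees = [
        ("a", [("b", [("c", [])]), ("d", [])]),
        ("a", [("b", [("c", [])])]),
        ("a", [("b", []), ("d", [])]),
    ]
    print(maximal_frequent_paths(trees, min_support=2))
    # [('a', 'b', 'c'), ('a', 'd')]
```

In this sketch the dataset is traversed once per recursion level through ever smaller projections, rather than being rescanned in full for each candidate length as an Apriori-style miner would do.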