Journal: Big Data Research

From Homomorphisms to Embeddings: A Novel Approach for Mining Embedded Patterns from Large Tree Data


Abstract

Many modern applications and systems represent and exchange data in tree-structured form, and they process and produce large tree datasets. Discovering informative patterns in large tree datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns (whose edges must map to parent-child edges in the data) to mining embedded patterns (whose edges may map to ancestor-descendant paths). Embedded patterns can reveal useful relationships hidden deep in the datasets that induced patterns cannot capture. Unfortunately, previous embedded tree pattern mining approaches do not scale satisfactorily as the size of the dataset increases. As a consequence, they focus almost exclusively on mining patterns from collections of small trees and cannot mine patterns from large data trees. Given the ubiquitous use of tree data, however, this pattern mining problem needs efficient solutions. In this paper, we address the problem of mining frequent unordered embedded tree patterns from large data trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally and avoids the costly enumeration of all pattern matchings required by previous approaches. To reduce space consumption, matching information of already computed patterns is materialized as bitmaps. We further optimize our basic support computation method by designing an algorithm that incrementally generates the bitmaps of the embeddings of a new candidate pattern without first explicitly computing the embeddings of this pattern. Our extensive experimental results on real and synthetic large-tree datasets show that our approach achieves performance improvements of orders of magnitude over a state-of-the-art tree mining algorithm and a recent graph mining algorithm.
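The abstract describes the approach only at a high level. As a rough illustration of two ingredients it names, homomorphic matching of unordered embedded patterns and bitmap-encoded matching information, the sketch below computes, bottom-up, the set of data nodes each pattern node can be mapped to and stores that set as a bitmap. It is not the authors' algorithm. The preorder numbering, the use of Python integers as bitmaps, and the names `DataTree`, `match_bitmap`, and `support` are hypothetical, and taking support to be the number of distinct data nodes the pattern root maps to is an assumption. The sketch also stops at homomorphisms: converting homomorphic matches into embedding-based support and generating bitmaps incrementally for new candidates are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled, unordered tree node (used for both data trees and patterns)."""
    label: str
    children: list["Node"] = field(default_factory=list)


class DataTree:
    """Hypothetical index: nodes get preorder ids, so every subtree is a contiguous id range."""

    def __init__(self, root: Node) -> None:
        self.labels: list[str] = []
        self.size: list[int] = []  # subtree size per preorder id
        self._index(root)
        # One bitmap per label: bit i is set iff data node i carries that label.
        self.label_bits: dict[str, int] = {}
        for i, lab in enumerate(self.labels):
            self.label_bits[lab] = self.label_bits.get(lab, 0) | (1 << i)

    def _index(self, node: Node) -> int:
        i = len(self.labels)
        self.labels.append(node.label)
        self.size.append(1)
        for child in node.children:
            self.size[i] += self._index(child)
        return self.size[i]

    def descendants(self, i: int) -> int:
        # Proper descendants of node i occupy ids i+1 .. i+size[i]-1.
        return ((1 << (self.size[i] - 1)) - 1) << (i + 1)


def match_bitmap(pattern: Node, tree: DataTree) -> int:
    """Bitmap of data nodes onto which the pattern root can be mapped by a homomorphism
    that preserves labels and sends each pattern edge to an ancestor-descendant pair."""
    # Child bitmaps are computed once and reused for every candidate (no match enumeration).
    child_bits = [match_bitmap(c, tree) for c in pattern.children]
    result = 0
    candidates = tree.label_bits.get(pattern.label, 0)
    while candidates:
        low = candidates & -candidates  # isolate the lowest set bit
        i = low.bit_length() - 1
        candidates ^= low
        desc = tree.descendants(i)
        # Every pattern child needs some match strictly below data node i.
        if all(bits & desc for bits in child_bits):
            result |= low
    return result


def support(pattern: Node, tree: DataTree) -> int:
    # Assumed support measure: distinct data nodes matched by the pattern root.
    return bin(match_bitmap(pattern, tree)).count("1")


if __name__ == "__main__":
    # Data tree a(c(b)): b is a descendant, but not a child, of a.
    tree = DataTree(Node("a", [Node("c", [Node("b")])]))
    print(support(Node("a", [Node("b")]), tree))             # 1: embedded edge a//b matches
    print(support(Node("a", [Node("b"), Node("b")]), tree))  # 1: both b's share one data node
```

Because a homomorphism need not be injective, the second pattern matches even though the data tree contains only one `b` node. An embedding would have to map the two `b` pattern nodes to distinct data nodes, which is presumably the gap the paper's move "from homomorphisms to embeddings" addresses.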
