Data & Knowledge Engineering

Discovering closed and maximal embedded patterns from large tree data



Abstract

Many current applications and systems produce large tree datasets and export, exchange, and represent data in tree-structured form. Extracting informative patterns from large data trees is an important research direction with multiple applications in practice. Pattern mining research initially focused on mining induced patterns and gradually evolved into mining embedded patterns. A well-known problem of frequent pattern mining is the huge number of patterns it produces. This affects not only the efficiency but also the effectiveness of mining. A typical solution to this problem is to summarize frequent patterns through closed and maximal patterns. No previous work addresses the problem of mining closed and/or maximal embedded tree patterns, not even in the framework of mining multiple small trees. We address the problem of summarizing embedded tree patterns extracted from large data trees by defining and mining closed and maximal embedded unordered tree patterns. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called closedEmbTM-eager, as it eagerly eliminates non-closed patterns. To mitigate the generation of intermediate patterns, we devise pattern search space pruning rules to proactively detect and prune branches in the pattern search space which do not correspond to closed patterns. The pruning rules are incorporated into the extended embedded pattern miner to produce a new algorithm, called closedEmbTM-prune, for mining all the closed and maximal embedded frequent patterns. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, closedEmbTM-prune not only generates a complete closed and maximal pattern set which is substantially smaller than that generated by the embedded pattern miner, but also runs much faster, with negligible overhead for pattern pruning.
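To make the notions of closed and maximal patterns concrete, the following is a minimal Python sketch, not the closedEmbTM-eager or closedEmbTM-prune algorithm from the paper. It assumes the usual definitions: a frequent pattern is closed if no proper super-pattern has the same support, and maximal if no proper super-pattern is frequent. As a loudly simplified stand-in for embedded unordered tree patterns, patterns here are plain label sets and "sub-pattern" means proper subset. The names `closed_and_maximal` and `proper_subset` and the toy supports are hypothetical. The sketch filters an already-mined pattern set after the fact, whereas the abstract describes checking closedness locally during mining and pruning the search space.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Assumption: a pattern is a frozenset of labels, standing in for a tree pattern.
Pattern = FrozenSet[str]


def proper_subset(p: Pattern, q: Pattern) -> bool:
    """Stand-in for 'p is a proper sub-pattern of q' (hypothetical simplification)."""
    return p < q


def closed_and_maximal(
    support: Dict[Pattern, int],
    is_proper_subpattern: Callable[[Pattern, Pattern], bool] = proper_subset,
) -> Tuple[Set[Pattern], Set[Pattern]]:
    """Split a set of frequent patterns into its closed and maximal subsets.

    `support` maps every frequent pattern to its support count.
    """
    closed: Set[Pattern] = set()
    maximal: Set[Pattern] = set()
    for p, sup_p in support.items():
        # All frequent proper super-patterns of p.
        supers = [q for q in support if is_proper_subpattern(p, q)]
        if not supers:
            # No frequent super-pattern at all: p is maximal.
            maximal.add(p)
        if all(support[q] != sup_p for q in supers):
            # No super-pattern with the same support: p is closed.
            closed.add(p)
    return closed, maximal


# Toy frequent-pattern set (made-up supports, minimum support 2).
frequent: Dict[Pattern, int] = {
    frozenset("a"): 4,
    frozenset("b"): 3,
    frozenset("ab"): 3,
    frozenset("c"): 2,
    frozenset("ac"): 2,
}

closed, maximal = closed_and_maximal(frequent)
print("closed: ", sorted("".join(sorted(p)) for p in closed))
print("maximal:", sorted("".join(sorted(p)) for p in maximal))
```

In this toy input, five frequent patterns are summarized by three closed patterns and two maximal ones, and every maximal pattern is also closed. For real tree data, `is_proper_subpattern` would have to be replaced by an embedded unordered tree inclusion test, which is considerably more involved than the subset check assumed here.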
