Journal: Machine Learning

Probabilistic and exact frequent subtree mining in graphs beyond forests


Abstract

Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating probabilistic frequent subtrees, a random subset of all frequent subtrees, from arbitrary graphs with polynomial delay. It is based on replacing each transaction graph in the input database with a forest formed by a random subset of its spanning trees. This simple technique turned out to be quite powerful on molecule classification tasks. It has, however, the drawback that the number of sampled spanning trees must be bounded by a polynomial of the size of the transaction graphs, resulting in less impressive recall even for slightly more complex structures beyond molecular graphs. To overcome this limitation, in this work we propose an algorithm mining probabilistic frequent subtrees also with polynomial delay, but by replacing each graph with a forest formed by an exponentially large implicit subset of its spanning trees. We demonstrate the superiority of our algorithm over the simple one on threshold graphs used e.g. in spectral clustering. In addition, providing sufficient conditions for the completeness and efficiency of our algorithm, we obtain a positive complexity result on exact frequent subtree mining for a novel, practically and theoretically relevant graph class that is orthogonal to all graph classes defined by some constant bound on monotone graph properties.
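
As a rough illustration of the simpler technique the abstract mentions (replacing a transaction graph with a forest built from a random subset of its spanning trees), here is a minimal Python sketch. The abstract does not say how the spanning trees are sampled, so this sketch assumes Kruskal's algorithm run on a randomly shuffled edge list, which does not sample spanning trees uniformly. The names `random_spanning_tree` and `sample_spanning_forest`, the parameter `k`, and the toy graph are hypothetical and are not taken from the paper.

```python
import random


def random_spanning_tree(n, edges, rng):
    """Return one spanning tree of a connected graph on vertices 0..n-1.

    Assumption: the tree is built by Kruskal's algorithm on a random edge
    order. This is NOT a uniform sampler over all spanning trees.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((min(u, v), max(u, v)))
    return frozenset(tree)


def sample_spanning_forest(n, edges, k, seed=0):
    """Draw k spanning trees and return the set of distinct ones.

    Keeping k polynomial in the graph size mirrors the bound described
    in the abstract for the simple technique.
    """
    rng = random.Random(seed)
    return {random_spanning_tree(n, edges, rng) for _ in range(k)}


# Hypothetical toy transaction graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
forest = sample_spanning_forest(4, edges, k=5)
for tree in forest:
    print(sorted(tree))
```

Frequent subtrees would then be mined from such forests instead of from the original graphs. Any subtree found this way is also a subtree of the original graph, but some frequent subtrees may be missed, which is the recall limitation the abstract describes.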
