首页> 外文会议>IEEE International Congress on Big Data >FS3: A sampling based method for top-k frequent subgraph mining
【24h】

FS3: A sampling based method for top-k frequent subgraph mining

机译:FS 3 :一种基于采样的top-k频繁子图挖掘方法

获取原文

摘要

Mining labeled subgraph is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomorphism task which is computationally expensive, so they suffer from the lack of scalability problem when the graphs in the input database are large. In this work, we propose FS, which is a sampling based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS performs a Markov Chain Monte Carlo (MCMC) sampling over the space of a fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. Besides, FS is equipped with an innovative queue manager. It stores the sampled subgraph in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on database of large graphs show that FS is efficient, and it obtains subgraphs that are the most frequent amongst the subgraphs of a given size.
机译:带有标签的子图的挖掘在数据挖掘中是一项流行的研究任务,因为它在许多不同的科学领域中都有潜在的应用。用于该任务的所有现有方法显式或隐式地解决了子图同构任务,该子图同构任务在计算上很昂贵,因此,当输入数据库中的图很大时,它们会遇到缺乏可伸缩性的问题。在这项工作中,我们提出了FS,这是一种基于采样的方法。它从概率意义上挖掘一小部分子图集合。 FS在固定大小的子图的空间上执行马尔可夫链蒙特卡洛(MCMC)采样,以便更频繁地采样潜在的频繁子图。此外,FS还配备了创新的队列管理器。它将整个挖掘过程中的采样子图存储在有限队列中,以使队列中前k个位置包含最频繁的子图。我们在大型图数据库上的实验表明,FS是有效的,并且它获得了在给定大小的子图中最频繁出现的子图。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号