FS3: A sampling based method for top-k frequent subgraph mining

Abstract

Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism task, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS3, a sampling-based method. It mines a small collection of subgraphs that are most frequent in a probabilistic sense. FS3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that potentially frequent subgraphs are sampled more often. In addition, FS3 is equipped with an innovative queue manager, which stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS3 is efficient and that it obtains subgraphs that are the most frequent among the subgraphs of a given size.
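
The two ingredients named in the abstract, an MCMC walk over fixed-size subgraphs and a finite queue of sampled subgraphs, can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several simplifications are assumptions made here: it walks over a single toy graph rather than a graph database, `score` is a hypothetical stand-in for whatever frequency proxy the method uses, the sorted label tuple is only a crude key (not a canonical code for an isomorphism class), and the eviction rule in `FiniteQueue` is invented for illustration. All function and class names are hypothetical.

```python
import random
from collections import Counter


def is_connected(adj, nodes):
    """Return True if the vertex set `nodes` induces a connected subgraph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def swap_neighbors(adj, state):
    """Connected vertex sets obtained by swapping one vertex out and one in."""
    boundary = {v for u in state for v in adj[u]} - state
    result = []
    for out in state:
        rest = state - {out}
        for inc in boundary:
            cand = frozenset(rest | {inc})
            if is_connected(adj, cand):
                result.append(cand)
    return result


def mh_step(adj, state, score, rng):
    """One Metropolis-Hastings step with a uniform proposal over swap neighbors."""
    nbrs = swap_neighbors(adj, state)
    if not nbrs:
        return state
    cand = rng.choice(nbrs)
    d_x, d_y = len(nbrs), len(swap_neighbors(adj, cand))
    # Hastings correction for unequal neighbor counts; `score` must be positive.
    accept = min(1.0, (score(cand) * d_x) / (score(state) * d_y))
    return cand if rng.random() < accept else state


class FiniteQueue:
    """Bounded store of sampled keys; eviction policy is an assumption."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()

    def offer(self, key):
        if key in self.counts or len(self.counts) < self.capacity:
            self.counts[key] += 1
            return
        # Queue is full: drop the least-sampled entry to make room.
        victim = min(self.counts, key=self.counts.get)
        del self.counts[victim]
        self.counts[key] = 1

    def top_k(self, k):
        return self.counts.most_common(k)


def initial_state(adj, size, rng):
    """Grow a connected vertex set; assumes a connected graph with >= size vertices."""
    state = {rng.choice(sorted(adj))}
    while len(state) < size:
        boundary = sorted({v for u in state for v in adj[u]} - state)
        state.add(rng.choice(boundary))
    return frozenset(state)


def sample_top_k(adj, labels, score, size, steps, k, capacity, seed=0):
    rng = random.Random(seed)
    state = initial_state(adj, size, rng)
    queue = FiniteQueue(capacity)
    for _ in range(steps):
        state = mh_step(adj, state, score, rng)
        queue.offer(tuple(sorted(labels[v] for v in state)))
    return queue.top_k(k)


# Toy usage: the graph, labels, and score below are made up for illustration.
toy_adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
toy_labels = {0: "A", 1: "B", 2: "A", 3: "B", 4: "C"}
toy_score = lambda s: 1.0 + sum(toy_labels[v] == "A" for v in s)
print(sample_top_k(toy_adj, toy_labels, toy_score, size=3, steps=2000, k=2, capacity=3))
```

Because the walk's stationary distribution is proportional to `score`, higher-scoring vertex sets are visited more often, which is the sense in which "potentially frequent" subgraphs would be favored under these assumptions. The walk never runs a subgraph isomorphism test against the whole input; it only checks connectivity of small candidate sets.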

Bibliographic details

Source
IEEE International Congress on Big Data | 2014 | pp. 72-79 | 8 pages
Authors
Saha Tapan K.; Al Hasan Mohammad
Original format: PDF
Keywords
Markov processes; Monte Carlo methods; data mining; graph theory; queueing theory; sampling methods; FS3; MCMC; Markov chain Monte Carlo sampling; finite queue; fixed-size subgraphs; innovative queue manager; labeled subgraph mining; probabilistic sense; sampling based method; scientific domains; subgraph isomorphism task; top-k frequent subgraph mining; Databases; Proposals; Scalability; Silicon; Software

Similar literature

1. FS3: A Sampling Based Method for Top-k Frequent Subgraph Mining [J]. Saha Tanay Kumar, Al Hasan Mohammad. Statistical Analysis and Data Mining. 2015, Issue 4
2. FS3: A sampling based method for top‐k frequent subgraph mining [J]. Tanay Kumar Saha, Mohammad Al Hasan. Statistical Analysis and Data Mining. 2015, Issue 4
3. Mining top-K frequent itemsets through progressive sampling [J]. Pietracaprina A., Riondato M., Upfal E. Data Mining and Knowledge Discovery. 2010, Issue 2
4. FS3: A sampling based method for top-k frequent subgraph mining [C]. Saha Tapan K., Al Hasan Mohammad. IEEE International Congress on Big Data. 2014
5. Development and application of ligand-based and structure-based computational drug discovery tools based on frequent subgraph mining of chemical structures [D]. Khashan, Raed Saeed. 2007
6. Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning [O]. Shengzhi Xu, Sen Su, Xiang Cheng
7. FS3: A Sampling based method for top-k Frequent Subgraph Mining [O]. Tanay Kumar Saha, Mohammad Al Hasan. 2016
8. Top-K Interesting Subgraph Discovery in Information Networks [R]. Gupta, M., Gao, J., Yan, X. 2014
