Large-scale frequent subgraph mining in MapReduce

机译：MapReduce中的大规模频繁子图挖掘

获取原文

获取外文期刊封面目录资料

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

Mining frequent subgraphs from a large collection of graph objects is an important problem in several application domains such as bio-informatics, social networks, computer vision, etc. The main challenge in subgraph mining is efficiency, as (i) testing for graph isomorphisms is computationally intensive, and (ii) the cardinality of the graph collection to be mined may be very large. We propose a two-step filter-and-refinement approach that is suitable to massive parallelization within the scalable MapReduce computing model. We partition the collection of graphs among worker nodes, and each worker applies the filter step to determine a set of candidate subgraphs that are locally frequent in its partition. The union of all such graphs is the input to the refinement step, where each candidate is checked against all partitions and only the globally frequent graphs are retained. We devise a statistical threshold mechanism that allows us to predict which subgraphs have a high chance to become globally frequent, and thus reduce the computational overhead in the refinement step. We also propose effective strategies to avoid redundant computation in each round when searching for candidate graphs, as well as a lightweight graph compression mechanism to reduce the communication cost between machines. Extensive experimental evaluation results on several real-world large graph datasets show that the proposed approach clearly outperforms the existing state-of-the-art and provides a practical solution to the problem of frequent subgraph mining for massive collections of graphs.

机译：从大量图对象中挖掘频繁的子图是几个应用领域（例如生物信息学，社交网络，计算机视觉等）中的重要问题。子图挖掘的主要挑战是效率，因为（i）测试图同构是计算密集型;（ii）要挖掘的图形集合的基数可能非常大。我们提出了一种两步式的过滤和细化方法，该方法适用于可伸缩MapReduce计算模型中的大规模并行化。我们在工作程序节点之间划分图的集合，每个工作程序都应用过滤步骤来确定一组在其分区中本地频繁出现的候选子图。所有这些图的并集是细化步骤的输入，在该步骤中，将针对所有分区检查每个候选项，并且仅保留全局频繁使用的图。我们设计了一种统计阈值机制，该机制使我们能够预测哪些子图很有可能变得全局频繁，从而减少了优化步骤中的计算开销。我们还提出了有效的策略，以避免在搜索候选图时在每一轮中进行多余的计算，并提出了一种轻量级的图压缩机制来减少机器之间的通信成本。在几个实际的大型图形数据集上进行的广泛实验评估结果表明，所提出的方法明显优于现有的最新技术，并为大量图形的频繁子图挖掘问题提供了实用的解决方案。

著录项

来源
《IEEE international conference on data engineering》|2014年|844-855|共12页
会议地点
作者
Lin Wenqing; Xiao Xiaokui; Ghinita Gabriel;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Mining frequent subgraphs from tremendous amount of small graphs using MapReduce [J] . Peng Zhe, Wang Tongtong, Lu Wei, Knowledge and information systems . 2018,第3期

机译：使用MapReduce的巨大数量的小图来挖掘频繁的子图
2. An Iterative MapReduce Based Frequent Subgraph Mining Algorithm [J] . Bhuiyan M.A., Al Hasan M. Knowledge and Data Engineering, IEEE Transactions on . 2015,第3期

机译：基于迭代MapReduce的频繁子图挖掘算法
3. Distributed discovery of frequent subgraphs of a network using MapReduce [J] . Shahrivari Saeed, Jalili Saeed Computing . 2015,第11期

机译：使用MapReduce分布式发现网络的频繁子图
4. Large-scale frequent subgraph mining in MapReduce [C] . Lin Wenqing, Xiao Xiaokui, Ghinita Gabriel IEEE international conference on data engineering . 2014

机译：MapReduce的大规模频繁子图挖掘
5. Mining Frequent Patterns from Uncertain Data with MapReduce [D] . Hayduk, Yaroslav 2012

机译：使用MapReduce从不确定的数据中挖掘频繁模式
6. FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS [O] . Arda Durmaz, Tim A. D. Henderson, Douglas Brubaker, -1

机译：频繁失调的疾病通路和预测的个性化信号通路网络组的频率子图挖掘
7. An Iterative MapReduce Approach to Frequent Subgraph Mining in Biological Datasets [O] . Steven Hill, Bismita Srichandan 2015

机译：生物数据集中频繁子图挖掘的迭代mapReduce方法
8. Discovering Frequent Geometric Subgraphs. [R] . Kuramochi, M., Karypis, G. 2004

机译：发现频繁的几何子图。

Large-scale frequent subgraph mining in MapReduce

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅