首页> 外文期刊>The VLDB journal >Scalable subgraph enumeration in MapReduce: a cost-oriented approach
【24h】

Scalable subgraph enumeration in MapReduce: a cost-oriented approach

机译:MapReduce中的可伸缩子图枚举:一种面向成本的方法

获取原文
获取原文并翻译 | 示例
       

摘要

Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs due to the involvement of computationally intensive subgraph isomorphism operations. Thus, some recent researches focus on solving the problem using MapReduce. Nevertheless, exiting MapReduce approaches are not scalable to handle very large graphs since they either produce a huge number of partial results or consume a large amount of memory. Motivated by this, in this paper, we propose a new algorithm based on a left-deep-join framework in MapReduce, in which the basic join unit is a (an edge or two incident edges of a node). We show that in the Erdos-R,nyi random graph model, is instance optimal in the left-deep-join framework under reasonable assumptions, and we devise an algorithm to compute the optimal join plan. We further discuss how our approach can be adapted to handle the power-law random graph model. Three optimization strategies are explored to improve our algorithm. Ultimately, by aggregating equivalent nodes into a compressed node, we construct the compressed graph, upon which the subgraph enumeration is further improved. We conduct extensive performance studies in several real graphs, one of which contains billions of edges. Our approach significantly outperforms existing solutions in all tests.
机译:子图枚举旨在查找与给定模式图同构的大型数据图的所有子图,这是一个具有广泛应用范围的基本图问题。但是,由于涉及计算量大的子图同构运算,因此现有的子图枚举顺序算法在处理大型图时不够。因此,最近的一些研究集中在使用MapReduce解决问题上。但是,现有的MapReduce方法无法扩展以处理非常大的图形,因为它们要么产生大量的部分结果,要么消耗大量的内存。为此,本文提出了一种新的基于MapReduce中左深连接框架的算法,其中基本连接单元是一个(节点的一个边缘或两个入射边缘)。我们证明,在合理的假设下,在Erdos-R,nyi随机图模型中,在左深连接框架中实例最优,并且设计了一种算法来计算最佳连接计划。我们将进一步讨论如何调整我们的方法来处理幂律随机图模型。探索了三种优化策略来改进我们的算法。最终,通过将等效节点聚合到一个压缩节点中,我们构造了压缩图,在此图上进一步改进了子图枚举。我们在几个真实的图中进行了广泛的性能研究,其中一个包含数十亿条边。我们的方法在所有测试中均明显优于现有解决方案。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号