首页> 外文期刊>Frontiers of computer science in China >An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment
【24h】

An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment

机译:分布式环境中图上N跳邻域的高效并行算法

获取原文
获取原文并翻译 | 示例
           

摘要

N-hop neighborhoods information is very useful in analytic tasks on large-scale graphs, like finding clique in a social network, recommending friends or advertising links according to one's interests, predicting links among websites and etc. To get the N-hop neighborhoods information on a large graph, such as a web graph, a twitter social graph, the most straightforward method is to conduct a breadth first search (BFS) on a parallel distributed graph processing framework, such as Pregel and GraphLab. However, due to the massive volume of message transfer, the BFS method results in high communication cost and has low efficiency.In this work, we propose a key/value based method, namely KVB, which perfectly fits into the prevailing parallel graph processing framework and computes N-hop neighborhoods on a large scale graph efficiently. Unlike the BFS method, our method need not transfer large amount of neighborhoods information, thus, significantly reduces the overhead on both the communication and intermediate results in the distributed framework.We formalize the N-hop neighborhoods query processing as an optimization problem based on a set of quantitative cost metrics of parallel graph processing. Moreover, we propose a solution to efficiently load only the relevant neighborhoods for computation. Specially, we prove the optimal partial neighborhoods load problem is NP-hard and carefully design a heuristic strategy. We have implemented our algorithm on a distributed graph framework- Spark GraphX and validated our solution with extensive experiments over a number of real world and synthetic large graphs on a modest indoor cluster. Experiments show that our solution generally gains an order of magnitude speedup comparing to the state-of-art BFS implementation.
机译:N跳邻域信息在大规模图形分析任务中非常有用,例如在社交网络中查找团伙,根据自己的兴趣推荐朋友或广告链接,预测网站之间的链接等。获得N跳邻域信息在大型图(例如网络图,Twitter社交图)上,最直接的方法是在并行分布式图处理框架(例如Pregel和GraphLab)上进行广度优先搜索(BFS)。但是,由于消息传递量巨大,BFS方法导致通信成本高且效率低下。在这项工作中,我们提出了一种基于键/值的方法,即KVB,它完全适合当前流行的并行图处理框架并有效地在大规模图形上计算N跳邻域。与BFS方法不同,我们的方法不需要传输大量的邻域信息,从而显着减少了分布式框架中通信和中间结果的开销。我们将N跳邻域查询处理形式化为基于A的优化问题一组并行图处理的定量成本度量。此外,我们提出了一种解决方案,可以有效地仅加载相关邻域进行计算。特别地,我们证明了最优的局部邻域负荷问题是NP难的,并精心设计了启发式策略。我们已经在分布式图形框架Spark GraphX上实现了我们的算法,并通过在适度的室内集群上对大量现实世界和合成大型图形进行的大量实验验证了我们的解决方案。实验表明,与最新的BFS实现相比,我们的解决方案通常可提高一个数量级的加速。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号