首页> 外文期刊>Distributed and Parallel Databases >Distributed computing connected components with linear communication cost
【24h】

Distributed computing connected components with linear communication cost

机译:具有线性通信成本的分布式计算连接组件

获取原文
获取原文并翻译 | 示例

摘要

The paper studies three fundamental problems in graph analytics, computing connected components ( CC s), biconnected components ( BCC s), and 2-edge-connected components ( ECC s) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing CC s, BCC s and ECC s of a big graph has received increasing interests. As with the existing research efforts, we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark . The state-of-the-art techniques for computing CCs and BCCs in Pregel incur $$O(mtimes #text {supersteps})$$ O ( m × # supersteps ) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since the network communication speed is usually much slower than the computation speed, communication costs are the dominant costs of the total running time in the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CC s and BCC s with O ( m ) total communication cost. The total computation costs of our techniques are also smaller than that of the existing techniques in practice, though theoretically almost the same. Moreover, we also study distributed computing ECC s. We are the first to study this problem and an approach with O ( m ) total communication cost is proposed. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude regarding the total running time.
机译:本文研究了图分析中的三个基本问题:计算图的连接组件(CC),双连接组件(BCC)和2边连接组件(ECC)。随着大数据的最新出现,开发用于计算大图的CC,BCC和ECC的高效分布式算法已引起越来越多的关注。与现有的研究工作一样,我们将重点放在Pregel编程模型上,而该技术可能会扩展到其他编程模型,包括MapReduce和Spark。 Pregel中用于计算CC和BCC的最新技术会导致$ O(mtimes #text {supersteps})$$ O(m×##supersteps)的数据通信和计算总成本,其中m是图中的边数和#supersteps是超步数。由于网络通信速度通常比计算速度要慢得多,因此通信成本是现有技术中总运行时间的主要成本。在本文中,我们提出了一种基于图分解的新范式,以计算总通信成本为O(m)的CC s和BCC。尽管理论上几乎相同,但我们的技术的总计算成本也比实践中的现有技术小。此外,我们还研究了分布式计算ECC。我们是第一个研究此问题的人,提出了一种以O(m)总通信成本为代价的方法。全面的经验研究表明,就总运行时间而言,我们的方法可以比现有技术好一个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号