IEEE Transactions on Knowledge and Data Engineering

Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems


Abstract

Distributed graph processing systems increasingly need many more compute nodes to handle graph-based Big Data applications, which in turn raises the chance of node failures. To address this issue, we propose a novel recovery scheme that accelerates recovery by parallelizing the recomputation. Once a failure occurs, all recomputation is confined to the subgraphs that originally resided on the failed compute nodes. When recovery starts, these subgraphs are reassigned to another set of compute nodes, where the recomputation over them is conducted in parallel. To minimize recovery latency, we also develop a strategy for reassigning these subgraphs to the new set of compute nodes that takes both computation and communication cost into account. We integrate the proposed recovery scheme into Giraph, a widely used graph processing system. Experimental results over a variety of real graph datasets demonstrate that the proposed recovery scheme outperforms existing recovery methods by up to 30x on a cluster of 40 compute nodes.
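The abstract does not spell out the reassignment strategy, so the following is only a minimal sketch of the general idea: spreading the subgraphs of failed nodes over surviving nodes while weighing recomputation cost against communication cost. It uses a simple greedy heuristic that is an assumption made for illustration, not the paper's algorithm. The names `Partition`, `reassign`, `compute_cost`, `traffic`, and `comm_weight` are all hypothetical, and the cost values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Partition:
    """A subgraph that lived on a failed node (hypothetical cost model)."""
    pid: int
    compute_cost: float  # estimated cost of recomputing this subgraph
    # Estimated message traffic exchanged with each surviving node, by node id.
    traffic: Dict[int, float] = field(default_factory=dict)


def reassign(partitions: List[Partition], nodes: List[int],
             comm_weight: float = 1.0) -> Tuple[Dict[int, int], float]:
    """Greedy heuristic (an assumption, not the published strategy).

    Partitions are placed in order of decreasing compute cost. Each one goes
    to the node where its estimated finish time is lowest, where finish time
    is the node's current load plus compute cost plus weighted remote traffic.
    """
    load = {n: 0.0 for n in nodes}
    placement: Dict[int, int] = {}
    for p in sorted(partitions, key=lambda q: q.compute_cost, reverse=True):
        total_traffic = sum(p.traffic.values())

        def finish_time(n: int) -> float:
            # Traffic to the hosting node becomes local, so only the rest counts.
            remote = total_traffic - p.traffic.get(n, 0.0)
            return load[n] + p.compute_cost + comm_weight * remote

        best = min(nodes, key=finish_time)
        load[best] = finish_time(best)
        placement[p.pid] = best
    # The slowest node bounds the recovery latency in this simplified model.
    return placement, max(load.values())


# Illustrative input: three lost subgraphs, two surviving nodes (ids 1 and 2).
parts = [
    Partition(0, 4.0, {1: 2.0}),
    Partition(1, 3.0, {2: 1.0}),
    Partition(2, 2.0),
]
placement, makespan = reassign(parts, [1, 2])
print(placement, makespan)
```

In this toy input, each subgraph with known traffic lands on the node it talks to most, and the remaining subgraph goes to the less loaded node. A real system would need measured costs and would have to account for the recomputation dependencies between supersteps, which this sketch ignores.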
