Journal: IEEE Transactions on Knowledge and Data Engineering

Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems



Abstract

Distributed graph processing systems increasingly need many more compute nodes to handle graph-based Big Data applications, but adding nodes also raises the chance of node failures. To address this issue, we propose a novel recovery scheme that accelerates the recovery process by parallelizing the recomputation. Once a failure occurs, all recomputation is confined to the subgraphs that originally resided on the failed compute nodes. When recovery starts, these subgraphs are reassigned to another set of compute nodes, where the recomputation over them is conducted in parallel. To minimize the recovery latency, we also develop a strategy for reassigning these subgraphs to the replacement compute nodes that takes both computation and communication costs into account. We integrate the proposed recovery scheme into Giraph, a widely used graph processing system. Experimental results over a variety of real graph datasets demonstrate that the proposed recovery scheme outperforms existing recovery methods by up to 30x on a cluster of 40 compute nodes.
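The abstract does not spell out the reassignment strategy, so the following is only a minimal sketch of the general idea: placing the subgraphs of failed nodes on other workers while weighing computation against communication cost. The greedy heuristic, the `Partition` class, the `reassign` function, the `comm_weight` parameter, and all cost numbers are assumptions made for illustration; they are not the paper's algorithm and not part of the Giraph API.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A subgraph that lived on a failed node (hypothetical model)."""
    pid: int
    compute_cost: float  # assumed estimate of the recomputation cost
    # Assumed estimate of message volume exchanged with each worker id.
    traffic: dict = field(default_factory=dict)


def reassign(partitions, workers, comm_weight=1.0):
    """Greedily place each partition on the worker whose load grows the least.

    Returns (placement, makespan): a pid -> worker mapping and the largest
    estimated per-worker load, a stand-in for recovery latency.
    """
    load = {w: 0.0 for w in workers}
    placement = {}
    # Place the most expensive partitions first (a common greedy heuristic).
    for p in sorted(partitions, key=lambda q: q.compute_cost, reverse=True):
        def cost_on(w):
            # Traffic to the hosting worker becomes local; the rest stays remote.
            remote = sum(t for other, t in p.traffic.items() if other != w)
            return load[w] + p.compute_cost + comm_weight * remote

        best = min(workers, key=cost_on)
        load[best] = cost_on(best)
        placement[p.pid] = best
    return placement, max(load.values())


# Illustrative input: three lost partitions, two surviving workers.
lost = [
    Partition(0, 10.0, {"w1": 5.0}),
    Partition(1, 8.0, {"w2": 4.0}),
    Partition(2, 2.0),
]
placement, makespan = reassign(lost, ["w1", "w2"])
print(placement, makespan)
```

Because the partitions end up on different workers, their recomputation can proceed in parallel, and the estimated recovery latency is bounded by the most heavily loaded worker rather than by the sum of all recomputation costs.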
