Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems

Lu Wei; Shen Yanyan; Wang Tongtong; Zhang Meihui; Jagadish H. V.; Du Xiaoyong


Abstract

There is a growing need for distributed graph processing systems to use many more compute nodes when processing graph-based Big Data applications, which, however, increases the chance of node failures. To address this issue, we propose a novel recovery scheme that accelerates the recovery process by parallelizing the recomputation. Once a failure occurs, all recomputations are confined to the subgraphs that originally resided on the failed compute nodes. When the recovery starts, these subgraphs are reassigned to another set of compute nodes, where the recomputation over these subgraphs is conducted in parallel. To minimize the recovery latency, we also develop a strategy for reassigning these subgraphs to the replacement compute nodes that takes both computation and communication costs into account. We integrate the proposed recovery scheme into the Giraph system, a widely used graph processing system. Experimental results over a variety of real graph datasets demonstrate that our proposed recovery scheme outperforms existing recovery methods by up to 30x on a cluster of 40 compute nodes.
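
The reassignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or Giraph code: the `Partition` class, the `reassign` function, the cost fields, and the greedy heuristic are all assumptions made for illustration. It places each subgraph from a failed node on the compute node where its estimated recomputation cost plus remote communication cost adds the least load.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A subgraph that resided on a failed node (hypothetical model)."""
    name: str
    compute_cost: float  # assumed estimate of the recomputation cost
    # assumed estimate of message volume exchanged with other partitions
    traffic: dict = field(default_factory=dict)


def reassign(partitions, nodes):
    """Greedy sketch: put each failed partition on the node whose
    resulting load (compute + remote traffic) is smallest."""
    load = {n: 0.0 for n in nodes}
    placement = {}
    # Place the most expensive partitions first.
    for p in sorted(partitions, key=lambda q: q.compute_cost, reverse=True):
        def cost_on(node):
            # Traffic to partitions already placed on a different node
            # is counted as remote communication.
            remote = sum(vol for other, vol in p.traffic.items()
                         if other in placement and placement[other] != node)
            return load[node] + p.compute_cost + remote

        best = min(nodes, key=cost_on)
        load[best] = cost_on(best)
        placement[p.name] = best
    return placement, load


# Example with made-up costs: three lost partitions, two recovery nodes.
failed = [
    Partition("p1", 4.0, {"p2": 3.0}),
    Partition("p2", 3.0, {"p1": 3.0}),
    Partition("p3", 2.0),
]
placement, load = reassign(failed, ["n1", "n2"])
print(placement)
print(load)
```

In this sketch the communication cost is charged only to the partition placed later, which is a simplification; a real strategy would estimate both costs from the graph partitioning and the logged messages.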


Bibliographic details

Source
IEEE Transactions on Knowledge and Data Engineering, 2019, No. 4, pp. 733-746 (14 pages)
Authors
Lu Wei; Shen Yanyan; Wang Tongtong; Zhang Meihui; Jagadish H. V.; Du Xiaoyong
Author affiliations

Renmin Univ China, MOE, DEKE, Beijing 100872, Peoples R China|Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China;

Shanghai Jiao Tong Univ, Xuhui Qu 200000, Shanghai Shi, Peoples R China;

Renmin Univ China, MOE, DEKE, Beijing 100872, Peoples R China|Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China;

Beijing Inst Technol, Beijing 100081, Peoples R China;

Univ Michigan, Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA;

Renmin Univ China, MOE, DEKE, Beijing 100872, Peoples R China|Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China;

Indexing information
Original format: PDF
Language: English
Keywords
Distributed graph processing systems; failure recovery; checkpoint; log; compression; partition-based recovery;


Similar literature

1. Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems [J]. Lu Wei, Shen Yanyan, Wang Tongtong, IEEE Transactions on Knowledge and Data Engineering. 2019, No. 4
2. GraphD: Distributed Vertex-Centric Graph Processing Beyond the Memory Limit [J]. Da Yan, Yuzhen Huang, Miao Liu, IEEE Transactions on Parallel and Distributed Systems. 2018, No. 1
3. Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing [J]. McCune Robert Ryan, Weninger Tim, Madey Greg, ACM Computing Surveys. 2016, No. 2
4. Fast Failure Recovery in Distributed Graph Processing Systems [C]. Yanyan Shen, Gang Chen, H. V. Jagadish, International Conference on Very Large Data Bases. 2015
5. A Modified Rapid Spanning Tree Protocol (MOD-RSTP) for Fast and Efficient Failure Recovery in Bridged Shipboard Networked Control Systems [D]. Penera, Eric. 2018
6. Ultra-Fast Displaying Spectral Domain Optical Doppler Tomography System Using a Graphics Processing Unit [O]. Hyosang Jeong, Nam Hyun Cho, Unsang Jung, 2012
7. Fast Failure Recovery in Distributed Graph Processing Systems [O]. 2016
8. Dealing with Failures During Failure Recovery of Distributed Systems [R]. Arshad, N., Heimbigner, D., Wolf, A. 2006
