Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems

Lu Wei; Shen Yanyan; Wang Tongtong; Zhang Meihui; Jagadish H. V.; Du Xiaoyong

Abstract

Distributed graph processing systems increasingly need many more compute nodes to run graph-based Big Data applications, which in turn raises the chance of node failures. To address this issue, we propose a novel recovery scheme that accelerates recovery by parallelizing the recomputation. Once a failure occurs, all recomputation is confined to the subgraphs that originally resided on the failed compute nodes. When recovery starts, these subgraphs are reassigned to another set of compute nodes, where the recomputation over them is conducted in parallel. To minimize recovery latency, we also develop a strategy for reassigning these subgraphs to the new set of compute nodes that takes both computation and communication costs into account. We integrate the proposed recovery scheme into Giraph, a widely used graph processing system. Experimental results over a variety of real graph datasets demonstrate that the proposed recovery scheme outperforms existing recovery methods by up to 30x on a cluster of 40 compute nodes.
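
The abstract does not spell out the reassignment strategy, so the following is only an illustrative sketch of the general idea of spreading the subgraphs of failed nodes over healthy nodes while weighing computation against communication cost. It is a simple greedy heuristic, not the algorithm from the paper. The names Partition, reassign, compute_cost, traffic and remote_cost are all hypothetical, and the cost numbers are assumed to come from some external estimate.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A subgraph that lived on a failed node (hypothetical model)."""
    pid: str
    compute_cost: float  # assumed estimate of the cost to recompute it
    # traffic[node]: assumed message volume exchanged with data on that node
    traffic: dict = field(default_factory=dict)


def reassign(partitions, nodes, remote_cost=1.0):
    """Greedily map each failed partition to a healthy node.

    Partitions with the highest compute cost are placed first. Each one
    goes to the node with the lowest resulting load, where the load adds
    the compute cost and remote_cost times the traffic that would still
    cross node boundaries.
    """
    load = {n: 0.0 for n in nodes}
    plan = {}
    for p in sorted(partitions, key=lambda q: q.compute_cost, reverse=True):
        total_traffic = sum(p.traffic.values())

        def cost_on(node):
            # Traffic to data already on `node` becomes local, so it is free.
            remote = total_traffic - p.traffic.get(node, 0.0)
            return load[node] + p.compute_cost + remote_cost * remote

        best = min(nodes, key=cost_on)
        load[best] = cost_on(best)
        plan[p.pid] = best
    return plan, load


if __name__ == "__main__":
    failed = [
        Partition("p1", 10.0, {"n1": 4.0}),
        Partition("p2", 8.0, {"n2": 6.0}),
        Partition("p3", 3.0),
    ]
    plan, load = reassign(failed, ["n1", "n2"])
    print(plan)
    print(load)
```

In this toy model, a partition is pulled toward the node it communicates with most, unless that node is already heavily loaded. Raising remote_cost makes communication dominate the decision, and lowering it makes the plan approach plain load balancing.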

Bibliographic details

Source
IEEE Transactions on Knowledge and Data Engineering, 2019, Issue 4, pp. 733-746 (14 pages)
Authors
Lu Wei; Shen Yanyan; Wang Tongtong; Zhang Meihui; Jagadish H. V.; Du Xiaoyong
Author affiliations

Renmin Univ China MOE DEKE Beijing 100872 Peoples R China|Renmin Univ China Sch Informat Beijing 100872 Peoples R China;

Shanghai Jiao Tong Univ Xuhui Qu 200000 Shanghai Shi Peoples R China;

Renmin Univ China MOE DEKE Beijing 100872 Peoples R China|Renmin Univ China Sch Informat Beijing 100872 Peoples R China;

Beijing Inst Technol Beijing 100081 Peoples R China;

Univ Michigan Elect Engn & Comp Sci Ann Arbor MI 48109 USA;

Renmin Univ China MOE DEKE Beijing 100872 Peoples R China|Renmin Univ China Sch Informat Beijing 100872 Peoples R China;

Original format: PDF
Language: English
Keywords
Distributed graph processing systems; failure recovery; checkpoint; log; compression; partition-based recovery
