Journal: Concurrency, practice and experience

An efficient iterative graph data processing framework based on bulk synchronous parallel model


Abstract

Graph data processing is widely applied in domains such as industry, science, and social networks, and it has stimulated considerable research effort. To keep pace with the rapid growth of big graph data, graph processing based on Pregel-like systems is regarded as one of the most promising approaches and has attracted wide attention from researchers. However, the approach is still at an early stage and many challenges remain. In Pregel, superstep synchronization is time-consuming because iterative graph computation requires multiple synchronizations. Furthermore, the graph partitioning strategy adopted by Pregel does not support load balancing, so network I/O overhead increases as the graph grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel model. The global synchronization control mechanism is improved by determining the start time of the next superstep from a count of the global message files. In addition, an improved graph partitioning mechanism based on a balanced hash method is proposed to reduce the communication overhead between the partitions that host sub-graph computation tasks. We also redesign the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on several real-world datasets show that the framework outperforms Giraph (an open-source Pregel-like system) by 58%-69% and achieves a 10x-17x performance improvement over Hadoop.
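
The abstract names three ingredients without giving details: a balanced hash partitioner, a superstep barrier driven by a count of message files, and PageRank as the test workload. The following is a minimal single-process sketch of how these pieces could fit together, not the paper's implementation. Everything in it is an assumption made for illustration: the capacity-capped hash rule, the convention that each worker writes exactly one message file per destination partition (so the barrier waits for `num_parts * num_parts` files), and all function names.

```python
import zlib
from collections import defaultdict


def stable_hash(vertex):
    # Deterministic hash (Python's built-in hash() is salted for strings).
    return zlib.crc32(str(vertex).encode("utf-8"))


def balanced_hash_partition(vertices, num_parts):
    """Hypothetical balanced hash: use the hash bucket unless it is full,
    otherwise fall back to the currently lightest partition."""
    capacity = -(-len(vertices) // num_parts)  # ceil(n / num_parts)
    sizes = [0] * num_parts
    owner = {}
    for v in vertices:
        p = stable_hash(v) % num_parts
        if sizes[p] >= capacity:
            p = min(range(num_parts), key=sizes.__getitem__)
        owner[v] = p
        sizes[p] += 1
    return owner


def barrier_ready(files, num_parts):
    # Assumed rule: every worker writes one file per destination partition,
    # so the next superstep may start once all num_parts^2 files exist.
    return len(files) == num_parts * num_parts


def pagerank_bsp(edges, num_parts=2, supersteps=20, damping=0.85):
    vertices = sorted({v for edge in edges for v in edge})
    owner = balanced_hash_partition(vertices, num_parts)
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        files = {}  # (source partition, destination partition) -> messages
        for p in range(num_parts):
            outbox = {q: defaultdict(float) for q in range(num_parts)}
            for v in vertices:
                if owner[v] != p or not out[v]:
                    continue
                share = rank[v] / len(out[v])
                for dst in out[v]:
                    outbox[owner[dst]][dst] += share
            for q in range(num_parts):
                files[(p, q)] = outbox[q]  # written even when empty

        if not barrier_ready(files, num_parts):
            raise RuntimeError("superstep barrier not satisfied")

        incoming = defaultdict(float)
        for messages in files.values():
            for dst, value in messages.items():
                incoming[dst] += value
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not out[v])
        rank = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return rank


if __name__ == "__main__":
    ranks = pagerank_bsp([(1, 2), (2, 3), (3, 1), (4, 1)])
    for vertex, value in sorted(ranks.items()):
        print(vertex, round(value, 4))
```

In this sequential simulation the barrier check always passes. In a distributed setting a coordinator would presumably poll the file count on shared storage instead, which is the part the abstract credits with reducing synchronization cost.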
