
A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark's GraphX



Abstract

GraphX is an API for graph computation built upon Apache Spark, a fast and generalized engine for large-scale data processing in the cloud. While the popularity of Spark and GraphX is growing, the relatively young technology has yet to be applied across the breadth of graph problems that exist in the field. To examine and gain insight into the capabilities of GraphX, this thesis uses the framework to implement a solution to the Maximum Flow Problem, a complex graph problem without a trivial distributed approach. Specifically, the implementation is based on the serial Push-Relabel algorithm. An original MapReduce-based approach to the problem is presented, along with an implementation of that approach in GraphX. In addition, experimentation and deployment on an Amazon EC2 cluster yielded observations on caching and checkpointing intervals.

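Bibliographic record

  • Author

    Langewisch, Ryan P.

  • Author affiliation

    Colorado School of Mines

  • Degree-granting institution: Colorado School of Mines
  • Discipline: Computer science
  • Degree: M.S.
  • Year: 2015
  • Pages: 90 p.
  • Total pages: 90
  • Original format: PDF
  • Language: English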
