首页> 外文会议>Annual symposium on parallelism in algorithms and architectures >Filtering: A Method for Solving Graph Problems in MapReduce
【24h】

Filtering: A Method for Solving Graph Problems in MapReduce

机译:过滤:MapReduce中求解图形问题的方法

获取原文

摘要

The MapReduce framework is currently the de facto standard used throughout both industry and academia for petabyte scale data analysis. As the input to a typical MapReduce computation is large, one of the key requirements of the framework is that the input cannot be stored on a single machine and must be processed in parallel. In this paper we describe a general algorithmic design technique in the MapReduce framework called filtering. The main idea behind filtering is to reduce the size of the input in a distributed fashion so that the resulting, much smaller, problem instance can be solved on a single machine. Using this approach we give new algorithms in the MapReduce framework for a variety of fundamental graph problems for sufficiently dense graphs. Specifically, we present algorithms for minimum spanning trees, maximal matchings, approximate weighted matchings, approximate vertex and edge covers and minimum cuts. In all of these cases, we parameterize our algorithms by the amount of memory available on the machines allowing us to show tradeoffs between the memory available and the number of MapReduce rounds. For each setting we will show that even if the machines are only given substantially sublinear memory, our algorithms run in a constant number of MapReduce rounds. To demonstrate the practical viability of our algorithms we implement the maximal matching algorithm that lies at the core of our analysis and show that it achieves a significant speedup over the sequential version.
机译:MapReduce框架是目前整个工业界和学术界使用PB级别的数据分析的事实标准。作为输入到一个典型的MapReduce计算大时,框架的关键要求之一是输入不能被存储在单个机器上,必须并行地处理。在本文中,我们描述了所谓的过滤MapReduce框架的通用算法设计技术。后面滤波的主要思想是,以减少输入的大小以分布方式使得所得小得多,问题实例可以在单个机器上得到解决。我们使用这种方法在各种用于足够密集的图形基本图问题MapReduce框架赋予了新的算法。具体来说,我们本最小生成树,最大匹配数,近似加权匹配数,近似顶点和边缘盖和最小割算法。在所有这些情况下,我们通过内存的机器上使我们能够显示可用内存和MapReduce的回合数之间进行权衡使用量参数我们的算法。对于每种设置,我们将显示,即使机器只给基本次线性内存,我们的算法中的MapReduce轮固定数量的运行。为了证明我们的算法的实际可行性,我们实现了最大匹配算法,谎言在我们分析的核心,并表明它实现了连续版本显著加速。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号