【24h】

Filtering: A Method for Solving Graph Problems in MapReduce

机译:过滤:MapReduce中解决图形问题的一种方法

获取原文
获取原文并翻译 | 示例

摘要

The MapReduce framework is currently the de facto standard used throughout both industry and academia for petabyte scale data analysis. As the input to a typical MapReduce computation is large, one of the key requirements of the framework is that the input cannot be stored on a single machine and must be processed in parallel. In this paper we describe a general algorithmic design technique in the MapReduce framework called filtering. The main idea behind filtering is to reduce the size of the input in a distributed fashion so that the resulting, much smaller, problem instance can be solved on a single machine. Using this approach we give new algorithms in the MapReduce framework for a variety of fundamental graph problems for sufficiently dense graphs. Specifically, we present algorithms for minimum spanning trees, maximal matchings, approximate weighted matchings, approximate vertex and edge covers and minimum cuts. In all of these cases, we parameterize our algorithms by the amount of memory available on the machines allowing us to show tradeoffs between the memory available and the number of MapReduce rounds. For each setting we will show that even if the machines are only given substantially sublinear memory, our algorithms run in a constant number of MapReduce rounds. To demonstrate the practical viability of our algorithms we implement the maximal matching algorithm that lies at the core of our analysis and show that it achieves a significant speedup over the sequential version.
机译:MapReduce框架当前是整个行业和学术界用于PB级数据分析的事实上的标准。由于典型MapReduce计算的输入量很大,因此框架的关键要求之一是输入不能存储在单台计算机上,而必须并行处理。在本文中,我们描述了MapReduce框架中称为过滤的通用算法设计技术。过滤背后的主要思想是以分布式方式减小输入的大小,以便可以在单个计算机上解决所产生的小得多的问题实例。使用这种方法,我们在MapReduce框架中提供了针对足够密集的图的各种基本图问题的新算法。具体来说,我们提出了最小生成树,最大匹配,近似加权匹配,近似顶点和边缘覆盖以及最小切割的算法。在所有这些情况下,我们通过计算机上可用的内存量来参数化算法,从而使我们能够显示可用内存与MapReduce轮次之间的折衷。对于每种设置,我们将显示,即使仅为机器提供了基本上亚线性的内存,我们的算法也会以恒定数量的MapReduce轮次运行。为了证明我们算法的实际可行性,我们实现了最大匹配算法,该算法是我们分析的核心,并表明与顺序版本相比,它实现了显着的加速。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号