Conference: International Conference on Parallel Architectures and Compilation Techniques

Keynote talk: Experiences with MapReduce, an abstraction for large-scale computation

Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a Map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The MapReduce run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: thousands of MapReduce programs have been implemented and several thousand MapReduce jobs are executed on Google's clusters every day. In this talk I'll describe the basic programming model, discuss our experience using it in a variety of domains, and talk about the implications of programming models like MapReduce as one paradigm to simplify development of parallel software for multi-core microprocessors.
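
To make the Map and Reduce roles described in the abstract concrete, here is a minimal single-process sketch in Python that counts words. It is an illustration under stated assumptions, not Google's implementation or API: the names `run_mapreduce`, `map_word_count`, and `reduce_word_count` are hypothetical, and the sketch leaves out everything the run-time system is said to handle (input partitioning, scheduling across machines, failure handling, and inter-machine communication).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit one intermediate (word, 1) pair per word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver; a real system would run this on a cluster."""
    # Map phase, followed by grouping intermediate values by intermediate key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)
    # Reduce phase: one call per distinct intermediate key.
    return {k: reduce_fn(k, values) for k, values in groups.items()}


# Illustrative input only; the documents are made up for this sketch.
documents = [
    ("a.txt", "the quick brown fox"),
    ("b.txt", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
```

Because each Map call looks at only one input pair and each Reduce call looks at only one key's values, the calls are independent of one another, which is the property that would let a run-time system spread them across many machines.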