
MapReduce: Simplified Data Processing on Large Clusters


Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
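
The map and reduce roles described in the abstract can be illustrated with a word count. The following is a minimal single-process sketch in Python, not the distributed implementation the abstract refers to: the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, and the partitioning, scheduling, and failure handling done by the run-time system are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map: take one (document name, contents) pair and emit (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_word_count(key: str, values: List[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(values)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, group by intermediate key, reduce."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)  # group values by intermediate key
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in intermediate.items()}


# Illustrative input, assumed for this sketch only.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(documents, map_word_count, reduce_word_count)
print(counts["the"], counts["fox"])
```

Because the user supplies only the two functions, the grouping step in `run_mapreduce` is the part a real run-time system would replace with distributed partitioning and inter-machine communication.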
