International Conference on Very Large Data Bases

Piranha: Optimizing Short Jobs in Hadoop

Abstract

Cluster computing has emerged as a key parallel processing platform for large-scale data. All major internet companies use it as their central processing platform. One of the most popular examples of cluster computing is MapReduce and its open-source implementation, Hadoop. These systems were originally designed for batch, massive-scale computations. Interestingly, their production workloads have evolved over time into a mix of a small fraction of large, long-running jobs and a much larger fraction of short jobs. This came about because these systems end up being used as data warehouses: they store most of the data sets and therefore attract ad hoc, short, data-mining queries. Moreover, the availability of higher-level query languages that run on top of these cluster systems has led to a proliferation of such ad hoc queries. Because existing systems were not designed for short, latency-sensitive jobs, short interactive jobs suffer from poor response times. In this paper, we present Piranha, a system for optimizing short jobs on Hadoop without affecting the larger jobs. It runs on existing, unmodified Hadoop clusters, which facilitates its adoption. Piranha exploits characteristics of short jobs, learned from production workloads on Yahoo!'s clusters, to reduce the latency of such jobs. To demonstrate Piranha's effectiveness, we evaluated its performance using three realistic short queries. Piranha reduced the queries' response times by up to 71%.