Journal: IEEE Transactions on Services Computing

Workload Analysis, Implications, and Optimization on a Production Hadoop Cluster: A Case Study on Taobao

Abstract

Understanding the characteristics of MapReduce workloads in a Hadoop cluster is the key to making optimal configuration decisions and improving system efficiency and throughput. However, workload analysis on Hadoop clusters, particularly in large-scale e-commerce production environments, has not yet been well studied. In this paper, we performed a comprehensive workload analysis using a trace collected from a 2000-node Hadoop cluster at Taobao, the biggest online e-commerce enterprise in Asia, ranked 10th in the world as reported by Alexa. The results of the workload analysis are representative and generally consistent with data warehouse workloads at e-commerce web sites, and they can help researchers and engineers understand the workload characteristics of Hadoop in their own production environments. Based on the observations and implications derived from the trace, we designed a workload generator, Ankus, to expedite the performance evaluation and debugging of new mechanisms. Ankus supports synthesizing an e-commerce style MapReduce workload at low cost. Furthermore, we proposed and implemented a job scheduling algorithm, Fair4S, which is designed to be biased towards small jobs. Small jobs account for the majority of the workload, and most of them require instant, interactive responses, which is an important phenomenon in production Hadoop systems. The inefficiency of the Hadoop fair scheduler in handling small jobs motivated us to design Fair4S, which introduces pool weights and extends job priorities to guarantee rapid responses for small jobs. Experimental evaluation verified that Fair4S reduces the average waiting time of small jobs by a factor of 7 compared with the fair scheduler.
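
The abstract names two ingredients of Fair4S, pool weights and job priorities, without giving the algorithm itself. The Python sketch below is not the authors' implementation; it only illustrates, under stated assumptions, how a weighted split of slots across pools can be combined with a priority-then-size ordering inside a pool so that small jobs wait less. The names `Job`, `pool_shares`, and `schedule_pool`, the integer priority scale, and the use of pending task count as job size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    tasks: int      # pending tasks, used here as a proxy for job size (assumption)
    priority: int   # larger value means more urgent (hypothetical scale)


def pool_shares(total_slots, pool_weights):
    """Split slots across pools in proportion to their weights.

    Integer division is used, so a few slots may remain unassigned;
    a real scheduler would redistribute them.
    """
    total_weight = sum(pool_weights.values())
    return {pool: total_slots * weight // total_weight
            for pool, weight in pool_weights.items()}


def schedule_pool(jobs, slots):
    """Grant slots inside one pool: higher priority first, then smaller jobs."""
    order = sorted(jobs, key=lambda job: (-job.priority, job.tasks))
    assignment = {}
    for job in order:
        granted = min(job.tasks, slots)
        assignment[job.name] = granted
        slots -= granted
    return assignment


if __name__ == "__main__":
    # A pool reserved for small jobs gets three times the weight of the other pool.
    print(pool_shares(100, {"small": 3, "large": 1}))
    jobs = [Job("big", 60, 1), Job("tiny", 5, 1), Job("urgent", 10, 2)]
    print(schedule_pool(jobs, 20))
```

In this sketch a small job is served before a large job of equal priority, which is one simple way to bias a fair-share scheduler towards small jobs; the actual Fair4S policy may differ in every detail.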