IEEE International Symposium on Workload Characterization

Workload characterization on a production Hadoop cluster: A case study on Taobao

Abstract

MapReduce is becoming the state-of-the-art computing paradigm for processing large-scale datasets on large clusters with tens to thousands of nodes. It has been widely used in fields such as e-commerce, Web search, social networks, and scientific computation. Understanding the characteristics of MapReduce workloads is key to making better configuration decisions and improving system throughput. However, workload characterization of MapReduce, especially in large-scale production environments, has not yet been well studied. To gain insight into MapReduce workloads, we collected a two-week workload trace from a 2,000-node Hadoop cluster at Taobao, the biggest online e-commerce enterprise in Asia, ranked 14th in the world by Alexa. The trace covers 912,157 jobs logged from Dec. 4 to Dec. 20, 2011. We characterized the workload at both job and task granularity and drew a set of interesting observations. The results are representative of, and generally consistent with, data platforms for e-commerce websites, and can help other researchers and engineers understand the performance and job characteristics of Hadoop in their own production environments. In addition, we use these job analysis statistics to derive several implications for potential performance optimizations.
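
The abstract says the trace was characterized at both job and task granularity. As a hedged illustration only, the Python sketch below shows one way to roll a task-level trace up into per-job summaries and per-task-type summaries, plus an empirical CDF, which is a common way to present such distributions. The `TaskRecord` schema, its fields, and every function name here are hypothetical; they are not the Taobao trace format or the authors' tooling, and the toy records are illustrative, not data from the study.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskRecord:
    # Hypothetical trace schema: one row per completed task.
    job_id: str
    task_type: str  # assumed to be either "map" or "reduce"
    duration_s: float


def job_level_summary(tasks):
    """Group task records by job: map/reduce task counts and total task time."""
    jobs = {}
    for t in tasks:
        entry = jobs.setdefault(
            t.job_id, {"maps": 0, "reduces": 0, "task_seconds": 0.0}
        )
        # Assumes only map and reduce tasks appear in the trace.
        entry["maps" if t.task_type == "map" else "reduces"] += 1
        entry["task_seconds"] += t.duration_s
    return jobs


def task_level_summary(tasks):
    """Return the median task duration for each task type."""
    by_type = {}
    for t in tasks:
        by_type.setdefault(t.task_type, []).append(t.duration_s)
    return {kind: median(durations) for kind, durations in by_type.items()}


def empirical_cdf(values):
    """Return sorted (value, rank / n) points of the empirical CDF.

    Tied values appear as consecutive points; the last one of a tie
    carries the true CDF value at that x.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    trace = [
        TaskRecord("job_1", "map", 12.0),
        TaskRecord("job_1", "map", 18.0),
        TaskRecord("job_1", "reduce", 40.0),
        TaskRecord("job_2", "map", 5.0),
    ]
    jobs = job_level_summary(trace)
    print(jobs["job_1"])  # {'maps': 2, 'reduces': 1, 'task_seconds': 70.0}
    print(task_level_summary(trace))  # {'map': 12.0, 'reduce': 40.0}
    print(empirical_cdf([j["task_seconds"] for j in jobs.values()]))
    # [(5.0, 0.5), (70.0, 1.0)]
```

Keeping the job-level and task-level passes separate mirrors the two granularities named in the abstract; on a trace of roughly a million jobs, the same grouping would more likely be done in a dataframe or SQL engine than in plain dictionaries.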