Conference: 2012 IEEE International Symposium on Workload Characterization

Workload characterization on a production Hadoop cluster: A case study on Taobao


Abstract

MapReduce is becoming the state-of-the-art computing paradigm for processing large-scale datasets on a large cluster with tens or thousands of nodes. It has been widely used in various fields such as e-commerce, Web search, social networks, and scientific computation. Understanding the characteristics of MapReduce workloads is the key to making better configuration decisions and improving system throughput. However, workload characterization of MapReduce, especially in large-scale production environments, has not yet been well studied. To gain insight into MapReduce workloads, we collected a two-week workload trace from a 2,000-node Hadoop cluster at Taobao, the biggest online e-commerce enterprise in Asia, ranked 14th in the world as reported by Alexa. The workload trace covers 912,157 jobs, logged from Dec. 4 to Dec. 20, 2011. We characterized the workload at the granularity of jobs and tasks, respectively, and concluded with a set of interesting observations. The results of the workload characterization are representative and generally consistent with data platforms for e-commerce websites, and can help other researchers and engineers understand the performance and job characteristics of Hadoop in their production environments. In addition, we used these job analysis statistics to derive several implications for potential performance optimization solutions.
