SkewControl: Gini Out of the Bottle

Abstract

In the age of big data, MapReduce plays an important role in extreme-scale data processing systems. Among the open issues, data skew weighs heavily on MapReduce system performance. Traditional approaches leave the problem to users, which requires them to possess application-dependent domain knowledge. Other approaches address the problem automatically, but in an open-loop manner that lacks sufficient adaptivity across different applications. To address these issues, we conduct trace-driven empirical studies and show that skew has strongly stable and predictable characteristics, which allows us to design a closed-loop automatic mechanism for task partitioning and scheduling, called SkewControl. We implement SkewControl on top of a Hadoop 1.0.4 production system. The experimental results show that, compared with the state-of-the-art LATE and SkewTune systems, SkewControl consistently improves the system response time by 23.8% and 17%, respectively.
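The title alludes to the Gini coefficient, a standard measure of inequality. The abstract does not state how SkewControl quantifies skew, so the following is only an illustrative sketch, assuming skew is measured as the Gini coefficient of per-task input sizes. The function name `gini` and the sample loads are hypothetical and are not taken from the paper.

```python
def gini(loads):
    """Gini coefficient of non-negative loads.

    Returns 0.0 for a perfectly even distribution and approaches
    (n - 1) / n when a single task holds all of the load.
    Illustrative only: not the paper's actual skew metric.
    """
    values = sorted(loads)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form on ascending data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-reducer input sizes (arbitrary units)
balanced = [10, 10, 10, 10]
skewed = [0, 0, 0, 40]
print(gini(balanced), gini(skewed))
```

Under this assumption, a closed-loop scheduler could recompute such a value as tasks progress and repartition when it exceeds a chosen threshold. Any such threshold would be a tuning parameter and is not something the abstract specifies.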
