Conference: International Workshop on Job Scheduling Strategies for Parallel Processing

Choosing Optimal Maintenance Time for Stateless Data-Processing Clusters: A Case Study of Hadoop Cluster



Abstract

Stateless clusters such as Hadoop clusters are widely deployed to drive business data analysis. When a cluster must be restarted for cluster-wide maintenance, administrators want to choose a maintenance window that results in (1) the least disturbance to cluster operation and (2) maximized job-processing throughput. A straightforward but naive approach is to choose the maintenance time with the fewest running jobs, but such an approach is suboptimal. In this work, we use Hadoop as a use case and propose determining the optimal cluster maintenance time based on accumulated job progress, as opposed to the number of running jobs. The approach can maximize the job throughput of a stateless cluster by minimizing the amount of work lost due to maintenance. According to production cluster traces, the proposed approach saves up to 50% of the cluster resources wasted by maintenance compared with the straightforward approach.
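The following is a minimal Python sketch of the contrast between the two selection criteria. It is not the authors' method. It assumes that a cluster-wide restart discards all in-flight progress of every running job, since the cluster is stateless. It also approximates a job's accumulated progress as elapsed run time multiplied by a hypothetical `slots` weight. The abstract does not define the exact progress metric. The names `Job`, `running_at`, `lost_work`, `naive_choice` and `progress_choice` are illustrative and do not belong to Hadoop or to the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    start: int      # job start time, in minutes (assumed trace format)
    end: int        # job completion time, in minutes
    slots: int = 1  # hypothetical resource weight, e.g. containers held


def running_at(jobs: list[Job], t: int) -> list[Job]:
    """Jobs that would be killed by a cluster-wide restart at time t."""
    return [j for j in jobs if j.start <= t < j.end]


def lost_work(jobs: list[Job], t: int) -> int:
    """Work discarded by a restart at t, assuming each running job loses all
    accumulated progress (approximated as elapsed time times slots)."""
    return sum((t - j.start) * j.slots for j in running_at(jobs, t))


def naive_choice(jobs: list[Job], candidates: list[int]) -> int:
    """Baseline: pick the candidate time with the fewest running jobs."""
    return min(candidates, key=lambda t: len(running_at(jobs, t)))


def progress_choice(jobs: list[Job], candidates: list[int]) -> int:
    """Pick the candidate time that minimizes accumulated progress lost."""
    return min(candidates, key=lambda t: lost_work(jobs, t))


# Toy trace (invented for illustration, not production data):
# one long job is 360 minutes in at t=360; two short jobs are 30 minutes in at t=480.
JOBS = [Job(0, 420), Job(450, 540), Job(450, 540)]
CANDIDATES = [360, 480]

for name, pick in [("naive", naive_choice), ("progress", progress_choice)]:
    t = pick(JOBS, CANDIDATES)
    print(f"{name}: restart at t={t}, lost work={lost_work(JOBS, t)}")
```

In this toy trace, the naive rule restarts at t=360 because only one job is running then. That single job has already accumulated 360 minutes of work, and all of it is lost. The progress-based rule restarts at t=480 instead. Two jobs are running at that point, but together they lose only 60 minutes of work. This illustrates why counting running jobs can be misleading.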
