Journal: Concurrency and Computation: Practice and Experience

Mary, Hugo, and Hugo*: Learning to schedule distributed data-parallel processing jobs on shared clusters

Abstract

Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos, in turn, allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, these containers do not isolate resource usage, so that clusters can achieve high overall resource utilization despite overprovisioning and the often fluctuating utilization of individual jobs. However, some combinations of jobs utilize resources better and interfere less with each other than others when running on the same shared nodes. This article presents an approach for improving resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared clusters. The approach is based on reinforcement learning and a measure of co-location goodness, so that cluster schedulers learn over time which jobs are best executed together on shared resources. We evaluated this approach over the last few years with three prototype schedulers that build on each other: Mary, Hugo, and Hugo*. For the evaluation, we used exemplary Flink and Spark jobs from different application domains and clusters of commodity nodes managed by YARN. The results of these experiments show that our approach can increase resource utilization and job throughput significantly.
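
To make the idea of learning co-locations from a goodness measure more concrete, the following is a minimal sketch only, not the implementation of Mary, Hugo, or Hugo*. It assumes a gradient-bandit learner that keeps one preference per pair of job types and a made-up goodness formula (average utilization minus I/O wait). The names `goodness`, `ColocationLearner`, and `simulate`, the job names, and all measurement values are hypothetical and do not come from the article.

```python
import math
import random
from collections import defaultdict


def goodness(cpu_util, io_util, io_wait):
    """Hypothetical co-location goodness in [0, 1]: high utilization, low I/O wait."""
    return max(0.0, min(1.0, (cpu_util + io_util) / 2.0 - io_wait))


class ColocationLearner:
    """Gradient-bandit sketch: one preference per (running job, candidate job) pair."""

    def __init__(self, step_size=0.1, seed=0):
        self.step_size = step_size
        self.pref = defaultdict(float)      # (running, candidate) -> preference
        self.baseline = defaultdict(float)  # running -> mean reward observed so far
        self.count = defaultdict(int)       # running -> number of updates
        self.rng = random.Random(seed)

    def probabilities(self, running, queue):
        # Softmax over the preferences of the queued candidates
        # (assumes each job type appears at most once in the queue).
        weights = [math.exp(self.pref[(running, job)]) for job in queue]
        total = sum(weights)
        return [w / total for w in weights]

    def choose(self, running, queue):
        probs = self.probabilities(running, queue)
        return self.rng.choices(queue, weights=probs, k=1)[0]

    def update(self, running, queue, chosen, reward):
        # Advantage is measured against the mean of earlier rewards.
        probs = self.probabilities(running, queue)
        advantage = reward - self.baseline[running]
        for job, p in zip(queue, probs):
            indicator = 1.0 if job == chosen else 0.0
            self.pref[(running, job)] += self.step_size * advantage * (indicator - p)
        # Fold the new reward into the running mean.
        self.count[running] += 1
        self.baseline[running] += (reward - self.baseline[running]) / self.count[running]


def simulate(rounds=300):
    """Toy loop with invented measurements for two candidate co-locations."""
    measured = {
        "sort": goodness(0.8, 0.7, 0.05),      # I/O-heavy job next to a CPU-heavy one
        "pagerank": goodness(0.9, 0.1, 0.20),  # two CPU-heavy jobs competing
    }
    learner = ColocationLearner()
    queue = ["sort", "pagerank"]
    for _ in range(rounds):
        chosen = learner.choose("kmeans", queue)
        learner.update("kmeans", queue, chosen, measured[chosen])
    return learner
```

In this toy setup, the learner gradually shifts probability toward the candidate whose co-location with the running job yields the higher goodness value. A real scheduler would obtain the reward from monitored node metrics rather than from fixed numbers.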
