Conference: IEEE International Parallel and Distributed Processing Symposium Workshops

Data-Locality Aware Dynamic Schedulers for Independent Tasks with Replicated Inputs



Abstract

In this paper we concentrate on a crucial parameter for efficiency in Big Data and HPC applications: data locality. We focus on the scheduling of a set of independent tasks, each depending on an input file. We assume that each of these input files has been replicated several times and placed in the local storage of different nodes of a cluster, similar to what is found in HDFS, for example. We consider two optimization problems, related to the two natural metrics: makespan optimization (under the constraint that only local tasks are allowed) and communication optimization (under the constraint of never letting a processor idle, in order to optimize makespan). For both problems we investigate the performance of dynamic schedulers, in particular the basic greedy algorithm found, for example, in the default MapReduce scheduler. First, we study its performance theoretically with probabilistic models, and provide a lower bound for the communication metric and asymptotic behaviour for both metrics. Second, we propose simulations based on traces from a Hadoop cluster to compare the different dynamic schedulers and assess the expected behaviour obtained with the theoretical study.
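The two settings described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes unit-length tasks, uniformly random replica placement, and synchronous rounds, and the names `place_replicas` and `greedy_schedule` are hypothetical. With `allow_remote=False` only local tasks run (the makespan problem); with `allow_remote=True` an idle processor without local work takes an arbitrary remaining task, which counts as one communication.

```python
import random


def place_replicas(n_tasks, n_procs, n_replicas, rng):
    """Store each task's input file on n_replicas distinct processors,
    chosen uniformly at random (an assumption made for this sketch)."""
    return [set(rng.sample(range(n_procs), n_replicas)) for _ in range(n_tasks)]


def greedy_schedule(replicas, n_procs, allow_remote):
    """Simulate a greedy dynamic scheduler on unit-length tasks.

    Returns (makespan, remote_count), where remote_count is the number of
    tasks executed on a processor that does not hold their input file.
    """
    remaining = set(range(len(replicas)))
    # local[p] lists the tasks whose input file is stored on processor p.
    local = [[] for _ in range(n_procs)]
    for task, procs in enumerate(replicas):
        for p in procs:
            local[p].append(task)

    makespan = 0
    remote_count = 0
    while remaining:
        makespan += 1  # one synchronous round of unit-length tasks
        for p in range(n_procs):
            if not remaining:
                break
            # Discard local tasks that another processor already executed.
            while local[p] and local[p][-1] not in remaining:
                local[p].pop()
            if local[p]:
                # Greedy rule: prefer a task whose input is local.
                remaining.discard(local[p].pop())
            elif allow_remote:
                # No local work left: fetch an arbitrary remaining task.
                remaining.pop()
                remote_count += 1
            # Otherwise the processor stays idle during this round.
    return makespan, remote_count


if __name__ == "__main__":
    rng = random.Random(0)
    replicas = place_replicas(n_tasks=200, n_procs=10, n_replicas=3, rng=rng)
    print("local only:    ", greedy_schedule(replicas, 10, allow_remote=False))
    print("remote allowed:", greedy_schedule(replicas, 10, allow_remote=True))
```

In this sketch, allowing remote tasks keeps every processor busy until the tasks run out, so the makespan is the ceiling of the task count divided by the processor count, and the quantity of interest becomes the number of remote executions. Forbidding remote tasks gives zero communications but may lengthen the makespan.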
