Journal: Future generation computer systems

Performance analysis and optimality results for data-locality aware tasks scheduling with replicated inputs


Abstract

Replication of data files, as automatically performed by distributed file systems such as HDFS, is known to have a crucial impact on data locality in addition to system fault tolerance. Intuitively, having more replicas of an input file gives more opportunities for the task that reads it to be processed locally, i.e. without any input file transfer. Given the practical importance of this problem, a vast literature has proposed algorithms that schedule tasks based on a random placement of replicated input files. Our goal in this paper is to study the performance of these algorithms, both in terms of makespan minimization (minimizing the completion time of the last task when non-local processing is forbidden) and communication minimization (minimizing the number of non-local tasks when no idle time on resources is allowed). In the case of homogeneous tasks, we are able to prove, using models based on "balls into bins" and "power of two choices" problems, that the well-known good behavior of classical strategies can be theoretically grounded. Going further, we establish that it is possible, using semi-matching theory, to find the optimal solution in very little time. We also use known graph-orientation results to prove that this optimal solution is near-perfect with high probability. In the more general case of heterogeneous tasks, we propose heuristic solutions in both the clairvoyant and non-clairvoyant cases (i.e. whether or not task length is known in advance), and we evaluate them through simulations using actual traces of a Hadoop cluster.
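The "balls into bins" and "power of two choices" view of the homogeneous, makespan-oriented setting can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes unit-length tasks, replicas placed uniformly at random on distinct machines, and a simple greedy rule that sends each task to the currently least-loaded machine holding a replica of its input. The function name `max_load` and its parameters are hypothetical.

```python
import random


def max_load(num_tasks, num_machines, replicas, rng):
    """Return the makespan (maximum machine load) for unit-length tasks.

    Assumptions (illustrative only, not the paper's exact model):
    - each task's input has `replicas` copies on distinct random machines;
    - non-local processing is forbidden, so a task runs on a replica holder;
    - tasks arrive one by one and go to the least-loaded replica holder.
    """
    load = [0] * num_machines
    for _ in range(num_tasks):
        # Random placement of the replicated input file.
        holders = rng.sample(range(num_machines), replicas)
        # Greedy choice among the machines that can process the task locally.
        target = min(holders, key=lambda m: load[m])
        load[target] += 1
    return max(load)


if __name__ == "__main__":
    rng = random.Random(42)
    for r in (1, 2, 3):
        # With r = 1 this is plain "balls into bins"; r = 2 is "power of two choices".
        print(r, max_load(10_000, 1_000, r, rng))
```

Running the script prints the maximum load for one, two, and three replicas, to be compared with the ideal value of `num_tasks / num_machines`. An optimal assignment would instead be computed offline, for example with a semi-matching algorithm, which this sketch does not attempt.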
