Conference: IEEE Conference on Computer Communications

Scheduling Placement-Sensitive BSP Jobs with Inaccurate Execution Time Estimation



Abstract

The Bulk Synchronous Parallel (BSP) paradigm has recently gained tremendous importance because of the popularity of computations such as distributed machine learning and graph computation. In a typical BSP job, multiple workers concurrently conduct iterative computations that require frequent synchronization. The workers should therefore be scheduled simultaneously, and their placement on different computing devices can significantly affect performance. Because of these unique characteristics of BSP jobs, simply retrofitting a traditional scheduling discipline is unlikely to yield the desired performance. In this work, we derive SPIN, a novel scheduler designed for BSP jobs with placement-sensitive execution that aims to minimize the makespan of all jobs. We first prove the approximation hardness of the problem and then present how SPIN solves it with a rounding-based randomized approximation approach. Our analysis indicates that SPIN efficiently achieves a good performance guarantee. Moreover, SPIN is robust against misestimation of job execution time: we theoretically bound the negative impact of such misestimation. We implement SPIN on a production-trace-driven testbed with 40 GPUs. Our extensive experiments show that SPIN can reduce the job makespan and the average job completion time by up to 3× and 4.68×, respectively. Our approach also demonstrates better robustness to execution time misestimation than heuristic baselines.
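To make the phrase "rounding-based randomized approximation" concrete, the following is a minimal sketch of generic randomized rounding for a placement-sensitive makespan problem. It is not SPIN's algorithm, which the abstract does not specify. The function names, the treatment of each job as a single unit assigned to one placement, and the idea that the fractional weights come from some relaxed (for example, linear programming) solution are all assumptions made for illustration.

```python
import random


def round_assignment(fractional, rng):
    """Pick one placement per job by sampling with the fractional weights.

    fractional[j][p] is a hypothetical fractional share of job j on
    placement p (each row is assumed to sum to 1).
    """
    assignment = []
    for weights in fractional:
        placements = list(range(len(weights)))
        # random.choices samples proportionally to the given weights.
        choice = rng.choices(placements, weights=weights, k=1)[0]
        assignment.append(choice)
    return assignment


def makespan(assignment, exec_time, num_placements):
    """Return the largest total load over all placements.

    exec_time[j][p] is the (estimated) execution time of job j when it
    runs on placement p, which models placement sensitivity.
    """
    load = [0.0] * num_placements
    for job, placement in enumerate(assignment):
        load[placement] += exec_time[job][placement]
    return max(load)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    exec_time = [[4, 6], [3, 2], [5, 5]]
    fractional = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
    rng = random.Random(0)
    # Repeat the rounding a few times and keep the best outcome.
    best = min(
        (round_assignment(fractional, rng) for _ in range(10)),
        key=lambda a: makespan(a, exec_time, 2),
    )
    print(best, makespan(best, exec_time, 2))
```

Repeating the rounding and keeping the assignment with the smallest makespan is a common way to use such a randomized step in practice. Whether SPIN does this is not stated in the abstract.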
