IEEE International Conference on Parallel and Distributed Systems

Effectiveness of Moldable and Malleable Scheduling in Deep Learning Tasks



Abstract

Research and development of deep learning (DL) applications often involves exhaustive trial-and-error, which demands that shared computational resources, especially GPUs, be efficiently allocated. Most DL tasks are moldable or malleable (i.e., the number of allocated GPUs can be changed before or during execution). However, conventional batch schedulers do not take advantage of DL tasks' moldability/malleability, inhibiting speedup when some GPU resources are unallocated. Another opportunity for speedup is to run multiple tasks concurrently on one GPU, which may improve the overall throughput because a single task does not always fully utilize the GPU's computational resources. We propose designing a batch scheduling system that exploits these opportunities to accelerate DL tasks. As a first step, this study conducts an extensive case study to evaluate the speedup of DL tasks when a scheduler treats them as moldable or malleable. That is, the scheduler adjusts the number of GPUs to be (or already) allocated to a task in response to the fluctuating availability of GPUs. Simulations using our real workload trace show that if the scheduler can allocate 1-4 GPUs to a task or assign 1-4 tasks to a GPU, then the average flow time of moldable/malleable DL tasks is shortened by at least 15.1 %/42.5 %, respectively, compared to a Rigid FCFS schedule in which one GPU is allocated to each task.
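To make the flow-time comparison concrete, below is a minimal, self-contained sketch. It is not the authors' simulator and does not use their workload trace: the six-task workload, the 4-GPU cluster size, and the scaling model are all made-up assumptions. It replays the workload under strict FCFS twice, once rigidly (one GPU per task) and once moldably (a task may take up to four free GPUs at launch); malleable reallocation during execution and sharing one GPU among multiple tasks are not modeled here.

```python
import heapq

# Hypothetical workload: (arrival_time_h, work_on_one_gpu_h). Made up for illustration.
WORKLOAD = [(0, 8), (0, 4), (1, 6), (2, 2), (3, 10), (5, 3)]
NUM_GPUS = 4  # assumed cluster size


def runtime(work, gpus):
    """Assumed scaling model: parallel efficiency drops 5% per extra GPU."""
    efficiency = 1.0 - 0.05 * (gpus - 1)
    return work / (gpus * efficiency)


def simulate(max_gpus_per_task):
    """Strict FCFS without backfilling: each task starts as soon as at least
    one GPU is free and grabs up to `max_gpus_per_task` of the free GPUs.
    max_gpus_per_task=1 is the rigid baseline; 4 is the moldable variant."""
    free = NUM_GPUS
    running = []            # min-heap of (finish_time, gpus_held)
    now = 0.0               # start time of the most recently launched task
    flow_times = []
    for arrival, work in sorted(WORKLOAD):
        now = max(now, arrival)
        # Reclaim GPUs from finished tasks; if none are free, wait for the
        # earliest-finishing running task.
        while running and (running[0][0] <= now or free == 0):
            finish, held = heapq.heappop(running)
            now = max(now, finish)
            free += held
        gpus = min(max_gpus_per_task, free)
        free -= gpus
        finish = now + runtime(work, gpus)
        heapq.heappush(running, (finish, gpus))
        flow_times.append(finish - arrival)
    return sum(flow_times) / len(flow_times)


if __name__ == "__main__":
    rigid = simulate(max_gpus_per_task=1)
    moldable = simulate(max_gpus_per_task=4)
    print(f"rigid FCFS (1 GPU/task) : avg flow time {rigid:.2f} h")
    print(f"moldable FCFS (1-4 GPUs): avg flow time {moldable:.2f} h")
```

Even this toy policy shortens average flow time relative to the rigid baseline whenever GPUs would otherwise sit idle, which is the effect the paper quantifies on a real workload trace.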
