首页> 外文期刊>Future generation computer systems >Locality-aware and load-balanced static task scheduling for MapReduce
【24h】

Locality-aware and load-balanced static task scheduling for MapReduce

机译:MapReduce的位置感知和负载平衡的静态任务调度

获取原文
获取原文并翻译 | 示例
       

摘要

Task scheduling for MapReduce jobs has been an active area of research with the objective of decreasing the amount of data transferred during the shuffle phase via exploiting data locality. In the literature, generally only the scheduling of reduce tasks is considered with the assumption that scheduling of map tasks is already determined by the input data placement. However, in cloud or HPC deployments of MapReduce, the input data is located in a remote storage and scheduling map tasks gains importance. Here, we propose models for simultaneous scheduling of map and reduce tasks in order to improve data locality and balance the processors' loads in both map and reduce phases. Our approach is based on graph and hypergraph models which correctly encode the interactions between map and reduce tasks. Partitions produced by these models are decoded to schedule map and reduce tasks. A two-constraint formulation utilized in these models enables balancing processors' loads in both map and reduce phases. The partitioning objective in the hypergraph models correctly encapsulates the minimization of data transfer when a local combine step is performed prior to shuffle, whereas the partitioning objective in the graph models achieve the same feat when a local combine is not performed. We show the validity of our scheduling on the MapReduce parallelizations of two important kernel operations - sparse matrix-vector multiplication (SpMV) and generalized sparse matrix-matrix multiplication (SpGEMM) - that are widely encountered in big data analytics and scientific computations. Compared to random scheduling, our models lead to tremendous savings in data transfer by reducing data traffic from several hundreds of megabytes to just a few megabytes in the shuffle phase and consequently leading up to 2.6x and 4.2x speedup for SpMV and SpGEMM, respectively. (C) 2018 Elsevier B.V. All rights reserved.
机译:MapReduce作业的任务调度一直是研究的活跃领域,其目标是通过利用数据局部性来减少在混洗阶段中传输的数据量。在文献中,通常仅在假设映射任务的调度已由输入数据放置确定的前提下,才考虑化简任务的调度。但是,在MapReduce的云或HPC部署中,输入数据位于远程存储中,调度地图任务变得非常重要。在这里,我们提出了用于同时调度地图和减少任务的模型,以改善数据局部性并平衡地图和减少阶段的处理器负载。我们的方法基于图形和超图模型,它们可以正确编码地图和约简任务之间的交互。这些模型产生的分区将被解码以安排映射并减少任务。这些模型中使用的两个约束公式可以在映射阶段和缩减阶段中平衡处理器的负载。当在混洗之前执行局部合并步骤时,超图模型中的分区目标正确封装了数据传输的最小化,而当未执行局部合并时,图模型中的分区目标实现了相同的壮举。我们在两个重要的内核操作-稀疏矩阵矢量乘法(SpMV)和广义稀疏矩阵矩阵乘法(SpGEMM)的MapReduce并行化上显示了调度的有效性,这在大数据分析和科学计算中经常遇到。与随机调度相比,我们的模型通过将数据流量在混洗阶段从几百兆字节减少到几兆字节,从而极大地节省了数据传输量,因此SpMV和SpGEMM的速度分别提高了2.6倍和4.2倍。 (C)2018 Elsevier B.V.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号