Conference: East European Conference on Advances in Databases and Information Systems

H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution

Abstract

Today's distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic. In such systems the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, for timely bringing data close to the execution. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
