Published in: IEEE International Parallel and Distributed Processing Symposium Workshops

A Scheduling Algorithm for Hadoop MapReduce Workflows with Budget Constraints in the Heterogeneous Cloud



Abstract

In recent years, cloud services have gained much attention as a result of their availability, scalability, and low cost. One use of these services has been the execution of scientific workflows as part of Big Data analytics; such workflows are employed in a diverse range of fields including astronomy, physics, seismology, and bioinformatics. Because the problem is inherently complex, there has been much research on heuristic scheduling algorithms for these workflows; however, existing work has mainly considered execution in a utility grid environment using a generic distributed framework. In our research, we consider the increasingly popular Apache Hadoop framework for scheduling workflows onto resources rented from cloud service providers. Unlike other distributed frameworks, the Hadoop MapReduce model imposes a functional style on application definition, and as such presents an interesting and previously unaddressed challenge for workflow scheduling. We investigate budget-constrained workflow scheduling on the Hadoop MapReduce platform, devising both an optimal and a heuristic approach to minimize workflow makespan while satisfying a given budget constraint. We have implemented modifications to the Apache Hadoop framework to allow fully integrated workflow scheduling. These modifications are novel and have led to the completion of the first generic workflow scheduler fully integrated with the Apache Hadoop framework. Both the framework modifications and the proposed scheduler implementation have been extensively tested by executing multiple workflow applications, which demonstrates the ability of our implementation to handle all possible workflow substructures. Results from our empirical studies further support these findings.
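To make the problem statement concrete, the following is a minimal sketch of budget-constrained makespan minimization on a toy workflow. It is not the paper's optimal or heuristic algorithm: it is a plain exhaustive search, and it assumes each task is rented on exactly one machine type with a fixed run time and cost. The task names, machine types, and numbers are all hypothetical, and the sketch ignores Hadoop's map/reduce stage structure and data transfer.

```python
from itertools import product

# Hypothetical workflow: task -> list of predecessor tasks (a small diamond DAG).
# Keys are listed in topological order, which makespan() relies on.
PREDECESSORS = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

# Hypothetical options per task: machine type -> (run time, rental cost).
OPTIONS = {
    "a": {"small": (4, 1), "large": (2, 3)},
    "b": {"small": (6, 2), "large": (3, 5)},
    "c": {"small": (2, 1), "large": (1, 2)},
    "d": {"small": (4, 1), "large": (2, 3)},
}


def makespan(assignment):
    """Finish time of the last task, given task -> machine type."""
    finish = {}
    for task in PREDECESSORS:
        # A task starts once all of its predecessors have finished.
        start = max((finish[p] for p in PREDECESSORS[task]), default=0)
        finish[task] = start + OPTIONS[task][assignment[task]][0]
    return max(finish.values())


def total_cost(assignment):
    """Sum of rental costs over all tasks."""
    return sum(OPTIONS[task][machine][1] for task, machine in assignment.items())


def best_schedule(budget):
    """Try every assignment; return (makespan, cost, assignment) or None.

    Among assignments within budget, the smallest makespan wins and
    ties are broken by lower cost.
    """
    tasks = list(PREDECESSORS)
    best = None
    for choice in product(*(OPTIONS[task] for task in tasks)):
        assignment = dict(zip(tasks, choice))
        cost = total_cost(assignment)
        if cost > budget:
            continue
        candidate = (makespan(assignment), cost, assignment)
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return best


if __name__ == "__main__":
    for budget in (4, 5, 8, 13):
        print(budget, best_schedule(budget))
```

The search space grows exponentially with the number of tasks, which is one reason heuristics are attractive for workflows of realistic size.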
