The Computer Journal

Scheduling Workflow Applications Based on Multi-source Parallel Data Retrieval in Distributed Computing Networks


Abstract

Many scientific experiments are carried out by researchers collaborating around the world, who use existing infrastructures to conduct experiments at massive scale. Data produced by such experiments are thus replicated and cached at multiple geographic locations. This gives rise to new challenges when selecting distributed data and compute resources so that applications execute in a time- and cost-efficient manner. Existing heuristic techniques select the 'best' data source from which to retrieve data to a compute resource and subsequently perform the task-resource assignment. However, this approach to scheduling, which is based only on single-source data retrieval, may not produce time-efficient schedules when: (i) tasks are interdependent on data, (ii) the average size of the data processed by most tasks is large and (iii) data transfer time exceeds task computation time by at least one order of magnitude. To address these characteristics of data-intensive applications, we exploit the presence of replicated data sources by retrieving data in parallel from multiple locations, thereby achieving time-efficient schedules. In this article, we propose two multi-source data-retrieval-based scheduling heuristics that assign interdependent tasks to compute resources based on both data retrieval time and task computation time. We carry out experiments using real applications and deploy them in emulated as well as real environments. Using a combination of data retrieval and task-resource mapping techniques, we show that our heuristics produce time-efficient schedules that are better than those of existing heuristic-based techniques for scheduling application workflows.
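The abstract's core argument is that pulling chunks of a file from several replicas in parallel shortens retrieval time, and that this can change both where a task should run and how long the workflow takes. The small model below illustrates that argument. It is a minimal sketch under stated assumptions and is not the two heuristics proposed in the article. The model assumes that each replica-to-resource link has a fixed, independent bandwidth, so parallel rates simply add. It splits the data in proportion to bandwidth so that all chunks arrive together, which gives a retrieval time of S / Σ b_i instead of S / max b_i for a file of size S. It ignores the cost of moving intermediate outputs between dependent tasks and uses a plain greedy earliest-finish-time rule. The names `Task`, `Resource`, `retrieval_time` and `schedule` are hypothetical, and all numbers are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    data_mb: float                # size of the input data the task must fetch (MB)
    work: float                   # computation demand, in hypothetical work units
    replicas: list[str]           # sites holding a copy of the input data
    parents: list[str] = field(default_factory=list)  # tasks that must finish first


@dataclass
class Resource:
    name: str
    speed: float                  # work units per second
    bandwidth: dict[str, float]   # MB/s from each replica site to this resource


def retrieval_time(task: Task, res: Resource, multi_source: bool) -> float:
    rates = [res.bandwidth[site] for site in task.replicas]
    # Single source: fetch the whole file from the fastest replica.
    # Multi source: chunks sized in proportion to each link's rate finish
    # simultaneously, so the effective rate is the sum of the rates.
    # (Assumes the links do not share a bottleneck.)
    rate = sum(rates) if multi_source else max(rates)
    return task.data_mb / rate


def schedule(tasks: list[Task], resources: list[Resource], multi_source: bool = True):
    """Greedy earliest-finish-time mapping; tasks must be in topological order."""
    ready = {r.name: 0.0 for r in resources}   # time each resource becomes free
    finish: dict[str, float] = {}              # task name -> finish time
    plan: dict[str, str] = {}                  # task name -> chosen resource
    for t in tasks:
        deps_done = max((finish[p] for p in t.parents), default=0.0)
        best_end, best_res = None, None
        for r in resources:
            start = max(ready[r.name], deps_done)
            end = start + retrieval_time(t, r, multi_source) + t.work / r.speed
            if best_end is None or end < best_end:
                best_end, best_res = end, r.name
        ready[best_res] = best_end
        finish[t.name] = best_end
        plan[t.name] = best_res
    return plan, max(finish.values())


# Hypothetical example: two replica sites (A, B), two compute resources.
r1 = Resource("r1", speed=2.0, bandwidth={"A": 10.0, "B": 30.0})
r2 = Resource("r2", speed=1.0, bandwidth={"A": 40.0, "B": 20.0})
resources = [r1, r2]
tasks = [
    Task("t1", data_mb=600.0, work=12.0, replicas=["A", "B"]),
    Task("t2", data_mb=300.0, work=4.0, replicas=["A", "B"], parents=["t1"]),
]

print(schedule(tasks, resources, multi_source=False))  # ({'t1': 'r1', 't2': 'r2'}, 37.5)
print(schedule(tasks, resources, multi_source=True))   # ({'t1': 'r1', 't2': 'r2'}, 30.0)
```

With these made-up numbers, both modes choose the same resources, but parallel retrieval reduces the makespan from 37.5 s to 30 s. If the replica links shared a bottleneck, the aggregate rate would be lower than the sum, and the gain would shrink accordingly.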
