International Conference for High Performance Computing, Networking, Storage and Analysis

Acceleration of Data-Intensive Workflow Applications by Using File Access History

Abstract

Data I/O has been one of the major bottlenecks in the execution of data-intensive workflow applications. Appropriate task scheduling of a workflow can achieve high I/O throughput by reducing remote data accesses. However, most such task scheduling algorithms require the user to explicitly describe the files accessed by each job, typically through stage-in/stage-out directives in the job description; such annotations are at best tedious and sometimes impossible to provide. Thus, a more automated mechanism is necessary. In this paper, we propose a method for predicting the input/output files of each job without user-supplied annotations. It predicts I/O files by collecting file access history in a profiling run prior to the production run. We implemented the proposed method in the workflow system GXP Make and the distributed file system Mogami. We evaluate our system with two real workflow applications. Our data-aware job scheduler increases the ratio of local file accesses from 50% to 75% in one application and from 23% to 45% in the other. As a result, it reduces the makespan of the two applications by 2.5% and 7.5%, respectively.
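
The abstract does not describe the internals of the prediction step or the scheduler, so the following is only a minimal Python sketch of the general idea: record which files each job touched in a profiling run, reuse that record as the predicted input set, and place the job on the idle node that already holds the most predicted input bytes. Every name here (`build_history`, `predict_inputs`, `pick_node`, the log tuple format, the file paths and node names) is a hypothetical illustration and not part of GXP Make or Mogami.

```python
from collections import defaultdict


def build_history(access_log):
    """Group a profiling-run access log by job.

    access_log: iterable of (job_id, path, mode) tuples, mode "r" or "w".
    This log format is an assumption made for the sketch.
    """
    history = defaultdict(lambda: {"r": set(), "w": set()})
    for job_id, path, mode in access_log:
        history[job_id][mode].add(path)
    return dict(history)


def predict_inputs(history, job_id):
    """Predict a job's inputs as the files it read during the profiling run."""
    return history.get(job_id, {"r": set()})["r"]


def pick_node(inputs, file_location, file_size, idle_nodes):
    """Return the idle node storing the most predicted input bytes, or None."""
    if not idle_nodes:
        return None
    local_bytes = {node: 0 for node in idle_nodes}
    for path in inputs:
        node = file_location.get(path)
        if node in local_bytes:
            local_bytes[node] += file_size.get(path, 0)
    # Sorting first makes ties resolve to the smallest node name.
    return max(sorted(local_bytes), key=lambda n: local_bytes[n])


# Hypothetical profiling log for a single job named "align".
log = [
    ("align", "/data/a.fa", "r"),
    ("align", "/data/b.fa", "r"),
    ("align", "/out/a.sam", "w"),
]
history = build_history(log)
inputs = predict_inputs(history, "align")
location = {"/data/a.fa": "n1", "/data/b.fa": "n2"}
size = {"/data/a.fa": 10, "/data/b.fa": 300}
print(pick_node(inputs, location, size, ["n1", "n2"]))
```

In this toy run the job is placed on `n2`, because the larger predicted input already resides there. A real scheduler would also have to handle jobs that never appeared in the profiling run and files whose locations change between runs; the sketch ignores both.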
