Acceleration of Data-Intensive Workflow Applications by Using File Access History

Abstract

Data I/O has been one of the major bottlenecks in the execution of data-intensive workflow applications. Appropriate task scheduling of a workflow can achieve high I/O throughput by reducing remote data accesses. However, most such task scheduling algorithms require the user to explicitly describe the files accessed by each job, typically through stage-in/stage-out directives in the job description; such annotations are at best tedious and sometimes impossible. Thus, a more automated mechanism is necessary. In this paper, we propose a method for predicting the input/output files of each job without user-supplied annotations. It predicts I/O files by collecting file access history in a profiling run prior to the production run. We implemented the proposed method in the workflow system GXP Make and the distributed file system Mogami. We evaluate our system with two real workflow applications. Our data-aware job scheduler increases the ratio of local file accesses from 50% to 75% in one application and from 23% to 45% in the other. As a result, it reduces the makespan of the two applications by 2.5% and 7.5%, respectively.
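
The idea in the abstract can be illustrated with a minimal sketch. It is not the GXP Make or Mogami implementation: the log format, the function names `build_history` and `pick_node`, and the "most predicted files stored locally" placement rule are all assumptions made for illustration. The sketch records which files each job touched in a profiling run, then places each job on the node that holds the most of those files.

```python
from collections import Counter, defaultdict


def build_history(access_log):
    """Map each job name to the set of files it touched in the profiling run."""
    history = defaultdict(set)
    for job, path in access_log:
        history[job].add(path)
    return history


def pick_node(job, history, file_location, nodes):
    """Choose the node that stores the most files the job is predicted to access."""
    votes = Counter(
        file_location[path]
        for path in history.get(job, ())
        if path in file_location
    )
    # Fall back to the first node when the job has no usable history.
    return max(nodes, key=lambda n: votes[n]) if votes else nodes[0]


# Hypothetical profiling-run log of (job, file) pairs and file placements.
log = [("align", "a.dat"), ("align", "b.dat"), ("merge", "c.dat")]
locations = {"a.dat": "node1", "b.dat": "node1", "c.dat": "node2"}
nodes = ["node1", "node2"]

history = build_history(log)
placement = {job: pick_node(job, history, locations, nodes) for job in history}
```

A real scheduler would also have to weigh file sizes, node load, and files that the production run creates but the profiling run never saw; the sketch ignores all of these.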

Bibliographic details

Source
International Conference for High Performance Computing, Networking, Storage and Analysis | 2012 | 9 pages
Authors
Horiuchi Miki; Taura Kenjiro
Original format: PDF
Chinese Library Classification: TP393-53

Similar literature

1. File and Task Abstraction in Task Workflow Patterns for File Recommendation Using File-Access Log [J]. Qiang SONG, Takayuki KAWABATA, Fumiaki ITOH. IEICE Transactions on Information and Systems, 2014, no. 4
2. FLEXIBLE AND EFFICIENT WORKFLOW DEPLOYMENT OF DATA-INTENSIVE APPLICATIONS ON GRIDS WITH MOTEUR [J]. Tristan Glatard, Johan Montagnat, Diane Lingrand. International Journal of High Performance Computing Applications, 2008, no. 3
3. Power-Effective File Layout Based on Large Scale Data-Intensive Application in Virtualized Environment [J]. Shunsuke YAGAI, Masato OGUCHI, Miyuki NAKANO. IEICE Transactions on Information and Systems, 2017, no. 12
4. Acceleration of Data-Intensive Workflow Applications by Using File Access History [C]. Horiuchi Miki, Taura Kenjiro. 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis, 2012
5. Engineering High Performance Workflows for End-to-End Acceleration of Genomic Applications [D]. Rengasamy, Vasudevan. 2018
6. BrowseVCF: a web-based application and workflow to quickly prioritize disease-causative variants in VCF files [O]. Silvia Salatino, Varun Ramraj
7. File and Task Abstraction in Task Workflow Patterns for File Recommendation Using File-Access Log [O]. Qiang SONG, Takayuki KAWABATA, Fumiaki ITOH. 2014
8. Generalized Grid File: An Access Structure for CIM (Computer Integrated Manufacturing) Applications [R]. Blanken, H., Ijbema, A., Meek, P. 1988
