A Critical Path File Location (CPFL) algorithm for data-aware multiworkflow scheduling on HPC clusters

Abstract

A representative set of workflows found in bioinformatics pipelines must deal with large data sets. Most scientific workflows are defined as Directed Acyclic Graphs (DAGs). Although DAGs are useful for understanding dependence relationships, they do not provide any information about input, output, and temporary data files. Knowing where the files of data-intensive applications are located helps to avoid performance issues. This paper presents the Critical Path File Location (CPFL) policy, a storage-aware multiworkflow scheduler for cluster environments in which disk access time is more relevant than network access time; it extends the classical list scheduling policies. Our purpose is to find the best location for data files in a hierarchical storage system. The resulting algorithm is tested on an HPC cluster and in a simulated cluster scenario with synthetic bioinformatics workflows and widely used benchmarks such as Montage and Epigenomics. The simulator is tuned and validated with the first test results from the real infrastructure. The evaluation shows promising results: improvements of up to 70% on benchmarks in a real HPC cluster using 128 cores, and makespan improvements of up to 69% on simulated 512-core clusters, with a deviation between 0.9% and 3% with respect to the real HPC cluster.
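
The abstract does not spell out the CPFL algorithm itself, so the following is only a minimal sketch of the general idea it builds on: ranking DAG tasks by critical path, as classical list schedulers do, with a task cost that includes the time to read its input file from a given storage tier. The workflow, task names, file sizes, tier names, and bandwidth figures are all hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from functools import lru_cache

# Hypothetical workflow (not from the paper):
# task -> (compute seconds, input file size in MB, successor tasks)
WORKFLOW = {
    "split":   (2.0, 100.0, ["align_a", "align_b"]),
    "align_a": (10.0, 400.0, ["merge"]),
    "align_b": (6.0, 50.0, ["merge"]),
    "merge":   (3.0, 200.0, []),
}

# Assumed read bandwidth in MB/s per storage tier (illustrative numbers only).
TIER_BANDWIDTH = {"local_disk": 200.0, "shared_fs": 50.0}


def task_cost(task, placement):
    """Compute time plus the time to read the task's input from its tier."""
    compute, size_mb, _ = WORKFLOW[task]
    return compute + size_mb / TIER_BANDWIDTH[placement[task]]


def upward_ranks(placement):
    """Length of the longest path from each task to the end of the DAG."""
    @lru_cache(maxsize=None)
    def rank(task):
        successors = WORKFLOW[task][2]
        tail = max((rank(s) for s in successors), default=0.0)
        return task_cost(task, placement) + tail

    return {task: rank(task) for task in WORKFLOW}


def critical_path(placement):
    """Return (tasks on the critical path, critical path length in seconds)."""
    ranks = upward_ranks(placement)
    # With positive costs, the highest-ranked task is always an entry task.
    task = max(ranks, key=ranks.get)
    path = [task]
    while WORKFLOW[task][2]:
        task = max(WORKFLOW[task][2], key=ranks.get)
        path.append(task)
    return path, ranks[path[0]]


if __name__ == "__main__":
    # Baseline: every input file lives on the shared file system.
    baseline = {task: "shared_fs" for task in WORKFLOW}
    path, length = critical_path(baseline)
    print(path, length)

    # Move only the files read by critical-path tasks to the faster tier.
    tuned = dict(baseline, **{task: "local_disk" for task in path})
    print(critical_path(tuned))
```

In this toy setting, relocating only the files read by critical-path tasks shortens the longest path without touching the rest of the workflow, which is the intuition behind combining critical-path priorities with file placement. A real scheduler would also have to account for tier capacity, concurrent workflows, and write traffic, none of which is modeled here.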

Bibliographic details

  • Source
    Future Generation Computer Systems | 2017, Issue 9 | pp. 51-62 | 12 pages
  • Author affiliations

    Computer Architecture & Operating Systems Department (CAOS), Universitat Autònoma de Barcelona (UAB), Bellaterra (Barcelona), Spain;

    Computer Architecture & Operating Systems Department (CAOS), Universitat Autònoma de Barcelona (UAB), Bellaterra (Barcelona), Spain;

    Computer Architecture & Operating Systems Department (CAOS), Universitat Autònoma de Barcelona (UAB), Bellaterra (Barcelona), Spain;

    Computer Architecture & Operating Systems Department (CAOS), Universitat Autònoma de Barcelona (UAB), Bellaterra (Barcelona), Spain;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Multiworkflows; Cluster; Scheduler; Simulation; Critical path; Data processing;

