Dataflow detection and applications to workflow scheduling

Abstract

In high-performance computing (HPC) workloads (i.e., the set of computations to be completed), the same computational workflow of jobs (e.g., a Pipeline, a Fork&Join, or a Lattice graph) may be applied to different input files and parameters. Each of these workflow instances has the same workflow shape, but may access separate input, intermediate, and output files. Therefore, selectively isolating each workflow instance can be important for maximizing scheduling flexibility and performance. In practice, however, realizing this benefit is not straightforward because of a variety of problems and constraints. For example, when different workflow instances interact without mediation, concurrent instances can overwrite common files because their filenames conflict. For a control-flow-driven batch scheduler, this results either in unsafe computation when multiple instances share the same sub-directory or in storage overhead when multiple directories are used. We propose a novel approach that selectively couples and integrates job schedulers and file systems, known as a Workflow-aware File System (WaFS), with two major benefits. First, separate namespaces can be constructed on a per-instance basis to maximize the concurrency of workflow instances, despite filename conflicts, while minimizing storage overhead. Second, by exploiting inferred dataflow information, trade-offs can be made between makespan and storage overhead while maintaining correctness. Through a simulation-based study, we have shown the potential benefits of WaFS to job concurrency, and we have characterized the trade-offs that can be made between storage overhead and performance. WaFS makes possible new scheduling policies, namely Versioned Namespace (VNS), Overwrite-Safe Concurrency (OSC), and hybrids of the two, each with different advantages and disadvantages.
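
The filename-conflict problem and the per-instance namespace idea described in the abstract can be illustrated with a small sketch. This is not the WaFS implementation or its API: the class `InstanceNamespace`, its methods, and the file names `input.dat`, `tmp.dat`, and `out.dat` are all hypothetical, and an in-memory dictionary stands in for a file system. The sketch only assumes that writes go to a private per-instance layer while unmodified inputs are read from a single shared copy.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceNamespace:
    """Hypothetical per-instance view of a shared directory (not the WaFS API)."""
    instance_id: str
    shared: dict[str, str]  # logical name -> contents, one copy for all instances
    private: dict[str, str] = field(default_factory=dict)  # this instance's writes

    def write(self, name: str, data: str) -> None:
        # Writes never touch the shared store, so instances cannot
        # overwrite each other even when they use the same filename.
        self.private[name] = data

    def read(self, name: str) -> str:
        # Prefer this instance's own version; fall back to the shared copy.
        if name in self.private:
            return self.private[name]
        return self.shared[name]


shared_files = {"input.dat": "x"}
a = InstanceNamespace("A", shared_files)
b = InstanceNamespace("B", shared_files)

# Interleave a two-job pipeline for both instances. Both use the names
# tmp.dat and out.dat, which would conflict in a single plain directory.
a.write("tmp.dat", a.read("input.dat") + "*2")     # instance A, job 1
b.write("tmp.dat", b.read("input.dat") + "*3")     # instance B, job 1
a.write("out.dat", f"result({a.read('tmp.dat')})")  # instance A, job 2
b.write("out.dat", f"result({b.read('tmp.dat')})")  # instance B, job 2
```

Under these assumptions each instance reads back only its own intermediate file, and the shared input is stored once rather than copied into a directory per instance, which is the storage-versus-concurrency tension the abstract refers to.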

Bibliographic record

  • Source
    Concurrency, practice and experience | 2011, issue 11 | pp. 1261-1283 | 23 pages
  • Authors

    Yang Wang; Paul Lu;

  • Author affiliations

    Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8, Yang Wang, National University of Singapore, Singapore;

    Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    dataflow; concurrency; storage;
