Raw data queries during data-intensive parallel workflow execution


Abstract

Computer simulations consume and produce huge amounts of raw data files in different formats, e.g., HDF5 in computational fluid dynamics simulations. Users often need to analyze domain-specific data based on related data elements from multiple files while a simulation is still running. In a raw data analysis, one must identify regions of interest in the data space and retrieve the content of specific related raw data files. Existing solutions, such as FastBit and RAW, are limited to the analysis of a single raw data file and can only be used after the simulation has finished. Scientific Workflow Management Systems (SWMS) can manage the dataflow of computer simulations and register related raw data files in a provenance database. This paper aims to combine the advantages of a dataflow-aware SWMS with raw data file analysis techniques to allow queries on raw data file elements that are related but reside in separate files. We propose a component-based architecture, named ARMFUL (Analysis of Raw data from Multiple Files), with raw data extraction and indexing techniques, which allows direct access to specific elements or regions of the raw data space. ARMFUL innovates by using a SWMS provenance database to add a dataflow access path to raw data files. ARMFUL facilitates the invocation of ad-hoc programs and third-party tools (e.g., the FastBit tool) for raw data analyses. In our experiments, a real parallel computational fluid dynamics simulation is executed, exploring different alternatives for raw data extraction, indexing, and analysis.
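The idea of a dataflow access path through a provenance database can be illustrated with a minimal sketch. It is not ARMFUL's actual schema or API, which the abstract does not describe: the table names (`raw_file`, `dataflow`, `extracted`), the file names, the `velocity` attribute, and all values are hypothetical, and SQLite stands in for whatever database a real SWMS would use. The sketch only shows how registering files and their dataflow links lets one query relate elements that live in separate raw files.

```python
import sqlite3


def build_provenance(conn):
    """Create a toy provenance database (hypothetical schema, made-up rows)."""
    conn.executescript("""
        CREATE TABLE raw_file (file_id INTEGER PRIMARY KEY, activity TEXT, path TEXT);
        -- One row per dataflow link: src_file was consumed to produce dst_file.
        CREATE TABLE dataflow (src_file INTEGER, dst_file INTEGER);
        -- Values extracted from raw files while the workflow runs.
        CREATE TABLE extracted (file_id INTEGER, element_id INTEGER,
                                attribute TEXT, value REAL);
        CREATE INDEX idx_extracted ON extracted (attribute, value);
    """)
    conn.executemany("INSERT INTO raw_file VALUES (?, ?, ?)", [
        (1, "mesh_generation", "mesh_0.h5"),
        (2, "solver", "solution_0.h5"),
        (3, "mesh_generation", "mesh_1.h5"),
        (4, "solver", "solution_1.h5"),
    ])
    conn.executemany("INSERT INTO dataflow VALUES (?, ?)", [(1, 2), (3, 4)])
    conn.executemany("INSERT INTO extracted VALUES (?, ?, ?, ?)", [
        (2, 10, "velocity", 0.4),
        (2, 11, "velocity", 2.7),
        (4, 10, "velocity", 3.1),
        (4, 11, "velocity", 0.2),
    ])


def related_inputs(conn, attribute, threshold):
    """Find elements above a threshold and the input file each result derives from."""
    rows = conn.execute("""
        SELECT src.path, dst.path, e.element_id, e.value
        FROM extracted AS e
        JOIN raw_file AS dst ON dst.file_id = e.file_id
        JOIN dataflow AS d ON d.dst_file = dst.file_id
        JOIN raw_file AS src ON src.file_id = d.src_file
        WHERE e.attribute = ? AND e.value > ?
        ORDER BY e.value
    """, (attribute, threshold))
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
build_provenance(conn)
hits = related_inputs(conn, "velocity", 1.0)
for src_path, dst_path, element_id, value in hits:
    print(f"{dst_path} element {element_id} = {value} (derived from {src_path})")
```

In this toy setup the region of interest is selected from the extracted values, and the join through `dataflow` reaches the related input file without opening every raw file, which is the kind of cross-file query the abstract describes.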

Bibliographic record

  • Source
    Future Generation Computer Systems | 2017, Issue 10 | pp. 402-422 | 21 pages
  • Author affiliations

    Department of Computer Science, COPPE, Federal University of Rio de Janeiro, Brazil;

    Department of Computer Science, COPPE, Federal University of Rio de Janeiro, Brazil;

    High Performance Computing Center, COPPE, Federal University of Rio de Janeiro, Brazil, Department of Civil Engineering, COPPE, Federal University of Rio de Janeiro, Brazil;

    Institute of Computing, Fluminense Federal University, Brazil;

    High Performance Computing Center, COPPE, Federal University of Rio de Janeiro, Brazil, Department of Civil Engineering, COPPE, Federal University of Rio de Janeiro, Brazil;

    Inria, France, URMM, France;

    Department of Computer Science, COPPE, Federal University of Rio de Janeiro, Brazil;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords

    Scientific workflows; Dataflow; Raw data analysis; Index raw data;

