
Querying Large Scientific Data Sets with Adaptable IO System ADIOS


Abstract

When working with a large dataset, a relatively small fraction of data records are of interest in each analysis operation. For example, while examining a billion-particle dataset from an accelerator model, the scientists might focus on a few thousand fastest particles, or on the particle farthest from the beam center. In general, this type of selective data access is challenging because the selected data records could be anywhere in the dataset and require a significant amount of time to locate and retrieve. In this paper, we report our experience of addressing this data access challenge with the Adaptable IO System ADIOS. More specifically, we design a query interface for ADIOS to allow arbitrary combinations of range conditions on known variables, implement a number of different mechanisms for resolving these selection conditions, and devise strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanism can retrieve the selected data records orders of magnitude faster than the brute-force approach. Our work relies heavily on the in situ data processing feature of ADIOS to allow user functions to be executed in the data transport pipeline. This feature allows us to build indexes for efficient query processing, and to perform other intricate analyses while the data is in memory.
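The contrast the abstract draws between brute-force scanning and index-assisted selection can be illustrated with a minimal Python sketch. This is not the ADIOS query interface and not the authors' indexing mechanism; it assumes a simple per-block min/max summary as a stand-in index, and every name in it (`BlockSummary`, `build_minmax_index`, `indexed_query`, `and_conditions`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockSummary:
    """Min/max summary for one contiguous block of records (hypothetical)."""
    start: int   # index of the first record in the block
    stop: int    # one past the last record in the block
    lo: float    # smallest value in the block
    hi: float    # largest value in the block


def build_minmax_index(values, block_size):
    """Summarize each fixed-size block by its minimum and maximum value."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append(BlockSummary(start, start + len(block), min(block), max(block)))
    return index


def brute_force_query(values, lo, hi):
    """Check every record against the range condition lo <= v <= hi."""
    return [i for i, v in enumerate(values) if lo <= v <= hi]


def indexed_query(values, index, lo, hi):
    """Skip blocks whose [min, max] cannot overlap [lo, hi]; scan the rest.

    Returns the matching record positions and the number of records scanned.
    """
    hits, scanned = [], 0
    for block in index:
        if block.hi < lo or block.lo > hi:
            continue  # no record in this block can satisfy the condition
        scanned += block.stop - block.start
        for i in range(block.start, block.stop):
            if lo <= values[i] <= hi:
                hits.append(i)
    return hits, scanned


def and_conditions(*hit_lists):
    """Combine range conditions on several variables with AND."""
    common = set(hit_lists[0])
    for hits in hit_lists[1:]:
        common &= set(hits)
    return sorted(common)


if __name__ == "__main__":
    # Synthetic data (an assumption): energy grows with record position,
    # so most blocks are pruned; radius has no such locality.
    n = 100_000
    energy = [i * 0.001 for i in range(n)]
    radius = [((i * 7919) % 1000) / 1000.0 for i in range(n)]

    energy_index = build_minmax_index(energy, block_size=1000)
    fast_hits, scanned = indexed_query(energy, energy_index, 99.5, 100.0)
    near_axis = brute_force_query(radius, 0.0, 0.01)
    selected = and_conditions(fast_hits, near_axis)
    print(f"selected {len(selected)} records; scanned {scanned} of {n} for energy")
```

The pruning only pays off when values in a block are clustered, as with the synthetic `energy` column here; for a variable without locality, such as `radius`, almost every block overlaps the query range and the cost approaches that of the brute-force scan.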

Bibliographic details

  • Source
    Supercomputing Frontiers | 2018 | pp. 51-69 | 19 pages
  • Conference location: Singapore (SG)
  • Author affiliations

    Lawrence Berkeley National Laboratory (LBNL), Berkeley, USA;

    Oak Ridge National Laboratory (ORNL), Oak Ridge, USA;

    Oak Ridge National Laboratory (ORNL), Oak Ridge, USA;

    Lawrence Berkeley National Laboratory (LBNL), Berkeley, USA;

    Lawrence Berkeley National Laboratory (LBNL), Berkeley, USA

  • Original format: PDF
  • Language: English
