Frontiers in Neuroinformatics

Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format


Abstract

Natural sciences generate an increasing amount of data in a wide range of formats developed by different research groups and commercial companies. At the same time there is a growing desire to share data along with publications in order to enable reproducible research. Open formats have publicly available specifications which facilitate data sharing and reproducible research. Hierarchical Data Format 5 (HDF5) is a popular open format widely used in neuroscience, often as a foundation for other, more specialized formats. However, drawbacks related to HDF5's complex specification have initiated a discussion for an improved replacement. We propose a novel alternative, the Experimental Directory Structure (Exdir), an open specification for data storage in experimental pipelines which amends drawbacks associated with HDF5 while retaining its advantages. HDF5 stores data and metadata in a hierarchy within a complex binary file which, among other things, is not human-readable, not optimal for version control systems, and lacks support for easy access to raw data from external applications. Exdir, on the other hand, uses file system directories to represent the hierarchy, with metadata stored in human-readable YAML files, datasets stored in binary NumPy files, and raw data stored directly in subdirectories. Furthermore, storing data in multiple files makes it easier to track for version control systems. Exdir is not a file format in itself, but a specification for organizing files in a directory structure. Exdir uses the same abstractions as HDF5 and is compatible with the HDF5 Abstract Data Model. Several research groups are already using data stored in a directory hierarchy as an alternative to HDF5, but no common standard exists. This complicates and limits the opportunity for data sharing and development of common tools for reading, writing, and analyzing data. 
Exdir facilitates improved data storage, data sharing, reproducible research, and novel insight from interdisciplinary collaboration. With the publication of Exdir, we invite the scientific community to join the development to create an open specification that will serve as many needs as possible and as a foundation for open access to and exchange of data.
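
The layout described in the abstract (directories for the hierarchy, YAML files for metadata, NumPy files for datasets) can be illustrated with a minimal sketch. This is not the Exdir reference implementation: the file names `attributes.yaml` and `data.npy`, the helper functions, and the example values are assumptions chosen for illustration, and the YAML writer handles only flat key-value pairs.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_yaml(path, mapping):
    """Write a flat mapping as plain 'key: value' YAML lines (no nesting or quoting)."""
    lines = [f"{key}: {value}" for key, value in mapping.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_group(parent, name, attributes):
    """A group is a directory; its metadata lives in a human-readable YAML file."""
    group = parent / name
    group.mkdir(parents=True, exist_ok=True)
    write_yaml(group / "attributes.yaml", attributes)  # file name is an assumption
    return group


def make_dataset(group, name, data, attributes):
    """A dataset is a directory holding a binary NumPy file plus YAML metadata."""
    dataset = group / name
    dataset.mkdir()
    np.save(dataset / "data.npy", np.asarray(data))  # file name is an assumption
    write_yaml(dataset / "attributes.yaml", attributes)
    return dataset


def build_example(base):
    """Create a small hierarchy under `base` and return its root directory."""
    root = make_group(Path(base), "experiment.exdir", {"description: ".rstrip(": "): "demo"})
    session = make_group(root, "session_1", {"sampling_rate_hz": 30000})
    make_dataset(session, "voltage", [0.1, 0.2, 0.3], {"unit": "mV"})
    return root


# Usage: build the hierarchy in a temporary directory and list the files created.
example_root = build_example(tempfile.mkdtemp())
for path in sorted(example_root.rglob("*")):
    if path.is_file():
        print(path.relative_to(example_root))
```

Because every piece is an ordinary file, the metadata can be read in a text editor and diffed by a version control system, and the array can be loaded by any tool that reads NumPy files, without going through a dedicated library.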
