International Conference on Scientific Computing

Using the Centinel Data Format to Decouple Data Creation from Data Processing in Scientific Programs



Abstract

Multi-dimensional numerical arrays are a staple of many scientific computer programs, where processing may be intricate but where data structures can be simple. Data for these arrays may be read into the program from text files assembled in advance, often laboriously from multiple sources or from large-scale databases. Notwithstanding the simple structure of such files, their multi-dimensional nature and the very regularity of their data make it difficult or impossible to know by inspection that they are assembled exactly as required by the processing programs. Moreover, data errors may appear inadvertently when some parts of a file are altered unintentionally while other parts are being edited intentionally. Verifying the correctness of scientific programs is hindered by such difficulties. Here we describe how we have applied the Centinel archival data format to such problems. Centinel (1) provides a format that can be read without difficulty by both people and computers, (2) keeps all metadata locally in the same files as the data themselves, and (3) optionally protects the data with error-correcting codes on each row, from the time the data are prepared until they are finally processed. In addition, we show how we have used the Centinel format to produce prototypes of large datasets for initial program testing before the actual data have been prepared. This effort is one step in the uncompromising process of ensuring that complex scientific programs rigorously perform the tasks they are intended to do.
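The abstract does not give the Centinel file layout, so the following is only an illustrative sketch of the general idea: a text file that a person can read, that carries its metadata in the same file, and that protects each data row with a check field. Everything here is an assumption made for illustration. The `# key: value` header lines, the comma-separated rows, and the function names `write_rows` and `read_rows` are hypothetical and are not the Centinel format. The sketch also substitutes a CRC-32 checksum, which only detects errors, for the error-correcting codes the abstract mentions.

```python
import zlib


def write_rows(header, rows):
    """Build text lines: metadata lines first, then one checked line per data row.

    Hypothetical layout, not the Centinel format.
    """
    lines = [f"# {key}: {value}" for key, value in header.items()]
    for row in rows:
        # repr() of a float round-trips exactly through float().
        body = ",".join(repr(float(x)) for x in row)
        # CRC-32 of the row text, appended as a final 8-digit hex field.
        crc = zlib.crc32(body.encode("ascii"))
        lines.append(f"{body},{crc:08x}")
    return lines


def read_rows(lines):
    """Parse lines from write_rows, raising ValueError on a row whose check fails."""
    header, rows = {}, []
    for number, line in enumerate(lines, start=1):
        if line.startswith("# "):
            key, _, value = line[2:].partition(": ")
            header[key] = value
            continue
        # The last comma-separated field is the stored checksum.
        body, _, crc_text = line.rpartition(",")
        if zlib.crc32(body.encode("ascii")) != int(crc_text, 16):
            raise ValueError(f"line {number}: checksum mismatch")
        rows.append([float(x) for x in body.split(",")])
    return header, rows
```

Because the metadata and the check fields travel in the same file as the data, a processing program written along these lines could refuse a row that was altered by accident during editing, rather than silently computing with it.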
