Conference: IEEE International Symposium on Parallel & Distributed Processing

PreDatA – preparatory data analytics on peta-scale machines

Abstract

Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists desire to gain insights into selected data characteristics 'hidden' or 'latent' in these massive datasets while data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by the large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and by staging simulations' output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through asynchronous data movement to reduce write latency, application-specific operations on streaming data that are able to discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, as well as for data exchange between concurrently running simulations.
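The staging idea described in the abstract can be illustrated with a small, single-machine sketch. This is not the PreDatA middleware or its API: a Python thread stands in for a staging node, an in-memory queue stands in for asynchronous data movement, and every name here (`staging_worker`, `run_simulation`, `run_pipeline`, the synthetic chunk formula) is a hypothetical chosen for illustration. The sketch only shows the general pattern of handing output off without waiting, then sorting and annotating it in transit.

```python
import queue
import threading


def staging_worker(inbox, store):
    """Stand-in for a staging node: drains chunks and processes them in transit."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: the simulation has finished
            break
        step, values = item
        ordered = sorted(values)  # reorganize data to speed up later access
        store[step] = {
            "data": ordered,
            "min": ordered[0],   # metadata annotation computed while streaming
            "max": ordered[-1],
            "count": len(ordered),
        }


def run_simulation(num_steps, inbox):
    """Stand-in for a simulation: emits one synthetic chunk per step."""
    for step in range(num_steps):
        chunk = [(step * 7 + i * 13) % 10 for i in range(5)]
        # The queue is unbounded, so put() returns without waiting for the
        # staging worker; this models the asynchronous hand-off.
        inbox.put((step, chunk))
    inbox.put(None)


def run_pipeline(num_steps):
    """Wire the simulation to one staging worker and return the staged output."""
    inbox = queue.Queue()
    store = {}
    worker = threading.Thread(target=staging_worker, args=(inbox, store))
    worker.start()
    run_simulation(num_steps, inbox)
    worker.join()
    return store


store = run_pipeline(3)
```

In this toy version the "simulation" never blocks on sorting or metadata extraction; that work happens on the worker thread, which is the role the abstract assigns to the dedicated staging nodes.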