Published in: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

PreDatA – preparatory data analytics on peta-scale machines


Abstract

Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high-performance storage, and to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists want to gain insight into selected data characteristics 'hidden' or 'latent' in these massive datasets while the data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and by staging simulations' output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through asynchronous data movement to reduce write latency, application-specific operations on streaming data that can discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, as well as for data exchange between concurrently running simulations.
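
The staging idea in the abstract can be illustrated with a small, single-process sketch. This is not the PreDatA middleware or its API: the names `staging_worker` and `run_simulation`, the in-memory `store` and `index`, and the stand-in data are all assumptions made for illustration. A thread plays the role of a staging node, a queue plays the role of asynchronous data movement, and sorting plus min/max annotation stand in for in-transit reorganization and metadata annotation.

```python
import queue
import threading


def staging_worker(inbox, store, index):
    """Hypothetical staging node: drain chunks, process them in transit, store them."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: the simulation has finished
            break
        step, values = item
        ordered = sorted(values)  # reorganize the data before it reaches storage
        store[step] = ordered
        # Metadata annotation: per-chunk min/max lets later readers skip chunks.
        index[step] = {"min": ordered[0], "max": ordered[-1], "count": len(ordered)}


def run_simulation(num_steps):
    """Hypothetical simulation: hands each step's output to staging and moves on."""
    inbox = queue.Queue()
    store, index = {}, {}
    worker = threading.Thread(target=staging_worker, args=(inbox, store, index))
    worker.start()
    for step in range(num_steps):
        # Stand-in for real simulation output (assumed, purely illustrative).
        values = [(step * 7 + i * 13) % 10 for i in range(5)]
        inbox.put((step, values))  # returns at once; the producer never waits on storage
    inbox.put(None)
    worker.join()
    return store, index


if __name__ == "__main__":
    store, index = run_simulation(3)
    print(index)
```

The producer only pays the cost of enqueueing a chunk, which is the sense in which asynchronous movement hides write latency; on a real machine the queue would be a network transfer to dedicated nodes rather than a thread hand-off.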
