PreDatA – preparatory data analytics on peta-scale machines

Abstract

Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high-performance storage, and to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists want to gain insights into selected data characteristics ‘hidden’ or ‘latent’ in these massive datasets while the data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as ‘staging’ nodes and by staging the simulations' output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving the data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through asynchronous data movement to reduce write latency, application-specific operations on streaming data that can discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulations.
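The staging idea in the abstract can be illustrated with a toy, single-process sketch. This is not the PreDatA middleware or its API: a Python thread stands in for a staging node, an in-memory queue stands in for asynchronous data movement, and the names `staging_worker`, `run_simulation`, and `steps_overlapping` are hypothetical. Each chunk is assumed to be a non-empty list of numbers. The sketch only shows the pattern of handing output off without waiting, then sorting and annotating it in transit so later reads can skip irrelevant chunks.

```python
import queue
import threading


def staging_worker(inbox, store):
    """Drain chunks from the queue, sort each one, and record min/max metadata."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: the producer has finished
            break
        step, values = item
        ordered = sorted(values)  # data reorganization on the "staging node"
        store[step] = {"data": ordered, "min": ordered[0], "max": ordered[-1]}


def run_simulation(chunks):
    """Hand each output chunk to the staging thread and return the staged store."""
    inbox = queue.Queue()  # unbounded, so put() returns without waiting
    store = {}
    worker = threading.Thread(target=staging_worker, args=(inbox, store))
    worker.start()
    for step, values in enumerate(chunks):
        inbox.put((step, list(values)))  # asynchronous handoff of one time step
    inbox.put(None)
    worker.join()
    return store


def steps_overlapping(store, lo, hi):
    """Use the min/max annotations to list steps whose values may fall in [lo, hi]."""
    return [s for s, rec in sorted(store.items())
            if rec["max"] >= lo and rec["min"] <= hi]
```

For example, `steps_overlapping(run_simulation([[3, 1, 2], [9, 7, 8]]), 0, 2)` consults only the annotations and never scans the data of the second chunk.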

Bibliographic details

Source
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-12 (12 pages)
Conference location: Atlanta, GA (US)
Authors
Zheng Fang; Abbasi Hasan; Docan Ciprian; Lofstead Jay; Liu Qing; Klasky Scott; Parashar Manish; Podhorszki Norbert; Schwan Karsten; Wolf Matthew
Author affiliation

College of Computing, Georgia Institute of Technology, Atlanta, GA 30332;

Original format: PDF
Language: English
Chinese Library Classification: TP311.133