PreDatA – preparatory data analytics on peta-scale machines

Abstract

Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. To support high-performance storage and to be useful to scientific end users, such data must have its layout organized, and must be indexed, sorted, and otherwise manipulated for subsequent presentation, visualization, and detailed analysis. In addition, scientists want to gain insight into selected data characteristics that are 'hidden' or 'latent' in these massive datasets while the simulations are still producing the data. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and routing the simulations' output data through them, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving the data into file systems and storage. The PreDatA middleware supports such in-transit manipulations through asynchronous data movement to reduce write latency, application-specific operations on streaming data that can discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulations.
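
The abstract describes the staging pattern only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions; it is not PreDatA's middleware or API. `StagingNode`, `write`, `close`, `ChunkMeta`, and `chunks_in_range` are hypothetical names. A background thread and a bounded `queue.Queue` stand in for dedicated staging nodes and the interconnect, running min/max/mean stand in for "discovering latent data characteristics", and a per-chunk min/max index stands in for metadata annotation that can speed up later range queries.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ChunkMeta:
    """Hypothetical per-chunk annotation recorded alongside the data."""
    step: int
    rank: int
    count: int
    lo: float
    hi: float


class StagingNode:
    """Hypothetical staging node: drains output chunks on its own thread,
    keeps running statistics, and builds a min/max index per chunk."""

    def __init__(self, capacity=16):
        # Bounded buffer: put() returns immediately unless the buffer is full,
        # in which case the writer blocks (simple backpressure).
        self.inbox = queue.Queue(maxsize=capacity)
        self.index = []
        self.count = 0
        self.total = 0.0
        self.lo = float("inf")
        self.hi = float("-inf")
        self._worker = threading.Thread(target=self._drain)
        self._worker.start()

    def write(self, step, rank, values):
        """Called by a simulation rank: hand the chunk off and return."""
        self.inbox.put((step, rank, list(values)))

    def close(self):
        """Signal end of output and wait until every chunk is processed."""
        self.inbox.put(None)
        self._worker.join()

    def _drain(self):
        # In-transit processing happens here, off the simulation's path.
        while True:
            item = self.inbox.get()
            if item is None:  # sentinel sent by close()
                return
            step, rank, values = item
            if not values:
                continue
            lo, hi = min(values), max(values)
            self.index.append(ChunkMeta(step, rank, len(values), lo, hi))
            self.count += len(values)
            self.total += sum(values)
            self.lo = min(self.lo, lo)
            self.hi = max(self.hi, hi)

    def mean(self):
        return self.total / self.count if self.count else float("nan")

    def chunks_in_range(self, lo, hi):
        """Use the index to list chunks that *may* hold values in [lo, hi]."""
        return [m for m in self.index if m.hi >= lo and m.lo <= hi]


if __name__ == "__main__":
    node = StagingNode()
    for step in range(2):
        for rank in range(3):
            # Synthetic output: rank r at step s emits 10*r + s + k, k = 0..3.
            node.write(step, rank, [10 * rank + step + k for k in range(4)])
    node.close()
    print(node.count, node.lo, node.hi, node.mean())  # 24 0 24 12.0
    print([(m.step, m.rank) for m in node.chunks_in_range(20, 21)])  # [(0, 2), (1, 2)]
```

The design choice to illustrate is the decoupling: the writer's cost is one enqueue, while statistics and indexing run concurrently. `close()` must be called, since the non-daemon worker thread otherwise keeps the process alive. In a real deployment the hand-off would cross the machine's interconnect to separate nodes rather than a thread boundary.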

Bibliographic details

Source
IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, 12 pages
Authors
Fang Zheng; Abbasi H.; Docan C.; Lofstead J.; Qing Liu; Klasky S.; Parashar M.; Podhorszki N.; Schwan K.; Wolf M.
Full-text format: PDF
Chinese Library Classification: TP311.138-53

Similar documents

1. Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytics systems [J]. Balayn Agathe, Lofi Christoph, Houben Geert-Jan. The VLDB Journal, 2021, no. 5.

2. BIG DATA ANALYTICS AND PRECISION ANIMAL AGRICULTURE SYMPOSIUM: Machine learning and data mining advance predictive big data analysis in precision animal agriculture [J]. Morota Gota, Ventura Ricardo V., Silva Fabyano F. Journal of Animal Science, 2018, no. 4.

3. Big data analytics using Splunk: deriving operational intelligence from social media, machine data, existing data warehouses, and other real-time streaming sources [J]. Alessandro Berni. Computing Reviews, 2014, no. 5.

4. PreDatA – preparatory data analytics on peta-scale machines [C]. Fang Zheng, Abbasi H., Docan C. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.

5. Statistical Approaches for Big Data Analytics and Machine Learning: Data-Driven Network Reconstruction and Predictive Modeling of Time Series Biological Systems [D]. Farhangmehr, Farzaneh. 2014.

6. BIG DATA ANALYTICS AND PRECISION ANIMAL AGRICULTURE SYMPOSIUM: Machine learning and data mining advance predictive big data analysis in precision animal agriculture [O]. Gota Morota, Ricardo V Ventura, Fabyano F Silva. 2018.

7. Statistics and machine learning methods for EHR data – from data extraction to data analytics [O]. Madan G. Kundu. 2021.

