
Implementation of parallel NetCDF in the ParFlow hydrological model: A code modernisation effort as part of a big data handling strategy


Abstract

State-of-the-art geoscience simulations are tending towards ever-increasing model complexity, owing to the incorporation of multi-physics and fully coupled model systems, often in combination with higher spatial resolutions. In addition, simulations are being run for longer time periods in order to model phenomena such as climate change and water resources. Combined, these factors lead to a big data challenge. This challenge is dominated by the TB-scale data volumes involved, i.e. by I/O, whereas data variety, velocity and complexity are comparatively minor issues. In this context, the NIC Scientific Big Data Analytics project “Towards a high-performance big data storage, handling and analysis framework for Earth science simulations” has been working since autumn 2015 on a code modernisation effort towards big data readiness of geoscience simulation codes as well as of data processing and analysis applications. The simulation code considered is the massively MPI-parallel hydrological model ParFlow.

Thus far, work has centred on the modernisation of ParFlow's parallel I/O. A standalone C code was used to assess and test the features and the parallel read and write performance of the pNetCDF and the HDF5-based NetCDF4 I/O libraries. Tuning and scaling studies on the JSC/JURECA HPC system led to optimised runtime environment settings and near-linear scaling behaviour of the API. This MPI C code can serve as a showcase implementation of parallel I/O for some of the Geoverbund ABC/J modelling groups. The NetCDF4 interface was chosen because it constitutes a quasi-standard in the geosciences and ensures consistent and efficient data flow paths and compression. The I/O tests and scaling experiments were carried out in a JUBE2-based benchmarking framework that also integrates the Score-P profiling and tracing infrastructure, the Scalasca performance optimisation tool and the Darshan HPC I/O characterisation tool. This JUBE2-based framework was then extended into a portable, generic testing platform for all benchmarking, development and testing work with ParFlow, including idealised and real-data reference test cases for weak and strong scaling studies, a variety of compiler options and common profiling tools, all embedded in an easy-to-use run environment.

To further improve ParFlow's I/O functionality, we propose adding NetCDF4 interfaces that write concurrently to a shared, compressed NetCDF file with one MPI task per node. The proposed code will automatically adjust to the computational set-up, e.g. the gathering of data on a single node, the number of nodes, the I/O interfaces and the number of MPI ranks per node. Another obvious big data challenge for complex geoscience simulations is the post-processing of terabytes of data. We therefore plan to develop on-the-fly processing and visualisation for ParFlow once the I/O optimisation is finished. This will be a joint effort with the JSC Cross-Sectional Team Visualisation to implement in-situ (i.e. during runtime) processing and visualisation functionality using the VisIt software. This will also help to improve scalability and performance while substantially reducing total processing time and model output.
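
As an illustration of the kind of standalone parallel-I/O test described above, the following minimal C sketch writes one shared NetCDF4/HDF5 file collectively from all MPI ranks. It is not the project's benchmark code: the file name, grid dimensions and variable name are placeholder assumptions, and the global grid size is assumed to divide evenly across the ranks.

```c
/*
 * Minimal sketch of a parallel NetCDF4 (HDF5-backed) write test.
 * Illustrative only; names and sizes are placeholders, not the
 * project's actual standalone benchmark code.
 */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>   /* nc_create_par, nc_var_par_access */

#define NX_GLOBAL 1024    /* global grid size along the decomposed axis */
#define NY        512

static void check(int status, const char *msg)
{
    if (status != NC_NOERR) {
        fprintf(stderr, "%s: %s\n", msg, nc_strerror(status));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimids[2], varid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* 1D domain decomposition along x: each rank owns a contiguous slab
     * (NX_GLOBAL is assumed to be divisible by the number of ranks). */
    size_t nx_local = NX_GLOBAL / nprocs;
    size_t start[2] = { rank * nx_local, 0 };
    size_t count[2] = { nx_local, NY };

    float *data = malloc(nx_local * NY * sizeof(float));
    for (size_t i = 0; i < nx_local * NY; i++)
        data[i] = (float)rank;   /* rank-dependent dummy payload */

    /* All ranks create/open one shared NetCDF4 file via MPI-IO. */
    check(nc_create_par("pario_test.nc", NC_NETCDF4 | NC_MPIIO,
                        MPI_COMM_WORLD, MPI_INFO_NULL, &ncid),
          "nc_create_par");

    check(nc_def_dim(ncid, "x", NX_GLOBAL, &dimids[0]), "nc_def_dim x");
    check(nc_def_dim(ncid, "y", NY, &dimids[1]), "nc_def_dim y");
    check(nc_def_var(ncid, "pressure", NC_FLOAT, 2, dimids, &varid),
          "nc_def_var");
    check(nc_enddef(ncid), "nc_enddef");

    /* Collective access is usually the better-performing choice at scale. */
    check(nc_var_par_access(ncid, varid, NC_COLLECTIVE), "nc_var_par_access");

    /* Every rank writes its own hyperslab of the shared variable. */
    check(nc_put_vara_float(ncid, varid, start, count, data),
          "nc_put_vara_float");

    check(nc_close(ncid), "nc_close");
    free(data);
    MPI_Finalize();
    return 0;
}
```

Built against an MPI-enabled netcdf-c installation (e.g. `mpicc pario_test.c -lnetcdf`), each rank contributes one hyperslab of the shared `pressure` variable; a read benchmark follows the same pattern with nc_open_par and nc_get_vara_float.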
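The proposed node-level aggregation could, for example, be organised as in the following sketch: MPI-3 shared-memory communicators identify the ranks on each node, the node's data are gathered onto one leader rank, and only those leaders would take part in the parallel NetCDF4 write. The communicator handling, buffer layout and payload size here are illustrative assumptions, not the interface planned for ParFlow.

```c
/*
 * Hedged sketch of node-level I/O aggregation: one MPI task per node
 * gathers the node-local data; only these "I/O leaders" would perform
 * the shared NetCDF4 write (as in the previous sketch).
 */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the ranks that share a node (MPI-3 shared-memory split). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Local rank 0 on every node becomes the I/O leader; the leaders form
     * the communicator that would be passed to nc_create_par(). */
    MPI_Comm io_comm;
    MPI_Comm_split(MPI_COMM_WORLD,
                   node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &io_comm);

    /* Each rank contributes a fixed-size chunk (placeholder payload). */
    const int chunk = 1024;
    double *local = malloc(chunk * sizeof(double));
    for (int i = 0; i < chunk; i++)
        local[i] = (double)world_rank;

    double *node_buf = NULL;
    if (node_rank == 0)
        node_buf = malloc((size_t)chunk * node_size * sizeof(double));

    /* Gather the node's data onto its leader before writing. */
    MPI_Gather(local, chunk, MPI_DOUBLE,
               node_buf, chunk, MPI_DOUBLE, 0, node_comm);

    if (node_rank == 0) {
        /* The leader would now open the shared NetCDF4 file on io_comm and
         * write node_buf as its hyperslab; nc_def_var_deflate() can add
         * compression where the installed netcdf-c supports compressed
         * parallel writes with collective access. */
        MPI_Comm_free(&io_comm);
        free(node_buf);
    }

    MPI_Comm_free(&node_comm);
    free(local);
    MPI_Finalize();
    return 0;
}
```

Restricting the file access to one rank per node reduces the number of writers contending for the parallel file system while keeping a single shared output file, which is the intent of the proposed ParFlow NetCDF4 interface.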
