Journal: Concurrency and Computation

In‐place query driven big data platform: Applications to post processing of environmental monitoring


Abstract

This paper describes the use of an experimental big data platform for two environmental monitoring applications: visualization of global climate forecast data, and air quality model simulation and response. Environmental monitoring in general requires both model simulation for forecasting and data processing for visualization and analysis. The in‐place query driven big data platform, based on the concepts of Query Driven Visualization and a shared‐nothing distributed database, was developed to meet this need. The system architecture of this experimental platform comprises one master data node and 17 slave data nodes, and the system links to the National Center for High‐performance Computing supercomputer, Advanced Large‐scale Parallel Supercluster, and storage pool. For the software implementation, the openSUSE operating system and the MariaDB database are installed on all nodes. The master data node is responsible for metadata management and information integration, and the 17 slave data nodes for the distributed database and for parallel model simulation, data visualization, and analysis. In the global climate data visualization application (Outgoing Longwave Radiation or OLR, temperature, rainfall, etc.), the platform first partitions Network Common Data Form file data into shared‐nothing distributed databases for partial visualization in the slave data nodes; the partial results are then integrated into a whole visualization in the master node through Message Passing Interface communication. For the air quality management application, we first accessed Taiwan Environmental Protection Administration (EPA) observed data in the master node. The EPA observed data are replicated to the distributed databases in the slave nodes, and the air pollution model, the Gaussian plume trajectory model, is replicated in all slave nodes for model simulation, which produces output data and associated image files in the local file system. The master node is able to collect all of the image files through the remote shared file system to display the results. The two applications take different approaches to data I/O access because each has its own problem features. Benchmark cases show strong performance in accelerating computation and reducing I/O time. The platform is found to accelerate climate data visualization, helping research scientists gain deep insight into the data and explore potential phenomena and features, such as the formation of typhoon eddies. In the air quality management application, the platform is used to run the Gaussian plume trajectory model. Backward trajectory simulation of PM2.5 concentrations is used to identify the contributions of 30+ point sources at 73 EPA monitoring stations (receptors) in Taiwan. A user‐friendly, web‐service based big data presentation uses the heterogeneous observed and forecast pollutant data in space and time. The results support air quality decision‐making and emergency response. The limitation on data size for applications in the platform, the current users and future development of the platform, and the linkage with the PRAGMA collaboration are also described in the paper.
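
The partition-then-integrate pattern described for the climate visualization application can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the row-band partitioning scheme, and the grey-level rendering are all assumptions made for illustration, and the "nodes" are simulated sequentially in one process rather than with MariaDB and Message Passing Interface.

```python
# Hypothetical sketch of shared-nothing partitioning and master-side merging.
# All names and the row-band scheme are assumptions, not taken from the paper.


def partition_rows(n_rows, n_workers):
    """Split range(n_rows) into n_workers contiguous (start, stop) bands."""
    base, extra = divmod(n_rows, n_workers)
    bands = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def render_partial(grid, band, lo, hi):
    """Worker step: map each value in the band's rows to a 0-255 grey level."""
    start, stop = band
    span = (hi - lo) or 1.0  # avoid division by zero on a constant field
    return [[round(255 * (v - lo) / span) for v in row]
            for row in grid[start:stop]]


def merge_partials(partials):
    """Master step: stitch the partial images together in band order."""
    image = []
    for part in partials:
        image.extend(part)
    return image


def visualize(grid, n_workers=17):
    """Partition a 2D field, render each band, and merge the results."""
    # In a real shared-nothing setup the global range would itself need a
    # reduction across nodes; here it is computed directly for simplicity.
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    bands = partition_rows(len(grid), n_workers)
    # Sequential stand-in for the parallel slave nodes.
    partials = [render_partial(grid, band, lo, hi) for band in bands]
    return merge_partials(partials)
```

Because each band depends only on its own rows plus the shared value range, the workers need no access to one another's data, which is the property the shared-nothing design relies on. The default of 17 workers simply mirrors the node count given in the abstract.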
