Source: Sensors (Basel, Switzerland), via the U.S. National Institutes of Health literature collection

A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring


Abstract

In recent years, the wide adoption of Internet of Things (IoT)-based technologies has driven a proliferation of monitoring systems, which in turn has sharply increased the amount of heterogeneous data generated. Processing and analysing this volume of data is cumbersome, and practice is gradually moving from classical ‘batch’ processing, that is, the extract, transform, load (ETL) technique, to real-time processing. In the environmental monitoring and management domain, for instance, time-series data and historical datasets are crucial for prediction models. However, the domain still relies on legacy systems, which complicates real-time analysis of the essential data, hinders integration with big data platforms and forces a reliance on batch processing. As a solution, we present a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data, tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data type, into Apache Kafka topics using Kafka Connect APIs for processing by the stream processing engine. The stream processing engine executes predictive numerical models and algorithms expressed in event processing (EP) languages to analyse the data streams in real time. To demonstrate the feasibility of the proposed framework, we implemented the system for a case study of drought prediction and forecasting based on the Effective Drought Index (EDI) model. First, we transform the predictive model into a form that the streaming engine can execute for real-time computing. Second, we apply the model to the ingested data streams and datasets, predicting drought through persistent querying of the infinite streams to detect anomalies. Finally, we evaluate the performance of the distributed stream processing middleware infrastructure to determine the real-time effectiveness of the framework.
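
To make the idea of running the EDI model as a persistent query over a stream more concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract gives no formulas, so the sketch assumes the commonly cited EDI definition: effective precipitation EP is the sum, for n = 1 to the window length, of the mean of the n most recent daily precipitation values, and EDI is the deviation of EP from its climatological mean divided by the standard deviation. The names `effective_precipitation`, `SlidingEP` and `edi`, the 365-day default window and the use of a population standard deviation are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def effective_precipitation(window):
    """Return EP for daily precipitation ordered most recent first.

    Assumed definition: EP = sum over n of (sum of the n most recent values) / n.
    """
    total = 0.0
    running = 0.0
    for n, precip in enumerate(window, start=1):
        running += precip          # sum of the n most recent values
        total += running / n       # add their mean to EP
    return total


class SlidingEP:
    """Hypothetical stream operator: keeps the last `duration` daily values."""

    def __init__(self, duration=365):
        self.window = deque(maxlen=duration)

    def update(self, precip_mm):
        """Consume one daily reading; return EP once the window is full."""
        self.window.appendleft(precip_mm)  # newest first, oldest dropped
        if len(self.window) < self.window.maxlen:
            return None
        return effective_precipitation(self.window)


def edi(ep, climatology_eps):
    """Standardise EP against historical EP values for the same calendar day.

    Simplification: uses the population standard deviation of the history.
    """
    sd = pstdev(climatology_eps)
    if sd == 0:
        raise ValueError("climatology has zero variance")
    return (ep - mean(climatology_eps)) / sd
```

In a deployment along the lines the abstract describes, each `update` call would be driven by a record consumed from a topic, and the historical EP values passed to `edi` would come from the ingested legacy datasets. Negative EDI values indicate drier-than-normal conditions under this definition.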
