Conference: IFIP WG 5.11 International Symposium on Environmental Software Systems (ISESS)

A Best of Both Worlds Approach to Complex, Efficient, Time Series Data Delivery

Abstract

Point time series are a key data type for describing real or modelled environmental phenomena. Delivering this data in useful ways can be challenging when the data volume is large, when computational work (such as aggregation, subsetting, or re-sampling) needs to be performed, or when complex metadata is needed to place data in context for understanding. Some aspects of these problems are especially relevant to the environmental domain: large sensor networks that sample continuous environmental phenomena frequently over long periods of time generate very large datasets, and rich metadata is often required to understand the context of observations. Nevertheless, time series data, and most of these challenges, are prevalent beyond the environmental domain, for example in the financial and industrial domains. A review of recent technologies illustrates an emerging trend toward high-performance, lightweight databases specialized for time series data. These databases tend to have non-existent or minimal formal metadata capabilities. In contrast, the environmental domain boasts standards such as the Sensor Observation Service (SOS) that have mature and comprehensive metadata models, but existing implementations have suffered from slow performance. In this paper we describe our hybrid approach to the efficient delivery of large time series datasets with complex metadata. We use three subsystems within a single system-of-systems: a proxy (Python), an efficient time series database (InfluxDB) and a SOS implementation (52 North SOS). Together these present a regular SOS interface. The proxy processes standard SOS queries and issues them to either 52 North SOS or InfluxDB for processing. Responses are returned directly from 52 North SOS, or indirectly from InfluxDB via the Python proxy, where they are converted into WaterML. This enables the scalability and performance advantages of the time series database to be married with the sophisticated metadata handling of SOS. Testing indicates that a recent version of 52 North SOS configured with a Postgres/PostGIS database performs well, but an implementation incorporating InfluxDB and 52 North SOS in a hybrid architecture performs approximately 12 times faster.
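The routing step performed by the proxy can be illustrated with a minimal Python sketch. The abstract does not state which SOS operations are sent to which backend, so the rule used here (data-heavy GetObservation requests go to InfluxDB, everything else goes to 52 North SOS) is an assumption, and the names `Route`, `route_sos_query` and `TIME_SERIES_REQUESTS` are hypothetical rather than taken from the authors' implementation.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs

# Assumption: only GetObservation requests are served from the time series
# database. The abstract does not specify the actual routing rule.
TIME_SERIES_REQUESTS = {"getobservation"}


@dataclass(frozen=True)
class Route:
    backend: str         # "influxdb" or "52n-sos" (hypothetical labels)
    needs_waterml: bool  # True when the proxy must build the WaterML response


def route_sos_query(query_string: str) -> Route:
    """Pick a backend for a KVP-encoded SOS query string."""
    # Parameter names and the request value are lower-cased so that
    # matching is case-insensitive (a leniency chosen for this sketch).
    params = {k.lower(): v[0] for k, v in parse_qs(query_string).items()}
    request = params.get("request", "").lower()
    if request in TIME_SERIES_REQUESTS:
        # Results from InfluxDB come back through the proxy, which
        # converts them into WaterML.
        return Route(backend="influxdb", needs_waterml=True)
    # Metadata-oriented requests pass straight through to 52 North SOS.
    return Route(backend="52n-sos", needs_waterml=False)


if __name__ == "__main__":
    print(route_sos_query("service=SOS&version=2.0.0&request=GetObservation"))
    print(route_sos_query("service=SOS&request=GetCapabilities"))
```

Keeping the routing decision in one small pure function means the client sees a single regular SOS interface, while the choice of backend stays an internal detail that can be changed without touching either database.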
