Journal: Parallel Computing

GeneaLog: Fine-grained data streaming provenance in cyber-physical systems



Abstract

Streaming applications continuously process data to deliver streams of up-to-date results. Their growing adoption for data analysis in many distributed systems is motivated by their performance (in terms of processing throughput and latency) and their support for easy-to-program distributed and parallel analysis. When streaming applications are designed to detect unusual or critical events (e.g., security- or safety-related ones), it can be beneficial to maintain the associated source data for further analysis. This can be achieved by fine-grained data provenance, which links each detected event back to the source data that contributed to it, making it possible to distinguish and isolate the source data that generated such unusual or critical events.

Fine-grained data provenance can be especially useful in cyber-physical systems, such as vehicular networks and smart grids. By enabling the extraction of valuable information from raw sensor data, it could, for instance, reduce data transmission and storage requirements. Since cyber-physical systems can have heterogeneous multi-core architectures, ranging from inexpensive single-board computers to high-end servers, there is a demand for efficient provenance techniques that can take advantage of such parallel architectures with minimal overhead. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard (instrumented) data streaming operators. This is particularly useful for distributing the provenance overheads to operators that can run in parallel, thus leveraging multi-core architectures.
We evaluate two implementations of GeneaLog: one based on Apache Flink, a widely adopted state-of-the-art Stream Processing Engine, and one based on Liebre, a lightweight Stream Processing Engine tailored for the edge. We test both on vehicular and smart grid applications with single-board embedded devices and a high-end server, also studying how GeneaLog affects their scalability and confirming that it efficiently captures fine-grained provenance data with minimal overhead.
© 2019 Elsevier B.V. All rights reserved.
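The abstract describes fine-grained provenance that links each detected event back to the source data that produced it while adding only a constant-size overhead to every tuple. The Python sketch below illustrates one way such a scheme could work, under stated assumptions: every tuple carries a fixed set of hypothetical fields (a `kind` tag plus `u1`, `u2`, and `nxt` references), and `map_op`, `join_op`, and `aggregate_op` are illustrative stand-ins for instrumented streaming operators. It is a simplified, single-threaded sketch, not GeneaLog's actual data structures or algorithm, and not part of the Flink or Liebre APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional


class Kind(Enum):
    SOURCE = auto()     # raw sensor reading
    MAP = auto()        # produced from exactly one input tuple
    JOIN = auto()       # produced from two input tuples
    AGGREGATE = auto()  # produced from a contiguous window of input tuples


@dataclass(eq=False)  # eq=False: tuples compare and hash by identity
class StreamTuple:
    ts: int
    payload: dict
    # Hypothetical constant-size provenance metadata: one tag and three
    # references, however many source tuples contributed to this tuple.
    kind: Kind = Kind.SOURCE
    u1: Optional[StreamTuple] = None   # first (or only) contributing tuple
    u2: Optional[StreamTuple] = None   # second join input / last window tuple
    nxt: Optional[StreamTuple] = None  # next tuple in the same window chain


def source(ts: int, **payload) -> StreamTuple:
    return StreamTuple(ts, payload)


def map_op(t: StreamTuple, fn: Callable[[dict], dict]) -> StreamTuple:
    return StreamTuple(t.ts, fn(t.payload), Kind.MAP, u1=t)


def join_op(a: StreamTuple, b: StreamTuple,
            fn: Callable[[dict, dict], dict]) -> StreamTuple:
    return StreamTuple(max(a.ts, b.ts), fn(a.payload, b.payload),
                       Kind.JOIN, u1=a, u2=b)


def aggregate_op(window: List[StreamTuple],
                 fn: Callable[[List[dict]], dict]) -> StreamTuple:
    # Chain consecutive window tuples so the output needs only first/last refs.
    for prev, cur in zip(window, window[1:]):
        prev.nxt = cur
    return StreamTuple(window[-1].ts, fn([t.payload for t in window]),
                       Kind.AGGREGATE, u1=window[0], u2=window[-1])


def provenance(t: StreamTuple) -> List[StreamTuple]:
    """Follow the references back to the SOURCE tuples that contributed to t."""
    found, seen, stack = [], set(), [t]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        if cur.kind is Kind.SOURCE:
            found.append(cur)
        elif cur.kind is Kind.MAP:
            stack.append(cur.u1)
        elif cur.kind is Kind.JOIN:
            stack.extend([cur.u1, cur.u2])
        else:  # AGGREGATE: walk the nxt chain from u1 up to and including u2
            node = cur.u1
            while node is not None:
                stack.append(node)
                if node is cur.u2:
                    break
                node = node.nxt
    return sorted(found, key=lambda s: s.ts)


if __name__ == "__main__":
    # Hypothetical vehicular example: speed readings -> window average -> alert.
    readings = [source(ts, car="A", speed=s) for ts, s in [(1, 80), (2, 120), (3, 130)]]
    avg = aggregate_op(readings, lambda ps: {"avg": sum(p["speed"] for p in ps) / len(ps)})
    alert = map_op(avg, lambda p: {"alert": p["avg"] > 100})
    print(alert.payload, [s.ts for s in provenance(alert)])  # {'alert': True} [1, 2, 3]
```

In this example the window average of 110.0 triggers the alert, and `provenance(alert)` recovers the three source readings. Storing only the first and last window tuples on an aggregate output, plus an `nxt` link between consecutive window tuples, is what keeps the per-tuple metadata constant-size in this sketch; the trade-off is that recovering a window's contributors requires walking that chain.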
