Journal of Parallel and Distributed Computing

Incremental dataflow execution, resource efficiency and probabilistic guarantees with Fuzzy Boolean nets

Abstract

Currently, there is a strong need for organizations to analyze and process ever-increasing volumes of data in order to meet real-time processing demands. Such continuous and data-intensive processing is often achieved through the composition of complex data-intensive workflows (i.e., dataflows). Dataflow management systems typically enforce strict temporal synchronization across the various processing steps. Non-synchronous behavior often has to be explicitly programmed on an ad-hoc basis, which requires additional lines of code in programs and thus introduces the possibility of errors. Moreover, in a large set of scenarios for continuous and incremental processing, the output of a dataflow application at each execution may differ very little from that of the previous execution, and therefore resources, energy and computational power are unknowingly wasted. To address this lack of efficiency, transparency, and generality, we introduce the notion of Quality-of-Data (QoD), which describes the level of change in a data store that is required to trigger processing steps. Dataflow (re-)execution is thereby deferred until its outcome would reach a significant and meaningful variation, within a specified freshness limit. Based on the QoD notion, we propose a novel dataflow model, with an accompanying framework (Fluxy), for orchestrating data-intensive processing steps, which communicate data via a NoSQL store, and whose triggering semantics is driven by dynamic QoD constraints automatically defined for different datasets by means of Fuzzy Boolean Nets. These nets give probabilistic guarantees about the prediction of the cumulative error between consecutive dataflow executions. With Fluxy, we demonstrate how dataflows can be leveraged to respond to quality boundaries (which can be seen as SLAs) to deliver controlled and augmented performance, rationalization of resources, and task prioritization.
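The QoD triggering idea described above can be illustrated with a minimal sketch. This is not Fluxy's implementation or API: the class name `QoDGate`, the `changed_fraction` measure, and the fixed bound `max_changed_fraction` are all hypothetical, and a plain in-memory dictionary stands in for the NoSQL store. In particular, the abstract says the QoD constraints are defined dynamically by Fuzzy Boolean Nets, whereas this sketch assumes a constant threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class QoDGate:
    """Hypothetical gate: runs `step` only once enough of the store has changed."""

    step: Callable[[Dict[str, float]], None]
    # Fixed QoD bound (an assumption; the paper derives bounds dynamically).
    max_changed_fraction: float
    _snapshot: Dict[str, float] = field(default_factory=dict)
    runs: int = 0

    def changed_fraction(self, store: Dict[str, float]) -> float:
        """Fraction of keys whose value differs from the last processed snapshot."""
        keys = set(store) | set(self._snapshot)
        if not keys:
            return 0.0
        changed = sum(1 for k in keys if store.get(k) != self._snapshot.get(k))
        return changed / len(keys)

    def maybe_run(self, store: Dict[str, float]) -> bool:
        """Trigger the step if the accumulated change reaches the bound."""
        if self.changed_fraction(store) < self.max_changed_fraction:
            return False  # change too small: skip this (re-)execution
        self.step(store)
        self._snapshot = dict(store)  # remember the state that was processed
        self.runs += 1
        return True


# Usage: the step here just records the sum of the stored values.
outputs = []
gate = QoDGate(step=lambda s: outputs.append(sum(s.values())),
               max_changed_fraction=0.5)
store = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}

first = gate.maybe_run(store)   # nothing processed yet, so every key counts as changed
store["a"] = 1.5
second = gate.maybe_run(store)  # 1 of 4 keys changed: below the bound
store["b"] = 2.5
third = gate.maybe_run(store)   # 2 of 4 keys changed: bound reached
```

In this sketch the second call is skipped because only a quarter of the keys changed since the last processed snapshot, which is the kind of avoided re-execution the abstract refers to. A freshness limit (a maximum time between executions) is not modeled here.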