Computing Sliding Window Aggregates over Data Streams in a Scientific Workflow System.

Abstract

Scientific workflow management is a useful tool that lets scientists create and automate tasks for scientific data management, analysis, simulation, and visualization. Kepler, a scientific workflow management system, is free, open-source software that builds upon Ptolemy II. It supports a design in which components called actors communicate with one another via data pipes and schedule execution under different models of computation.

The objective of this thesis is to provide additional capabilities to Kepler for processing continuous data streams. The current approach to aggregation in scientific workflows and in Kepler is to use s-aggregation, i.e., aggregation based on sequence order only. As we shall see later in more detail, using t-aggregation, i.e., aggregation based on timestamps, can lead to more user-friendly workflows than using s-aggregation. One example is a Growing Degree Day workflow that processes sensor data streams containing hourly temperatures ordered by timestamp. Its aim is to compute daily growing degree day units by counting tuples in every hour of the incoming stream. This method is not very reliable, because the rate at which data streams arrive is nondeterministic and therefore one cannot, in general, predict the number of tokens present in every window. Another current limitation of Kepler concerns computing aggregates over sliding windows: the user has to rerun the workflow for every window. Thus Kepler is currently not very flexible, reliable, or user-friendly in performing t-aggregations over sliding windows.

In the context of scientific workflows, we have developed an actor in Kepler that can be used in several workflows to perform t-aggregations. Its inputs are a data stream, which is a sequence of tuples containing the data and the timestamp, and a window stream, which is a sequence of tuples containing a start timestamp and an end timestamp. The actor includes the following aggregates: count, sum, average, maximum, minimum, and array (a form of grouping). These are all standard aggregates, except perhaps the array function. This function groups all data values that fall within a t-window into a single array (conceptually, a list). We describe the algorithm and an example demonstrating how the actor works. We also present several scientific case studies demonstrating the utility of the actor in several applications.
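
To make the two inputs concrete, the following is a minimal Python sketch of t-aggregation over a data stream and a window stream. It is not the thesis's actor or algorithm (which is implemented in Kepler and not reproduced in this abstract). The names `t_aggregate`, `AGGREGATES`, `readings`, and `windows`, the integer timestamps, and the half-open window convention `[start, end)` are all assumptions made for illustration, and the sketch scans a finite list by brute force rather than processing a stream incrementally.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical tuple layouts (assumed, not taken from the thesis):
Reading = Tuple[int, float]  # (timestamp, data value)
Window = Tuple[int, int]     # (start timestamp, end timestamp)


def _average(values: List[float]) -> Optional[float]:
    # An empty window has no average; None is an assumed convention.
    return sum(values) / len(values) if values else None


# The six aggregates named in the abstract.
AGGREGATES: Dict[str, Callable[[List[float]], object]] = {
    "count": len,
    "sum": sum,
    "average": _average,
    "maximum": lambda values: max(values) if values else None,
    "minimum": lambda values: min(values) if values else None,
    "array": list,  # group the window's values into a single list
}


def t_aggregate(
    data: Iterable[Reading], windows: Iterable[Window], name: str
) -> List[Tuple[int, int, object]]:
    """Apply one aggregate to the values whose timestamps fall in each window.

    Windows are treated as half-open intervals [start, end); this boundary
    rule is an assumption of the sketch.
    """
    aggregate = AGGREGATES[name]
    readings = list(data)  # brute-force: keep everything in memory
    results = []
    for start, end in windows:
        values = [value for ts, value in readings if start <= ts < end]
        results.append((start, end, aggregate(values)))
    return results


# Irregularly spaced readings: membership depends on timestamps, not on
# how many tuples happen to arrive.
readings = [(0, 10.0), (1, 12.0), (3, 11.0), (4, 15.0), (7, 9.0)]
# Overlapping (sliding) windows of length 4 that advance by 2.
windows = [(0, 4), (2, 6), (4, 8)]

print(t_aggregate(readings, windows, "count"))
# [(0, 4, 3), (2, 6, 2), (4, 8, 2)]
print(t_aggregate(readings, windows, "array"))
# [(0, 4, [10.0, 12.0, 11.0]), (2, 6, [11.0, 15.0]), (4, 8, [15.0, 9.0])]
```

Because the windows are supplied as their own stream, all sliding windows are evaluated in a single pass over the window list, and the number of tuples per window is allowed to vary, which is the situation the abstract describes as problematic for s-aggregation.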

Bibliographic record

Author: Gulati, Supriya Sudhir
Author affiliation: University of California, Davis
Degree-granting institution: University of California, Davis
Discipline: Computer Science
Degree: M.S.
Year: 2010
Pages: 57 p.
Total pages: 57
Original format: PDF
Language: English
