
Computing Sliding Window Aggregates over Data Streams in a Scientific Workflow System.


Abstract

Scientific workflow management helps scientists create and automate tasks for scientific data management, analysis, simulation, and visualization. Kepler, a scientific workflow management system, is free, open-source software that builds upon Ptolemy II. It supports a design in which components called actors communicate with one another via data pipes and schedule execution under different models of computation.

The objective of this thesis is to provide additional capabilities to Kepler for processing continuous data streams. The current approach to aggregation in scientific workflows and in Kepler is to use s-aggregation, i.e., aggregation based on sequence order only. As we shall see later in more detail, using t-aggregation, i.e., aggregation based on timestamps, can lead to more user-friendly workflows than using s-aggregation. One example is a Growing Degree Day workflow that processes sensor data streams containing hourly temperatures ordered by timestamp. Its aim is to compute daily growing degree day units by counting tuples in every hour in the incoming stream. This method is not very reliable because the rate at which data streams arrive is nondeterministic, so one cannot, in general, predict the number of tokens present in each window. Another current limitation of Kepler concerns computing aggregates over sliding windows: the user has to rerun the workflow for every window. Thus Kepler is currently not very flexible, reliable, or user-friendly in performing t-aggregations over sliding windows.

In the context of scientific workflows, we have developed an actor in Kepler that can be used in several workflows to perform t-aggregations. Its inputs are a data stream, which is a sequence of tuples containing the data and the timestamp, and a window stream, which is a sequence of tuples containing a start timestamp and an end timestamp. The actor includes the following aggregates: count, sum, average, maximum, minimum, and array (a form of grouping). These are all standard aggregates, except perhaps the array function. This function groups all data values that fall within a t-window into a single array (conceptually: a list). We describe the algorithm and an example demonstrating how the actor works. We also present several scientific case studies demonstrating the utility of the actor in several applications.
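
To make the idea of t-aggregation concrete, the following is a minimal Python sketch. It is not the thesis's Kepler actor or its algorithm. The function name `t_aggregate`, the `(timestamp, value)` tuple layout, and the half-open window convention `start <= t < end` are all assumptions made for illustration; the abstract does not specify how window boundaries are treated. The sketch works on in-memory lists rather than on live streams.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical layouts (assumptions, not taken from the thesis):
#   data tuple   = (timestamp, value)
#   window tuple = (start_timestamp, end_timestamp)
DataTuple = Tuple[float, float]
Window = Tuple[float, float]


def t_aggregate(data: List[DataTuple], windows: List[Window]) -> List[Dict[str, Any]]:
    """Compute count, sum, average, maximum, minimum, and array per window.

    A value belongs to a window when start <= timestamp < end
    (half-open interval; an assumed convention).
    """
    results = []
    for start, end in windows:
        # Select values by timestamp, not by position in the sequence.
        values = [v for t, v in data if start <= t < end]
        n = len(values)
        total = sum(values)
        results.append({
            "window": (start, end),
            "count": n,
            "sum": total,
            "average": total / n if n else None,   # undefined for an empty window
            "maximum": max(values) if values else None,
            "minimum": min(values) if values else None,
            "array": values,                        # grouping: all values in the window
        })
    return results


# Hourly readings with the reading for hour 2 missing, so a fixed
# "every 3 tuples" rule would drift, while timestamp windows do not.
data = [(0, 10.0), (1, 12.0), (3, 11.0), (4, 15.0), (5, 14.0)]
# Overlapping (sliding) windows of width 3 that advance by 1.
windows = [(0, 3), (1, 4), (2, 5), (3, 6)]

results = t_aggregate(data, windows)
for row in results:
    print(row)
```

Because each window is described by its own start and end timestamps, overlapping windows are handled in a single pass over the window list, and a missing reading simply lowers that window's count instead of shifting every later group.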

Bibliographic record

  • Author

    Gulati, Supriya Sudhir

  • Author affiliation

    University of California, Davis

  • Degree-granting institution University of California, Davis
  • Discipline Computer Science
  • Degree M.S.
  • Year 2010
  • Pages 57 p.
  • Total pages 57
  • Original format PDF
  • Language of text English
