首页> 外文学位 >Algorithms for data stream systems.
【24h】

Algorithms for data stream systems.

机译:数据流系统的算法。

获取原文
获取原文并翻译 | 示例

摘要

In a growing number of information-processing applications, data takes the form of “continuous data streams” rather than traditional stored databases. These applications share several distinguishing features like the need for real time analysis, huge volumes of data, and unpredictable and bursty data arrivals. Application areas include network-traffic monitoring, sensor networks, and many more. For such applications, it is not feasible to simply load the arriving data into a traditional database management system (DBMS) or load it into main memory, and operate on it there. The failure to use these solutions precludes random access to data elements that is assumed in the traditional main memory model of computation. This leads us to the data stream model of computation, in which some or all of the input data that is to be operated on, is not available for random access from disk or memory, but rather arrives as one or more continuous data streams . Using a small amount of main memory and small update time, we are required to give continuous answers to queries posed over data streams.; This thesis considers two sets of issues in the area of data-stream computation: algorithms and theory, and techniques for runtime resource allocation in systems designed for stream computations.; In the area of algorithms we present space- and time-efficient algorithms for a number of problems in the sliding-window model. In this model only the last N elements, where N is the window size, are relevant to gathering statistics or answering queries. We also propose the computation of Hamming norms over data streams, which is a more general problem than the well-studied distinct value estimation problem, and present efficient algorithms for Hamming-norm computation in the presence of inserts and deletes.; In the area of stream-systems we study two runtime resource allocation problems: operator scheduling and load shedding. We develop an operator-scheduling strategy called Chain that minimizes the total memory requirement of the system. We then study load shedding for aggregations queries, and present optimal algorithms that minimize the inaccuracy in query answers given a load constraint. We present a thorough experimental evaluation for both techniques.
机译:在越来越多的信息处理应用程序中,数据采用“连续数据流”的形式,而不是传统的存储数据库。这些应用程序共享一些独特的功能,例如实时分析的需求,海量数据以及不可预测的突发数据到达。应用领域包括网络流量监控,传感器网络等。对于此类应用程序,简单地将到达的数据加载到传统的数据库管理系统(DBMS)中或将其加载到主存储器中并在此进行操作是不可行的。由于无法使用这些解决方案,因此无法对传统的主内存计算模型中假定的数据元素进行随机访问。这导致我们进入数据流计算模型,其中要操作的一些或全部输入数据不可用于从磁盘或内存中进行随机访问,而是作为一个整体到达或更多连续数据流。使用少量的主内存和较短的更新时间,我们需要对数据流中的查询给出连续的答案。本文考虑了数据流计算领域的两套问题:算法和理论,以及用于流计算的系统中运行时资源分配的技术。在算法领域,我们针对滑动窗口模型中的许多问题提出了节省空间和时间的算法。在此模型中,只有最后 N 个元素(其中 N 是窗口大小)与收集统计信息或回答查询有关。我们还提出了对数据流进行汉明范数的计算,这是比经过充分研究的离散值估计问题更普遍的问题,并且提出了在存在插入和删除的情况下进行汉明范数计算的有效算法。在流系统领域,我们研究了两个运行时资源分配问题:操作员调度和负载削减。我们开发了一种称为 Chain 的操作员调度策略,该策略可最大程度地减少系统的总内存需求。然后,我们研究聚合查询的减载,并提出了最佳算法,可在给定负载约束的情况下最大程度地减少查询答案中的不准确性。我们对这两种技术进行了全面的实验评估。

著录项

  • 作者

    Datar, Mayur.;

  • 作者单位

    Stanford University.;

  • 授予单位 Stanford University.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2004
  • 页码 224 p.
  • 总页数 224
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号