Algorithms for data stream systems.

机译：数据流系统的算法。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

In a growing number of information-processing applications, data takes the form of “continuous data streams” rather than traditional stored databases. These applications share several distinguishing features like the need for real time analysis, huge volumes of data, and unpredictable and bursty data arrivals. Application areas include network-traffic monitoring, sensor networks, and many more. For such applications, it is not feasible to simply load the arriving data into a traditional database management system (DBMS) or load it into main memory, and operate on it there. The failure to use these solutions precludes random access to data elements that is assumed in the traditional main memory model of computation. This leads us to the data stream model of computation, in which some or all of the input data that is to be operated on, is not available for random access from disk or memory, but rather arrives as one or more continuous data streams . Using a small amount of main memory and small update time, we are required to give continuous answers to queries posed over data streams.; This thesis considers two sets of issues in the area of data-stream computation: algorithms and theory, and techniques for runtime resource allocation in systems designed for stream computations.; In the area of algorithms we present space- and time-efficient algorithms for a number of problems in the sliding-window model. In this model only the last N elements, where N is the window size, are relevant to gathering statistics or answering queries. We also propose the computation of Hamming norms over data streams, which is a more general problem than the well-studied distinct value estimation problem, and present efficient algorithms for Hamming-norm computation in the presence of inserts and deletes.; In the area of stream-systems we study two runtime resource allocation problems: operator scheduling and load shedding. We develop an operator-scheduling strategy called Chain that minimizes the total memory requirement of the system. We then study load shedding for aggregations queries, and present optimal algorithms that minimize the inaccuracy in query answers given a load constraint. We present a thorough experimental evaluation for both techniques.

机译：在越来越多的信息处理应用程序中，数据采用“连续数据流”的形式，而不是传统的存储数据库。这些应用程序共享一些独特的功能，例如实时分析的需求，海量数据以及不可预测的突发数据到达。应用领域包括网络流量监控，传感器网络等。对于此类应用程序，简单地将到达的数据加载到传统的数据库管理系统（DBMS）中或将其加载到主存储器中并在此进行操作是不可行的。由于无法使用这些解决方案，因此无法对传统的主内存计算模型中假定的数据元素进行随机访问。这导致我们进入数据流计算模型，其中要操作的一些或全部输入数据不可用于从磁盘或内存中进行随机访问，而是作为一个整体到达或更多连续数据流。使用少量的主内存和较短的更新时间，我们需要对数据流中的查询给出连续的答案。本文考虑了数据流计算领域的两套问题：算法和理论，以及用于流计算的系统中运行时资源分配的技术。在算法领域，我们针对滑动窗口模型中的许多问题提出了节省空间和时间的算法。在此模型中，只有最后 N 个元素（其中 N 是窗口大小）与收集统计信息或回答查询有关。我们还提出了对数据流进行汉明范数的计算，这是比经过充分研究的离散值估计问题更普遍的问题，并且提出了在存在插入和删除的情况下进行汉明范数计算的有效算法。在流系统领域，我们研究了两个运行时资源分配问题：操作员调度和负载削减。我们开发了一种称为 Chain 的操作员调度策略，该策略可最大程度地减少系统的总内存需求。然后，我们研究聚合查询的减载，并提出了最佳算法，可在给定负载约束的情况下最大程度地减少查询答案中的不准确性。我们对这两种技术进行了全面的实验评估。

著录项

作者
Datar, Mayur.;
展开▼
作者单位

Stanford University.;

展开▼
授予单位 Stanford University.;
学科 Computer Science.
学位 Ph.D.
年度 2004
页码 224 p.
总页数 224
原文格式 PDF
正文语种 eng
中图分类自动化技术、计算机技术;
关键词

相似文献

外文文献
中文文献
专利

1. Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms [J] . Carnein Matthias, Trautmann Heike Wirtschaftsinformatik . 2019,第3期

机译：优化数据流表示形式：对流聚类算法的广泛调查
2. Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms [J] . Carnein Matthias, Trautmann Heike Business & information systems engineering . 2019,第3期

机译：优化数据流表示：对流群集算法的广泛调查
3. Efficient stream subsequence matching algorithms for handheld devices on streaming time-series data [J] . Yang-Sae Moon, Woong-Kee Loh International Journal of Computer Systems Science & Engineering . 2009,第2期

机译：手持设备在流时间序列数据上的高效流子序列匹配算法
4. EFFICIENT STREAM SEQUENCE MATCHING ALGORITHMS FOR HANDHELD DEVICES ON TIME-SERIES STREAM DATA [C] . Yang-Sae Moon Databases and Applications . 2006

机译：时间序列流数据上手持设备的有效流序列匹配算法
5. Architectures and optimizations for integrating data mining algorithms with database systems. [D] . Thomas, Shiby. 1998

机译：用于将数据挖掘算法与数据库系统集成的体系结构和优化。
6. Verification validation algorithms for data used in critical care decision support systems. [O] . D. Carlson, C. J. Wallace, T. D. East, 1995

机译：重症监护决策支持系统中使用的数据的验证和确认算法。
7. Figure 7: Comparison of storage size change as the results of sorting a set of streaming data by the proposed streaming data sort algorithm, approximate sorting (Farnoud, Yaakobi Bruck, 2016) algorithm and external sorting. [O] . -1

机译：图7：存储大小的比较随着所提出的流数据排序算法，近似排序（Farnoud，Yaakobi＆Bruck，2016）算法和外部分类，将存储尺寸的比较变为分类一组流数据的结果。
8. Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation. Appendix D. Analysis of MIMD (Multiple Instruction Streams, Multiple Data Streams) Algorithms: Features, Measurements, and Results [R] . Smith, K. D. 1984

机译：信号处理的分布式计算：异步并行计算的建模。附录D. mImD（多指令流，多数据流）算法的分析：特征，测量和结果

Algorithms for data stream systems.

摘要

著录项

相似文献

相关主题

期刊订阅