Sketching Probabilistic Data Streams

机译：绘制概率数据流

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

The management of uncertain, probabilistic data has recently emerged as a useful paradigm for dealing with the inherent unreliabilities of several real-world application domains, including data cleaning, information integration, and pervasive, multi-sensor computing. Unlike conventional data sets, a set of probabilistic tuples de- fines a probability distribution over an exponential number of possible worlds (I.e., "grounded", deterministic databases). This "possible worlds" interpretation allows for clean query semantics but also raises hard computational problems for probabilistic database query processors. To further complicate matters, in many scenarios (e.g., large-scale process and environmental monitoring using multiple sensor modalities), probabilistic data tuples arrive and need to be processed in a streaming fashion; that is, using limited memory and CPU resources and without the benefit of multiple passes over a static probabilistic database. Such probabilistic data streams raise a host of new research challenges for stream-processing engines that, to date, remain largely unaddressed. In this paper, we propose the first space- and time-efficient algorithms for approximating complex aggregate queries (including, the number of distinct values and join/self-join sizes) over probabilistic data streams. Following the possible-worlds semantics, such aggregates essentially define probability distributions over the space of possible aggregation results, and our goal is to characterize such distributions through efficient approximations of their key moments (such as expectation and variance). Our algorithms offer strong randomized estimation guarantees while using only sublinear space in the size of the stream(s), and rely on novel, concise streaming sketch synopses that extend conventional sketching ideas to the probabilistic streams setting. Our experimental results verify the effectiveness of our approach.

机译：最近，不确定性，概率数据的管理已成为一种有用的范例，用于处理多个实际应用程序域的内在不可靠性，包括数据清理，信息集成以及普遍的多传感器计算。与常规数据集不同，一组概率元组定义了可能世界指数形式的概率分布（即确定性数据库）。这种“可能的世界”解释允许干净的查询语义，但也给概率数据库查询处理器带来了困难的计算问题。更复杂的是，在许多情况下（例如，使用多个传感器模式进行大规模过程和环境监控），概率数据元组到达并需要以流方式进行处理;也就是说，使用有限的内存和CPU资源，并且没有通过静态概率数据库进行多次传递的好处。这样的概率数据流给流处理引擎提出了许多新的研究挑战，到目前为止，这些挑战仍未解决。在本文中，我们提出了第一种节省时间和空间的算法，用于近似概率数据流上的复杂聚合查询（包括不同值的数量和连接/自连接大小）。遵循可能世界的语义，此类聚合实质上定义了可能的聚合结果空间上的概率分布，我们的目标是通过关键时刻（例如期望和方差）的有效近似来表征此类分布。我们的算法在仅使用流大小的亚线性空间的同时，提供了强大的随机估计保证，并依赖于新颖，简洁的流草图提要，这些提要将常规的草图绘制思想扩展到了概率流设置。我们的实验结果验证了我们方法的有效性。

著录项

来源
《SIGMOD/PODS 2007》|2007年|P.281-292|共12页
会议地点
作者
Graham Cormode; Minos Garofalakis;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类 TP311.13;
关键词
Data Streams; Uncertain Data.;

机译：数据流;不确定的数据。;

相似文献

外文文献
中文文献
专利

1. FID-sketch: an accurate sketch to store frequencies in data streams [J] . Yang Tong, Zhang Haowei, Wang Hao, World Wide Web . 2019,第6期

机译：FID草图：将频率存储在数据流中的准确草图
2. Diamond Sketch: Accurate Per-flow Measurement for Big Streaming Data [J] . Yang Tong, Gao Siang, Sun Zhouyi, IEEE Transactions on Parallel and Distributed Systems . 2019,第12期

机译：Diamond Sketch：大流量数据的精确每流测量
3. Sketching distributed sliding-window data streams [J] . Papapetrou Odysseas, Garofalakis Minos, Deligiannakis Antonios The VLDB journal . 2015,第3期

机译：绘制分布式滑动窗口数据流
4. Identification of heavy hitters for network data streams with probabilistic sketch [C] . Aiping Zhou, Huisheng Zhu, Lijun Liu, IEEE International Conference on Cloud Computing and Big Data Analysis . 2018

机译：用概率草图识别网络数据流的重击手
5. Streaming and Sketch Algorithms for Large Data NLP. [D] . Goyal, Amit. 2013

机译：大数据NLP的流和草图算法。
6. Streaming histogram sketching for rapid microbiome analytics [O] . Will PM Rowe, Anna Paola Carrieri, Cristina Alcon-Giner, 2019

机译：流式直方图草绘可快速进行微生物组分析
7. Sketch?-metric: Comparing Data Streams via Sketching [O] . Emmanuelle Anceaume, Yann Busnel 2015

机译：sketch？-metric：通过草图比较数据流
8. Probabilistic Stream Relational Algebra: A Data Model for Sensor Data Streams. [R] . Liu, H., Hwang, S., Ivastava, J. 2004

机译：概率流关系代数：传感器数据流的数据模型。

Sketching Probabilistic Data Streams

摘要

著录项

相似文献

相关主题

期刊订阅