Approximate Lineage for Probabilistic Databases

机译：概率数据库的近似谱系

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

In probabilistic databases, lineage is fundamental to both query processing and understanding the data. Current systems s.a. Trio or Mystiq use a complete approach in which the lineage for a tuple t is a Boolean formula which represents all derivations of t. In large databases lineage formulas can become huge: in one public database (the Gene Ontology) we often observed 10MB of lineage (provenance) data for a single tuple. In this paper we propose to use approximate lineage, which is a much smaller formula keeping track of only the most important derivations, which the system can use to process queries and provide explanations. We discuss in detail two specific kinds of approximate lineage: (1) a conservative approximation called sufficient lineage that records the most important derivations for each tuple, and (2) polynomial lineage, which is more aggressive and can provide higher compression ratios, and which is based on Fourier approximations of Boolean expressions. In this paper we define approximate lineage formally, describe algorithms to compute approximate lineage and prove formally their error bounds, and validate our approach experimentally on a real data set.

机译：在概率数据库中，沿袭是查询处理和理解数据的基础。当前系统Trio或Mystiq使用一种完整的方法，其中元组t的谱系是一个布尔公式，表示t的所有派生。在大型数据库中，谱系公式会变得非常庞大：在一个公共数据库（基因本体论）中，我们经常观察到单个元组的10MB谱系（来源）数据。在本文中，我们建议使用近似谱系，该谱系是一个较小的公式，仅跟踪最重要的派生，系统可以使用这些派生来处理查询并提供解释。我们详细讨论了两种特定的近似谱系：（1）称为足够谱系的保守近似，它记录了每个元组的最重要派生，以及（2）多项式谱系，它更具攻击性并且可以提供更高的压缩率，并且基于布尔表达式的傅立叶近似。在本文中，我们正式定义了近似谱系，描述了计算近似谱系并正式证明其误差范围的算法，并在真实数据集上通过实验验证了我们的方法。

著录项

来源
《International conference on very large data bases;VLDB 2008》|2008年|796-807|共12页
会议地点
作者
Christopher Re; Dan Suciu;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Approximate probabilistic optimization using exact-capacity-approximate-response-distribution (ECARD) [J] . Sunil Kumar, Richard J. Pippy, Erdem Acar, Structural and Multidisciplinary Optimization . 2009,第6期

机译：使用精确容量近似响应分布（ECARD）的近似概率优化
2. Approximate probabilistic optimization using exact-capacity-approximate-response-distribution (ECARD) [J] . Kumar S, Pippy RJ, Acar E, Structural and multidisciplinary optimization . 2009,第6期

机译：使用精确容量近似响应分布（ECARD）的近似概率优化
3. Digital development: a database of cell lineage differentiation in C. elegans with lineage phenotypes, cell-specific gene functions and a multiscale model [J] . Anthony Santella, David H. Hall, Ismar Kovacevic, Nucleic acids research . 2016,第D1期

机译：数字化发展：秀丽隐杆线虫细胞谱系分化，谱系表型，细胞特异性基因功能和多尺度模型的数据库
4. Approximate Lineage for Probabilistic Databases [C] . Christopher Re, Dan Suciu International conference on very large data bases . 2008

机译：概率数据库的近似血统
5. Approximate Evaluation of Queries for Spatial, Uncertain and Probabilistic Databases. [D] . Chen, Haiquan. 2011

机译：空间，不确定和概率数据库查询的近似评估。
6. Probabilistic Updating of Structural Models for Damage Assessment Using Approximate Bayesian Computation [O] . Zhouquan Feng, Yang Lin, Wenzan Wang, 2020

机译：使用近似贝叶斯计算的损伤评估结构模型的概率更新
7. Approximate lineage for probabilistic databases [O] . Christopher Ré, Dan Suciu 2008

机译：概率数据库的近似谱系

Approximate Lineage for Probabilistic Databases

摘要

著录项

相似文献

相关主题

期刊订阅