Lineage Processing over Correlated Probabilistic Databases

Abstract

In this paper, we address the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties. Like previous work, we adopt a two-phase approach where we first compute lineages of the output tuples, and then compute the probabilities of the lineage formulas. However, unlike previous work, we allow for arbitrary and complex correlations to be present in the data, captured via a forest of junction trees. We observe that evaluating even read-once (tree-structured) lineages (e.g., those generated by hierarchical conjunctive queries), which are polynomially computable over tuple-independent probabilistic databases, is #P-complete for lightly correlated probabilistic databases like Markov sequences. We characterize the complexity of exact computation of the probability of the lineage formula on a correlated database using a parameter called lwidth (analogous to the notion of treewidth). For lineages that result in low lwidth, we compute exact probabilities using a novel message passing algorithm, and for lineages that induce large lwidths, we develop approximate Monte Carlo algorithms to estimate the result probabilities. We scale our algorithms to very large correlated probabilistic databases using the previously proposed INDSEP data structure. To mitigate the complexity of lineage evaluation, we develop optimization techniques to process a batch of lineages by sharing computation across formulas, and to exploit any independence relationships that may exist in the data. Our experimental study illustrates the benefits of using our algorithms for processing lineage formulas over correlated probabilistic databases.
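The contrast the abstract draws between tuple-independent and correlated data can be illustrated with a minimal sketch. This is not the paper's message passing algorithm, and it does not use junction trees or INDSEP. The tuple-based formula encoding, the function names, and the two-state Markov chain parameters (`p_first`, `p_stay`) are all assumptions made for illustration. The first function computes the exact probability of a read-once lineage when variables are independent; the sampler estimates the same lineage's probability by plain Monte Carlo when the variables instead form a simple Markov sequence.

```python
import random

# Hypothetical encoding of a lineage formula as nested tuples:
#   ("var", name) | ("and", f1, f2, ...) | ("or", f1, f2, ...)
FORMULA = ("or", ("and", ("var", "x1"), ("var", "x2")), ("var", "x3"))


def read_once_probability(formula, prob):
    """Exact probability of a read-once formula over INDEPENDENT variables.

    Because every variable occurs once and variables are independent,
    sibling subformulas are independent and probabilities combine locally.
    """
    kind = formula[0]
    if kind == "var":
        return prob[formula[1]]
    children = [read_once_probability(f, prob) for f in formula[1:]]
    if kind == "and":
        result = 1.0
        for p in children:
            result *= p
        return result
    if kind == "or":
        none_true = 1.0
        for p in children:
            none_true *= 1.0 - p
        return 1.0 - none_true
    raise ValueError(f"unknown node type: {kind}")


def evaluate(formula, world):
    """Truth value of the formula in one possible world (name -> bool)."""
    kind = formula[0]
    if kind == "var":
        return world[formula[1]]
    if kind == "and":
        return all(evaluate(f, world) for f in formula[1:])
    if kind == "or":
        return any(evaluate(f, world) for f in formula[1:])
    raise ValueError(f"unknown node type: {kind}")


def sample_markov_world(names, p_first, p_stay, rng):
    """Sample correlated variables from an assumed two-state Markov chain.

    The first variable is true with probability p_first; each later variable
    copies its predecessor with probability p_stay and flips otherwise.
    """
    world = {}
    previous = rng.random() < p_first
    world[names[0]] = previous
    for name in names[1:]:
        current = previous if rng.random() < p_stay else not previous
        world[name] = current
        previous = current
    return world


def monte_carlo_probability(formula, names, p_first, p_stay, samples, seed=0):
    """Estimate the formula's probability as the fraction of satisfying samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        if evaluate(formula, sample_markov_world(names, p_first, p_stay, rng)):
            hits += 1
    return hits / samples
```

With every marginal equal to 0.5, the independent computation yields 0.625 for this formula. Under the assumed chain with `p_first=0.5` and `p_stay=0.9`, the marginals are unchanged but the true probability is 0.545, so the local combination rule no longer applies and the sketch falls back on sampling.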

Publication details

Source: ACM SIGMOD International Conference on Management of Data, 2010, 12 pages
Authors: Bhargav Kanagal, Amol Deshpande
Original format: PDF
Chinese Library Classification: Programming, software engineering
Keywords: conjunctive queries; indexing; junction trees; lineage; probabilistic databases

Similar literature

1. Efficient probabilistic event stream processing with lineage and Kleene-plus [J]. Zhitao Shen, Hideyuki Kawashima, Hiroyuki Kitagawa. International journal of communication networks and distributed systems. 2009, Issue 4.
2. Probabilistic top-k range query processing for uncertain databases [J]. Xiao Guoqing, Wu Fan, Zhou Xu. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2016, Issue 2.
3. Efficient processing of probabilistic group subspace skyline queries in uncertain databases [J]. Xiang Lian, Lei Chen. Information Systems. 2013, Issue 3.
4. Lineage Processing over Correlated Probabilistic Databases [C]. Bhargav Kanagal, Amol Deshpande. ACM SIGMOD international conference on management of data; SIGMOD 2010. 2010.
5. Graph-based data analysis: Tree-structured covariance estimation, prediction by regularized kernel estimation and aggregate database query processing for probabilistic inference [D]. Bravo, Hector Corrada. 2008.
6. The Effect of Aging on the ERP Correlates of Feedback Processing in the Probabilistic Selection Task [O]. Robert West, AnnMarie Huet. 2020.
7. Lineage processing over correlated probabilistic databases [O]. Bhargav Kanagal, Amol Deshpande. 2010.
