首页> 外文会议>ACM SIGMOD international conference on management of data >Lineage Processing over Correlated Probabilistic Databases
【24h】

Lineage Processing over Correlated Probabilistic Databases

机译:相关概率数据库的谱系处理

获取原文

摘要

In this paper, we address the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties. Like previous work, we adopt a two-phase approach where we first compute lineages of the output tuples, and then compute the probabilities of the lineage formulas. However unlike previous work, we allow for arbitrary and complex correlations to be present in the data, captured via a forest of junction trees. We observe that evaluating even read-once (tree structured) lineages (e.g., those generated by hierarchical conjunctive queries), polynomially computable over tuple independent probabilistic databases, is #P-complete for lightly correlated probabilistic databases like Markov sequences. We characterize the complexity of exact computation of the probability of the lineage formula on a correlated database using a parameter called lwidth (analogous to the notion of treewidth). For lineages that result in low lwidth, we compute exact probabilities using a novel message passing algorithm, and for lineages that induce large lwidths, we develop approximate Monte Carlo algorithms to estimate the result probabilities. We scale our algorithms to very large correlated probabilistic databases using the previously proposed INDSEP data structure. To mitigate the complexity of lineage evaluation, we develop optimization techniques to process a batch of lineages by sharing computation across formulas, and to exploit any independence relationships that may exist in the data. Our experimental study illustrates the benefits of using our algorithms for processing lineage formulas over correlated probabilistic databases.
机译:在本文中,我们解决了在包含元组或属性不确定性的相关概率数据库上进行扩展地评估联合查询的问题。与以前的工作一样,我们采用了一种两相方法,在其中我们首先计算输出元组的谱系,然后计算谱系公式的概率。然而,与以前的工作不同,我们允许通过连接树森林捕获的数据中存在任意和复杂的相关性。我们观察到甚至只读(树结构化)谱系(例如,由分层结合查询生成的那些),多项式可在元组织独立的概率数据库上计算,是#p-complete,用于像马尔可夫序列等轻微相关的概率数据库。我们使用名为LWIDTH的参数(类似于树木宽度的概念)来表征与相关数据库上的谱系公式的概率的复杂性。对于导致LWIDTH导致的谱系,我们使用新的消息传递算法计算精确的概率,并且对于诱导大LWIDTH的谱系,我们开发近似蒙特卡罗算法来估计结果概率。我们使用先前提出的INDSEP数据结构将我们的算法扩展为非常大的相关概率数据库。为了缓解谱系评估的复杂性,我们开发通过共享跨公式共享计算来处理一批谱系的优化技术,并利用数据中可能存在的任何独立关系。我们的实验研究说明了利用我们对相关概率数据库处理血统公式的算法的益处。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号