International Conference on Very Large Data Bases (VLDB 2008)

Exploiting Shared Correlations in Probabilistic Databases

Abstract

There has been a recent surge of work on probabilistic databases, propelled in large part by the huge increase in noisy data sources: sensor data, experimental data, data from uncurated sources, and many others. There is a growing need for database management systems that can efficiently represent and query such data. In this work, we show how characteristics of the data can be leveraged to make query evaluation more efficient. In particular, we exploit what we call shared correlations, where the same uncertainties and correlations occur repeatedly in the data. Shared correlations arise mainly for two reasons: (1) uncertainties and correlations usually come from general statistics and rarely vary from tuple to tuple; (2) the query evaluation procedure itself tends to reintroduce the same correlations. Prior work has shown that query evaluation on probabilistic databases is equivalent to probabilistic inference on an appropriately constructed probabilistic graphical model (PGM). We leverage this by introducing a new data structure, the random variable elimination graph (rv-elim graph), which can be built from the PGM obtained during query evaluation. We develop bisimulation-based techniques that compress the rv-elim graph by exploiting shared correlations in the PGM; inference is then run on the compressed rv-elim graph. We validate our methods empirically and show that significant speed-ups are possible even when only a few shared correlations are present.
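To illustrate why repeated correlations save work, here is a minimal Python sketch. It is not the paper's rv-elim graph or its bisimulation technique. It assumes a simplified, hypothetical model in which each result tuple depends on a Markov chain of base-tuple variables whose prior and transition table come from shared statistics. Each variable-elimination step is memoized with `functools.lru_cache`, so an elimination step that recurs across tuples is computed only once. The names `eliminate` and `marginal_last` and all numbers are illustrative.

```python
from functools import lru_cache

# Hypothetical model: a result tuple depends on base-tuple variables x_1 -> ... -> x_k
# that form a Markov chain. A message is a hashable pair (P(x=0), P(x=1)); a transition
# table is ((P(0->0), P(0->1)), (P(1->0), P(1->1))).


@lru_cache(maxsize=None)
def eliminate(message, trans):
    """One variable-elimination step: sum out x_i, giving m'(b) = sum_a m(a) * T[a][b]."""
    return tuple(sum(message[a] * trans[a][b] for a in (0, 1)) for b in (0, 1))


def marginal_last(prior, trans, k):
    """P(x_k = 1), obtained by eliminating x_1, ..., x_{k-1} in chain order."""
    msg = (1.0 - prior, prior)
    for _ in range(k - 1):
        msg = eliminate(msg, trans)
    return msg[1]


# 1,000 result tuples whose chains share the same statistics (a shared correlation).
T = ((0.9, 0.1), (0.2, 0.8))
answers = [marginal_last(0.3, T, 5) for _ in range(1000)]
print(eliminate.cache_info())  # CacheInfo(hits=3996, misses=4, maxsize=None, currsize=4)
```

Because every tuple uses the same parameters, only the first chain performs real arithmetic; the remaining 999 chains are answered from the cache. Caching by value pays off only when parameters repeat exactly, which is the situation described above as shared correlations.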
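The compression idea can also be sketched in Python (3.9+ for `graphlib`). The sketch assumes a simplified, hypothetical graph model rather than the paper's exact rv-elim graph definition: every node carries a hashable label (standing in for a factor table or an elimination operation) and lists the nodes it is computed from. Because such a graph is acyclic, the coarsest bisimulation can be computed bottom-up: two nodes are bisimilar when they have the same label and the same set of child classes, so each class needs to be evaluated only once. The function name `bisimulation_compress` and the example labels are illustrative.

```python
from graphlib import TopologicalSorter


def bisimulation_compress(labels, children):
    """Compute bisimulation classes of a labeled DAG, children before parents.

    labels:   dict node -> hashable label (e.g. a factor table or operation tag).
    children: dict node -> list of nodes this node is computed from.
    Returns (cls, compressed): cls maps node -> class id, and compressed maps
    class id -> sorted list of child class ids (the quotient graph).
    """
    # TopologicalSorter treats the mapped values as predecessors, so every
    # child is emitted (and classified) before any node that depends on it.
    order = TopologicalSorter({n: children.get(n, []) for n in labels}).static_order()
    class_of_signature, cls, rep = {}, {}, {}
    for node in order:
        # Sets (not lists) of child classes: the usual successor condition of bisimulation.
        sig = (labels[node], frozenset(cls[c] for c in children.get(node, [])))
        if sig not in class_of_signature:
            class_of_signature[sig] = len(class_of_signature)
            rep[class_of_signature[sig]] = node  # first node seen represents the class
        cls[node] = class_of_signature[sig]
    compressed = {k: sorted({cls[c] for c in children.get(r, [])}) for k, r in rep.items()}
    return cls, compressed


# Hypothetical graph: three result tuples r1..r3; r1 and r2 repeat the same structure.
labels = {f"x{i}": "prior=0.3" for i in range(1, 5)}
labels.update({"x5": "prior=0.5", "e1": "eliminate", "e2": "eliminate", "e3": "eliminate",
               "r1": "AND", "r2": "AND", "r3": "AND"})
children = {"e1": ["x1", "x2"], "e2": ["x3", "x4"], "e3": ["x1", "x5"],
            "r1": ["e1"], "r2": ["e2"], "r3": ["e3"]}
cls, compressed = bisimulation_compress(labels, children)
print(len(labels), "nodes ->", len(compressed), "classes")  # 11 nodes -> 6 classes
```

Here `e1` and `e2` fall into one class because their inputs carry identical statistics, while `e3` does not because one of its inputs has a different prior. Using sets of child classes matches the standard bisimulation condition; if the order or multiplicity of inputs matters for an operation, the signature would need a sorted tuple instead of a set.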