Exploiting Shared Correlations in Probabilistic Databases

Abstract

There has been a recent surge in work on probabilistic databases, propelled in large part by the huge increase in noisy data sources: sensor data, experimental data, data from uncurated sources, and many others. There is a growing need for database management systems that can efficiently represent and query such data. In this work, we show how data characteristics can be leveraged to make query evaluation more efficient. In particular, we exploit what we refer to as shared correlations, where the same uncertainties and correlations occur repeatedly in the data. Shared correlations arise mainly for two reasons: (1) uncertainty and correlations usually come from general statistics and rarely vary from tuple to tuple; (2) the query evaluation procedure itself tends to re-introduce the same correlations. Prior work has shown that query evaluation on probabilistic databases is equivalent to probabilistic inference on an appropriately constructed probabilistic graphical model (PGM). We leverage this by introducing a new data structure, the random variable elimination graph (rv-elim graph), which can be built from the PGM obtained from query evaluation. We develop techniques based on bisimulation that compress the rv-elim graph by exploiting the shared correlations present in the PGM; inference can then be run on the compressed rv-elim graph. We validate our methods empirically and show that significant speed-ups are possible even with only a few shared correlations.
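
The compression idea described in the abstract can be illustrated with a small toy. The Python sketch below is not the paper's rv-elim graph construction, which the abstract does not spell out. It assumes a hypothetical read-once lineage formula in which independent tuple variables are combined with AND and OR, and the names `compress`, `probability`, and `build_example` are illustrative only. Nodes are grouped the way a bisimulation-style partition groups them on an acyclic graph (same label, same multiset of child classes), so a structure that recurs across the data, i.e. a shared correlation, is evaluated only once.

```python
import math

# Hypothetical encoding of a read-once lineage formula (an assumption for this
# sketch, not the paper's rv-elim graph):
#   ("leaf", p)          independent tuple variable, true with probability p
#   ("and", [ids...])    conjunction of independent sub-formulas
#   ("or",  [ids...])    disjunction of independent sub-formulas
Node = tuple


def compress(nodes: dict[int, Node], order: list[int]) -> tuple[dict[int, int], dict[int, Node]]:
    """Merge nodes that have the same label and the same multiset of child classes.

    `order` must list every child before its parents. Returns the class of each
    node and one representative node per class (whose children are class ids).
    """
    signature_to_class: dict[tuple, int] = {}
    node_class: dict[int, int] = {}
    classes: dict[int, Node] = {}
    for nid in order:
        kind, payload = nodes[nid]
        if kind == "leaf":
            signature = (kind, payload)
            representative = (kind, payload)
        else:
            # AND/OR are commutative, so child classes are sorted; duplicates are
            # kept so that multiplicity is preserved (a set would lose it).
            child_classes = tuple(sorted(node_class[c] for c in payload))
            signature = (kind, child_classes)
            representative = (kind, list(child_classes))
        if signature not in signature_to_class:
            signature_to_class[signature] = len(signature_to_class)
            classes[signature_to_class[signature]] = representative
        node_class[nid] = signature_to_class[signature]
    return node_class, classes


def probability(classes: dict[int, Node], cid: int, memo: dict[int, float]) -> float:
    """Marginal probability of a class; each class is evaluated at most once."""
    if cid in memo:
        return memo[cid]
    kind, payload = classes[cid]
    if kind == "leaf":
        p = payload
    else:
        child_ps = [probability(classes, c, memo) for c in payload]
        if kind == "and":
            p = math.prod(child_ps)
        else:  # "or": 1 minus the probability that every child is false
            p = 1 - math.prod(1 - q for q in child_ps)
    memo[cid] = p
    return p


def build_example(groups: int, per_group: int) -> tuple[dict[int, Node], list[int], list[int]]:
    """Hypothetical lineage: each answer group is an OR over `per_group`
    join results AND(r_i, s_i), with P(r_i) = 0.6 and P(s_i) = 0.5 everywhere."""
    nodes: dict[int, Node] = {}
    order: list[int] = []

    def add(node: Node) -> int:
        nid = len(nodes)
        nodes[nid] = node
        order.append(nid)  # children are always added before their parent
        return nid

    roots = []
    for _ in range(groups):
        joins = [add(("and", [add(("leaf", 0.6)), add(("leaf", 0.5))])) for _ in range(per_group)]
        roots.append(add(("or", joins)))
    return nodes, order, roots


if __name__ == "__main__":
    nodes, order, roots = build_example(groups=100, per_group=3)
    node_class, classes = compress(nodes, order)
    memo: dict[int, float] = {}
    answers = [probability(classes, node_class[r], memo) for r in roots]
    print(len(nodes), "nodes ->", len(classes), "classes; P(group) =", answers[0])
```

With 100 identical groups of three join results, the 1,000 lineage nodes collapse into 4 classes, so the group probability 1 - 0.7^3 = 0.657 is computed once and reused for all 100 groups. Sorting child classes instead of collecting them into a set is a deliberate choice: a set-based comparison would treat OR(x1, x2) and OR(x1) as equivalent whenever x1 and x2 carry the same probability, which gives the wrong marginal.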

Bibliographic Record

Source
International Conference on Very Large Data Bases (VLDB 2008), 2008, pp. 808-819 (12 pages)
Authors
Prithviraj Sen; Amol Deshpande; Lise Getoor
Original format: PDF

Similar Documents

1. PrDB: managing and exploiting rich correlations in probabilistic databases [J]. Prithviraj Sen, Amol Deshpande, Lise Getoor. VLDB Journal, 2009, no. 5

2. Quadruple Transfer Learning: Exploiting both shared and non-shared concepts for text classification [J]. Pan Jianhan, Hu Xuegang, Zhang Yuhong, et al. Knowledge-Based Systems, 2015, issue DECa

3. The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2), and genotype-phenotype correlations [J]. Frederic MY, Monino C, Marschall C, et al. Human Mutation, 2009, no. 2

4. Exploiting Shared Correlations in Probabilistic Databases [C]. Prithviraj Sen, Amol Deshpande, Lise Getoor. International Conference on Very Large Data Bases, 2008

5. Shared Autonomous Vehicles and Older Adults in the Phoenix Metropolitan Area: A Quantitative Correlational Study [D]. Andrew Beran. 2021

6. Shared Regulatory Pathways Reveal Novel Genetic Correlations Between Grip Strength and Neuromuscular Disorders [O]. Sreemol Gokuladhas, William Schierding, David Cameron-Smith, et al. 2020

7. Exploiting shared correlations in probabilistic databases [O]. Prithviraj Sen, Amol Deshpande, Lise Getoor. 2008
